Grok Parser
The grok parser parses incoming unstructured log data using grok rules similar to those documented by Logstash.
The Juttle grok parser supports this set of built-in rules.
The grok parser is currently supported by the stdio and file adapters, and support can easily be extended to others.
Usage
Field Names
If the grok rule specified in the -pattern option does not contain a field name to place the parsed data into, the data is placed into the message field. See the syslog example with -pattern '%{SYSLOGLINE}'.
When the grok rule is used to parse timestamps from the incoming text, you should assign the field name time in the rule, as that is where Juttle will look for a valid timestamp. See the custom file example with -pattern "%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}".
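To make the role of field names concrete, here is a minimal Python sketch of how a `%{NAME:field}` reference can be thought of as a named capture group. It is not how Juttle implements grok: the `PATTERNS` table and the `grok_to_regex` helper are hypothetical, and the timestamp sub-pattern is a simplified stand-in for the real built-in rule.

```python
import re

# Hypothetical, simplified pattern library; the real built-in rules are stricter.
PATTERNS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def grok_to_regex(pattern):
    """Expand grok references in `pattern` into a compiled Python regex."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # A reference with a field name becomes a named group;
        # a reference without one matches text but names nothing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(GROK_REF.sub(expand, pattern))

line = "2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116"

named = grok_to_regex("%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}").match(line)
unnamed = grok_to_regex("%{TIMESTAMP_ISO8601} %{GREEDYDATA}").match(line)

print(named.groupdict())    # fields "time" and "message"
print(unnamed.groupdict())  # no named fields at all
```

The second call shows why naming matters: the line still matches, but without field names there is nothing to tell the parsed timestamp apart from the rest of the text.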
Parsing a syslog file
Since there are quite a few built-in rules already, parsing certain log types is very easy. For example:
read file -file '/var/log/syslog' -format 'grok' -pattern '%{SYSLOGLINE}'
| filter message~'*error*'
The above reads your local /var/log/syslog file, parses it using the built-in grok pattern for syslog, and then looks for the string error in the message field of the parsed data points.
Parsing a custom file
Let's say you had a log file that looked like so:
2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116
2016-01-02 16:34:03 remove linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116 <none>
2016-01-02 16:34:13 status installed linux-image-3.13.0-73-generic:amd64 3.13.0-73.116
2016-01-02 16:34:13 remove linux-image-3.13.0-73-generic:amd64 3.13.0-73.116 <none>
2016-01-02 16:34:13 status half-configured linux-image-3.13.0-73-generic:amd64 3.13.0-73.116
2016-01-14 08:58:07 status unpacked isc-dhcp-common:amd64 4.2.4-7ubuntu12.4
2016-01-14 08:58:07 upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4
2016-01-14 08:58:07 status half-configured openssh-client:amd64 1:6.6p1-2ubuntu2.3
The above is from the /var/log/dpkg.log of a Debian machine.
We want to parse this file using the stdio adapter and gather some information about the most upgraded packages. To do that, we want to split each log line into the following fields:
| time | cmd | subcmd | pkg_name | pkg_version |
| 2016-01-02 16:34:03 | status | installed | linux-image-extra-3.13.0-73-generic:amd64 | 3.13.0-73.116 |
After reading up on grok, you will find that the best approach to building a custom grok rule is to add pattern components one at a time, starting with:
%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}
You can quickly test this pattern like so:
tail -10 /var/log/dpkg.log | juttle -e 'read stdio -format "grok" -pattern "%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}" | view text'
[
{"time":"2016-01-03T00:34:03.000Z","message":"status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116"},
{"time":"2016-01-03T00:34:03.000Z","message":"remove linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116 <none>"},
{"time":"2016-01-03T00:34:13.000Z","message":"status installed linux-image-3.13.0-73-generic:amd64 3.13.0-73.116"},
{"time":"2016-01-03T00:34:13.000Z","message":"remove linux-image-3.13.0-73-generic:amd64 3.13.0-73.116 <none>"},
{"time":"2016-01-03T00:34:13.000Z","message":"status half-configured linux-image-3.13.0-73-generic:amd64 3.13.0-73.116"},
{"time":"2016-01-14T16:58:07.000Z","message":"status unpacked isc-dhcp-common:amd64 4.2.4-7ubuntu12.4"},
{"time":"2016-01-14T16:58:07.000Z","message":"upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4"},
{"time":"2016-01-14T16:58:07.000Z","message":"status half-configured openssh-client:amd64 1:6.6p1-2ubuntu2.3"}
]
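Note that the parsed timestamps differ from the text in the log: the log lines carry no time zone, and the output is in UTC. The 8-hour shift seen here is consistent with the log having been parsed on a machine whose local zone is UTC-8; that is an inference from the output, not something the log states. A small Python sketch of the same conversion, with the offset as an explicit assumption:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine that parsed the log uses a UTC-8 local time zone.
LOCAL = timezone(timedelta(hours=-8))

def to_utc_iso(text):
    """Interpret a zone-less dpkg timestamp as local time and render it in UTC."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=LOCAL)
    utc = local.astimezone(timezone.utc)
    # Format with millisecond precision and a trailing Z, as in the output above.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

print(to_utc_iso("2016-01-02 16:34:03"))  # 2016-01-03T00:34:03.000Z
print(to_utc_iso("2016-01-14 08:58:07"))  # 2016-01-14T16:58:07.000Z
```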
Iterating on the rule by adding more pattern elements, we arrive at the pattern:
%{TIMESTAMP_ISO8601:time} %{WORD:cmd} %{NOTSPACE:subcmd} %{NOTSPACE:pkg_name} %{NOTSPACE:pkg_version}
With the new pattern our parsed data now looks like so:
cat /var/log/dpkg.log | juttle -e "read stdio -format 'grok' -pattern '%{TIMESTAMP_ISO8601:time} %{WORD:cmd} %{NOTSPACE:subcmd} %{NOTSPACE:pkg_name} %{NOTSPACE:pkg_version}' | view text"
[
{"time":"2016-01-03T00:34:03.000Z","cmd":"status","subcmd":"installed","pkg_name":"linux-image-extra-3.13.0-73-generic:amd64","pkg_version":"3.13.0-73.116"},
{"time":"2016-01-03T00:34:03.000Z","cmd":"remove","subcmd":"linux-image-extra-3.13.0-73-generic:amd64","pkg_name":"3.13.0-73.116","pkg_version":"<none>"},
{"time":"2016-01-03T00:34:13.000Z","cmd":"status","subcmd":"installed","pkg_name":"linux-image-3.13.0-73-generic:amd64","pkg_version":"3.13.0-73.116"},
{"time":"2016-01-03T00:34:13.000Z","cmd":"remove","subcmd":"linux-image-3.13.0-73-generic:amd64","pkg_name":"3.13.0-73.116","pkg_version":"<none>"},
{"time":"2016-01-03T00:34:13.000Z","cmd":"status","subcmd":"half-configured","pkg_name":"linux-image-3.13.0-73-generic:amd64","pkg_version":"3.13.0-73.116"},
{"time":"2016-01-14T16:58:07.000Z","cmd":"status","subcmd":"unpacked","pkg_name":"isc-dhcp-common:amd64","pkg_version":"4.2.4-7ubuntu12.4"},
{"time":"2016-01-14T16:58:07.000Z","cmd":"upgrade","subcmd":"openssh-client:amd64","pkg_name":"1:6.6p1-2ubuntu2.3","pkg_version":"1:6.6p1-2ubuntu2.4"},
{"time":"2016-01-14T16:58:07.000Z","cmd":"status","subcmd":"half-configured","pkg_name":"openssh-client:amd64","pkg_version":"1:6.6p1-2ubuntu2.3"}
]
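One detail in this output is worth checking before building on it: lines such as upgrade and remove have no sub-command, so every later field shifts one position to the left. The Python sketch below reproduces that behavior with a rough regex stand-in for the grok pattern; the sub-patterns are simplified assumptions, not the exact built-in definitions.

```python
import re

# Rough regex stand-in for the grok pattern above (simplified sub-patterns).
RULE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<cmd>\w+) (?P<subcmd>\S+) (?P<pkg_name>\S+) (?P<pkg_version>\S+)"
)

lines = [
    "2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116",
    "2016-01-14 08:58:07 upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4",
]

parsed = [RULE.match(line).groupdict() for line in lines]

for fields in parsed:
    # On the "upgrade" line the package name ends up in subcmd,
    # and pkg_name holds the old version string.
    print(fields["cmd"], "| subcmd:", fields["subcmd"], "| pkg_name:", fields["pkg_name"])
```

Any query that groups by package therefore needs to pick the field that actually holds the package name for the command it filters on.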
This is close to what we wanted, and it is enough to write some Juttle that helps us analyze this log. Note that upgrade lines have no sub-command, so on those lines the package name is captured in subcmd and the cmd value is upgrade. For example, to find the top 3 most upgraded packages:
cat /var/log/dpkg.log | juttle -e "read stdio -format 'grok' -pattern '%{TIMESTAMP_ISO8601:time} %{WORD:cmd} %{NOTSPACE:subcmd} %{NOTSPACE:pkg_name} %{NOTSPACE:pkg_version}' | filter cmd = 'upgrade' | reduce count() by subcmd | sort count -desc | head 3"
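For readers less familiar with Juttle, the filter, reduce, sort, head chain can be sketched in Python as a filtered count followed by a top-N selection. The points below are made-up illustrative records, not real log output, and the `pkg` key is a hypothetical name for whichever field holds the package name.

```python
from collections import Counter

# Made-up illustrative points; "pkg" is a hypothetical field name.
points = [
    {"cmd": "upgrade", "pkg": "openssh-client:amd64"},
    {"cmd": "upgrade", "pkg": "isc-dhcp-common:amd64"},
    {"cmd": "status", "pkg": "bash:amd64"},
    {"cmd": "upgrade", "pkg": "openssh-client:amd64"},
    {"cmd": "upgrade", "pkg": "libssl1.0.0:amd64"},
    {"cmd": "upgrade", "pkg": "isc-dhcp-common:amd64"},
    {"cmd": "upgrade", "pkg": "openssh-client:amd64"},
]

# filter: keep only upgrade points; reduce count() by: count per package
counts = Counter(p["pkg"] for p in points if p["cmd"] == "upgrade")

# sort count -desc | head 3: the three packages with the highest counts
top = counts.most_common(3)

for pkg, count in top:
    print(pkg, count)
```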