Grok Parser

The grok parser lets you parse incoming unstructured log data using grok rules similar to those documented by Logstash.

The Juttle grok parser supports this set of built-in rules.

The grok parser is currently supported by the stdio and file adapters, and support can easily be extended to other adapters.
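Conceptually, a grok pattern is a regular expression assembled from named, reusable pieces. %{NAME} inserts the regex registered under NAME, and %{NAME:field} also captures the matched text into field. The Python sketch below shows that expansion. The PATTERNS table holds simplified stand-ins, not the real definitions shipped with Juttle or Logstash, and expand_grok is a hypothetical helper.

```python
import re

# Simplified stand-ins for a few standard grok patterns (assumption: the
# real definitions shipped with Juttle and Logstash are longer and stricter).
PATTERNS = {
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}",
}

# Matches a reference of the form %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand_grok(pattern):
    """Hypothetical helper: turn a grok pattern into a plain regex."""

    def substitute(m):
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        # %{NAME:field} becomes a named capture group;
        # %{NAME} becomes a non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return GROK_REF.sub(substitute, pattern)


regex = expand_grok("%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}")
m = re.match(regex, "2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116")
print(m.groupdict())
```

Text outside the %{...} references, such as the spaces here, is kept as literal regex, so the separators in the pattern must match the separators in the log line.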


Field Names

If the grok rule given in the -pattern option does not specify a field name for the parsed data, the data is placed in the message field. See the syslog example, which uses -pattern '%{SYSLOGLINE}'.

When the grok rule parses timestamps from the incoming text, assign them the field name time in the rule, since that is the field where Juttle looks for a valid timestamp. See the custom file example, which uses -pattern "%{TIMESTAMP_ISO8601:time}".
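The two field-name rules can be sketched as a small record builder. to_record is a hypothetical helper that mimics the documented behavior, not Juttle's implementation: with no named field the matched text goes into message, and only a field called time is converted into a timestamp. Skipping non-matching lines and ignoring time zones are assumptions of this sketch.

```python
import re
from datetime import datetime


def to_record(regex, line):
    """Hypothetical helper applying the two field-name rules to one line."""
    m = re.match(regex, line)
    if m is None:
        return None  # assumption: lines that do not match are skipped
    fields = m.groupdict()
    if not fields:
        # Rule 1: no field name in the rule, so the matched text goes into "message".
        fields = {"message": m.group(0)}
    if "time" in fields:
        # Rule 2: only a field named "time" is read as the point's timestamp
        # (naive datetime; time-zone handling is left out of this sketch).
        fields["time"] = datetime.fromisoformat(fields["time"])
    return fields


LINE = "2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116"

# No named groups: the whole match lands in "message".
print(to_record(r".+", LINE))
# A group named "time" becomes a datetime; the rest stays a string.
print(to_record(r"(?P<time>\S+ \S+) (?P<message>.*)", LINE))
```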

Parsing a syslog file

Since there are already quite a few built-in rules, parsing common log types is easy. For example:

read file -file '/var/log/syslog' -format 'grok' -pattern '%{SYSLOGLINE}'
| filter message~'*error*'

The program above reads your local /var/log/syslog file, parses it with the built-in grok pattern for syslog, and keeps only the data points whose message field contains the string error.
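The ~ operator in filter message~'*error*' is a wildcard match, where * stands for any run of characters, so the filter keeps messages that contain error anywhere. A rough Python equivalent uses fnmatch.fnmatchcase. The sample points are hypothetical, and treating the match as case-sensitive is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical points, shaped like the output of the syslog example
# (only the "message" field matters here).
points = [
    {"message": "disk error on /dev/sda1"},
    {"message": "link is up"},
    {"message": "errors: 0"},
]

# Keep points whose message matches the wildcard "*error*",
# i.e. contains "error" anywhere (case-sensitive in this sketch).
matched = [p for p in points if fnmatchcase(p["message"], "*error*")]
print(matched)
```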

Parsing a custom file

Suppose you have a log file that looks like this:

2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116
2016-01-02 16:34:03 remove linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116 <none>
2016-01-02 16:34:13 status installed linux-image-3.13.0-73-generic:amd64 3.13.0-73.116
2016-01-02 16:34:13 remove linux-image-3.13.0-73-generic:amd64 3.13.0-73.116 <none>
2016-01-02 16:34:13 status half-configured linux-image-3.13.0-73-generic:amd64 3.13.0-73.116
2016-01-14 08:58:07 status unpacked isc-dhcp-common:amd64 4.2.4-7ubuntu12.4
2016-01-14 08:58:07 upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4
2016-01-14 08:58:07 status half-configured openssh-client:amd64 1:6.6p1-2ubuntu2.3

The lines above come from /var/log/dpkg.log on a Debian machine.

We want to parse this file with the stdio adapter and gather information about the most frequently upgraded packages, so we need to split each log line into the following fields:

|       time       | cmd  | subcmd  |               pkg_name                  | pkg_version |  
2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116

After reading up on grok, we found that the best way to build up a custom grok rule is to add pattern components one at a time, starting with:

%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}
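The same piece-at-a-time loop can be sketched in Python with plain regular expressions: each candidate pattern is checked against sample lines before the next component is added. The regexes are simplified stand-ins for the grok patterns (an assumption), and unmatched is a hypothetical helper.

```python
import re

# Simplified stand-in for %{TIMESTAMP_ISO8601} (assumption: the real grok
# definition also accepts fractional seconds and time zones).
TIMESTAMP = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"

# Each step replaces part of the catch-all "message" with a more
# specific component.
STEPS = [
    rf"(?P<time>{TIMESTAMP}) (?P<message>.*)",
    rf"(?P<time>{TIMESTAMP}) (?P<cmd>\w+) (?P<message>.*)",
]

SAMPLE = [
    "2016-01-02 16:34:03 status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116",
    "2016-01-02 16:34:03 remove linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116 <none>",
    "2016-01-14 08:58:07 upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4",
]


def unmatched(regex, lines):
    """Return the lines a candidate pattern fails to match.

    fullmatch requires the whole line to match, which is stricter than an
    unanchored grok match."""
    compiled = re.compile(regex)
    return [line for line in lines if compiled.fullmatch(line) is None]


for step in STEPS:
    print(len(unmatched(step, SAMPLE)), "unmatched for", step)
```

An empty result for every step means the new component did not break any sample line, so it is safe to refine further.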

You can test it quickly like this:

tail -10 /var/log/dpkg.log | juttle -e 'read stdio -format "grok" -pattern "%{TIMESTAMP_ISO8601:time} %{GREEDYDATA:message}" | view text'
{"time":"2016-01-03T00:34:03.000Z","message":"status installed linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116"},
{"time":"2016-01-03T00:34:03.000Z","message":"remove linux-image-extra-3.13.0-73-generic:amd64 3.13.0-73.116 <none>"},
{"time":"2016-01-03T00:34:13.000Z","message":"status installed linux-image-3.13.0-73-generic:amd64 3.13.0-73.116"},
{"time":"2016-01-03T00:34:13.000Z","message":"remove linux-image-3.13.0-73-generic:amd64 3.13.0-73.116 <none>"},
{"time":"2016-01-03T00:34:13.000Z","message":"status half-configured linux-image-3.13.0-73-generic:amd64 3.13.0-73.116"},
{"time":"2016-01-14T16:58:07.000Z","message":"status unpacked isc-dhcp-common:amd64 4.2.4-7ubuntu12.4"},
{"time":"2016-01-14T16:58:07.000Z","message":"upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4"},
{"time":"2016-01-14T16:58:07.000Z","message":"status half-configured openssh-client:amd64 1:6.6p1-2ubuntu2.3"}

Iterating on the rule by adding more pattern elements, we arrive at the pattern:

%{TIMESTAMP_ISO8601:time} %{WORD:cmd} %{NOTSPACE:subcmd} %{NOTSPACE:pkg_name} %{NOTSPACE:pkg_version}
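As a rough check of this pattern, the Python sketch below applies a regex equivalent to one status line and one upgrade line from the sample. The component regexes (\b\w+\b for WORD, \S+ for NOTSPACE, a simplified timestamp) are assumptions standing in for the real grok definitions. Because upgrade lines have no sub-command, the package name ends up in subcmd on those lines.

```python
import re

# Simplified regex equivalent of the full pattern (assumption: WORD -> \b\w+\b,
# NOTSPACE -> \S+, TIMESTAMP_ISO8601 simplified).
FULL = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}) "
    r"(?P<cmd>\b\w+\b) (?P<subcmd>\S+) (?P<pkg_name>\S+) (?P<pkg_version>\S+)"
)

status = FULL.match("2016-01-14 08:58:07 status half-configured openssh-client:amd64 1:6.6p1-2ubuntu2.3")
upgrade = FULL.match("2016-01-14 08:58:07 upgrade openssh-client:amd64 1:6.6p1-2ubuntu2.3 1:6.6p1-2ubuntu2.4")

# Status line: every field lands where its name suggests.
print(status.groupdict())
# Upgrade line: fields shift left, so subcmd holds the package name.
print(upgrade.groupdict())
```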

To check the fields produced by the new pattern, run it against the whole log:

cat /var/log/dpkg.log | juttle -e "read stdio -format 'grok' -pattern '%{TIMESTAMP_ISO8601:time} %{WORD:cmd} %{NOTSPACE:subcmd} %{NOTSPACE:pkg_name} %{NOTSPACE:pkg_version}' | view text"

For the status lines, this splits out exactly the fields we wanted. Action lines such as upgrade and remove have no sub-command, so on those lines the package name is captured in subcmd and the old and new versions in pkg_name and pkg_version. With that in mind, we can write Juttle to analyze this log. For example, to find the top 3 most upgraded packages:

cat /var/log/dpkg.log | juttle -e "read stdio -format 'grok' -pattern '%{TIMESTAMP_ISO8601:time} %{WORD:cmd} %{NOTSPACE:subcmd} %{NOTSPACE:pkg_name} %{NOTSPACE:pkg_version}' | filter cmd = 'upgrade' | reduce count() by subcmd | sort count -desc | head 3"
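To make the analysis pipeline concrete, here is a Python sketch of what its filter, reduce count() by, sort count -desc, and head 3 stages compute together. The function top_n and the sample points are hypothetical; the sketch mimics the pipeline's result, not how Juttle executes it.

```python
from collections import Counter


def top_n(points, field, value, by, n=3):
    """Hypothetical helper: keep points where point[field] == value (filter),
    count them per point[by] (reduce count() by), and return the n largest
    counts, most common first (sort count -desc | head n)."""
    counts = Counter(p[by] for p in points if p.get(field) == value)
    return counts.most_common(n)


# Hypothetical parsed points (illustration only, not real log data).
# With the pattern above, upgrade lines carry the package name in "subcmd".
points = [
    {"cmd": "upgrade", "subcmd": "openssh-client:amd64"},
    {"cmd": "upgrade", "subcmd": "libssl1.0.0:amd64"},
    {"cmd": "upgrade", "subcmd": "openssh-client:amd64"},
    {"cmd": "status", "subcmd": "installed"},
]

print(top_n(points, "cmd", "upgrade", "subcmd"))
# [('openssh-client:amd64', 2), ('libssl1.0.0:amd64', 1)]
```

Counter.most_common breaks ties by first appearance; Juttle's ordering of equal counts may differ.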