File Adapter
The file
adapter allows reading points from, or writing points to, a file on the local filesystem.
read file
Supported file formats are JSON array, JSON lines, CSV, column-aligned text, and unstructured logs (with grok parsing); see examples below.
read file -file <path> [-format <format>] [-timeField <fieldname>]
Parameter | Description | Required? |
---|---|---|
-file |
File path on the local filesystem, absolute or relative to the current working directory | Yes |
-format |
Input file format, supports: csv json jsonl columns or grok for text parseable by grok |
No; defaults to json |
-timeField |
Name of the field in the data which contains a valid timestamp | No; defaults to time |
-pattern |
When -format 'grok' you can specify the grok matching pattern here. More information on grok here |
No |
-separator |
When -format 'csv' is used, you can specify the separator between columns in a CSV file. |
No: defaults to , |
-commentSymbol |
When -format 'csv' is used, you can specify the comment character that prefixes comment lines in a CSV file. |
No: defaults to , |
-ignoreEmptyLines |
When -format 'csv' is used, you can skip empty lines in a CSV file. |
No: defaults to false |
-allowIncompleteLines |
When -format 'csv' is used, you can allow for parsing of incomplete lines in a CSV file. |
No: defaults to false |
The data is assumed to contain valid timestamps in a field named time
by default; a different name for the time field may be specified with -timeField
option. If the data contains fields time
and another named field specified with -timeField
, the original contents of field time
will be overwritten by the valid timestamp from timeField
.
Timeless data, which contains no timestamps, is acceptable; however certain operations which expect time to be present in the points, such as reduce -every :interval:
, will execute with warnings or error out. Timeless data can be joined in the Juttle flowgraph with other data which contains timestamps; a common use case for reading timeless data from a file or another backend is to join it with streaming data for enrichment.
The file adapter does not support any kind of filtering (neither filter expressions, nor full text search). In order to filter the data read from file, pipe to the filter proc.
Example: read from a JSON array data file
The source file has data in JSON array format:
[
{ "time": "2015-11-06T04:28:32.304Z", "hostname": "lemoncake", "state": "ok" },
{ "time": "2015-11-06T04:28:32.304Z", "hostname": "applepie", "state": "warn" },
{ "time": "2015-11-06T04:28:42.405Z", "hostname": "lemoncake", "state": "ok" },
{ "time": "2015-11-06T04:28:42.502Z", "hostname": "applepie", "state": "ok" }
]
This program reads the file in, and filters on a specific hostname:
read file -file 'docs/examples/datasets/input1.json'
| filter hostname = 'lemoncake'
| view table
Example: read from a JSON lines data file
The source file has data in JSON lines format, i.e. newline separated JSON objects:
{ "time": "2015-11-06T04:28:32.304Z", "hostname": "lemoncake", "state": "ok" }
{ "time": "2015-11-06T04:28:32.304Z", "hostname": "applepie", "state": "warn" }
{ "time": "2015-11-06T04:28:42.405Z", "hostname": "lemoncake", "state": "ok" }
{ "time": "2015-11-06T04:28:42.502Z", "hostname": "applepie", "state": "ok" }
This program reads the file in, and filters on a specific hostname:
read file -file 'docs/examples/datasets/json_lines.jsonl'
| filter hostname = 'lemoncake'
| view table
The above examples produce the same output table:
┌────────────────────────────────────┬──────────────┬──────────┐
│ time │ hostname │ state │
├────────────────────────────────────┼──────────────┼──────────┤
│ 2015-11-06T04:28:32.304Z │ lemoncake │ ok │
├────────────────────────────────────┼──────────────┼──────────┤
│ 2015-11-06T04:28:42.405Z │ lemoncake │ ok │
└────────────────────────────────────┴──────────────┴──────────┘
write file
Data is written out to the filename specified in the format you specify.
write file -file <path>
Parameter | Description | Required? |
---|---|---|
-file |
File path on the local filesystem, absolute or relative to the current working directory | Yes |
-format |
Input input format: csv for CSV data, json for JSON data, or jsonl for JSON lines data |
No; defaults to json |
-append |
Specifies if the data should be appended to the file or if the file should be overwritten | No, defaults to false |
If the file already exists it will be truncated and overwritten.
Example: writing data to a file
emit -from :2015-01-01: -limit 2
| put name = 'write_me', value = count()
| write file -file '/tmp/write_me.json'
The resulting file /tmp/write_me.json
contains:
[
{
"time": "2015-01-01T00:00:00.000Z",
"name": "write_me",
"value": 1
},
{
"time": "2015-01-01T00:00:01.000Z",
"name": "write_me",
"value": 2
}
]