batch
Create batches by segmenting a sequence of points ordered by time stamp, each segment spanning a specified interval of time.
batch -every duration -on duration-or-calendar-offset
batch batch-interval -on duration-or-calendar-offset
Parameter | Description | Required? |
---|---|---|
-every or batch-interval | The time interval for batches, specified either as a duration (for example, :hour: or :month:) or as a number of seconds (for example, 1). | Yes |
-on | A time alignment for the batches. It may be a duration or a calendar offset less than the batch interval. For example, -every :hour: -on :00:30:00: batches points over an hour on the half-hour, while -every :month: -on :day 10: batches monthly starting on day 10 of the month. If the beginning or end of your data does not align with these times, the first and last batches will span less than the specified interval. | No; if -on is not specified, batches are aligned with the UNIX epoch, so when the batch interval is one day, batch boundaries fall at midnight UTC. |
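To make the alignment rule concrete, here is a minimal Python model of how a batch boundary could be computed for a fixed-length interval. It is a sketch of the rule described in the table, not Juttle's implementation. It assumes timestamps and durations are expressed in seconds since the UNIX epoch and that -on is a fixed duration offset; calendar offsets such as :day 10: need calendar arithmetic and are not modeled here. The names batch_start and batch_end are hypothetical.

```python
import math
from datetime import datetime, timezone

def batch_start(t, interval, on=0):
    """Start of the batch containing timestamp t (seconds since the epoch).

    Boundaries fall at on + k * interval for every integer k, so with the
    default on=0 they are aligned with the UNIX epoch.
    """
    # Shift by the -on offset, round down to a whole interval, shift back.
    return math.floor((t - on) / interval) * interval + on

def batch_end(t, interval, on=0):
    # The end of one batch is the start of the following batch.
    return batch_start(t, interval, on) + interval

HOUR, DAY = 3600, 86400

# Modeled on -every :hour: -on :00:30:00: (batches run from hh:30 to hh+1:30).
print(batch_start(10 * HOUR + 45 * 60, HOUR, on=30 * 60) / HOUR)  # 10.5
print(batch_start(10 * HOUR + 15 * 60, HOUR, on=30 * 60) / HOUR)  # 9.5

# One-day batches with no -on: boundaries fall at midnight UTC.
noon = int(datetime(2014, 1, 1, 12, tzinfo=timezone.utc).timestamp())
start = batch_start(noon, DAY)
print(datetime.fromtimestamp(start, tz=timezone.utc))  # 2014-01-01 00:00:00+00:00
```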
Many processors do their work over groups of points called batches. For example, the sort processor orders everything within a batch and the reduce processor aggregates points within a batch.
The batch processor creates batches by segmenting a sequence of points ordered by time stamp, each segment spanning one batch interval of time. It does not alter the points or their flow in any way. Instead, it adds information to the sequence that divides it into disjoint groups of points.
The end time of one batch is the start time of the following batch. Batch boundaries are time values that exist independent of the points, and there may or may not be points having these values as their time stamps. When batching is in place, any points that share a given time stamp are guaranteed to lie within the same batch. A batch processor downstream from an earlier batch replaces the earlier grouping with the new one.
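The following Python sketch models this segmentation under the same assumptions as a fixed-interval model (timestamps in seconds, epoch-aligned boundaries, optional fixed offset). Points are represented as hypothetical (time, value) tuples already ordered by time. Because a point's batch depends only on its own timestamp, points that share a time stamp always land in the same batch, and re-batching simply recomputes the grouping from scratch. Treating a boundary time as belonging to the batch that starts there is an assumption of this model.

```python
import math
from itertools import groupby

def batch_start(t, interval, on=0):
    # Boundaries fall at on + k * interval (epoch-aligned when on == 0).
    return math.floor((t - on) / interval) * interval + on

def segment(points, interval, on=0):
    """Group time-ordered (time, value) points into disjoint batches.

    Returns (start, end, members) tuples. Points are neither modified nor
    reordered; only the grouping is added. A point whose time equals a
    boundary is placed in the batch that starts there (an assumption).
    Batches that would contain no points are not reported by this model.
    """
    return [
        (start, start + interval, list(group))
        for start, group in groupby(points, key=lambda p: batch_start(p[0], interval, on))
    ]

points = [(0.5, 'a'), (2.0, 'b'), (2.0, 'c'), (2.9, 'd'), (7.2, 'e')]

for start, end, members in segment(points, 2):
    print(start, end, [v for _, v in members])
# 0 2 ['a']
# 2 4 ['b', 'c', 'd']
# 6 8 ['e']

# Re-batching the same points replaces the old grouping entirely.
print([len(members) for _, _, members in segment(points, 5)])  # [4, 1]
```

Note that the two points at time 2.0 stay together, and that the interval from 4 to 8 contains a boundary at 6 with no point on it: boundaries exist independently of the data.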
Example: Call records, day by day
// Call Record billing example:
//
// Call records arrive as a stream of points indicating duration in minutes.
// Your phone bill is the total of these, charged at $.05/minute, from the
// 20th of each month.
//
// This program displays a day-by-day running total of your bill:
//
sub call_record() {
    emit -from :2014-01-01: -limit 4000 -every :h:
    | put name = 'duration'
    | put value = (Math.random() - .5) * 20 + (Math.random() - .5) * 10 + 5
}
call_record
| batch
    -every :month:
    -on :day 20:
| put name = 'total', value = sum(value) * 0.05
| view timechart
;
//
// This program displays a table with monthly totals
//
call_record
| reduce
    -every :month:
    -on :day 20:
    value = sum(value) * 0.05
| put name = "total", value = Math.floor(value * 100) / 100
| view table
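Calendar offsets such as :day 20: cannot be handled with fixed-length arithmetic, because months differ in length. As a rough Python model (an assumption about the semantics, not Juttle's code), a point's billing period under -every :month: -on :day 20: starts at 00:00 UTC on the most recent 20th of a month. The helper name billing_period_start is hypothetical.

```python
from datetime import datetime, timezone

def billing_period_start(ts, day=20):
    """Start of the monthly batch containing ts when batches begin on `day`.

    Assumes UTC timestamps and a day of month that exists in every month (1-28).
    """
    boundary = ts.replace(day=day, hour=0, minute=0, second=0, microsecond=0)
    if ts >= boundary:
        return boundary
    # ts falls before this month's boundary, so its batch began last month.
    if boundary.month == 1:
        return boundary.replace(year=boundary.year - 1, month=12)
    return boundary.replace(month=boundary.month - 1)

utc = timezone.utc
print(billing_period_start(datetime(2014, 1, 5, 13, tzinfo=utc)))   # 2013-12-20 00:00:00+00:00
print(billing_period_start(datetime(2014, 3, 20, tzinfo=utc)))      # 2014-03-20 00:00:00+00:00
print(billing_period_start(datetime(2014, 3, 31, 23, tzinfo=utc)))  # 2014-03-20 00:00:00+00:00
```

Because the generated call records start on 2014-01-01, before the first boundary on January 20, the first batch in both programs would cover less than a full month under this model, consistent with the description of -on in the parameter table.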
Example: create one-second interval batches
In this example, emit generates 20 points at a rate of 5 points per second, with timestamps starting at :0:, so the points span four seconds of time. The batch processor divides them into one-second batches of five points each. The tail processor outputs the last point from each batch, resulting in 4 output points. Without the batch processor, tail would handle all 20 points at once, resulting in a single output point.
emit -from :0: -hz 5 -limit 20
| batch 1
| tail 1
| view text
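The arithmetic behind this example can be checked with a small Python model. It assumes, as in the example, that the timestamps start at 0 and are spaced 1/5 second apart, and it mimics tail 1 by keeping the last point of each epoch-aligned one-second batch. The function names emit_times and tail_per_batch are hypothetical.

```python
import math

def emit_times(hz, limit, start=0):
    """Timestamps (in seconds) of `limit` points spaced 1/hz apart."""
    return [start + i / hz for i in range(limit)]

def tail_per_batch(times, interval, n=1):
    """Keep the last n timestamps of each epoch-aligned batch."""
    batches = {}
    for t in times:
        batches.setdefault(math.floor(t / interval), []).append(t)
    # dicts keep insertion order, so batches come out in time order
    return [t for members in batches.values() for t in members[-n:]]

times = emit_times(hz=5, limit=20)         # emit -from :0: -hz 5 -limit 20
print(tail_per_batch(times, interval=1))   # batch 1 | tail 1 -> [0.8, 1.8, 2.8, 3.8]
print(times[-1:])                          # tail 1 with no batching -> [3.8]
```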
Example: batching live data
In contrast with the earlier example, which used historical mode to emit data with past timestamps all at once, this example generates data in real time, divides it into one-second batches, and uses the tail processor to output the last 2 points from each batch.
emit -hz 10 -limit 100
| batch 1
| tail 2
| view text
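In live mode the first timestamp depends on when the program starts, so it typically does not fall on a whole second; in that case the first and last one-second batches are partial, as the -on description notes for data that does not align with batch boundaries. The Python model below assumes a hypothetical start time of 1,000,350 ms after the epoch and uses integer milliseconds to avoid floating-point rounding. The function name batch_sizes is hypothetical.

```python
from collections import Counter

def batch_sizes(times_ms, interval_ms):
    """Number of points in each epoch-aligned batch, in time order."""
    counts = Counter(t // interval_ms for t in times_ms)
    return [counts[k] for k in sorted(counts)]

# Hypothetical live start time (ms since the epoch), not on a whole second.
start_ms = 1_000_350
times_ms = [start_ms + 100 * i for i in range(100)]  # emit -hz 10 -limit 100

sizes = batch_sizes(times_ms, 1000)                  # batch 1
print(sizes)                          # [7, 10, 10, 10, 10, 10, 10, 10, 10, 10, 3]
print(sum(min(s, 2) for s in sizes))  # points output by tail 2: 22
```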