batch

Create batches by segmenting a sequence of points ordered by time stamp, each segment spanning a specified interval of time.

batch -every duration [-on duration-or-calendar-offset]
batch [-on duration-or-calendar-offset] batch-interval
-every or batch-interval (required)
    The time interval for batches, specified as one of the following:
      n           The number of seconds in the time interval
      :duration:  The time interval expressed as a moment literal

-on (optional)
    A time alignment for the batches. It may be a duration or a calendar offset less than the batch interval. For example, -every :hour: -on :00:30:00: batches points over an hour on the half-hour, while -every :month: -on :day 10: batches monthly starting on day 10 of the month. If the beginning or end of your data does not align evenly with these times, the first and last batches span less than the specified interval.
    If -on is not specified, batches are aligned with the UNIX epoch. For example, if batch-interval equals one day, batch boundaries fall at midnight UTC.
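To make the alignment rule concrete, here is a minimal Python sketch, not Juttle's implementation, of how the boundaries for a fixed-length interval could be computed. It assumes boundaries fall at epoch + on + k * interval for integer k, that batches are half-open [start, end), and that the offset is non-negative. The names EPOCH and batch_bounds are hypothetical. Months vary in length, so this arithmetic does not cover calendar intervals such as :month:.

```python
from datetime import datetime, timedelta, timezone

# Assumption: boundaries fall at EPOCH + on + k * interval for integer k.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def batch_bounds(t, interval, on=timedelta(0)):
    """Return the half-open [start, end) batch containing timestamp t.

    interval: fixed-length batch interval (like -every :hour:)
    on:       alignment offset (like -on :00:30:00:); assumed non-negative
              and, as documented, less than the interval
    """
    if not timedelta(0) <= on < interval:
        raise ValueError("-on offset must be non-negative and less than the interval")
    # timedelta // timedelta is floor division returning an int, so
    # timestamps before the first aligned boundary are handled as well.
    k = (t - EPOCH - on) // interval
    start = EPOCH + on + k * interval
    return start, start + interval


t = datetime(2014, 1, 1, 5, 10, tzinfo=timezone.utc)
hour = timedelta(hours=1)
print(batch_bounds(t, hour)[0].isoformat())                            # 2014-01-01T05:00:00+00:00
print(batch_bounds(t, hour, on=timedelta(minutes=30))[0].isoformat())  # 2014-01-01T04:30:00+00:00
print(batch_bounds(t, timedelta(days=1))[0].isoformat())               # 2014-01-01T00:00:00+00:00
```

With no offset, a one-day interval lands on midnight UTC, matching the default described above; the half-hour offset shifts every boundary by 30 minutes.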

Many processors do their work over groups of points called batches. For example, the sort processor orders everything within a batch and the reduce processor aggregates points within a batch.
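The effect of batch-scoped processing can be modeled in a few lines of Python. This is a hedged sketch, not Juttle code: points are (time, value) pairs ordered by time, batches are assumed to be epoch-aligned and half-open, and sort_within_batches and sum_per_batch are hypothetical stand-ins for sort and reduce. Note that the sorted output is ordered only within each batch, not globally.

```python
from itertools import groupby


def batches(points, interval):
    """Group time-ordered (time, value) points into epoch-aligned, half-open batches."""
    # Consecutive points with the same batch index form one batch.
    return [list(g) for _, g in groupby(points, key=lambda p: p[0] // interval)]


def sort_within_batches(points, interval):
    """Batch-scoped sort: order by value inside each batch; batch order is kept."""
    out = []
    for batch in batches(points, interval):
        out.extend(sorted(batch, key=lambda p: p[1]))
    return out


def sum_per_batch(points, interval):
    """Batch-scoped reduce: one total per batch."""
    return [sum(v for _, v in batch) for batch in batches(points, interval)]


points = [(0, 5), (1, 3), (2, 9), (3, 1)]  # (seconds, value), ordered by time
print(sort_within_batches(points, 2))  # [(1, 3), (0, 5), (3, 1), (2, 9)]
print(sum_per_batch(points, 2))        # [8, 10]
```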

The batch processor creates batches by segmenting a sequence of points ordered by time stamp, each segment spanning batch-interval of time. It does not alter the points or their flow through the program in any way. Instead, it adds information to the sequence that divides it into disjoint groups of points.

The end time of one batch is the start time of the following batch. Batch boundaries are time values that exist independent of the points, and there may or may not be points having these values as their time stamps. When batching is in place, any points that share a given time stamp are guaranteed to lie within the same batch. A batch processor placed downstream of an earlier batch processor replaces the earlier grouping with its own.
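These properties can be checked with a small Python model, which is a sketch rather than Juttle's implementation. It assumes half-open [start, end) batches, so a point whose time stamp equals a boundary starts the later batch; the functions segment and rebatch are hypothetical. Because the points pass through unchanged, a second batching simply re-segments the flat sequence, which is why the downstream grouping replaces the earlier one.

```python
def segment(points, interval, on=0):
    """Split time-ordered (time, value) points into half-open batches keyed by start time.

    Points pass through unchanged; only the grouping is added.
    """
    groups = {}
    for t, v in points:
        # Boundaries are on + k * interval; a point exactly on a boundary
        # starts the later batch (assumed half-open [start, end) batches).
        start = on + (t - on) // interval * interval
        groups.setdefault(start, []).append((t, v))
    return groups


def rebatch(groups, interval, on=0):
    """A downstream batch discards the earlier grouping and re-segments the flat stream."""
    flat = [p for batch in groups.values() for p in batch]
    return segment(flat, interval, on)


pts = [(0, "a"), (3, "b"), (3, "c"), (5, "d"), (9, "e"), (10, "f")]
by5 = segment(pts, 5)
print(by5)  # {0: [(0, 'a'), (3, 'b'), (3, 'c')], 5: [(5, 'd'), (9, 'e')], 10: [(10, 'f')]}
print(rebatch(by5, 2) == segment(pts, 2))  # True
```

Both points stamped 3 share a batch, and the point stamped 5 opens the second batch.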

Example: Call records, day by day

// Call Record billing example:
//
// Call records arrive as a stream of points, each giving a call's duration
// in minutes. Your phone bill is the total of these, charged at $0.05 per
// minute, with each billing period starting on the 20th of the month.
//
// This program displays a running total of your bill that resets at the
// start of each billing period:
//
sub call_record() {
  emit -from :2014-01-01: -limit 4000 -every :h:
  | put name = 'duration'
  | put value = (Math.random() - .5) * 20 + (Math.random() - .5) * 10 + 5
}
call_record
| batch 
    -every :month:  
    -on :day 20: 
| put name = 'total', value = sum(value) * 0.05
| view timechart
;
//
// This program displays a table with monthly totals
//
call_record
| reduce 
    -every :month:
    -on :day 20: value = sum(value) * 0.05
| put name = "total", value = Math.floor(value * 100) / 100 
| view table 
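The billing logic above can be sketched in Python to show what the calendar offset does. This is a hypothetical model, not the Juttle runtime: it assumes each billing batch starts at 00:00 UTC on the 20th (the time zone for calendar offsets is an assumption), that put with sum() keeps a running total that resets at each batch boundary, and that the reduce total is truncated to cents as Math.floor(value * 100) / 100 does. The names billing_period_start, running_totals, and period_totals are illustrative only.

```python
import math
from datetime import datetime, timezone


def billing_period_start(t, day=20):
    """Start of the monthly batch containing t: the most recent `day` at 00:00 UTC.

    Assumes UTC boundaries; day must be 1..28 so it exists in every month.
    """
    start = t.replace(day=day, hour=0, minute=0, second=0, microsecond=0)
    if start > t:  # t falls before this month's boundary, so the period began last month
        year, month = (t.year, t.month - 1) if t.month > 1 else (t.year - 1, 12)
        start = start.replace(year=year, month=month)
    return start


def running_totals(records, rate=0.05):
    """Like `batch ... | put value = sum(value) * rate`: cumulative cost, reset each period."""
    out, period, minutes = [], None, 0.0
    for t, duration in records:
        p = billing_period_start(t)
        if p != period:  # crossing a batch boundary resets the running sum
            period, minutes = p, 0.0
        minutes += duration
        out.append((t, minutes * rate))
    return out


def period_totals(records, rate=0.05):
    """Like the reduce program: one total per period, truncated to cents."""
    minutes = {}
    for t, duration in records:
        p = billing_period_start(t)
        minutes[p] = minutes.get(p, 0.0) + duration
    return {p: math.floor(m * rate * 100) / 100 for p, m in minutes.items()}


utc = timezone.utc
calls = [
    (datetime(2014, 1, 19, 12, tzinfo=utc), 10.0),
    (datetime(2014, 1, 20, 1, tzinfo=utc), 20.0),  # first call of a new billing period
    (datetime(2014, 1, 21, 9, tzinfo=utc), 30.0),
]
for t, cost in running_totals(calls):
    print(t.date(), round(cost, 2))
```

The call on January 20 starts a new period, so the running total drops back to that call's cost instead of continuing from January 19.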

Example: create one-second interval batches

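In this example, emit generates 20 points with time stamps 0.2 seconds apart (5 points per second), starting at the UNIX epoch. Because these time stamps are in the past, emit runs in historical mode and outputs all the points at once. The batch processor divides the points into batches at one-second intervals, and the tail processor outputs the last point from each batch, for 4 output points in total. Without the batch processor, tail would treat all 20 points as a single group and output only one point.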

emit -from :0: -hz 5 -limit 20 
| batch 1 
| tail 1 
| view text
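The arithmetic behind the 4 output points can be reproduced with a short Python simulation. It is a model under stated assumptions, not Juttle itself: time stamps are integer milliseconds since the epoch (to avoid floating-point drift from adding 0.2 repeatedly), batches are epoch-aligned and half-open, and emit_times, batch_ms, and tail are hypothetical helpers.

```python
def emit_times(limit, hz, start_ms=0):
    """Time stamps in integer milliseconds, spaced 1000 // hz apart (like emit -from :0: -hz 5 -limit 20)."""
    step = 1000 // hz
    return [start_ms + k * step for k in range(limit)]


def batch_ms(times, interval_ms):
    """Group time-ordered time stamps into epoch-aligned, half-open batches."""
    groups = {}
    for t in times:
        groups.setdefault(t // interval_ms * interval_ms, []).append(t)
    return list(groups.values())


def tail(batches, n):
    """Last n points of each batch, as `tail n` does per batch."""
    return [t for batch in batches for t in batch[-n:]]


times = emit_times(limit=20, hz=5)
print(tail(batch_ms(times, 1000), 1))  # [800, 1800, 2800, 3800]
print(tail([times], 1))                # without batching: [3800]
```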

Example: batching live data

In contrast with the earlier example, which used historical mode to emit data with past time stamps all at once, this example generates data in real time, divides it into one-second batches, and uses the tail processor to output the last 2 points from each batch.

emit -hz 10 -limit 100 
| batch 1
| tail 2 
| view text
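In live mode the stream starts at the current time, which is unlikely to fall exactly on a second boundary, so the first and last batches may span less than one second, as described for -on above. The Python sketch below models this with a hypothetical start 350 ms past a whole second; it also assumes the final partial batch is flushed when the stream ends. The helpers are hypothetical and repeated here so the block stands alone.

```python
def emit_times(limit, hz, start_ms):
    """Time stamps in integer milliseconds, spaced 1000 // hz apart."""
    step = 1000 // hz
    return [start_ms + k * step for k in range(limit)]


def batch_ms(times, interval_ms):
    """Group time-ordered time stamps into epoch-aligned, half-open batches."""
    groups = {}
    for t in times:
        groups.setdefault(t // interval_ms * interval_ms, []).append(t)
    return list(groups.values())


def tail(batches, n):
    """Last n points of each batch."""
    return [t for batch in batches for t in batch[-n:]]


# Hypothetical: the live stream starts 350 ms after a whole second.
times = emit_times(limit=100, hz=10, start_ms=350)
batches = batch_ms(times, 1000)
print([len(b) for b in batches])    # [7, 10, 10, 10, 10, 10, 10, 10, 10, 10, 3]
out = tail(batches, 2)
print(len(out), out[:2], out[-2:])  # 22 [850, 950] [10150, 10250]
```

Under these assumptions the 100 points fall into 11 batches rather than 10, so tail 2 yields 22 points; a different start offset changes the sizes of the partial batches.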