Aggregator
Overview
The Aggregator API provides utilities for storing and aggregating timeseries data that is accessible via an HTTP API. It integrates fully with InfluxDB, a timeseries database, as a data backend.
Basic Usage
Instantiate an InfluxDB Aggregator
from ligo.scald.io import influx
# instantiate the aggregator
aggregator = influx.Aggregator(hostname='influx.hostname', port=8086, db='your_database')
Register a Schema
Before storing data, register a measurement schema that defines how data is stored in the backend:
# register a measurement schema (how data is stored in backend)
measurement = 'my_meas'
columns = ('column1', 'column2')
column_key = 'column1'
tags = ('tag1', 'tag2')
tag_key = 'tag2'
aggregator.register_schema(measurement, columns, column_key, tags, tag_key)
Storing Data
The Aggregator provides two methods for storing data: store_rows for row format and store_columns for column format. All ingested data is downsampled to a maximum sampling rate of 1 Hz based on an aggregate quantity (min, median, or max).
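To make the 1 Hz downsampling concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper name `downsample_1hz`, the bucketing by whole second, and the choice to label each bucket with its integer second are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median

# Aggregate quantities named in the text; this mapping is illustrative only.
AGGREGATES = {'min': min, 'median': median, 'max': max}

def downsample_1hz(times, values, aggregate='max'):
    """Reduce samples to at most one per whole second (hypothetical helper)."""
    reduce_fn = AGGREGATES[aggregate]
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        # group every sample by the whole second it falls in
        buckets[math.floor(t)].append(v)
    out_times = sorted(buckets)
    out_values = [reduce_fn(buckets[t]) for t in out_times]
    return out_times, out_values

times = [1234567890, 1234567890.5, 1234567891.25]
values = [1.2, 0.3, 0.7]
print(downsample_1hz(times, values, aggregate='max'))
print(downsample_1hz(times, values, aggregate='min'))
```

With `max`, the two samples in second 1234567890 collapse to 1.2; with `min`, they collapse to 0.3. The sample in the following second passes through unchanged either way.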
Row Format
Store data in row form, where each row represents a single time point and rows are grouped under a tuple of tag values:
# option 1: store data in row form
row_1 = {'time': 1234567890, 'fields': {'column1': 1.2, 'column2': 0.3}}
row_2 = {'time': 1234567890.5, 'fields': {'column1': 0.3, 'column2': 0.4}}
row_3 = {'time': 1234567890, 'fields': {'column1': 2.3, 'column2': 1.1}}
row_4 = {'time': 1234567890.5, 'fields': {'column1': 0.1, 'column2': 2.3}}
rows = {('001', 'andrew'): [row_1, row_2], ('002', 'parce'): [row_3, row_4]}
aggregator.store_rows(measurement, rows)
Column Format
Store data in column form, where each tuple of tag values maps to arrays of times and field values:
# option 2: store data in column form
cols_1 = {
'time': [1234567890, 1234567890.5],
'fields': {'column1': [1.2, 0.3], 'column2': [0.3, 0.4]}
}
cols_2 = {
'time': [1234567890, 1234567890.5],
'fields': {'column1': [2.3, 0.1], 'column2': [1.1, 2.3]}
}
cols = {('001', 'andrew'): cols_1, ('002', 'parce'): cols_2}
aggregator.store_columns(measurement, cols)
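The two layouts carry the same information, so one can be derived from the other. The sketch below converts the row-format dictionary into the column-format dictionary. `rows_to_columns` is a hypothetical helper, not part of ligo.scald, and it assumes every row under a given tag tuple has the same field names.

```python
def rows_to_columns(rows):
    """Convert {tag_values: [row, ...]} into {tag_values: {'time': [...], 'fields': {...}}}."""
    cols = {}
    for tag_values, tag_rows in rows.items():
        entry = {'time': [], 'fields': {}}
        for row in tag_rows:
            entry['time'].append(row['time'])
            # append each field value to the array for its column
            for name, value in row['fields'].items():
                entry['fields'].setdefault(name, []).append(value)
        cols[tag_values] = entry
    return cols

rows = {
    ('001', 'andrew'): [
        {'time': 1234567890, 'fields': {'column1': 1.2, 'column2': 0.3}},
        {'time': 1234567890.5, 'fields': {'column1': 0.3, 'column2': 0.4}},
    ],
}
print(rows_to_columns(rows))
```

Applied to the rows from the row-format example, this yields the same structure that is passed to store_columns above.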
Schema Components
The arguments passed to register_schema have the following meanings:
measurement: The name of the measurement/table in the database
columns: Tuple of column names that will store the actual data values
column_key: The primary column used for aggregation operations
tags: Tuple of tag names used for indexing and grouping data
tag_key: The primary tag used for organizing data series
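Because the schema fixes the column and tag names, a payload can be checked against it before storing. The sketch below is one way to do that. `check_rows` is a hypothetical helper, not part of ligo.scald, and it rests on two assumptions drawn from the examples: each dictionary key holds one value per registered tag, in order, and each row carries every registered column.

```python
def check_rows(rows, columns, tags):
    """Return a list of problems found in a row-format payload (hypothetical helper)."""
    problems = []
    for tag_values, tag_rows in rows.items():
        # one tag value is expected per registered tag name
        if len(tag_values) != len(tags):
            problems.append(f'{tag_values!r}: expected {len(tags)} tag values')
        for index, row in enumerate(tag_rows):
            # every row is assumed to carry exactly the registered columns
            if set(row['fields']) != set(columns):
                problems.append(f'{tag_values!r} row {index}: fields do not match columns')
    return problems

columns = ('column1', 'column2')
tags = ('tag1', 'tag2')
good = {('001', 'andrew'): [{'time': 1234567890, 'fields': {'column1': 1.2, 'column2': 0.3}}]}
bad = {('001',): [{'time': 1234567890, 'fields': {'column1': 1.2}}]}
print(check_rows(good, columns, tags))
print(check_rows(bad, columns, tags))
```

The first call returns an empty list. The second reports both the missing tag value and the missing column.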
Tags vs Fields
Tags are indexed and used for filtering and grouping data
Fields contain the actual measurement values and are not indexed
Choose tags for metadata that you’ll frequently filter by
Use fields for numerical data that you want to aggregate or visualize
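The tag and field split is visible in InfluxDB's line protocol, where tags follow the measurement name and fields come after a space. The sketch below formats one point that way. `to_line_protocol` is a hypothetical helper for illustration only; it is not necessarily how the Aggregator writes data, it handles only float-valued fields, and it omits the escaping that line protocol requires for spaces, commas, and equals signs.

```python
def to_line_protocol(measurement, tags, fields, time_s):
    """Format one point as an InfluxDB line protocol string (illustrative sketch)."""
    # tags are appended to the measurement name as comma-separated key=value pairs
    tag_part = ''.join(f',{key}={value}' for key, value in tags.items())
    # fields follow after a space; only float values are handled here
    field_part = ','.join(f'{key}={float(value)!r}' for key, value in fields.items())
    # line protocol timestamps default to nanoseconds since the epoch
    timestamp_ns = int(round(time_s * 1_000_000_000))
    return f'{measurement}{tag_part} {field_part} {timestamp_ns}'

line = to_line_protocol(
    'my_meas',
    {'tag1': '001', 'tag2': 'andrew'},
    {'column1': 1.2, 'column2': 0.3},
    1234567890,
)
print(line)
# my_meas,tag1=001,tag2=andrew column1=1.2,column2=0.3 1234567890000000000
```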