Aggregator

Overview

The Aggregator API provides utilities for storing and aggregating time series data that is accessible via an HTTP API. It integrates fully with InfluxDB, a time series database, as the data backend.

Basic Usage

Instantiate an InfluxDB Aggregator

from ligo.scald.io import influx

# instantiate the aggregator
aggregator = influx.Aggregator(hostname='influx.hostname', port=8086, db='your_database')

Register a Schema

Before storing data, register a measurement schema that defines how data is stored in the backend:

# register a measurement schema (how data is stored in backend)
measurement = 'my_meas'
columns = ('column1', 'column2')
column_key = 'column1'
tags = ('tag1', 'tag2')
tag_key = 'tag2'

aggregator.register_schema(measurement, columns, column_key, tags, tag_key)

Storing Data

The Aggregator provides two methods for storing data: store_rows for row format and store_columns for column format. All ingested data is downsampled to a maximum sampling rate of 1 Hz using an aggregate quantity (min, median, or max).
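
To make the downsampling step concrete, here is a minimal pure-Python sketch of reducing samples to at most one value per second with a chosen aggregate. The helper name downsample_1hz and the choice of binning by integer second are assumptions made for illustration; they are not part of ligo.scald, and the library's actual binning may differ.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical helper for illustration only; not part of ligo.scald.
AGGREGATES = {'min': min, 'median': median, 'max': max}

def downsample_1hz(times, values, aggregate='max'):
    """Reduce samples to at most one value per integer second."""
    reduce = AGGREGATES[aggregate]
    bins = defaultdict(list)
    for t, v in zip(times, values):
        # assumption: a sample belongs to the second it falls in
        bins[math.floor(t)].append(v)
    seconds = sorted(bins)
    return seconds, [reduce(bins[s]) for s in seconds]

times = [1234567890, 1234567890.5, 1234567891.25]
values = [1.2, 0.3, 0.7]
print(downsample_1hz(times, values, aggregate='max'))
```

The first two samples fall within the same second, so only one of them (or their median) survives, depending on the aggregate chosen.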

Row Format

Store data in row form, where each row represents a single time point. Each dictionary key is a tuple of tag values, one per registered tag (here tag1 and tag2):

# option 1: store data in row form
row_1 = {'time': 1234567890, 'fields': {'column1': 1.2, 'column2': 0.3}}
row_2 = {'time': 1234567890.5, 'fields': {'column1': 0.3, 'column2': 0.4}}

row_3 = {'time': 1234567890, 'fields': {'column1': 2.3, 'column2': 1.1}}
row_4 = {'time': 1234567890.5, 'fields': {'column1': 0.1, 'column2': 2.3}}

rows = {('001', 'andrew'): [row_1, row_2], ('002', 'parce'): [row_3, row_4]}

aggregator.store_rows(measurement, rows)

Column Format

Store data in column form, where each entry holds parallel arrays of times and field values:

# option 2: store data in column form
cols_1 = {
    'time': [1234567890, 1234567890.5],
    'fields': {'column1': [1.2, 0.3], 'column2': [0.3, 0.4]}
}
cols_2 = {
    'time': [1234567890, 1234567890.5],
    'fields': {'column1': [2.3, 0.1], 'column2': [1.1, 2.3]}
}
cols = {('001', 'andrew'): cols_1, ('002', 'parce'): cols_2}

aggregator.store_columns(measurement, cols)
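
The two formats carry the same information. As a rough sketch, the following converts the row-form structure used above into the column-form structure. The function rows_to_columns is a hypothetical helper, not part of ligo.scald, and it assumes every row in a series has the same field names.

```python
# Hypothetical helper for illustration only; not part of ligo.scald.
def rows_to_columns(rows):
    """Convert {tag_values: [row, ...]} into {tag_values: {'time': [...], 'fields': {...}}}."""
    cols = {}
    for tag_values, series in rows.items():
        fields = {}
        for row in series:
            # append each field value to the list for its column name
            for name, value in row['fields'].items():
                fields.setdefault(name, []).append(value)
        cols[tag_values] = {
            'time': [row['time'] for row in series],
            'fields': fields,
        }
    return cols

example_rows = {
    ('001', 'andrew'): [
        {'time': 1234567890, 'fields': {'column1': 1.2, 'column2': 0.3}},
        {'time': 1234567890.5, 'fields': {'column1': 0.3, 'column2': 0.4}},
    ],
}
print(rows_to_columns(example_rows))
```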

Schema Components

The arguments to register_schema are:

  • measurement: The name of the measurement/table in the database

  • columns: Tuple of column names that will store the actual data values

  • column_key: The primary column used for aggregation operations

  • tags: Tuple of tag names used for indexing and grouping data

  • tag_key: The primary tag used for organizing data series
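
One way to picture how these components relate is a small container that checks them against each other. This is only a sketch: the Schema class below is hypothetical and not part of ligo.scald, and the checks (column_key must be one of columns, tag_key must be one of tags) are assumptions inferred from the example above rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical container for illustration only; not part of ligo.scald.
@dataclass(frozen=True)
class Schema:
    measurement: str
    columns: Tuple[str, ...]
    column_key: str
    tags: Tuple[str, ...]
    tag_key: str

    def __post_init__(self):
        # assumption: the key column and key tag are drawn from the declared names
        if self.column_key not in self.columns:
            raise ValueError(f'{self.column_key!r} is not one of {self.columns}')
        if self.tag_key not in self.tags:
            raise ValueError(f'{self.tag_key!r} is not one of {self.tags}')

    def label(self, tag_values):
        """Pair a tuple of tag values with the tag names, in schema order."""
        if len(tag_values) != len(self.tags):
            raise ValueError('expected one value per tag')
        return dict(zip(self.tags, tag_values))

schema = Schema('my_meas', ('column1', 'column2'), 'column1', ('tag1', 'tag2'), 'tag2')
print(schema.label(('001', 'andrew')))
```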

Tags vs Fields

  • Tags are indexed and used for filtering and grouping data

  • Fields contain the actual measurement values and are not indexed

  • Choose tags for metadata that you’ll frequently filter by

  • Use fields for numerical data that you want to aggregate or visualize
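
The distinction is visible in InfluxDB's line protocol, where tags follow the measurement name and fields come after a space. The sketch below formats one point that way. The function to_line_protocol is a hypothetical helper, not part of ligo.scald, and it is deliberately simplified: it handles only float field values and does not escape spaces or commas.

```python
# Hypothetical helper for illustration only; not part of ligo.scald.
# Simplified: float fields only, no escaping of special characters.
def to_line_protocol(measurement, tags, fields, time_s):
    """Format one point as '<measurement>,<tags> <fields> <timestamp in ns>'."""
    tag_part = ','.join(f'{k}={v}' for k, v in sorted(tags.items()))
    field_part = ','.join(f'{k}={v}' for k, v in fields.items())
    timestamp_ns = int(round(time_s * 1e9))  # line protocol defaults to nanoseconds
    return f'{measurement},{tag_part} {field_part} {timestamp_ns}'

line = to_line_protocol(
    'my_meas',
    tags={'tag1': '001', 'tag2': 'andrew'},      # indexed metadata
    fields={'column1': 1.2, 'column2': 0.3},     # measured values
    time_s=1234567890,
)
print(line)
```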