Data Model

This page explains the design decisions behind the arrakis data model: why timestamps are in nanoseconds, how channels are structured, and how data is organized.

GPS Time in Nanoseconds

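Arrakis uses GPS time throughout. The user-facing API accepts GPS seconds as float values, but internally all timestamps are stored as integer nanoseconds. This avoids floating-point precision problems: a 64-bit float holding a present-day GPS time (around 10^9 seconds) can only resolve intervals of roughly 0.24 microseconds, and rounding errors compound as timestamps are added and subtracted over long time ranges.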
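The following standalone Python snippet makes the precision argument concrete without depending on arrakis. It uses math.ulp (Python 3.9+) to measure the gap between adjacent float64 values at the GPS time used in the time_as_ns example on this page, then shows that a 1 ns step survives integer arithmetic but vanishes in floating point. The variable names are illustrative only.

```python
import math

gps_seconds = 1187008882.443  # a GPS time as a float, as the user-facing API accepts

# Spacing between adjacent float64 values at this magnitude.
resolution_s = math.ulp(gps_seconds)
print(f"float64 resolution: {resolution_s * 1e9:.1f} ns")  # float64 resolution: 238.4 ns

# The same instant held as integer nanoseconds (illustrative representation).
start_ns = 1187008882_443_000_000
step_ns = 1  # one nanosecond

# Integer arithmetic is exact: the 1 ns step is always recoverable.
assert (start_ns + step_ns) - start_ns == 1

# In floating point, a 1 ns step is far below the resolution and is lost entirely.
print((gps_seconds + 1e-9) - gps_seconds)  # 0.0
```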

The conversion is handled by arrakis.block.time_as_ns:

from arrakis.block import time_as_ns

# 1187008882.443 GPS seconds -> nanoseconds
ns = time_as_ns(1187008882.443)

The Time enum provides named multipliers for readability:

from arrakis import Time

timestamp = 1187008882 * Time.SECONDS  # -> 1187008882000000000
offset = 500 * Time.MILLISECONDS       # -> 500000000
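A minimal sketch of the named-multiplier pattern, assuming an IntEnum whose members hold nanosecond counts. TimeUnit is a hypothetical stand-in: this page does not specify how arrakis.Time is implemented, and the NANOSECONDS and MICROSECONDS members below are assumptions added for completeness.

```python
from enum import IntEnum


class TimeUnit(IntEnum):
    """Hypothetical stand-in for arrakis.Time: each member is a nanosecond multiplier."""

    NANOSECONDS = 1
    MICROSECONDS = 1_000
    MILLISECONDS = 1_000_000
    SECONDS = 1_000_000_000


# IntEnum members behave like ints, so these products are plain, exact integers.
timestamp = 1187008882 * TimeUnit.SECONDS  # 1187008882000000000
offset = 500 * TimeUnit.MILLISECONDS  # 500000000
end = timestamp + offset
print(end)  # 1187008882500000000
```

Because the multipliers are integers, start times and offsets combine without any rounding, which is the property the nanosecond representation is meant to guarantee.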

SeriesBlock: The Core Container

An arrakis.block.SeriesBlock groups timeseries data for multiple channels that share a single start timestamp. Every block has:

  • time_ns -- the GPS start time in nanoseconds.
  • data -- a dictionary mapping channel names to NumPy arrays.
  • channels -- a dictionary mapping channel names to Channel metadata.

All channels in a block must span the same time duration. This is enforced at construction time -- the duration of each channel's data array (computed from its length and sample rate) must agree.
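The duration check described above can be sketched in plain Python. MiniSeriesBlock is hypothetical: it stores a sample rate per channel instead of full Channel metadata, accepts any sequence instead of a NumPy array, and compares durations as exact fractions (samples / rate) so that rates which do not divide a second evenly are still compared correctly. The real SeriesBlock constructor may perform this check differently.

```python
from dataclasses import dataclass, field
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


@dataclass
class MiniSeriesBlock:
    """Hypothetical, simplified stand-in for arrakis.block.SeriesBlock."""

    time_ns: int  # GPS start time in nanoseconds
    data: dict  # channel name -> sequence of samples
    sample_rates: dict  # channel name -> sample rate in Hz (stands in for Channel metadata)
    duration_ns: int = field(init=False, default=0)

    def __post_init__(self):
        # Exact duration in seconds for each channel: number of samples / sample rate.
        durations = {
            name: Fraction(len(samples), self.sample_rates[name])
            for name, samples in self.data.items()
        }
        if len(set(durations.values())) > 1:
            raise ValueError(f"channels span different durations (s): {durations}")
        if durations:
            # Store the common duration, truncated to whole nanoseconds.
            self.duration_ns = int(next(iter(durations.values())) * NS_PER_SECOND)


# 16 samples at 16 Hz and 256 samples at 256 Hz both cover one second.
block = MiniSeriesBlock(
    time_ns=1187008882 * NS_PER_SECOND,
    data={"H1:CAL-A_DQ": [0.0] * 16, "H1:CAL-B_DQ": [0.0] * 256},
    sample_rates={"H1:CAL-A_DQ": 16, "H1:CAL-B_DQ": 256},
)
print(block.duration_ns)  # 1000000000
```

Passing 128 samples for the 256 Hz channel instead would describe only half a second and raise ValueError.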

Series: Single-Channel View

Indexing a SeriesBlock by channel name returns an arrakis.block.Series. This is a lightweight view combining the data array with its channel metadata, providing convenient properties like sample_rate, duration, times, and dt.
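A minimal sketch of how a single-channel view can derive its timing properties from the sample rate and start time. MiniSeries and its times_ns property are hypothetical. The units of the real Series.times and dt properties are not stated on this page, so this sketch chooses seconds for dt and duration and integer nanoseconds for the per-sample timestamps.

```python
from dataclasses import dataclass
from typing import List, Sequence

NS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class MiniSeries:
    """Hypothetical single-channel view: samples plus the metadata needed for timing."""

    time_ns: int  # GPS start time of the first sample, in nanoseconds
    data: Sequence[float]  # sample values (a NumPy array in arrakis)
    sample_rate: int  # samples per second

    @property
    def dt(self) -> float:
        """Seconds between consecutive samples."""
        return 1 / self.sample_rate

    @property
    def duration(self) -> float:
        """Seconds covered by the data."""
        return len(self.data) / self.sample_rate

    @property
    def times_ns(self) -> List[int]:
        """Integer-nanosecond timestamp of each sample (floored if the period is fractional)."""
        return [
            self.time_ns + (i * NS_PER_SECOND) // self.sample_rate
            for i in range(len(self.data))
        ]


series = MiniSeries(time_ns=1187008882 * NS_PER_SECOND, data=[0.0, 1.0, 2.0, 3.0], sample_rate=4)
print(series.dt, series.duration)  # 0.25 1.0
print(series.times_ns[1] - series.times_ns[0])  # 250000000
```

Computing each timestamp as start + i * period in integer arithmetic avoids the drift that repeatedly adding a float dt would introduce.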

Channel Naming Convention

Channel names follow the LIGO convention:

<domain>:<subsystem>[-_]<rest>

For example, H1:CAL-DELTAL_EXTERNAL_DQ:

Part        Value               Meaning
Domain      H1                  The detector (Hanford)
Subsystem   CAL                 Calibration subsystem
Delimiter   -                   Subsystem separator
Rest        DELTAL_EXTERNAL_DQ  Specific signal identifier

The arrakis.channel.Channel class validates this format on construction and exposes domain and subsystem as properties.
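Validation of the naming convention can be sketched with a regular expression. The character classes below (alphanumeric domain and subsystem, non-whitespace remainder) are assumptions, since the exact rules that arrakis.channel.Channel enforces are not given here; the subsystem ends at the first - or _. parse_channel_name and ChannelName are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed pattern for <domain>:<subsystem>[-_]<rest>; arrakis may be stricter or looser.
CHANNEL_RE = re.compile(
    r"^(?P<domain>[A-Za-z0-9]+):(?P<subsystem>[A-Za-z0-9]+)[-_](?P<rest>\S+)$"
)


class ChannelName(NamedTuple):
    domain: str
    subsystem: str
    rest: str


def parse_channel_name(name: str) -> ChannelName:
    """Split a channel name into its parts, raising ValueError if it does not match."""
    match = CHANNEL_RE.match(name)
    if match is None:
        raise ValueError(f"invalid channel name: {name!r}")
    return ChannelName(**match.groupdict())


print(parse_channel_name("H1:CAL-DELTAL_EXTERNAL_DQ"))
# ChannelName(domain='H1', subsystem='CAL', rest='DELTAL_EXTERNAL_DQ')
```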

Partitioning

For Kafka-based streaming and publishing, channels are grouped into partitions. Each partition:

  • Has a unique partition_id that maps to a Kafka topic (arrakis-{partition_id}).
  • Contains channels of the same data type.
  • Assigns each channel a partition_index -- a compact integer used in the wire format instead of the full channel name string.

Partitioning is managed by the server. Publishers receive partition assignments during registration; consumers receive them as part of the channel metadata.
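The partitioning rules above can be illustrated with a small assignment function. Everything here is hypothetical: the page states that partitions hold a single data type, map to arrakis-{partition_id} topics, and give each channel a compact partition_index, but not how the server chooses partition IDs or sizes. The ID scheme ({dtype}-{n}) and the max_per_partition limit are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_partitions(
    channels: Dict[str, str], max_per_partition: int = 2
) -> Dict[str, Tuple[str, int]]:
    """Hypothetical assignment: channel name -> (partition_id, partition_index).

    Channels are grouped by dtype so each partition holds one data type, then
    split into partitions of at most max_per_partition channels.
    """
    by_dtype: Dict[str, List[str]] = defaultdict(list)
    for name, dtype in sorted(channels.items()):
        by_dtype[dtype].append(name)

    assignments = {}
    for dtype, names in sorted(by_dtype.items()):
        for start in range(0, len(names), max_per_partition):
            partition_id = f"{dtype}-{start // max_per_partition}"
            # The index is the channel's position within its partition.
            for index, name in enumerate(names[start:start + max_per_partition]):
                assignments[name] = (partition_id, index)
    return assignments


def topic_for(partition_id: str) -> str:
    """Kafka topic name for a partition, following the arrakis-{partition_id} pattern."""
    return f"arrakis-{partition_id}"


channels = {"H1:CAL-A": "float64", "H1:CAL-B": "float64", "H1:CAL-C": "float64", "H1:ASC-D": "int32"}
assignments = assign_partitions(channels)
for name, (pid, idx) in sorted(assignments.items()):
    print(name, topic_for(pid), idx)
```

A message on a given topic can then carry the small integer index rather than the full channel name, and anyone holding the same assignment table can map it back.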

Gap Representation

Missing data is represented as NumPy masked arrays (numpy.ma.MaskedArray) with all values masked. This preserves the expected array shape and dtype while clearly indicating that no real data is available.

Gap blocks can be created explicitly:

  • SeriesBlock.full_gap(time_ns, duration_ns, channels) -- a block where every channel is a gap.
  • block.create_gaps(channels) -- adds gap entries for channels not already present in the block.

The multiplexer creates gap blocks automatically when data does not arrive before the configured timeout.
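A sketch of all-masked gap arrays using numpy.ma.masked_all, which creates an array of a given shape and dtype with every element masked. full_gap_array and create_gaps below are hypothetical helpers that mirror the described behavior of SeriesBlock.full_gap and block.create_gaps on plain dictionaries. The sample count is computed as duration_ns * sample_rate // 10**9, assuming the duration covers a whole number of samples.

```python
import numpy as np

NS_PER_SECOND = 1_000_000_000


def full_gap_array(duration_ns: int, sample_rate: int, dtype=np.float64) -> np.ma.MaskedArray:
    """Hypothetical helper: an all-masked array covering duration_ns at sample_rate Hz."""
    n_samples = duration_ns * sample_rate // NS_PER_SECOND
    return np.ma.masked_all((n_samples,), dtype=dtype)


def create_gaps(data: dict, channels: dict, duration_ns: int) -> dict:
    """Hypothetical analogue of block.create_gaps: add gaps for channels missing from data.

    channels maps channel name -> (sample_rate, dtype).
    """
    filled = dict(data)
    for name, (rate, dtype) in channels.items():
        if name not in filled:
            filled[name] = full_gap_array(duration_ns, rate, dtype)
    return filled


# A quarter-second gap at 16 Hz has 4 samples, all masked.
gap = full_gap_array(NS_PER_SECOND // 4, 16)

# H1:A-X has real data; H1:B-Y did not arrive and becomes a 16-sample int32 gap.
filled = create_gaps(
    {"H1:A-X": np.arange(4.0)},
    {"H1:A-X": (16, np.float64), "H1:B-Y": (64, np.int32)},
    NS_PER_SECOND // 4,
)
```

The gap keeps the shape and dtype a consumer expects, so downstream code can concatenate or iterate over blocks uniformly and check the mask to tell real samples from missing ones.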

Freq Enum

The Freq enum converts between sample rates and nanosecond periods. Multiplying a number by a Freq member yields a stride in nanoseconds:

from arrakis import Freq

stride = 64 * Freq.Hz   # nanosecond period for 64 Hz: 1/64 s = 15,625,000 ns
stride = 1 * Freq.kHz   # nanosecond period for 1 kHz: 1/1000 s = 1,000,000 ns

This is computed as (Freq.value / rate) * Time.SECONDS, where rate is the number multiplied by the Freq member, giving the time between samples in nanoseconds.
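The period arithmetic can be checked with exact fractions. period_ns is a hypothetical helper, not the Freq implementation: it returns a fractions.Fraction so it is visible which rates produce a whole-nanosecond period (64 Hz, 1 kHz) and which do not (3 Hz). How Freq itself handles non-integer periods is not specified on this page.

```python
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def period_ns(rate_hz) -> Fraction:
    """Hypothetical helper: exact time between samples, in nanoseconds."""
    return Fraction(NS_PER_SECOND) / Fraction(rate_hz)


print(period_ns(64))    # 15625000
print(period_ns(1000))  # 1000000
print(period_ns(3))     # 1000000000/3, not a whole number of nanoseconds
```

Power-of-two rates such as 64 Hz divide 10^9 exactly only up to 512 Hz (since 10^9 = 2^9 * 5^9), so at higher power-of-two rates the period is also fractional.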