Data Model
This page explains the design decisions behind the arrakis data model: why timestamps are in nanoseconds, how channels are structured, and how data is organized.
GPS Time in Nanoseconds
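Arrakis uses GPS time throughout. The user-facing API accepts GPS seconds as
float values, but internally all timestamps are stored as integer
nanoseconds. This avoids floating-point precision issues: at present-day GPS
times (on the order of 10⁹ s), a 64-bit float can only resolve steps of about
240 ns, and rounding errors accumulate further over long time ranges.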
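The scale of the problem can be checked with the standard library alone. The sketch below does not call arrakis: it uses math.ulp to measure the gap between adjacent 64-bit floats near a GPS time, then compares a naive float conversion to nanoseconds with an exact decimal one. The variable names are illustrative.

```python
import math
from decimal import Decimal

gps_s = 1187008882.443  # a GPS time held as a Python float (IEEE 754 double)

# Spacing between adjacent representable doubles near this value:
# about 2.4e-7 s, i.e. the float cannot distinguish times closer than ~238 ns.
resolution_ns = math.ulp(gps_s) * 1e9

# A naive conversion through float multiplication inherits that rounding error...
ns_from_float = round(gps_s * 1e9)

# ...whereas converting from the exact decimal string gives the intended integer.
ns_exact = int(Decimal("1187008882.443") * 10**9)

print(f"float resolution: {resolution_ns:.1f} ns")
print(ns_from_float, ns_exact, ns_from_float - ns_exact)
```

The nonzero difference in the last line is the kind of error that integer nanosecond storage is meant to avoid.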
The conversion is handled by arrakis.block.time_as_ns:
```python
from arrakis.block import time_as_ns

# 1187008882.443 GPS seconds -> nanoseconds
ns = time_as_ns(1187008882.443)
```
The Time enum provides named multipliers for readability:
```python
from arrakis import Time

timestamp = 1187008882 * Time.SECONDS  # -> 1187008882000000000
offset = 500 * Time.MILLISECONDS  # -> 500000000
```
SeriesBlock: The Core Container
An arrakis.block.SeriesBlock groups timeseries data for multiple channels that share a single start time. Every block has:
- time_ns -- the GPS start time in nanoseconds.
- data -- a dictionary mapping channel names to NumPy arrays.
- channels -- a dictionary mapping channel names to Channel metadata.
All channels in a block must span the same time duration. This is enforced at construction time -- the duration of each channel's data array (computed from its length and sample rate) must agree.
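A minimal sketch of that duration check, assuming integer sample rates in Hz and plain dictionaries in place of SeriesBlock and Channel. The helper names (duration_ns, common_duration_ns) and the X1 channel names are hypothetical, not part of arrakis.

```python
import numpy as np

NS_PER_SECOND = 1_000_000_000


def duration_ns(num_samples: int, sample_rate: int) -> int:
    """Duration of num_samples at sample_rate (Hz), in integer nanoseconds."""
    return num_samples * NS_PER_SECOND // sample_rate


def common_duration_ns(data: dict[str, np.ndarray], sample_rates: dict[str, int]) -> int:
    """Return the duration shared by all channels, or raise if they disagree."""
    if not data:
        raise ValueError("a block needs at least one channel")
    durations = {name: duration_ns(len(arr), sample_rates[name]) for name, arr in data.items()}
    if len(set(durations.values())) != 1:
        raise ValueError(f"channels span different durations (ns): {durations}")
    return next(iter(durations.values()))


# One second of data at 256 Hz and at 16 Hz: both span 1_000_000_000 ns.
data = {"X1:FAST": np.zeros(256), "X1:SLOW": np.zeros(16)}
rates = {"X1:FAST": 256, "X1:SLOW": 16}
print(common_duration_ns(data, rates))
```

Floor division keeps the result an integer; it is exact whenever the sample rate divides the sample count times 10⁹, which holds for the power-of-two rates common in LIGO data.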
Series: Single-Channel View
Indexing a SeriesBlock by channel name returns an arrakis.block.Series.
This is a lightweight view combining the data array with its channel metadata,
providing convenient properties like sample_rate, duration, times, and
dt.
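To make those properties concrete, here is a hypothetical stand-in class (SeriesSketch, not arrakis.block.Series) that derives them from a start time, a data array, and a sample rate. It assumes dt, duration, and times are expressed in seconds; the units arrakis actually uses are not stated on this page.

```python
from dataclasses import dataclass

import numpy as np

NS_PER_SECOND = 1_000_000_000


@dataclass
class SeriesSketch:
    """Hypothetical single-channel view (not arrakis.block.Series)."""

    time_ns: int        # GPS start time in nanoseconds
    data: np.ndarray    # samples for one channel
    sample_rate: float  # samples per second (Hz)

    @property
    def dt(self) -> float:
        """Seconds between consecutive samples."""
        return 1.0 / self.sample_rate

    @property
    def duration(self) -> float:
        """Seconds spanned by the data: sample count divided by sample rate."""
        return len(self.data) / self.sample_rate

    @property
    def times(self) -> np.ndarray:
        """GPS time, in seconds, of each sample."""
        return self.time_ns / NS_PER_SECOND + np.arange(len(self.data)) * self.dt


series = SeriesSketch(time_ns=1187008882 * NS_PER_SECOND, data=np.zeros(4), sample_rate=4.0)
print(series.dt, series.duration, series.times)
```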
Channel Naming Convention
Channel names follow the LIGO convention:
<domain>:<subsystem>[-_]<rest>
For example, H1:CAL-DELTAL_EXTERNAL_DQ:
| Part | Value | Meaning |
|---|---|---|
| Domain | H1 | The detector (Hanford) |
| Subsystem | CAL | Calibration subsystem |
| Delimiter | - | Subsystem separator |
| Rest | DELTAL_EXTERNAL_DQ | Specific signal identifier |
The arrakis.channel.Channel class validates this format on construction
and exposes domain and subsystem as properties.
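A rough sketch of such validation with a regular expression. The character classes are assumptions chosen for illustration; the exact rules arrakis.channel.Channel enforces may differ, and parse_channel_name is a hypothetical helper.

```python
import re

# Assumed pattern: <domain>:<subsystem>[-_]<rest>. The character classes are
# illustrative guesses, not the exact rules arrakis.channel.Channel enforces.
CHANNEL_RE = re.compile(r"(?P<domain>[A-Za-z0-9]+):(?P<subsystem>[A-Za-z0-9]+)[-_](?P<rest>.+)")


def parse_channel_name(name: str) -> tuple[str, str, str]:
    """Split a channel name into (domain, subsystem, rest) or raise ValueError."""
    match = CHANNEL_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"invalid channel name: {name!r}")
    return match["domain"], match["subsystem"], match["rest"]


print(parse_channel_name("H1:CAL-DELTAL_EXTERNAL_DQ"))  # ('H1', 'CAL', 'DELTAL_EXTERNAL_DQ')
```

Because the subsystem group in this sketch stops at the first - or _, a name such as L1:GDS_CALIB_STRAIN splits into L1, GDS, and CALIB_STRAIN.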
Partitioning
For Kafka-based streaming and publishing, channels are grouped into partitions. Each partition:
- Has a unique partition_id that maps to a Kafka topic (arrakis-{partition_id}).
- Contains channels of the same data type.
- Assigns each channel a partition_index -- a compact integer used in the wire format instead of the full channel name string.
Partitioning is managed by the server. Publishers receive partition assignments during registration; consumers receive them as part of the channel metadata.
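Since the server owns the assignment, the following only illustrates the shape of the result: a hypothetical assign_partitions that puts all channels of one data type into a single partition, numbers partitions from 0, and derives the topic name and each channel's partition_index. The one-partition-per-type rule and the numbering scheme are assumptions, not how the arrakis server necessarily behaves.

```python
from collections import defaultdict


def assign_partitions(channel_dtypes: dict[str, str]) -> dict[str, dict]:
    """Hypothetical sketch: one partition per data type.

    The arrakis server owns the real assignment; the grouping rule and the
    integer partition IDs here are assumptions made for illustration.
    """
    groups = defaultdict(list)
    for name in sorted(channel_dtypes):
        groups[channel_dtypes[name]].append(name)

    partitions = {}
    for pid, (dtype, names) in enumerate(sorted(groups.items())):
        partitions[str(pid)] = {
            "topic": f"arrakis-{pid}",  # Kafka topic derived from the partition ID
            "dtype": dtype,              # every channel in this partition shares it
            # partition_index: compact integer sent on the wire instead of the name
            "index": {name: i for i, name in enumerate(names)},
        }
    return partitions


parts = assign_partitions({"X1:A-FAST": "float64", "X1:B-SLOW": "float64", "X1:C-STATE": "int32"})
print(parts["0"]["topic"], parts["0"]["index"])
```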
Gap Representation
Missing data is represented as NumPy masked arrays (numpy.ma.MaskedArray)
with all values masked. This preserves the expected array shape and dtype while
clearly indicating that no real data is available.
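NumPy can build such an array directly with numpy.ma.masked_all. This only illustrates the representation; the page does not say which constructor arrakis uses internally.

```python
import numpy as np

# A 16-sample gap for a float32 channel: the shape and dtype match real data,
# but every element is masked.
gap = np.ma.masked_all((16,), dtype=np.float32)

print(gap.shape, gap.dtype, gap.mask.all())  # (16,) float32 True
print(gap.count())                            # 0 unmasked values
```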
Gap blocks can be created explicitly:
- SeriesBlock.full_gap(time_ns, duration_ns, channels) -- a block where every channel is a gap.
- block.create_gaps(channels) -- adds gap entries for channels not already present in the block.
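The effect of create_gaps can be sketched on plain dictionaries. The function add_missing_gaps, its parameters, and the channel names below are hypothetical; the sketch assumes integer sample rates and sizes each gap as rate × duration / 10⁹ samples.

```python
import numpy as np

NS_PER_SECOND = 1_000_000_000


def add_missing_gaps(data, sample_rates, dtypes, duration_ns):
    """Hypothetical analogue of block.create_gaps, working on plain dicts.

    Every channel listed in sample_rates but absent from data gets a fully
    masked array sized to cover duration_ns; existing entries are kept as-is.
    """
    filled = dict(data)  # copy, so the caller's dict is not modified
    for name, rate in sample_rates.items():
        if name not in filled:
            num_samples = rate * duration_ns // NS_PER_SECOND
            filled[name] = np.ma.masked_all((num_samples,), dtype=dtypes[name])
    return filled


block_data = {"X1:A-FAST": np.arange(4.0)}
filled = add_missing_gaps(
    block_data,
    sample_rates={"X1:A-FAST": 4, "X1:B-SLOW": 2},
    dtypes={"X1:A-FAST": "float64", "X1:B-SLOW": "float32"},
    duration_ns=NS_PER_SECOND,
)
print(filled["X1:B-SLOW"])
```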
The multiplexer creates gap blocks automatically when data does not arrive before the configured timeout.
Freq Enum
The Freq enum converts between sample rates and nanosecond
periods. Multiplying a number by a Freq member yields a stride in
nanoseconds:
```python
from arrakis import Freq

stride = 64 * Freq.Hz  # nanosecond period for 64 Hz
stride = 1 * Freq.kHz  # nanosecond period for 1 kHz
```
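This is computed as (Freq.value / rate) * Time.SECONDS, where rate is the number
on the left of the multiplication, giving the time between samples in
nanoseconds. For example, 64 Hz gives 10⁹ / 64 = 15,625,000 ns, and 1 kHz gives
1,000,000 ns.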
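The mechanism can be mimicked with Python's enum module and __rmul__. FreqSketch below is a hypothetical stand-in, not arrakis.Freq: its member values (taken to be the period, in seconds, of one unit of frequency) are assumptions chosen so that the formula above reproduces the expected strides.

```python
from enum import Enum

SECONDS_NS = 1_000_000_000  # assumed to equal Time.SECONDS (one second in ns)


class FreqSketch(Enum):
    """Hypothetical stand-in for arrakis.Freq, not the real class.

    Each value is assumed to be the period, in seconds, of one unit of frequency.
    """

    Hz = 1
    kHz = 1e-3

    def __rmul__(self, rate):
        # rate * FreqSketch.X -> (value / rate) * SECONDS_NS, the sample period in ns
        return (self.value / rate) * SECONDS_NS


print(64 * FreqSketch.Hz)  # 15625000.0
print(1 * FreqSketch.kHz)
```

Python tries int.__mul__ first, which returns NotImplemented for an enum member, so the member's __rmul__ handles the product. Like the formula, this sketch returns a float; this page does not say whether arrakis rounds the stride to an integer.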