Running the Streaming Pipeline
This is the how-to guide for newcomers who want to run their own version of the pipeline. The syntax of iDQ’s executables is fairly standard (and fairly limited), so much of the work of setting up iDQ lies in correctly specifying the config file. We address both points in this tutorial.
While we note what would be needed to configure iDQ for several different sources of features (triggers), we only provide a full example for synthetic trigger streams (idq.io.MockClassifierData).
Configuration file
A complete description of iDQ’s configuration (INI) files can be found in Configuration. The information below is enough to get you started.
The INI file has several required sections, which we’ll introduce in turn. An example is provided alongside the source code (~etc/idq.ini), but we repeat a simplified version below.
In addition to the INI file, analysts must manage the list of channels to be used in the analysis. These are typically determined via safety studies (i.e., hardware injections), but this simplified INI uses synthetic data generated on the fly. As such, you will have to manage a config file for your synthetic data as well as the channel list.
idq.ini
#-------------------------------------------------
# high-level shared parameters
[general]
tag = test
instrument = Fake1
rootdir = .
classifiers = ovl
[samples]
target_channel = target_channel
target_bounds =
dirty_bounds =
dirty_window = 0.
#-------------------------------------------------
# parameters for training jobs
[train]
workflow = block
log_level = 10
random_rate = 0.1
[train data discovery]
flavor = MockClassifierData
time = time
ignore_segdb = False
columns = ['time', 'snr', 'frequency']
config =
[train stream]
stride =
delay =
[train reporting]
flavor = PickleReporter
#-------------------------------------------------
# parameters for evaluation jobs
[evaluate]
workflow =
log_level =
random_rate =
[evaluate data discovery]
flavor = MockClassifierData
time = time
ignore_segdb = False
columns = ['time', 'snr', 'frequency']
config =
[evaluate stream]
stride =
delay =
[evaluate reporting]
flavor = QuiverReporter
#-------------------------------------------------
# parameters for calibration jobs
[calibrate]
workflow = block
log_level = 10
[calibrate reporting]
flavor = CalibrationMapReporter
#-------------------------------------------------
# parameters for timeseries jobs
[timeseries]
workflow = block
log_level = 10
srate = 128
[timeseries data discovery]
flavor = MockClassifierData
time = time
ignore_segdb = False
columns = ['time', 'snr', 'frequency']
config =
[timeseries stream]
stride =
delay =
[timeseries reporting]
flavor = GWFSeriesReporter
#-------------------------------------------------
# parameters for classifiers
[ovl]
flavor = OVL
incremental = 100
num_recalculate = 10
metric = eff_fap
minima = {'eff_fap': 3, 'poisson_signif':5, 'use_percentage':1e-3}
time = time
significance = significance
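Several options in the listing above are left blank (for example config, stride, and delay), so it can help to list them before launching any jobs. The following is a minimal sketch that uses Python's standard-library configparser, not iDQ's own config loader; the helper name blank_options and the abbreviated EXAMPLE_INI string are hypothetical and exist only for illustration. Some blank options may be intentionally empty, so treat the output as a checklist rather than a list of errors.

```python
import ast
import configparser

# Abbreviated copy of the idq.ini listing above; only a few sections are kept.
EXAMPLE_INI = """
[general]
tag = test
instrument = Fake1
rootdir = .
classifiers = ovl

[train data discovery]
flavor = MockClassifierData
time = time
ignore_segdb = False
columns = ['time', 'snr', 'frequency']
config =

[train stream]
stride =
delay =
"""


def blank_options(parser):
    """Return (section, option) pairs whose value is empty (hypothetical helper)."""
    return [
        (section, option)
        for section in parser.sections()
        for option, value in parser.items(section)
        if not value.strip()
    ]


# Disable interpolation so values are read exactly as written.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(EXAMPLE_INI)

# List-valued options are written as Python literals in the INI,
# so they can be decoded safely with ast.literal_eval.
columns = ast.literal_eval(parser["train data discovery"]["columns"])
missing = blank_options(parser)

for section, option in missing:
    print(f"[{section}] {option} is blank")
```

To check a real file, replace parser.read_string(EXAMPLE_INI) with parser.read("idq.ini"), assuming the file sits in the current directory.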
mcd.ini
WRITE AN EXAMPLE HERE WITH VERY FEW CHANNELS
channels.txt
WRITE this
Streaming Tasks
Describe how to manage (asynchronous) processes via idq-stream.
Describe how that will manage:
idq-streaming_train
idq-streaming_evaluate
idq-streaming_calibrate
idq-streaming_timeseries
Describe the input/output data streams for each.