Running the Batch Pipeline

This tutorial walks through setting up the configuration file that iDQ needs to run the batch pipeline.

Configuration file

To run one-off or batch tasks, you’ll need to provide iDQ with a TOML-formatted configuration file. An example configuration is located at etc/config.toml. The guide below covers the common configuration options; an exhaustive list of options is located at Configuration.

Common options that will be used throughout batch jobs:

[general]
tag = "test"
instrument = "L1"
rootdir = "/path/to/analysis"

classifiers = ["classifier1", "classifier2"]

Options for defining glitch/clean samples:

[samples]
target_channel = "channel_name"
dirty_window = time_window  # in seconds

[samples.target_bounds]
significance = [lower_bound, upper_bound]
frequency = [lower_bound, upper_bound]

[samples.dirty_bounds]
significance = [lower_bound, upper_bound]
frequency = [lower_bound, upper_bound]

Glitches are determined by looking at a specified target channel and finding features that fall within these target bounds.

Clean samples are determined by first finding all times at which features fall within the dirty bounds, then removing all times within the specified dirty window of those times. The time that remains is sampled to generate clean samples.
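The clean-sample selection described above can be sketched in a few lines of Python. This is a minimal illustration, not iDQ's implementation: the function name clean_segments is hypothetical, and it assumes the dirty window is centered on each dirty time (half the window on either side), which the text above does not specify.

```python
def clean_segments(start, end, dirty_times, dirty_window):
    """Return the (lo, hi) spans of [start, end) left after vetoing dirty times.

    Assumption: each dirty time vetoes dirty_window / 2 seconds on either side.
    """
    half = dirty_window / 2.0
    # Build one veto interval per dirty time, clipped to the analysis span.
    vetoes = sorted((max(start, t - half), min(end, t + half)) for t in dirty_times)

    clean = []
    cursor = start  # earliest time not yet known to be vetoed
    for lo, hi in vetoes:
        if hi <= lo:
            continue  # veto lies entirely outside the analysis span
        if lo > cursor:
            clean.append((cursor, lo))
        cursor = max(cursor, hi)  # also merges overlapping vetoes
    if cursor < end:
        clean.append((cursor, end))
    return clean


# Hypothetical dirty times (seconds) with a 2-second dirty window.
print(clean_segments(0, 100, [10, 50, 50.5], 2))
# -> [(0, 9.0), (11.0, 49.0), (51.5, 100)]
```

Clean sample times would then be drawn from the returned spans; how iDQ draws them is not covered by this sketch.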

Options for reading input features:

[features]
flavor = "omicron"

columns = ["time", "snr", "frequency"]

time = "time"
significance = "snr"
frequency = "frequency"

This section tells iDQ where and how to read input features, which columns to use, and which of those columns determine the time, significance, and frequency.

In this case, we use the omicron flavor, which searches for Omicron triggers. We use the three columns defined above. In addition, trigger times are taken from the time column, the significance used when applying the target and dirty bounds is taken from the snr column, and the frequency is taken from the frequency column.
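To make the column mapping concrete, the sketch below applies a set of target bounds to a few triggers. Everything in it is illustrative: the trigger values, the bound values, and the helper within_bounds are hypothetical, and treating the bounds as inclusive is an assumption rather than documented iDQ behavior.

```python
# Hypothetical triggers; the column names follow the [features] example.
triggers = [
    {"time": 100.0, "snr": 12.0, "frequency": 64.0},
    {"time": 101.5, "snr": 5.5, "frequency": 300.0},
    {"time": 103.2, "snr": 30.0, "frequency": 2048.0},
]

# Mirrors the time / significance / frequency keys in [features].
column_map = {"time": "time", "significance": "snr", "frequency": "frequency"}

# Hypothetical bounds in the shape of [samples.target_bounds].
target_bounds = {
    "significance": (8.0, float("inf")),
    "frequency": (16.0, 1024.0),
}


def within_bounds(trigger, bounds, column_map):
    """True if every bounded quantity lies in its [lower, upper] range."""
    for quantity, (lower, upper) in bounds.items():
        value = trigger[column_map[quantity]]  # e.g. significance -> snr
        if not lower <= value <= upper:
            return False
    return True


glitch_times = [
    trig[column_map["time"]]
    for trig in triggers
    if within_bounds(trig, target_bounds, column_map)
]
print(glitch_times)  # -> [100.0]
```

Only the first trigger passes: the second has too low an snr and the third lies above the frequency range.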

Options for batch jobs:

Possible job options are:

  1. train

  2. evaluate

  3. calibrate

  4. timeseries

[job]
workflow = "workflow_type"

random_rate = rate  # train/evaluate only
min_stride = stride  # train only
srate = rate  # timeseries only

[job.reporting]
flavor = "reporter_type"
# plus any keyword arguments needed by this reporter

Classifier options:

Here, you’ll create one section per classifier, containing the keyword arguments needed for that particular classifier. For example, for a support vector machine classifier:

[[classifier]]
name = "svm"
flavor = "sklearn:svm"

# feature vector options
default = 0
window = 0.1
whitener = "standard"

# parallelization options
num_cv_proc = 8

# hyperparameters
[classifier.params]
classifier__C = 100
classifier__gamma = 10
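A quick way to check that a classifier section parses as intended is to load it with a TOML parser before handing it to iDQ. The sketch below uses Python's standard tomllib (Python 3.11+, with the third-party tomli package as a fallback). The trimmed configuration text and the consistency check are illustrative only; in particular, the idea that names under classifiers in [general] should match the name of a [[classifier]] entry is an assumption, not something this guide states.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Trimmed, hypothetical configuration for illustration.
CONFIG_TEXT = """
[general]
classifiers = ["svm"]

[[classifier]]
name = "svm"
flavor = "sklearn:svm"
window = 0.1

[classifier.params]
classifier__C = 100
classifier__gamma = 10
"""

config = tomllib.loads(CONFIG_TEXT)

# [[classifier]] is an array of tables, so it parses to a list of dicts;
# [classifier.params] attaches to the most recent [[classifier]] entry.
by_name = {entry["name"]: entry for entry in config["classifier"]}
print(by_name["svm"]["params"])

# Assumed consistency check: every requested classifier has a section.
missing = [n for n in config["general"]["classifiers"] if n not in by_name]
print("missing sections:", missing)
```

A parser rejects malformed assignments outright, so running a check like this catches syntax slips before a job is launched.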

Segment options:

In addition, you’ll need to provide a way for iDQ to query DQSegDB for valid segments.

[segments]
segdb_url = "https://segments.ligo.org"

intersect = "H1:DMT-ANALYSIS_READY:1"

Condor options:

If you’re planning on using condor workflows in any part of iDQ, you’ll also have to specify options for condor submission.

[condor]
universe = "vanilla"
retry = 3

accounting_group = "your.accounting.group"
accounting_group_user = "albert.einstein"

After you’ve set up your configuration file, you’re ready to launch one-off or batch iDQ tasks (running the full workflow).

One-off Tasks

  • idq-train:

usage: idq-train [-h] [-q | -v] [-e EXCLUDE EXCLUDE] CONFIG START END

positional arguments:
  CONFIG
  START
  END

options:
  -h, --help            show this help message and exit
  -q, --quiet           If set, only display warnings and errors.
  -v, --verbose         If set, display additional logging messages.
  -e EXCLUDE EXCLUDE, --exclude EXCLUDE EXCLUDE
                        exclude this segment from the analysis. Can be
                        repeated to exclude multiple segments. Useful for
                        round-robin training/evaluation.
Data flow: auxiliary features -> idq-train -> model
  • idq-evaluate:

usage: idq-evaluate [-h] [-q | -v] CONFIG START END

positional arguments:
  CONFIG
  START
  END

options:
  -h, --help     show this help message and exit
  -q, --quiet    If set, only display warnings and errors.
  -v, --verbose  If set, display additional logging messages.
Data flow: auxiliary features, model -> idq-evaluate -> quiver
  • idq-calibrate:

usage: idq-calibrate [-h] [-q | -v] CONFIG START END

positional arguments:
  CONFIG
  START
  END

options:
  -h, --help     show this help message and exit
  -q, --quiet    If set, only display warnings and errors.
  -v, --verbose  If set, display additional logging messages.
Data flow: quiver -> idq-calibrate -> calibration map
  • idq-timeseries:

usage: idq-timeseries [-h] [-q | -v] CONFIG START END

positional arguments:
  CONFIG
  START
  END

options:
  -h, --help     show this help message and exit
  -q, --quiet    If set, only display warnings and errors.
  -v, --verbose  If set, display additional logging messages.
Data flow: auxiliary features, model, calibration map -> idq-timeseries -> p(glitch) timeseries
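The --exclude option of idq-train is described above as useful for round-robin training. As a rough sketch of what that could look like, the Python snippet below builds one idq-train command line per bin, each excluding a different bin of the analysis span. The helper name, the configuration path, and the GPS-like times are hypothetical, and the snippet assumes the two EXCLUDE values are the start and end of the excluded segment. It only prints the commands and does not run them.

```python
import shlex


def round_robin_train_commands(config, start, end, num_bins):
    """Build one idq-train command per bin, each excluding that bin.

    Assumption: --exclude takes the start and end of the excluded segment.
    """
    # Integer bin edges spanning [start, end].
    edges = [start + (end - start) * i // num_bins for i in range(num_bins + 1)]
    commands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        commands.append(
            ["idq-train", "--exclude", str(lo), str(hi), config, str(start), str(end)]
        )
    return commands


# Hypothetical configuration path and times.
for command in round_robin_train_commands("config.toml", 1000, 1300, 3):
    print(shlex.join(command))
```

Each printed line follows the usage string shown for idq-train, with the options placed before the positional CONFIG START END arguments.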

Batch Tasks

  • idq-batch:

usage: idq-batch [-h] [-q | -v] [-w WORKFLOW] [-i INITIAL_LOOKBACK]
                 [--skip-timeseries] [--skip-report] [-c] [-n NUM_BINS]
                 [-N NUM_SEGS_PER_BIN] [-b]
                 CONFIG START END

positional arguments:
  CONFIG
  START
  END

options:
  -h, --help            show this help message and exit
  -q, --quiet           If set, only display warnings and errors.
  -v, --verbose         If set, display additional logging messages.
  -w WORKFLOW, --workflow WORKFLOW
                        workflow for launching batch jobs
  -i INITIAL_LOOKBACK, --initial-lookback INITIAL_LOOKBACK
                        if causal batch is specified, then look back this much
                        before t_start to use as a seed to evaluate starting
                        at t_start.
  --skip-timeseries     do not generate timeseries
  --skip-report         do not generate report
  -c, --causal          use causal round-robin binning
  -n NUM_BINS, --num-bins NUM_BINS
                        the number of round-robin bins to generate. Divisions
                        are made according to walltime
  -N NUM_SEGS_PER_BIN, --num-segs-per-bin NUM_SEGS_PER_BIN
                        the number of segments per bin within the round-robin
                        procedure. If this is greater than 1, segments will be
                        organized in a checkerboard pattern in order to sample
                        from the entire range in both training and evaluation.
                        Note, this is only used if --causal is NOT supplied.
  -b, --block           if supplied, this process will block until the DAG has
                        completed. Used when workflow=condor.
Data flow: auxiliary features -> batch.train -> model; auxiliary features, model -> batch.evaluate -> quiver; quiver -> batch.calibrate -> calibration map; auxiliary features, model, calibration map -> batch.timeseries -> p(glitch) timeseries
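The help text for --num-bins and --num-segs-per-bin can be illustrated with a small Python sketch. It is one plausible reading of the "checkerboard pattern", not iDQ's actual binning code: the span is cut into num_bins * num_segs_per_bin equal pieces by walltime, and consecutive pieces are dealt to the bins in turn. The function name and the times are hypothetical.

```python
def checkerboard_bins(start, end, num_bins, num_segs_per_bin=1):
    """Split [start, end) into equal segments and interleave them across bins.

    Assumption: "checkerboard" means segment i goes to bin i % num_bins.
    """
    total = num_bins * num_segs_per_bin
    edges = [start + (end - start) * i / total for i in range(total + 1)]
    bins = [[] for _ in range(num_bins)]
    for i in range(total):
        bins[i % num_bins].append((edges[i], edges[i + 1]))
    return bins


bins = checkerboard_bins(0, 120, num_bins=3, num_segs_per_bin=2)

# Round-robin: hold out each bin in turn and train on all the others.
for held_out, evaluate_segments in enumerate(bins):
    train_segments = sorted(
        seg for i, segs in enumerate(bins) if i != held_out for seg in segs
    )
    print("evaluate:", evaluate_segments, "train:", train_segments)
```

With more than one segment per bin, every bin contains segments from both the early and late parts of the span, which matches the stated goal of sampling the entire range in both training and evaluation.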