.. _running-batch:

Running the Batch Pipeline
####################################################################################################

This tutorial walks through setting up the configuration file that iDQ needs and then
running one-off and batch tasks with it.

.. _running-batch_config:

Configuration file
====================================================================================================

In order to run one-off or batch tasks, you'll need to provide iDQ with a
TOML-formatted file. An example configuration is located at
``etc/config.toml``. Below is a guide that will get you started with common
configuration options; an exhaustive list of options is located at
:doc:`../configuration`.

**Common options that will be used throughout batch jobs:**

.. code-block:: toml

    [general]
    tag = "test"
    instrument = "L1"
    rootdir = "/path/to/analysis"

    classifiers = ["classifier1", "classifier2"]


**Options for defining glitch/clean samples:**

.. code-block:: toml

    [samples]
    target_channel = "channel_name"
    dirty_window = time_window  # in seconds

    [samples.target_bounds]
    significance = [lower_bound, upper_bound]
    frequency = [lower_bound, upper_bound]

    [samples.dirty_bounds]
    significance = [lower_bound, upper_bound]
    frequency = [lower_bound, upper_bound]


Glitches are determined by looking at a specified target channel and finding
features that fall within these target bounds.

Clean samples are determined by first finding all times at which features fall
within the dirty bounds, then removing all times within the specified dirty
window around them. The remaining time is sampled to generate clean samples.
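
The selection logic described above can be sketched in a few lines of Python. This is
an illustration only, not iDQ's implementation: the function names are hypothetical, the
dirty window is assumed to be a full width centered on each trigger, and clean times are
assumed to be drawn uniformly from the remaining segments.

.. code-block:: python

    import random


    def in_bounds(trigger, bounds):
        """Return True if every bounded column lies within its [low, high] range."""
        return all(low <= trigger[col] <= high for col, (low, high) in bounds.items())


    def find_glitches(triggers, target_bounds):
        """Times of triggers that fall within the target bounds."""
        return [t["time"] for t in triggers if in_bounds(t, target_bounds)]


    def find_clean_segments(start, end, triggers, dirty_bounds, dirty_window):
        """Segments of [start, end) left after vetoing a window around dirty triggers.

        Assumption: dirty_window is the full width of a window centered on the trigger.
        """
        half = dirty_window / 2.0
        dirty_times = sorted(t["time"] for t in triggers if in_bounds(t, dirty_bounds))
        clean, cursor = [], start
        for time in dirty_times:
            veto_start, veto_end = time - half, time + half
            if veto_start > cursor:
                clean.append((cursor, min(veto_start, end)))
            cursor = max(cursor, veto_end)
            if cursor >= end:
                break
        if cursor < end:
            clean.append((cursor, end))
        return clean


    def sample_clean_times(segments, n, seed=0):
        """Draw n times uniformly from the clean segments (assumed sampling scheme)."""
        rng = random.Random(seed)
        durations = [seg_end - seg_start for seg_start, seg_end in segments]
        chosen = rng.choices(segments, weights=durations, k=n)
        return sorted(rng.uniform(seg_start, seg_end) for seg_start, seg_end in chosen)


    # Made-up triggers, for illustration only.
    triggers = [
        {"time": 10.0, "significance": 12.0, "frequency": 100.0},
        {"time": 50.0, "significance": 6.0, "frequency": 200.0},
        {"time": 80.0, "significance": 3.0, "frequency": 300.0},
    ]
    target_bounds = {"significance": (8.0, float("inf")), "frequency": (16.0, 2048.0)}
    dirty_bounds = {"significance": (5.0, float("inf")), "frequency": (16.0, 2048.0)}

    glitches = find_glitches(triggers, target_bounds)
    clean = find_clean_segments(0.0, 100.0, triggers, dirty_bounds, dirty_window=2.0)
    clean_times = sample_clean_times(clean, n=5)

Because the dirty bounds here are looser than the target bounds, the trigger at 50 s is
vetoed from the clean set without counting as a glitch.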

**Options for reading input features:**

.. code-block:: toml

    [features]
    flavor = "omicron"

    columns = ["time", "snr", "frequency"]

    time = "time"
    significance = "snr"
    frequency = "frequency"

This section tells iDQ where and how to read input features, which columns to use,
and which of those columns determine the time, significance and frequency.

In this case, we use the ``omicron`` flavor, which searches for Omicron triggers. We want
to use the three columns defined above. In addition, we specify that trigger times are
read from the ``time`` column and that the ``significance`` used for the target and dirty
bounds is read from the ``snr`` column.
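
As a rough illustration of what this mapping means, the sketch below applies the
``[features]`` settings to a single trigger row. The ``normalize`` helper, the extra
``q`` column and the values in ``raw`` are hypothetical and are not part of iDQ.

.. code-block:: python

    # Mirrors the [features] section shown above.
    FEATURES = {
        "columns": ["time", "snr", "frequency"],
        "time": "time",
        "significance": "snr",
        "frequency": "frequency",
    }


    def normalize(row, features=FEATURES):
        """Keep only the configured columns and expose them under their role names."""
        selected = {column: row[column] for column in features["columns"]}
        roles = ("time", "significance", "frequency")
        return {role: selected[features[role]] for role in roles}


    # A made-up trigger row with one column ("q") that is not configured.
    raw = {"time": 1187008882.4, "snr": 9.5, "frequency": 120.0, "q": 11.3}
    normalized = normalize(raw)
    print(normalized)
    # {'time': 1187008882.4, 'significance': 9.5, 'frequency': 120.0}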

**Options for batch jobs:**

Possible job options are:

  1. train
  2. evaluate
  3. calibrate
  4. timeseries

.. code-block:: toml

    [job]
    workflow = "workflow_type"

    random_rate = rate  # train/evaluate only
    min_stride = stride  # train only
    srate = rate  # timeseries only

    [job.reporting]
    flavor = "reporter_type"
    # plus any keyword arguments needed by this reporter

**Classifier options:**

Create one ``[[classifier]]`` section per classifier, containing the keyword
arguments needed for that particular classifier. For example, for a support
vector machine classifier:

.. code-block:: toml

    [[classifier]]
    name = "svm"
    flavor = "sklearn:svm"

    # feature vector options
    default = 0
    window = 0.1
    whitener = "standard"

    # parallelization options
    num_cv_proc = 8

    # hyperparameters
    [classifier.params]
    classifier__C = 100
    classifier__gamma = 10
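
Before launching jobs, it can help to check that the file parses as TOML and that the
classifier sections are in place. The sketch below uses Python's standard ``tomllib``
module (Python 3.11 or later) on an inline, abbreviated configuration. It assumes that
the names listed under ``classifiers`` in ``[general]`` are meant to match the ``name``
of a ``[[classifier]]`` section; ``missing_classifiers`` is a hypothetical helper, not
an iDQ function.

.. code-block:: python

    import tomllib  # standard library in Python 3.11+

    CONFIG = """
    [general]
    tag = "test"
    instrument = "L1"
    rootdir = "/path/to/analysis"
    classifiers = ["svm"]

    [[classifier]]
    name = "svm"
    flavor = "sklearn:svm"
    window = 0.1

    [classifier.params]
    classifier__C = 100
    classifier__gamma = 10
    """


    def missing_classifiers(config):
        """Names listed in [general] that have no [[classifier]] section (assumed rule)."""
        defined = {section["name"] for section in config.get("classifier", [])}
        return [name for name in config["general"]["classifiers"] if name not in defined]


    config = tomllib.loads(CONFIG)
    missing = missing_classifiers(config)
    # [classifier.params] attaches to the most recent [[classifier]] entry.
    params = config["classifier"][0]["params"]

A line such as ``classifier__C: 100`` would make ``tomllib.loads`` raise
``tomllib.TOMLDecodeError``, since TOML requires ``=`` between a key and its value.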


**Segment options:**

In addition, you'll need to provide a way for iDQ to query DQSegDB for valid segments.

.. code-block:: toml

    [segments]
    segdb_url = "https://segments.ligo.org"

    intersect = "H1:DMT-ANALYSIS_READY:1"

**Condor options:**

If you're planning on using condor workflows in any part of iDQ, you'll also
have to specify options for condor submission.

.. code-block:: toml

    [condor]
    universe = "vanilla"
    retry = 3

    accounting_group = "your.accounting.group"
    accounting_group_user = "albert.einstein"


After you've set up your configuration file, you're ready to launch one-off or
batch iDQ tasks (running the full workflow).

.. _running-batch_one_off:

One-off Tasks
====================================================================================================

* ``idq-train``:

.. program-output:: idq-train --help
      :nostderr:

.. graphviz::

   digraph idq_train {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      DataSrc [label="auxiliary features"];
      Train [label="idq-train"];
      Model [label="model"]

      DataSrc -> Train;
      Train -> Model;

   }

* ``idq-evaluate``:

.. program-output:: idq-evaluate --help
      :nostderr:

.. graphviz::

   digraph idq_evaluate {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      DataSrc [label="auxiliary features"];
      Model [label="model"]
      Evaluate [label="idq-evaluate"];
      Quiver [label="quiver"];

      DataSrc -> Evaluate;
      Model -> Evaluate;
      Evaluate -> Quiver;

   }

* ``idq-calibrate``:

.. program-output:: idq-calibrate --help
      :nostderr:

.. graphviz::

   digraph idq_calibrate {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      Calibrate [label="idq-calibrate"];
      CalibMap [label="calibration map"];
      Quiver [label="quiver"];

      Quiver -> Calibrate;
      Calibrate -> CalibMap;

   }

* ``idq-timeseries``:

.. program-output:: idq-timeseries --help
      :nostderr:

.. graphviz::

   digraph idq_timeseries {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      DataSrc [label="auxiliary features"];
      Model [label="model"]
      CalibMap [label="calibration map"];
      Timeseries [label="idq-timeseries"];
      PGlitch [label="p(glitch) timeseries"];

      DataSrc -> Timeseries;
      Model -> Timeseries;
      CalibMap -> Timeseries;
      Timeseries -> PGlitch;

   }

.. _running-batch_stream:

Batch Tasks
====================================================================================================

* ``idq-batch``:

.. program-output:: idq-batch --help
      :nostderr:

.. graphviz::

   digraph idq_batch {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;


      DataSrc [label="auxiliary features"];

      Model [label="model"]
      Quiver [label="quiver"];
      CalibMap [label="calibration map"];

      Train [label="batch.train"];
      Evaluate [label="batch.evaluate"];
      Calibrate [label="batch.calibrate"];
      Timeseries [label="batch.timeseries"];

      PGlitch [label="p(glitch) timeseries"];

      DataSrc -> Train;
      Train -> Model;

      DataSrc -> Evaluate;
      Model -> Evaluate;
      Evaluate -> Quiver;

      Quiver -> Calibrate;
      Calibrate -> CalibMap;

      DataSrc -> Timeseries;
      Model -> Timeseries;
      CalibMap -> Timeseries;
      Timeseries -> PGlitch;

   }
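
The data flow in the diagram implies an ordering between the four tasks. The sketch
below transcribes the diagram's edges into a small table and derives that ordering with
Python's standard ``graphlib`` module (Python 3.9 or later). It only illustrates the
dependencies shown above and makes no claim about how ``idq-batch`` schedules its jobs;
the ``TASKS`` table and ``task_order`` helper are hypothetical.

.. code-block:: python

    from graphlib import TopologicalSorter  # standard library in Python 3.9+

    # Artifacts each task consumes and the one it produces, read off the diagram.
    TASKS = {
        "train": {"needs": ["auxiliary features"], "makes": "model"},
        "evaluate": {"needs": ["auxiliary features", "model"], "makes": "quiver"},
        "calibrate": {"needs": ["quiver"], "makes": "calibration map"},
        "timeseries": {
            "needs": ["auxiliary features", "model", "calibration map"],
            "makes": "p(glitch) timeseries",
        },
    }


    def task_order(tasks):
        """Order tasks so that each runs after the tasks producing its inputs."""
        producer = {spec["makes"]: name for name, spec in tasks.items()}
        graph = {
            name: {producer[item] for item in spec["needs"] if item in producer}
            for name, spec in tasks.items()
        }
        return list(TopologicalSorter(graph).static_order())


    order = task_order(TASKS)
    print(order)
    # ['train', 'evaluate', 'calibrate', 'timeseries']

Auxiliary features have no producing task in the diagram, so they are treated as an
external input and impose no ordering constraint.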