.. _running-batch:

Running the Batch Pipeline
####################################################################################################

This tutorial walks through setting up the configuration file that iDQ needs
to run the batch pipeline.

.. _running-batch_config:

Configuration file
====================================================================================================

To run one-off or batch tasks, you'll need to provide iDQ with a TOML-formatted file.
An example configuration is located at ``etc/config.toml``. Below is a guide to the most
common configuration options. An exhaustive list of options is located at :doc:`../configuration`.

**Common options that will be used throughout batch jobs:**

.. code-block:: toml

    [general]
    tag = "test"
    instrument = "L1"
    rootdir = "/path/to/analysis"

    classifiers = ["classifier1", "classifier2"]

**Options for defining glitch/clean samples:**

.. code-block:: toml

    [samples]
    target_channel = "channel_name"
    dirty_window = time_window  # in seconds

    [samples.target_bounds]
    significance = [lower_bound, upper_bound]
    frequency = [lower_bound, upper_bound]

    [samples.dirty_bounds]
    significance = [lower_bound, upper_bound]
    frequency = [lower_bound, upper_bound]

Glitches are identified by looking at the specified target channel and finding features
that fall within the target bounds. Clean samples are determined by first finding all
times that fall within the dirty bounds, then removing all times within the specified
dirty window. Whatever time is left is sampled to generate clean samples.

**Options for reading input features:**

.. code-block:: toml

    [features]
    flavor = "omicron"

    columns = ["time", "snr", "frequency"]
    time = "time"
    significance = "snr"
    frequency = "frequency"

This section tells iDQ where and how to read input features, which columns to use,
and which of those columns determine the time, significance and frequency.
In this case, we use the ``omicron`` flavor, which searches for Omicron triggers.
We use the three columns defined above. In addition, trigger times are taken from the
``time`` column, and the significance used for the target and dirty bounds is taken
from the ``snr`` column.

**Options for batch jobs:**

Possible jobs are:

1. train
2. evaluate
3. calibrate
4. timeseries

.. code-block:: toml

    [job]
    workflow = "workflow_type"
    random_rate = rate  # train/evaluate only
    min_stride = stride  # train only
    srate = rate  # timeseries only

    [job.reporting]
    flavor = "reporter_type"
    # whatever kwargs are needed by this reporter

**Classifier options:**

Here, you'll create one section per classifier, with the keyword arguments
needed for that particular classifier.

For example, for a support vector machine classifier:

.. code-block:: toml

    [[classifier]]
    name = "svm"
    flavor = "sklearn:svm"

    # feature vector options
    default = 0
    window = 0.1
    whitener = "standard"

    # parallelization options
    num_cv_proc = 8

    # hyperparameters
    [classifier.params]
    classifier__C = 100
    classifier__gamma = 10

**Segment options:**

In addition, you'll need to provide a way for iDQ to query DQSegDB for valid segments.

.. code-block:: toml

    [segments]
    segdb_url = "https://segments.ligo.org"

    intersect = "H1:DMT-ANALYSIS_READY:1"

**Condor options:**

If you plan to use condor workflows in any part of iDQ, you'll also need to specify
options for condor submission.

.. code-block:: toml

    [condor]
    universe = "vanilla"
    retry = 3

    accounting_group = "your.accounting.group"
    accounting_group_user = "albert.einstein"

After you've set up your configuration file, you're ready to launch one-off tasks
or batch iDQ tasks (which run the full workflow).

.. _running-batch_one_off:

One-off Tasks
====================================================================================================

* ``idq-train``:

.. program-output:: idq-train --help
   :nostderr:

.. graphviz::

   digraph idq_train {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;

      DataSrc [label="auxiliary features"];
      Train [label="idq-train"];
      Model [label="model"]

      DataSrc -> Train;
      Train -> Model;
   }

* ``idq-evaluate``:

.. program-output:: idq-evaluate --help
   :nostderr:

.. graphviz::

   digraph idq_evaluate {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;

      DataSrc [label="auxiliary features"];
      Model [label="model"]
      Evaluate [label="idq-evaluate"];
      Quiver [label="quiver"];

      DataSrc -> Evaluate;
      Model -> Evaluate;
      Evaluate -> Quiver;
   }

* ``idq-calibrate``:

.. program-output:: idq-calibrate --help
   :nostderr:

.. graphviz::

   digraph idq_calibrate {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;

      Calibrate [label="idq-calibrate"];
      CalibMap [label="calibration map"];
      Quiver [label="quiver"];

      Quiver -> Calibrate;
      Calibrate -> CalibMap;
   }

* ``idq-timeseries``:

.. program-output:: idq-timeseries --help
   :nostderr:

.. graphviz::

   digraph idq_timeseries {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;

      DataSrc [label="auxiliary features"];
      Model [label="model"]
      CalibMap [label="calibration map"];
      Timeseries [label="idq-timeseries"];
      PGlitch [label="p(glitch) timeseries"];

      DataSrc -> Timeseries;
      Model -> Timeseries;
      CalibMap -> Timeseries;
      Timeseries -> PGlitch;
   }

.. _running-batch_stream:

Batch Tasks
====================================================================================================

* ``idq-batch``:

.. program-output:: idq-batch --help
   :nostderr:

.. graphviz::

   digraph idq_batch {
      labeljust = "r";
      label=""
      rankdir=LR;
      graph [fontname="Roman", fontsize=24];
      edge [ fontname="Roman", fontsize=10 ];
      node [fontname="Roman", shape=box, fontsize=11];
      style=rounded;
      labeljust = "r";
      fontsize = 14;

      DataSrc [label="auxiliary features"];
      Model [label="model"]
      Quiver [label="quiver"];
      CalibMap [label="calibration map"];
      Train [label="batch.train"];
      Evaluate [label="batch.evaluate"];
      Calibrate [label="batch.calibrate"];
      Timeseries [label="batch.timeseries"];
      PGlitch [label="p(glitch) timeseries"];

      DataSrc -> Train;
      Train -> Model;
      DataSrc -> Evaluate;
      Model -> Evaluate;
      Evaluate -> Quiver;
      Quiver -> Calibrate;
      Calibrate -> CalibMap;
      DataSrc -> Timeseries;
      Model -> Timeseries;
      CalibMap -> Timeseries;
      Timeseries -> PGlitch;
   }
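
The data dependencies in the ``idq-batch`` diagram imply an order in which the four
jobs can run. The following is a minimal sketch that derives that order from the
diagram's edges using Python's standard ``graphlib`` module. It is an illustration only
and not how ``idq-batch`` schedules work internally. The node names are the labels from
the diagram, and ``EDGES`` and ``job_order`` are hypothetical names introduced here.

.. code-block:: python

    from graphlib import TopologicalSorter

    # Edges copied from the idq-batch diagram: (producer, consumer).
    # The strings are diagram labels, not importable iDQ modules.
    EDGES = [
        ("auxiliary features", "batch.train"),
        ("batch.train", "model"),
        ("auxiliary features", "batch.evaluate"),
        ("model", "batch.evaluate"),
        ("batch.evaluate", "quiver"),
        ("quiver", "batch.calibrate"),
        ("batch.calibrate", "calibration map"),
        ("auxiliary features", "batch.timeseries"),
        ("model", "batch.timeseries"),
        ("calibration map", "batch.timeseries"),
        ("batch.timeseries", "p(glitch) timeseries"),
    ]


    def job_order(edges):
        """Return the batch.* nodes in an order that respects every edge."""
        # TopologicalSorter expects a mapping of node -> set of predecessors.
        predecessors = {}
        for src, dst in edges:
            predecessors.setdefault(dst, set()).add(src)
            predecessors.setdefault(src, set())
        order = TopologicalSorter(predecessors).static_order()
        # Keep only the job nodes, dropping the data products.
        return [node for node in order if node.startswith("batch.")]


    if __name__ == "__main__":
        for job in job_order(EDGES):
            print(job)

Because each job consumes a product of the previous one (model, quiver, calibration map),
the sketch yields train, evaluate, calibrate, timeseries, matching the order in which the
jobs are listed in the configuration section.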