Using bilby_pipe

Command line interface

The primary user interface for this code is the command line tool bilby_pipe. For an overview of this and other executables, see the executable reference.

Basics

The primary user interface for this code is the command line tool bilby_pipe, which is available after following the installation instructions. To see the help for this tool, run

$ bilby_pipe --help

(the complete output is given in the reference)

To run bilby_pipe, you first need to define an ini file; examples for different types of ini files can be found below.
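
For orientation, a minimal ini file might look something like the following sketch; the values are illustrative placeholders only, so refer to the full examples for realistic configurations.

# Illustrative minimal ini file: adapt the detectors, trigger time, priors and
# sampler settings to your own analysis
label = my_label
outdir = outdir
detectors = [H1, L1]
trigger-time = 1126259462.4
channel-dict = {H1: GWOSC, L1: GWOSC}
prior-file = 4s
sampler = dynesty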

Once you have an ini file (for the purposes of clarity, let's say my-run.ini), you initialise your job with

$ bilby_pipe my-run.ini

This will produce a directory structure as follows:

my-run.ini
outdir/
  -> data/
  -> final_result/
  -> log_data_analysis/
  -> log_data_generation/
  -> log_results_page/
  -> result/
  -> results_page/
  -> submit/

Most of these folders will initially be empty, but they will be populated as the job progresses. The data directory will contain all of the data to be analysed, while result will contain the *result.hdf5 result files generated by bilby, along with any plots. The final_result directory will contain the final result file, which is created by merging the individual result files. Note that the locations of the log and results_page folders can be modified.
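
As an aside, the merged result can be read back into Python with bilby once the run has completed. The sketch below only assumes the *result.hdf5 naming pattern described above; the exact filename depends on your label and settings.

# Sketch: load the merged result file produced by bilby_pipe
# (the exact filename under outdir/final_result/ depends on your label)
import glob

import bilby

result_file = glob.glob("outdir/final_result/*result.hdf5")[0]
result = bilby.core.result.read_in_result(result_file)
print(result.posterior.head())  # posterior samples as a pandas DataFrame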

The final folder, submit, contains all of the DAG submission scripts. To submit your job, run condor_submit_dag, giving as its first argument the file prefixed with dag under outdir/submit (the exact command is printed to the terminal once bilby_pipe completes). For example, if your label is my_label, the command will look something like
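
$ condor_submit_dag outdir/submit/dag_my_label.submit

Alternatively, you can initialise and submit your jobs in a single step with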

$ bilby_pipe my-run.ini --submit

Running all or part of the job directly

In some cases, you may need to run all or part of the job directly (i.e., not through a scheduler). This can be done using the file prefixed with bash in the submit/ directory. This file is a simple bash script that runs all of the commands in sequence. One simple way to run part of the job is to open the bash file, copy the commands you require into another script, and run that. For convenience, we also add if statements to the bash script so that you can run parts of the analysis by providing a pattern as a command-line argument. For example, to run the data generation step, call the bash script with generation in the arguments:

$ bash outdir/submit/bash_my_label.sh generation

If you want to run the analysis step with n-parallel=1, then you would use

$ bash outdir/submit/bash_my_label.sh analysis

Note that if n-parallel > 1, this will run all of the parallel jobs. To run just one, use the following (replacing par0 with the parallel job you want to run):

$ bash outdir/submit/bash_my_label.sh par0

Finally, to merge the analyses, run

$ bash outdir/submit/bash_my_label.sh merge

Internally, the bash script simply matches the given argument against the job names. This works in simple cases, but in more complicated cases it will likely fail or require inspecting the bash file itself. Moreover, if you use any of the special keywords (generation, analysis, par, or merge) in your label, the ability to filter down to single jobs will be lost.
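
To make the filtering behaviour concrete, the snippet below is a purely illustrative sketch of the kind of guard used; it is not the generated code itself, and the real script wraps the full bilby_pipe commands rather than an echo.

# Illustrative sketch only: each command in the generated bash script is
# guarded by a pattern match of the job name against the supplied argument
if [[ "my_label_data0_generation" == *"$1"* ]]; then
    echo "the data generation command would run here"
fi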

Using the slurm batch scheduler

By default, bilby_pipe runs under an HTCondor environment (the default for the IGWN grid). It can also be used on a slurm-based cluster. Here we give a brief description of the steps required to run under slurm; for a full list of available options, see the output of bilby_pipe --help.

To use slurm, add scheduler=slurm to your ini file. Typically, slurm also needs you to configure the correct software environment; you can specify this with scheduler-env=my-environment, which will add the following line to your submit scripts:

$ source activate my-environment

(Note: for conda users, this is equivalent to conda activate my-environment).

If the cluster you are using does not provide network access on the compute nodes, the data generation step may fail when it attempts to access the data remotely. (If you are creating simulated data, or have local copies of the data, this is of course not a problem.) To resolve this issue, set local-generation=True in your ini file. The generation steps will then be run on the head node when you invoke bilby_pipe, after which you simply submit the job as usual.

Slurm modules can be loaded using scheduler-modules, a space-separated list of modules to load. Additional arguments to sbatch can be given using the scheduler-args option.

Putting all this together, adding these lines to your ini file

scheduler = slurm
scheduler-args = arg1=val1 arg2=val2
scheduler-modules = git python
scheduler-env = my-environment
scheduler-analysis-time = 1-00:00:00   # Limit job to 1 day

will produce slurm submit files containing

#SBATCH --arg1=val1
#SBATCH --arg2=val2

module load git python

and individual bash scripts containing

module load git python

source activate my-environment

Summary webpage

bilby_pipe allows the user to visualise the posterior samples through a ‘summary’ webpage. This is implemented using PESummary.

To generate a summary webpage, the create-summary option must be passed in the configuration file. Additionally, you can specify a web directory where you would like the output from PESummary to be stored; by default this is placed in outdir/results_page. If you are working on an LDG cluster, then the web directory should be in your public_html. Below is an example of the additional lines to put in your configuration file to generate ‘summary’ webpages:

create-summary = True
email = albert.einstein@ligo.org
webdir = /home/albert.einstein/public_html/project

If you have already generated a webpage in the past using PESummary, then you can pass the existing-dir option to add further result files to a single webpage. This includes all histograms for each result file as well as comparison plots. Below is an example of the additional lines in the configuration file that will add to an existing webpage:

create-summary = True
email = albert.einstein@ligo.org
existing-dir = /home/albert.einstein/public_html/project

Main function

Functionally, the main command line tool calls the function bilby_pipe.main.main(), which is transcribed here:

def main():
    """ Top-level interface for bilby_pipe """
    from bilby_pipe.job_creation.dag import Dag
    args, unknown_args = parse_args(sys.argv[1:], create_parser())
    inputs = MainInput(args, unknown_args)
    # Create a Directed Acyclic Graph (DAG) of the workflow
    Dag(inputs)

As you can see, there are three steps. First, the command line arguments are parsed: the args object stores the user inputs and any defaults (see Command line interface), while unknown_args is a list of any unknown arguments.

The logic of handling the user input (in the form of the args object) is handled by the MainInput() object. Following this, the logic of generating a DAG from that user input is handled by the Dag() object.
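
If you want to drive these steps from Python rather than from the command line, the same objects can be used directly. The sketch below assumes that the names used in main() (create_parser, parse_args and MainInput) are importable from bilby_pipe.main:

# Sketch: reproduce the steps of main() interactively
from bilby_pipe.job_creation.dag import Dag
from bilby_pipe.main import MainInput, create_parser, parse_args

# Parse the ini file exactly as the command line tool would
args, unknown_args = parse_args(["my-run.ini"], create_parser())
inputs = MainInput(args, unknown_args)  # validate and store the user input
Dag(inputs)  # build the workflow DAG under outdir/submit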