Running BayesWave: How-Tos

This page describes how to generate a BayesWave analysis through a few examples.

To see specific examples of how to use BayesWave for different analyses, see here

Note:

  • LDG=LIGO Data Grid: comprising the “usual” LIGO clusters like CIT, LHO, LLO, Nemo, Atlas, …

  • OSG=Open Science Grid: international network of shared compute resources, outside of direct LIGO control.

The first examples below use a canonical GW150914 analysis to demonstrate how each run option works.

Table of contents

  1. Run BayesWave From A Local Installation on a LIGO cluster like CIT

  2. Run BayesWave On The OSG

Run BayesWave Using A Local Installation

WARNING: we’re about to set up an analysis of GW150914 using a MEANINGLESS MCMC configuration. The objective is to demonstrate the workflow, NOT a scientific result. Please adjust the configuration file accordingly if you desire science.

Example directory: This example lives in the repository here

This is the setup most LIGO users will be familiar with: clone a repository, build and install software, execute an analysis.

Here, we assume you have compiled and installed an appropriate branch of BayesWave and BayesWavePipe. See here for installation information.

  1. Copy this configuration file to your working directory.

  2. Modify paths in the [engine] section to point at the desired version of the BayesWave executables and libraries.

  3. Run the pipeline to set up an analysis of a single trigger time (if you installed BayesWavePipe with e.g., --user, it should already be in your path):

bayeswave_pipe LDG-GW150914.ini \
   --trigger-time 1126259462.420000076 \
   --workdir LDG-GW150914
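Before running the pipeline, it can help to confirm that the paths edited in step 2 actually exist. The following is a minimal sketch, not part of BayesWavePipe: the helper name check_engine_paths is hypothetical, and it assumes the [engine] section uses simple key=value lines such as bayeswave=/path/to/BayesWave.

```shell
# Hypothetical helper (not part of BayesWavePipe): for every key=value entry
# in the [engine] section of an ini file whose value is an absolute path,
# report whether that path exists on this host.
check_engine_paths() {
    ini="$1"
    # awk tracks whether the current line is inside [engine] and prints
    # "key value" for path-like entries found in that section only.
    awk -F= '
        /^[ \t]*[#;]/ { next }
        /^\[/ { in_engine = ($0 ~ /^\[engine\]/); next }
        in_engine && NF >= 2 {
            key = $1; val = $2
            gsub(/^[ \t"]+|[ \t"]+$/, "", key)
            gsub(/^[ \t"]+|[ \t"]+$/, "", val)
            if (val ~ /^\//) print key, val
        }
    ' "$ini" | while read -r key path; do
        if [ -e "$path" ]; then
            echo "ok      $key -> $path"
        else
            echo "MISSING $key -> $path"
        fi
    done
}
```

A typical call would be check_engine_paths LDG-GW150914.ini; any line starting with MISSING points at an entry that still needs editing.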

This sets up a workflow for a BayesWave analysis of a single trigger time: that of GW150914, of course.

The configuration file specifies the various BayesWave commandline options, as well as things like condor memory requests, accounting tags etc.

For convenience, this command is provided in makework-LDG-GW150914.sh.

This sets up four condor jobs:

  1. BayesWave: main MCMC sampling.

  2. BayesWavePost: combine samples to compute waveform reconstructions and moments.

  3. megaplot.py: plot waveforms, moment distributions & generate web output.

  4. megasky.py: compute and plot posterior probability density for source sky-location (“skymap”).

Workflow files (e.g., .dag, .sub, …) are written to the directory specified with --workdir. That then contains a single output directory for each BayesWave analysis time specified (in this case 1) with all the usual BayesWave analysis products, including webpage and plots.
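As a quick sanity check once bayeswave_pipe has finished, the workflow files can be listed from the shell. This is only a sketch: the function name list_workflow_files is hypothetical, and it assumes the .dag and .sub files sit at the top level of the work directory.

```shell
# Hypothetical helper: list the .dag and .sub files found directly inside
# a work directory (subdirectories are not searched), in sorted order.
list_workflow_files() {
    workdir="$1"
    find "$workdir" -maxdepth 1 -type f \( -name '*.dag' -o -name '*.sub' \) | sort
}
```

For the example above this would be called as list_workflow_files LDG-GW150914. An empty listing suggests the setup step did not complete.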

That’s it! To start the analysis, simply follow the on-screen prompt:

    To submit:
        cd LDG-GW150914
        condor_submit_dag bayeswave_LDG-GW150914.dag

Run BayesWave On The OSG

Example directory: This example lives in the repository here

In this example we compute the signal evidence for 100 CWB time-slide background triggers, read from a CWB trigger file. Again, the job configuration is designed for minimal run times, and the results should not be considered scientifically valid.

The Open Science Grid (OSG) offers a multitude of additional resources which are ideal for offline injection and background analyses. BayesWave’s OSG deployment relies on singularity containers. Briefly:

  • Container image: a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings.

  • Container: instantiation of an image.

  • Docker: popular software for creating and running containers.

  • Singularity: slightly less popular software for creating and running containers but favored by admins of scientific clusters for security reasons.

  • Registry: a service which manages images. Sort of like a repository.

  • CVMFS: a scalable, reliable and low-maintenance software distribution service (The /cvmfs directory hosts singularity images, software and, in some sense, frame data).

From the user's perspective, the procedure for running from a container is nearly identical to the one above, minus the installation step: we just add the path to the container in the configuration file and point at the correct executables:

bayeswave_pipe \
   --workdir O2background \
   --cwb-trigger-list 100_cwb_triggers.dat \
   --osg-jobs \
   --glide-in \
   --skip-post \
   --skip-megapy

where the [engine] section of LDG-GW150914-singularity.ini now contains the path to the desired singularity image:

singularity="/cvmfs/ligo-containers.opensciencegrid.org/lscsoft/bayeswave:master"
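The image path lives under /cvmfs, so it is only usable on hosts that mount CVMFS. A minimal pre-flight sketch follows; the function name image_visible is hypothetical, and the check only tests that the path exists, not that the image is usable.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given image path
# exists on this host, and fails otherwise.
image_visible() {
    [ -e "$1" ]
}

image="/cvmfs/ligo-containers.opensciencegrid.org/lscsoft/bayeswave:master"
if image_visible "$image"; then
    echo "image found: $image"
else
    echo "image not visible on this host: $image"
fi
```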

Important note: BayesWave and all post-processing codes are baked into the container in /opt/bayeswave. To use the bayeswave executables in the container, the [engine] section must read:

bayeswave=/opt/bayeswave/bin/BayesWave
bayeswave_post=/opt/bayeswave/bin/BayesWavePost
megaplot=/opt/bayeswave/postprocess/megaplot.py
megasky=/opt/bayeswave/postprocess/skymap/megasky.py
postprocess=/opt/bayeswave/postprocess
utils=/opt/bayeswave/utils

Other features to note

  • Python code is also installed to /opt/bayeswave in the container (contrast with the src location when running from your own build)

  • The container can see your /home: you are free to point to your own versions of the bayeswave executables for e.g., code development.

  • No bayeswave installation required (You do still need BayesWavePipe, though)

  • All dependencies are baked into the image

  • You are guaranteed to find exactly the same image on all clusters (with CVMFS) when you use that image path: no need to maintain multiple BayesWave installations at different sites!

Power Users

An important point for power users who wish to reproduce, at the command line, the exact command a condor job runs: singularity must be executed with the --writable and --bind options so that the job can write to your /home and access frame data. To run a singularity job which reads frames at CIT (which live in /hdfs), run:

singularity exec \
    --writable \
    --bind /hdfs \
    /cvmfs/ligo-containers.opensciencegrid.org/lscsoft/bayeswave:master \
    /opt/bayeswave/bin/BayesWave "$@"
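When frame data lives in different directories at different sites, the --bind argument can be assembled from whichever directories exist on the current host. The sketch below is an illustration only: the helper existing_dirs_csv is hypothetical, and the candidate directories (/hdfs, /cvmfs, /hadoop) are taken from the examples on this page rather than from any definitive list. It relies on singularity accepting a comma-separated list of paths for --bind.

```shell
# Hypothetical helper: print a comma-separated list of those arguments
# that are existing directories on this host (empty if none exist).
existing_dirs_csv() {
    csv=""
    for d in "$@"; do
        if [ -d "$d" ]; then
            # Prepend a comma only when the list is already non-empty.
            csv="${csv:+$csv,}$d"
        fi
    done
    echo "$csv"
}

binds=$(existing_dirs_csv /hdfs /cvmfs /hadoop)
echo "bind list for this host: $binds"
```

The resulting string can then be passed as the value of --bind in the command above, in place of the single /hdfs entry.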

Power Users

There are a host of practical differences and additional options available which the general user might not care about and which are handled by the pipeline:

  • OSG workflows require file transfers: input files must be transferred with the jobs and the job output must be shipped back to the submission site. This is handled by submission file directives like should_transfer_files, which are set up by BayesWavePipe.

  • Frame data is distributed using the CernVM file system (CVMFS). Consequently, the datafind command must point at a particular server (datafind.ligo.org:443) which returns frame locations in CVMFS, which are then common to all sites. This removes the need for data discovery at specific sites, and we don’t have to deal with Pegasus. This server is used whenever --osg-jobs is passed.

  • At some OSG and LDG sites (e.g., CIT), the CVMFS directories for frames are really symlinks. The underlying parent directory for CVMFS frame data must be bound into the singularity container. That is, the image must contain directories like /cvmfs, /hdfs, /hadoop and more as we get more sites.

  • Parts of our CVMFS-based container images contain the @ symbol. Singularity v2.2 and earlier cannot handle this symbol. The submission file contains a regexp requirement which ensures the OSG_SINGULARITY_VERSION attribute is later than 2.2.
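The version requirement in the last bullet amounts to a "strictly later than 2.2" comparison. The pipeline expresses this as a regexp in the submission file; the sketch below is not that regexp but a comparable check written in shell, with a hypothetical function name version_gt. It assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper: succeeds if version $1 is strictly later than $2.
version_gt() {
    # Equal versions are rejected first; otherwise $1 must sort last
    # under version ordering.
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: classify a few version strings against the 2.2 threshold.
for v in 2.2 2.2.1 2.4; do
    if version_gt "$v" 2.2; then
        echo "$v: new enough"
    else
        echo "$v: too old"
    fi
done
```

Version sort is used rather than plain string comparison so that a version such as 2.10 is correctly treated as later than 2.2.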