Running BayesWave: How-Tos
This page describes how to generate a BayesWave analysis through a few examples.
To see specific examples of how to use BayesWave for different analyses, see here
LDG = LIGO Data Grid: comprising the “usual” LIGO clusters like CIT, LHO, LLO, Nemo, Atlas, …
OSG = Open Science Grid: international network of shared compute resources, outside of direct LIGO control.
The first examples below use a canonical GW150914 analysis to demonstrate how each run option works.
Run BayesWave Using A Local Installation
WARNING: we’re about to set up an analysis of GW150914 using a MEANINGLESS MCMC configuration. The objective is to demonstrate the workflow, NOT a scientific result. Please adjust the configuration file accordingly if you desire science.
Example directory: This example lives in the repository here
This is the setup most LIGO users will be familiar with: clone a repository, build and install software, execute an analysis.
Here, we assume you have compiled and installed an appropriate branch of BayesWave and BayesWavePipe. See here for installation information.
Copy this configuration file to your working directory.
Modify paths in the `[engine]` section to point at the desired version of the BayesWave executables and libraries.
Run the pipeline to set up an analysis of a single trigger time (if you installed BayesWavePipe with e.g., –user, it should already be in your path):
```shell
bayeswave_pipe LDG-GW150914.ini \
    --trigger-time 1126259462.420000076 \
    --workdir LDG-GW150914
```
This sets up a workflow for a BayesWave analysis of a single trigger time: that of GW150914, of course.
The configuration file specifies the various BayesWave commandline options, as well as things like condor memory requests, accounting tags etc.
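As a rough sketch, the relevant parts of the configuration file might look like the following. The `[engine]` key names mirror those used by the container deployment below, but the example paths and the `[condor]` key names and values here are illustrative assumptions — check your copy of the configuration file for the exact names:

```ini
[engine]
; Point these at your own build (paths are examples)
bayeswave=/home/albert.einstein/opt/bayeswave/bin/BayesWave
bayeswave_post=/home/albert.einstein/opt/bayeswave/bin/BayesWavePost

[condor]
; Illustrative condor settings: accounting tag and memory request
accounting-group=ligo.dev.o3.burst.paramest.bayeswave
request-memory=2048
```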
For convenience, this command is provided in makework-LDG-GW150914.sh.
This sets up four condor jobs:
BayesWave: main MCMC sampling.
BayesWavePost: combine samples to compute waveform reconstructions and moments.
megaplot.py: plot waveforms, moment distributions & generate web output.
megasky.py: compute and plot posterior probability density for source sky-location (”skymap”).
Workflow files (e.g., `.dag`, `.sub`, …) are written to the directory specified by `--workdir`. That directory then contains one output directory for each BayesWave analysis time specified (in this case, one) with all the usual BayesWave analysis products, including the webpage and plots.
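Schematically, the generated DAG chains the four jobs so that post-processing and plotting wait for the sampler. The node and submit-file names below are illustrative, not necessarily the exact ones `bayeswave_pipe` writes:

```
JOB bayeswave_0      bayeswave.sub
JOB bayeswave_post_0 bayeswave_post.sub
JOB megaplot_0       megaplot.sub
JOB megasky_0        megasky.sub
PARENT bayeswave_0      CHILD bayeswave_post_0
PARENT bayeswave_post_0 CHILD megaplot_0 megasky_0
```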
That’s it! To start the analysis, simply follow the on-screen prompt:
To submit:

```shell
cd LDG-GW150914
condor_submit_dag bayeswave_LDG-GW150914.dag
```
Run BayesWave On The OSG
Example directory: This example lives in the repository here
In this example we compute the signal evidence for 100 CWB time-slide background triggers, read from a CWB trigger file. Again, the job configuration is designed to result in minimal run times; the results should not be considered scientifically valid.
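The trigger-list format is not reproduced here; assuming it is a plain-text file with one trigger time per line, you can sanity-check how many analysis jobs you are about to create with something like the following (the file name and contents are toy examples):

```shell
# Write a toy trigger list (format assumed: one GPS time per line)
cat > toy_cwb_triggers.dat <<EOF
1126259462.4
1126259500.1
1126259600.7
EOF

# Count the triggers, i.e. the number of analysis jobs the pipeline would set up
n_triggers=$(grep -c . toy_cwb_triggers.dat)
echo "${n_triggers} triggers"
```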
The Open Science Grid (OSG) offers a multitude of additional resources which are ideal for offline injection and background analyses. BayesWave’s OSG deployment relies on singularity containers. Briefly:
Container image: a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings.
Container: instantiation of an image.
Docker: popular software for creating and running containers.
Singularity: slightly less popular software for creating and running containers but favored by admins of scientific clusters for security reasons.
Registry: a service which manages images. Sort of like a repository.
CVMFS: a scalable, reliable and low-maintenance software distribution service (the `/cvmfs` directory hosts singularity images, software and, in some sense, frame data).
From the user-perspective, the procedure for running from a container is nearly identical to above (minus installing anything), we just add the path to the container in the configuration file and point at the correct executables:
```shell
bayeswave_pipe \
    --workdir O2background \
    --cwb-trigger-list 100_cwb_triggers.dat \
    --osg-jobs \
    --glide-in \
    --skip-post \
    --skip-megapy
```
The `[engine]` section of the configuration file now contains the path to the desired container image. BayesWave and all post-processing codes are baked into the container in `/opt/bayeswave`. To use the bayeswave executables in the container, the `[engine]` section must read:
```ini
bayeswave=/opt/bayeswave/bin/BayesWave
bayeswave_post=/opt/bayeswave/bin/BayesWavePost
megaplot=/opt/bayeswave/postprocess/megaplot.py
megasky=/opt/bayeswave/postprocess/skymap/megasky.py
postprocess=/opt/bayeswave/postprocess
utils=/opt/bayeswave/utils
```
Other features to note
Python code is also installed to `/opt/bayeswave` in the container (contrast with the `src` location when running from your own build).
The container can see your `/home`: you are free to point to your own versions of the bayeswave executables for e.g., code development.
No bayeswave installation required (You do still need BayesWavePipe, though)
All dependencies are baked into the image
You are guaranteed to find exactly the same image on all clusters (with CVMFS) when you use that image path: no need to maintain multiple BayesWave installations at different sites!
An important point for power users who may wish to reproduce, at the commandline, the exact command a condor job runs: singularity must be executed with the `--bind` option so that we can write to our `/home` and access frame data. To run a singularity job which reads frames at CIT (which live in `/hdfs`), you need to run:
```shell
singularity exec \
    --writable \
    --bind /hdfs \
    /cvmfs/ligo-containers.opensciencegrid.org/lscsoft/bayeswave:master \
    /opt/bayeswave/bin/BayesWave "$@"
```
There are a host of practical differences and additional options available which the general user might not care about and which are handled by the pipeline:
OSG workflows require file transfers: input files must be transferred with the jobs and the job output must be shipped back to the submission site. This is handled by submission file directives like `should_transfer_files`, which are set up by BayesWavePipe.
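For reference, the relevant submission-file directives look something like the following; the input-file list is a placeholder, and the exact values BayesWavePipe writes may differ:

```
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = <your configuration and trigger files>
```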
Frame data is distributed using the CernVM file system (CVMFS). Consequently, the datafind command must specify a specific server (`datafind.ligo.org:443`) which returns frame locations in CVMFS, which are then common to all sites. This removes the need for data discovery at specific sites and we don’t have to deal with Pegasus. This server is used whenever
At some OSG and LDG sites (e.g., CIT), the CVMFS directories for frames are really symlinks. The underlying parent directory for CVMFS frame data must be bound into the singularity container. That is, the image must contain directories like `/hadoop`, and more as we get more sites.
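If you are constructing such a job by hand, the extra mounts show up either as additional `--bind` arguments to `singularity exec` (e.g. `--bind /hadoop`) or, on OSG, via submit-file attributes along these lines. Attribute names vary by site and glidein setup, so treat this fragment as an illustration rather than a recipe:

```
+SingularityImage     = "/cvmfs/ligo-containers.opensciencegrid.org/lscsoft/bayeswave:master"
+SingularityBindCVMFS = True
```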
Parts of our CVMFS-based container images contain the `@` symbol. Singularity versions v2.2 and earlier cannot handle this symbol. The submission file contains a `regexp` requirement which ensures the `OSG_SINGULARITY_VERSION` attribute is later than 2.2.
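A requirement of this kind might look roughly as follows. This is only an illustration of the ClassAd `regexp` function excluding versions 2.0–2.2; the exact expression in the generated submit files may differ:

```
Requirements = (HAS_SINGULARITY =?= TRUE) && \
               !regexp("^2\.[0-2]", OSG_SINGULARITY_VERSION)
```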