Data Acquisition and Authentication
The data to be analyzed can be specified in several ways, which fall into two broad categories: a noise realization can be generated, or data can be read from a frame file.
Noise simulation
For simulated data, the user should specify either the gaussian-noise or the zero-noise argument.
In the former case, the data will be generated from a Gaussian distribution with the specified PSDs (see below).
The specific noise realization can be controlled using the generation-seed option.
In the latter case, the data are assumed to contain no noise. While this is a valid realization of the noise, it is not representative of typical noise.
For this case, the PSDs to be used can also be specified, although they do not influence the data.
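As a rough illustration of the two simulated-data modes, the sketch below draws a frequency-domain noise realization using only the Python standard library. The function name simulate_noise, the flat PSD value, and the normalization (real and imaginary parts each with variance PSD × duration / 4, a common convention for a one-sided PSD) are assumptions made for this example; this is not bilby_pipe's own code.

```python
import random


def simulate_noise(psd, duration, seed=None, zero_noise=False):
    """Return one complex noise sample per PSD bin (hypothetical helper)."""
    if zero_noise:
        # zero-noise: the data contain no noise, so the PSD does not affect them.
        return [0j for _ in psd]
    # gaussian-noise: a fixed seed reproduces the same realization.
    rng = random.Random(seed)
    samples = []
    for value in psd:
        # Assumed convention: real and imaginary parts each have
        # standard deviation sqrt(PSD * duration / 4).
        sigma = (value * duration / 4) ** 0.5
        samples.append(complex(rng.gauss(0, sigma), rng.gauss(0, sigma)))
    return samples


flat_psd = [1e-46] * 8  # illustrative flat PSD, not a real detector curve
noise_a = simulate_noise(flat_psd, duration=4, seed=1234)
noise_b = simulate_noise(flat_psd, duration=4, seed=1234)
no_noise = simulate_noise(flat_psd, duration=4, zero_noise=True)
```

Here noise_a and noise_b are identical because they share a seed, which is the role played by the generation-seed option, while no_noise is zero in every bin regardless of the PSD.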
Frame reading
When working with real data or more realistically simulated data, the data are read from frame files.
These can be in any format that can be read using gwpy.timeseries.TimeSeries.read.
There are four methods that can be used to read data from frame files. They are tried in the following order:
1. Queried from the Gravitational Wave Open Science Center (GWOSC). To use this method, the channel-dict option should be set to, e.g., {"H1": "GWOSC"}. This is the recommended method for most analyses of real data.
2. Explicitly passed using the data-dict option. In this case the data are read using gwpy.timeseries.TimeSeries.read and the channel-dict option should match a channel contained in the frame file.
3. Found using gwdatafind.find_urls when bilby_pipe is run. This method uses the channel-dict, frame-type-dict, data-find-url, and data-find-urltype options. Unless the channel being used is non-standard, e.g., contains glitch-subtracted data, the frame-type-dict, data-find-url, and data-find-urltype options can usually be left as their default values. When using this method, the transfer-files option should be set to True to make sure frames are properly copied to the working directory by HTCondor.
4. Read using gwpy.timeseries.TimeSeries.get during the data generation job. This method uses the channel-dict option and is the legacy method for finding data. If reading data with the above methods fails, this is used as the fallback option. It is not recommended to use this method unless the data are not available using the other methods.
Note
The gaussian-noise and zero-noise options supersede any option to read data from frame files.
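The precedence described above can be summarized in a short sketch. The function load_data and the stand-in readers are hypothetical and only mirror one reading of this section (simulated noise first, then each frame-reading method in turn until one succeeds); bilby_pipe handles this internally and its actual control flow may differ.

```python
def load_data(options, readers):
    """Return (source_name, data) from the first source that works.

    Hypothetical helper: `options` is a dict of option names and `readers`
    is an ordered list of (name, zero-argument callable) pairs.
    """
    # Simulated noise supersedes any option to read frame files.
    for flag in ("gaussian-noise", "zero-noise"):
        if options.get(flag):
            return flag, None
    failures = []
    for name, reader in readers:
        try:
            return name, reader()
        except Exception as error:  # broad on purpose: any failure falls through
            failures.append(f"{name}: {error}")
    raise RuntimeError("no data source succeeded: " + "; ".join(failures))


def unavailable():
    """Stand-in reader that always fails."""
    raise IOError("not available")


# Stand-in readers listed in the order given above; only the last succeeds.
readers = [
    ("GWOSC", unavailable),
    ("data-dict", unavailable),
    ("gwdatafind", unavailable),
    ("TimeSeries.get", lambda: [0.0, 0.0]),
]
source, data = load_data({}, readers)
simulated_source, _ = load_data({"zero-noise": True}, readers)
```

With these stand-ins, the first call falls through to the legacy TimeSeries.get reader, while the second returns immediately because zero-noise is set and no reader is consulted.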
PSD reading/generation
In addition to the data containing the signal, the data options also determine how the noise PSDs are specified.
When using the gaussian-noise or zero-noise options, the PSDs are either specified using the psd-dict option or taken from the default PSD for each specified detector.
When analysing time-domain data from frame files, the PSDs can be specified using the psd-dict option, or generated from data before the analysis segment, read in the same way as above.
If a PSD file is specified through the psd-dict option for any interferometer, it must be specified for all interferometers.
If the default fallback method is desired for one detector, the entry for that detector in psd-dict can be set explicitly to None. Requiring an explicit entry guards against forgetting to specify the PSD for a detector. For example, if psd-dict={'H1': '/PATH/TO/PSD/filename.txt', 'L1': None} is passed, /PATH/TO/PSD/filename.txt will be used for LIGO Hanford and the fallback method will be used for LIGO Livingston.
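The all-or-nothing rule for psd-dict can be expressed as a small check. The helper check_psd_dict below is hypothetical and written only to illustrate the rule stated above; it is not bilby_pipe's own validation code.

```python
def check_psd_dict(psd_dict, detectors):
    """Split detectors into those with a PSD file and those using the fallback.

    Hypothetical helper. Raises ValueError if psd_dict is given but has
    no entry for one of the detectors.
    """
    if not psd_dict:
        # No psd-dict at all: every detector uses the fallback method.
        return {}, list(detectors)
    missing = [det for det in detectors if det not in psd_dict]
    if missing:
        raise ValueError(f"psd-dict has no entry for: {missing}")
    # An explicit None selects the fallback for that detector.
    from_file = {
        det: psd_dict[det] for det in detectors if psd_dict[det] is not None
    }
    fallback = [det for det in detectors if psd_dict[det] is None]
    return from_file, fallback


from_file, fallback = check_psd_dict(
    {"H1": "/PATH/TO/PSD/filename.txt", "L1": None}, ["H1", "L1"]
)
```

For the example dictionary, from_file holds the Hanford path and fallback lists only L1; dropping the L1 key entirely would raise an error instead of silently using the fallback.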
Authentication
Data finding is done using the scitokens method.
We recommend that users consult this page for
additional instructions.
In order to read proprietary frame files, the user must have a valid scitoken for the detector the
data comes from.
The first time a job is submitted via HTCondor using scitoken authentication, the user should run condor_vault_storer -v "igwn" and follow the prompts to configure the credentials.
After this, it should be sufficient to create a Kerberos ticket using kinit.
More fine-grained control over the generated token can be achieved by defining the HTGETTOKENOPTS environment variable.
This is especially useful for robot authentication with the --role and --credkey options.
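As a minimal sketch, HTGETTOKENOPTS could be set from Python before launching a submission command as a child process. The role and credential key values below are placeholders invented for this example, and only the --role and --credkey options named above are used; in a shell session the same string would typically be exported directly.

```python
import os
import shlex

# Hypothetical values: substitute the role and credential key issued to you.
role = "example-role"
credkey = "example-credkey"

# Build the option string with shell-safe quoting and store it in the
# environment so that child processes started from here inherit it.
token_options = shlex.join(["--role", role, "--credkey", credkey])
os.environ["HTGETTOKENOPTS"] = token_options
```

Setting the variable this way affects only the current Python process and anything it launches, not the parent shell.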
When do I need to authenticate?
- If the data are being read from a proprietary frame file stored on, e.g., CVMFS.
- If another file being used (e.g., PSD, ROQ basis) is in a proprietary location.
- If data are queried using gwpy.timeseries.TimeSeries.get.