# How to add a new Data Quality Product

There are two ways to get a new data quality product incorporated into the DQR.

We provide tutorials for both below. For the sake of clarity, we will assume your new product is called your awesome product throughout.

## Adding a new check that will be run within the DQR’s DAG

If your new data quality product does not have specialized dependencies, it may be straightforward to include it within the DQR’s DAG. This has the advantage that you, the developer, do not need to manage or monitor persistent processes to ensure your results are included in the DQR. To do this, follow the steps below.

### Write an executable

Your follow-up should be encapsulated in a single executable. You can write this in any language you wish, but it must be in the DQR’s PATH and therefore discoverable when submitted to Condor as part of the DAG. You can find several examples within the DQR repository itself, such as omegascan.

We also require your executable to always post a report to GraceDb. This means you must properly catch all errors and post an associated error report. Specifications for the content of the reports can be found here: Technical Design.

The DQR provides convenient Python formatting libraries that comply with the required format. Here is an example, which would be contained in an executable called your-awesome-product-dqr, showing how they can be used, including catching and reporting errors:

>> from ligo.gracedb.rest import GraceDb
>> import sys
>> import json
>> from dqr import json as dqrjson
>>
>> import yourAwesomeLibrary ### import what you need to compute your data quality product
>>
>> __process_name__ = 'your awesome product'
>> __author__ = 'your name (your.name@ligo.org)'
>>
>> graceid = sys.argv[1] ### the GraceDb ID number for an interesting event
>> option1 = sys.argv[2] ### some option associated with your library
>>
>> try:
>>     ### your code returns a pass/fail/human_input_needed state
>>     ### if it needs to post images/files to GraceDb,
>>     ### that should be done within the delegation
>>     state = yourAwesomeLibrary.do_some_thing(graceid, option1)
>>
>>     ### format the report for the DQR
>>     report = dqrjson.format_report(
>>         state,
>>         __process_name__,
>>         __author__,
>>     )
>>
>> except Exception:
>>     import traceback
>>
>>     ### format an error report for the DQR
>>     report = dqrjson.format_failure(
>>         __process_name__,
>>         __author__,
>>         traceback_string=traceback.format_exc(),
>>     )
>>
>> finally:
>>     ### actually upload the report to the DQR
>>     ### do this with the GraceDb REST interface
>>     reportpath = '%s-%s.json'%(__process_name__.replace(' ',''), graceid)
>>     with open(reportpath, 'w') as file_obj:
>>         json.dump(report, file_obj)
>>
>>     gdbconn = GraceDb() ### connect to the default GraceDb server
>>     gdbconn.writeLog(
>>         graceid,
>>         __process_name__+' report',
>>         filename=reportpath,
>>         tagname=[__process_name__],
>>     )


Please note: the convention for naming JSON reports is to remove all spaces and append the GraceID, as is done above. If you do not follow this format, the DQR will not be able to discover your JSON report in GraceDb. The look-up is based on the task name as defined in dqr.condor (see below), and it is the developers’ responsibility to make sure the strings used in conditionals defined within dqr.condor match the naming convention of their JSON reports.
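The naming convention can be captured in a couple of lines. Here is a minimal sketch (the helper name `report_filename` is illustrative, not part of the DQR API):

```python
def report_filename(task, graceid):
    """return the JSON report filename the DQR will look for:
    the task name with all spaces removed, a hyphen, and the GraceID"""
    return '%s-%s.json' % (task.replace(' ', ''), graceid)

print(report_filename('your awesome product', 'G123456'))
# -> yourawesomeproduct-G123456.json
```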

Additionally: please always include a link to your product’s documentation, as is done above. You should write basic documentation for your task within the Sphinx docs for this repository. Instructions can be found in the detailed guide How to contribute to the DQR. When adding a new task, you should create a file specifically for that task and include the following sections

• What does this task do?

• What are its return states?

• How was it reviewed?

• How should results be interpreted?

• What INI options, config files are required?

• Are there any derived tasks that are based on this one?

### Modify the DAG generation

In order for the DQR to schedule your follow-up task, it must know how to generate an associated Condor SUB file for it. This is managed within dqr.condor and relies on knowledge of your product’s name. Again for concreteness, we assume your project is called your awesome product.

#### Write a sub-generation helper function

You need to tell the DQR how to write a Condor SUB file for your new product. This is done via a function with a standard signature. You can specify as many options as you like through the dqr.ini config file, but for now we will assume there is a single option like we’ve written in the executable above.

Below, you’ll find an example function for your awesome product. You may need to modify it to fit your specific needs. We’ve called out only the required args and kwargs; this example retrieves the single option from **kwargs, but that can of course be changed as needed:

>> def sub_your_awesome_product(
>>         graceid,
>>         gps,
>>         output_dir,
>>         output_url,
>>         gracedb_url=__default_gracedb_url__, ### not strictly required because it can be absorbed by **kwargs, but without **kwargs this is needed
>>         verbose=False,                       ### not strictly required because it can be absorbed by **kwargs, but without **kwargs this is needed
>>         email_upon_error=None,
>>         **kwargs,
>>     ):
>>     """
>>     write subfile for "your awesome product"
>>     return path/to/file.sub
>>     """
>>     option1 = kwargs.get('option1', 'default')                                                         ### retrieve the one option for your_awesome_product
>>
>>     condor_classads = {}                                                                               ### start from an empty set of classads
>>     condor_classads['executable'] = which('your_awesome_product')                                      ### find the full path to your executable
>>     condor_classads['log']    = os.path.join(output_dir, 'condor-your_awesome_product-%s.log'%graceid) ### set up Condor's output
>>     condor_classads['output'] = 'condor-your_awesome_product-%s.out'%graceid
>>     condor_classads['error']  = 'condor-your_awesome_product-%s.error'%graceid
>>     condor_classads['arguments'] = '%s %s'%(graceid, option1)                                          ### set up the arguments for your script
>>
>>     if email_upon_error:
>>         condor_classads['notify_user'] = email_upon_error                                              ### tell Condor to send email upon error if requested
>>         condor_classads['notification'] = 'Error'
>>
>>     path = os.path.join(output_dir, 'your_awesome_product.sub')                                        ### actually write the file
>>     with open(path, 'w') as file_obj:
>>         file_obj.write('\n'.join(' = '.join(item) for item in condor_classads.items())+'\nqueue 1')
>>     return path


If you look around dqr.condor, you’ll find more complicated examples that accept condor_classads as kwargs. Mimicking that structure will allow you to inherit Condor job specifications from the DEFAULT section of dqr.ini, but is not strictly necessary.
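That inheritance pattern amounts to starting from the defaults and letting job-specific classads win. A hedged sketch (not the actual dqr.condor code; all names here are illustrative):

```python
### e.g. classads parsed from the DEFAULT section of dqr.ini (illustrative values)
DEFAULT_CLASSADS = {
    'universe': 'vanilla',
    'getenv': 'True',
}

def build_classads(overrides, defaults=DEFAULT_CLASSADS):
    """merge job-specific classads on top of inherited defaults"""
    classads = dict(defaults)   ### copy so the shared defaults are never mutated
    classads.update(overrides)  ### job-specific classads take precedence
    return classads

ads = build_classads({'executable': '/path/to/your_awesome_product'})
print(ads['universe'])    # -> vanilla (inherited)
print(ads['executable'])  # -> /path/to/your_awesome_product (job-specific)
```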

As a concrete example, this:

>> path = sub_your_awesome_product('G123456', 8675309.0, '.', '.')
>> print(path)
>> ./your_awesome_product.sub


should write the following into the SUB file:

executable = /full/path/to/your_awesome_product
log = ./condor-your_awesome_product-G123456.log
output = condor-your_awesome_product-G123456.out
error = condor-your_awesome_product-G123456.error
arguments = G123456 default
queue 1
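The SUB text above is just the classad dictionary serialized one `key = value` pair per line. The write loop from the helper can be exercised standalone (the classad values here are illustrative):

```python
### illustrative classads matching the expected SUB file above
condor_classads = {
    'executable': '/full/path/to/your_awesome_product',
    'log': './condor-your_awesome_product-G123456.log',
    'output': 'condor-your_awesome_product-G123456.out',
    'error': 'condor-your_awesome_product-G123456.error',
    'arguments': 'G123456 default',
}

### dicts preserve insertion order in Python 3.7+, so lines come out in this order
sub_text = '\n'.join(' = '.join(item) for item in condor_classads.items()) + '\nqueue 1'
print(sub_text)
```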


#### Modify the routing function

The DQR listener looks up which function to use when building a SUB file within dqr.condor.sub(). Therefore, you must add a conditional statement associating your product’s name with the helper function you just wrote. As an example, you should modify dqr.condor.sub() to look something like this:

>> def sub(task):
>>     """..."""
>>     if task == 'some name':
>>         return sub_some_name
>>
>>     elif task == 'another name':
>>         return sub_another_name
>>
>>     elif task == 'your awesome product':
>>         return sub_your_awesome_product
>>
>>     else:
>>         raise ValueError('task=%s not understood'%task)

Now, when dqr.condor.sub() is called, it will return your helper function.

Please note: the DQR expects the JSON report generated by a task to be based on the task string used in this routing function. For example, the task called “another name” should create a file for GraceId=G1234 called “anothername-G1234.json”.

#### Add a section to your dqr.ini

Now that you’ve told the DQR how to run your script, you need to configure it so that it actually will run your script. This is done with sections in dqr.ini. Within your copy of dqr.ini, create a new section that looks like the following:

[your awesome product]
# tell the DQR to include this as part of the DAG it generates
include_in_dag = True

# which latency tier to include this check in
tier = 0

# which high-level question this product addresses
question = A high level question

# configure which states your product is allowed to return
allow = human_input_needed pass

# configure toggles as a space-delimited list if desired
toggles = H1 L1 V1

# options specific to your product
option1 = "the option for your script"


While tier is only required to be an integer and question is only required to be a string, it is likely that you’ll want to re-use values that are already present in other sections. Please check and see whether any of those fit your product and only specify a new tier or question if absolutely necessary. In either case, you’ll need to tell your reviewers which tier and question you’ve chosen.

Please note: the section name must exactly match the string used within dqr.condor, otherwise your task will not be discoverable.
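Because dqr.ini is standard INI, you can sanity-check the match between your section name and the task string with only the standard library. A quick sketch (section contents taken from the example above):

```python
import configparser

### the example section from above, embedded as a string for illustration
ini_text = """
[your awesome product]
include_in_dag = True
tier = 0
question = A high level question
allow = human_input_needed pass
toggles = H1 L1 V1
option1 = "the option for your script"
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

task = 'your awesome product'  ### must exactly match the string used in dqr.condor
print(config.has_section(task))                 # -> True
print(config.getboolean(task, 'include_in_dag'))  # -> True
print(config.get(task, 'allow').split())          # -> ['human_input_needed', 'pass']
```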

You now need to test your changes. We provide a more complete tutorial on how to test your code and the DQR in general (Testing a Technical Solution), but as a starting point you should try to install the code via:

>> pip install . --prefix=path/to/install/opt


If you’re running on a cluster and want to use default system packages instead of installing your own copies of the dependencies, you can instead do:

>> pip install . --prefix=path/to/install/opt --no-deps


See How to Install and Run the DQR in production for more details. Your review committee will almost certainly require you to demonstrate functionality using the tools outlined in Testing a Technical Solution, which goes beyond demonstrating that the code still installs.

### Initiate a merge request to get your changes reviewed and included

Once you’ve tested your code, you should merge it into the production repository. The DQR uses a fork-and-pull development model, and the full procedure is described in more detail here (How to contribute to the DQR).

When creating the merge request, you should assign it to one of your reviewers. They will look over the changes, try to reproduce your tests, and iterate with you until the code is ready to be deployed.

## Adding a new check that will not be run within the DQR’s DAG

We strongly encourage incorporating your new product within the DQR’s DAG (Adding a new check that will be run within the DQR’s DAG). However, this may not be practical in several situations, such as

• the latency requirement for your product is tighter than the DQR supports. This is the case for extremely low-latency checks performed by gwcelery, which will be managed outside the DQR’s DAG but still report their results to GraceDb in a DQR-compatible format.

• the data needed for your product is not available at CIT, such as Virgo auxiliary channel information.

In this case, you will need to manage your own follow-up scheduler in addition to changing a few things in the DQR repo.

### Write an executable

You will still need to write an executable, just like before (Write an executable). The requirements are the same as before: your executable must always post a properly formatted report to GraceDb but can be written in any language you wish.

### Set up and manage your own follow-up scheduler

Once you’ve written your executable, you’ll need to manage a follow-up scheduler. Instructions for interacting with GraceDb’s REST interface and LVAlert, as well as a tutorial, can be found within the GraceDb docs.
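Whatever plumbing you choose (LVAlert callbacks or REST polling), your scheduler needs to track which events it has already followed up so each event is processed exactly once. A minimal sketch of that bookkeeping, with the GraceDb plumbing omitted and all names illustrative:

```python
def process_new_events(event_ids, seen, follow_up):
    """run follow_up(graceid) once per previously unseen event

    event_ids : iterable of GraceDb IDs discovered this iteration
    seen      : set of IDs already processed (persist this between iterations)
    follow_up : callable that launches your executable for one event
    """
    for graceid in event_ids:
        if graceid not in seen:
            seen.add(graceid)
            follow_up(graceid)

### illustrative usage: the second batch re-delivers G2, which is skipped
processed = []
seen = set()
process_new_events(['G1', 'G2'], seen, processed.append)
process_new_events(['G2', 'G3'], seen, processed.append)
print(processed)  # -> ['G1', 'G2', 'G3']
```

In a real deployment, `follow_up` would spawn your executable (e.g. via subprocess) and `seen` would be persisted to disk so the scheduler survives restarts.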

You will also need to add a section to dqr.ini, similar to what is described above (Add a section to your dqr.ini). However, because your product is managed outside the DQR, you will only need to specify the tier and question options.

Within your copy of dqr.ini, create a new section that looks like the following:

[your awesome product]
# tell the DQR to not include this as part of the DAG it generates
include_in_dag = False

# which latency tier to include this check in
tier = 0

# which high-level question this product addresses
question = A high level question

allow = pass fail human_input_needed

# toggles (if desired)
toggles = H1


You now need to test your changes. We provide a more complete tutorial on how to test your code and the DQR in general (Testing a Technical Solution). Some of the tools described therein will be useful for testing your follow-up scheduler along with the DQR’s scheduler, and you are encouraged to use them.

Your reviewers will set the exact testing requirements for your product to be incorporated in the production DQR.

### Initiate a merge request to get your changes reviewed and included

Once you’ve tested your code, you should merge it into the production repository. The DQR uses a fork-and-pull development model, and the full procedure is described in more detail here (How to contribute to the DQR). Please be sure to include

• the name of your new product.

• the permissions awarded to the product (pass, fail, and/or human_input_needed).

• how the product will be managed (within DQR’s DAG or external to it).

• if it will be run within the DQR’s DAG, please clearly enumerate all dependencies (e.g.: gwpy >= 0.12.0)

• if it will run outside the DQR’s DAG, please specify where it will be run and how those processes will be monitored

• how you’ve tested the new code.

• how you’ve tested the new product for efficacy.

When you open the merge request, be sure to assign it to one of your reviewers. If git.ligo.org will not allow you to assign the merge request, please tag @reed.essick (or one of the other maintainers) and they’ll make sure your request is processed. You should also specify that your new product will be run outside of the DQR’s DAG and provide references to where your own follow-up scheduler lives and how it will be run. While not conceptually more difficult, this will require more monitoring and oversight than incorporating your product directly within the DQR’s DAG.