Usage
This library provides a high-level API on top of htcondor.dags, with various helper classes to define jobs (nodes), job groupings (layers), and the parent-child relations between them, described as a DAG.
The node and layer terminology is borrowed from htcondor.dags. Extra information about htcondor.dags can be found in its documentation.
The general procedure for creating a DAG is as follows:

- Create a DAG.
- For each grouping of nodes, create a Layer. All jobs corresponding to this layer share the same submit description.
- Create a Node with job arguments, inputs, and outputs and attach it to the layer, one for each distinct job.
- Attach the Layer to the DAG.
- Repeat with every grouping of nodes, adding parent layers first.
- Write the DAG to disk.
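Putting these steps together, a minimal sketch of the whole procedure might look like the following; the executable name, submit descriptors, and file names are placeholders taken from the examples worked through below:

from ezdag import Argument, DAG, Layer, Node, Option

dag = DAG("workflow")

# one layer per group of jobs sharing a submit description
layer = Layer("process_bins", submit_description={"request_cpus": 1})

# one node per distinct job, parameterized by arguments, inputs, and outputs
for i in range(3):
    layer.append(Node(
        arguments=[Argument("job-index", i)],
        inputs=Option("input", "data.txt"),
        outputs=Argument("output", f"output_{i}.txt"),
    ))

# attach parent layers before child layers, then write the DAG to disk
dag.attach(layer)
dag.write()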
Setting up the DAG
from ezdag import Argument, DAG, Option, Layer, Node
dag = DAG("workflow")
A DAG represents the workflow and captures all parent-child relations between nodes. Adding layers to the DAG tracks inputs and outputs and automatically determines the parent-child relations between them. It is important that parent layers are added to the DAG before child layers so the linking can be done correctly.
The DAG extends from htcondor.dags.DAG, so all keyword arguments for initialization and methods can be used in the same way here.
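For example, a keyword argument accepted by htcondor.dags.DAG could be passed through at construction. This is a sketch only; max_jobs_by_category and the category name are assumptions about the underlying htcondor.dags API, not part of ezdag itself:

from ezdag import DAG

# assumed: extra keyword arguments are forwarded to htcondor.dags.DAG
dag = DAG("workflow", max_jobs_by_category={"analysis": 10})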
Defining groups of nodes
Submit descriptors
First, we define submit descriptors for the jobs in question:
description = {
"environment": {
"OMP_NUM_THREADS": 1,
},
"requirements": [
"HAS_CVMFS_oasis_opensciencegrid_org=TRUE",
],
"request_cpus": 1,
"request_memory": 2000
}
This captures the job's requirements in terms of environment variables, resource allocation, etc. Note that submit description values can take standard Python types, and the formatting will be handled automatically when writing the submit file to disk, e.g. passing in a list of getenv variables will format them to be comma separated.
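For instance (the environment variable names are placeholders, and the rendered line shows the expected formatting):

description["getenv"] = ["HOME", "USER"]
# expected to be rendered in the submit file as: getenv = HOME,USER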
Also note that this captures a subset of the submit descriptors needed for the jobs. Descriptors such as the universe, executable, arguments, and all descriptors related to logging and file transfer are added by the layer itself.
The layer (job groupings)
Then, we define the layer itself with the executable and the submit description:
layer = Layer("process_bins", submit_description=description)
By default, the layer name is taken from the executable name. In addition, the path to the executable is resolved at DAG generation through $PATH. Both of these can be customized by providing a different layer name and a valid path to the executable in question, respectively:
layer = Layer(
"/path/to/process_bins",
name="my_super_cool_process",
submit_description=description
)
Also by default, Condor file transfer is enabled. Any files provided in job inputs and outputs (seen later) will be resolved and the relevant submit descriptors will be added to the submit file. To disable this behavior, you can set transfer_files to False.
Other relevant parameters to Layer include:

- universe: Set the execution environment for the jobs.
- retries: Number of retries given for jobs.
- log_dir: Set the directory where job logs are written to.
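As a sketch, these can be passed alongside the submit description when constructing the layer; the values below (vanilla universe, two retries, a logs directory) are illustrative assumptions rather than defaults:

layer = Layer(
    "process_bins",
    submit_description=description,
    universe="vanilla",    # execution environment for the jobs
    retries=2,             # retry each failed job up to 2 times
    log_dir="logs",        # write job logs under ./logs
    transfer_files=False,  # disable Condor file transfer for this layer
)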
The nodes (jobs)
Nodes define the individual job arguments, as well as any file inputs and outputs for file transfer and for determining parent-child relationships between jobs.
for i in range(3):
    node = Node(
        arguments=[
            Argument("job-index", i),                       # {i}
            Option("verbose"),                              # --verbose
            Option("bins", [3 * j + i for j in range(3)]),  # --bins {i} --bins {3 + i} --bins {6 + i}
            "--num-cores", 1,                               # --num-cores 1
        ],
        inputs=Option("input", "data.txt"),                 # --input data.txt
        outputs=Argument("output", f"output_{i}.txt"),      # output_{i}.txt
    )
    layer.append(node)
To aid in generating job arguments, we also provide two helper classes, Argument and Option, which provide positional arguments and options, respectively. Some examples of the output they provide to the job's arguments are shown in the comments on the right. Both of these are used to parameterize job arguments so they can be changed on a per-job level for a given layer without having to do this manually, as in htcondor.dags.
In addition to these, non-parameterized arguments can be provided as primitive types, which will be passed directly to arguments in the submit description.
Inputs and outputs track which files the job requires and provides, respectively, and are used to determine job dependencies. Any path or URL supported by Condor is accepted and will be modified accordingly so that Condor file transfer works as expected.
Finally, nodes can take meta-parameters (variables) which can be referred to within the submit description. These can be useful, for example, when parameterizing log filenames.
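A hedged sketch, assuming variables is passed to Node as a mapping and that meta-parameters are referenced with standard HTCondor $(...) macro syntax in the submit description:

node = Node(
    arguments=[Argument("job-index", 0)],
    variables={"tag": "bin_0"},  # assumed: exposed to the submit description as $(tag)
)

# hypothetical submit description entries parameterizing log filenames
description = {
    "output": "logs/process-$(tag).out",
    "error": "logs/process-$(tag).err",
}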
Defining node parameters
As mentioned above, Argument and Option are helpers to generate job arguments, providing positional arguments and options, respectively. Both take a name, which defines the parameter name, as well as the value to be provided to the job itself. For arguments, only the value is used, while for options, the name is also used to define the flag name. In both cases, the value can be any primitive type or a list of such.
Some examples:
Argument("factor", 4.1)
→4.1
Argument("files", ["file1.txt", "file2.txt"])
→file1.txt file2.txt
Option("verbose")
→--verbose
Option("num-jobs", 2)
→--num-jobs 2
Option("input", ["file1.txt", "file2.txt"])
→--input file1.txt --input file2.txt
As jobs may not always have an intuitive CLI, or may not provide and/or generate files directly through their CLI, some extra options are provided to Argument and Option to deal with these edge cases. For example, jobs may generate extra files which are not specified anywhere but may be required for child jobs. In this case, you can define these implicit job outputs by providing suppress=True, which will hide them from the job's arguments.
If you don't want certain files in the inputs or outputs to be tracked for inter-job relationships, you can provide track=False to Argument or Option. This can be useful when jobs need files to be transferred in, but you do not want or need those files to factor into determining job relationships.
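A brief sketch of both options; the option names and file names below are placeholders:

node = Node(
    arguments=[Option("verbose")],
    inputs=Option("calibration", "calibration.h5", track=False),  # transferred in, but not used to link jobs
    outputs=Option("summary", "summary.json", suppress=True),     # implicit output: tracked, hidden from the arguments
)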
Adding inter-layer relationships
To define inter-layer relationships, simply add the layers to the DAG with parent layers first:
dag.attach(layer)
When the layers are added to the DAG, node inputs and outputs determine how the layers are connected to each other.
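For example, a second layer whose node inputs match the node outputs of the layer above will be linked as its child automatically; the combine_bins executable and file names here are hypothetical:

combine = Layer("combine_bins", submit_description=description)
combine.append(Node(
    arguments=[Option("verbose")],
    inputs=Option("input", [f"output_{i}.txt" for i in range(3)]),  # matches the parent layer's outputs
    outputs=Argument("combined", "combined.txt"),
))

dag.attach(combine)  # attached after the parent layer, so the dependency is inferred from the matching files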
Submitting the DAG
You can submit the DAG directly:
dag.submit()
This will build and write the DAG as well as all the submit files to disk prior to submission. dag.submit() returns an htcondor.SubmitResult containing the cluster ID and ClassAd information, which can be used to further interact with the DAG after submission. See Advanced Job Submission and Management for more information.
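For instance, assuming the standard htcondor.SubmitResult interface (cluster() comes from the HTCondor Python bindings, not from ezdag):

result = dag.submit()
print(f"DAGMan submitted as cluster {result.cluster()}")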
The submitted DAG will take the name given upon creation, i.e. for DAG("workflow"), this will create workflow.dag. If you want to instead write the DAG to disk without submitting it:
dag.write()
Both methods take a path parameter which changes where the DAG is written to disk. By default, this is written to the current working directory.
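For example, to write everything into a subdirectory instead of the current working directory (the directory name here is arbitrary):

dag.write(path="condor")  # write workflow.dag and its submit files under ./condor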