Usage
Apart from this guide, a good place to get help is from the tool itself:
$ dcc --help
Usage: dcc [OPTIONS] COMMAND [ARGS]...

  dcc 0.8.0

  Tools for viewing and updating records, metadata and files in the LIGO
  Document Control Center (DCC).

  Website: https://docs.ligo.org/sean-leavey/dcc/

  dcc comes with ABSOLUTELY NO WARRANTY. This is free software, and you are
  welcome to redistribute it under certain conditions. See the GNU General
  Public Licence for details.

  Copyright 2022 Sean Leavey, Jameson Graef Rollins, Christopher Wipf

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  archive    Archive remote DCC records locally.
  convert    Extract DCC numbers from a target file or URL.
  list       List records in the local archive.
  open       Open remote DCC record page in the default browser.
  open-file  Open file attached to DCC record using operating system.
  update     Update remote DCC record metadata.
  view       View DCC record metadata.
Help for the available dcc subcommands can be shown in the same way, e.g. dcc view --help.
Obtaining a Kerberos ticket for accessing restricted resources
Access to most DCC records and files requires credentials such as those from ligo.org or another provider. You typically only get these if you’re a member of a scientific collaboration.
By default, dcc assumes you can authenticate yourself and therefore builds and requests
URLs for records and files within the restricted part of the DCC, prompting for
credentials or using an existing Kerberos ticket. To avoid being prompted every time dcc
is invoked, run kinit albert.einstein@LIGO.ORG (where albert.einstein is your login and
LIGO.ORG is your Kerberos realm) before first use each day (tickets are typically granted
for 24 hours). Subsequent interaction with the DCC will transparently use the Kerberos
ticket if one is available. The ticket can be verified with klist and destroyed with
kdestroy.
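As a convenience, the ticket check can be wrapped in a small shell function, for example
in a shell profile. This is a sketch rather than part of dcc: ensure_ticket is a
hypothetical helper name, and it assumes a klist that supports the -s option (as in MIT
Kerberos), which prints nothing and reports the ticket state through its exit status.

```shell
# Hypothetical helper (not part of dcc): request a Kerberos ticket only if no
# valid one is cached. Assumes 'klist -s' exits non-zero when the credentials
# cache is missing or expired (MIT Kerberos behaviour).
ensure_ticket() {
    if klist -s; then
        echo "Valid Kerberos ticket found."
    else
        kinit "$1"
    fi
}
```

Calling ensure_ticket albert.einstein@LIGO.ORG before running dcc should then only ask
for a password when a new ticket is actually needed.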
You can specify the --public flag to restrict dcc to accessing public records. With this
flag, you don’t need to enter your credentials or obtain a Kerberos ticket, though you
will only be able to access public resources.
Configuring a local archive
Every dcc command that involves downloading a remote record or file can cache the results
in a local archive. This allows for quick subsequent access to the same records, by
retrieving the local copy instead of connecting to the DCC. With a configured local
archive, retrieval of cached records and files is transparent: requests are made to the
DCC only if the records or files don’t yet exist in the local archive (or if the remote
version is explicitly requested).
Downloaded data is stored hierarchically in the archive directory, e.g.:
$ tree /path/to/archive
/path/to/archive
└── T010075
    └── T010075-v3
        ├── Change Record for T010075-v3.docx
        ├── Change Record for T010075-v3.pdf
        ├── meta.toml
        ├── T010075-v3 aLIGO System Description.pdf
        └── T010075-v3 System Description.zip
The meta.toml file contains the human-readable (TOML-formatted) metadata for the record.
This can also be read by dcc using DCCRecord.read().
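Because the archive is an ordinary directory tree, it can also be inspected with standard
tools. The sketch below assumes the layout shown above, with one meta.toml per archived
version. list_cached_versions is a hypothetical helper, not a dcc command; dcc list
remains the built-in way to list archived records.

```shell
# Hypothetical helper (not part of dcc): print the directory of every archived
# record version, identified by the presence of a meta.toml file.
list_cached_versions() {
    find "$1" -type f -name meta.toml | sed 's|/meta\.toml$||' | sort
}
```

For the tree above, list_cached_versions /path/to/archive would print
/path/to/archive/T010075/T010075-v3.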
By default, dcc uses a temporary directory for downloads that gets removed immediately
before the program exits. To persist downloaded records and files between runs, pass the
-s or --archive-dir option to any command that supports it or set the DCC_ARCHIVE
environment variable. Whichever method you use, the value should be a path (relative or
absolute) to a directory.
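For instance, the environment variable could be set once in a shell profile so that every
command uses the same archive. The location below is an arbitrary example, and creating
the directory up front is a precaution rather than a documented requirement.

```shell
# Example archive location (an assumption; use any directory you like).
export DCC_ARCHIVE="$HOME/dcc-archive"

# Create the directory in advance in case dcc expects it to exist.
mkdir -p "$DCC_ARCHIVE"
```

With this in place, a command such as dcc archive T010075 should store the record under
$DCC_ARCHIVE without needing the -s option.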
Warning
The local archive built by dcc is not guaranteed to remain consistent with the remote DCC
host. To ensure you have the latest version of a record or file, set the --force flag
when requesting it.
Record archival
DCC records can be archived locally using dcc archive. This downloads
records’ metadata, and optionally attached files, and stores them in the local
archive for later retrieval. The command requires one or more NUMBER arguments and/or a
--from-file option followed by a path to a file containing the DCC numbers (separated by
whitespace) to archive. For example:
# Archive the latest version of T010075:
$ dcc archive -s /path/to/archive T010075
# Archive a specific version of T010075:
$ dcc archive -s /path/to/archive T010075-v1
# Archive multiple records:
$ dcc archive -s /path/to/archive T010075 E1300945
# Alternatively specify the path to a file containing the records to archive:
$ echo "T010075 E1300945" > to-archive.txt
$ dcc archive -s /path/to/archive --from-file to-archive.txt
Similar to the behaviour of standard Unix utilities, the --from-file option can also read
from stdin by specifying -:
$ echo "T010075 E1300945" | dcc archive -s /path/to/archive --from-file -
Files are not automatically archived. To fetch them too, specify the --files flag. By
default, files of any size will be retrieved. To limit the size of files retrieved, pass
the --max-file-size option with a maximum file size in MB.
Interactive mode
With -i or --interactive, dcc prompts you for confirmation before downloading each
record’s files, giving you the opportunity to skip unnecessary files. This flag implies
--files.
Scraping a URL for links to DCC records
The command dcc convert scrapes DCC numbers from a file or URL and writes them to a file:
# Fetch DCC numbers in the "System Engineering" topic and write to 'out.txt'.
$ dcc convert "https://dcc.ligo.org/cgi-bin/private/DocDB/ListBy?topicid=18" out.txt
It is easy to combine dcc convert and dcc archive to automatically scrape a URL for DCC numbers and archive them locally. For example:
# Fetch the "System Engineering" topic page, then extract and archive its DCC
# numbers.
$ dcc convert "https://dcc.ligo.org/cgi-bin/private/DocDB/ListBy?topicid=18" - | dcc archive -s /path/to/archive --from-file -
Archival of referenced and referencing records
DCC records can contain “related to” and “referenced by” records, and dcc archive can
archive them as well. The --depth option controls how far along the chain from the
original documents the archival can traverse. For example, setting --depth to 1 will
fetch the records that are listed in the specified records, and setting it to 2 will
additionally fetch the references of those documents. The default is 0, meaning only the
records specified in the input are fetched.
When --depth is nonzero, by default only “related to” records are fetched. To also fetch
“referenced by” records, specify the --fetch-referencing flag. The fetching of “related
to” and “referenced by” records can be switched on and off using --fetch-related /
--no-fetch-related and --fetch-referencing / --no-fetch-referencing, respectively.
Warning
The DCC is a highly connected graph, so setting a high --depth is likely to lead to
thousands of records being downloaded. A value of 1 or 2 is typically sufficient to
archive almost every relevant related record.
For example, the referenced documents of E1300945 can be archived alongside E1300945
itself using:
# Fetch "related to" documents as well as E1300945 itself:
$ dcc archive -s /path/to/archive E1300945 --depth 1
# Fetch "referenced by" documents as well:
$ dcc archive -s /path/to/archive E1300945 --depth 1 --fetch-referencing
Updating record metadata
Record metadata can be updated using dcc update. This accepts a DCC number and one or
more of the following options: --title, --abstract, --keyword, --note, --related, and
--author.
The --keyword, --related, and --author options can be specified multiple times to set
multiple values. Author names should be as written, e.g. “Albert Einstein”, and should
correspond to real DCC users. For example:
# Update the title of T2200016.
$ dcc update T2200016 --title "A new title"
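Options that accept multiple values are repeated once per value. When the values come
from a list, the repeated options can be collected in the shell's positional parameters
so that values containing spaces stay intact. This is only a sketch of one way to build
the arguments, and the keywords shown are placeholders.

```shell
# Placeholder keywords (assumptions); values containing spaces must stay quoted.
set --
for keyword in "seismic isolation" "suspensions"; do
    set -- "$@" --keyword "$keyword"
done

# Print the collected options, one argument per line, for review.
printf '%s\n' "$@"
```

The collected options can then be passed on with dcc update T2200016 "$@".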
By default, dcc update will prompt for confirmation before sending the updated record to
the DCC. To make changes without any confirmation, specify the flag --no-confirm.
Submitted changes are irreversible, so be careful.
Note
The DCC does not appear to perform error checking on author names. If an author is not given correctly, it is simply discarded.
Changing the DCC or login host
By default, dcc interacts with the DCC host at https://dcc.ligo.org/, or with the host
given by the DCC_HOST environment variable if set. Some users may wish to change this to
something different, such as one of the backup servers (https://dcc-backup.ligo.org/,
https://dcc-lho.ligo.org/, https://dcc-llo.ligo.org/) or a DCC server for a different
project (e.g. https://dcc.cosmicexplorer.org/). This can be done by specifying a
different host using the --host flag on commands that support it.
Warning
dcc does not distinguish between DCC hosts when archiving records and files locally. To
prevent mixing records from separate projects within the same hierarchy, specify a
different local archive setting for each project.
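One way to keep projects apart is to switch the host and the archive together through the
DCC_HOST and DCC_ARCHIVE environment variables. The function names and archive locations
below are hypothetical, and the sketch assumes both variables are honoured by the
commands you run.

```shell
# Hypothetical helpers (not part of dcc): change host and archive together so
# that records from different projects never share a directory.
use_ligo_dcc() {
    export DCC_HOST="https://dcc.ligo.org/"
    export DCC_ARCHIVE="$HOME/dcc-archive/ligo"
}

use_ce_dcc() {
    export DCC_HOST="https://dcc.cosmicexplorer.org/"
    export DCC_ARCHIVE="$HOME/dcc-archive/cosmicexplorer"
}
```

Running use_ce_dcc before a session would then direct subsequent dcc commands in that
shell to the Cosmic Explorer host and its own archive directory.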
It is also possible to change the identity provider (IDP) host used to authenticate your
login credentials. By default this is https://login.ligo.org/, or the value of the
ECP_IDP environment variable if set. It can be changed to the backup
(https://login2.ligo.org/) or to that of another project (see cilogon.org for a list of
available IDP hosts) using the --idp-host flag on commands that support it.