Usage

Apart from this guide, a good place to get help is the tool itself:

$ dcc --help
Usage: dcc [OPTIONS] COMMAND [ARGS]...

  dcc 0.8.0

  Tools for viewing and updating records, metadata and files in the LIGO
  Document Control Center (DCC).

  Website: https://docs.ligo.org/sean-leavey/dcc/

  dcc comes with ABSOLUTELY NO WARRANTY. This is free software, and you are
  welcome to redistribute it under certain conditions. See the GNU General
  Public Licence for details.

  Copyright 2022 Sean Leavey, Jameson Graef Rollins, Christopher Wipf

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  archive    Archive remote DCC records locally.
  convert    Extract DCC numbers from a target file or URL.
  list       List records in the local archive.
  open       Open remote DCC record page in the default browser.
  open-file  Open file attached to DCC record using operating system.
  update     Update remote DCC record metadata.
  view       View DCC record metadata.

Help for each dcc subcommand can be shown in the same way, e.g. dcc view --help.

Obtaining a Kerberos ticket for accessing restricted resources

Access to most DCC records and files requires credentials such as those from ligo.org or another provider. You typically only get these if you’re a member of a scientific collaboration.

By default, dcc assumes you can authenticate yourself and therefore builds and requests URLs for records and files within the restricted part of the DCC, prompting for credentials or using an existing Kerberos ticket. To avoid being prompted every time dcc is invoked, run kinit albert.einstein@LIGO.ORG (where albert.einstein is your login and LIGO.ORG is your Kerberos realm) before first use each day; tickets are typically granted for 24 hours. Subsequent interaction with the DCC will transparently use the Kerberos ticket if one is available. The ticket can be inspected with klist and destroyed with kdestroy.
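The daily kinit step can be wrapped so that a new ticket is requested only when needed. The following is a minimal sketch, not part of dcc: ensure_ticket is a hypothetical helper name, and it assumes MIT Kerberos, where klist -s produces no output and exits with a non-zero status if the credential cache is missing or expired.

```shell
# Hypothetical helper: request a Kerberos ticket only if no valid one is cached.
# Assumes MIT Kerberos semantics for `klist -s` (silent, status reflects validity).
ensure_ticket() {
    principal=$1
    if klist -s; then
        echo "Valid Kerberos ticket found."
    else
        # Prompts for the password of the given principal.
        kinit "$principal"
    fi
}
```

After defining the function in your shell, ensure_ticket albert.einstein@LIGO.ORG can be run before invoking dcc.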

Specify the --public flag to restrict dcc to public records. With this flag, you don’t need to enter your credentials or obtain a Kerberos ticket, but restricted resources will not be accessible.

Configuring a local archive

Every dcc command that downloads a remote record or file can cache the results in a local archive. This allows quick subsequent access to the same records by retrieving the local copy instead of connecting to the DCC. With a local archive configured, retrieval of cached records and files is transparent: requests are made to the DCC only if the records or files don’t yet exist in the local archive (or if the remote version is explicitly requested).

Downloaded data is stored hierarchically within the archive directory, e.g.:

$ tree /path/to/archive
/path/to/archive
└── T010075
    └── T010075-v3
        ├── Change Record for T010075-v3.docx
        ├── Change Record for T010075-v3.pdf
        ├── meta.toml
        ├── T010075-v3 aLIGO System Description.pdf
        └── T010075-v3 System Description.zip

The meta.toml file contains the human-readable (TOML-formatted) metadata for the record. This can also be read by dcc using DCCRecord.read().
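Because each archived version directory holds a meta.toml file, the layout can be inspected with ordinary shell tools. The sketch below relies only on the directory structure shown above; list_archived_versions is a hypothetical helper name, and the -mindepth and -maxdepth options assume GNU or BSD find. The dcc list command remains the tool's own way to list archived records.

```shell
# Hypothetical helper: print the name of every archived record version found
# under the given archive directory, based on the layout
#   <archive>/<number>/<number>-v<version>/meta.toml
list_archived_versions() {
    find "$1" -mindepth 3 -maxdepth 3 -type f -name meta.toml |
        while IFS= read -r meta; do
            # The parent directory of meta.toml is named after the version.
            basename "$(dirname "$meta")"
        done | sort
}
```

For the example tree above, list_archived_versions /path/to/archive would be expected to print T010075-v3.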

By default, dcc uses a temporary directory for downloads that gets removed immediately before the program exits. To persist downloaded records and files between runs, pass the -s or --archive-dir option to any command that supports it or set the DCC_ARCHIVE environment variable. Whichever method you use, the value should be a path (relative or absolute) to a directory.
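As an illustration, the environment variable can be set once per shell session (or in a shell profile) so that the archive option need not be repeated. The directory name below is an arbitrary placeholder, and creating the directory in advance is a precaution rather than a documented requirement.

```shell
# Hypothetical archive location; any relative or absolute directory path works.
export DCC_ARCHIVE="$HOME/dcc-archive"

# Create the directory up front (a precaution, not a documented requirement).
mkdir -p "$DCC_ARCHIVE"
```

With the variable exported, a command such as dcc archive T010075 should store its results under that directory without needing -s.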

Warning

The local archive built by dcc is not guaranteed to remain consistent with the remote DCC host. To ensure you have the latest version of a record or file, specify the --force flag when requesting it.

Record archival

DCC records can be archived locally using dcc archive. This downloads records’ metadata, and optionally attached files, and stores them in the local archive for later retrieval. The command requires one or more NUMBER arguments and/or a --from-file option followed by a path to a file containing the DCC numbers (separated by whitespace) to archive. For example:

# Archive the latest version of T010075:
$ dcc archive -s /path/to/archive T010075

# Archive a specific version of T010075:
$ dcc archive -s /path/to/archive T010075-v1

# Archive multiple records:
$ dcc archive -s /path/to/archive T010075 E1300945

# Alternatively specify the path to a file containing the records to archive:
$ echo "T010075 E1300945" > to-archive.txt
$ dcc archive -s /path/to/archive --from-file to-archive.txt

As with many standard Unix utilities, the --from-file option can read from standard input if given -:

$ echo "T010075 E1300945" | dcc archive -s /path/to/archive --from-file -

Files are not archived automatically. To fetch them too, specify the --files flag. By default, files of any size are retrieved. To limit the size of files retrieved, set the --max-file-size option to the maximum file size in MB.
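The two file options can be combined in a small shell helper. This is only a sketch: archive_with_files is a hypothetical function name that is not part of dcc, and it assumes --max-file-size takes a plain number of MB as described above.

```shell
# Hypothetical helper: archive records together with their attached files,
# limited to a maximum file size given in MB.
# Usage: archive_with_files ARCHIVE_DIR MAX_MB NUMBER [NUMBER...]
archive_with_files() {
    archive_dir=$1
    max_mb=$2
    shift 2
    # The remaining arguments are the DCC numbers to archive.
    dcc archive -s "$archive_dir" --files --max-file-size "$max_mb" "$@"
}
```

For example, archive_with_files /path/to/archive 10 T010075 E1300945 would request both records along with their files, using a limit of 10 MB.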

Interactive mode

Specifying -i or --interactive will prompt you for confirmation before downloading each record’s files, giving you the opportunity to skip unnecessary files. This flag implies --files.

Archival of referenced and referencing records

DCC records can list “related to” and “referenced by” records, and dcc archive can archive these as well. The --depth option controls how far along the chain of references from the original records the archival may traverse. For example, setting --depth to 1 also fetches the records listed by the specified records, and setting it to 2 additionally fetches the records listed by those. The default is 0, meaning only the records specified in the input are fetched.

When --depth is nonzero, by default only “related to” records are fetched. To also fetch “referenced by” records, specify the --fetch-referencing flag. The fetching of “related to” and “referenced by” records can be switched on and off using --fetch-related / --no-fetch-related and --fetch-referencing / --no-fetch-referencing, respectively.

Warning

The DCC is a highly connected graph, so setting a high --depth is likely to lead to thousands of records being downloaded. A value of 1 or 2 is typically sufficient to archive almost every relevant related record.

For example, the referenced documents of E1300945 can be archived alongside E1300945 itself using:

# Fetch "related to" documents as well as E1300945 itself:
$ dcc archive -s /path/to/archive E1300945 --depth 1

# Fetch "referenced by" documents as well:
$ dcc archive -s /path/to/archive E1300945 --depth 1 --fetch-referencing

Updating record metadata

Record metadata can be updated using dcc update. This accepts a DCC number and one or more of the following options: --title, --abstract, --keyword, --note, --related, and --author.

The --keyword, --related, and --author options can be specified multiple times to set multiple values. Author names should be as written, e.g. “Albert Einstein”, and should correspond to real DCC users. For example, to change a record’s title:

# Update the title of T2200016.
$ dcc update T2200016 --title "A new title"
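Options that accept multiple values are simply repeated on the command line. As a sketch of how a list of values might be expanded into repeated --keyword options, the hypothetical helper below (dcc_set_keywords is not part of dcc) builds the option list with portable shell positional parameters.

```shell
# Hypothetical helper: set several keywords on a record in one call by
# repeating the --keyword option once per value.
# Usage: dcc_set_keywords NUMBER KEYWORD [KEYWORD...]
dcc_set_keywords() {
    number=$1
    shift
    # Replace each keyword argument with the pair "--keyword <value>".
    for kw in "$@"; do
        set -- "$@" --keyword "$kw"
        shift
    done
    dcc update "$number" "$@"
}
```

Calling dcc_set_keywords T2200016 "first keyword" "second keyword" (placeholder values) would run dcc update with two --keyword options.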

By default, dcc update will prompt for confirmation before sending the updated record to the DCC. To make changes without any confirmation, specify the --no-confirm flag. Submitted changes are irreversible, so be careful.

Note

The DCC does not appear to perform error checking on author names. If an author is not given correctly, it is simply discarded.

Changing the DCC or login host

By default, dcc interacts with the DCC host at https://dcc.ligo.org/, or with the host given by the DCC_HOST environment variable if set. Some users may wish to use a different host, such as one of the backup servers (https://dcc-backup.ligo.org/, https://dcc-lho.ligo.org/, https://dcc-llo.ligo.org/) or a DCC server for a different project (e.g. https://dcc.cosmicexplorer.org/). This can be done by specifying the host with the --host option on commands that support it.

Warning

dcc does not distinguish between DCC hosts when archiving records and files locally. To prevent mixing records from separate projects within the same hierarchy, specify a different local archive setting for each project.

It is also possible to change the identity provider (IDP) host used to authenticate your login credentials. By default it is set to https://login.ligo.org/, or to the value of the ECP_IDP environment variable if set, but it can be changed to the backup (https://login2.ligo.org/) or to that of another project (see cilogon.org for a list of available IDP hosts) using the --idp-host option on commands that support it.
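The environment variables named in this guide can be set together when switching between projects, which also keeps each project in its own local archive. The helper below is a sketch under stated assumptions: use_dcc_project is a hypothetical function name, and the $HOME/dcc-archives base directory is an arbitrary choice.

```shell
# Hypothetical helper: point dcc at a given project's host and give that
# project its own archive directory, so records from different hosts don't mix.
# Usage: use_dcc_project NAME DCC_HOST [IDP_HOST]
use_dcc_project() {
    project=$1
    export DCC_HOST="$2"
    export DCC_ARCHIVE="$HOME/dcc-archives/$project"
    # Only override the identity provider when a third argument is given.
    if [ -n "${3:-}" ]; then
        export ECP_IDP="$3"
    fi
}
```

For example, use_dcc_project ligo-backup https://dcc-backup.ligo.org/ https://login2.ligo.org/ selects the backup DCC and login hosts. The function must be defined in the current shell (for instance in a shell profile) for the exported variables to affect later dcc commands.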