.. _usage:
Usage
=====
Apart from this guide, a good place to get help is from the tool itself:
.. command-output:: dcc --help
Help for the available ``dcc`` subcommands can be shown in the same way using e.g. ``dcc
view --help``.
.. _ligo_org_authentication:
Obtaining a Kerberos ticket for accessing restricted resources
--------------------------------------------------------------
Access to most DCC records and files requires credentials such as those from `ligo.org
`__ or another provider. You typically only get these if you're a
member of a scientific collaboration.
By default, ``dcc`` assumes you can authenticate yourself and therefore builds and
requests URLs for records and files within the restricted part of the DCC, prompting for
credentials or using an existing Kerberos ticket. To avoid being prompted every time
``dcc`` is invoked, run ``kinit albert.einstein@LIGO.ORG`` (where ``albert.einstein`` is
your login and ``LIGO.ORG`` is your Kerberos realm) before first use each day (tickets
are typically granted for 24 hours). Subsequent interactions with the DCC will
transparently use the Kerberos ticket if one is available. The ticket can be inspected
with ``klist`` and destroyed with ``kdestroy``.
You can specify the :option:`--public ` flag to restrict ``dcc`` to
accessing public records. With this flag, you don't need to enter your credentials or
obtain a Kerberos ticket, though you will only be able to access public resources.
.. _local_archive:
Configuring a local archive
---------------------------
Every ``dcc`` command that downloads a remote record or file can cache the results in a
local archive. Subsequent requests for the same record are then served from the local
copy instead of the DCC. Once a local archive is configured, this is transparent:
requests are made to the DCC only for records and files that don't yet exist in the
local archive (or when the remote version is explicitly requested).
Downloaded data is then stored in the given directory hierarchically, e.g.:

.. code-block:: text

    $ tree /path/to/archive
    /path/to/archive
    └── T010075
        └── T010075-v3
            ├── Change Record for T010075-v3.docx
            ├── Change Record for T010075-v3.pdf
            ├── meta.toml
            ├── T010075-v3 aLIGO System Description.pdf
            └── T010075-v3 System Description.zip

The ``meta.toml`` file contains the human-readable (TOML-formatted) metadata for the
record. This can also be read by ``dcc`` using :meth:`.DCCRecord.read`.
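
As a rough illustration, the metadata file can also be inspected without ``dcc`` using
any TOML parser. The sketch below assumes Python 3.11 or later (for the standard-library
``tomllib`` module). The helper name ``load_record_metadata`` is hypothetical and not
part of ``dcc``, and no particular keys in ``meta.toml`` are assumed:

.. code-block:: python

    import tomllib
    from pathlib import Path


    def load_record_metadata(record_dir):
        """Parse the meta.toml file inside an archived record version directory."""
        meta_path = Path(record_dir) / "meta.toml"
        # tomllib requires the file to be opened in binary mode.
        with meta_path.open("rb") as fobj:
            return tomllib.load(fobj)

For the archive shown above, ``load_record_metadata("/path/to/archive/T010075/T010075-v3")``
would return the record's metadata as a plain dictionary.
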
By default, ``dcc`` uses a temporary directory for downloads that gets removed
immediately before the program exits. To persist downloaded records and files between
runs, pass the :option:`-s ` or :option:`--archive-dir `
option to any command that supports it or set the :ref:`env_dcc_archive` environment
variable. Whichever method you use, the value should be a path (relative or absolute) to
a directory.
.. warning::

    The local archive built by ``dcc`` is not guaranteed to remain consistent with that
    of the remote DCC host. To ensure you have the latest version of a record or file,
    set the :option:`--force ` flag when requesting it.

Record archival
---------------
DCC records can be archived locally using :program:`dcc archive`. This downloads
records' metadata, and optionally attached files, and stores them in the :ref:`local
archive ` for later retrieval. The command requires one or more
:option:`NUMBER ` arguments and/or a :option:`--from-file ` option followed by a path to a file containing the DCC numbers
(separated by whitespace) to archive. For example:

.. code-block:: text

    # Archive the latest version of T010075:
    $ dcc archive -s /path/to/archive T010075

    # Archive a specific version of T010075:
    $ dcc archive -s /path/to/archive T010075-v1

    # Archive multiple records:
    $ dcc archive -s /path/to/archive T010075 E1300945

    # Alternatively specify the path to a file containing the records to archive:
    $ echo "T010075 E1300945" > to-archive.txt
    $ dcc archive -s /path/to/archive --from-file to-archive.txt

As with many standard Unix utilities, the :option:`--from-file ` option also accepts ``-`` to read the DCC numbers from standard input:

.. code-block:: text

    $ echo "T010075 E1300945" | dcc archive -s /path/to/archive --from-file -

Attached files are not archived by default. To fetch them too, specify the
:option:`--files ` flag. By default, files of any size are retrieved. To limit the size
of files retrieved, pass the :option:`--max-file-size ` option with a maximum file size in MB.
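
When driving :program:`dcc archive` from a script, it can help to assemble the command
line programmatically. The following is a minimal sketch that uses only the options
described above. The helper ``build_archive_command`` is hypothetical and not part of
``dcc``:

.. code-block:: python

    def build_archive_command(archive_dir, numbers, files=False, max_file_size=None):
        """Return the argument list for a ``dcc archive`` invocation."""
        command = ["dcc", "archive", "-s", str(archive_dir)]
        if files:
            # Also download the files attached to each record.
            command.append("--files")
        if max_file_size is not None:
            # Maximum file size in MB.
            command += ["--max-file-size", str(max_file_size)]
        command += list(numbers)
        return command

The returned list can be passed to ``subprocess.run``; building a list rather than a
single string avoids shell quoting problems with paths that contain spaces.
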
Interactive mode
~~~~~~~~~~~~~~~~
Specifying :option:`-i ` or :option:`--interactive ` will prompt you for confirmation before downloading each record's files,
giving you the opportunity to skip unnecessary files. This flag implies :option:`--files
`.
Scraping a URL for links to DCC records
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The command :program:`dcc convert` scrapes DCC numbers from a file or URL and writes
them to a file:

.. code-block:: text

    # Fetch DCC numbers in the "System Engineering" topic and write to 'out.txt'.
    # The URL is quoted so that the shell does not interpret the "?" character.
    $ dcc convert 'https://dcc.ligo.org/cgi-bin/private/DocDB/ListBy?topicid=18' out.txt

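
The idea behind scraping can be illustrated with a regular expression. This is only a
sketch and not necessarily how :program:`dcc convert` is implemented. The pattern is an
assumption inferred from the DCC numbers shown in this guide (one uppercase letter, six
or seven digits, and an optional ``-vN`` version suffix), and the function name is
hypothetical:

.. code-block:: python

    import re

    # Assumed pattern: a letter, 6-7 digits, and an optional version such as "-v3".
    DCC_NUMBER = re.compile(r"\b[A-Z]\d{6,7}(?:-v\d+)?\b")


    def extract_dcc_numbers(text):
        """Return the unique DCC numbers found in text, in order of appearance."""
        seen = {}
        for match in DCC_NUMBER.finditer(text):
            # A dict preserves insertion order while dropping duplicates.
            seen.setdefault(match.group(0), None)
        return list(seen)
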
:program:`dcc convert` and :program:`dcc archive` can be combined to scrape a URL for
DCC numbers and archive them locally in one step. For example:

.. code-block:: text

    # Fetch the "System Engineering" topic page, then extract and archive its DCC
    # numbers.
    $ dcc convert 'https://dcc.ligo.org/cgi-bin/private/DocDB/ListBy?topicid=18' - | dcc archive -s /path/to/archive --from-file -

Archival of referenced and referencing records
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DCC records can list "related to" and "referenced by" records, and :program:`dcc
archive` can archive those as well. The :option:`--depth ` option controls
how many links away from the original records the archival may traverse. For example,
setting :option:`--depth ` to 1 fetches the records listed in the specified records,
and setting it to 2 additionally fetches the records listed in those. The default is 0,
meaning only the records specified in the input are fetched.
When :option:`--depth ` is nonzero, by default only "related to" records
are fetched. To also fetch "referenced by" records, specify the
:option:`--fetch-referencing ` flag. The fetching of "related
to" and "referenced by" records can be switched on and off using
:option:`--fetch-related ` / :option:`--no-fetch-related ` and :option:`--fetch-referencing ` /
:option:`--no-fetch-referencing `, respectively.
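
The effect of these options can be pictured as a depth-limited, breadth-first traversal
of the links between records. The sketch below illustrates the semantics described above
on a made-up graph. It is not the implementation used by :program:`dcc archive`, and the
function name, data layout and record numbers are all hypothetical:

.. code-block:: python

    from collections import deque


    def records_to_archive(links, start, depth=0, fetch_related=True,
                           fetch_referencing=False):
        """Return the records reached from ``start`` within ``depth`` links.

        ``links`` maps a record to a dict with "related" and "referencing" lists.
        """
        found = list(start)
        queue = deque((number, 0) for number in start)
        while queue:
            number, level = queue.popleft()
            if level >= depth:
                continue  # Do not follow links beyond the requested depth.
            entry = links.get(number, {})
            neighbours = []
            if fetch_related:
                neighbours += entry.get("related", [])
            if fetch_referencing:
                neighbours += entry.get("referencing", [])
            for neighbour in neighbours:
                if neighbour not in found:
                    found.append(neighbour)
                    queue.append((neighbour, level + 1))
        return found


    # Made-up links: A0000001 relates to A0000002, which relates to A0000003.
    LINKS = {
        "A0000001": {"related": ["A0000002"], "referencing": ["A0000004"]},
        "A0000002": {"related": ["A0000003"], "referencing": []},
    }

With these links, a depth of 0 yields only ``A0000001``, a depth of 1 adds
``A0000002``, and a depth of 2 adds ``A0000003``. ``A0000004`` is only included when
``fetch_referencing`` is enabled.
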
.. warning::

    The DCC is a highly connected graph, so setting a high :option:`--depth ` is likely
    to lead to thousands of records being downloaded. A value of 1 or 2 is typically
    sufficient to archive almost every relevant related record.

For example, the referenced documents of ``E1300945`` can be archived alongside
``E1300945`` itself using:

.. code-block:: text

    # Fetch "related to" documents as well as E1300945 itself:
    $ dcc archive -s /path/to/archive E1300945 --depth 1

    # Fetch "referenced by" documents as well:
    $ dcc archive -s /path/to/archive E1300945 --depth 1 --fetch-referencing

.. _updating_record_metadata:
Updating record metadata
------------------------
Record metadata can be updated using :program:`dcc update`. This accepts a
:option:`DCC number ` and one or more of the following options:
:option:`--title `, :option:`--abstract `,
:option:`--keyword `, :option:`--note `,
:option:`--related `, and :option:`--author `.
The :option:`--keyword `, :option:`--related `, and :option:`--author ` options can be specified
multiple times to set multiple values. Author names should be as written, e.g. "Albert
Einstein", and should correspond to real DCC users. For example:

.. code-block:: text

    # Update the title of T2200016.
    $ dcc update T2200016 --title "A new title"

By default, :program:`dcc update` will prompt for confirmation before sending the
updated record to the DCC. To make changes without any confirmation, specify the flag
:option:`--no-confirm `. Submitted changes are irreversible, so
be careful.
.. note::

    The DCC does not appear to perform error checking on author names. If an author is
    not given correctly, it is simply discarded.

.. _changing_host:
Changing the DCC or login host
------------------------------
By default, ``dcc`` interacts with the DCC host at https://dcc.ligo.org/, or with the
host given by the ``DCC_HOST`` environment variable if it is set. Some users may wish to
use a different host, such as one of the backup servers (https://dcc-backup.ligo.org/,
https://dcc-lho.ligo.org/, https://dcc-llo.ligo.org/) or a DCC server for a different
project (e.g. https://dcc.cosmicexplorer.org/). This can be done by specifying the
host using the :option:`--host ` option on commands that support it.
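
The order of precedence can be sketched as follows: an explicit ``--host`` value first,
then ``DCC_HOST``, then the default. The function is hypothetical and only mirrors the
behaviour described here. That an explicit host overrides the environment variable, and
that an empty ``DCC_HOST`` is treated as unset, are assumptions:

.. code-block:: python

    import os

    DEFAULT_HOST = "https://dcc.ligo.org/"


    def resolve_host(cli_host=None, environ=None):
        """Pick the DCC host: explicit value, then DCC_HOST, then the default."""
        environ = os.environ if environ is None else environ
        if cli_host:
            return cli_host
        # Assumption: an unset or empty DCC_HOST falls back to the default host.
        return environ.get("DCC_HOST") or DEFAULT_HOST
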
.. warning::

    ``dcc`` does not distinguish between DCC hosts when archiving records and files
    locally. To prevent mixing records from separate projects within the same hierarchy,
    specify a different :ref:`local archive ` setting for each project.

It is also possible to change the identity provider (IDP) host used to authenticate
your login credentials. By default this is https://login.ligo.org/, or the value of the
``ECP_IDP`` environment variable if it is set, but it can be changed to the backup
(https://login2.ligo.org/) or that of another project (see `cilogon.org
`__ for a list of available IDP hosts) using
the :option:`--idp-host ` flag on commands that support it.