Archive presentations from a conference session

Archiving a session's presentations ahead of time can help mitigate the effects of a DCC outage during a conference.

Find the relevant event page on the DCC, find the URL for the session, and specify it as the argument to dcc convert. Chain the output into dcc archive, passing a directory to store the results.

# Scrape the page for the QNWG session from the September 2021 LVK Meeting, then
# archive the records and attachments corresponding to the extracted DCC numbers.
$ dcc convert "https://dcc.ligo.org/cgi-bin/private/DocDB/DisplayMeeting?sessionid=5120" - | dcc archive -s /path/to/archive --from-file - --files --force

The archive directory at /path/to/archive will then contain the DCC records and files associated with the session.
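
As a quick sanity check on what was downloaded, the following minimal sketch tallies the files below a directory by extension. It uses only the Python standard library and assumes nothing about how dcc lays out the archive internally. The function name count_by_extension is hypothetical and not part of dcc.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count regular files below `root`, grouped by lower-cased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under the empty string.
            counts[path.suffix.casefold()] += 1
    return counts
```

Calling count_by_extension("/path/to/archive") returns a Counter mapping each extension to the number of files found with it.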

Check existing archive for missing downloads

The --max-file-size option lets you skip large files when archiving, but you may still want files of certain types regardless of their size. This script checks the latest revision of each record in the archive for missing files with particular extensions and reports the affected records:

from pathlib import Path
from dcc import DCCArchive

# Extensions of attachments that should be present in the archive.
KEEP = [".stl", ".step", ".dwg"]

# The local archive.
archive = DCCArchive("/path/to/archive")

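for record in archive.latest_revisions:
    report = False
    for file_ in record.files:
        # Skip files that have already been downloaded.
        if file_.exists():
            continue
        path = Path(file_.filename)
        # Flag the record if a missing file has one of the wanted extensions.
        if path.suffix.casefold() in KEEP:
            report = True

    if report:
        # Print one DCC number per line (assumes the record exposes it as
        # `dcc_number`), suitable for "dcc archive --from-file".
        print(record.dcc_number)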

The output from this program is in a form that can be easily passed to dcc archive:

# Assume the script above is stored in "find_missing.py".
$ python find_missing.py > missing.txt
$ dcc archive -s /path/to/archive --from-file missing.txt --files

The interactive mode flag -i (or --interactive) can be useful here: it prompts before downloading each file, allowing you to skip ones you don’t want.
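
The extension test in the script above depends on how pathlib splits filenames. As a small standalone illustration, Path.suffix returns only the final extension, and casefold() makes the comparison case-insensitive. The helper name wanted and the sample filenames are made up for this example.

```python
from pathlib import Path

# Same extension list as in the script above.
KEEP = [".stl", ".step", ".dwg"]


def wanted(filename):
    """Return True if the final extension of `filename` is in KEEP, ignoring case."""
    return Path(filename).suffix.casefold() in KEEP


print(wanted("bracket.STL"))      # True: ".STL" casefolds to ".stl"
print(wanted("drawing.dwg.pdf"))  # False: only the last suffix, ".pdf", is checked
print(wanted("notes"))            # False: no extension at all
```

One consequence is that compressed attachments such as a hypothetical "model.step.zip" would not match unless ".zip" were added to KEEP.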