Archive presentations from a conference session¶
This can be useful to help mitigate the effects of a DCC outage during a conference.
Find the relevant event page on the DCC, find the URL for the session, and specify it as the argument to dcc convert. Chain the output into dcc archive, passing a directory to store the results.
# Scrape the page for the QNWG session from the September 2021 LVK Meeting, then # archive the records and attachments corresponding to the extracted DCC numbers. $ dcc convert "https://dcc.ligo.org/cgi-bin/private/DocDB/DisplayMeeting?sessionid=5120" - | dcc archive -s /path/to/archive --from-file - --files --force
The archive directory at
/path/to/archive will then contain the DCC records
and files associated with the session.
Check existing archive for missing downloads¶
--max-file-size option allows ignoring large files
when archiving. You may wish to archive files of a certain type without size limits.
This script searches the archive for (latest) records with missing files with certain
extensions and reports them:
from pathlib import Path from dcc import DCCArchive # Attachment extensions to check exist. KEEP = [".stl", ".step", ".dwg"] # The local archive. archive = DCCArchive("/path/to/archive") for record in archive.latest_revisions: report = False for file_ in record.files: if file_.exists(): continue path = Path(file_.filename) if path.suffix.casefold() in KEEP: report = True if report: print(record.dcc_number)
The output from this program is in a form that can be easily passed to dcc archive:
# Assume script above is stored in "file_missing.py". $ python find_missing.py > missing.txt $ dcc archive -s /path/to/archive --from-file missing.txt --files