Data Replication

Replication Rules

Files, datasets and containers are replicated according to replication rules: syntactically simple statements about how many copies of a given DID should exist across a selection of RSEs.
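With the command-line client a rule takes the schematic form below; the scope, name, copy count and RSE expression are placeholders to be filled in for a real rule (concrete examples follow later on this page):

(gwrucio) $ rucio add-rule <scope>:<name> <number-of-copies> '<RSE expression>'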

This page discusses some examples of simple replication rules and checking the results. For this discussion, we assume some dataset has been registered using these steps.

See also: rucio documentation for replication rules.

RSE Expressions

The selection of RSEs a DID should be replicated to is determined by an RSE Expression. When we set up our RSEs, we defined a series of attributes which can be used to group RSEs together. To list all RSEs, list those with attribute ALL:

(gwrucio) $ rucio list-rses --expression ALL

Similarly, find all RSEs which are not official archives (i.e., our writable test RSEs):

(gwrucio) $ rucio list-rses --expression ARCHIVE=0

Finally, combine constraints to find all LIGO-lab RSEs which are not official archives:

(gwrucio) $ rucio list-rses --expression 'ARCHIVE=0&LIGO_LAB'

Tip: rucio list-rses --expression is particularly useful for checking that replication rules will behave as expected and for determining how many copies of a DID should be specified.
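For example, since list-rses prints one RSE per line, piping its output through wc -l gives the number of copies a rule over that expression could sensibly request (the same trick is used further down this page):

(gwrucio) $ rucio list-rses --expression 'ARCHIVE=0&LIGO_LAB' | wc -l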

Replication To A Single RSE

In an earlier example we registered a dataset ER9:H-H1_HOFT_C00. To replicate that dataset from LIGO-WA-ARCHIVE to the RSE at LIGO-CIT (a writable RSE on the CIT cluster), we create the rule:

(gwrucio) $ rucio add-rule ER9:H-H1_HOFT_C00 1 LIGO-CIT

rucio add-rule returns the ID of the newly created rule; use that ID to get the rule's details:

(gwrucio) $ rucio rule-info c5fbf01b35464c2ab285f3598b30eeaf
Id:                         c5fbf01b35464c2ab285f3598b30eeaf
Account:                    root
Scope:                      ER9
Name:                       H-H1_HOFT_C00
RSE Expression:             LIGO-CIT
Copies:                     1
State:                      OK
Grouping:                   DATASET
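If the rule ID has not been kept, rucio list-rules can also look up the rules attached to a DID directly by name, e.g.:

(gwrucio) $ rucio list-rules ER9:H-H1_HOFT_C00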

Once rules have been entered into rucio, daemons regularly evaluate and implement the replication policies and submit the resulting transfer requests to the FTS server.
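One crude but convenient way to watch this happen is simply to poll the rule until its State reaches OK (here reusing the rule ID from above; the polling interval is arbitrary):

(gwrucio) $ watch -n 60 rucio rule-info c5fbf01b35464c2ab285f3598b30eeaf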

After a short time, the replicas show up at the targeted location:

(gwrucio) $ for file in $(rucio list-dids 'ER9:*' --filter type=file --short); do rucio list-file-replicas $file; done
| SCOPE   | NAME                              | FILESIZE   | ADLER32   | RSE: REPLICA                                                                                                                                    |
| ER9     | H-H1_HOFT_C00-1151848448-4096.gwf | 319.162 MB | c3e2cd8a  | LIGO-CIT: gsi                |
| ER9     | H-H1_HOFT_C00-1151848448-4096.gwf | 319.162 MB | c3e2cd8a  | LIGO-WA-ARCHIVE: gsi |

This command loops through the list of ER9 files and retrieves information on the replicas available for each file.
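Note that rucio list-file-replicas also resolves collections, so a single call on the dataset DID should give the same information without the shell loop:

(gwrucio) $ rucio list-file-replicas ER9:H-H1_HOFT_C00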

Replicating To Multiple RSEs

We can now replicate this data to the rest of our data storage network. The LSC/Virgo have traditionally used a hub-spoke data distribution model: data is replicated from the observatories to each of the other observatories and the Caltech cluster. From those locations, data is then replicated to other major compute centers and higher-tier centers.

Traditional hub-spoke model for LIGO/Virgo data distribution. The image illustrates the distribution topology; the sites in the network have since changed.

Consider the example of distributing raw HOFT from the LIGO Hanford observatory taken during O1.

  • Assume a dataset O1:H1_HOFT_C00 has been registered at the LIGO-WA-ARCHIVE RSE following these procedures.

  • Set a rule to replicate LHO data to the Livingston observatory (LIGO-LA) and to Caltech (LIGO-CIT):

$ rucio add-rule O1:H1_HOFT_C00 2 'LIGO-LA|LIGO-CIT' \
  --source-replica-expression LIGO-WA-ARCHIVE

i.e., create 2 copies of O1:H1_HOFT_C00 across the 2 RSEs in the set {LIGO-LA, LIGO-CIT}, using LIGO-WA-ARCHIVE as the source of the replication.

  • Finally, define a rule to distribute that dataset over all RSEs:

$ rucio add-rule O1:H1_HOFT_C00 \
    $(rucio list-rses --expression ALL | wc -l) ALL \
    --source-replica-expression LIGO-CIT

where we have determined the number of copies to make by counting the number of RSEs which match our list of targets and, in keeping with the diagram above, we use the Caltech cluster as the distribution hub. Rucio will identify RSEs which already host that dataset and refrain from copying to those locations.
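As a sanity check, the same list-rses and list-rules commands shown earlier can confirm that the expressions match the intended sites and that all rules are now attached to the dataset:

# confirm the first expression matches exactly the two intended RSEs
$ rucio list-rses --expression 'LIGO-LA|LIGO-CIT'

# count the RSEs the catch-all rule will span
$ rucio list-rses --expression ALL | wc -l

# list every rule now attached to the dataset
$ rucio list-rules O1:H1_HOFT_C00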

Between the data registration procedures and the replication rules described here, we essentially have all the tools required to archive and replicate bulk gravitational wave data across a network of HTC computing centers. To understand how to bring all of this together for continuous operation during an observing run, see ER13 for a description of a live implementation.

Transfer Monitoring and Debugging

A complete monitoring infrastructure has yet to be deployed, but it can be useful to:

  • Tail the daemon logs on the rucio server:

    • /var/log/rucio/judge-evaluator: evaluates/implements replication rules

    • /var/log/rucio/conveyor-transfer-submitter: submits transfer requests to the FTS server

  • Use the FTS web-monitor at fts_url:8449/fts3/ftsmon. Note that these pages require SSL client authentication. LIGO CAs seem not to be recognised, but rucio admins can simply use the rucio server host certificate.