# Data Replication

## Replication Rules

Files, datasets and containers are replicated according to `replication rules`: syntactically simple statements about how many copies of a given DID should exist across a selection of RSEs. This page discusses some examples of simple replication rules and how to check their results. For this discussion, we assume some dataset has been registered using [these steps](https://git.ligo.org/james-clark/gwrucio/wikis/rucio-operations/data-registration).

* [RSE Expressions](#rse-expressions)
* Replication examples:
  * [Point-point replication](#replication-to-a-single-rse)
  * [Network data distribution](#replicating-to-multiple-rses)
* [Basic monitoring](#transfer-monitoring-and-debugging)

See also: [rucio documentation for replication rules](https://rucio.readthedocs.io/en/latest/overview_Replica_management.html).

## RSE Expressions

The selection of RSEs a DID should be replicated to is determined by an [RSE Expression](https://rucio.readthedocs.io/en/latest/rse_expressions.html). When we set up our RSEs, we defined a series of attributes which can be used to group RSEs together.

To list all RSEs, list those with the attribute `ALL`:

```
(gwrucio) $ rucio list-rses --expression ALL
LIGO-CIT
LIGO-WA-ARCHIVE
LIGO-CIT-ARCHIVE
UNL
LIGO-WA
```

Similarly, find all RSEs which are not official archives (i.e., our writable test RSEs):

```
(gwrucio) $ rucio list-rses --expression ARCHIVE=0
LIGO-CIT
LIGO-WA
UNL
```

Finally, combine constraints to find all LIGO-lab RSEs which are not official archives:

```
(gwrucio) $ rucio list-rses --expression 'ARCHIVE=0&LIGO_LAB'
LIGO-CIT
LIGO-WA
```

**Tip**: `rucio list-rses --expression` is particularly useful for checking that a replication rule will match the RSEs you expect, and for determining how many copies of a DID should be specified.

## Replication To A Single RSE

In an [earlier example](https://git.ligo.org/james-clark/gwrucio/wikis/rucio-operations/data-registration#one-time-registration) we registered a dataset `ER9:H-H1_HOFT_C00`. To replicate that dataset from `LIGO-WA-ARCHIVE` to the RSE at `LIGO-CIT` (a writable RSE on the CIT cluster), we create the rule:

```
(gwrucio) $ rucio add-rule ER9:H-H1_HOFT_C00 1 LIGO-CIT
c5fbf01b35464c2ab285f3598b30eeaf
```

Get the rule info:

```
(gwrucio) $ rucio rule-info c5fbf01b35464c2ab285f3598b30eeaf
Id:                         c5fbf01b35464c2ab285f3598b30eeaf
Account:                    root
Scope:                      ER9
Name:                       H-H1_HOFT_C00
RSE Expression:             LIGO-CIT
Copies:                     1
State:                      OK
Locks OK/REPLICATING/STUCK: 9/0/0
Grouping:                   DATASET
```

Once rules have been entered into rucio, daemons regularly evaluate and implement the replication policies and communicate transfer requests to the FTS server.
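While the daemons do their work, the state of the rule can be polled from the command line. The following is a minimal sketch that parses the `rucio rule-info` output shown above, looping until the rule reaches the `OK` state:

```
# Poll a replication rule until all of its locks are OK.
# RULE_ID is the identifier returned by `rucio add-rule` above.
RULE_ID=c5fbf01b35464c2ab285f3598b30eeaf
while true; do
    # Extract the "State:" field from the rule-info output
    state=$(rucio rule-info ${RULE_ID} | awk '/^State:/ {print $2}')
    echo "$(date): rule ${RULE_ID} is ${state}"
    [ "${state}" = "OK" ] && break
    sleep 60
done
```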
After a short time, the replicas show up at the targeted location:

```
(gwrucio) $ for file in $(rucio list-dids ER9:* --filter type=file --short); do rucio list-file-replicas $file; done
+---------+-----------------------------------+------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| SCOPE   | NAME                              | FILESIZE   | ADLER32   | RSE: REPLICA                                                                                                                                    |
|---------+-----------------------------------+------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------|
| ER9     | H-H1_HOFT_C00-1151848448-4096.gwf | 319.162 MB | c3e2cd8a  | LIGO-CIT: gsiftp://ldas-pcdev6.ligo.caltech.edu:2811/mnt/rucio/ER9/hoft/H1/H-H1_HOFT_C00-11518/H-H1_HOFT_C00-1151848448-4096.gwf                |
| ER9     | H-H1_HOFT_C00-1151848448-4096.gwf | 319.162 MB | c3e2cd8a  | LIGO-WA-ARCHIVE: gsiftp://ldas-pcdev6.ligo-wa.caltech.edu:2811/archive/frames/ER9/hoft/H1/H-H1_HOFT_C00-11518/H-H1_HOFT_C00-1151848448-4096.gwf |
+---------+-----------------------------------+------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------+
```

This command loops through the list of ER9 files and retrieves information on the replicas available for each file.

## Replicating To Multiple RSEs

We can now replicate this data to the rest of our data storage network. The LSC/Virgo have traditionally used a hub-spoke data distribution model: data is replicated from each observatory to the other observatories and to the Caltech cluster; from those locations, data is then replicated to the other major compute centers and higher-tier centers.

*Figure: the traditional hub-spoke model for LIGO/Virgo data distribution. The image illustrated the distribution topology; the sites in the network have since changed.*

Consider the example of distributing raw HOFT from the LIGO Hanford observatory taken during O1.

* Assume a dataset `O1:H1_HOFT_C00` has been registered at the `LIGO-WA-ARCHIVE` RSE following [these procedures](https://git.ligo.org/james-clark/gwrucio/wikis/rucio-operations/data-registration).
* Set a rule to replicate LHO data to the Livingston observatory (`LIGO-LA`) and to Caltech (`LIGO-CIT-ARCHIVE`):

  ```
  $ rucio add-rule O1:H1_HOFT_C00 \
        2 'LIGO-LA|LIGO-CIT-ARCHIVE' \
        --source-replica-expression LIGO-WA-ARCHIVE
  ```

  i.e., replicate 2 copies of `O1:H1_HOFT_C00` to the 2 RSEs in the set {`LIGO-LA`, `LIGO-CIT-ARCHIVE`}, using `LIGO-WA-ARCHIVE` as the source of the replication.
* Finally, define a rule to distribute that dataset over all RSEs:

  ```
  $ rucio add-rule O1:H1_HOFT_C00 \
        $(rucio list-rses --expression ALL | wc -l) ALL \
        --source-replica-expression LIGO-CIT-ARCHIVE
  ```

  where we have determined the number of copies to make by counting the number of RSEs which match our list of targets and, in keeping with the diagram above, we use the Caltech cluster as the distribution hub. Rucio will identify RSEs which already host that dataset and refrain from copying to those locations; a sketch for verifying the resulting replica counts follows below.

Between the [data registration](https://git.ligo.org/james-clark/gwrucio/wikis/rucio-operations/data-registration) procedures and the replication rules described here, we essentially have all the tools required to archive and replicate bulk gravitational-wave data across a network of HTC computing centers.
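As a quick check that the fan-out completed, the listing commands used earlier can be combined to count each file's replicas against the number of target RSEs. This is a minimal sketch, assuming (as in the replica table above) that every RSE exposes `gsiftp` URLs:

```
# Count the replicas of each O1 file against the number of target RSEs
EXPECTED=$(rucio list-rses --expression ALL | wc -l)
for f in $(rucio list-dids 'O1:*' --filter type=file --short); do
    # Each replica row printed by list-file-replicas contains a gsiftp URL
    n=$(rucio list-file-replicas ${f} | grep -c gsiftp)
    echo "${f}: ${n}/${EXPECTED} replicas"
done
```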
To understand how to bring all of this together for continuous operation during an observing run, see [ER13](https://git.ligo.org/james-clark/gwrucio/wikis/applications/ER13) for a description of a live implementation.

## Transfer Monitoring and Debugging

A complete monitoring infrastructure has yet to be deployed, but it can be useful to:

* Tail the daemon logs on the rucio server (see the sketch after this list):
  * `/var/log/rucio/judge-evaluator`: evaluates/implements replication rules
  * `/var/log/rucio/conveyor-transfer-submitter`: submits transfer requests to the FTS server
* Use the FTS web-monitor at [`fts_url:8449/fts3/ftsmon`](https://fts3-pilot.cern.ch:8449/fts3/ftsmon). Note that these pages require SSL client authentication. LIGO CAs seem not to be recognised, but rucio admins can simply use the rucio server host certificate.
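For example, a minimal sketch for watching a single rule across both daemon logs, assuming the daemons include the rule id in their log lines (the id here is the one from the earlier example):

```
# Follow the rule-evaluation and transfer-submission logs on the rucio server,
# filtering for a single rule id (assumed to appear in the daemons' log lines)
RULE_ID=c5fbf01b35464c2ab285f3598b30eeaf
tail -f /var/log/rucio/judge-evaluator /var/log/rucio/conveyor-transfer-submitter \
    | grep --line-buffered ${RULE_ID}
```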