Data Replication

Replication Rules

Files, datasets and containers are replicated according to replication rules: syntactically simple statements about how many copies of a given DID should exist across a selection of RSEs.
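With the command-line client a rule takes the schematic form below; the scope, name, copy count and RSE expression are placeholders to be filled in for a real rule (concrete examples follow later on this page):

(gwrucio) $ rucio add-rule <scope>:<name> <number-of-copies> '<RSE expression>'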

This page discusses some examples of simple replication rules and checking the results. For this discussion, we assume some dataset has been registered using these steps.

See also: rucio documentation for replication rules.

RSE Expressions

The selection of RSEs a DID should be replicated to is determined by an RSE Expression. When we set up our RSEs, we defined a series of attributes which can be used to group RSEs together. To list all RSEs, list those with attribute ALL:

(gwrucio) $ rucio list-rses --expression ALL

Similarly, find all RSEs which are not official archives (i.e., our writable test RSEs):

(gwrucio) $ rucio list-rses --expression ARCHIVE=0

Finally, combine constraints to find all LIGO-lab RSEs which are not official archives:

(gwrucio) $ rucio list-rses --expression 'ARCHIVE=0&LIGO_LAB'

Tip: rucio list-rses --expression is particularly useful for checking that replication rules will behave as expected and for determining how many copies of a DID should be specified.
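For example, since list-rses prints one RSE per line, piping its output through wc -l gives the number of copies a rule over that expression could sensibly request (the same trick is used further down this page):

(gwrucio) $ rucio list-rses --expression 'ARCHIVE=0&LIGO_LAB' | wc -l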

Replication To A Single RSE

In an earlier example we registered a dataset ER9:H-H1_HOFT_C00. To replicate that dataset from LIGO-WA-ARCHIVE to the RSE at LIGO-CIT (a writable RSE on the CIT cluster), we create the rule:

(gwrucio) $ rucio add-rule ER9:H-H1_HOFT_C00 1 LIGO-CIT

rucio add-rule returns the ID of the newly created rule; use that ID to get the rule's details:

(gwrucio) $ rucio rule-info c5fbf01b35464c2ab285f3598b30eeaf
Id:                         c5fbf01b35464c2ab285f3598b30eeaf
Account:                    root
Scope:                      ER9
Name:                       H-H1_HOFT_C00
RSE Expression:             LIGO-CIT
Copies:                     1
State:                      OK
Grouping:                   DATASET
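If the rule ID has not been kept, rucio list-rules can also look up the rules attached to a DID directly by name, e.g.:

(gwrucio) $ rucio list-rules ER9:H-H1_HOFT_C00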

Once rules have been entered into rucio, daemons regularly evaluate and implement the replication policies and submit the resulting transfer requests to the FTS server.
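One crude but convenient way to watch this happen is simply to poll the rule until its State reaches OK (here reusing the rule ID from above; the polling interval is arbitrary):

(gwrucio) $ watch -n 60 rucio rule-info c5fbf01b35464c2ab285f3598b30eeaf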

After a short time, the replicas show up at the targeted location:

(gwrucio) $ for file in $(rucio list-dids 'ER9:*' --filter type=file --short); do rucio list-file-replicas $file; done
| SCOPE   | NAME                              | FILESIZE   | ADLER32   | RSE: REPLICA                                                                                                                                    |
| ER9     | H-H1_HOFT_C00-1151848448-4096.gwf | 319.162 MB | c3e2cd8a  | LIGO-CIT: gsi                |
| ER9     | H-H1_HOFT_C00-1151848448-4096.gwf | 319.162 MB | c3e2cd8a  | LIGO-WA-ARCHIVE: gsi |

This command loops through the list of ER9 files and retrieves information on the replicas available for each file.
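Note that rucio list-file-replicas also resolves collections, so a single call on the dataset DID should give the same information without the shell loop:

(gwrucio) $ rucio list-file-replicas ER9:H-H1_HOFT_C00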

Replicating To Multiple RSEs

We can now replicate this data to the rest of our data storage network. The LSC/Virgo have traditionally used a hub-spoke data distribution model: data is replicated from the observatories to each of the other observatories and the Caltech cluster. From those locations, data is then replicated to other major compute centers and higher-tier centers.

Traditional hub-spoke model for LIGO/Virgo data distribution. The image illustrates the distribution topology; the sites in the network have since changed.

Consider the example of distributing raw HOFT from the LIGO Hanford observatory taken during O1.

  • Assume a dataset O1:H1_HOFT_C00 has been registered at the LIGO-WA-ARCHIVE RSE following these procedures.

  • Set a rule to replicate LHO data to the Livingston observatory (LIGO-LA) and to Caltech (LIGO-CIT):

$ rucio add-rule O1:H1_HOFT_C00 2 'LIGO-LA|LIGO-CIT' \
  --source-replica-expression LIGO-WA-ARCHIVE

i.e., create 2 copies of O1:H1_HOFT_C00 across the 2 RSEs in the set {LIGO-LA, LIGO-CIT}, using LIGO-WA-ARCHIVE as the source of the replication.

  • Finally, define a rule to distribute that dataset over all RSEs:

$ rucio add-rule O1:H1_HOFT_C00 \
    $(rucio list-rses --expression ALL | wc -l) ALL \
    --source-replica-expression LIGO-CIT

where we have determined the number of copies to make by counting the number of RSEs which match our list of targets and, in keeping with the diagram above, we use the Caltech cluster as the distribution hub. Rucio will identify RSEs which already host that dataset and refrain from copying to those locations.
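As a sanity check, the same list-rses and list-rules commands shown earlier can confirm that the expressions match the intended sites and that all rules are now attached to the dataset:

# confirm the first expression matches exactly the two intended RSEs
$ rucio list-rses --expression 'LIGO-LA|LIGO-CIT'

# count the RSEs the catch-all rule will span
$ rucio list-rses --expression ALL | wc -l

# list every rule now attached to the dataset
$ rucio list-rules O1:H1_HOFT_C00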

Between the data registration procedures and the replication rules described here, we essentially have all the tools required to archive and replicate bulk gravitational wave data across a network of HTC computing centers. To understand how to bring all of this together for continuous operation during an observing run, see ER13 for a description of a live implementation.

Transfer Monitoring and Debugging

A complete monitoring infrastructure has yet to be deployed, but it can be useful to:

  • Tail the daemon logs on the rucio server:

    • /var/log/rucio/judge-evaluator: evaluates/implements replication rules

    • /var/log/rucio/conveyor-transfer-submitter: submits transfer requests to the FTS server

  • Use the FTS web-monitor at fts_url:8449/fts3/ftsmon. Note that these pages require SSL client authentication. LIGO CAs seem not to be recognised, but rucio admins can simply use the rucio server host certificate.