Building the Onedata@FR environment

During this practice, we will set up the spaces for the Onedata@FR environment, as specified in the spreadsheet.

You are responsible for the space(s) that are marked with a green cell in your provider's column. Throughout the practice, regularly verify that all granted supports are configured according to the outline — use your view-only access to the other clusters.

Storage import

In the Onedata@FR outline, some storage backends are marked as imported. This means that such a storage backend holds a pre-existing dataset that should be imported and exposed in a Onedata space. No data is copied in the process!

No more than one imported storage backend is allowed in a single space. Other space supports may be granted on non-imported storage backends.

A single imported storage can support only one space.


For a comprehensive explanation of the storage import, consult the documentation.

Storage import

Storage import should be used only in two setups:

  • There is a legacy dataset located on the storage, which should be imported into a space for processing. The dataset is not expected to change (outside of Onedata). A single dataset scan is sufficient.
  • Like above, but the data on the storage is to be modified directly by third parties, bypassing the Oneprovider interfaces, and the changes should be reflected in the supported space. Continuous dataset scan must be enabled.

In other cases (non-imported spaces), the storage backend is expected to be initially empty, and may only be accessed via Onedata (otherwise, the metadata consistency will be broken). Do not use the imported option unless necessary (see the above points), as it has a negative impact on the data access performance.

An imported storage backend can be specified as read-only — Oneprovider then denies any data modifications and does not attempt to create, modify, or delete files on it.

Storage backends configuration

Proceed to the storage backend configuration, consulting the Onedata@FR outline. Each person should create 3 different storage backends. For more details about storage backends, consult the documentation.

  1. Navigate to the Storage backends tab.
  2. Click Add storage backend in the top right corner.
  3. Fill out the form according to the outline (Oneproviders sheet).
  4. The next slides contain details for each storage backend type — fill in only the provided values and leave the rest empty (defaults).
  5. Make sure to add all 3 storage backends from the outline.

POSIX

  • Storage name: posix
  • Mount point: /hostfs/volumes/posix
  • Imported storage: false

Make sure that the mount point exists by running (on the host):

sudo mkdir -p /volumes/posix
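
Note that the storage mount point is /hostfs/volumes/posix while the directory is created at /volumes/posix; presumably the host filesystem is bind-mounted at /hostfs inside the Oneprovider container, hence the prefix. As an optional sanity check, confirm that the directory exists on the host:

ls -ld /volumes/posix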

NFS

  • Storage name: imported_nfs_ro
  • Imported storage: true
  • Readonly: true
  • Hostname: od.meso.umontpellier.fr
  • NFS version: v3
  • Volume:
    • Marseille: /od_nfs1
    • Toulouse: /od_nfs2
    • Nantes: /od_nfs3
    • Strasbourg: /od_nfs4
    • Lille: /od_nfs5
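
Optionally, verify that the NFS export is reachable from your VM before adding the backend (this assumes the nfs-common package, which provides the showmount tool):

sudo apt install nfs-common
showmount -e od.meso.umontpellier.fr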

Ceph (1/4)

If you have access to your own Ceph storage, go to slide (4/4).

Otherwise, follow the guide below to deploy a simple dockerized Ceph cluster.

A) If you have 2 VMs, deploy Ceph on a different one than the Oneprovider. You must set up a hostname first:

sudo hostname node

B) Otherwise, deploy it alongside the Oneprovider, but then --allow-fqdn-hostname must be added to the bootstrap command — see the reminder on slide (3/4).

Ceph (2/4)

Ceph setup requires an empty disk device. You may already have one (check with sudo lsblk). If no disk device is available, a loopback device can be used as shown below:

sudo -s
# create a 5 GiB file to back the loopback device
dd if=/dev/zero of=/home/ubuntu/ceph-osd-1 bs=1M count=5120
# attach it to the first free loop device
DEV=$(losetup -f)
losetup $DEV /home/ubuntu/ceph-osd-1
# create an LVM volume spanning the whole loop device
pvcreate $DEV
vgcreate ceph_vg $DEV
lvcreate -l '100%FREE' ceph_vg

Check ls /dev/ceph_vg — there should be an LVM volume (e.g. lvol0) which can be used as a block device in the following steps.

Ceph (3/4)

Export the BLOCK_DEV variable, setting it to the path of your block device:

export BLOCK_DEV=/dev/??  # (e.g. /dev/sdd or /dev/ceph_vg/lvol0)

Deploy Ceph via the cephadm tool:

sudo -s
cd /usr/local/bin
# download the standalone cephadm tool (Quincy release)
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
chmod +x cephadm
# ! add `--allow-fqdn-hostname` option if on the same VM as the Oneprovider
cephadm bootstrap --mon-ip $(hostname -i) --single-host-defaults |& tee c.log
cephadm install ceph-common
# a single OSD is enough for the workshop, so disable replication
ceph config set global osd_pool_default_size 1
ceph -s
# register the block device as an OSD and create a pool for Onedata
ceph orch daemon add osd $(hostname):$BLOCK_DEV
ceph osd pool create onedata 128
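
Before moving on, you can verify the deployment and collect the values needed for the storage backend form on the next slide (standard Ceph commands; the keyring path is the cephadm bootstrap default):

ceph osd pool ls                               # the onedata pool should be listed
grep key /etc/ceph/ceph.client.admin.keyring   # the Key value for the form
hostname -i                                    # the Monitor hostname value for the form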

Ceph (4/4)

  • Type: Ceph RADOS
  • Name: ceph
  • Username: client.admin
  • Key: (copy the key from /etc/ceph/ceph.client.admin.keyring)
  • Monitor hostname: (value of hostname -i on the VM with Ceph)
  • Cluster name: ceph
  • Pool name: onedata

S3 (1/3)

  • Storage name:
    • Paris, Montpellier, Rennes: imported_s3_ro
    • Lyon, Nice, Bordeaux: imported_s3
    • Marseille, Toulouse, Nantes, Strasbourg, Lille: s3
  • Storage path type:
    • Paris, Montpellier, Rennes, Lyon, Nice, Bordeaux: canonical
    • Marseille, Toulouse, Nantes, Strasbourg, Lille: flat
  • Hostname: https://s3-data.meso.umontpellier.fr:443
  • Imported storage:
    • Paris, Montpellier, Rennes, Lyon, Nice, Bordeaux: true
    • Marseille, Toulouse, Nantes, Strasbourg, Lille: false
  • Readonly:
    • Paris, Montpellier, Rennes: true
    • Lyon, Nice, Bordeaux: false

S3 (2/3)

  • Bucket name:
    • Paris: sm1-onedata-1
    • Marseille: sm1-onedata-7
    • Lyon: sm1-onedata-2
    • Toulouse: sm1-onedata-8
    • Nice: sm1-onedata-3
    • Nantes: sm1-onedata-9
    • Montpellier: sm1-onedata-4
    • Strasbourg: sm1-onedata-10
    • Bordeaux: sm1-onedata-5
    • Lille: sm1-onedata-11
    • Rennes: sm1-onedata-6

S3 (3/3)

  • Admin access key: provided on Slack during the session.
  • Admin secret key: provided on Slack during the session.
  • Block size [bytes]:
    • Paris, Montpellier, Rennes, Lyon, Nice, Bordeaux: 0
    • Marseille, Toulouse, Nantes, Strasbourg, Lille: default (leave empty)

Storage import practice

Now that the storage backends are properly configured, we can proceed to support spaces.

During this practice, you will work in the following groups:

  • Group 1: Paris, Marseille, Lyon, Toulouse
  • Group 2: Nice, Nantes, Montpellier, Strasbourg
  • Group 3: Bordeaux, Lille, Rennes

Storage import practice

Install AWS CLI:

sudo apt install awscli

Create a file ~/.aws/credentials with the content shared on Slack. It should look like the following:

[onedata-fr]
aws_access_key_id = ...
aws_secret_access_key = ...
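
You can check that the credentials work by listing one of the buckets from the outline (shown here for the Paris bucket; substitute any bucket you have access to):

aws --profile onedata-fr --endpoint-url https://s3-data.meso.umontpellier.fr \
    s3 ls s3://sm1-onedata-1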

Storage import practice

  1. Create the space marked with a green cell in your provider's column.

    @Rennes — do not create the space alpha-11p yet.

  2. Add the Workshops group to the space. Grant all possible privileges.

  3. Copy the space support token from the Add support page.

  4. Support the space with the storage backend according to the outline. Use the Spaces menu in the Oneprovider cluster GUI. Assign 5 GiB of quota.

Storage import practice

  1. Configure the storage import:
  • Mode:
    • auto — the storage backend will be automatically scanned and data will be imported from storage into the assigned space. Choose this one.
    • manual — the files must be registered manually by the space users. Registration of directories is not supported. This mode will be practiced in another chapter.
  • Max depth — maximum depth of filesystem tree that will be traversed during the scan. By default, it is 65535.
  • Synchronize ACL — enables import of NFSv4 ACLs — not covered during these workshops.
  • Detect modifications — if disabled, the storage will be treated as immutable; changes to already imported files will not be detected. Set according to the outline.
  • Detect deletions — determines whether deletions of already imported files will be detected and reflected — not covered during these workshops.
  • Continuous scan — indicates if the data on the storage should be scanned periodically. Set if "detect modifications" is enabled.
  • Scan interval — the period between subsequent scans — use defaults.

Storage import practice

  1. Once the support is granted, the import process should start automatically. The progress can be observed on three charts. Wait for the import to finish.
  2. Generate and send a support token to all your colleagues from your group.
  3. Using tokens received from your colleagues, support all other spaces according to the outline. You will observe that only non-imported storage backends are allowed.
  4. Navigate to the Data tab and open the file browser for your space. It should include the preexisting dataset that was imported into the space.
  5. Select a space supported with a read-only storage backend:
  • Group 1: beta-4p
  • Group 2: kappa-4p
  • Group 3: omega-3p

Storage import practice

  1. List the S3 bucket directly to see its contents:
  • Group 1: BUCKET=sm1-onedata-1

  • Group 2: BUCKET=sm1-onedata-4

  • Group 3: BUCKET=sm1-onedata-6

    aws --profile onedata-fr --endpoint-url https://s3-data.meso.umontpellier.fr s3 ls s3://$BUCKET
    
  2. Compare with what you see in the space's file browser.

  3. Try to upload a file — it should fail on Paris, Montpellier, and Rennes (as those providers support the space with read-only storage backends). For other providers, the upload should succeed.

  4. If you want, list the contents of the bucket again to see that the Oneprovider didn't do anything nasty.

Modification detection

  1. To observe how modification detection works, create a file directly on the S3 storage backend. Reuse the value of BUCKET from the previous exercise and use your name as the value of the YOUR_NAME variable.

    YOUR_NAME=???
    echo "Bonjour" > $YOUR_NAME
    aws --profile onedata-fr --endpoint-url https://s3-data.meso.umontpellier.fr \
        s3 cp $YOUR_NAME s3://$BUCKET/$YOUR_NAME
    
  2. Go to the cluster GUI of the provider with the read-only storage for your group (Paris, Montpellier, or Rennes), navigate to the Spaces tab, find the space, and switch to Storage import at the top of the page.

  3. Examine the statistics of the last storage import run. If a continuous scan has already run, you should see some modified files; if not, you can start one by pressing the Start scan button, or simply wait. Once the scan is finished, the new files should be visible in the space — verify that.

Write-enabled imported storage

  1. Switch to a space supported by a non-read-only storage backend:
    • Group 1: delta-4p (BUCKET=sm1-delta)
    • Group 2: theta-4p (BUCKET=sm1-theta)
    • Group 3: tau-3p (BUCKET=sm1-tau)
  2. Try to upload a file into the space root directory — it should succeed.
  3. List the S3 bucket and see that the file has appeared.
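
For reference, the listing uses the same command pattern as in the previous exercise, with the group-specific bucket (Group 1 shown; substitute your group's value):

BUCKET=sm1-delta
aws --profile onedata-fr --endpoint-url https://s3-data.meso.umontpellier.fr s3 ls s3://$BUCKET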

Note: according to the outline, this space was configured without the detect modifications option, so changes made directly on the storage backend are not reflected in the space. Modifying files directly on the storage backend without storage import and the detect modifications option is forbidden, as it would result in metadata inconsistencies; the Oneprovider admin must ensure that this does not happen.

Space alpha-11p

@Rennes — create the alpha-11p space and send a support token to all the other participants, so they can support the space (check the outline for assigned backends).

Don't forget to support the space yourself as well.

Behold the Onedata@FR environment

Take a minute to look around the environment that was built during the practice. Look into different spaces.

Feel free to test some operations on the spaces, or to spark discussions about what has happened in this chapter.

Next chapter:

Data distribution, replica management (transfers), and QoS — practice