Other data interfaces

This training session will cover usage examples for the following interfaces:

  • Oneclient
  • OneS3
  • OnedataFS

Oneclient

Oneclient practical exercises will include the following topics:

  • Installation, creating a mountpoint and releasing a mountpoint
  • Basic read and write examples
  • Extended attributes
  • Handling links
  • ProxyIO vs DirectIO
  • Accessing remote files
  • Customized data mounts

Oneclient - preparation

  1. Install tools needed for the exercises:

    apt install -y curl gnupg2 fuse xattr imagemagick awscli
    
  2. Export the name of your user directory, e.g.:

    export USER_DIR=joe
    

Oneclient - installation

  1. Install oneclient:

    curl -sS http://get.onedata.org/oneclient.sh | bash
    
  2. Verify that it worked:

    oneclient --version
    
  3. Create a mountpoint:

    mkdir /mnt/oneclient
    
  4. Create a oneclient access token in the GUI and mount oneclient:

    oneclient -H <ONEPROVIDER_HOST> -t <ACCESS_TOKEN> /mnt/oneclient
    

Oneclient - installation

  1. Check whether the mountpoint contains Onedata spaces:

    ls /mnt/oneclient
    
  2. Unmount oneclient:

    fusermount -uz /mnt/oneclient
    
  3. Check the mountpoint again:

    ls /mnt/oneclient
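
The before/after check in this exercise can also be scripted. The following is a minimal sketch using the standard `os.path.ismount` function; the helper name `is_mounted` is hypothetical, and the default path is simply the mountpoint used in this training. It assumes the check is run by the same user who mounted oneclient.

```python
import os


def is_mounted(path="/mnt/oneclient"):
    """Return True if `path` is currently a mount point, False otherwise."""
    return os.path.ismount(path)


if __name__ == "__main__":
    state = "mounted" if is_mounted() else "not mounted"
    print(f"/mnt/oneclient is {state}")
```

Running it before and after `fusermount -uz` should report the two different states.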
    

Oneclient - basic usage

  1. Create a oneclient access token in the GUI and mount oneclient:

    export ONECLIENT_PROVIDER_HOST=...
    export ONECLIENT_ACCESS_TOKEN=...
    oneclient --force-proxy-io /mnt/oneclient
    cd /mnt/oneclient/alpha-11p/$USER_DIR
    
  2. Find a specific file in the space and change into that directory:

    find . -name arthouse.jpg
    cd ./FirstDataset/Photos
    
  3. Generate a thumbnail for the file:

    convert arthouse.jpg -auto-orient -thumbnail 5% -unsharp 0x.5 arthouse_thumbnail.gif
    

    Check in the Web GUI that the thumbnail exists.

Oneclient - extended attributes

  1. List all extended attributes for a file:

    xattr -l arthouse_thumbnail.gif
    

    By default, a file has several attributes starting with the prefix org.onedata., which expose useful Onedata file properties such as its File ID, access type, or replication status.

  2. Set a new attribute:

    xattr -w license CC-0 arthouse_thumbnail.gif
    xattr -p license arthouse_thumbnail.gif
    

    Check the basic metadata for this file in the Web GUI.

  3. For convenience, every file can also be accessed directly by its File ID. This is useful, for instance, when combining REST requests with filesystem operations:

    md5sum "/mnt/oneclient/.__onedata__file_id__$(xattr -p org.onedata.file_id arthouse_thumbnail.gif)"
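
The File ID path used above can also be assembled from a script. This is a sketch only: the function names `path_by_file_id` and `read_file_id` are hypothetical, the path prefix is copied from the command above, and `read_file_id` shells out to the same `xattr -p` call used in this exercise, so it assumes the `xattr` tool is installed.

```python
import os
import subprocess

# Prefix copied from the md5sum example in this exercise.
FILE_ID_PREFIX = ".__onedata__file_id__"


def path_by_file_id(mountpoint, file_id):
    """Build the special path under which a file is reachable by its File ID."""
    return os.path.join(mountpoint, FILE_ID_PREFIX + file_id)


def read_file_id(path):
    """Read the File ID using the same `xattr -p` call as in the exercise."""
    result = subprocess.run(
        ["xattr", "-p", "org.onedata.file_id", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, `path_by_file_id("/mnt/oneclient", read_file_id("arthouse_thumbnail.gif"))` would yield the path passed to `md5sum` above.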
    

Oneclient - links

  1. Create a hard link to a file:

    ln arthouse.jpg arthouse_lnk.jpg
    

    Check in the GUI that the link exists and has a hard link tag.

    NOTE: By default, the stat command does not report correct hard link counts in Oneclient, because caching this information carries a performance cost. Accurate counts can be enabled by passing the --hard-link-count option to oneclient.

  2. Create a symbolic link to a file:

    ln -s arthouse.jpg arthouse_symlnk.jpg
    file arthouse_symlnk.jpg
    

    Download arthouse_symlnk.jpg from the Web GUI.

Oneclient - ProxyIO and DirectIO

  1. Remount oneclient with --force-proxy-io mode:

    cd ~ && fusermount -uz /mnt/oneclient
    oneclient --force-proxy-io -H <ONEPROVIDER_WITH_S3> /mnt/oneclient
    cd /mnt/oneclient/alpha-11p/$USER_DIR/
    mkdir tmp && cd tmp
    
  2. Ensure that oneclient is mounted in ProxyIO mode:

    echo TEST > test.txt
    xattr -p org.onedata.access_type test.txt
    

    The output should show: proxy.

  3. Check write and read performance using dd:

    dd if=/dev/zero of=test250MB.dat bs=1MB count=250 status=progress
    dd if=test250MB.dat of=/dev/null bs=1MB count=250 status=progress
    rm test250MB.dat
    

Oneclient - ProxyIO and DirectIO

  1. Remount oneclient with --force-direct-io mode:

    cd ~ && fusermount -uz /mnt/oneclient
    oneclient --force-direct-io -H <ONEPROVIDER_WITH_S3> /mnt/oneclient
    cd /mnt/oneclient/alpha-11p/$USER_DIR/tmp
    
  2. Ensure that oneclient is mounted in DirectIO mode:

    echo TEST > test.txt
    xattr -p org.onedata.access_type test.txt
    

    The output should show: direct.

  3. Check write and read performance using dd:

    dd if=/dev/zero of=test250MB.dat bs=1MB count=250 status=progress
    dd if=test250MB.dat of=/dev/null bs=1MB count=250 status=progress
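
To compare the two modes from a script rather than by reading dd's output, a rough write benchmark could look like the sketch below. The function name `write_throughput` is hypothetical. Unlike the dd commands above, it calls `fsync` before stopping the clock, so its numbers are not directly comparable with those reported by dd.

```python
import os
import time


def write_throughput(path, total_bytes=250_000_000, block_size=1_000_000):
    """Write `total_bytes` of zeros to `path` and return throughput in MB/s."""
    block = bytes(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last chunk may be shorter than a full block.
            chunk = block[: total_bytes - written]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time needed to push data out
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / 1_000_000 / elapsed
```

Calling `write_throughput("test250MB.dat")` once under each mount mode gives two figures measured the same way.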
    

Oneclient - Accessing remote files

Oneclient transparently provides access to data replicas held by other Oneproviders that support the same space: blocks are replicated to the current Oneprovider as they are accessed.

  1. Enter another user's FirstDataset/tmp directory and check the extended attributes of the test250MB.dat file:

    cd /mnt/oneclient/alpha-11p/<OTHER_USER>/FirstDataset/tmp
    xattr -l test250MB.dat
    

    The org.onedata.file_blocks and org.onedata.replication_progress attributes should be empty.

  2. Now read a selected block from the middle of the file:

    dd if=test250MB.dat of=/dev/null bs=1024 count=1 skip=100000
    xattr -l test250MB.dat
    

    The org.onedata.file_blocks and org.onedata.replication_progress attributes should now show non-zero values.
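
As a quick check of where that dd read lands in the file, the arithmetic can be worked out as follows. The helper name `dd_read_offset` is hypothetical; the file size follows from the earlier `bs=1MB count=250` command, where GNU dd interprets MB as 10^6 bytes.

```python
def dd_read_offset(block_size, skip_blocks):
    """Byte offset at which dd starts reading for given bs= and skip= values."""
    return block_size * skip_blocks


file_size = 250 * 1_000_000             # bs=1MB count=250
offset = dd_read_offset(1024, 100_000)  # bs=1024 skip=100000

print(offset)                              # 102400000
print(round(100 * offset / file_size, 2))  # 40.96
```

The single 1024-byte block is therefore read roughly 41% of the way into the file.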

Oneclient - Customized data mounts

Mounting a specific space or spaces only:

  1. Remount oneclient with a specific space only:

    cd ~ && fusermount -uz /mnt/oneclient
    oneclient --space alpha-11p /mnt/oneclient
    cd /mnt/oneclient && ls
    

    There should only be 1 space in the mountpoint.

  2. Sometimes it is necessary to create a mountpoint that can be accessed by other users on the same machine. This is enabled with an additional option:

    oneclient -o allow_other /mnt/oneclient
    

Oneclient - Customized data mounts

Using a token with a path caveat

  1. In the Web GUI, create a token with a path caveat: alpha-11p/$USER_DIR/FirstDataset/Photos/arthouse.jpg

  2. Remount oneclient using this token:

    cd ~ && fusermount -uz /mnt/oneclient
    oneclient -t <CUSTOM_TOKEN> /mnt/oneclient
    cd /mnt/oneclient && find . -type f
    

    There should only be 1 file in the mountpoint.

OneS3

OneS3 practical exercises will include the following topics:

  • Deployment and configuration of a OneS3 instance using Docker
  • Bucket creation and data access
  • Access to Onedata space in read-only mode

OneS3 - configuration and deployment

  1. Create a file /etc/ones3/ones3.conf with the following content:

    # Onezone address
    onezone_host=demo.onedata.org
    
    # Oneprovider address
    provider_host=<YOUR_ONEPROVIDER_HOST>
    
    # Whether ProxyIO mode should be forced (disables DirectIO completely)
    force_proxy_io=1
    
    # Disable read and write buffering
    no_buffer=true
    
    # ID of the storage on which Onedata spaces for new buckets should be supported
    ones3_support_storage_id=<ALPHA11P_STORAGE_ID>
    
    # OneS3 HTTP port
    ones3_http_port=9093
    
    # Onepanel password used to enable space support; it can be obtained from Onedatify:
    #   cat /opt/onedata/onedatify/config | grep admin_password
    ones3_support_storage_credentials=onepanel:<ADMIN_PASSWORD>
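
Before starting the service, it can help to confirm that no `<...>` placeholder was left in the file. The sketch below is not OneS3's own parser; it only assumes the simple `key=value` layout with `#` comments shown above, and the function names `parse_conf` and `unfilled` are hypothetical.

```python
def parse_conf(text):
    """Parse `key=value` lines, skipping blank lines and `#` comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def unfilled(options):
    """Return the keys whose value still contains a `<...>` placeholder."""
    return sorted(k for k, v in options.items() if "<" in v and ">" in v)


if __name__ == "__main__":
    sample = "onezone_host=demo.onedata.org\nprovider_host=<YOUR_ONEPROVIDER_HOST>\n"
    print(unfilled(parse_conf(sample)))  # ['provider_host']
```

To check the real file, pass the contents of /etc/ones3/ones3.conf to `parse_conf` instead of the sample string.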
    

OneS3 - configuration and deployment

  1. Run the OneS3 service in a separate terminal window:

    docker run -it -p 9093:9093 -v /etc/ones3:/etc/ones3 onedata/ones3:21.02.3-develop
    

    S3 access logs will be printed on the terminal.

  2. Create a file ~/.aws/credentials with the following content:

    [ones3-demo]
    aws_access_key_id = <FULL_ACCESS_TOKEN>
    aws_secret_access_key = k3ZDJkN2NoN2EzMSM5MDQ5YmQ5MmRkYWZiMjVjMjkzZGUyNTRlMTZkYTU2M2NoN2EzMQVpZCMyNmQ3OWE4Yjk0OWQyZDI5MDU1N2E2ODk3Mj
    

    NOTE: A full access token is required to create new buckets, because each new bucket requires creating a new Onedata space in Onezone.

OneS3 - configuration and deployment

  1. List buckets (spaces) using awscli:

    aws --profile ones3-demo --endpoint-url http://127.0.0.1:9093 s3 ls
    
  2. List existing bucket contents:

    aws --profile ones3-demo --endpoint-url http://127.0.0.1:9093 s3 ls --recursive s3://alpha-11p/$USER_DIR
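
Since every awscli call in these exercises repeats the same profile and endpoint, a script can factor them out. This is a sketch only: the helper names `ones3_cmd` and `run_ones3` are hypothetical, the defaults are the profile and endpoint used in this training, and `run_ones3` assumes awscli is installed.

```python
import subprocess


def ones3_cmd(*s3_args, profile="ones3-demo", endpoint="http://127.0.0.1:9093"):
    """Build the awscli argument list for an `s3` subcommand against OneS3."""
    return ["aws", "--profile", profile, "--endpoint-url", endpoint, "s3", *s3_args]


def run_ones3(*s3_args, **kwargs):
    """Run the command and return its exit status (0 means success)."""
    return subprocess.run(ones3_cmd(*s3_args, **kwargs)).returncode
```

For example, `run_ones3("ls")` corresponds to the bucket listing in step 1, and `run_ones3("ls", "--recursive", "s3://alpha-11p/joe")` to step 2 for a user directory named joe.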
    

OneS3 - bucket creation and data access

  1. Create a new bucket using awscli, which will create and automatically support a new space:

    aws --profile ones3-demo --endpoint-url http://127.0.0.1:9093 s3 mb s3://$USER_DIR-s3
    aws --profile ones3-demo --endpoint-url http://127.0.0.1:9093 s3 ls
    
  2. Upload an example file:

    dd if=/dev/urandom of=/tmp/test100MB.dat bs=1MB count=100
    aws --profile ones3-demo --endpoint-url http://127.0.0.1:9093 s3 cp /tmp/test100MB.dat s3://$USER_DIR-s3/test100MB.dat
    

    Check the file and its metadata in the GUI.

OneS3 - read-only data access

  1. Create a read-only Oneclient access token in the GUI and add it to ~/.aws/credentials:

    [ones3-demo-ro]
    aws_access_key_id = <READONLY_ACCESS_TOKEN>
    aws_secret_access_key = k3ZDJkN2NoN2EzMSM5MDQ5YmQ5MmRkYWZiMjVjMjkzZGUyNTRlMTZkYTU2M2NoN2EzMQVpZCMyNmQ3OWE4Yjk0OWQyZDI5MDU1N2E2ODk3Mj
    
  2. Perform some read-only operations on a bucket:

    aws --profile ones3-demo-ro --endpoint-url http://127.0.0.1:9093 s3 ls s3://$USER_DIR-s3
    aws --profile ones3-demo-ro --endpoint-url http://127.0.0.1:9093 s3 cp s3://$USER_DIR-s3/test100MB.dat /tmp/test100MB-2.dat
    
  3. Try to perform some write operations on a bucket; with a read-only token these should be rejected:

    aws --profile ones3-demo-ro --endpoint-url http://127.0.0.1:9093 s3 rm s3://$USER_DIR-s3/test100MB.dat
    echo TEST > /tmp/test.txt
    aws --profile ones3-demo-ro --endpoint-url http://127.0.0.1:9093 s3 cp /tmp/test.txt s3://$USER_DIR-s3/test.txt
    

OnedataFS

OnedataFS practical exercises will include the following topics:

  • Creating a OnedataFS instance connected to Oneprovider from a Python console
  • Basic data access using PyFilesystem API
  • Advanced operations specific to Onedata

OnedataFS - basic data operations

  1. First, start the oneclient Docker image, in which all necessary dependencies are already set up:

    docker run --privileged --entrypoint /bin/bash -it onedata/oneclient:21.02.3-develop
    
  2. Now open a Python 3 console and set up some variables:

    python3
    
    >>> onedata_provider_host = '<ONEPROVIDER_HOST>'
    >>> onedata_access_token = '<ACCESS_TOKEN>'
    >>> username = '<USERNAME>'
    

OnedataFS - basic data operations

  1. Now create an instance of the OnedataFS filesystem class:

    >>> from fs.onedatafs import OnedataFS
    >>> odfs = OnedataFS(onedata_provider_host, onedata_access_token, force_proxy_io=True)
    
  2. Get the list of spaces:

    >>> odfs.listdir('')
    
  3. Open a specific space directory:

    >>> space = odfs.opendir(f'/alpha-11p/{username}')
    >>> space.tree()
    >>> space.listdir('FirstDataset')
    

OnedataFS - basic data operations

  1. Create a new file:

    >>> space.writetext('tmp/onedatafs.txt', 'TEST')
    

    Check in the GUI that the file exists.

  2. Get file contents:

    >>> space.readtext('tmp/onedatafs.txt')
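
The write in step 1 relies on the tmp directory created during the Oneclient exercises. A slightly more defensive variant is sketched below. It uses only standard PyFilesystem methods (`makedirs`, `writetext`, `readtext`); the function name `write_and_verify` is hypothetical, and it is an assumption that OnedataFS behaves like any other PyFilesystem implementation for these calls.

```python
def write_and_verify(filesystem, path, text):
    """Write `text` to `path` on a PyFilesystem object and read it back."""
    parent = path.rsplit("/", 1)[0] if "/" in path else ""
    if parent:
        # recreate=True makes this a no-op when the directory already exists.
        filesystem.makedirs(parent, recreate=True)
    filesystem.writetext(path, text)
    return filesystem.readtext(path) == text
```

In the console session above it would be called as `write_and_verify(space, 'tmp/onedatafs.txt', 'TEST')`.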
    

OnedataFS - advanced operations

  1. Show replication map of data blocks for a specific file:

    >>> space.location('FirstDataset/Photos/arthouse.jpg')
    

    The result contains a map where keys represent Oneprovider IDs and values are lists of block ranges replicated on that provider.

  2. Access extended attributes from Python:

    >>> space.listxattr('FirstDataset/Photos/arthouse_thumbnail.gif')
    >>> space.getxattr('FirstDataset/Photos/arthouse_thumbnail.gif', 'license').decode('utf-8')
    
  3. Finally, close the OnedataFS instance:

    >>> odfs.close()
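
The two extended-attribute calls from step 2 can be combined to collect all attributes of a file at once. This is a sketch under assumptions: it presumes `getxattr` returns bytes, as the `.decode('utf-8')` call in step 2 suggests, and it accepts attribute names as either `str` or `bytes` because the return type of `listxattr` is not stated here. The function name `xattrs_as_dict` is hypothetical.

```python
def xattrs_as_dict(filesystem, path):
    """Return all extended attributes of `path` as a {name: text} dictionary."""
    attrs = {}
    for name in filesystem.listxattr(path):
        key = name.decode("utf-8") if isinstance(name, bytes) else name
        raw = filesystem.getxattr(path, name)
        attrs[key] = raw.decode("utf-8", errors="replace")
    return attrs
```

It has to be called before the instance is closed, for example `xattrs_as_dict(space, 'FirstDataset/Photos/arthouse_thumbnail.gif')`.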
    
