Other data interfaces

In addition to Web GUI and REST data access, Onedata provides several other interfaces for accessing and uploading data:

  • Oneclient - FUSE-based client providing POSIX-like access to users' data spaces
  • OneS3 - AWS S3-compatible server exposing user data spaces over the S3 protocol
  • OnedataFS - Python library that implements the PyFilesystem interface for convenient integration with Python applications

Oneclient - overview

  • Oneclient is a FUSE-based command-line client that mounts your spaces in the local filesystem tree

  • Oneclient does not keep any data on the local disk; all data is read from and written to the storage backend on the fly

    • This allows processing of data sets and files much larger than the local disk space
    • A small amount of data is buffered in memory to speed up read and write operations that use a small block size
  • Supports POSIX extended attributes, which are mapped to Onedata basic metadata

  • Packages are provided for several major Linux platforms including Ubuntu (Xenial, Bionic, Focal, Jammy) and CentOS 7

  • It can also be installed using the Anaconda package manager
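Because a mounted space behaves like a regular directory, a file larger than local disk or memory can be processed by streaming it in chunks. The sketch below is only an illustration under that assumption: it uses plain Python file I/O, and the mount path in the usage note is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 of a file by reading it chunk by chunk.

    Only one chunk is held in memory at a time, so the file size is not
    limited by local RAM or disk when the path points into a mounted space.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file("/mnt/onedata/MySpace/large.dat")` would stream the file through the mount point, where both the mount point and the file name are placeholders.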

 

See the documentation here.

Oneclient - DirectIO vs ProxyIO

  • For a given space, Oneclient can work in two modes:

    • DirectIO - Oneclient reads and writes data directly on the physical storage backend, which requires that the Oneclient process has direct network access to that backend (e.g. an S3 bucket or a Ceph pool)
    • ProxyIO - Oneclient does not need direct access to the storage backend, as all read and write operations are forwarded through Oneprovider
  • DirectIO mode should be preferred whenever possible, as it offers much better performance and scalability

  • By default, Oneclient automatically tries to detect whether it can access the storage backend using DirectIO; if it cannot, it falls back to ProxyIO (although this can take some time)

    • This behaviour can be controlled using command-line flags --force-direct-io and --force-proxy-io
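As an illustration of how the two flags above could be used from a script, here is a minimal sketch that assembles a oneclient command line for a chosen I/O mode. Only `--force-direct-io` and `--force-proxy-io` come from this slide; the `-H` (provider host) and `-t` (access token) options are assumed here and should be checked against the Oneclient documentation, and the host, token and mount point are placeholders.

```python
def build_oneclient_command(provider_host, access_token, mountpoint, io_mode="auto"):
    """Return the argument list for mounting spaces with oneclient.

    io_mode: "auto" leaves mode detection to oneclient, "direct" adds
    --force-direct-io, "proxy" adds --force-proxy-io.
    """
    # Assumption: -H and -t select the Oneprovider host and the access token.
    command = ["oneclient", "-H", provider_host, "-t", access_token]
    if io_mode == "direct":
        command.append("--force-direct-io")
    elif io_mode == "proxy":
        command.append("--force-proxy-io")
    elif io_mode != "auto":
        raise ValueError("io_mode must be 'auto', 'direct' or 'proxy'")
    command.append(mountpoint)
    return command
```

The resulting list can be passed to `subprocess.run(...)`. Forcing a mode skips the automatic detection, so `"direct"` is only appropriate when the storage backend is known to be reachable from the client host.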

 

See the documentation here.

[Screenshot: Oneclient - DirectIO vs ProxyIO]

OneS3 - overview

  • OneS3 is an S3-compatible server that exposes Onedata spaces over the S3 interface
  • Based on Oneclient, it provides high-performance, scalable DirectIO access to data, while metadata operations are performed on a specific Oneprovider
  • Each OneS3 instance can be connected to only one Oneprovider, and can therefore serve through DirectIO only the spaces supported by that Oneprovider
    • For scalability, several OneS3 instances can be connected to a single Oneprovider
    • OneS3 is stateless, allowing for seamless scaling up or down
  • The same rules for DirectIO and ProxyIO apply as for Oneclient
  • OneS3 is currently in beta.

OnedataFS - overview

  • OnedataFS is a Python library for accessing the Onedata virtual file system; it is an alternative to Oneclient, which offers a POSIX interface
  • It is a PyFilesystem plugin, so Onedata can be accessed in the same way as any other supported filesystem
  • OnedataFS is based on Oneclient and supports both ProxyIO and DirectIO modes
    • For this reason it cannot be installed simply using pip: it requires the same dependencies as Oneclient and must be installed from native packages
    • A simpler implementation based on the Oneprovider REST interface is also available and can be installed directly using pip - fs.onedatarestfs
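Since OnedataFS is a PyFilesystem plugin, code written against the generic PyFilesystem methods (`listdir`, `isdir`, `getsize`) should work on it unchanged. The sketch below illustrates this; the import path `fs.onedatafs` and the `OnedataFS(host, token)` constructor are assumptions to be verified against the OnedataFS documentation, and the host, token and space name are placeholders.

```python
def file_sizes(filesystem, path="/"):
    """Return {name: size_in_bytes} for regular files directly under path.

    Works with any object exposing the PyFilesystem methods listdir(),
    isdir() and getsize().
    """
    sizes = {}
    for name in filesystem.listdir(path):
        full_path = path.rstrip("/") + "/" + name
        if not filesystem.isdir(full_path):
            sizes[name] = filesystem.getsize(full_path)
    return sizes


def open_onedatafs(provider_host, access_token):
    """Open an OnedataFS instance (assumed import path and constructor)."""
    # Imported lazily: the package needs the native Oneclient dependencies.
    from fs.onedatafs import OnedataFS
    return OnedataFS(provider_host, access_token)
```

A call such as `file_sizes(open_onedatafs("provider.example.org", "ACCESS_TOKEN"), "/MySpace")` would then list the files of a space named MySpace, assuming spaces appear as top-level directories.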

 

See the documentation here.

Next chapter:

Other data management interfaces (Oneclient, OneS3, OnedataFS) — practice