# OnedataFS (Python)

OnedataFS is a Python library for accessing the Onedata virtual file system. It is an alternative to Oneclient, which offers a POSIX interface. As a PyFilesystem2 plugin, OnedataFS lets you work with Onedata in the same way as with any other supported filesystem.

# Installation

The OnedataFS Python library is a wrapper around a low-level C++ OnedataFS library, which accesses data managed by Onedata through a high-performance protocol and, where possible, accesses storage resources directly (see Oneclient direct I/O). The downside is that, in addition to the Python library, the C++ OnedataFS library must be installed using platform-specific packages.

NOTE

If you need a pure Python library, and performance is not critical, check out onedatarestfs.

# Ubuntu

NOTE

Currently, this package is only provided for Ubuntu Focal.

$ curl -sSO https://get.onedata.org/oneclient.sh

$ pip3 install fs
$ sh oneclient.sh python3-fs-plugin-onedatafs

# Anaconda

OnedataFS can be installed using Anaconda, from the official Onedata conda repository:

NOTE

Currently, for release 21.02.*, only Python 3.9 is supported.

$ conda install -c onedata -c conda-forge python=3.9 fs.onedatafs

or, to install a specific version of fs.onedatafs:

$ conda install -c onedata -c conda-forge python=3.9 fs.onedatafs=21.02.8

# Docker

In addition to installing the OnedataFS packages, it is possible to use our oneclient Docker image, which provides OnedataFS Python packages and all necessary dependencies:

$ docker run --rm --entrypoint /usr/bin/python3 -it onedata/oneclient:21.02.8
Python 3.8.10 (default, Nov  7 2024, 13:10:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

From this prompt, follow the instructions in the usage section.
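
Whichever installation method is used, it can help to confirm that the plugin is importable before going further. The following is a minimal sketch: the helper name `onedatafs_available` is hypothetical, and the only detail taken from this document is the module path `fs.onedatafs`. It only checks that the Python module can be located, not that the underlying C++ library loads correctly.

```python
import importlib.util


def onedatafs_available():
    """Return True if the fs.onedatafs module can be located (hypothetical helper)."""
    try:
        return importlib.util.find_spec("fs.onedatafs") is not None
    except ModuleNotFoundError:
        # Raised when the parent package `fs` (PyFilesystem2) is itself missing.
        return False


if __name__ == "__main__":
    print("OnedataFS plugin found:", onedatafs_available())
```

If this reports `False`, the most likely causes are that the package was installed for a different Python interpreter or that PyFilesystem2 (`fs`) is missing.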

# Usage

To create an instance of OnedataFS connected to a specific Oneprovider, use the following code:

from fs.onedatafs import OnedataFS
onedata_provider_host = "..."
onedata_access_token = "..."

# Create connection to the provider
odfs = OnedataFS(onedata_provider_host, onedata_access_token)

# Open selected space directory
space = odfs.opendir('/SpaceA')

# List SpaceA contents
space.listdir('/')

From then on, space can be used as any PyFilesystem2 instance. Refer to the PyFilesystem2 API documentation for all operations available on a filesystem object.
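
As an illustration of using a space through the generic PyFilesystem2 interface, the sketch below sums the sizes of all files under a directory. It relies only on the standard PyFilesystem2 methods `listdir`, `isdir` and `getsize`; the function name `tree_size` is hypothetical and not part of OnedataFS.

```python
def tree_size(filesystem, path="/"):
    """Recursively sum file sizes (in bytes) below `path`.

    `filesystem` is any PyFilesystem2-style object, for example the
    `space` object obtained from `odfs.opendir(...)`.
    """
    total = 0
    for name in filesystem.listdir(path):
        # Build the child path without doubling the slash at the root.
        child = path.rstrip("/") + "/" + name
        if filesystem.isdir(child):
            total += tree_size(filesystem, child)
        else:
            total += filesystem.getsize(child)
    return total


# Usage, assuming `space` was opened as shown above:
# print(tree_size(space))
```

On large spaces every call is a round trip to the provider, so a walk like this can be slow; it is meant as a demonstration of the interface rather than an efficient tool.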

The complete list of options that can be provided to the OnedataFS constructor can be found below (only host and token are required).

  • host — provider hostname — follow the same guidelines as for Oneclient
  • token — Onedata user access token — follow the same guidelines as for Oneclient
  • port — provider port (defaults to 443)
  • space — list of names of the spaces that should be visible (defaults to all user spaces)
  • space_id — list of IDs of the spaces that should be visible (defaults to all user spaces)
  • insecure — when True, allows connecting to providers without a valid SSL certificate
  • force_proxy_io — when True, forces all data transfers to go via providers
  • force_direct_io — when True, forces all data transfers to go directly via the target storage API. If the storage is not reachable, for instance due to network firewalls, all read and write operations return an error
  • no_buffer — when True, disables all internal buffering in OnedataFS
  • io_trace_log — when True, OnedataFS logs all requests to a CSV file in the directory specified by log_dir
  • provider_timeout — specifies the timeout for waiting for provider responses, in seconds
  • metadata_cache_size — the size of the cache for file and directory metadata
  • drop_dir_cache_after — time in seconds after which unused metadata entries are purged from the cache
  • log_dir — path in the filesystem where internal OnedataFS logs should be stored. When None, no logs are generated
  • cli_args — any other Oneclient command line arguments, passed as a single string, e.g. '--storage-timeout=120 --storage-helper-thread-count=20'

Refer to the Oneclient options documentation for more details.
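
The options above can be collected in one place before the connection is created. The sketch below is a hypothetical helper (`onedatafs_kwargs` is not part of OnedataFS) that rejects misspelled option names early. It assumes the constructor accepts these options as keyword arguments under the names listed above; the hostname and token in the usage comment are placeholders.

```python
# Option names copied from the list above; host and token are required.
OPTIONAL_NAMES = {
    "port", "space", "space_id", "insecure", "force_proxy_io",
    "force_direct_io", "no_buffer", "io_trace_log", "provider_timeout",
    "metadata_cache_size", "drop_dir_cache_after", "log_dir", "cli_args",
}


def onedatafs_kwargs(host, token, **options):
    """Return a keyword-argument dict, refusing unknown option names."""
    unknown = sorted(set(options) - OPTIONAL_NAMES)
    if unknown:
        raise TypeError("unknown OnedataFS option(s): " + ", ".join(unknown))
    kwargs = {"host": host, "token": token}
    kwargs.update(options)
    return kwargs


# Usage (assumes keyword arguments are accepted by the constructor):
# from fs.onedatafs import OnedataFS
# odfs = OnedataFS(**onedatafs_kwargs("provider.example.com", "...",
#                                     insecure=True, provider_timeout=60))
```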

# Advanced usage

In addition to the PyFilesystem2 interface, OnedataFS provides specific methods for using its advanced features.

# File location information

In Onedata, each file can be distributed among different providers, in blocks of various sizes. In order to get information about the distribution of these blocks, use the location method:

space.location("file.txt")

This will give the following output:

{'e0e49ac3d9b058c4839f8fb7ccc02d72': [[0, 1615273]]}

where the dictionary provides information on which Oneprovider, represented here by its ID, holds which blocks (specified using byte ranges). In the above example, provider e0e49ac3d9b058c4839f8fb7ccc02d72 holds the entire file (1.6 MB).
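
A dictionary of this shape is easy to summarise. The sketch below computes how many bytes each provider holds. It assumes that every block is a `[start, end]` pair of byte offsets with `end` exclusive, which matches the example above but is not confirmed here; if the pairs are actually `[offset, size]`, the second element should be summed instead. The function name `bytes_per_provider` is hypothetical.

```python
def bytes_per_provider(location):
    """Map each provider ID to the number of bytes it holds.

    Assumes each block is [start, end) in bytes (an assumption, see above).
    """
    return {
        provider_id: sum(end - start for start, end in blocks)
        for provider_id, blocks in location.items()
    }


# The output shown above, used as sample input.
sample = {"e0e49ac3d9b058c4839f8fb7ccc02d72": [[0, 1615273]]}
print(bytes_per_provider(sample))
# {'e0e49ac3d9b058c4839f8fb7ccc02d72': 1615273}
```

Comparing these totals with the file size gives a rough picture of how completely the file is replicated on each provider.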

# Extended attributes and metadata

Onedata supports metadata for each file or directory, which is accessible via the virtual filesystem through the extended attribute mechanism. Since no such API is provided by PyFilesystem2, OnedataFS provides additional methods that allow interacting with the metadata directly.

For example, to list the extended attributes defined for file.txt:

# List extended attribute names for `file.txt`
space.listxattr("file.txt")
['org.onedata.guid',
 'org.onedata.file_id',
 'org.onedata.space_id',
 'org.onedata.storage_id',
 'org.onedata.storage_file_id',
 'org.onedata.access_type',
 'org.onedata.file_blocks_count',
 'org.onedata.file_blocks',
 'org.onedata.replication_progress']

To get the value of a specific attribute:

space.getxattr("file.txt", "org.onedata.space_id")
b'"8a803754b41d9d744c8f03170193c0ab"'
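
The two calls can be combined to read all attributes of a file at once. The sketch below uses only `listxattr` and `getxattr` as shown above; the helper name `read_xattrs` is hypothetical. It assumes values are returned as bytes and tries to decode each one as JSON, keeping the raw bytes when a value is not valid JSON.

```python
import json


def read_xattrs(filesystem, path):
    """Return {attribute_name: value} for every extended attribute of `path`."""
    result = {}
    for name in filesystem.listxattr(path):
        raw = filesystem.getxattr(path, name)
        try:
            # json.loads accepts UTF-8 encoded bytes as well as str.
            result[name] = json.loads(raw)
        except ValueError:
            # Not valid JSON (or not valid UTF-8): keep the raw value.
            result[name] = raw
    return result


# Usage, assuming `space` was opened as shown in the usage section:
# attrs = read_xattrs(space, "file.txt")
# print(attrs["org.onedata.space_id"])
```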

To create a new extended attribute:

space.setxattr("file.txt", "license", '"CC-0"')

Note that extended attribute values are parsed as JSON by default, so a string value must be wrapped in an additional pair of " quotes.

This also makes it possible to store numeric constants as values:

space.setxattr("file.txt", "priority", "2500")

The value is then indexed by Onedata as a number rather than a string, which enables more efficient data discovery in some cases.

It is also possible to attach an entire JSON document to a key by encoding it as a string for the value parameter:

space.setxattr("file.txt", "origin", '{"continent": "Europe"}')
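
Because values are parsed as JSON, the standard `json` module can produce correctly quoted strings instead of writing the quotes by hand. The sketch below is one possible approach; the helper name `encode_xattr_value` is hypothetical and not part of OnedataFS.

```python
import json


def encode_xattr_value(value):
    """Encode a Python value as the JSON text expected by setxattr."""
    return json.dumps(value)


print(encode_xattr_value("CC-0"))                   # "CC-0"
print(encode_xattr_value(2500))                     # 2500
print(encode_xattr_value({"continent": "Europe"}))  # {"continent": "Europe"}

# Usage, assuming `space` was opened as shown in the usage section:
# space.setxattr("file.txt", "license", encode_xattr_value("CC-0"))
```

This keeps the distinction between the string `"2500"` and the number `2500` explicit in the calling code, which matters for how the value is indexed.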

Finally, to remove an extended attribute:

space.removexattr("file.txt", "license")