# OnedataFS (Python)
OnedataFS is a Python library for accessing the Onedata virtual file system. It is an alternative to Oneclient, which offers a POSIX interface. As a PyFilesystem2 plugin, OnedataFS allows you to work with Onedata in the same way as with any other supported filesystem.
# Installation
The OnedataFS Python library is a wrapper around a low-level C++ OnedataFS library, which accesses data managed by Onedata through a high-performance protocol and, where possible, accesses storage resources directly (see Oneclient direct I/O). The downside is that, in addition to the Python library, you must install the C++ OnedataFS library using platform-specific packages.
NOTE: If you need a pure Python library and performance is not critical, check out onedatarestfs.
# Ubuntu
NOTE: Currently, this package is only provided for Ubuntu Focal.
$ curl -sSO https://get.onedata.org/oneclient.sh
$ pip3 install fs
$ sh oneclient.sh python3-fs-plugin-onedatafs
# Anaconda
OnedataFS can be installed using Anaconda from the official Onedata conda repository.
NOTE: Currently, for release 21.02.*, only Python 3.9 is supported.
$ conda install -c onedata -c conda-forge python=3.9 fs.onedatafs
To install a specific version of fs.onedatafs:
$ conda install -c onedata -c conda-forge python=3.9 fs.onedatafs=21.02.8
# Docker
In addition to installing the OnedataFS packages, it is possible to use our oneclient
Docker image, which provides OnedataFS Python packages and all necessary dependencies:
$ docker run --rm --entrypoint /usr/bin/python3 -it onedata/oneclient:21.02.8
Python 3.8.10 (default, Nov 7 2024, 13:10:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
From here, follow the instructions in the Usage section.
# Usage
To create an instance of OnedataFS connected to a specific Oneprovider, use the following code:
from fs.onedatafs import OnedataFS
onedata_provider_host = "..."
onedata_access_token = "..."
# Create connection to the provider
odfs = OnedataFS(onedata_provider_host, onedata_access_token)
# Open selected space directory
space = odfs.opendir('/SpaceA')
# List SpaceA contents
space.listdir('/')
From then on, space can be used like any PyFilesystem2 instance. Refer
to the PyFilesystem2 API documentation
for all operations available on a filesystem object.
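As a rough illustration of what "any PyFilesystem2 instance" means in practice, the sketch below uses only generic PyFilesystem2 methods (`writetext`, `readtext`, `listdir`, `isfile`). The helper names `roundtrip_text` and `list_files` are hypothetical and not part of OnedataFS; any object exposing these PyFilesystem2 methods, such as the `space` object above, should work.

```python
import posixpath


def roundtrip_text(space, path, text):
    """Write text to a file in the given filesystem and read it back."""
    space.writetext(path, text)   # PyFilesystem2: create or overwrite a text file
    return space.readtext(path)   # PyFilesystem2: read the whole file as str


def list_files(space, directory="/"):
    """Return the names of regular files (not subdirectories) in a directory."""
    return [
        name
        for name in space.listdir(directory)
        if space.isfile(posixpath.join(directory, name))
    ]
```

For instance, `roundtrip_text(space, "notes.txt", "hello")` would write a small file into the opened space and return its contents, assuming the access token allows writing.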
The complete list of options that can be provided to the OnedataFS constructor
can be found below (only host and token are required).
- `host`: provider hostname (follow the same guidelines as for Oneclient)
- `token`: Onedata user access token (follow the same guidelines as for Oneclient)
- `port`: provider port (defaults to 443)
- `space`: the list of space names that should be listed (defaults to all user spaces)
- `space_id`: the list of space IDs that should be listed (defaults to all user spaces)
- `insecure`: when `True`, allows connecting to providers without a valid SSL certificate
- `force_proxy_io`: when `True`, forces all data transfers to go via providers
- `force_direct_io`: when `True`, forces all data transfers to go directly via the target storage API. If the storage is not available, for instance due to network firewalls, an error will be returned for all `read` and `write` operations
- `no_buffer`: when `True`, disables all internal buffering in OnedataFS
- `io_trace_log`: when `True`, OnedataFS will log all requests in a CSV file in the directory specified by `log_dir`
- `provider_timeout`: the timeout for waiting for provider responses, in seconds
- `metadata_cache_size`: the size of the cache for file and directory metadata
- `drop_dir_cache_after`: time in seconds after which unused metadata entries are purged from the cache
- `log_dir`: path in the filesystem where internal OnedataFS logs should be stored. When `None`, no logs will be generated
- `cli_args`: any other Oneclient command line arguments, passed as a single string, e.g. `'--storage-timeout=120 --storage-helper-thread-count=20'`
Refer to the Oneclient options documentation for more details.
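One possible way to pass several of these options is sketched below. It assumes the options are accepted as keyword arguments under the names listed above; the environment variable names and the helper functions `onedatafs_settings` and `connect` are hypothetical, and the option values are purely illustrative.

```python
import os


def onedatafs_settings(env=None):
    """Collect OnedataFS constructor arguments from environment variables.

    The variable names used here are hypothetical; adapt them to your setup.
    """
    env = os.environ if env is None else env
    host = env["ONEDATA_PROVIDER_HOST"]
    token = env["ONEDATA_ACCESS_TOKEN"]
    options = {
        "port": int(env.get("ONEDATA_PROVIDER_PORT", "443")),  # documented default is 443
        "force_proxy_io": True,               # route all transfers via the provider
        "provider_timeout": 60,               # seconds; illustrative value
        "cli_args": "--storage-timeout=120",  # extra Oneclient options as one string
    }
    return host, token, options


def connect(env=None):
    """Create an OnedataFS instance; requires the fs.onedatafs package."""
    # Imported lazily so that onedatafs_settings() works without the package.
    from fs.onedatafs import OnedataFS

    host, token, options = onedatafs_settings(env)
    # Assumption: the options are accepted as keyword arguments.
    return OnedataFS(host, token, **options)
```

Keeping the token in the environment rather than in source code avoids committing credentials by accident.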
# Advanced usage
In addition to the PyFilesystem2 interface, OnedataFS provides some specific methods
for using its advanced features.
# File location information
In Onedata, each file can be distributed among different providers,
in blocks of various sizes. In order to get information about the distribution
of these blocks, use the location method:
space.location("file.txt")
This will give the following output:
{'e0e49ac3d9b058c4839f8fb7ccc02d72': [[0, 1615273]]}
where the dictionary provides information on which Oneprovider, represented here
by its ID, holds which blocks (specified using byte ranges). In the above example,
provider e0e49ac3d9b058c4839f8fb7ccc02d72 holds the entire file (1.6 MB).
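The dictionary returned by location can be summarized per provider. The sketch below assumes each block is a `[start, end]` pair of byte offsets with an exclusive end, which is consistent with the sample output above but is not confirmed here; `bytes_per_provider` is a hypothetical helper, not an OnedataFS method.

```python
def bytes_per_provider(location):
    """Sum the number of bytes each provider holds.

    Assumption: every block is a [start, end] pair with an exclusive end offset.
    """
    return {
        provider_id: sum(end - start for start, end in blocks)
        for provider_id, blocks in location.items()
    }


# The sample output shown above.
location = {"e0e49ac3d9b058c4839f8fb7ccc02d72": [[0, 1615273]]}
totals = bytes_per_provider(location)
```

Comparing these totals with the file size gives a quick indication of how much of the file each provider has replicated.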
# Extended attributes and metadata
Onedata supports metadata for each file or directory, which is accessible via
the virtual filesystem through the extended attribute mechanism. Since no such
API is provided by PyFilesystem2, OnedataFS provides additional methods that
allow interacting with the metadata directly.
For example, to list the extended attributes defined for file.txt:
# List extended attribute names for `file.txt`
space.listxattr("file.txt")
['org.onedata.guid',
'org.onedata.file_id',
'org.onedata.space_id',
'org.onedata.storage_id',
'org.onedata.storage_file_id',
'org.onedata.access_type',
'org.onedata.file_blocks_count',
'org.onedata.file_blocks',
'org.onedata.replication_progress']
To get the value of a specific attribute:
space.getxattr("file.txt", "org.onedata.space_id")
b'"8a803754b41d9d744c8f03170193c0ab"'
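The value comes back as bytes holding a JSON-encoded string, so it usually needs decoding before use. A minimal sketch, assuming values are UTF-8 encoded and falling back to plain text when a value is not valid JSON (`decode_xattr` is a hypothetical helper):

```python
import json


def decode_xattr(raw):
    """Decode a raw extended attribute value (bytes) into a Python object."""
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except ValueError:
        # Not valid JSON: return the decoded text unchanged.
        return text


raw = b'"8a803754b41d9d744c8f03170193c0ab"'  # the value shown above
space_id = decode_xattr(raw)
```

Here `space_id` is the plain Python string without the surrounding JSON quotes.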
To create a new extended attribute:
space.setxattr("file.txt", "license", '"CC-0"')
Please note that extended attribute values are parsed as JSON by default, so a string value must be wrapped in additional " quotes.
This also makes it possible to add numeric constants as values, like this:
space.setxattr("file.txt", "priority", "2500")
which will then be indexed by Onedata as numbers, not strings, and thus enable more efficient data discovery in some cases.
It is also possible to attach an entire JSON document under a key by encoding the JSON as a string for the value parameter:
space.setxattr("file.txt", "origin", '{"continent": "Europe"}')
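Rather than writing the JSON quoting by hand, the values can be produced with Python's standard `json` module. The sketch below wraps `setxattr` in a hypothetical helper, `set_json_xattr`, which is not part of OnedataFS; it only relies on the `setxattr(path, name, value)` call shown above.

```python
import json


def set_json_xattr(space, path, name, value):
    """Encode a Python value as JSON and store it as an extended attribute."""
    encoded = json.dumps(value)  # "CC-0" -> '"CC-0"', 2500 -> '2500'
    space.setxattr(path, name, encoded)
    return encoded
```

With this helper, `set_json_xattr(space, "file.txt", "license", "CC-0")` stores the same value as the manual example with the extra quotes, and dictionaries are stored as JSON documents.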
Finally, to remove an extended attribute:
space.removexattr("file.txt", "license")