# OnedataFS (Python)
OnedataFS is a Python library for accessing the Onedata virtual file system. It is an alternative to Oneclient, which exposes the virtual file system through a POSIX interface. As a PyFilesystem plugin, OnedataFS lets you work with Onedata in the same way as any other supported filesystem.
# Installation
OnedataFS can be installed from our provided packages for Python 3.
# Ubuntu
$ curl -sSO https://get.onedata.org/oneclient.sh
# For Python3
$ pip3 install fs
$ sh oneclient.sh python3-fs-plugin-onedatafs
# CentOS
Please note that CentOS packages are distributed according to the Software Collections standard.
$ curl -sSO https://get.onedata.org/oneclient.sh
# For Python3
$ pip3 install fs
$ sh oneclient.sh onedata2102-python3-fs-onedatafs
$ scl enable onedata2102 bash
# On CentOS, it is necessary to provide custom PYTHONPATH, either through export:
$ export PYTHONPATH="${ONEDATA_PYTHON3_PATH}"
# or when executing python3 interpreter:
$ PYTHONPATH="${ONEDATA_PYTHON3_PATH}" python3
NOTE: `ONEDATA_PYTHON3_PATH` in the above example is provided automatically by the `scl enable` command.
# Anaconda
OnedataFS can be installed using Anaconda, from the official Onedata conda repository:
NOTE: Currently, for release 21.02.*, only Python 3.9 is supported.
$ conda install -c onedata -c conda-forge python=3.9 fs.onedatafs
or, to install a specific version of fs.onedatafs:
$ conda install -c onedata -c conda-forge python=3.9 fs.onedatafs=21.02.8
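After any of the installation paths above, it can be useful to confirm that the interpreter you are running actually sees the plugin (on CentOS this depends on `PYTHONPATH` being set). The following is a minimal sketch using only the standard library; the helper name `plugin_available` is hypothetical and not part of OnedataFS.

```python
import importlib.util


def plugin_available(module_name="fs.onedatafs"):
    """Return True if `module_name` can be located by the current interpreter."""
    try:
        # find_spec imports parent packages, so a missing `fs` package
        # raises ImportError instead of returning None.
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    print("fs.onedatafs importable:", plugin_available())
```

If this prints `False` on CentOS, a likely cause is that the interpreter was started outside the `scl enable` shell or without `PYTHONPATH` set.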
# Usage
To create an instance of OnedataFS connected to a specific Oneprovider, use the following code:
from fs.onedatafs import OnedataFS
onedata_provider_host = "..."
onedata_access_token = "..."
# Create connection to the provider
odfs = OnedataFS(onedata_provider_host, onedata_access_token)
# Open selected space directory
space = odfs.opendir('/SpaceA')
# List SpaceA contents
space.listdir('/')
From then on, `space` can be used as any PyFilesystem instance. Refer to the PyFilesystem API documentation for all operations available on a filesystem object.
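As an illustration of treating `space` like any other PyFilesystem object, the sketch below sums the sizes of all files under a directory using only the standard `listdir`, `isdir` and `getsize` methods of the PyFilesystem interface. The function name `total_size` is hypothetical; it is a helper written for this example, not part of OnedataFS.

```python
def total_size(filesystem, path="/"):
    """Recursively sum the sizes (in bytes) of all files under `path`.

    `filesystem` is any PyFilesystem-style object, for example the
    `space` directory opened above.
    """
    total = 0
    for name in filesystem.listdir(path):
        child = path.rstrip("/") + "/" + name
        if filesystem.isdir(child):
            total += total_size(filesystem, child)
        else:
            total += filesystem.getsize(child)
    return total


# Example call (requires a live connection):
# print(total_size(space))
```

Because the helper relies only on the generic interface, the same code can be exercised against a local or in-memory filesystem before pointing it at a Onedata space.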
The complete list of options that can be provided to the OnedataFS constructor can be found below (only `host` and `token` are required).
- `host`: provider hostname; follow the same guidelines as for Oneclient
- `token`: Onedata user access token; follow the same guidelines as for Oneclient
- `port`: provider port (defaults to 443)
- `space`: the list of space names that should be listed (defaults to all user spaces)
- `space_id`: the list of space IDs that should be listed (defaults to all user spaces)
- `insecure`: when `True`, allows connecting to providers without a valid SSL certificate
- `force_proxy_io`: when `True`, forces all data transfers to go via providers
- `force_direct_io`: when `True`, forces all data transfers to go directly via the target storage API. If the storage is not available, for instance due to network firewalls, an error will be returned for all `read` and `write` operations
- `no_buffer`: when `True`, disables all internal buffering in OnedataFS
- `io_trace_log`: when `True`, OnedataFS will log all requests in a CSV file in the directory specified by `log_dir`
- `provider_timeout`: the timeout for waiting for provider responses, in seconds
- `metadata_cache_size`: the size of the cache for file and directory metadata
- `drop_dir_cache_after`: time in seconds after which unused metadata entries are purged from the cache
- `log_dir`: path in the filesystem where internal OnedataFS logs should be stored. When `None`, no logs will be generated
- `cli_args`: any other Oneclient command line arguments, passed as a single string, e.g. `'--storage-timeout=120 --storage-helper-thread-count=20'`

Refer to the Oneclient options documentation for more details.
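Since `cli_args` expects a single string, it may be convenient to assemble it from a dictionary. The sketch below does that and collects a few of the options listed above into one dictionary. The helper `build_cli_args` is hypothetical, and passing the options as keyword arguments is an assumption based on the option names in the list.

```python
def build_cli_args(options):
    """Join {'storage-timeout': 120} style options into one string."""
    return " ".join(f"--{name}={value}" for name, value in options.items())


extra = build_cli_args({
    "storage-timeout": 120,
    "storage-helper-thread-count": 20,
})

# Option names taken from the list above; the values are only examples.
options = {
    "port": 443,
    "insecure": False,
    "no_buffer": True,
    "cli_args": extra,
}

# With a reachable provider, the options would be passed like this
# (assumed keyword-argument form):
# odfs = OnedataFS(onedata_provider_host, onedata_access_token, **options)
print(options["cli_args"])
```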
# Advanced usage
In addition to the PyFilesystem interface, OnedataFS provides some specific methods for using its advanced features.
# File location information
In Onedata, each file can be distributed among different providers, in blocks of various sizes. To get information about the distribution of these blocks, use the `location` method:
space.location("file.txt")
This will give the following output:
{'e0e49ac3d9b058c4839f8fb7ccc02d72': [[0, 1615273]]}
where the dictionary shows which Oneprovider, represented here by its ID, holds which blocks (specified as byte ranges). In the above example, provider `e0e49ac3d9b058c4839f8fb7ccc02d72` holds the entire file (1.6 MB).
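The dictionary returned by `location` can be summarized per provider. The sketch below assumes each block is a `[start, end)` byte range, which is consistent with the single-block example above but is not confirmed here. If your deployment reports `[offset, size]` pairs instead, the sum needs to be adjusted. The function name `bytes_per_provider` is hypothetical.

```python
def bytes_per_provider(location):
    """Return {provider_id: bytes held}, assuming [start, end) ranges."""
    return {
        provider_id: sum(end - start for start, end in blocks)
        for provider_id, blocks in location.items()
    }


# Output of space.location("file.txt") from the example above
example = {'e0e49ac3d9b058c4839f8fb7ccc02d72': [[0, 1615273]]}
print(bytes_per_provider(example))
# {'e0e49ac3d9b058c4839f8fb7ccc02d72': 1615273}
```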
# Extended attributes and metadata
Onedata supports metadata for each file or directory, which is accessible via the virtual filesystem through the extended attribute mechanism. Since no such API is provided by PyFilesystem, OnedataFS provides additional methods that allow interacting with the metadata directly.
For example, to list the extended attributes defined for `file.txt`:
# List extended attribute names for `file.txt`
space.listxattr("file.txt")
['org.onedata.guid',
'org.onedata.file_id',
'org.onedata.space_id',
'org.onedata.storage_id',
'org.onedata.storage_file_id',
'org.onedata.access_type',
'org.onedata.file_blocks_count',
'org.onedata.file_blocks',
'org.onedata.replication_progress']
To get the value of a specific attribute:
space.getxattr("file.txt", "org.onedata.space_id")
b'"8a803754b41d9d744c8f03170193c0ab"'
To create a new extended attribute:
space.setxattr("file.txt", "license", '"CC-0"')
Please note that extended attribute values are parsed as JSON by default, so a string value must be wrapped in additional `"` quotes.
This allows numeric constants to be added as values, like this:
space.setxattr("file.txt", "priority", "2500")
Onedata then indexes the value as a number rather than a string, which enables more efficient data discovery in some cases.
It is also possible to attach an entire JSON document to a key by encoding the JSON as a string in the value parameter:
space.setxattr("file.txt", "origin", '{"continent": "Europe"}')
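Because values are JSON-encoded strings, the standard `json` module can produce and parse them rather than quoting by hand. The sketch below uses two hypothetical helpers, `encode_xattr` and `decode_xattr`, which are not part of OnedataFS, and assumes that `getxattr` returns JSON-encoded bytes as in the example output above.

```python
import json


def encode_xattr(value):
    """Encode a Python value as the JSON string expected by setxattr."""
    return json.dumps(value)


def decode_xattr(raw):
    """Decode the bytes returned by getxattr into a Python value."""
    return json.loads(raw)


print(encode_xattr("CC-0"))                   # "CC-0"
print(encode_xattr(2500))                     # 2500
print(encode_xattr({"continent": "Europe"}))  # {"continent": "Europe"}
print(decode_xattr(b'"8a803754b41d9d744c8f03170193c0ab"'))
# 8a803754b41d9d744c8f03170193c0ab

# With a live connection, usage might look like:
# space.setxattr("file.txt", "license", encode_xattr("CC-0"))
# space_id = decode_xattr(space.getxattr("file.txt", "org.onedata.space_id"))
```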
Finally, to remove an extended attribute:
space.removexattr("file.txt", "license")