Data Management Platform for
Distributed AI Pipelines
and Hybrid Clouds
Built for the complexity of modern research
Onedata addresses the real challenges of managing petabyte-scale datasets across heterogeneous compute and storage environments.
Unified Virtual Filesystem
Mount any storage backend (S3, Lustre, GPFS, Ceph, NFS, or plain POSIX) into a single global namespace. No data migration needed.
High-Performance Data Access
Optimized for AI training workloads. Stream petabytes across the WAN with minimal latency using our intelligent caching layer.
Hybrid Cloud Support
Seamlessly span on-premises HPC clusters, AWS, GCP, Azure, and institutional storage in one unified environment.
Transfers & Replication Policies
Control replica placement through rule-based or manual transfers. Automatically cache popular files based on local demand.
S3 & POSIX Flexible Access
Access data via the S3 API or mount it as a POSIX filesystem. Ideal for cloud-native pipelines and HPC workloads alike.
Fine-Grained Access Control
SSO-integrated, multi-level access control, from a single file to an entire data space. Built for multi-organization research collaborations.
From setup to pipeline in minutes
Onedata is designed for rapid deployment. Connect your existing infrastructure without disrupting ongoing workflows.
Deploy effortlessly
Quickly launch your environment in a containerized manner using Helm charts, Docker Compose, or convenient wizards for bare VMs. No complex infrastructure needed.
Connect your storage
Register any storage backend — cloud buckets, HPC storage, or institutional repositories — via simple configuration. Import existing data without copying.
Define your data spaces
Create virtual spaces that aggregate multiple storage resources. Set replication, caching, and access policies per dataset.
Access from anywhere
Use POSIX, S3, REST API, CDMI, or our Python-native libs. Your AI pipeline sees a unified filesystem regardless of where data lives.
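For the POSIX path specifically, a pipeline can treat a mounted space like any local directory. The sketch below walks a directory tree and streams file contents; a temporary directory stands in for a hypothetical mount point such as `/mnt/onedata/my-space` (the path is an assumption for illustration, not a fixed Onedata convention):

```python
import os
import tempfile
from pathlib import Path

def iter_samples(root: str, suffix: str = ".txt"):
    """Yield (relative_path, bytes) for every matching file under root.

    With a space mounted at, say, /mnt/onedata/my-space (an assumed
    path), root would simply be that mount point -- the code is
    identical for purely local storage.
    """
    root_path = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in sorted(filenames):
            if name.endswith(suffix):
                p = Path(dirpath) / name
                yield str(p.relative_to(root_path)), p.read_bytes()

# Demonstrate against a temp dir standing in for the mount point.
with tempfile.TemporaryDirectory() as mount:
    (Path(mount) / "datasets").mkdir()
    (Path(mount) / "datasets" / "a.txt").write_bytes(b"sample-a")
    samples = dict(iter_samples(mount))

print(samples)  # {'datasets/a.txt': b'sample-a'}
```

The same loop works unchanged whether the files live on local disk, an institutional NFS share, or a remote provider behind the virtual filesystem.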
Collaborate & govern
Enable collaborative data sharing among users from different institutions. Integrate with SSO and apply fine-grained permissions across the federation.
Designed for AI and scientific pipelines
From genomics to particle physics, Onedata supports data-intensive workflows across scientific disciplines.

Train models on federated datasets without moving data
AI teams at universities and labs use Onedata to aggregate training data from multiple institutional repositories. Run parallel training jobs on geographically distributed resources.

Manage petabytes of satellite and sensor data
Geoscience teams handle continuous data streams from global sensor networks and satellite imagery. Onedata provides a unified access layer for real-time and archival datasets.

Share genomics data across institutions
Research consortia use Onedata to comply with data governance policies while enabling cross-institutional data sharing for large-scale biological studies.

Accelerate HPC workflows with intelligent caching
HPC centers use Onedata to pre-stage simulation input data and checkpoint outputs automatically. Reduce I/O bottlenecks in large-scale simulations, e.g., in physics or chemistry.
Get a demo tailored to your needs
Onedata is a complex platform. That's why every demo is scoped to your specific infrastructure, research domain, and data challenges.
Tailored to your use case
Tell us about your research domain — genomics, climate, HPC, AI — and we'll focus the demo on workflows that matter to you.
Live infrastructure walkthrough
See a real Onedata deployment: federated storage, data spaces, provider setup, and access control in action.
Integration assessment
We'll evaluate how Onedata fits your existing storage (S3, POSIX, Ceph, NFS) and compute environments.
What to expect
- 60-minute focused session
- Zoom / Meet video call
- Weekdays · CET timezone