Data Management Platform for
Distributed AI Pipelines
and Hybrid Clouds
Built for the complexity of modern research
Onedata addresses the real challenges of managing petabyte-scale datasets across heterogeneous compute and storage environments.
Unified Virtual Filesystem
Mount any storage backend (S3, Lustre, GPFS, Ceph, NFS, or plain POSIX) into a single global namespace. No data migration needed.
High-Performance Data Access
Optimized for AI training workloads. Stream petabytes across the WAN with minimal latency using our intelligent caching layer.
Hybrid Cloud Support
Seamlessly span on-premises HPC clusters, AWS, GCP, Azure, and institutional storage in one unified environment.
Transfers & Replication Policies
Control replica placement through rule-based or manual transfers. Automatically cache popular files based on local demand.
S3 & POSIX Flexible Access
Access data via the S3 API or mount it as a POSIX filesystem. Ideal for cloud-native pipelines and HPC workloads alike.
Fine-Grained Access Control
SSO-integrated, multi-level access control, from a single file to an entire data space. Built for multi-organization research collaborations.
From setup to pipeline in minutes
Onedata is designed for rapid deployment. Connect your existing infrastructure without disrupting ongoing workflows.
Deploy effortlessly
Quickly launch your environment in a containerized manner using Helm charts, Docker Compose, or convenient wizards for bare VMs. No complex infrastructure needed.
Connect your storage
Register any storage backend — cloud buckets, HPC storage, or institutional repositories — via simple configuration. Import existing data without copying.
Define your data spaces
Create virtual spaces that aggregate multiple storage resources. Set replication, caching, and access policies per dataset.
Access from anywhere
Use POSIX, S3, REST API, CDMI, or our Python-native libs. Your AI pipeline sees a unified filesystem regardless of where data lives.
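For the POSIX path specifically, a pipeline can treat a mounted space like any local directory. The sketch below walks a directory tree and streams file contents; a temporary directory stands in for a hypothetical mount point such as `/mnt/onedata/my-space` (the path is an assumption for illustration, not a fixed Onedata convention):

```python
import os
import tempfile
from pathlib import Path

def iter_samples(root: str, suffix: str = ".txt"):
    """Yield (relative_path, bytes) for every matching file under root.

    With a space mounted at, say, /mnt/onedata/my-space (an assumed
    path), root would simply be that mount point -- the code is
    identical for purely local storage.
    """
    root_path = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in sorted(filenames):
            if name.endswith(suffix):
                p = Path(dirpath) / name
                yield str(p.relative_to(root_path)), p.read_bytes()

# Demonstrate against a temp dir standing in for the mount point.
with tempfile.TemporaryDirectory() as mount:
    (Path(mount) / "datasets").mkdir()
    (Path(mount) / "datasets" / "a.txt").write_bytes(b"sample-a")
    samples = dict(iter_samples(mount))

print(samples)  # {'datasets/a.txt': b'sample-a'}
```

The same loop works unchanged whether the files live on local disk, an institutional NFS share, or a remote provider behind the virtual filesystem.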
Collaborate & govern
Enable collaborative data sharing among users from different institutions. Integrate with SSO and apply fine-grained permissions across the federation.
Designed for AI and scientific pipelines
From genomics to particle physics, Onedata supports data-intensive workflows across scientific disciplines.

Train models on federated datasets without moving data
AI teams at universities and labs use Onedata to aggregate training data from multiple institutional repositories. Run parallel training jobs on geographically distributed resources.

Manage petabytes of satellite and sensor data
Geoscience teams handle continuous data streams from global sensor networks and satellite imagery. Onedata provides a unified access layer for real-time and archival datasets.

Share genomics data across institutions
Research consortia use Onedata to comply with data governance policies while enabling cross-institutional data sharing for large-scale biological studies.

Accelerate HPC workflows with intelligent caching
HPC centers use Onedata to pre-stage simulation input data and checkpoint outputs automatically. Reduce I/O bottlenecks in large-scale simulations, e.g., in physics or chemistry.
Get a demo tailored to your needs
Onedata is a complex platform. That's why every demo is scoped to your specific infrastructure, research domain, and data challenges.
Tailored to your use case
Tell us about your research domain — genomics, climate, HPC, AI — and we'll focus the demo on workflows that matter to you.
Live infrastructure walkthrough
See a real Onedata deployment: federated storage, data spaces, provider setup, and access control in action.
Integration assessment
We'll evaluate how Onedata fits your existing storage (S3, POSIX, Ceph, NFS) and compute environments.
What to expect
- 60-minute focused session
- Zoom / Meet video call
- Weekdays · CET timezone