Velocity • Durability

White Paper

Data Platform V12

Modern Data Storage Infrastructure Software for AI & HPC
Powerful Data Storage for the AI Pipeline

Runtime

5-minute read

Audience

AI & HPC leaders, architects, DevOps

Primary themes

Performance · Economics · Simplicity

01

Executive Summary

AI factories and Neocloud operators are deploying GPU infrastructure at unprecedented scale, but storage remains the overlooked bottleneck preventing optimal performance. If storage cannot feed GPUs fast enough, training stalls, checkpoints burn compute dollars, and inference latency spikes. Storage is also the single largest blast radius in any GPU cluster: SemiAnalysis estimates that a 5,000-GPU cluster running at just 98% storage availability bleeds roughly 876,000 GPU-hours per year, approximately $2.6 million in idle compute. Meanwhile, flash prices have surged as much as 472% in a single year, exposing organizations locked into all-flash architectures to volatile and unpredictable economics. Storage, once a quiet 10% line item, is now trending toward 20–30% of cluster spend in all-flash deployments; every dollar overspent on flash is a GPU you cannot deploy.
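
The blast-radius arithmetic behind that estimate is straightforward to reproduce. The sketch below works it through in Python; the roughly $3 per GPU-hour rate is an assumption chosen here to match the approximate dollar figure, not a number from SemiAnalysis or from this paper.

```python
# Hedged sketch: GPU-hours lost to storage unavailability and the approximate
# cost of that idle compute. The $/GPU-hour rate is an illustrative assumption.
GPUS = 5_000                  # cluster size
HOURS_PER_YEAR = 24 * 365     # 8,760 hours
STORAGE_AVAILABILITY = 0.98   # 98% available -> 2% of the year GPUs are starved
COST_PER_GPU_HOUR = 3.00      # assumed blended $/GPU-hour (illustrative)

idle_gpu_hours = GPUS * HOURS_PER_YEAR * (1 - STORAGE_AVAILABILITY)
idle_cost = idle_gpu_hours * COST_PER_GPU_HOUR

print(f"Idle GPU-hours per year: {idle_gpu_hours:,.0f}")    # 876,000
print(f"Approximate idle compute cost: ${idle_cost:,.0f}")  # ~$2.6M
```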

The VDURA® Data Platform V12 is modern data storage infrastructure software purpose-built for these challenges. Built on the HYDRA architecture, VDURA combines the performance of a true parallel file system with the resilience and cost-efficiency of object storage in a unified, software-defined system. By pairing NVMe flash for performance-critical workloads with high-capacity HDD for data retention, VDURA delivers the same mixed-fleet storage model to AI factories, Neoclouds, and enterprises that hyperscalers like Google, Meta, and Microsoft already deploy in production. All of this is backed by at least six nines of availability and up to 12 nines of durability, proven across more than 1,000 production deployments.

The VDURA Data Platform is modern data storage infrastructure software built for performance, durability, and scalability.

V12 also introduces the Elastic Metadata Engine with up to 20× acceleration in metadata operations, native snapshot support for AI pipeline checkpoints, SMR HDD optimization unlocking 25–30% more capacity per rack, RDMA support for GPU-native data paths that bypass the CPU entirely, Context-Aware Tiering, end-to-end encryption, and a native CSI plug-in for Kubernetes. Built on over 25 years of innovation in high-performance computing, VDURA delivers unprecedented parallel throughput, ultra-low latency metadata operations, and superior data protection across the entire AI pipeline.

With linear scalability to thousands of nodes, integrated value-tier storage, and total cost of ownership more than 60% lower than all-flash architectures at comparable performance levels, VDURA eliminates the traditional compromises between performance, durability, and cost. This document outlines the architectural components, performance capabilities, and deployment options of the VDURA Data Platform and V5000™ system, showcasing why leading AI factories, Neocloud operators, and enterprises trust VDURA to power their most demanding workloads.

02

A Legacy of Innovation, Reimagined for the AI Era

AI factories and Neocloud operators are building the infrastructure that will define the next decade of computing. Purpose-built GPU clouds represent an estimated $35 billion market today, projected to reach $236 billion by 2031. Yet the storage layer powering these environments remains the single largest performance bottleneck and the single largest blast radius in any GPU cluster. Storage, once a quiet 10% line item, is trending toward 20–30% or more in all-flash deployments. As NAND flash prices surge to an order of magnitude above HDD costs per TB, every dollar overspent on flash is a GPU you cannot deploy.

The hyperscalers figured this out years ago; Google, Meta, and Microsoft do not run all-flash storage. They deploy mixed-fleet architectures: intelligent tiering with just enough NVMe flash to saturate GPU throughput, draining colder data to high-density HDDs. Flash is a performance medium, not a capacity medium. VDURA brings that same model to every AI factory, Neocloud, and enterprise.

The VDURA Data Platform is modern data storage infrastructure software built for performance, durability, and scalability.

VDURA combines true parallel performance, hyperscale durability, and effortless scalability in a single, software-defined platform. It delivers the performance AI training and inference demand on NVMe flash while leveraging high-capacity HDD for cost-efficient data retention, all under one control plane, one data plane, and one global namespace. Total cost of ownership is more than 60% lower than all-flash architectures at comparable performance levels.

We’re not new to this space; we helped define it.

Before the rise of AI, cloud-native workloads, and modern data infrastructure, there was Panasas®, the company that reshaped the high-performance computing landscape with the industry’s first true parallel file system. For more than 20 years, Panasas PanFS® set the bar for scalable performance, mixed-workload efficiency, and enterprise-grade reliability in environments where data is everything. Built on that core architecture, VDURA is the modern evolution of a platform trusted by the world’s most data-intensive environments: over 1,000 production deployments in more than 50 countries, tens of millions of cumulative runtime hours, and exabytes of data under management.

VDURA is where velocity meets durability.

The name says it all: lightning-fast NVMe flash throughput and petabyte-scale HDD capacity meet industry-leading durability in a platform that scales linearly to thousands of nodes. VDURA combines the scalable, linear high performance of a true parallel file system with a cost-efficient, resilient object store, unifying active and bulk storage under one architecture. The VeLO™ metadata engine powers intelligent data flow and fast namespace operations, delivering a software-defined platform built for AI factories, Neoclouds, and HPC environments that is simple to deploy and effortless to scale.

Why the Storage Landscape Evolved for AI

Storage architecture for the modern AI era was not designed in a vacuum; it was forged by the hyperscalers, the first organizations to confront true exabyte-scale workloads, billion-file namespaces, and unforgiving GPU economics, who had no choice but to invent something new. AI factories, Neoclouds, and enterprises are now arriving at the same crossroads. VDURA is the platform that brings that proven hyperscaler architecture to them.

The Hyperscaler Blueprint

A decade and a half ago, Google, Meta, Microsoft, and Amazon each ran into the same wall: storage architectures built for traditional enterprise applications could not survive search-, cloud-, social-, and AI-scale workloads. Centralized controllers serialized metadata. Static tiering ignored changing access patterns. Flash alone could not deliver capacity economics at hyperscale. RAID-protected arrays could not maintain durability efficiently at exabyte scale. Each hyperscaler responded by rebuilding storage around software, scale-out metadata, automated data placement, and distributed durability.

Google Colossus evolved beyond GFS with a distributed metadata architecture and software-driven placement across SSD and HDD. Meta’s Tectonic applied a similar lesson at social scale, disaggregating metadata and storage layers while using software-defined replication and erasure coding for durability. AWS S3 and Microsoft Azure Storage productized the same core operating principle for cloud-scale storage: use software to manage placement, durability, and economics across tiers instead of forcing every byte onto the most expensive media.

The pattern is clear: hyperscalers do not treat all-flash as the default architecture for massive data growth. They use flash where performance matters most, dense capacity media where economics matter most, and software to decide where data belongs.

That lesson now applies to every organization deploying GPUs at scale. Flash is a performance medium, not a capacity medium. The economically viable architecture for AI is mixed-fleet storage, governed by intelligent software, under a single namespace.

Why AI Breaks Legacy Storage

AI workloads break the assumptions that traditional enterprise storage was built on. Training reads massive datasets in randomized batches across thousands of GPUs simultaneously. Checkpointing demands burst-write throughput at the rate of an entire model state every few minutes. Inference produces high-concurrency random reads against weights, embeddings, and growing RAG corpora, with first-token latency budgets measured in single-digit milliseconds. Metadata operations explode into the billions per second. Tier policies change by the hour, not the quarter.

Storage that was perfectly adequate for relational databases, virtual machines, or shared user drives collapses under these conditions. Centralized controllers become bottlenecks. NFS gateways serialize what must be parallel. RAID groups cannot rebuild fast enough to hold their durability promise at petabyte scale. All-flash arrays drain capital budgets and consume rack power that should be feeding GPUs. The architecture that worked yesterday is the architecture that strands GPUs today.

The Hyperscaler Playbook, Brought to the Enterprise

VDURA is the realization of the hyperscaler playbook for organizations that do not have large teams of distributed-systems engineers. VDURA’s HYDRA architecture separates control plane from data plane the way Colossus and Tectonic do. The VeLO metadata engine distributes namespace operations across stateless Director Nodes, eliminating the single-controller bottleneck. Storage Nodes pair NVMe flash for hot data with high-capacity HDDs for retention, governed by Dynamic Data Acceleration™ and, in V12, Context-Aware Tiering. Multi-Level Erasure Coding™ delivers software-defined durability up to 12 nines, well past what hardware RAID can guarantee. Self-healing, automated rebalancing, and continuous data scrubbing match what hyperscale operators built internally over a decade of trial and error.

What the hyperscalers proved, VDURA productizes for everyone else. AI factories, Neoclouds, and enterprises get the same hyperscale-grade storage technology the four hyperscalers invented for themselves, packaged as a software-defined platform that runs on commodity Dell, Supermicro, AIC, or any roadmap-certified server, with one control plane, one data plane, and one global namespace. The result is the storage substrate the AI era requires, finally available to every organization that needs it, not just to the four companies that invented it.

What This Means for AI Factories and Neoclouds

  • More GPUs per dollar. Flash is a performance medium, not a capacity medium. Size flash for the bandwidth your GPUs need, then let HDD expansion handle the petabytes behind it. The flash footprint can drop below 20% of total capacity while exceeding per-GPU reference rates for NVIDIA and AMD (0.5 GB/s read, 0.25 GB/s write; up to 4 GB/s read, 2 GB/s write for vision). Storage lands closer to 10% of cluster spend instead of the 20–30% that all-flash forces. Every dollar saved funds another GPU on the floor, increasing AI factory yield.
  • Procurement velocity. Software-defined deployment on commodity hardware lets capacity land in any colocation in days rather than quarters.
  • Multi-tenant isolation by design. Per-tenant isolation, namespaces, end-to-end encryption, and VLAN isolation are first-class, exposed through native REST APIs and a Kubernetes CSI plug-in, so providers can onboard tenants on day one without bolt-on tooling.
  • Predictable economics under volatile flash markets. Mixed-fleet architecture absorbs flash price swings the way a hedge absorbs market volatility. Because flash is sized for performance rather than capacity, the exposure to NAND pricing is structurally smaller.
  • A blast-radius story finance and SRE leaders can defend. Virtualized Protected Object Device™ (VPOD)-isolated failure domains, 12-nines durability, six-nines availability, and continuous scrubbing make the SLA a number, not a hope.
Figure 1. Flash capacity cost staircase and cluster budget allocation, sizing flash for GPU throughput.
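
A minimal sizing sketch of the flash-for-bandwidth idea in the first bullet above, using the per-GPU reference rates quoted there and the per-node throughput figure from the V5000 section later in this paper. The GPU count, dataset size, and per-node flash capacity are illustrative assumptions, not a recommendation.

```python
import math

# Hedged sketch: size the flash tier for GPU bandwidth, not for capacity.
# GPU count, dataset size, and per-node flash capacity are assumptions.
gpus = 1_024
read_per_gpu_gbps = 0.5          # GB/s per GPU (reference rate quoted above)
node_read_gbps = 60.0            # GB/s per all-flash Storage Node (V5000 figure)
node_flash_tb = 12 * 15.36       # assumed 12 x 15.36 TB NVMe per node
total_dataset_pb = 10.0          # assumed total dataset under management

required_read_gbps = gpus * read_per_gpu_gbps
flash_nodes = math.ceil(required_read_gbps / node_read_gbps)
flash_capacity_tb = flash_nodes * node_flash_tb
flash_fraction = flash_capacity_tb / (total_dataset_pb * 1_000)

print(f"Required read bandwidth: {required_read_gbps:.0f} GB/s")
print(f"All-flash Storage Nodes needed: {flash_nodes}")
print(f"Flash sized for bandwidth: {flash_capacity_tb:,.0f} TB "
      f"({flash_fraction:.0%} of total capacity)")
```

For these assumptions the flash footprint lands well under 20% of total capacity, with HDD expansion carrying the remaining petabytes.
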
Navigating Flash Volatility: The VDURA Advantage

The storage industry is experiencing unprecedented flash price volatility. SSD prices climbed nearly 24% in just three weeks in early 2026, and enterprise-grade 30 TB TLC SSDs surged 472% between Q2 2025 and Q1 2026. The QLC-to-HDD price multiple expanded from 4.9× to 22.6× over the same period, making all-flash architectures increasingly expensive and unpredictable for large-scale AI deployments.

This volatility creates real financial exposure for organizations locked into all-flash storage strategies. A 25 PB all-flash deployment delivering 1,000 GB/s sustained performance saw its three-year total cost increase approximately 397% over one year. By contrast, a mixed-fleet architecture delivering identical performance and capacity costs significantly less, with substantially lower exposure to flash price swings.

VDURA’s architecture is uniquely positioned to address this challenge. By combining NVMe flash for performance-critical workloads with high-capacity HDD for data retention, VDURA delivers the same throughput and IOPS without requiring organizations to absorb the full impact of flash market volatility. This is the same model the hyperscalers deploy in production; VDURA makes it available to the enterprise.

  • 472%: TLC SSD price surge, Q2 2025 → Q1 2026
  • 22.6×: QLC SSD vs. HDD cost multiple
  • 24%: SSD price jump in 3 weeks (Mar 2026)
  • 397%: 3-year all-flash cost increase (25 PB)

The VDURA Flash Volatility Index and Storage Economics Optimizer Tool

VDURA has developed the only tool of its kind on the market. The interactive index tracks how flash media volatility translates into real-world cost exposure and compares it to the current HDD market, allowing organizations to model total system costs across different architectures and media configurations. Infrastructure teams use it to make data-driven decisions about storage architecture and to quantify the financial implications of choosing between all-flash and mixed-fleet approaches as pricing conditions continue to evolve. This essential tool can be found at https://www.vdura.com/flash-volatility-index-and-storage-economics-optimizer-tool/.
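
A hedged sketch of the kind of comparison the tool automates. The 22.6× flash-to-HDD multiple comes from this paper; the baseline $/TB, the 20% flash fraction, and the price-shock scenario are illustrative assumptions, and only media cost (not full TCO) is modeled.

```python
# Hedged sketch: media-cost exposure of all-flash vs. mixed-fleet architectures
# when flash prices move. All dollar figures are illustrative assumptions.
capacity_tb = 25_000            # 25 PB usable
flash_price_per_tb = 100.0      # assumed baseline $/TB for enterprise flash
hdd_price_per_tb = flash_price_per_tb / 22.6   # multiple from this paper
flash_fraction_mixed = 0.20     # flash sized for bandwidth, not capacity

def media_cost(flash_tb, hdd_tb, flash_price):
    return flash_tb * flash_price + hdd_tb * hdd_price_per_tb

all_flash_base = media_cost(capacity_tb, 0, flash_price_per_tb)
mixed_base = media_cost(capacity_tb * flash_fraction_mixed,
                        capacity_tb * (1 - flash_fraction_mixed),
                        flash_price_per_tb)

# Scenario: flash $/TB doubles; HDD pricing assumed flat.
all_flash_shock = media_cost(capacity_tb, 0, flash_price_per_tb * 2)
mixed_shock = media_cost(capacity_tb * flash_fraction_mixed,
                         capacity_tb * (1 - flash_fraction_mixed),
                         flash_price_per_tb * 2)

print(f"All-flash media spend increase:  ${all_flash_shock - all_flash_base:,.0f}")
print(f"Mixed-fleet media spend increase: ${mixed_shock - mixed_base:,.0f}")
# Absolute exposure scales with the flash fraction: roughly 5x smaller here.
```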

“As pricing conditions continue to evolve, infrastructure teams must plan for greater variability in cost dynamics. The architectures that succeed will be the ones that can adapt without compromising performance.”
—Erik Salo, SVP, VDURA
03

The VDURA Advantage: Core Capabilities

Through its sophisticated software, the VDURA Data Platform can transfer terabytes of data per second to and from your compute cluster. VDURA manages this orchestration without manual intervention, continuously balancing the load across Storage Nodes, automatically ensuring resilience, scrubbing the stored data for the highest levels of data protection, and encrypting the stored data to safeguard it from unwanted exposure.

True Parallel Performance

VDURA bypasses bottlenecks with direct, parallel data transfers from NVMe flash Storage Nodes to the client. Unlike NFS or "sort-of-parallel" systems, VDURA’s shared-nothing architecture and separate metadata plane eliminate contention, delivering maximum throughput, lowest latency, and the consistent performance AI workloads demand at scale.

Blazing Metadata Performance

VDURA’s VeLO metadata engine delivers ultra-low latency for billions of file operations. V12’s Elastic Metadata Engine dynamically scales across nodes, delivering up to 20× improvement in metadata operations. Built for AI, it accelerates metadata-heavy tasks like model staging, small-file access, and checkpointing.

Integrated Value Tier

VDURA natively integrates high-capacity extensions as a value tier, combining NVMe flash and HDD storage within a single platform. This eliminates siloed object stores and delivers cost-efficient, long-term storage under the same namespace, making VDURA ideal for AI data lakes, model checkpoints, and archival workflows.

Hardware Freedom

VDURA runs on commodity, vendor-agnostic hardware with AI-grade speed and cost efficiency. The shared-nothing architecture eliminates the need for specialized HA-pair servers, firmware-based RAID controllers, or dual-ported drives, enabling the use of standard, off-the-shelf commodity servers and storage devices.

End-to-End Encryption

VDURA provides industry-leading end-to-end encryption. AES-256 protects data from the moment it leaves the client, in transit, and at rest, with transparent, tenant-per-volume encryption and KMIP-based key management.

RDMA-Enabled GPU-Native I/O

New in V12, RDMA support enables GPU-to-storage data transfers that bypass the CPU entirely. Direct memory access between GPU server nodes and the VDURA Data Platform eliminates CPU bottlenecks for low-latency, high-throughput data paths critical to AI training and inference workloads.

Advanced Data Protection

Reliability improves with scale through client-side, file-level erasure coding that protects each file individually, eliminating the need for legacy RAID or costly HA hardware. VDURA’s patented Multi-Level Erasure Coding (MLEC) provides superior data protection, delivering up to 12 nines of durability in all-flash configurations.

Simplicity, Single-Vendor Operations

One vendor, one stack, one upgrade path. The HYDRA architecture seamlessly expands NVMe flash and HDD capacity, automatically balances workloads, and self-heals from failures, with non-disruptive upgrades and zero day-two complexity.

Native CSI Plug-in

VDURA’s native Container Storage Interface (CSI) plug-in simplifies multi-tenant, Kubernetes-based deployments with zero-script persistent-volume provisioning and management. Cloud-native simplicity meets enterprise-grade storage.

VDURACare Premier

One simple contract covering hardware, software, and support. VDURACare Premier™ includes 10-year, no-cost replacement of drives and 24×7 expert response, delivering comprehensive, risk-free coverage that keeps your AI factory running.

04

The AI Workload Storage Problem

AI workloads have redefined what modern data infrastructure must deliver: speed, scale, and precision under constant I/O pressure. Most storage architectures are not built for this.

Each stage of the AI pipeline has unique storage requirements that must be met to keep the factory running efficiently and ahead of the competition. The common approach has been to deploy a different system for each stage, or to default to data infrastructure that meets some requirements but is not ideal for all of them.

The AI Data Pipeline: What Happens at Each Stage
  • Data Ingest: Raw data (images, videos, text, or sensor streams) flows in, ready to be converted, stored, and staged for AI processing.
  • Model Load: Massive pretrained model files (often hundreds of GBs to multiple TBs) are loaded into GPU memory before training or inference can begin.
  • Training: Large datasets are read in randomized batches while updated model states, gradients, logs, and metrics are continuously written out.
  • Checkpointing: At regular intervals, the entire model state is saved. This ensures progress is not lost and allows recovery or restarts without starting over.
  • Fine-tuning: A previously trained model is refined using a smaller, domain-specific dataset, adapting it for new tasks or environments.
  • Inference: Models serve predictions at scale with high-concurrency random reads across weights, embeddings, and RAG corpora. Hot KV-cache, multi-tenant isolation, first-token latency SLAs, and model versioning for blue-green rollbacks make inference a mixed-workload problem, not a simple read problem.
Common Challenges in the AI Pipeline
  • GPU idle time: GPU stalls caused by poor I/O throughput and lengthy checkpointing waste compute and slow AI innovation.
  • Metadata overload: Traditional systems collapse under billions of small file operations, especially during inference and model versioning.
  • Flash waste: Overprovisioning high-cost NVMe for rarely accessed data drains budgets and inflates TCO.
  • Manual data tuning: Static policies and manual tiering cannot keep up with constantly shifting data access patterns across stages.
  • Inference complexity at scale: Inference is often treated as a pure read workload, but production inference platforms must handle KV-cache persistence, growing RAG corpora, multi-tenant namespace isolation, and model versioning across hundreds of concurrent sessions, all with single-digit millisecond latency requirements.
Figure 2. AI data pipeline and storage requirements.
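
To put a number on the GPU-idle challenge above: a hedged back-of-the-envelope sketch, assuming a hypothetical 1 TB model state, a 15-minute checkpoint interval, and a synchronous checkpoint during which training pauses. None of these values come from this paper; they only illustrate how quickly checkpoint bandwidth requirements grow.

```python
# Hedged sketch: write bandwidth needed to keep synchronous-checkpoint stalls
# below a target fraction of wall-clock time. All inputs are assumptions.
checkpoint_size_gb = 1_000      # assumed full model + optimizer state
interval_s = 15 * 60            # assumed checkpoint every 15 minutes
max_idle_fraction = 0.01        # budget: <= 1% of GPU time lost to checkpoints

time_budget_s = interval_s * max_idle_fraction
required_write_gbps = checkpoint_size_gb / time_budget_s

print(f"Checkpoint must complete in {time_budget_s:.0f} s")
print(f"Required sustained write bandwidth: {required_write_gbps:.0f} GB/s")
# ~111 GB/s for these assumptions; halve the interval and the requirement doubles.
```
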
AI Pipeline Storage Requirements

The following table displays the complexities of AI data infrastructure. Each stage in the AI pipeline has different read, write, throughput, capacity, and IOPS requirements that must be optimized.

Stage | Read | Write | Data Size | AI Workload Insights
Data Ingest | Low | High | TBs to PBs | Bulk writes require fast speeds. Data retention requires high capacity.
Model Load | High | – | GBs to TBs | High throughput required. Any delay holds back the entire pipeline.
Training | Low | Low | TBs to PBs | Fast I/O crucial to saturating GPUs.
Checkpoint (Train) | – | Very High | GBs to TBs | GPUs are idle during checkpointing. Must be fast to prevent burning GPU dollars.
Fine-Tune | Low | Low | GBs | Smaller datasets than training; typically lighter on reads/writes.
Checkpoint (Fine-Tune) | – | Very High | GBs | High-speed write requirements similar to training checkpoints.
Inference | High | Low | GBs | High-concurrency random reads across model weights, embeddings, and RAG corpora. KV-cache persistence and multi-tenant isolation add write and metadata complexity beyond simple reads.
AI Archive / Data Retention | – | – | PBs | Long-term, cost-efficient storage for raw or processed datasets.
Designing an AI infrastructure requires more than just performance at a single stage. It demands modern data storage infrastructure software that can handle the full pipeline.
05

Modern Storage for the AI Era

AI and HPC pipelines demand precision: fast writes during ingest, training, and fine-tuning checkpoints; high-throughput reads during model loading and inference; and scalable, cost-effective storage for AI data retention and reuse. Most vendors force tradeoffs. Shared-everything architectures rely on centralized head nodes to handle all I/O, introducing performance chokepoints. Writes slow dramatically during cache flushes due to compression and deduplication. Bolt-on, third-party object stores for data lake functionality add latency, break the namespace, and shift complexity to the user.

These disjointed approaches cannot keep pace with modern AI workloads.

VDURA eliminates these limitations with a true parallel file system and software-defined architecture that separates the control plane from the data plane. This shared-nothing design enables scalable, high-performance throughput with no single-node bottlenecks. AI training data flows directly from NVMe flash to clients. AI archive and retention data lives cost-efficiently in high-capacity mixed-fleet nodes, all under a single global namespace.

Every stage of the AI pipeline is covered:

  • Data ingest, fine-tuning, inference, and AI data retention leverage high-performance NVMe and cost-effective, high-capacity HDD storage.
  • Model load, training, and checkpointing run at full speed on all-NVMe flash, delivering up to 2.7 TB/s throughput and 45M IOPS per rack.
  • Metadata-heavy tasks like small-file access and checkpoint orchestration are accelerated by VeLO, now with the V12 Elastic Metadata Engine delivering up to 20× improvement in metadata operations.

Intelligent orchestration automates tiering, eliminating the need for manual tuning, extra software layers, or external storage systems. VDURA is the software-defined modern data storage infrastructure platform that is purpose-built to power every stage of the AI pipeline. We combine the scalable, linear performance of a true parallel file system with the resilience and cost efficiency of object storage.

One data plane, one control plane, one namespace. Simple to deploy, operate, and grow.
06

The VDURA Data Platform

Software-Defined Architecture Engineered for Performance and Scalability

The VDURA Data Platform V12 is built on a fully software-defined, microservices architecture that combines the speed and efficiency of a true parallel file system with the durability and cost-effectiveness of resilient object storage. This is HYDRA: High-Performance, Yield-Optimized, Distributed, Resilient Architecture.

Figure 3. VDURA combines the best features of parallel file systems and object storage to power AI under one global namespace.

This unified design ensures high performance and simplicity for active and bulk data storage and is designed specifically to address the complexities and requirements of the AI pipeline. The VDURA Data Platform explicitly separates the control plane handling metadata operations from the data plane, which is dedicated exclusively to user data storage.

Three key components work together to power the VDURA Data Platform:

  • Director Nodes are the core of the control plane.
  • Storage Nodes are the foundation of the data plane.
  • The DirectFlow™ Client is our high-performance parallel file system driver.

Director Nodes are the core of the control plane. They orchestrate and manage all metadata operations, coordinate the actions of Storage Nodes and DirectFlow Client drivers for file access, maintain the health and membership status within the storage cluster, and oversee all recovery and reliability functions. These nodes are simple, powerful compute servers featuring high-speed networking, substantial DRAM, and NVMe flash optimized for metadata transaction logs.

The VDURA VeLO metadata engine runs on each Director Node. VeLO is distributed and flash-optimized, designed specifically for high-speed parallel metadata operations. V12’s Elastic Metadata Engine dynamically scales across nodes, delivering up to 20× improvement in metadata operations and supporting billions of files and objects under active use. This integration ensures ultra-low latency, efficient handling of billions of file operations, and consistent metadata performance at scale.

Storage Nodes form the foundation of the data plane, dedicated exclusively to storing and managing user data. Available in configurations of either all-NVMe flash for peak performance or NVMe flash with HDD capacity expansion for high-performance and economical bulk storage, Storage Nodes deliver versatile and optimized infrastructure. Each node hosts multiple Virtualized Protected Object Device (VPOD) instances, enabling granular, scalable data management and enhanced reliability through Multi-Level Erasure Coding. VPOD architecture ensures linear scalability and consistent parallel performance, accommodating thousands of nodes seamlessly within a single cluster.

The VDURA DirectFlow Client is a high-performance parallel file system driver specifically engineered for Linux-based compute environments. Deployed directly on compute servers, DirectFlow seamlessly integrates with existing Linux applications, presenting itself like any conventional file system. It provides fully POSIX-compliant, cache-coherent file operations across a unified global namespace, tightly collaborating with Director and Storage Nodes. By enabling direct, parallel I/O paths from compute servers to Storage Nodes, DirectFlow eliminates traditional bottlenecks and intermediary processing overhead found in NFS or legacy storage solutions. V12 adds RDMA support for GPU-native data paths that bypass the CPU entirely.

Figure 4. Control plane (Director Nodes/VeLO) and data plane (Storage Nodes/VPODs) operate independently in VDURA HYDRA.
Direct, Parallel Data Access for the AI Pipeline

The VDURA Data Platform is built as a true parallel file system, engineered to handle the intense I/O demands of modern AI and HPC workloads. Each file stored by the VDURA Data Platform is individually striped across many Storage Nodes, allowing each component piece of a file to be read and written in parallel, increasing the performance of accessing every file.

VDURA’s parallel architecture dramatically accelerates data access, significantly boosting performance and throughput.

Unlike other enterprise systems which route data through limited head nodes, causing potential bottlenecks and requiring additional backend network infrastructure, VDURA’s DirectFlow Client communicates directly with all relevant Storage Nodes. Each compute server directly accesses the nodes holding the data, bypassing intermediary bottlenecks. Director Nodes manage metadata and coordinate system activity out-of-band, ensuring efficient data flow without interference or congestion.

The DirectFlow Client is lightweight, consuming approximately 191 MB of DRAM per compute node, requires zero dedicated CPU cores, and uses the standard Linux page cache rather than the pinned HugePages required by kernel-bypass data paths. CPU cycles are borrowed opportunistically during active I/O and returned immediately to the application. This efficiency matters at fleet scale: a 500-node GPU cluster running VDURA commits roughly 93 GB of DRAM to storage clients, while architectures that require kernel-bypass modes for peak performance reserve 2.5 TB of DRAM and permanently lock 500 to 2,000 CPU cores across the same fleet. Every core and every gigabyte VDURA does not claim stays available to the applications. V12 takes this further with RDMA and NVIDIA® GPUDirect Storage (GDS) support, enabling direct DMA transfers between storage and GPU HBM that bypass host DRAM and the CPU entirely.
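
The fleet-level numbers above follow directly from per-node footprints. A small sketch, assuming the kernel-bypass alternative reserves roughly 5 GB of pinned DRAM and 1 to 4 dedicated cores per node; those per-node figures are assumptions, since the paper quotes only the fleet totals.

```python
# Hedged sketch: storage-client footprint across a 500-node GPU fleet.
# The per-node figures for the kernel-bypass alternative are assumptions.
nodes = 500
directflow_mb_per_node = 191          # DRAM per node (from this paper)
bypass_gb_per_node = 5.0              # assumed pinned HugePages per node
bypass_cores_per_node = (1, 4)        # assumed dedicated cores per node

directflow_total_gb = nodes * directflow_mb_per_node / 1024
bypass_total_tb = nodes * bypass_gb_per_node / 1024
bypass_cores = tuple(nodes * c for c in bypass_cores_per_node)

print(f"DirectFlow fleet DRAM: ~{directflow_total_gb:.0f} GB")         # ~93 GB
print(f"Kernel-bypass fleet DRAM: ~{bypass_total_tb:.1f} TB pinned")   # ~2.4 TB
print(f"Kernel-bypass dedicated cores: {bypass_cores[0]}-{bypass_cores[1]}")
```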

This direct and parallel design eliminates traditional NAS hotspots, ensures predictable and scalable performance, and simplifies infrastructure by removing the need for a separate, costly backend network. VDURA architecture delivers seamless scalability, consistently high performance, and exceptional efficiency across every stage of the AI pipeline, from ingest and training to inference and long-term data retention.

Linear Scalability, Seamless Expansion

The VDURA Data Platform delivers true linear scalability across both metadata and data services without compromise or complexity. AI workloads evolve fast, from early experimentation to scaled production across global clusters. Add Director Nodes to boost throughput for metadata-heavy tasks like model versioning and checkpoint tracking. Add Storage Nodes to scale bandwidth and capacity to support more training data, inference logs, or multi-tenant pipelines. VDURA enables linear scalability and seamless, predictable growth. A 50% increase in Storage Nodes delivers 50% more throughput and capacity, with no bottlenecks and no architectural redesigns.

  • VeLO metadata engine: VeLO metadata engine instances run in-memory across Director Nodes and scale to billions of parallel metadata operations, making them perfect for high-frequency file creation, access-pattern analysis, and rapid AI job cycles. V12’s Elastic Metadata Engine extends this further with dynamic scaling across nodes.
  • VPODs: VPODs manage user data in independently scalable units, each with its own erasure-coded stripe and logic, ideal for bursty checkpoint writes, long-term data lake retention, or active model training sets.
Figure 5. Linear scalability of Director and Storage Nodes.
Director Nodes: The Brain Orchestrating Control in a Parallel World

Director Nodes serve as the brain in the VDURA architecture. VDURA separates the control plane, which handles metadata, orchestration, and policy, from the data plane, which handles user I/O. As the control plane’s core, they command every stage of the AI pipeline, from ingestion and training to checkpointing and inference. Director Nodes continuously adapt to workload changes, ensuring optimal throughput and seamless orchestration across the system.

Each Director runs VeLO, a flash-optimized metadata engine built to handle billions of operations per second. For modern AI, where performance is dictated as much by metadata velocity as data throughput, VeLO is essential. VeLO accelerates everything from tiny files to checkpoint indices to model versions. V12’s Elastic Metadata Engine dynamically scales metadata capacity across nodes, delivering up to 20× improvement in operations.

Director Nodes form the authoritative layer of VDURA’s control structure and every deployment requires a minimum of three. Administrators configure either three or five of the total Director Nodes as a replication set, or “repset,” a voting quorum that maintains a synchronized, fully replicated configuration database. One node from the repset is elected realm president and is tasked with managing configuration, status monitoring, and leading failure recovery. If the current president fails, a new one is elected instantly and automatically.

Beyond coordination, Director Nodes also perform essential tasks at the president’s request. These include managing volumes, serving as protocol gateways (NFS, SMB, S3), performing background data scrubbing, recovering failed Storage Nodes, and executing Active Capacity Balancing across VPODs. All changes are non-disruptive to clients; gateways and volumes can migrate transparently across nodes when necessary.

Storage Nodes: AI Pipeline Performance from Every Layer

Storage Nodes are the backbone of VDURA’s data plane, enabling seamless scale and sustained performance throughout every stage of the AI pipeline. Designed with flexibility and resilience, these nodes combine the best of both all-NVMe flash and flash with HDD capacity expansion storage, orchestrated under a unified control plane and single global namespace.

Figure 6. One control plane, one data plane, one single global namespace.

Optimized for Every Phase of the AI Pipeline

From high-frequency ingest and bursty checkpointing to real-time inference and long-term retraining, each phase of AI benefits from storage tiers purpose-built for performance and durability:

  • All-NVMe flash nodes deliver ultra-low latency and high IOPS for AI high-performance data and latency-sensitive phases like model loading, active training, and checkpointing.
  • NVMe flash nodes with HDD capacity expansion combine flash for metadata and active datasets with high-capacity HDDs for scalable, cost-efficient AI data retention. This is ideal for archived model weights, retraining inputs, inference logs, and data lakes that need fast access but are less frequently touched. V12’s SMR HDD Optimization unlocks 25–30% more capacity per rack without compromising throughput.
07

What’s New in V12

VDURA Data Platform V12 represents a major release with significant advancements across metadata performance, data management, storage economics, and GPU-native connectivity. V12 delivers more than a 20% increase in throughput, 20× metadata acceleration, and cost-per-TB reductions of more than 20%, all available as a zero-downtime in-place upgrade for V11 customers.

  • 20%+ throughput increase
  • 20× metadata acceleration
  • 20%+ cost-per-TB reduction
  • 25–30% more capacity (SMR)

Elastic Metadata Engine

The V12 Elastic Metadata Engine dynamically scales metadata capacity across Director Nodes, delivering up to 20× improvement in metadata operations. It supports billions of files and objects under active use, eliminating metadata bottlenecks that have traditionally constrained AI pipelines at scale. The engine automatically rebalances metadata distribution as clusters grow, ensuring consistent performance regardless of namespace size.

Snapshot Support

V12 introduces native snapshot support with instantaneous, space-efficient, point-in-time copies. Designed for AI pipeline checkpoints, model snapshots, and operational recovery, snapshots can be created manually or via policy-based retention. This capability is essential for protecting training progress, enabling rapid rollback during model development, and maintaining data integrity across complex AI workflows.

SMR HDD Optimization

A new write-placement engine in V12 organizes sequential zones intelligently for Shingled Magnetic Recording (SMR) drives, unlocking 25–30% more capacity per rack without compromising throughput.

RDMA Support

Available now for all V5000 systems, RDMA support enables GPU-to-storage data transfers that bypass the CPU entirely. Built on NVIDIA ConnectX-7 networking adapters and AMD EPYC Turin processors, RDMA delivers direct memory access between GPU server nodes and the VDURA Data Platform, eliminating CPU bottlenecks for the low-latency, high-throughput data paths critical to AI training and inference.

Context-Aware Tiering (Phase 1)

Phase 1 of Context-Aware Tiering introduces three capabilities: Extended DirectFlow Buffer to Local SSD, reducing dependency on network storage for hot data; KVCache Writeback for Persistence SLA, minimizing unnecessary I/O while maintaining inference SLA compliance; and Context Cache Tiering Framework for high-speed read/write at LMCache speed, supporting long-context LLM serving and RAG workloads. The roadmap includes deeper application-directed data placement, cross-node cache coherence, and NVIDIA BlueField-4 DPU support.

Native CSI Plug-in

V12 includes a native Container Storage Interface (CSI) plug-in that simplifies multitenant, Kubernetes-based deployments with zero-script persistent-volume provisioning and management. Organizations running containerized AI pipelines on Kubernetes can now provision VDURA storage volumes directly through standard Kubernetes APIs, eliminating custom integration work and accelerating time-to-production for cloud-native AI workloads.
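
As a hedged illustration of what provisioning through standard Kubernetes APIs looks like in practice, the sketch below creates a persistent volume claim with the official Kubernetes Python client. The StorageClass name vdura-csi and the namespace are hypothetical placeholders; the actual class name and parameters come from the VDURA CSI documentation.

```python
# Hedged sketch: request a VDURA-backed persistent volume through the standard
# Kubernetes API. The StorageClass name "vdura-csi" is a hypothetical placeholder.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "training-scratch"},
    "spec": {
        "accessModes": ["ReadWriteMany"],    # shared namespace across GPU pods
        "storageClassName": "vdura-csi",     # hypothetical class name
        "resources": {"requests": {"storage": "10Ti"}},
    },
}

core.create_namespaced_persistent_volume_claim(namespace="ml-team", body=pvc_manifest)
```

Pods then mount the claim like any other Kubernetes volume; no vendor-specific provisioning scripts are involved.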

End-to-End Encryption

VDURA provides industry-leading end-to-end encryption. V12 delivers comprehensive security with transparent, tenant-per-volume AES-256 encryption that protects data from the moment it leaves the client, through transit, and at rest. This unified encryption architecture replaces the patchwork of TLS for in-flight and SED for at-rest that competitors rely on, providing stronger confidentiality, integrity, and compliance alignment in a single, zero-performance-compromise implementation.

VDURACare Premier

One simple contract covering hardware, software, and support. VDURACare Premier™ includes 10-year, no-cost replacement of drives and 24×7 expert response, delivering comprehensive, risk-free coverage that protects your investment and keeps your AI factory running without interruption. No surprise costs, no separate maintenance contracts, no finger-pointing between vendors.

V12 is available as a zero-downtime in-place upgrade for all V11 customers and reaches general availability in Q2 2026 for all V5000 systems.

08

VPOD Architecture: Virtualization for Resilience and Efficiency

Rather than treating the entire server as a single failure domain, each VDURA Storage Node hosts multiple Virtualized Protected Object Devices, or VPODs. This architecture introduces a finer unit of failure isolation:

  • The unit of failure is now one VPOD, not the physical server.
  • More VPODs per node increases operational granularity and cluster flexibility.
  • Device-level failures are isolated to a single VPOD, eliminating the risk of full node failure.
  • Storage reconstruction only affects the failed VPOD’s component objects, not the entire node.

Files are striped across component objects in multiple VPODs using N+2 erasure coding, ensuring high fault tolerance with efficient space utilization. Large POSIX files benefit from this distributed protection model, while small POSIX files are triple-replicated across VPODs, delivering optimal performance and storage efficiency.
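
A minimal sketch of why the two layouts coexist: for large files, N+2 striping costs far less capacity than triple replication, while for small files, replication avoids tiny, inefficient erasure-coded stripes. The stripe width and the small-file threshold below are illustrative assumptions, not VDURA’s internal values.

```python
# Hedged sketch: raw bytes stored for a file under triple replication vs. an
# N+2 erasure-coded stripe. Stripe width and threshold are assumptions.
STRIPE_DATA_UNITS = 8              # assumed N (data units per stripe)
SMALL_FILE_THRESHOLD = 64 * 2**10  # assumed 64 KiB cutoff for replication

def raw_bytes_stored(file_size: int) -> int:
    if file_size < SMALL_FILE_THRESHOLD:
        return file_size * 3                              # triple replication
    overhead = (STRIPE_DATA_UNITS + 2) / STRIPE_DATA_UNITS
    return int(file_size * overhead)                      # N+2 erasure coding

for size in (4 * 2**10, 1 * 2**30):                       # 4 KiB vs. 1 GiB
    stored = raw_bytes_stored(size)
    print(f"{size:>12,d} B logical -> {stored:>13,d} B raw "
          f"({stored / size:.2f}x overhead)")
```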

Figure 7. VPOD architecture provides failure isolation at the virtual instance level, not the physical server level.
Dynamic Data Acceleration: The Smart Storage Fabric

VDURA Dynamic Data Acceleration™ (DDA) intelligently aligns I/O patterns with the most suitable media layer in real time:

  • Intent-log protection: SSD-powered intent logs replace legacy NVDIMMs for in-flight data integrity.
  • Metadata SSDs: low-latency NVMe SSDs store metadata databases for rapid namespace access.
  • High-IOPS SSDs: handle small-file workloads.
  • High-bandwidth HDDs: manage large-file sequential reads and writes.
  • System DRAM: provides caching for unmodified data and metadata.

Together, these layers form a high-performance, self-optimizing data fabric that minimizes latency and maximizes cost-efficiency.

Resilient Data Reconstruction and Integrity

In the event of a Storage Node failure, the VDURA Data Platform reconstructs only the affected component objects, not the full node’s data. Files are rebuilt by pulling erasure-coded data fragments from other nodes. Continuous background scrubbing verifies data consistency across the system by validating erasure codes against stored data.

AI-Aware Placement and Automation

VDURA’s intelligent orchestration engine continuously analyzes file size, access pattern, and data temperature to automate data placement across flash and hybrid tiers. Key features include:

  • Flash prioritization for small and recently accessed files.
  • HDD capacity expansions for large, sequential, or infrequently accessed data.
  • Continuous Active Capacity Balancing to eliminate hotspots and evenly distribute load.
  • Real-time adaptation to shifting model training cycles, inference loads, and checkpoint bursts, requiring zero manual tuning.

The result is a storage system that evolves with the volatility of the AI pipeline, scaling performance and capacity without trade-offs.

VDURA V12 Data Reduction

VDURA V12 performs data reduction at the Storage Node level, ensuring zero impact on client-side CPU or memory resources. Unlike architectures that shift compression or deduplication tasks to the client, consuming valuable compute and memory, VDURA handles all reduction operations within the storage layer itself. This design keeps GPU and application nodes fully dedicated to AI and HPC workloads, maximizing performance and system efficiency. The data reduction feature can be toggled on or off at any time via the GUI or CLI.

09

VDURA V5000 Certified Platform Hardware Overview

VDURA V5000 Certified Platform hardware is engineered for AI and HPC pipelines that demand relentless GPU feed rates. Built on industry-standard servers, V5000 Certified Platform hardware pairs flash performance with optional HDD capacity expansion, giving organizations a cost-balanced path from pilot to exabytes.

V5000 hardware runs the VDURA Data Platform V12, VDURA’s flash-tuned parallel file system, streaming multiple terabytes per second from a single global namespace. Working with the DirectFlow Client, VDURA offers parallel redundant data paths that scale linearly, safeguard data with enterprise-class durability, and keep day-to-day management simple.

Each system begins with a minimum of three Director Nodes and three Storage Nodes, which can be either all-flash or flash with HDD capacity expansion. Additional nodes can be added seamlessly to expand performance, capacity, or metadata throughput independently.

Figure 8. V5000 configuration options, left to right: all flash, 50% flash, 98% HDD.
Key Features
  • 1U chassis for Director Nodes and Flash Storage Nodes.
  • 4U chassis for HDD capacity expansion.
  • Configurable as either all-flash NVMe or mixed-fleet nodes using flash NVMe with HDD capacity expansion.
  • Up to 2.7 TB/s throughput per rack in all-flash configurations.
  • Up to 200 GB/s throughput per rack in all-flash with HDD capacity expansion configurations.
  • Up to 1.2M IOPS per Storage Node, supporting low-latency, high-frequency workloads. Up to 45M IOPS per rack with all-flash configuration.
  • Multi-Level Erasure Coding with up to 12 nines durability (all-flash) or 11 nines durability (all-flash with HDD capacity expansion).
  • Supports InfiniBand™ (NDR/NDR200) and Ethernet (400/200/100 GbE) networking.
  • Hardware-agnostic deployment with full support for commodity components.
  • RDMA support for GPU-native, CPU-bypass data paths (new in V12).
  • Snapshot support for AI pipeline checkpoints and operational recovery (new in V12).
  • SMR HDD optimization unlocking 25–30% additional capacity per rack (new in V12).
  • Native CSI plug-in for zero-script Kubernetes persistent-volume provisioning (new in V12).
  • Native multi-tenancy with per-tenant QoS, namespaces, encryption keys, and VLAN isolation, designed for Neocloud and AI-factory provider workflows (new in V12).
  • REST API control plane with Terraform, Ansible, and Crossplane providers for tenant lifecycle, quota, and billing automation, integrated with marketplace control planes (new in V12).
  • End-to-end AES-256 encryption with tenant-per-volume granularity.
  • VDURACare Premier: 10-year drive replacement, 24×7 expert support, one contract.
V5000 Details

The VDURA V5000 Certified Platform hardware system represents the culmination of decades of engineering expertise in parallel file systems and distributed storage technology. Built for AI/ML and HPC workloads, V5000 hardware combines enterprise-grade reliability with maximum throughput and flexibility. Its modular architecture allows organizations to independently scale performance, capacity, and metadata operations to create the ideal balance for their specific workload requirements without overprovisioning or underutilization.

Director Node
  • Hosts the VeLO Metadata Engine optimized for flash. V12 Elastic Metadata Engine delivers up to 20× acceleration.
  • Contains 12 × NVMe SSD slots.
  • Up to 333M inodes and up to 225K creates/deletes per Director.
  • Requires a minimum of three nodes for metadata triplication and fault tolerance.
  • Scales out to hundreds of instances for greater metadata throughput.
All-Flash Storage Node
  • Configurable with 12× NVMe SSDs in 7.68 TB, 15.36 TB, 30.72 TB, 61.44 TB, or 122.88 TB capacities.
  • Delivers up to 60 GB/s per node and up to 2.7 TB/s throughput per rack.
  • Supports up to 1.2M IOPS per node.
  • Ideal for AI training pipelines, inference, and checkpointing workloads.
  • Fully compatible with Multi-Level Erasure Coding and data reduction features.
All-Flash with HDD Capacity Expansion Storage Node
  • Combined SSD tier up to 12× 3.84 TB, 7.68 TB, 15.36 TB, or 30.72 TB.
  • Supports JBOD expansion using 78 or 108 HDDs per node with 16 TB, 24 TB, 30 TB, or 32 TB drive options.
  • Up to 200 GB/s flash-accelerated throughput per rack.
  • Scalable to 26 PBe per rack (effective) with inline compression.
  • Optimized for high-capacity/data lake storage with flash-accelerated performance.
  • Supports dynamic tiering and intelligent placement via the VDURA Orchestration Engine™.
  • V12 SMR HDD Optimization unlocks 25–30% additional capacity per rack.
Connectivity
  • Supports InfiniBand NDR/NDR200 and Ethernet 400/200/100 GbE ports.
  • Nodes connect at up to 2× NDR200 InfiniBand and up to 4× 100 GbE Ethernet.
Expansion Options

Each VDURA V5000 cluster can expand incrementally and non-disruptively.

  • Add Director Nodes to increase metadata and protocol performance.
  • Add all-flash Storage Nodes for higher throughput and IOPS.
  • Add flash with HDD capacity expansion Storage Nodes for cost-efficient capacity growth.
  • Mix and match node types within the same realm with no architectural redesign.
  • Tiered performance and capacity levels managed via StorageSets, ensuring isolation and quality-of-service (QoS).
Physical Connectivity

VDURA V5000 Director and Storage Nodes support 400/200/100 GbE networks via two network ports in the rear of each node. The default configuration upon initial installation is link aggregation across two ports, a 2×200/100 GbE configuration using two 200/100 GbE SFP28 cables, with one attached to each port. VDURA V5000 nodes support Link Aggregation Control Protocol (LACP) by default; static Link Aggregation Group (LAG), single link, and failover modes are also available.

VDURA V5000 Director and Storage Nodes contain two 25/10 GbE ports for corporate network connectivity. All nodes also contain a single 1 GbE port that may be used as a general administrative network port or for troubleshooting.

Network Configuration Options

There are four network configuration options:

  • Dynamic LACP
  • Static LAG
  • Single link
  • Failover network

The default network configuration for V5000 nodes is LACP across the dual 100 GbE ports. Generally, protocols other than LACP and static LAG operate in active/passive mode.

Active/Active Link Aggregation Mode: When load balancing is required to optimize performance, V5000 systems can be configured to use either dynamic LACP or static LAG. LACP is preferred, as it is significantly more robust than static LAG. In LACP mode, the physical ports are bonded with the IEEE 802.3ad LACP link-layer protocol, providing load balancing, better fault tolerance, and protection against misconfiguration.

Single Link Mode: While single link mode is supported on V5000 systems, it is not optimal since it is a single point of failure and suffers reduced bandwidth. Single link mode should be used with caution.

Network Failover Mode: Network failover is used on V5000 systems when active/passive redundancy is required.

Storage Configuration Options

VDURA provides two mechanisms to manage namespace and capacity: StorageSets and volumes.

StorageSets: The StorageSet is a physical mechanism that groups Storage Nodes into a uniform storage pool. It is a collection of three or more Storage Nodes grouped together to store data. You can grow a StorageSet by adding more hardware, and you can move data within a StorageSet.

Volumes: A volume is a logical mechanism, a sub-tree of the overall system directory structure. A read-only top-level root volume (“/”), under which all other volumes are mounted, and a /home volume are created during setup. All other volumes are created by the user on a particular StorageSet, with up to 1,200 per realm. V12 adds native snapshot support for volumes, enabling instantaneous point-in-time copies for AI pipeline checkpoints and operational recovery.

When planning volume configuration, keep the following points in mind:

  • Volumes can be used to manage capacity.
  • Volumes can be created to complement backup strategies.
  • Performance can be enhanced by assigning volumes to different Directors.
Erasure Coding and Data Protection

Storage Nodes in the VDURA Data Platform host highly sophisticated Virtualized Protected Object Devices (VPODs); you gain the same scale-out and shared-nothing architectural benefits from our VPODs as any object store would provide.

Figure 9. Per-file erasure coding layouts.

VDURA defines objects used in our VPODs per the American National Standards Institute (ANSI) T10 standard definition of objects rather than the Amazon S3 object definition. The VDURA Data Platform uses T10 objects to store POSIX files. Instead of storing each file in an object like S3 does, VDURA stripes a large POSIX file across a set of VPODs and adds additional VPODs into that stripe that store the P and Q data protection values of an N+2 erasure coding scheme. Using multiple VPODs per POSIX file enables the striping that is one of the sources of a parallel file system’s performance.

A traditional storage array reconstructs the contents of drives, while VDURA reconstructs the contents of files.

While large POSIX files are stored using erasure coding across multiple VPODs, small POSIX files use triple replication across three VPODs. This approach delivers higher performance than can be achieved by using erasure coding on such small files, while being more space efficient. Unless the first write to a file is a large one, it will start as a small file. If a small file grows into a large file, the Director Node will transparently transition the file to the erasure-coded format at the point that the erasure-coded format becomes more efficient.

Reliability That Increases with Scale

Any system can experience failures, and as systems grow larger, their increasing complexity typically leads to lower overall reliability. For example, in a traditional storage system, since the odds of any given drive failing are roughly the same during the current hour as they were during the prior hour, more time in degraded mode equals higher odds of another drive failing while the system is still degraded. If enough drives were to be in a failed state at the same time, there would be data loss, so recovering back to full data protection levels as quickly as possible becomes the key aspect of any resiliency plan.

The VDURA Data Platform has linear scale-out reconstruction performance that dramatically reduces recovery time in the event of a Storage Node failure, so reliability increases with scale.

If a VDURA Storage Node fails, the system must reconstruct only those VPODs that were on the failed node, not the entire raw capacity of the Storage Node like a traditional array would. The system reads the VPODs for each affected file from all the other Storage Nodes and uses each file’s erasure code to reconstruct the VPODs that were on the failed node.

When a StorageSet is first set up, it sets aside a configurable amount of spare space on all the Storage Nodes in that StorageSet to hold the output from file reconstructions. When the system reconstructs a missing VPOD, it writes it to the spare space on a randomly chosen Storage Node in the same StorageSet. As a result, during a reconstruction, the system uses the combined write bandwidth of all the Storage Nodes in that StorageSet. The increased reconstruction bandwidth results in reducing the total time to reconstruct affected files, which reduces the odds of an additional failure during that time and increases the overall reliability of the realm.
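
The reliability-increases-with-scale claim follows from that fan-out. A hedged sketch, assuming a hypothetical 200 TB of affected data on the failed node and 1 GB/s of spare-space write bandwidth per surviving Storage Node; the absolute numbers are illustrative, and only the scaling trend matters.

```python
# Hedged sketch: declustered rebuild time shrinks as the StorageSet grows,
# because every surviving node contributes write bandwidth to reconstruction.
# Data size and per-node bandwidth are illustrative assumptions.
affected_data_tb = 200          # assumed data to reconstruct from the failed node
write_gbps_per_node = 1.0       # assumed spare-space write bandwidth per node

for storageset_nodes in (10, 40, 100):
    survivors = storageset_nodes - 1
    rebuild_hours = (affected_data_tb * 1_000) / (write_gbps_per_node * survivors) / 3_600
    print(f"{storageset_nodes:>3} nodes in StorageSet -> "
          f"rebuild in ~{rebuild_hours:.1f} h")
# 10 nodes: ~6.2 h, 40 nodes: ~1.4 h, 100 nodes: ~0.6 h. Shorter degraded
# windows mean lower odds of a second failure while rebuilding.
```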

VDURA also continuously scrubs the data integrity of the system in the background by slowly reading through all files in the system, validating that the erasure codes for each file match the data in that file. Data scrubbing is a hallmark of enterprise-class storage systems and is only found in one HPC-class storage system, the VDURA Data Platform.

An Architecture of High Reliability

Depending on the configured protection level, the erasure coding that VDURA implements protects against either one or two simultaneous failures within any given StorageSet without any data loss. The realm can automatically and transparently recover from more than two failures as long as there are no more than two failed Storage Nodes at any one time in a StorageSet.

If, in extreme circumstances, three Storage Nodes in a single StorageSet were to fail at the same time, VDURA has one additional line of defense that limits the effects of that failure. All directories are independently triplicated, with three complete copies of each directory and no two copies on the same Director Node. If a third Storage Node were to fail in a StorageSet while two others were being reconstructed, that StorageSet would immediately transition to a read-only state. Only the files in the StorageSet that had VPODs on all three of the failed Storage Nodes would have lost data. All other files in the StorageSet would be unaffected or recoverable using their erasure coding. The number of affected files in these situations becomes smaller as the size of the StorageSet increases.

VDURA is unique in the way it provides clear knowledge of the impact of a given event, as opposed to other architectures which leave you with significant uncertainty about the extent of the data loss.

Per-File Erasure Coding

Instead of relying on hardware controllers that protect data at a drive level, VDURA architecture uses per-file distributed erasure coding in software. Files in the same StorageSet, volume, and even directory can have different erasure coding protection levels. In this way, a file can be seen as a single virtual object that is sliced into multiple component objects.

Users have three Erasure Coding Protection Levels available: Dual Parity (n+2), Single Parity (n+1), and Striped Mirror (2x).

  • Dual Parity (n+2) is the default protection level for all volumes, as it provides a balance between capacity overhead and performance. This is the recommended setting for most workloads.
  • Single Parity (n+1) is optimized for performance with less capacity overhead, ideal for workloads where throughput is prioritized over maximum fault tolerance.
  • Striped Mirror (2x) is optimized for small write performance, combining striping and mirroring to provide high performance for applications that perform small random writes while providing resiliency against a single disk failure.

The Erasure Coding Protection Level is selected at volume creation time and cannot be changed after a volume is created. You can mix protection levels and any volume layout together in the same StorageSet, with each volume evaluated independently for availability status.
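
A minimal sketch of the capacity trade-off behind that choice, assuming a hypothetical stripe of eight data units; VDURA’s actual stripe widths vary with file size and StorageSet layout.

```python
# Hedged sketch: usable-capacity fraction for each protection level,
# assuming a hypothetical stripe of 8 data units (n = 8).
n = 8

levels = {
    "Dual Parity (n+2)":   n / (n + 2),   # survives two failures
    "Single Parity (n+1)": n / (n + 1),   # survives one failure
    "Striped Mirror (2x)": 1 / 2,         # survives one failure, fast small writes
}

for name, usable in levels.items():
    print(f"{name:<22} usable capacity: {usable:.0%}")
# Dual parity 80%, single parity ~89%, striped mirror 50% for these assumptions.
```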

Ready to Transform Your AI Infrastructure?

Download the full VDURA Data Platform V12 White Paper or visit vdura.com for a tailored AI factory assessment.

Continue the conversation

Translate the white paper into your roadmap

Need a deeper dive into architecture, proof-of-concept planning, or sizing? The VDURA team will tailor the V12 guidance to your AI pipeline and adoption timeline.
