The General-Purpose Era Is Ending. So Is General-Purpose Storage

Ken Claffey, Sept 29, 2025 (LinkedIn Blog Post)

I spent some time this weekend listening to Jensen Huang on the BG2 podcast. He said that the CPU-dominated, general-purpose computing era is ending, and that we're moving to accelerated computing and AI factories: data centers architected to convert electrons into tokens, answers, and actions. I think we would all agree with that, and it has a very specific implication for storage: legacy, monolithic NAS and block SAN architectures, designed for yesterday's workloads, won't be the backbone of tomorrow's AI supercomputers.

When the data center becomes an “AI factory,” storage can’t be an afterthought

Huang’s framing is simple: the data center is now a purpose-built AI factory where throughput equals revenue. If your pipeline stalls on memory, networking, or storage I/O, your economics fall apart. That’s why NVIDIA is pushing a one-year platform cadence; the stack is co-designed and upgraded continuously. Storage must keep pace, architecturally massively parallel and operationally resilient and efficient, or it becomes the bottleneck.

What changes—and why the old guard struggles

Traditional filer and SAN products were terrific for the client/server and virtualization eras. They optimized for generality, simplicity at small scale, and controller-centric reliability semantics, again at small scale (a pair or a handful of nodes). But AI factories have different first principles:

  • Parallelism end-to-end. Storage throughput must scale linearly across clusters sized to match the AI compute cluster, aggregating small I/O into massive, concurrent flows with no “hot controller” patterns.
  • Deterministic low latency. GPU utilization lives or dies on tail latency, not just peak bandwidth (see the sketch after this list).
  • Data/model locality. Scheduling wants storage that understands placement, prefetch, and tiering across NVMe flash, QLC, and capacity HDD at cluster scale.
  • Failure as normal. At AI scale, “everything breaks,” so recovery math and rebuild behavior must be online, incremental, and parallel, not controller-to-controller or box-by-box.
  • Adaptability to upgrade every year. If your data storage infrastructure can’t adopt yearly silicon and networking advances without forklift swaps, and can’t translate those advances through its software stack into improved capabilities (i.e., performance), you’ll miss the cadence window.
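
To make the tail-latency point concrete, here is a minimal, hypothetical sketch (plain Python, illustrative numbers only, not measurements from any real system): when a synchronized training step issues many parallel reads and waits for the slowest one, even a rare slow path ends up gating almost every step.

```python
import random

# Hypothetical latency distribution for a single read (illustrative only):
# 99% of reads return in ~1 ms; 1% hit a slow path (hot controller, failover,
# garbage-collection pause) and take ~40-60 ms.
def read_latency_ms():
    if random.random() < 0.99:
        return random.uniform(0.8, 1.2)
    return random.uniform(40.0, 60.0)

def step_latency_ms(parallel_reads):
    # A synchronized step (e.g., one training batch) waits for its slowest read.
    return max(read_latency_ms() for _ in range(parallel_reads))

random.seed(0)
for n in (1, 64, 1024):
    steps = [step_latency_ms(n) for _ in range(2000)]
    print(f"{n:>5} parallel reads -> mean step latency ~{sum(steps)/len(steps):6.1f} ms")

# With 1 read per step the rare slow path barely matters; with 1024 parallel
# reads nearly every step hits it, so the tail (not the mean) gates GPU time.
```

The same gating effect is why peak-bandwidth benchmarks say little about GPU utilization in practice.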

The incumbents understand this shift. Expect (more) “AI-ready” announcements, glossy benchmarks, and dressed-up product-line “extensions” that bolt accelerators or pNFS verbs onto legacy controllers to look the part. But there’s a hard truth veterans in this industry know: you can’t incrementally patch your way into a parallel, AI-scale distributed system. The corner cases only reveal themselves in thousands of real-world, at-scale deployments over many years.

Two axioms I’ve learned the hard way over my 30+ years:

  1. It takes 5+ years to truly mature a RAID stack.
  2. It takes 10+ years to truly mature a distributed system. In both cases, that clock starts at the first at-scale, real-world production deployment, not at the architecture whiteboard, a marketing launch, or internal testing.

Reality check vs. marketing theater

This is why “AI-native” storage can’t be a veneer. In an AI factory world, storage must be designed like part of a supercomputer, not a peripheral. That means ground-up parallelism in every element and every module of the stack; client-side intelligence; distributed metadata at extreme scale; network-wide, multi-level erasure coding that rebuilds fast under load; and networking that assumes RDMA/NVLink/NVSwitch fabrics, not just a pair of active/standby controllers. Anything less gets found out in production.
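
As a back-of-envelope illustration of why parallel, declustered rebuild matters (hypothetical numbers, not a claim about any particular product): when repair traffic is spread across many nodes instead of funneled through one controller pair, the repair window shrinks roughly in proportion to the number of participants.

```python
# Hypothetical, illustrative numbers only: compare the repair window for a
# failed 30 TB device when rebuild traffic funnels through one controller pair
# versus a declustered, parallel rebuild spread across many storage nodes.

FAILED_CAPACITY_BYTES = 30e12  # 30 TB

def rebuild_hours(participants, per_participant_gb_per_s):
    # Aggregate rebuild bandwidth grows with the number of participants; a
    # controller-centric design is capped at the controller pair's budget.
    aggregate_bytes_per_s = participants * per_participant_gb_per_s * 1e9
    return FAILED_CAPACITY_BYTES / aggregate_bytes_per_s / 3600

# Controller pair with ~2 GB/s of rebuild headroom while still serving I/O (assumed).
print(f"controller pair : {rebuild_hours(1, 2.0):5.2f} hours")

# Declustered rebuild: 200 nodes each sparing ~0.5 GB/s for repair (assumed).
print(f"200-node rebuild: {rebuild_hours(200, 0.5):5.2f} hours")
```

Shorter repair windows mean higher durability and less time spent in degraded mode, which is exactly the “failure as normal” point above.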

Where VDURA fits

At VDURA, we’ve been building for this moment for two decades. PanFS is on its 11th major release (V11) and has evolved through hundreds of millions of dollars of R&D investment and thousands of real-world, at-scale deployments. That maturity matters because experience is a critical feature in distributed systems. It’s why so few truly parallel file systems are widely deployed, and why “AI-ready” facelifts rarely survive their first large-scale failure-domain event (i.e., the storage equivalent of a punch in the face).

We’re not resting on that foundation. We’re evolving it, modernizing metadata (to exploit KV acceleration), extending multi-tier semantics across NVMe + capacity media, and tightening the feedback loops between scheduler, network, and storage placement so GPU clusters stay fed.

This is also why our founder, Garth Gibson, the pioneer of parallel file systems, has rejoined to help drive the next leg of this exciting journey. The mandate is clear: treat storage as an integral part of the AI supercomputer, not a storage box (NAS/SAN) on the side.

What to watch for (and what to be skeptical of)

  • Cadence alignment. If a vendor can’t explain how their storage will adopt a yearly accelerator/network upgrade cycle without service disruption, beware.
  • Sustained performance. Peak GB/s is table stakes; ask for sustained performance numbers under load when things go wrong: a client node fails, a storage node fails, a metadata server fails, multiple nodes fail at the same time, or hundreds of storage devices go offline at once (see the sketch after this list).
  • Scalability. Has the storage cluster been proven to scale linearly, not just at the scale of your initial purchase but at more than 10 times that scale? Ideally it has demonstrated near-linear performance scaling to more than 1,000 storage nodes, scaling online not just throughput but also IOPS, metadata (file creates/deletes, inodes), and capacity, with no detriment to the SLA (availability, durability).
  • Failure math and repair kinetics. Demand evidence of parallel rebuild performance at scale, under real workload tension, not marketing footnotes. Remember: if storage is down, so is your compute cluster; the AI factory is on “stop-ship.”
  • Control and data-plane maturity. How many production-class clusters with tens of thousands of clients? How many years of multi-workload production operations? Corners only round off in the field.
  • Software-defined storage (SDS) posture. Do they fully control the software stack they rely on? Can they quickly fix the issues that will inevitably arise at scale, and co-develop the roadmap features you need, on your timeline, to exploit the latest hardware innovations? Or are they tied to a slow proprietary hardware cadence, or to a software stack that isn’t truly theirs? Are they integrated with AI data pipelines and able to run as fast as the cluster will need, not just for today’s AI compute cluster but for the next tranche of GPU nodes you add, or are they fronting the same NFS/FC heads with new stickers that will halt your AI factory’s production?
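
To frame the sustained-performance and scalability questions above, here is a deliberately simplified, hypothetical model (illustrative parameters only) of what fraction of peak read throughput survives a node failure once reconstruction reads and rebuild traffic compete with the workload. It is a conversation starter for vendor diligence, not a real performance model.

```python
# Hypothetical sanity-check model for "sustained performance under failure".
# Assumes an n-node cluster with k-wide erasure-coded reads; all numbers are
# illustrative, and parity overhead / network effects are deliberately ignored.

def degraded_read_fraction(nodes, per_node_gbps, failed_nodes, ec_k, rebuild_gbps):
    """Rough fraction of peak read throughput available while degraded.

    Simplifications: reads that would land on a failed node are reconstructed
    from ec_k surviving fragments (so they cost roughly ec_k reads), and
    rebuild traffic takes a flat bandwidth budget off the top.
    """
    peak = nodes * per_node_gbps
    surviving = (nodes - failed_nodes) * per_node_gbps - rebuild_gbps
    hit_failed = failed_nodes / nodes                  # share of reads needing reconstruction
    amplification = hit_failed * ec_k + (1 - hit_failed)
    return max(surviving / amplification, 0.0) / peak

# Example: 100 nodes x 10 GB/s, 8-of-10 erasure coding, one node down,
# 50 GB/s reserved for rebuild traffic.
frac = degraded_read_fraction(nodes=100, per_node_gbps=10, failed_nodes=1,
                              ec_k=8, rebuild_gbps=50)
print(f"~{frac:.0%} of peak read throughput while degraded (toy model)")
```

Ask vendors to walk through their equivalent numbers, measured under real failures rather than modeled.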

The next decade belongs to systems that convert watts to tokens efficiently

Huang’s message lands because it’s systemic: AI factories are a coordination game across silicon, memory, interconnects, software, power, and yes, storage, all running on an annual rhythm. The winners won’t just ship faster chips; they’ll ship cohesive systems where storage actively amplifies GPU utilization instead of siphoning it away. That’s the bar we’re building to clear, consistently and transparently. Legacy vendors will make noise; that’s their big marketing teams’ job. Ours is simpler: keep delivering data storage infrastructure worthy of the AI factory of tomorrow: ground-up parallel, failure-tolerant at scale, and ready to upgrade every year. That’s the power of Adaptability at scale. #VDURAPOWER