SDxCentral, Jan 12, 2026 – Billion-dollar GPUs may dominate the headlines, but a strong sovereign storage foundation will prove the real star in eliminating AI bottlenecks
For the public, as well as many investors, the AI revolution is a story of billion-dollar GPUs, vast amounts of compute power, and the complex (and often opaque) algorithms of foundational models. Indeed, it’s this focus that has propelled companies such as Nvidia to multitrillion-dollar valuations, creating a gold rush around raw processing power.
Yet, while GPUs understandably remain the celebrated heroes of the technological revolution, the true bottleneck – and in many ways the more critical challenge – lies in the somewhat less sexy area of computer memory and storage.
In particular, AI infrastructure funding is driving unprecedented demand for dynamic random access memory (DRAM), and the shift from training to inference is further straining supply – inference workloads now account for 85% of units sold. As a result, DRAM prices are surging: Omdia predicts up to a 40% price increase in the first quarter of 2026 alone, with demand expected to exceed supply by 20% throughout the year.
While memory is the immediate speed limiter during active computation, the underlying storage infrastructure also poses major challenges.
“In many ways, data storage is like plumbing,” Eric Salo, SVP of marketing and business operations at storage company Vdura, explained. “Nobody thinks it even exists until it doesn’t work, and then suddenly it’s an emergency.”
Without an efficient data storage layer, those multimillion-dollar GPUs will simply fail to achieve maximum efficiency, wasting colossal amounts of capital and energy. Clearly, the market is finally waking up to this fact, which explains why data infrastructure company Vast Data reached a $9.1 billion valuation and signed mega deals with Nvidia, Google Cloud, Microsoft, and CoreWeave.
As Tom Burke, chief revenue officer at Nscale – one of Vast Data’s clients – put it: “The future of AI will not be defined by raw compute alone, but by the strength of the foundations we build. That means infrastructure that can scale globally, operate responsibly, and deliver when it matters most.”
Feeding data-hungry GPUs
For many years, the industry’s focus has been disproportionately on processing speed as AI models become ever more complex. However, it’s the ability to feed data-hungry GPUs, such as those in Nvidia’s DGX platform, that has become the defining constraint.
Vincent Oostlander, European sales director for solutions at Seagate, said it’s this imperative that is injecting much-needed excitement back into the sector. “I think data storage is becoming more sexy again,” Oostlander said, noting that while the last few years have seen a “massive focus on GPUs,” the capacity crisis in data centers is now swinging the pendulum back the other way.
Then there’s the data itself – not just the volume, but also the complexity, which Vast Data claims necessitates a layer of intelligence. According to Jason Hammons, VP of systems engineering at Vast Data, the challenge for the industry isn’t just “building faster hardware but focusing on the infrastructure and the operating systems that bring data together in a more efficient way.”
“Basically, we’re the operating system for AI data services,” Hammons said, claiming that Vast’s “shared everything” approach offers much better performance than “legacy shared nothing architectures.” For example, Vast works with the National Hockey League (NHL) on integrated real-time data analysis, enabling applications such as flagging security concerns at a stadium – something that wasn’t possible with previous bolt-on tools.
Disaggregated architecture
According to Grant Caley, NetApp’s U.K. and Ireland solutions director, “performance and capacity are a dual challenge.”
“When training large models, you can’t keep GPUs waiting. You need low-latency, high-performance storage that can scale,” Caley said. Disaggregated storage architectures are also becoming an increasingly common requirement among NetApp’s clients, especially larger enterprises.
This architecture allows organizations to scale compute (GPUs) independently of storage capacity and performance, eliminating the fixed ratios that previously created bottlenecks.
“You can scale compute nodes to get the performance, or you can scale capacity independently,” Caley explained. “That solves the latency problem because you can literally deliver hundreds of gigabytes per second of throughput.”
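To make the idea concrete, the rough Python sketch below sizes a disaggregated cluster by treating GPU count and storage node count as independent dials; the per-GPU bandwidth, per-node throughput, and per-node capacity figures are illustrative assumptions only, not figures from NetApp or any other vendor.

```python
# Hypothetical back-of-the-envelope sizing for a disaggregated cluster:
# compute (GPU) nodes and storage nodes are scaled independently,
# rather than being bound by a fixed compute-to-storage ratio.
from math import ceil

GPU_READ_GBPS = 4.0        # assumed sustained read demand per GPU (GB/s)
STORAGE_NODE_GBPS = 40.0   # assumed throughput per storage node (GB/s)
STORAGE_NODE_TB = 500.0    # assumed usable capacity per storage node (TB)

def storage_nodes_needed(num_gpus: int, dataset_tb: float) -> int:
    """Pick the storage node count that satisfies BOTH throughput and capacity."""
    for_throughput = ceil(num_gpus * GPU_READ_GBPS / STORAGE_NODE_GBPS)
    for_capacity = ceil(dataset_tb / STORAGE_NODE_TB)
    return max(for_throughput, for_capacity)

# Scaling GPUs or data volume changes the answer independently.
print(storage_nodes_needed(num_gpus=512, dataset_tb=2_000))   # throughput-bound
print(storage_nodes_needed(num_gpus=64, dataset_tb=20_000))   # capacity-bound
```

The point of the sketch is simply that neither dimension forces the other: adding GPUs grows the throughput requirement, growing the dataset grows the capacity requirement, and the storage layer can be expanded along whichever axis is the binding constraint.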
Flash storage (SSD) also provides the low-latency performance needed for rapid AI model training and inference. Indeed, vendors such as Pure Storage have built their entire business model around flash.
“We don’t come from the heritage world of hard disk drive,” Patrick Smith, CTO for EMEA at Pure Storage, explained. “Because we only use flash, it allows us to bring very high throughput, very low latency, and great scalability to modern analytics and AI workloads.”
This performance is vital not just for model training but increasingly for inference too, especially with retrieval-augmented generation (RAG). A powerful AI technique, RAG works in conjunction with large language models (LLMs), retrieving external data to improve the accuracy of generative AI and reduce issues such as hallucinations. RAG is “an area that’s gathering a huge amount of traction at the moment,” Smith claimed, and it’s one that places severe new demands on low-latency data access.
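For readers unfamiliar with the pattern, the minimal sketch below shows the retrieval step that RAG adds in front of an LLM call; the toy embed() function and the in-memory document list are stand-ins for a real embedding model and a real data store. It is precisely this retrieval step that hits the storage layer on every query, which is why low-latency access matters so much for inference.

```python
# Minimal retrieval-augmented generation (RAG) loop, sketched in plain Python.
# embed() and the document list are toy stand-ins for a real embedding model
# and a real data store; the point is the pattern: retrieve, then prompt.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: bag-of-characters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

documents = [
    "Checkpointing interval for the training cluster is 30 minutes.",
    "All inference traffic is served from the Frankfurt region.",
]
doc_vectors = [embed(d) for d in documents]

def answer(question: str) -> str:
    # Retrieve the most relevant document, then augment the prompt with it.
    scores = [cosine(embed(question), v) for v in doc_vectors]
    context = documents[scores.index(max(scores))]
    prompt = f"Context: {context}\nQuestion: {question}"
    return prompt  # in a real system, this prompt would be sent to an LLM

print(answer("Where is inference served from?"))
```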
Data layering strategy
While the speed of flash is essential, the exabyte-scale of data creation in the AI era makes an all-flash approach simply financially impractical for many organizations. This is where the concept of intelligent data layering – or tiering – becomes imperative for cost-efficiency and scale.
While flash handles the low-latency, high-performance “hot” tier, hard disk drives (HDDs) offer a high-capacity “warm” tier for storing massive training data sets, historical checkpoints, and archive logs.
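A simplified tiering policy might look like the Python sketch below, which routes recently accessed objects to flash and the long tail to HDD; the seven-day cutoff, the tier names, and the example objects are arbitrary assumptions for illustration, not any vendor’s actual policy.

```python
# Illustrative hot/warm tiering policy: recently accessed objects stay on flash,
# older ones migrate to high-capacity HDD. Thresholds are assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=7)   # assumed cutoff for "hot" data

@dataclass
class StoredObject:
    name: str
    size_gb: float
    last_access: datetime

def target_tier(obj: StoredObject, now: datetime) -> str:
    """Place frequently touched data on flash, the long tail on HDD."""
    return "flash-hot" if now - obj.last_access < HOT_WINDOW else "hdd-warm"

now = datetime.now()
objects = [
    StoredObject("active-training-shard", 800.0, now - timedelta(hours=3)),
    StoredObject("last-quarter-checkpoints", 12_000.0, now - timedelta(days=90)),
]
for obj in objects:
    print(obj.name, "->", target_tier(obj, now))
```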
Seagate’s Oostlander believes that “large-capacity hard drives will remain critical for supporting the massive influx of unstructured data that AI workloads generate,” adding that demand is so high that “we are more or less sold out for high-capacity points for a year in advance.”
For software-defined storage company Vdura, a hybrid architecture is key. “We give our systems a flash [tier], but then we use these really efficient hard drives,” Salo said, pointing to the financial and performance benefits of leveraging both technologies.
“We’re about 60% less expensive than an all-flash array offering the same or better performance,” Salo claimed. According to Volker Lindenstruth, director of the Center for Scientific Computing at Goethe University, which leverages a large AMD GPU cluster for AI-driven physics models, “Vdura’s hybrid architecture delivered the ideal balance of price/performance and low management overhead.”
Enterprise challenges
Inevitably as the AI landscape matures, the focus is shifting from pure innovation to the rigors of regulatory compliance and data sovereignty.
“As customers move out of modeling into inferencing, they’re also running into more enterprise challenges around the data they are using – such as how do they secure it, how do they protect it, and how do they make it more cyber resilient?” NetApp’s Caley explained.
For example, while many organizations are eager to harness the power of AI, there is a growing reluctance to move sensitive data into centralized, international hyperscale clouds where they risk losing control of it.
“We’ve been talking about data sovereignty for more than 20 years, but it’s become a heightened topic because of the increased focus on data, the widespread adoption of hyperscalers, and geopolitical tensions around the world,” Pure Storage’s Smith said.
Vast Data’s Hammons explained that his company has spent three years developing specific “secure multitenant isolation mechanisms” that allow cloud builders to safely isolate clients and tenants. The company claims this architecture provides the level of security that “a regulator in a sovereign nation is going to be very comfortable with,” while still allowing users to take advantage of the massive scale of GPU clusters.
According to Hammons, the platform allows administrators and regulators to have a “very clear and transparent line of sight” to critical audit points, including the specific data a model was trained on at a certain point in time.
Put simply, the AI revolution is rapidly outgrowing its initial obsession with raw GPU performance, instead shifting its focus toward the foundational “plumbing” of memory and storage. While GPUs provide the necessary horsepower, the true challenge for the next era of enterprise AI lies in building efficient data infrastructures that can feed these processors without the crippling bottlenecks of latency and data silos.
Success in this environment requires more than just scaling capacity. It demands an intelligent data services layer that can unify fragmented data and simplify complex pipelines. By adopting a disaggregated architecture and strategic data tiering, organizations may finally begin to balance high levels of performance with cost efficiency.