Velocity • Durability

How AI is reshaping the foundations of computing and storage

Ken Claffey, Jan 6, 2026 (CIO)

If Jensen Huang is right that the era of general-purpose computing is coming to an end, then we are witnessing a transformation as profound as the shift from horsepower to steam power two centuries ago.

At the heart of this new revolution is the convergence of AI and data infrastructure, where unprecedented computational power is aligning, or at least attempting to align, with an equally demanding need for speed, reliability and scale in how information is stored and accessed.

By creating the most data-intensive workloads ever seen, AI is radically reshaping enterprise infrastructure. The eye-watering sums being spent on expanding global datacenter capacity bear this out, with Meta’s $600 billion plan among the most recent in a slew of announcements. Back in April 2025, McKinsey put a $7 trillion price tag on what it estimated would be required “to keep pace with the demand for compute power.” If the momentum behind AI continues unabated, that figure may need to be revised upwards.

The situation also has fundamental implications for data storage. Traditional storage was built for predictable, sequential workloads like databases and virtualization. AI upends that model, with thousands of GPU threads hammering existing systems with parallel, random, high-throughput access.

The performance problems this creates cascade across infrastructure components. When storage cannot keep up, GPUs sit idle, training cycles stall and overall costs soar. Every hour of underfed GPUs delays ROI: training is an investment, and stalled or inefficient epochs push out time to value. The risks extend further still. If data is corrupted or lost, entire models often need to be retrained, creating enormous and unexpected costs. And the impact goes beyond training inefficiency: inference is the revenue-generating component, and slow or unstable data pipelines directly reduce the commercial return of AI applications. In response, legacy vendors are trying to retrofit existing architectures to meet AI demand, but despite their best efforts, most of these designs still limit performance and scalability.
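
As a rough illustration of why idle accelerators matter, the sketch below estimates the monthly cost of GPUs waiting on storage. Every figure in it is an assumption for illustration only; none comes from the article.

```python
# Back-of-envelope sketch with assumed figures: what GPUs idling on storage
# stalls cost per month. None of these numbers come from the article.

def idle_cost(num_gpus: int, hourly_rate_usd: float, idle_fraction: float,
              hours_per_month: float = 730) -> float:
    """Monthly spend attributable to GPUs waiting on data rather than computing."""
    return num_gpus * hourly_rate_usd * hours_per_month * idle_fraction

# Example: 512 GPUs at an assumed $2/hour, idle 20% of the time.
print(f"${idle_cost(512, 2.0, 0.20):,.0f} per month")  # -> $149,504 per month
```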

Something has to give, starting with the recognition that AI requires purpose-built, natively high-performance storage systems.

Reliability 101

These performance pressures also expose a deeper problem — reliability. Large-scale AI models rely on uninterrupted access to training data, and any disruption, whether it’s a metadata server failure, data corruption or a myriad of other issues, can significantly impact productivity and compromise results.

Indeed, reliability in this context is not a single metric; it’s the product of durability, availability and recoverability. These are crucial issues because the ability to maintain continuous operations and data integrity isn’t just a technical safeguard; it’s what determines whether AI investments actually deliver value.

The problem today is that many legacy systems still rely on local RAID or HA-pair architectures, which protect against small-scale failures but falter at AI scale. In contrast, modern designs utilize multi-level erasure coding and shared-nothing architectures to deliver cluster-wide resilience, ensuring sustained uptime even under multiple simultaneous failures.

The knock-on effect of legacy shortcomings is enormous, with Gartner warning that “through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data.” If that weren’t bad enough, poor data quality already drains $12.9–$15 million per enterprise annually, and pipeline failures cost around $300,000 per hour in lost insight and missed SLAs.

Storage at the speed of AI

Building the level of reliability AI systems need requires rethinking how storage is architected, both technologically and operationally. For instance, resilience must be embedded from the outset, rather than being retrofitted to legacy storage products as applications change around them.

At a technological level, capabilities such as multi-level erasure coding (MLEC), a modern distributed data protection mechanism, will replace traditional RAID’s limited fault tolerance with protection that spans multiple nodes, ensuring data remains intact even if several components fail simultaneously.
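
To make the idea concrete, the sketch below estimates the fault tolerance and raw-to-usable capacity ratio of a two-level layout: one erasure-coding stripe across nodes and another across the drives inside each node. The stripe widths are illustrative assumptions, not any particular product’s configuration.

```python
# Illustrative two-level erasure-coding arithmetic (assumed stripe widths,
# not a vendor implementation): one k+m stripe across nodes, another k+m
# stripe across the drives inside each node.

def overhead(k: int, m: int) -> float:
    """Raw-to-usable capacity ratio for a stripe of k data + m parity shards."""
    return (k + m) / k

def two_level(node_k: int, node_m: int, drive_k: int, drive_m: int) -> dict:
    """Combine node-level and drive-level stripes into one summary."""
    total = overhead(node_k, node_m) * overhead(drive_k, drive_m)
    return {
        "node_failures_tolerated": node_m,
        "drive_failures_per_node_tolerated": drive_m,
        "raw_to_usable_ratio": round(total, 2),
    }

if __name__ == "__main__":
    # Example: 8+2 across nodes combined with 10+2 across drives in each node.
    print(two_level(node_k=8, node_m=2, drive_k=10, drive_m=2))
    # -> {'node_failures_tolerated': 2, 'drive_failures_per_node_tolerated': 2,
    #     'raw_to_usable_ratio': 1.5}
```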

At the same time, hybrid flash-and-disk architectures help control cost by keeping high-performance data on flash while tiering less critical information to lower-cost media. Meanwhile, modular, shared-nothing designs eliminate single points of failure and allow performance to scale simply by adding standard server nodes with no proprietary hardware required.
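
A tiering policy of this kind can be as simple as a recency-and-frequency rule. The sketch below is purely illustrative, with assumed thresholds rather than anything a specific vendor ships.

```python
# Minimal, illustrative flash/disk tiering policy. The thresholds are
# assumptions chosen for the example, not a product default.

import time
from dataclasses import dataclass

@dataclass
class ObjectStats:
    last_access: float       # epoch seconds of the most recent read
    accesses_last_24h: int   # how often the object was read in the last day

def choose_tier(stats: ObjectStats,
                hot_window_s: float = 6 * 3600,
                hot_access_count: int = 10) -> str:
    """Return 'flash' for hot data, 'disk' for cold data."""
    age = time.time() - stats.last_access
    if age < hot_window_s or stats.accesses_last_24h >= hot_access_count:
        return "flash"
    return "disk"
```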

Then there are operational requirements to address. For example, automated data integrity checks can detect and isolate corruption before it enters AI pipelines, while regular recovery drills ensure restoration processes work within the tight timeframes AI production demands. Aligning these technical and operational layers with governance and compliance frameworks minimizes both technical and regulatory risk.
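
In practice, an automated integrity check often boils down to recording a checksum when data is ingested and re-verifying it before the data feeds a pipeline. The following sketch assumes a simple manifest of expected SHA-256 digests; it is one possible approach, not a description of any specific product.

```python
# Sketch of an automated integrity check: recompute SHA-256 digests and flag
# any file whose stored digest no longer matches, so corrupted data can be
# quarantined before it reaches a training pipeline. Assumes digests were
# recorded in a manifest at ingest time.

import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest: dict[str, str], root: Path) -> list[str]:
    """manifest maps relative file paths to expected digests; returns mismatches."""
    corrupted = []
    for rel_path, expected in manifest.items():
        if sha256_of(root / rel_path) != expected:
            corrupted.append(rel_path)
    return corrupted
```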

Make no mistake, these capabilities are not just nice-to-haves; they are now fundamental to the way AI infrastructure should be designed. Inevitably, AI workloads and datasets will continue to expand, and storage architectures will need to be modular and vendor-neutral, allowing capacity and performance upgrades without wholesale replacement.