What Google, Meta, and Microsoft Know About Flash That Neoclouds Are Still Learning
There is a reason the largest AI builders in the world do not run all-flash storage. It is the same reason they never have.
Google, Meta, and Microsoft operate at a scale where bad architectural assumptions do not just cost money. They threaten the viability of entire product lines. When you are training models across hundreds of thousands of GPUs and storing exabytes of data that grow by the petabyte every week, you cannot afford to treat storage media decisions as a set-it-and-forget-it procurement exercise. These companies figured out the right approach years ago. The neocloud market is only now catching up.
The Hyperscaler Playbook
A March 2024 Meta engineering blog laid out the company’s data center storage in three explicit tiers: TLC SSDs for the performance layer at 8 to 16 TB per drive, QLC SSDs for the capacity layer at 64 to 150 TB per drive, and HDDs for bulk data at 20 to 30 TB per drive. Meta also uses tape for cold archival storage. That is four tiers of media, each selected for the job it does best.
Google has operated this way for over a decade. Its Colossus file system, the successor to the Google File System, runs across mixed media with software-driven data placement. Flash handles the hot path. Disk handles everything else. The intelligence lives in the software, not in the purchasing decision to buy more of one media type.
Microsoft follows the same pattern across Azure’s storage infrastructure. Hyperscaler capital expenditure is projected to reach $610 billion in 2026, triple what it was two years ago. At that level of spend, the difference between buying flash for performance and buying flash for capacity is not a rounding error. It is billions of dollars.
What They Understood First
The insight is deceptively simple: flash is a performance medium, not a capacity medium. The throughput of an NVMe SSD is determined by its controller and interface, not by how many terabytes it holds. A 7.68 TB drive and a 122 TB drive from the same generation deliver comparable sequential read throughput. The only difference is how much cold data you are paying premium prices to park on media that returns no additional speed for the privilege.
NVIDIA’s own DGX storage guidance makes this concrete. Text-based LLM training requires roughly 0.5 GB/s of read throughput per GPU. Physical AI and visualization workloads top out at approximately 4 GB/s reads and 2 GB/s writes. For a 4,096-GPU cluster, you need enough flash to meet those throughput targets and no more. After that, every additional terabyte of flash you buy for capacity is money that would have been better spent on disk, or on more GPUs.
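A rough sketch of that sizing math, for illustration only. The 0.5 GB/s per-GPU figure, the 4,096-GPU cluster, and the 7.68 TB and 122 TB drive sizes come from the text above. The 7 GB/s sustained read per drive is an assumption made here, not a vendor number, and the sketch ignores network limits, data protection overhead, and write traffic.

```python
import math

# Per-GPU read target for text-based LLM training, as quoted in the text (GB/s).
LLM_READ_GBPS_PER_GPU = 0.5

# ASSUMPTION: sustained sequential read per NVMe drive in GB/s.
# Illustrative only; substitute the measured figure for your own drives.
ASSUMED_DRIVE_READ_GBPS = 7.0


def aggregate_read_gbps(gpus: int, per_gpu_gbps: float) -> float:
    """Total read throughput the storage layer must deliver to the cluster."""
    return gpus * per_gpu_gbps


def drives_to_meet(target_gbps: float, drive_gbps: float) -> int:
    """Smallest whole number of drives whose combined throughput meets the target."""
    return math.ceil(target_gbps / drive_gbps)


cluster_gpus = 4096
target = aggregate_read_gbps(cluster_gpus, LLM_READ_GBPS_PER_GPU)
drives = drives_to_meet(target, ASSUMED_DRIVE_READ_GBPS)

# Same drive count, same throughput, very different amounts of flash purchased.
small_tb = drives * 7.68
large_tb = drives * 122

print(f"Aggregate read target: {target:,.0f} GB/s")
print(f"Drives needed at {ASSUMED_DRIVE_READ_GBPS} GB/s each: {drives}")
print(f"Flash bought with 7.68 TB drives: {small_tb:,.0f} TB")
print(f"Flash bought with 122 TB drives: {large_tb:,.0f} TB")
```

Under those assumptions the cluster needs 2,048 GB/s of reads, which 293 drives can supply. Filling those slots with 122 TB drives instead of 7.68 TB drives buys roughly 35.7 PB of flash rather than 2.25 PB, with no change in the throughput delivered to the GPUs.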
The hyperscalers internalized this math at scale. They use just enough flash to keep GPUs fed, then drain data to high-density HDDs as fast as the software allows. The result is a storage architecture that stays close to 10% of the infrastructure budget while still saturating every GPU in the cluster.
Why Neoclouds Took a Different Path
When the neocloud wave hit in 2021 and 2022, operators were under enormous pressure to get GPU capacity online fast. Storage was a means to an end, not an architectural decision. Several prominent storage vendors offered a compelling pitch: go all-flash, keep it simple, and worry about optimization later.
For a brief window, that pitch was economically defensible. NAND was affordable. Supply was abundant. The cost gap between flash and disk was narrow enough to justify the convenience. But that window closed hard. NAND flash contract prices have surged 55 to 60% in a single quarter. SSD costs now run up to 16x higher than HDDs on a per-terabyte basis. Storage, once a quiet 10% line item, is trending toward 20 to 30% of total infrastructure cost in all-flash deployments.
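To see what that per-terabyte gap does to a mixed deployment, here is a minimal sketch. The 16x ratio is the upper bound quoted above. The flash fractions are hypothetical, and the model counts media cost only, leaving out power, rack space, servers, and software.

```python
# Ratio quoted in the text: SSD costs up to 16x HDD per terabyte.
SSD_TO_HDD_COST_RATIO = 16.0


def blended_cost_vs_all_flash(flash_fraction: float,
                              ratio: float = SSD_TO_HDD_COST_RATIO) -> float:
    """Media cost of a flash/HDD mix relative to all-flash at the same capacity.

    flash_fraction is the share of total capacity held on flash (0.0 to 1.0).
    A result of 1.0 means the same media cost as an all-flash deployment.
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    hdd_fraction = 1.0 - flash_fraction
    # Flash capacity costs 1 unit per TB; HDD capacity costs 1/ratio per TB.
    return flash_fraction + hdd_fraction / ratio


# Hypothetical mixes, chosen only to show the shape of the curve.
for fraction in (1.0, 0.5, 0.2, 0.1):
    relative = blended_cost_vs_all_flash(fraction)
    print(f"{fraction:.0%} flash -> {relative:.2f}x all-flash media cost")
```

At the 16x ratio, keeping 20% of capacity on flash brings media cost down to a quarter of the all-flash figure. The real saving depends on the actual price gap at purchase time and on how much flash the throughput targets require.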
Industry leaders are now calling this out directly. StorONE’s CEO put it bluntly: the belief that flash could replace every tier of storage was a decade-long illusion. Flash price inflation is not a short-term hiccup. It is a structural reality. And operators who built their entire data layer on a single media type are now discovering that “performance everywhere” comes with a tax they cannot pay.
The Architecture Gap
The problem is not just the cost of flash. It is what happens when operators try to retrofit mixed media into an architecture that was never designed for it. The most common workaround is bolting a separate object store onto an existing all-flash file system. You end up with two software stacks, two data planes, external data movers, and a networking layer that has to shuttle data between them. It works, technically. But it is a workaround dressed up as a strategy.
The hyperscalers do not operate this way. Google, Meta, and Microsoft run SSD and HDD within the same software stack, the same data plane, with high-performance native tiering. Data flows between flash and disk as a first-class operation inside the storage system, not as a batch job managed by a separate tool. One namespace. One control plane. Zero external movers.
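A toy sketch of what native tiering inside one namespace means in practice. This is not how Colossus or any hyperscaler system is implemented. Every name here (Namespace, Entry, tier_down, cold_after_s) is hypothetical, and the policy of demoting on idle time and promoting on read is an assumption chosen for illustration. The point is that the tier is metadata on the entry, so a path never changes when data moves.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    FLASH = "flash"
    HDD = "hdd"


@dataclass
class Entry:
    size_tb: float
    last_access: float  # seconds on a caller-supplied clock
    tier: Tier = Tier.FLASH  # new data lands on flash


@dataclass
class Namespace:
    """One path-to-entry map. Moving data between media only flips metadata."""
    cold_after_s: float
    entries: dict[str, Entry] = field(default_factory=dict)

    def write(self, path: str, size_tb: float, now: float) -> None:
        self.entries[path] = Entry(size_tb=size_tb, last_access=now)

    def read(self, path: str, now: float) -> Entry:
        entry = self.entries[path]
        entry.last_access = now
        entry.tier = Tier.FLASH  # assumed policy: promote on access
        return entry

    def tier_down(self, now: float) -> list[str]:
        """Demote flash entries that have been idle for cold_after_s or longer."""
        moved = []
        for path, entry in self.entries.items():
            idle = now - entry.last_access
            if entry.tier is Tier.FLASH and idle >= self.cold_after_s:
                entry.tier = Tier.HDD
                moved.append(path)
        return moved


ns = Namespace(cold_after_s=3600)
ns.write("/ckpt/step-100", size_tb=2.0, now=0)
ns.write("/ckpt/step-200", size_tb=2.0, now=3000)
moved = ns.tier_down(now=4000)
print(moved)
```

In this example only the older checkpoint is demoted, and a client reading "/ckpt/step-100" afterward uses the same path it always did. Contrast that with the bolt-on approach, where the data would now live in a different system under a different address.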
That is the architecture that MIT Technology Review identified as the backbone of the world’s most demanding AI infrastructure. And it is the architecture that has been conspicuously absent from the neocloud market, where operators have been sold on simplicity at the expense of long-term economics.
The Takeaway
The neocloud market is projected to grow from $35 billion in 2026 to nearly $240 billion by 2031. The operators who capture that growth will not be the ones who stacked the most GPUs. They will be the ones who figured out how to keep storage at 10% of the budget instead of 30%. The hyperscalers cracked that code years ago. The playbook is not a secret. It just requires an architecture that was designed for mixed media from day one, not one that is trying to add it after the fact.