From CRAC to Liquid: Why Cooling Is Now the Biggest Risk (and Opportunity) in Data Centers
From CRAC units to liquid immersion, cooling has quietly become the hidden bottleneck in AI data centers and the next big opportunity for investors.
Welcome to Global Data Center Hub. Join the investors, operators, and innovators who read to stay ahead of the latest trends in the data center sector in developed and emerging markets globally.
If you missed it, start with What Really Powers a Data Center (And Why It Decides Who Wins the AI Race), where we broke down the hidden story of electricity and how UPS systems, diesel generators, and PUE shape the real battle over who controls the future of AI.
We talk about data centers in terms of power and compute. How many megawatts are available? How many GPUs can fit in a rack?
But in practice, power is only half the story. The other half, often invisible to outsiders, is cooling.
Cooling determines whether that power can be used effectively. It decides if racks run at 100% utilization or if they throttle under thermal stress. And as AI workloads drive heat densities far beyond historical norms, cooling is moving from engineering detail to boardroom issue.
This lesson breaks down what cooling is, why it matters, and how the industry is evolving from chilled air to liquid immersion as the AI era redefines what it means to run at scale.
Why Cooling Is as Critical as Power
Every watt that enters a server eventually turns into heat. That heat must be removed, or the system fails.
Historically, CPUs generated manageable thermal loads. But today’s GPUs can draw 700 watts or more per chip, and clusters of tens of thousands of them create concentrated hot zones that air cooling alone can’t handle.
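To make the arithmetic concrete, here is a back-of-envelope sketch of the heat a single AI rack produces. The figures are illustrative assumptions (a 700 W accelerator, eight per server, four servers per rack, and a 35% uplift for everything else), not any specific vendor’s design; the point is the order of magnitude.

```python
# Back-of-envelope rack heat load, using illustrative (assumed) figures.
GPU_WATTS = 700          # per-chip draw for a modern AI accelerator
GPUS_PER_SERVER = 8      # assumed server configuration
SERVERS_PER_RACK = 4     # assumed rack configuration
OVERHEAD = 1.35          # assumed uplift for CPUs, memory, NICs, fans, PSU losses

# Essentially every watt drawn becomes heat the cooling system must remove.
it_load_kw = GPU_WATTS * GPUS_PER_SERVER * SERVERS_PER_RACK * OVERHEAD / 1000
print(f"Heat to remove per rack: ~{it_load_kw:.0f} kW")   # ~30 kW
```

Every one of those kilowatts has to be carried away continuously, around the clock.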
Cooling is no longer just about survival. It’s about performance, uptime, and economics. For operators, poor cooling means racks cannot run at full density, leaving stranded capacity. For investors, it means lower returns, as under-utilized power contracts drag on IRRs. For hyperscalers, it means broken SLAs and frustrated AI teams.
The paradox is simple: the more power you add, the harder cooling becomes. And the bottleneck is arriving faster than many projected.
The Cooling Systems That Keep AI Alive
Data centers use multiple layers of cooling, each with trade-offs in cost, density, and scalability.
The traditional approach relies on CRAC units (Computer Room Air Conditioning systems), which circulate chilled air, often through raised floors or ceiling ducts, into cold aisles. Servers expel hot air into hot aisles, where it is recaptured and cooled again. CRACs remain dominant in enterprise and retail colocation, but they struggle above 10–15 kW per rack, making them insufficient for AI-heavy deployments.
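A rough heat balance shows why air tops out around that level. The sketch below uses standard properties of dry air and an assumed 12 °C rise from cold aisle to hot aisle; the required airflow grows linearly with rack power and quickly becomes impractical to push through a single cabinet.

```python
# Airflow needed to remove a rack's heat: Q = P / (rho * cp * dT).
RHO_AIR = 1.2        # kg/m^3, density of dry air near room temperature
CP_AIR = 1005.0      # J/(kg*K), specific heat of dry air
DELTA_T = 12.0       # K, assumed cold-aisle to hot-aisle temperature rise
M3S_TO_CFM = 2118.88 # unit conversion

for rack_kw in (10, 15, 30, 50):
    flow_m3s = rack_kw * 1000 / (RHO_AIR * CP_AIR * DELTA_T)
    print(f"{rack_kw} kW rack -> {flow_m3s:.1f} m^3/s (~{flow_m3s * M3S_TO_CFM:,.0f} CFM)")
```

At 10–15 kW the numbers are workable; at 30–50 kW a single rack would need several thousand CFM of chilled air, which is where fans, floor tiles, and containment run out of headroom.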
A step up is CRAH units (Computer Room Air Handlers), which use chilled water supplied by large on-site plants. CRAHs are more efficient than refrigerant-based CRACs and now form the backbone of most hyperscale builds.
Before replacing systems, operators often start with airflow management. Hot-aisle and cold-aisle containment, blanking panels, ducted returns, and tile layout adjustments can deliver double-digit efficiency gains without major capital expense. Airflow design is the cheapest lever to lift cooling performance.
Beyond these measures, the industry is moving toward direct-to-chip liquid cooling. By pumping coolant directly over CPUs and GPUs, operators achieve far higher thermal transfer than air, enabling densities of 50–100 kW per rack or more. But this is not a retrofit. Direct liquid cooling requires rethinking rack design, manifolds, leak detection, and service contracts. It is an architectural shift.
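The physics behind that jump is simple: per unit volume, water carries roughly 3,500 times more heat than air. A small illustrative calculation, assuming a 10 °C supply-to-return rise across the cold plates, shows how modest the coolant flow is even at 100 kW per rack.

```python
# Same heat balance as air cooling, but with a water-based coolant loop.
CP_WATER = 4186.0    # J/(kg*K), specific heat of water
RHO_WATER = 997.0    # kg/m^3, density of water
DELTA_T = 10.0       # K, assumed supply-to-return rise across the cold plates

for rack_kw in (50, 100):
    flow_lps = rack_kw * 1000 / (RHO_WATER * CP_WATER * DELTA_T) * 1000  # liters/second
    print(f"{rack_kw} kW rack -> ~{flow_lps:.1f} L/s (~{flow_lps * 60:.0f} L/min) of coolant")
```

Roughly 70–145 liters per minute handles what would take thousands of CFM of air, which is why the engineering challenge shifts from moving air to managing manifolds, leaks, and serviceability.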
The most radical option is immersion cooling, where entire servers are submerged in dielectric fluid that absorbs heat directly. Immersion promises unparalleled density and near-silent operation. Yet it faces barriers around hardware compatibility, servicing complexity, and cultural conservatism in IT departments. It is gaining traction in AI training facilities, but mainstream adoption remains early.
The Metrics That Matter
Cooling effectiveness is not just about whether it works; it is judged by efficiency and sustainability.
Cooling load, measured in kilowatts or tons, defines how much thermal energy must be removed for a given IT load. PUE, or Power Usage Effectiveness, is the most widely cited metric: the ratio of total facility power to IT power, in which cooling is the single largest variable factor. Shaving even 0.1 off PUE at hyperscale can mean millions in annual savings. WUE, or Water Usage Effectiveness, which measures liters of water consumed per kilowatt-hour of IT energy, is now equally important. Regulators and communities are scrutinizing data center water use, and open-loop cooling systems in particular face rising political and ESG risks.
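To see why a 0.1 improvement in PUE is worth boardroom attention, here is an illustrative calculation. The 100 MW IT load, $0.06/kWh blended electricity price, and 1.8 L/kWh WUE are assumptions for the sketch, not figures from any specific operator.

```python
# PUE = total facility power / IT power. Cooling drives most of the gap above 1.0.
IT_LOAD_MW = 100          # assumed IT load of a large hyperscale campus
PRICE_PER_KWH = 0.06      # assumed blended electricity price, USD
HOURS_PER_YEAR = 8760

def annual_power_cost(pue: float) -> float:
    """Total facility energy cost per year for a given PUE."""
    total_mw = IT_LOAD_MW * pue
    return total_mw * 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

savings = annual_power_cost(1.5) - annual_power_cost(1.4)
print(f"Annual savings from PUE 1.5 -> 1.4: ${savings:,.0f}")   # ~$5.3M

# WUE is reported analogously: liters of water consumed per kWh of IT energy.
wue = 1.8  # L/kWh, an assumed figure for an evaporatively cooled site
annual_water_liters = IT_LOAD_MW * 1000 * HOURS_PER_YEAR * wue
print(f"Annual water use at WUE {wue}: {annual_water_liters / 1e9:.1f} billion liters")
```

A tenth of a point of PUE is worth roughly five million dollars a year at this scale, and the water line item runs well past a billion liters annually, which is exactly why both metrics now show up in ESG disclosures and financing discussions.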
For boards and financiers, these numbers are not technical trivia. They determine operating margins, regulatory approval, and ultimately, valuation multiples.
Where Cooling Innovation Leads
Meta has been aggressively rolling out direct-to-chip cooling across AI campuses, pairing it with water recycling to drive higher rack densities while keeping operating costs manageable. This strategy makes AI workloads economically viable at scale.
Microsoft has gone even further with Project Natick, sealing servers in submerged offshore vessels cooled by the surrounding seawater. While not yet a mainstream model, the experiment showed how radical siting and thermal approaches could reset both economics and design.
Startups like Submer and Nautilus are pushing immersion and floating data centers, offering specialized platforms that incumbents may one day acquire to accelerate adoption. These ventures represent not just technical progress but also venture capital opportunities and M&A dynamics.
The Future of Cooling in the AI Era
AI is forcing a structural reset. The era of incremental airflow tweaks is giving way to liquid-ready campuses designed for racks drawing 50–100 kW or more.
The next decade will likely be defined by hybrid environments, where enterprise workloads remain air-cooled while AI workloads demand liquid solutions. Operators must design facilities capable of supporting both.
Waterless cooling systems, closed-loop and refrigerant-based, will expand as water scarcity and regulatory pressure mount. Expect WUE to appear alongside PUE in ESG disclosures, as investors demand accountability.
Sustainability will become a differentiator in client decision-making. Tenants will no longer ask only about uptime; they will also scrutinize the carbon and water footprint of each inference or training cycle.
And finally, cooling strategies will move into boardrooms. They will appear in earnings calls, prospectuses, sovereign infrastructure strategies, and policy debates. Thermal design is evolving from an engineering issue into a matter of capital allocation.
Myths and Misconceptions About Cooling
Many assume that air will always be enough. But at AI densities, it simply cannot handle the heat. Others argue that liquid cooling is too risky. Leak risks are real, but modern engineering mitigates them, and early adopters are already proving viability at scale. A final misconception is that cooling is merely an operating expense. In reality, cooling choices influence capital expenditure, land use, sustainability compliance, and asset resale value. It is an asset strategy in its own right.
Why Cooling Is Now a Competitive Advantage
For decades, cooling was invisible, hidden in basements, roof plants, and ducts. Today it is a defining variable in competitiveness.
Investors should scrutinize cooling strategies with the same rigor they apply to power contracts. Operators must decide whether to retrofit existing facilities or rebuild entirely for liquid. Technology firms need to align chip roadmaps with cooling infrastructure. Policymakers must weigh industrial competitiveness against water use and sustainability concerns.
Those who master cooling will not only cut costs but also win clients, achieve regulatory approval, and gain market share.
Final Takeaway
Cooling used to be delegated to engineers. Now it’s debated in boardrooms.
In the AI era, whoever controls the heat, wins.
The winners will not be those with the cheapest megawatts, but those with the smartest thermal strategies: the operators who can balance performance, sustainability, and economics at scale.