Discussion about this post

Ken

Well, what's missing is the RAS issues being scaled, and that only the shell is Tier 4; the racks and the GPUs are running at Tier 0 or worse. You can have all the power and cooling in the world, but the O-rings in the quick disconnects break off and form fragments that clog the cold plate internal fins, and leaks shut down the GPU blades. That leaves hardware sitting idle without power, and it often leads to scrapped $300,000 blades. Allocation means you cannot just replace them. The liquid metal TIM is pumping out and causing damage to other components. I could keep going. The worst part is, you go to a CE engineer at Microsoft, and they say "not interested." You go to Meta, and they say "we don't care, we just replace the racks, up to $66 million without any approval needed." When I ask if the CIO or CISO cares about lost RSA data, they sort of wake up... slowly.
