If you thought Nvidia's 120 kW NVL72 racks were compute dense with 72 Blackwell accelerators, they have nothing on HPE Cray's latest EX systems, which can pack more than three times as many GPUs into a single cabinet.
Announced ahead of next week's Supercomputing conference in Atlanta, Cray's EX154n platform will support up to 224 Nvidia Blackwell GPUs and 8,064 Grace CPU cores per cabinet. That works out to just over 10 petaFLOPS at FP64 for HPC applications, or over 4.4 exaFLOPS of sparse FP4 for AI and machine learning workloads, where precision usually isn't as big a deal.
Specifically, each EX154n accelerator blade will feature a pair of 2.7 kW Grace Blackwell Superchips (GB200), each of which comes equipped with two Blackwell GPUs and a single 72-core Arm CPU. The two Superchips will be interconnected using Nvidia's NVL4 reference configuration.
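The blade configuration squares with the cabinet-level figures above. A minimal back-of-envelope sketch — note that the 56-blade count and the per-GPU throughput rates are inferred assumptions, not numbers HPE has published:

```python
# Back-of-envelope check of the EX154n cabinet figures.
# The blade count and per-GPU rates below are inferred/assumed, not HPE-published.
gpus_per_superchip = 2        # GB200: two Blackwell GPUs per superchip
superchips_per_blade = 2      # a pair of GB200s per blade
cores_per_grace_cpu = 72      # one 72-core Grace CPU per superchip
watts_per_superchip = 2_700   # 2.7 kW per GB200

blades = 56  # inferred: 224 GPUs / (2 superchips x 2 GPUs per superchip)
gpus = blades * superchips_per_blade * gpus_per_superchip            # 224
cpu_cores = blades * superchips_per_blade * cores_per_grace_cpu      # 8,064
compute_kw = blades * superchips_per_blade * watts_per_superchip / 1_000  # 302.4 kW

# Assuming roughly 45 TFLOPS FP64 and 20 PFLOPS sparse FP4 per Blackwell GPU:
fp64_pflops = gpus * 45 / 1_000   # ~10.08 PFLOPS per cabinet
fp4_eflops = gpus * 20 / 1_000    # ~4.48 EFLOPS sparse FP4 per cabinet

print(gpus, cpu_cores, compute_kw, fp64_pflops, fp4_eflops)
```

The ~302 kW compute draw also explains why these blades can only be liquid cooled.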
At the rack level, the compute alone will consume upwards of 300 kW, so it goes without saying that, just like past EX systems, HPE's Blackwell blades will be liquid cooled.
In fact, these systems are entirely fanless, right down to the all-new Slingshot 400 family of Ethernet NICs, cables, and switches. As the name suggests, Slingshot 400 represents a welcome upgrade over its predecessor, pushing bandwidth from 200 to 400 Gbps and bringing it in line with current-gen Ethernet and InfiniBand networking.
HPE's prior-gen Slingshot 200 interconnects have become a mainstay of large-scale supercomputing platforms and are at the heart of the Frontier, Aurora, and LUMI machines, to name just a few.
Unfortunately, anyone wanting to get their hands on Cray's ultra-dense Blackwell systems and speedy Slingshot 400 networking will have to wait a while. Neither is expected to ship until late 2025.
If conventional CPU-based HPC is more your thing, Cray's fifth-gen Epyc-based EX4252 Gen 2 compute blades are due out next spring and will pack up to eight 192-core Turin-C processors for a total of 98,304 cores per cabinet.
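That core count implies 64 such blades per cabinet. A quick check — the blade count is derived from the quoted total, not stated by HPE:

```python
# Inferring the EX4252 Gen 2 blade count from the quoted cabinet total.
cores_per_turin_c = 192
cpus_per_blade = 8
cores_per_blade = cores_per_turin_c * cpus_per_blade   # 1,536 cores per blade
blades_per_cabinet = 98_304 // cores_per_blade         # 64 blades (inferred)
print(blades_per_cabinet)
```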
Cray will also begin shipping upgraded E2000 storage systems, which it claims will more than double I/O performance over the prior generation thanks to faster PCIe 5.0-based NVMe storage. HPE expects these storage arrays to start shipping in early 2025.
- The Register takes AMD's Ryzen 9800X3D for a spin
- Dow swaps Intel for Nvidia leaving no index free from wild AI volatility
- Fujitsu, AMD lay groundwork to pair Monaka CPUs with Instinct GPUs
- xAI picked Ethernet over InfiniBand for its H100 Colossus training cluster
While HPE's Cray EX platforms promise greater density than your typical server or rack, they aren't exactly the kind of systems that can be deployed in your average datacenter. So HPE is also rolling out a pair of new air-cooled ProLiant Compute servers, which make use of its enterprise-focused iLO lights-out management system.
These systems will be fairly familiar to anyone who's ever seen an Nvidia HGX platform, with both the XD680 and XD685 servers boasting support for eight accelerators of your choice.
Surprisingly, you aren't limited to just the Nvidia and AMD GPUs you might expect. The XD680 actually comes standard with eight Intel Gaudi3 accelerators totaling 1 TB of HBM2e. As we reported this spring, Gaudi3 is rather competitive with the current crop of accelerators. Each is capable of churning out 1.8 petaFLOPS of dense BF16 performance, giving it an edge in compute-bound workloads over the H100, H200, and AMD's MI300X.
Stepping up to HPE's XD685, you have the choice of either eight Nvidia H200s with a combined 1.1 TB of HBM3e or the upcoming Blackwell GPUs – presumably B200 – which should boost memory capacity to 1.5 TB. The former is due out in early 2025, while timing for the Blackwell-based systems remains rather vague.
If Nvidia isn't your style, or you just want more memory, HPE is also rolling out a version of the system with AMD's newly launched MI325X. That system, announced alongside the accelerator in October, will boast up to 2 TB of HBM3e on board and is set to ship in the first quarter of 2025. ®