NVIDIA announces xAI's Colossus supercomputer, based on technology developed in Israeli R&D center
Elon Musk has dubbed it “the most powerful AI training system in the world.”
xAI’s new Colossus supercomputer cluster, built by xAI with NVIDIA, comprises 100,000 NVIDIA Hopper Tensor Core GPUs and is used to train the Grok family of large language models. It is based in part on technology developed at NVIDIA’s Israel-based R&D center. Colossus reached this scale by using the NVIDIA Spectrum-X Ethernet networking platform for its Remote Direct Memory Access (RDMA) network. Spectrum-X is designed to deliver superior performance to multi-tenant, hyperscale AI factories using standards-based Ethernet.
xAI is in the process of doubling the size of Colossus to a combined total of 200,000 NVIDIA Hopper GPUs. The Grok models power chatbots offered as a feature for X Premium subscribers. The supercomputer was built in only 122 days, whereas systems of this size can typically take months or years.
“AI is becoming mission-critical and requires increased performance, security, scalability, and cost-efficiency,” said Gilad Shainer, senior vice president of networking at NVIDIA. “The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis, and execution of AI workloads, and in turn accelerates the development, deployment, and time to market of AI solutions.”
xAI owner Elon Musk has dubbed Colossus “the most powerful AI training system in the world” and called the work done by NVIDIA and its suppliers and partners “excellent.”
While training the large Grok model, Colossus achieves unprecedented network performance. Across three tiers of the network fabric, the system has experienced zero application latency degradation or packet loss due to flow collisions. It has maintained 95% data throughput enabled by Spectrum-X congestion control.
“xAI has built the world’s largest, most powerful supercomputer,” said an xAI spokesperson. “NVIDIA’s Hopper GPUs and Spectrum-X allow us to push the boundaries of training AI models at a massive scale, creating a super-accelerated and optimized AI factory based on the Ethernet standard.”