Huawei Unveils Supernode 384 Architecture, Challenging Nvidia’s Dominance in AI Processing

Huawei has introduced its breakthrough Supernode 384 architecture, a significant move in the ongoing global processor competition. The innovation, revealed at the Kunpeng Ascend Developer Conference in Shenzhen, is poised to directly challenge Nvidia's market dominance in AI computing, especially as Huawei navigates the challenges posed by US-led trade restrictions.
Architectural Innovation Driven by Necessity
According to AI News, Zhang Dixuan, president of Huawei’s Ascend computing business, discussed how growing demands for parallel processing in AI workloads have exposed critical bottlenecks in traditional server architectures.
To address this, the Supernode 384 moves away from conventional von Neumann principles and instead adopts a peer-to-peer architecture. This shift is particularly advantageous for Mixture-of-Experts models, which route work across multiple specialized sub-networks to solve complex computational problems.
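For readers unfamiliar with Mixture-of-Experts models, the following is a minimal, generic sketch of top-k expert routing. It is not Huawei's implementation: the function names, the toy experts, and the gate scores are all hypothetical and chosen only for illustration. It shows why such models depend on fast links between processors: each input is sent to a small subset of sub-networks, which in a real deployment may sit on different chips.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the chosen experts.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, gate_scores, experts, k=2):
    """Combine the outputs of the selected experts, weighted by the gate."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Hypothetical toy experts; real experts are neural sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 2.0]  # illustrative values only

routing = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, gate_scores, experts, k=2)
print(routing, output)
```

In this toy case only two of the four experts run for the input; the others are skipped entirely, which is the property that makes inter-processor traffic patterns in such models irregular.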
Impressive Performance Metrics
Huawei’s CloudMatrix 384 system, which integrates 384 Ascend AI processors spread across 12 computing cabinets and four bus cabinets, offers an astounding 300 petaflops of computational power, complemented by 48 terabytes of high-bandwidth memory.
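As a rough sanity check on these figures, the per-processor share can be worked out by simple division. The sketch below uses only the totals quoted above and assumes decimal units (1 TB = 1,000 GB); the variable names are illustrative.

```python
# Totals quoted in the article for the CloudMatrix 384 system.
NUM_PROCESSORS = 384
TOTAL_PETAFLOPS = 300
TOTAL_HBM_TERABYTES = 48

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1000

pflops_per_chip = TOTAL_PETAFLOPS / NUM_PROCESSORS
hbm_gb_per_chip = TOTAL_HBM_TERABYTES * GB_PER_TB / NUM_PROCESSORS

print(f"{pflops_per_chip:.2f} petaflops per processor")
print(f"{hbm_gb_per_chip:.0f} GB of high-bandwidth memory per processor")
```

Under these assumptions, each processor contributes roughly 0.78 petaflops and 125 GB of memory to the system totals.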
Real-world benchmarks highlight the system’s advantage: dense AI models such as Meta’s LLaMA 3 achieved 132 tokens per second per card, 2.5 times the performance of traditional cluster architectures.
In communications-intensive applications, such as models from Alibaba’s Qwen family and from DeepSeek, the system demonstrated 600 to 750 tokens per second per card, underscoring its optimization for next-gen AI workloads.
The improvements stem from a fundamental redesign of Huawei’s infrastructure. By replacing conventional Ethernet interconnects with high-speed bus connections, Huawei increased bandwidth 15-fold and cut single-hop latency from 2 microseconds to just 200 nanoseconds, a tenfold reduction.
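The arithmetic behind these claims can be checked directly. The sketch below uses only the numbers quoted above; the implied baseline is derived from the stated 2.5x ratio rather than reported separately, and the aggregate figure assumes perfectly linear scaling across all 384 cards, which the article does not claim.

```python
# Latency figures quoted in the article.
ETHERNET_LATENCY_NS = 2_000  # 2 microseconds per hop
BUS_LATENCY_NS = 200         # 200 nanoseconds per hop
latency_improvement = ETHERNET_LATENCY_NS / BUS_LATENCY_NS

# Dense-model throughput quoted in the article.
DENSE_TOKENS_PER_CARD = 132
DENSE_SPEEDUP = 2.5
NUM_CARDS = 384

# Derived: per-card throughput implied for a traditional cluster.
implied_baseline = DENSE_TOKENS_PER_CARD / DENSE_SPEEDUP

# Assumption: perfectly linear scaling, an upper bound at best.
aggregate_tokens_per_second = DENSE_TOKENS_PER_CARD * NUM_CARDS

print(f"Latency improvement: {latency_improvement:.0f}x")
print(f"Implied baseline: {implied_baseline:.1f} tokens/s per card")
print(f"Aggregate upper bound: {aggregate_tokens_per_second} tokens/s")
```

The 2.5x ratio implies a baseline of about 52.8 tokens per second per card, and the idealized aggregate for the dense case is 50,688 tokens per second.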
Geopolitical Forces Shaping Technical Innovation
Huawei's development of the Supernode 384 is heavily influenced by the broader US-China technological rivalry. Ongoing American sanctions have limited Huawei’s access to advanced semiconductor technologies, prompting the company to push the boundaries of innovation within these constraints.
Industry analysis suggests that although Huawei’s Ascend 910C AI processor may be a generation behind chips from Nvidia and AMD, its scale-up design compensates through system-level optimization, giving it an architectural edge over current market offerings.
Operationalizing the Supernode 384
Huawei’s Supernode 384, now deployed in several Chinese data centers, showcases scalable AI infrastructure capable of supporting tens of thousands of processors. The system is positioned as a rival to Nvidia’s offerings, but its global impact will depend on developer adoption and proven performance amid growing fragmentation of the tech ecosystem.