NVIDIA may stack LPUs on Feynman chips in bet on inference dominance

Dec 28, 2025

NVIDIA's next-generation Feynman chips might stack LPU (language processing unit) dies using advanced bonding technology. One expert's theory suggests the company could use TSMC's hybrid bonding, similar to the approach behind AMD's 3D V-Cache, to attach separate SRAM-heavy LPU dies onto a main compute die built on the A16 process. The move would follow NVIDIA's IP deal with Groq and aim to dominate the AI inference market by combining GPU flexibility with deterministic LPU throughput.

The proposed design would place large SRAM banks on dedicated LPU dies bonded to the Feynman die, which would hold the tensor cores and control logic. The claimed benefit is saving expensive silicon area: SRAM does not scale well on advanced nodes, so keeping it off the A16 die avoids spending leading-edge area on it. TSMC's hybrid bonding would provide a wide, low-power connection between the dies, while backside power delivery on the A16 node could free the front side for low-latency vertical links to the stacked memory.

Major challenges include managing heat from stacked high-density components and resolving execution conflicts. LPUs use a fixed order of operations, which clashes with the flexible nature of CUDA kernels, which are written against a hardware abstraction. NVIDIA would need to engineer a deeply optimized LPU-GPU environment that reconciles explicit memory placement with CUDA's abstractions. Overcoming these hardware and software hurdles would be the steep price of inference leadership.
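To make the "fixed order of operations" point concrete, here is a toy Python sketch, not anything from NVIDIA or Groq. It contrasts a schedule whose start cycles are all decided ahead of time with a run that picks up unpredictable stalls along the way. The op names, cycle costs, and stall range are all made-up assumptions chosen only for illustration.

```python
import random

# Hypothetical op list: (name, cycles). All numbers are illustrative only.
OPS = [("load_weights", 4), ("matmul", 8), ("activation", 2), ("store", 3)]

def static_schedule(ops):
    """Assign every op a fixed start cycle before anything runs."""
    schedule, cycle = [], 0
    for name, cost in ops:
        schedule.append((name, cycle))
        cycle += cost
    return schedule, cycle

def dynamic_run(ops, rng, max_stall=5):
    """Run ops in order, adding a random stall before each one.

    The stall is a toy stand-in for timing that is only known at run time.
    """
    cycle = 0
    for _, cost in ops:
        cycle += rng.randint(0, max_stall) + cost
    return cycle

schedule, static_total = static_schedule(OPS)
rng = random.Random(0)  # seeded so repeated runs of this script match
dynamic_totals = [dynamic_run(OPS, rng) for _ in range(5)]

print("static schedule:", schedule)
print("static total cycles:", static_total)  # always 17 for this op list
print("dynamic totals:", dynamic_totals)
```

The static total is the same on every call, which is the sense in which the throughput is deterministic. The dynamic totals vary from run to run and never drop below the static figure in this toy model.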
