In context: The first iteration of high-bandwidth memory (HBM) was somewhat limited, allowing speeds of only up to 128 GB/s per stack. The bigger caveat was capacity: graphics cards that used HBM1 were capped at 4 GB of memory due to physical limitations.
Over time, HBM manufacturers such as SK Hynix and Samsung improved upon HBM's shortcomings. The first update, HBM2, doubled potential speeds to 256 GB/s per stack and raised the maximum capacity to 8 GB. In 2018, HBM2 received a minor update (HBM2E), which further increased the capacity limit to 24 GB and brought another speed increase, ultimately hitting 460 GB/s per stack at its peak.
When HBM3 rolled out, the speed nearly doubled again, allowing for a maximum of 819 GB/s per stack. Even more impressive, capacities increased nearly threefold, from 24 GB to 64 GB. Like HBM2E, HBM3 saw another mid-life upgrade, HBM3E, which increased theoretical speeds to as much as 1.2 TB/s per stack.
Along the way, HBM was slowly replaced in consumer-grade graphics cards by more affordable GDDR memory. High-bandwidth memory became a standard in data centers, with manufacturers of workplace-focused cards opting to use the much faster interface.
Throughout the various updates and improvements, HBM retained the same 1,024-bit (per stack) interface in all its iterations. According to a report out of Korea, this may finally change when HBM4 reaches the market. If the claims are accurate, the memory interface will double from 1,024 bits to 2,048 bits.
Jumping to a 2,048-bit interface could theoretically double transfer speeds again. Unfortunately, memory manufacturers might be unable to maintain the same per-pin transfer rates with HBM4 as with HBM3E. Even so, a wider memory interface would allow manufacturers to use fewer stacks in a card.
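The trade-off comes down to the usual peak-bandwidth relation: interface width multiplied by the per-pin data rate, divided by eight bits per byte. The Python sketch below is purely illustrative. The HBM3E pin rate is back-calculated from the 1.2 TB/s figure quoted above rather than taken from any specification, both HBM4 scenarios are assumptions, and the function names are hypothetical.

```python
def peak_bandwidth_gbs(width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s: width (bits) x per-pin rate (Gb/s) / 8."""
    return width_bits * pin_rate_gbps / 8


def implied_pin_rate_gbps(width_bits: int, bandwidth_gbs: float) -> float:
    """Per-pin rate (Gb/s) implied by a quoted per-stack bandwidth."""
    return bandwidth_gbs * 8 / width_bits


# HBM3E as quoted in the article: 1.2 TB/s over a 1,024-bit interface.
hbm3e_pin_rate = implied_pin_rate_gbps(1024, 1200.0)

# Hypothetical HBM4 scenarios on a 2,048-bit interface (assumptions, not specs).
same_pin_rate = peak_bandwidth_gbs(2048, hbm3e_pin_rate)      # pins keep HBM3E speed
half_pin_rate = peak_bandwidth_gbs(2048, hbm3e_pin_rate / 2)  # pins run at half speed

print(f"Implied HBM3E pin rate: {hbm3e_pin_rate:.3f} Gb/s")
print(f"2,048-bit at same pin rate: {same_pin_rate:.0f} GB/s")
print(f"2,048-bit at half pin rate: {half_pin_rate:.0f} GB/s")
```

In other words, a doubled interface only doubles bandwidth if the per-pin rate holds; if the pins have to run at half speed, per-stack bandwidth merely matches HBM3E.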
For instance, Nvidia's flagship AI card, the H100, currently uses six 1,024-bit known good stacked dies, which allows for a 6,144-bit interface. If the memory interface doubled to 2,048 bits, Nvidia could theoretically halve the number of dies to three and achieve the same performance. Of course, it's unclear which path manufacturers will take, as HBM4 is almost certainly years away from production.
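The stack-count arithmetic in that example can be sketched in a few lines of Python. The figures are the ones stated above (six 1,024-bit stacks), the helper name is hypothetical, and the result only implies equal performance under the assumption that per-pin transfer rates stay unchanged.

```python
import math


def stacks_needed(total_bus_bits: int, per_stack_bits: int) -> int:
    """Smallest number of stacks whose combined width covers the target bus."""
    return math.ceil(total_bus_bits / per_stack_bits)


# Figures as stated in the article: six stacks, 1,024 bits each.
current_bus_bits = 6 * 1024

# Hypothetical HBM4 stack width of 2,048 bits, per the report.
hbm4_stacks = stacks_needed(current_bus_bits, 2048)

print(f"Total interface today: {current_bus_bits} bits")
print(f"2,048-bit stacks needed for the same width: {hbm4_stacks}")
```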
Currently, both SK Hynix and Samsung believe they will be able to achieve a “100% yield” with HBM4 when they begin to manufacture it. Only time will tell if the reports hold water, so take the news with a grain of salt.