HBM4 Enters Production for NVIDIA's Vera Rubin

All three major memory makers, SK hynix, Samsung, and Micron, are now in mass production of HBM4, the next-generation stacked memory that feeds NVIDIA's Vera Rubin AI GPUs, with first systems scheduled to ship in Q3 2026. This is the quiet half of the AI boom: the accelerator gets the headlines, but the memory next to it decides how fast the chip can actually think, and HBM4 is the biggest step-change in that memory in years.

On June 5, 2026, in Seoul, NVIDIA's Jensen Huang confirmed that Samsung, SK hynix, and Micron are all qualified and in production of HBM4 for Vera Rubin.
HBM4 doubles the memory interface from 1,024 to 2,048 bits and independent channels from 16 to 32, pushing past 2TB/s of bandwidth per stack.
SK hynix is expected to hold the largest share of Vera Rubin HBM4 volume, with analyst estimates near 54 percent of the 2026 market.
Supply is the story after speed: SK hynix says it has already sold out its entire 2026 HBM output, and expects the shortage to worsen in the second half.

Fig 1 · HBM4's headline change is width, not clock: doubling the interface to 2,048 bits and channels to 32 is how it clears 2TB/s per stack without simply cranking the pins faster.

What actually changed with HBM4?

The architecture, not just the label. Previous generations mostly turned up the clock, but HBM4 widens the road. The memory interface jumps from 1,024 to 2,048 bits per stack, and the number of independent data channels doubles from 16 to 32, so more data moves in parallel at any given pin speed. At the JEDEC baseline of 8Gbps per pin, that yields at least 2 terabytes per second per stack, up from roughly 1.2TB/s for HBM3E. Vendors are pushing further: SK hynix showed 48GB 16-high modules running at 11.7Gbps per pin, well above the JEDEC baseline, for more than 2TB/s per stack; Samsung, which reached mass production first on February 12, 2026, claims 3.3TB/s using a 4-nanometer logic base die; and Micron is shipping 36GB 12-high HBM4 parts above 11Gbps per pin with bandwidth over 2.8TB/s. For a GPU whose job is moving enormous tensors, and especially for memory-bound inference, that width translates almost directly into tokens per second.
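The per-stack figures follow from simple arithmetic: theoretical peak bandwidth is the interface width multiplied by the per-pin data rate, divided by eight to convert bits to bytes. Here is a minimal sketch in Python. The function name `stack_bandwidth_tbps` is illustrative, the 9.6Gbps HBM3E pin rate is an assumption chosen to match the roughly 1.2TB/s quoted above, and all results are theoretical peaks in decimal units (1 TB = 1,000 GB), not measured throughput.

```python
def stack_bandwidth_tbps(width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak bandwidth of one memory stack, in TB/s (decimal)."""
    gigabytes_per_second = width_bits * pin_rate_gbps / 8  # bits -> bytes
    return gigabytes_per_second / 1000                      # GB/s -> TB/s


# (interface width in bits, per-pin data rate in Gbps)
configs = {
    "HBM3E at 9.6 Gbps (assumed pin rate)": (1024, 9.6),
    "HBM4 at 8 Gbps (JEDEC baseline)": (2048, 8.0),
    "HBM4 at 11 Gbps": (2048, 11.0),
    "HBM4 at 11.7 Gbps": (2048, 11.7),
}

for label, (width, rate) in configs.items():
    print(f"{label}: {stack_bandwidth_tbps(width, rate):.2f} TB/s")
```

At the same 8Gbps pin rate, a 1,024-bit interface would top out near 1.02TB/s, so the doubled width alone accounts for the move to 2TB/s. By the same formula, 11.7Gbps on a 2,048-bit interface corresponds to a theoretical peak near 3.0TB/s.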

Fig 2 · benchmark Terabytes per second, per stack. Samsung claims the fastest single spec, but SK hynix (in orange) is the volume leader NVIDIA leans on most, which is why allocation matters as much as peak numbers.

Why is the supplier split the real news?

Because in a shortage, who can ship decides who wins. NVIDIA qualifying all three vendors is a hedge, but the volume is lopsided. Analysts estimate SK hynix holds roughly 54 to 70 percent of Vera Rubin HBM4 supply, Samsung 25 to 30 percent, and Micron the remainder, with Counterpoint pegging 2026 market share near 54 / 28 / 18. SK hynix outsources its logic base die to TSMC through a "One-Team" alliance, aligning its memory with the foundry that also manufactures NVIDIA's GPUs, which reduces integration risk. The uncomfortable part sits underneath all of it: SK hynix's CFO said the company has already sold its entire 2026 HBM output, and warned the shortage will deepen in the second half. When the memory is sold out a year ahead, GPU availability stops being about wafers and starts being about who reserved stacks.

Supplier	SK hynix	Samsung	Micron
Est. 2026 share	~54%	~28%	~18%
Peak bandwidth/stack	Over 2 TB/s (11.7Gbps)	3.3 TB/s	2.8 TB/s
Mass production	In production	First, Feb 12 2026	Volume, Q1 2026
Base die	TSMC (One-Team)	In-house 4nm	Own process

Where does this land in a Vera Rubin GPU?

Directly on the accelerator's ceiling. The first Rubin GPU carries 288GB of HBM4 across eight stacks for roughly 22TB/s of aggregate bandwidth, close to triple Blackwell, and NVIDIA quotes up to 50 PFLOPS of NVFP4 inference and 35 PFLOPS of training. At the full platform level Vera Rubin packs 16 stacks for 576GB, ahead of AMD's MI450 at 432GB. Vera Rubin entered full production after the June 1 GTC Taipei keynote, with first customer shipments this summer and AWS, Google Cloud, Microsoft Azure, and Oracle among the earliest deployments. NVIDIA says the platform delivers 10x agent throughput at scale versus Grace Blackwell, and most of that jump is memory, not just cores.
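To see why memory sets the ceiling, it helps to break the headline figures down per stack and then apply a common rule of thumb for memory-bound inference: generating one token for a single sequence means reading every model weight once, so the decode rate cannot exceed bandwidth divided by the model's size in bytes. The sketch below is a back-of-envelope estimate, not an NVIDIA figure. The 70-billion-parameter model and the 0.5 bytes per parameter (4-bit weights) are hypothetical inputs, the function name is illustrative, and the bound ignores KV-cache traffic, batching, and compute limits.

```python
# Figures quoted in the article for the first Rubin GPU
GPU_CAPACITY_GB = 288
GPU_BANDWIDTH_TBPS = 22.0  # approximate aggregate bandwidth
STACKS = 8

per_stack_capacity_gb = GPU_CAPACITY_GB / STACKS
per_stack_bandwidth_tbps = GPU_BANDWIDTH_TBPS / STACKS


def decode_ceiling_tokens_per_s(bandwidth_tbps: float,
                                params_billion: float,
                                bytes_per_param: float) -> float:
    """Upper bound on single-stream decode rate when weight reads dominate."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbps * 1e12 / model_bytes


print(f"Per stack: {per_stack_capacity_gb:.0f} GB, "
      f"{per_stack_bandwidth_tbps:.2f} TB/s")

# Hypothetical model: 70B parameters stored at 4 bits (0.5 bytes) each
ceiling = decode_ceiling_tokens_per_s(GPU_BANDWIDTH_TBPS, 70, 0.5)
print(f"Memory-bound decode ceiling: {ceiling:.0f} tokens/s per sequence")
```

The per-stack result, 36GB and about 2.75TB/s, lines up with the 12-high 36GB parts described earlier. The decode ceiling scales linearly with bandwidth, which is why a wider memory interface shows up so directly in serving throughput.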

What to watch · 2026 to 2027

Allocation, not announcements. With HBM sold out, cloud GPU availability tracks who locked stacks early, not who taped out fastest.
Yield at 16-high. Stacking taller raises bandwidth and defect risk together. Yields decide whether the top-end parts stay scarce.
Samsung's comeback. Being first to mass production and fastest on paper is Samsung's bid to claw share back from SK hynix.
The shortage curve. If the second-half squeeze is as bad as SK hynix warns, HBM becomes the pacing item for the entire AI buildout.

Our take

The GPU wars are increasingly memory wars wearing a GPU costume. HBM4's doubled interface is exactly the kind of unglamorous, structural upgrade that quietly determines how large a model you can serve and how cheaply, and NVIDIA lining up three suppliers is less about choice than about survival in a market where the memory sells out before the silicon ships. SK hynix is the name to track, not because its peak spec is the highest (it is not) but because it holds the volume and the TSMC pipeline that Vera Rubin actually depends on. The bottleneck for 2026 is not compute and it is not even fabrication. It is a 2,048-bit interface per stack, sold out a year in advance, and everyone building frontier AI is now competing for the same short supply.

Primary sources

Official · Micron: HBM4 in high-volume production, vendor confirmation for Vera Rubin
Reporting · SK hynix 48GB HBM4 at 11.7Gbps, per-pin and per-stack specs
Reference · TrendForce: Rubin HBM4 suppliers, allocation and timeline
Reference · Tom's Hardware: HBM4 spec push, on NVIDIA raising the bar

Original analysis by GenZTech. Specifications per vendor disclosures, current as of July 2026.
