Glossary
HBM3E Memory
High Bandwidth Memory 3E — the current-generation stacked DRAM technology offering the highest available memory bandwidth for AI accelerators, achieved by mounting DRAM dies directly on the processor package.
HBM3E (High Bandwidth Memory 3E) is the current generation of high-bandwidth memory technology, designed for AI accelerators, GPUs, and other workloads where memory bandwidth is the primary performance constraint. HBM stacks multiple DRAM dies vertically and connects them to the processor using through-silicon vias (TSVs) and a silicon interposer, dramatically shortening the physical distance data must travel and thereby reducing latency and power.
The "3E" designation reflects an extension of the HBM3 standard with improved bandwidth per pin and higher per-stack capacity. Individual HBM3E stacks can provide over 1 TB/s of bandwidth — compared to approximately 100 GB/s for conventional DRAM modules.
Why stacked memory matters for AI
The memory wall problem is acute in large model inference: billions of model parameters must be transferred from memory to compute units during every forward pass. The speed of this transfer, not the speed of the arithmetic itself, determines inference latency and throughput.
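This bandwidth bound can be made concrete with a back-of-envelope calculation. The model size, precision, and per-stack bandwidth below are illustrative assumptions, not Webbeon figures:

```python
# Autoregressive decode streams every weight once per generated token,
# so memory bandwidth caps tokens/second regardless of compute speed.
# All numbers below are illustrative assumptions.

def bandwidth_bound_tokens_per_s(n_params: float, bytes_per_param: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-sequence decode rate: bandwidth / model size."""
    return bandwidth_bytes_per_s / (n_params * bytes_per_param)

# A hypothetical 70B-parameter model in FP16, fed by one ~1 TB/s HBM3E stack:
rate = bandwidth_bound_tokens_per_s(70e9, 2, 1e12)
print(f"{rate:.1f} tokens/s")  # ~7.1: the transfer, not the math, is the limit
```

Faster arithmetic units would not raise this ceiling; only more bandwidth (or fewer bytes per parameter) would.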
HBM addresses this by:
- 3D stacking: Multiple DRAM dies are stacked vertically and connected by thousands of through-silicon vias. More dies means more bandwidth channels operating in parallel.
- Short physical distance: HBM dies are mounted on the same interposer as the processor. Short traces mean lower latency, lower power, and higher bandwidth than routing signals off the package to conventional DRAM.
- Wide interface: HBM uses a very wide bus (1024+ bits) operating at moderate clock speeds, achieving high bandwidth with fewer per-bit signal-integrity challenges than narrow, high-clock alternatives.
- Capacity scaling: HBM3E supports up to 24 GB per stack in current configurations; multi-stack configurations can reach 96 GB or more on a single package.
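The wide-interface point reduces to simple arithmetic: peak bandwidth is bus width times per-pin data rate. The per-pin rates below are typical published figures (roughly 9.6 Gb/s for HBM3E, 6.4 Gb/s for a DDR5 channel) used for illustration:

```python
# Peak bandwidth = bus width (bits) * per-pin data rate (Gbit/s) / 8 bits/byte.
# Per-pin rates are typical published figures, not Webbeon specifications.

def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak transfer rate of a memory interface, in GB/s."""
    return bus_width_bits * gbit_per_pin / 8

hbm3e = peak_bandwidth_gb_s(1024, 9.6)  # one HBM3E stack: wide, moderate clock
ddr5 = peak_bandwidth_gb_s(64, 6.4)     # one DDR5 channel: narrow, off-package

print(f"HBM3E stack: {hbm3e:.1f} GB/s")   # 1228.8 GB/s, i.e. over 1 TB/s
print(f"DDR5 channel: {ddr5:.1f} GB/s")   # 51.2 GB/s
```

The roughly 24x gap comes almost entirely from bus width, which is only practical because the dies sit on-package.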
How Webbeon uses HBM3E
Oracle Class W1 incorporates 12 stacks of HBM3E, providing:
- 96 GB total capacity — sufficient to hold large model shards without compression
- 4.8 TB/s aggregate bandwidth — the primary determinant of inference throughput
- Optimized interconnect topology connecting HBM stacks to the spatial dataflow array
The 12-stack configuration is among the most aggressive HBM configurations in production silicon, enabled by Oracle Class's physical design and thermal management.
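To see how an aggregate figure like 4.8 TB/s relates to a serving-throughput target like 12,000 tokens/second, here is a hedged sketch. The model size, INT8 precision, and perfect weight reuse across a batch are assumptions for illustration, not W1 measurements:

```python
import math

# Sketch: with batching, weights are streamed from HBM once per decode step
# and amortized across every sequence in the batch. The model below is
# hypothetical; this is not a W1 specification.

BANDWIDTH = 4.8e12       # W1 aggregate HBM3E bandwidth, bytes/s
MODEL_BYTES = 70e9       # hypothetical 70B-parameter model at INT8 (1 byte/param)

per_sequence = BANDWIDTH / MODEL_BYTES           # single-sequence ceiling
batch_needed = round(12_000 / per_sequence)      # batch size to reach 12k tok/s

print(f"~{per_sequence:.0f} tokens/s per sequence; "
      f"a batch of ~{batch_needed} reaches 12,000 aggregate")
```

The point of the sketch is that aggregate bandwidth sets a hard ceiling per sequence; high aggregate token rates come from amortizing each weight transfer over many concurrent sequences.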
Key facts
- 4.8 TB/s total bandwidth on W1 — enabling 12,000 tokens/second for large models
- HBM3E consumes significantly less energy per bit than conventional DRAM (approximately 30-40% less)
- Energy efficiency gain from HBM3E contributes to W1's 40% overall energy improvement vs. commodity hardware
- HBM3E is manufactured in limited quantities; Oracle Class silicon design is optimized for maximum utilization of the available bandwidth