Glossary

Near-Memory Computing

A hardware paradigm that places compute logic physically close to or within memory arrays — reducing the energy and latency cost of moving data between memory and processors.

Near-memory computing is a hardware paradigm where compute logic is placed physically adjacent to or within memory arrays. The goal is to reduce — or eliminate — the cost of moving data between memory and compute, which is the dominant energy expenditure and performance bottleneck in memory-intensive workloads like neural network inference.

The spectrum of near-memory approaches ranges from processors placed near conventional DRAM packages, to compute logic integrated within HBM stacks, to Processing-in-Memory (PIM) where simple arithmetic operations execute inside the DRAM array itself.

The energy argument

Moving a 32-bit value from off-chip DRAM to a processor register consumes approximately 200 picojoules. Performing a multiply-accumulate on that value consumes approximately 1–2 picojoules. The ratio is striking: in a memory-bound workload, data movement can consume roughly 100x more energy than the computation itself.

Near-memory computing changes this ratio by shortening the path data must travel. At the extreme of processing-in-memory, data moves across micrometers within the memory array rather than millimeters or centimeters between packages. Even a 10x reduction in data movement distance translates to substantial energy savings at scale.
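The arithmetic above can be made concrete with a short back-of-envelope model. The constants are the approximate figures quoted in this article (200 pJ per off-chip 32-bit fetch, 1–2 pJ per multiply-accumulate); the reuse scenario is an illustrative assumption, not a measurement:

```python
# Rough energy model using the order-of-magnitude figures from the text.
PJ_PER_32BIT_DRAM_FETCH = 200.0  # off-chip DRAM -> register, per 32-bit word
PJ_PER_MAC = 1.5                 # one multiply-accumulate (midpoint of 1-2 pJ)

def energy_ratio(words_fetched: int, macs: int) -> float:
    """Ratio of data-movement energy to compute energy."""
    movement_pj = words_fetched * PJ_PER_32BIT_DRAM_FETCH
    compute_pj = macs * PJ_PER_MAC
    return movement_pj / compute_pj

# Worst case: every operand is fetched from off-chip DRAM (no reuse).
print(f"{energy_ratio(words_fetched=1, macs=1):.0f}x")

# If each fetched word is reused across 100 MACs (e.g. held in a register
# file or SRAM), the movement-to-compute ratio drops proportionally.
print(f"{energy_ratio(words_fetched=1, macs=100):.2f}x")
```

The second case shows why shortening the data path and increasing on-chip reuse attack the same bottleneck from two directions.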

Near-memory in the AI context

Large model inference has specific near-memory requirements:

  • Weight loading: model weights must be loaded from memory for every forward pass — moving them closer to compute is the primary optimization target
  • Activation streaming: attention key-value caches grow with context length and must be accessed repeatedly
  • Precision variability: near-memory units can apply quantization or dequantization at the memory boundary, reducing the bandwidth needed between memory and compute
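The key-value cache growth mentioned above is easy to quantify. The sizing formula below is standard for transformer inference; the model dimensions (32 layers, 32 KV heads, head dimension 128, fp16) are hypothetical, chosen only to illustrate how quickly the cache outgrows on-chip memory:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Total KV-cache footprint: K and V tensors per layer, per token."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 7B-class model at fp16 (2 bytes/element).
for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(32, 32, 128, ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.0f} GiB")
```

Because the cache scales linearly with context length, long-context inference is dominated by repeated streaming of this data, exactly the access pattern near-memory designs target.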

How Webbeon approaches Near-Memory Computing

Oracle Class W1 incorporates near-memory design principles throughout:

  • 256 MB distributed SRAM within the compute tile array — weights in SRAM require no off-chip access
  • HBM3E controller logic colocated with memory stack interfaces to minimize transfer overhead
  • Pipeline stages designed to consume data as it arrives from memory, rather than buffering and then processing
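The first bullet implies a simple capacity check: do a model's weights fit in 256 MB of on-chip SRAM at a given quantization? A minimal sketch, using the 256 MB figure from this article; the parameter counts are illustrative and not specific to any model Webbeon supports:

```python
SRAM_BYTES = 256 * 2**20  # 256 MB distributed SRAM (figure from the article)

def fits_in_sram(params: int, bits_per_weight: int) -> bool:
    """True if the quantized weights fit entirely on-chip."""
    return params * bits_per_weight / 8 <= SRAM_BYTES

print(fits_in_sram(250_000_000, 8))    # 250M params @ int8
print(fits_in_sram(1_000_000_000, 8))  # 1B params @ int8
print(fits_in_sram(1_000_000_000, 2))  # 1B params @ 2-bit
```

Models that clear this check never pay the off-chip access cost for weights; larger models fall back to the HBM path, which is why the quantization applied at the memory boundary matters.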

Future Oracle Class generations target tighter near-memory integration, including compute logic within HBM stacks for specific operations.

Key facts

  • SRAM access costs approximately 5 pJ/bit; on-package DRAM access (e.g. HBM) costs approximately 25 pJ/bit; off-chip DRAM access costs approximately 200+ pJ/bit
  • Near-memory computing is complementary to bandwidth improvements like HBM3E — both reduce the effective cost of memory access
  • Processing-in-memory for AI inference is an active research area; commercial deployment at scale remains limited
  • Webbeon's 40% energy reduction vs. commodity hardware reflects, in part, the near-memory design principles applied throughout W1
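The per-bit figures in the first key fact translate directly into per-pass energy. A short sketch applying those numbers to a weight block streamed once per forward pass; the 64 MiB block size is an arbitrary illustrative choice:

```python
# Approximate per-bit access energies quoted in the key facts above.
PJ_PER_BIT = {"SRAM": 5, "on-package DRAM": 25, "off-chip DRAM": 200}

def load_energy_mj(nbytes: int, tier: str) -> float:
    """Energy (millijoules) to stream `nbytes` once from a memory tier."""
    return nbytes * 8 * PJ_PER_BIT[tier] * 1e-9  # pJ -> mJ

layer_bytes = 64 * 2**20  # a 64 MiB weight block, read every forward pass
for tier in PJ_PER_BIT:
    print(f"{tier:>15}: {load_energy_mj(layer_bytes, tier):6.1f} mJ")
```

At thousands of forward passes per second, the ~40x gap between the SRAM and off-chip rows is where the fleet-level energy savings come from.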
Related terms
  • memory wall problem
  • HBM3E memory
  • custom AI inference chip
  • spatial dataflow architecture
See also
  • Technology / Oracle Class
  • Research / Silicon