Glossary

Spatial Dataflow Architecture

A chip architecture in which computation is organized as a mesh of processing tiles with local communication — data flows spatially through the array, minimizing off-chip memory accesses for matrix operations.

Spatial dataflow architecture is a computing paradigm in which hardware resources are organized as a 2D mesh of processing elements, each with local memory and communication links to its neighbors. Rather than all computation flowing through a central processor that accesses a shared memory hierarchy, data flows spatially across the array — intermediate results are passed directly between adjacent tiles without touching off-chip memory.

This architecture is particularly well suited to the matrix operations that dominate neural network inference. A large matrix multiplication can be mapped onto the tile array so that partial products flow through the mesh and accumulate at output tiles, without any intermediate result needing to leave the chip.
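As a toy illustration of this mapping, the NumPy sketch below simulates an output-stationary mesh: each "tile" holds one block of the result matrix and accumulates partial products as blocks of A and B stream past. It is a minimal model, not Webbeon's implementation — it processes one wavefront of blocks per step and omits the skewed timing of a real systolic array.

```python
import numpy as np

def spatial_matmul(A, B, tile=4):
    """Toy simulation of an output-stationary spatial dataflow matmul.

    Tile (i, j) keeps its block of C in local memory ("acc" here).
    Blocks of A stream east along rows and blocks of B stream south
    along columns, so every partial product stays inside the mesh and
    only the finished C blocks ever leave.
    """
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % tile == 0
    t = n // tile                       # the mesh is t x t tiles
    acc = np.zeros((n, n))              # per-tile local accumulators
    for step in range(t):               # one wavefront of blocks per step
        k = step                        # simplification: all tiles see block k
        for i in range(t):
            for j in range(t):
                a = A[i*tile:(i+1)*tile, k*tile:(k+1)*tile]
                b = B[k*tile:(k+1)*tile, j*tile:(j+1)*tile]
                acc[i*tile:(i+1)*tile, j*tile:(j+1)*tile] += a @ b
    return acc
```

After t steps every tile has accumulated its full block, and the result matches a dense matmul while no intermediate block ever crossed the (simulated) memory boundary.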

Why spatial dataflow matters for AI

The dominant cost in neural network inference is moving data between compute and memory. In a conventional processor, every intermediate result in a computation must be written to and read from the memory hierarchy. In a spatial dataflow architecture, intermediate results are passed between compute tiles through direct links — no memory transactions required.

This dramatically improves utilization: compute units spend time computing rather than waiting for memory, and the memory bandwidth that does exist is used only for the final inputs and outputs of each layer, not for intermediate values.
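To make the savings concrete, here is a back-of-envelope traffic model — a deliberately simplified toy, not a description of any specific chip — comparing off-chip bytes moved for a chain of layers under the two schemes:

```python
def offchip_traffic(sizes):
    """Return (conventional, spatial) off-chip bytes for a layer chain.

    `sizes` lists tensor sizes in bytes: [input, intermediates..., output].
    Conventional: every intermediate is written to memory and read back.
    Spatial dataflow: intermediates move tile-to-tile, so only the first
    input and the final output cross the memory interface.
    Weight traffic is ignored in this toy model.
    """
    conventional = sizes[0] + sizes[-1] + 2 * sum(sizes[1:-1])
    spatial = sizes[0] + sizes[-1]
    return conventional, spatial

# Four equal-size tensors: the conventional path moves 3x the data
conv, spat = offchip_traffic([100, 100, 100, 100])  # -> (600, 200)
```

The gap widens with depth: every extra layer adds two memory transactions to the conventional total and none to the spatial one.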

Tile granularity trade-offs: Smaller tiles allow finer-grained parallelism and more data reuse, but require more interconnect. Larger tiles have higher compute density but less spatial flexibility. Optimal tile size depends on the matrix dimensions of the workloads being served.
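The interconnect side of this trade-off can be sketched with simple arithmetic. Assuming a square mesh with nearest-neighbor links only (an assumption for illustration, not a statement about any particular product), quartering the tile size — so four times as many tiles fit the same compute budget — roughly quadruples the link count:

```python
def mesh_links(num_tiles):
    """Nearest-neighbor links in a square t x t mesh (illustrative)."""
    t = int(round(num_tiles ** 0.5))
    assert t * t == num_tiles, "expects a square tile count"
    return 2 * t * (t - 1)  # horizontal links + vertical links

# Same compute budget, 4x more (smaller) tiles -> ~4.3x more links
print(mesh_links(64), mesh_links(256))  # 112 480
```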

Static vs. dynamic dataflow: Static dataflow schedules all computation at compile time; dynamic dataflow allows runtime routing based on actual data values. Static scheduling achieves higher efficiency for predictable workloads; dynamic scheduling handles variable-length and conditional computations better.
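A minimal way to see the difference is scheduling a batch of variable-length sequences. In this hypothetical cycle-count model, a static schedule must pad every sequence to the compile-time maximum, while dynamic routing pays only for the work the data actually requires:

```python
def static_cycles(seq_lens, max_len):
    """Compile-time schedule: cycle count is fixed by padding every
    sequence to the worst-case length, regardless of the actual data."""
    return len(seq_lens) * max_len

def dynamic_cycles(seq_lens):
    """Runtime routing: each sequence occupies the array only for its
    true length (scheduling-hardware overhead ignored in this toy model)."""
    return sum(seq_lens)

lens = [3, 7, 2]
print(static_cycles(lens, max_len=8), dynamic_cycles(lens))  # 24 12
```

When lengths are uniform and known ahead of time, the static schedule wastes nothing and avoids the dynamic scheduler's hardware cost — which is why static scheduling wins for predictable workloads.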

How Webbeon implements Spatial Dataflow

Oracle Class W1 uses a 512-tile spatial dataflow mesh, designed around the attention and feed-forward computations in transformer models:

  • Each tile contains dedicated matrix multiply units, local SRAM, and network-on-chip interfaces
  • Data flows between tiles through a high-bandwidth mesh interconnect
  • The 256 MB total distributed SRAM holds entire attention layers in flight without off-chip access
  • 1.6 TB/s inter-chip links enable multi-chip tensor parallelism for large models
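As a rough plausibility check on holding attention layers in flight on-chip — using illustrative model dimensions that are assumptions for this sketch, not published W1 or Odyssey specifications:

```python
def attention_activation_mb(seq_len, d_model, bytes_per_el=2):
    """Approximate on-chip footprint (MB) of the Q, K, V and output
    activations for one attention layer at fp16.

    Assumes score matrices are computed blockwise and never fully
    materialized; weight storage is ignored. The dimensions used below
    are illustrative, not published W1 specs.
    """
    return 4 * seq_len * d_model * bytes_per_el / 1e6  # Q, K, V, output

print(attention_activation_mb(2048, 4096))  # ~67 MB, under a 256 MB budget
```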

Key facts

  • 512 tiles in the W1 mesh, each with dedicated compute and local storage
  • The spatial dataflow approach is why W1 achieves 12,000 tokens/second — compute utilization is high because memory stalls are minimized
  • Oracle Class co-design with Odyssey model architecture ensures matrix dimensions align with tile geometry, eliminating padding waste
  • Total distributed SRAM: 256 MB — designed to hold the hot weight tensors for typical inference workloads entirely on-chip
Related terms
  • custom ai inference chip
  • hbm3e memory
  • near memory computing
  • memory wall problem
See also
  • technology/oracle class
  • research/silicon