Glossary

Spatial Dataflow Architecture

A chip architecture in which computation is organized as a mesh of processing tiles with local communication — data flows spatially through the array, minimizing off-chip memory accesses for matrix operations.

Spatial dataflow architecture is a computing paradigm in which hardware resources are organized as a 2D mesh of processing elements, each with local memory and communication links to its neighbors. Rather than all computation flowing through a central processor that accesses a shared memory hierarchy, data flows spatially across the array — intermediate results are passed directly between adjacent tiles without touching off-chip memory.

This architecture is particularly well suited to the matrix operations that dominate neural network inference. A large matrix multiplication can be mapped onto the tile array so that partial products flow through the mesh and accumulate at output tiles, without any intermediate result needing to leave the chip.
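As a toy illustration of this mapping, the NumPy sketch below simulates an output-stationary mesh: each "tile" holds one block of the result matrix and accumulates partial products as blocks of A and B stream past. It is a minimal model, not Webbeon's implementation — it processes one wavefront of blocks per step and omits the skewed timing of a real systolic array.

```python
import numpy as np

def spatial_matmul(A, B, tile=4):
    """Toy simulation of an output-stationary spatial dataflow matmul.

    Tile (i, j) keeps its block of C in local memory ("acc" here).
    Blocks of A stream east along rows and blocks of B stream south
    along columns, so every partial product stays inside the mesh and
    only the finished C blocks ever leave.
    """
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % tile == 0
    t = n // tile                       # the mesh is t x t tiles
    acc = np.zeros((n, n))              # per-tile local accumulators
    for step in range(t):               # one wavefront of blocks per step
        k = step                        # simplification: all tiles see block k
        for i in range(t):
            for j in range(t):
                a = A[i*tile:(i+1)*tile, k*tile:(k+1)*tile]
                b = B[k*tile:(k+1)*tile, j*tile:(j+1)*tile]
                acc[i*tile:(i+1)*tile, j*tile:(j+1)*tile] += a @ b
    return acc
```

After t steps every tile has accumulated its full block, and the result matches a dense matmul while no intermediate block ever crossed the (simulated) memory boundary.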

Why spatial dataflow matters for AI

The dominant cost in neural network inference is moving data between compute and memory. In a conventional processor, every intermediate result in a computation must be written to and read from the memory hierarchy. In a spatial dataflow architecture, intermediate results are passed between compute tiles through direct links — no memory transactions required.

This dramatically improves utilization: compute units spend time computing rather than waiting for memory, and the memory bandwidth that does exist is used only for the final inputs and outputs of each layer, not for intermediate values.
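To make the savings concrete, here is a back-of-envelope traffic model — a deliberately simplified toy, not a description of any specific chip — comparing off-chip bytes moved for a chain of layers under the two schemes:

```python
def offchip_traffic(sizes):
    """Return (conventional, spatial) off-chip bytes for a layer chain.

    `sizes` lists tensor sizes in bytes: [input, intermediates..., output].
    Conventional: every intermediate is written to memory and read back.
    Spatial dataflow: intermediates move tile-to-tile, so only the first
    input and the final output cross the memory interface.
    Weight traffic is ignored in this toy model.
    """
    conventional = sizes[0] + sizes[-1] + 2 * sum(sizes[1:-1])
    spatial = sizes[0] + sizes[-1]
    return conventional, spatial

# Four equal-size tensors: the conventional path moves 3x the data
conv, spat = offchip_traffic([100, 100, 100, 100])  # -> (600, 200)
```

The gap widens with depth: every extra layer adds two memory transactions to the conventional total and none to the spatial one.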

Tile granularity trade-offs: Smaller tiles allow finer-grained parallelism and more data reuse, but require more interconnect. Larger tiles have higher compute density but less spatial flexibility. Optimal tile size depends on the matrix dimensions of the workloads being served.
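The interconnect side of this trade-off can be sketched with simple arithmetic. Assuming a square mesh with nearest-neighbor links only (an assumption for illustration, not a statement about any particular product), quartering the tile size — so four times as many tiles fit the same compute budget — roughly quadruples the link count:

```python
def mesh_links(num_tiles):
    """Nearest-neighbor links in a square t x t mesh (illustrative)."""
    t = int(round(num_tiles ** 0.5))
    assert t * t == num_tiles, "expects a square tile count"
    return 2 * t * (t - 1)  # horizontal links + vertical links

# Same compute budget, 4x more (smaller) tiles -> ~4.3x more links
print(mesh_links(64), mesh_links(256))  # 112 480
```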

Static vs. dynamic dataflow: Static dataflow schedules all computation at compile time; dynamic dataflow allows runtime routing based on actual data values. Static scheduling achieves higher efficiency for predictable workloads; dynamic scheduling handles variable-length and conditional computations better.
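A minimal way to see the difference is scheduling a batch of variable-length sequences. In this hypothetical cycle-count model, a static schedule must pad every sequence to the compile-time maximum, while dynamic routing pays only for the work the data actually requires:

```python
def static_cycles(seq_lens, max_len):
    """Compile-time schedule: cycle count is fixed by padding every
    sequence to the worst-case length, regardless of the actual data."""
    return len(seq_lens) * max_len

def dynamic_cycles(seq_lens):
    """Runtime routing: each sequence occupies the array only for its
    true length (scheduling-hardware overhead ignored in this toy model)."""
    return sum(seq_lens)

lens = [3, 7, 2]
print(static_cycles(lens, max_len=8), dynamic_cycles(lens))  # 24 12
```

When lengths are uniform and known ahead of time, the static schedule wastes nothing and avoids the dynamic scheduler's hardware cost — which is why static scheduling wins for predictable workloads.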

How Webbeon implements Spatial Dataflow

Oracle Class W1 uses a 512-tile spatial dataflow mesh, designed around the attention and feed-forward computations in transformer models:

  • Each tile contains dedicated matrix multiply units, local SRAM, and network-on-chip interfaces
  • Data flows between tiles through a high-bandwidth mesh interconnect
  • The 256 MB total distributed SRAM holds entire attention layers in flight without off-chip access
  • 1.6 TB/s inter-chip links enable multi-chip tensor parallelism for large models
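As a rough plausibility check on holding attention layers in flight on-chip — using illustrative model dimensions that are assumptions for this sketch, not published W1 or Odyssey specifications:

```python
def attention_activation_mb(seq_len, d_model, bytes_per_el=2):
    """Approximate on-chip footprint (MB) of the Q, K, V and output
    activations for one attention layer at fp16.

    Assumes score matrices are computed blockwise and never fully
    materialized; weight storage is ignored. The dimensions used below
    are illustrative, not published W1 specs.
    """
    return 4 * seq_len * d_model * bytes_per_el / 1e6  # Q, K, V, output

print(attention_activation_mb(2048, 4096))  # ~67 MB, under a 256 MB budget
```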

Key facts

  • 512 tiles in the W1 mesh, each with dedicated compute and local storage
  • The spatial dataflow approach is why W1 achieves 12,000 tokens/second — compute utilization is high because memory stalls are minimized
  • Oracle Class co-design with Odyssey model architecture ensures matrix dimensions align with tile geometry, eliminating padding waste
  • Total distributed SRAM: 256 MB — designed to hold the hot weight tensors for typical inference workloads entirely on-chip
Related terms
  • custom ai inference chip
  • hbm3e memory
  • near memory computing
  • memory wall problem
See also
  • technology/oracle class
  • research/silicon