How to Map LLM Architecture with a Physical Thinking Tool


The transformer architecture is the foundation of today's large language models, including GPT, Gemini, and Claude. It was introduced in the 2017 Google paper "Attention Is All You Need" and has since become the dominant framework for sequence modeling. The original design has two sides: an encoder that processes the input and a decoder that generates the output, connected by a cross-attention mechanism that lets the decoder draw on the encoder's representation of the input. Many production LLMs, GPT among them, keep only the decoder stack, but the original encoder-decoder layout shows every component and how they connect. Understanding how those components relate is less a matter of memorizing the diagram than of being able to reason about what each piece does and why the connections between them matter.

Watch the transformer architecture assemble block by block on a wall: the encoder stack goes up on the left, the decoder stack on the right, and the cross-attention bridge connects them until the full architecture is visible and holdable.

Switch-Its makes system architecture something you build

Switch-Its magnetic dry-erase blocks let you write each component on its own block: input embedding, positional encoding, multi-head attention, feed-forward, and add & norm. Place them on a magnetic surface in an arrangement that makes their relationships visible. Moving a block is a claim about how the system works, which makes the architecture an argument you construct rather than a diagram you copy.
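The same component list can be written down as data. The minimal sketch below treats each block as a string and assembles the two stacks following the layer layout of the 2017 encoder-decoder design. The labels and the `build_stack` function are hypothetical, chosen for illustration; they are not part of Switch-Its or any library.

```python
# A minimal sketch: each "block" is a plain string label (hypothetical names,
# not any library's API), following the layer layout of the 2017 paper.

ENCODER_LAYER = [
    "multi-head self-attention",
    "add & norm",
    "feed forward",
    "add & norm",
]

DECODER_LAYER = [
    "masked multi-head self-attention",
    "add & norm",
    "cross-attention",
    "add & norm",
    "feed forward",
    "add & norm",
]


def build_stack(embedding, layer, num_layers, top=()):
    """Return one stack as a list, bottom block first."""
    stack = [embedding, "positional encoding"]
    for _ in range(num_layers):
        stack.extend(layer)  # the same layer layout repeats num_layers times
    stack.extend(top)  # optional blocks above the final layer
    return stack


# The 2017 paper used 6 layers per stack; 2 keeps the printout short.
encoder = build_stack("input embedding", ENCODER_LAYER, num_layers=2)
decoder = build_stack("output embedding", DECODER_LAYER, num_layers=2,
                      top=("linear", "softmax"))

for name, stack in (("encoder", encoder), ("decoder", decoder)):
    print(name)
    for position, block in enumerate(stack):
        print(f"  {position:2d}  {block}")
```

Swapping, removing, or reordering entries in these lists is the code equivalent of moving a block on the wall.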

Switch-Its blocks being placed to build the encoder and decoder stacks of the transformer architecture side by side on a magnetic surface

Build the encoder and decoder stacks

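Each stack starts from the bottom: the encoder with the input embedding and the decoder with the output embedding, each followed by positional encoding, then attention and feed-forward layers building upward. Placing the two stacks in parallel makes their symmetry immediately visible and the differences just as obvious: the decoder's self-attention is masked so each position sees only earlier outputs, and each decoder layer adds a cross-attention block that the encoder lacks.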
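Positional encoding is the block that tells the model where each token sits, since attention on its own does not account for token order. The sketch below follows the sinusoidal formula from the 2017 paper in plain Python; the function names `positional_encoding` and `add_position` are hypothetical, and many later models use learned or rotary position schemes instead.

```python
import math


def positional_encoding(num_positions, d_model):
    """Sinusoidal encoding: PE[pos][2i] = sin(pos / 10000**(2i/d_model)),
    PE[pos][2i+1] = cos(pos / 10000**(2i/d_model))."""
    table = []
    for pos in range(num_positions):
        row = []
        for dim in range(d_model):
            i = dim // 2  # dims 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_position(embeddings):
    """Add the encoding element-wise to each token's embedding vector."""
    d_model = len(embeddings[0])
    table = positional_encoding(len(embeddings), d_model)
    return [[e + p for e, p in zip(vec, pe_row)]
            for vec, pe_row in zip(embeddings, table)]


# Three toy tokens with identical 4-dimensional embeddings (made-up values).
tokens = [[0.1, 0.2, 0.3, 0.4]] * 3
for row in add_position(tokens):
    print([round(x, 3) for x in row])
```

Because the toy tokens are identical, any difference between the printed rows comes entirely from their positions.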

The cross-attention bridge block being placed between the encoder and decoder stacks of the transformer architecture diagram

Place the cross-attention bridge

The block connecting encoder to decoder is arguably the most important piece in the architecture. It's the mechanism that lets the decoder attend to the full encoded input while generating output, with queries drawn from the decoder and keys and values drawn from the encoder's output. Placing it physically between the two stacks makes the relationship concrete: this is where information passes from the encoder to the decoder.
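The cross-attention step can be sketched as single-head scaled dot-product attention, softmax(Q·K^T / sqrt(d_k))·V, with queries from the decoder and keys and values from the encoder output. This is a simplification: the learned projection matrices and multiple heads of the real mechanism are omitted, and `cross_attention` and `softmax` are hypothetical helpers, not a library API.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states, encoder_outputs):
    """Single-head scaled dot-product cross-attention (no learned projections).

    Queries are the decoder states; keys and values are both the encoder
    outputs, so every decoder position can look at every input position.
    Returns one context vector and one weight list per decoder position.
    """
    d_k = len(encoder_outputs[0])
    contexts, weights_per_query = [], []
    for query in decoder_states:
        # Similarity of this query to every encoder position, scaled by sqrt(d_k).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
                  for key in encoder_outputs]
        weights = softmax(scores)
        # Context vector: the weighted average of the encoder outputs (the values).
        context = [sum(w * value[j] for w, value in zip(weights, encoder_outputs))
                   for j in range(d_k)]
        contexts.append(context)
        weights_per_query.append(weights)
    return contexts, weights_per_query


# Toy vectors (made-up): three encoded input tokens, two decoder positions.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]
contexts, weights = cross_attention(decoder_states, encoder_outputs)
for position, row in enumerate(weights):
    print(f"decoder position {position} attends with weights",
          [round(w, 3) for w in row])
```

Each row of weights sums to 1 and shows how strongly that decoder position draws on each input token; in a trained model, the learned projections shape those scores.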

Completed transformer architecture diagram built from Switch-Its blocks showing the full encoder-decoder structure on a magnetic wall surface

The full architecture on the wall

With all components placed, the transformer is a physical object you can point at, explain, and reorganize. Any block can be pulled off to ask what happens if that component changes, which turns a static diagram into an active thinking tool for anyone reasoning about how modern AI systems are built.

Complex technical architectures are easier to reason about when they're physical: when you can hold a component, place it in relation to the others, and move it when your understanding shifts. That's the same principle behind visible thinking at work, and it connects directly to the broader case for putting ideas on the wall made in "Put the Plan on the Wall."



AI Disclosure: This blog was drafted with AI assistance but fully reviewed, edited, and approved by a human author who takes full responsibility for its accuracy.