Interfere

Interfere is an exploration of how a causal Transformer internally represents and reasons about the transition rules of a Game-of-Life-like system.
We train a deliberately overparameterized 25M-parameter Transformer on multiple variants of Conway’s Game of Life (GOL) and use mechanistic interpretability tools to probe what the model treats as first-class features.

The goal is not to solve GOL efficiently, but to study how a Transformer behaves when a simple, deterministic rule does exist—without giving it the right inductive biases.


(rough) idea

  • Train a Transformer to predict t+1 from (rules + t).
  • Encode both the update rule and the 2D grid state as tokens.
  • Probe attention patterns, linear probes, and PCA directions to understand:
    • What structure emerges?
    • What features are linearly accessible?
    • Which properties the model prioritizes internally?
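
To make "GOL-style update rules" concrete, below is a minimal numpy sketch of a parameterized life-like stepper. The 18-bit decoding shown (9 birth bits + 9 survival bits, one per live-neighbor count) and the zero-padded boundary are assumptions for illustration; the repo's actual encoding may differ.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical decoding of the 18 "physics bits": bits 0-8 say whether a dead
# cell with that many live neighbors is born; bits 9-17 say whether a live
# cell with that many live neighbors survives.
KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

def step(grid: np.ndarray, rule_bits: np.ndarray) -> np.ndarray:
    """One update of a life-like rule encoded as 18 bits (zero-padded borders)."""
    birth, survive = rule_bits[:9], rule_bits[9:]
    neighbors = convolve2d(grid, KERNEL, mode="same", boundary="fill")
    born = (grid == 0) & birth[neighbors].astype(bool)
    kept = (grid == 1) & survive[neighbors].astype(bool)
    return (born | kept).astype(np.int8)

# Standard Conway rule B3/S23: birth on 3 neighbors, survival on 2 or 3.
conway = np.zeros(18, dtype=np.int8)
conway[3] = 1
conway[9 + 2] = conway[9 + 3] = 1
```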

Data & training setup

Each sequence is laid out as:

```
<BOS>, 18 rule bits, <SEP>, H×W grid (t), <SEP>, H×W grid (t+1)
```

  • Rules: 18 “physics bits” encoding different GOL-style update rules
  • State: Binary grid, flattened row-major
  • Positioning: Standard positional encoding + 2D RoPE on Q/K (added post-hoc)
  • Training: MLM-style masking after <SEP>
  • Scale: ~0.6B tokens; trained until it fits GOL near-perfectly (deliberate overfitting)
  • Hardware: bf16 on 8× H200 GPUs
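
As a concrete sketch, a training example could be assembled and masked as below. The token IDs, the BOS token, and the policy of masking the entire t+1 grid are assumptions for illustration; the repo's actual vocabulary and masking schedule may differ.

```python
import numpy as np

# Hypothetical vocabulary: 0/1 for cell and rule-bit tokens, plus specials.
BOS, SEP, MASK = 2, 3, 4

def make_sequence(rule_bits, grid_t, grid_t1):
    """<BOS>, 18 rule bits, <SEP>, grid(t) row-major, <SEP>, grid(t+1)."""
    return np.concatenate([
        [BOS], rule_bits,
        [SEP], grid_t.ravel(),    # row-major flattening
        [SEP], grid_t1.ravel(),
    ]).astype(np.int64)

def mask_after_sep(seq, n_rule=18, hw=None):
    """MLM-style objective: mask the t+1 grid, predict the original tokens."""
    hw = hw if hw is not None else (len(seq) - n_rule - 3) // 2
    start = 1 + n_rule + 1 + hw + 1        # first token after the second <SEP>
    inputs, targets = seq.copy(), np.full_like(seq, -100)  # -100 = ignore in loss
    targets[start:] = seq[start:]
    inputs[start:] = MASK
    return inputs, targets
```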

Model

  • 8 layers, d_model=512
  • 8 attention heads / layer
  • MLP dim = 2048
  • Pre-LayerNorm, GELU (SwiGLU planned)
  • Attention restricted to the prefix (rules + grid at t) when predicting t+1
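
These dimensions land at roughly 25M parameters (≈3.15M per layer × 8). A plain PyTorch sketch of the architecture follows; it omits the 2D RoPE on Q/K and the prefix-restricted attention, and the vocabulary size is a placeholder.

```python
import torch.nn as nn

D_MODEL, N_LAYERS, N_HEADS, D_MLP = 512, 8, 8, 2048
VOCAB = 8  # placeholder: 0/1 cell tokens plus a few specials

encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    dim_feedforward=D_MLP,
    activation="gelu",
    norm_first=True,   # pre-LayerNorm
    batch_first=True,
)
model = nn.Sequential(
    nn.Embedding(VOCAB, D_MODEL),
    nn.TransformerEncoder(encoder_layer, num_layers=N_LAYERS),
    nn.Linear(D_MODEL, VOCAB),  # per-token logits for the MLM-style objective
)
```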

What we expected

  • Dead vs alive tokens should be linearly separable.
  • Local (3×3 neighborhood) attention should emerge.
  • Some layers should specialize in rule-prefix processing.
  • Linear probes + PCA should reveal which system properties are most salient.

What we observed

Local structure

  • Early layers (L0–L1) develop strong local attention.
  • Roughly 5 of 8 heads attend locally; the rest attend to distant positions.
  • Striation patterns likely arise from the 2D RoPE setup (to be ablated).
  • By late layers (L7), locality largely disappears.
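
One way to quantify this is the attention-weighted distance between query and key grid cells. A sketch, assuming per-head attention maps have been captured (e.g. via forward hooks) and that the grid(t) tokens occupy a known contiguous span:

```python
import numpy as np

def mean_attention_distance(attn, H, W, grid_start):
    """Attention-weighted Chebyshev distance between grid cells for one head.

    attn: (seq, seq) softmax attention weights for a single head.
    grid_start: sequence index of the first grid(t) token.
    A strictly 3x3-local head scores at most 1.
    """
    idx = np.arange(H * W)
    rows, cols = idx // W, idx % W
    dist = np.maximum(np.abs(rows[:, None] - rows[None, :]),
                      np.abs(cols[:, None] - cols[None, :]))
    g = attn[grid_start:grid_start + H * W, grid_start:grid_start + H * W]
    g = g / g.sum(axis=-1, keepdims=True)   # renormalize over grid keys only
    return float((g * dist).sum(axis=-1).mean())
```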

Linear probes

  • Linear probes work, but less cleanly than expected.
  • The model does not cleanly encode exact neighbor counts.
  • This aligns with GOL’s logic: thresholds matter more than exact values.
  • Later layers are not consistently more linearly accessible.
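
A typical probe of this kind fits a logistic regression from per-cell hidden states to a cell property. The arrays below (activations and live-neighbor counts) are assumed to be pre-extracted; this is a generic sketch, not the repo's probing code.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(acts, labels):
    """acts: (n_cells, d_model) hidden states at grid(t) positions, one layer;
    labels: e.g. exact live-neighbor counts (0-8); chance is ~1/9 for 9 classes."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        acts, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)
```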

PCA

  • The top PCA direction in early layers cleanly separates alive vs dead cells.
  • Unsurprising, but it confirms that cell aliveness is a first-class feature.
  • Other layers may encode more abstract structure (future work).
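
This is easy to check directly: center the activations, take the top right-singular vector, and compare the projections of alive vs dead cells. A minimal numpy sketch, again assuming pre-extracted activations and labels:

```python
import numpy as np

def pc1_alive_gap(acts, alive):
    """acts: (n_cells, d_model) early-layer activations; alive: boolean labels.
    Returns the alive-dead mean gap along the top principal component."""
    X = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # rows of vt = PCs
    proj = X @ vt[0]
    return float(proj[alive].mean() - proj[~alive].mean())
```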

Why use a Transformer?

  • GOL is trivially solvable by a CNN—but that’s not the point.
  • Transformers are routinely used as general function approximators.
  • We want to understand:
    • What structure a Transformer discovers when inductive biases are misaligned.
    • How macroscopic behavior emerges from token-level computation.

This setup stays close to the Chinchilla frontier (within this toy domain) and asks:

What does a Transformer learn when a simple rule exists, but we don’t tell it how to look?

About

Can you interfere with a model's understanding of the world?
