State Flow Machine: Non-Transformer Architecture Maintains 62% Accuracy on Long Sequences Where Transformers Drop to 2%

A developer has built State Flow Machine (SFM), a non-transformer architecture designed for tasks that require tracking state across long sequences. The model runs on a single Huawei Ascend 910 ProA NPU and targets a known weakness of transformers: simulating a process step by step once sequences grow longer than those seen in training.
Architecture Details
Instead of attention heads, SFM uses a bank of explicit memory slots (small fixed-size vectors). At each token, a gating mechanism decides which slots to update and how: the model reads from the slots, computes an update, and writes it back, functioning like a tiny differentiable register file. The approach is related to DeltaNet, linear attention, and recurrent or state-space models such as Mamba and RWKV, but it is more explicit: slots are directly addressable and updated via learned gates rather than held in an implicit recurrent state.
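The read/compute/write cycle described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not SFM's actual implementation: the class name `SlotMemory`, the per-slot address keys, the softmax read, the tanh candidate, and the per-slot sigmoid write gates are all hypothetical choices, and the weights are random rather than learned.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SlotMemory:
    """Hypothetical gated slot memory: num_slots vectors of size dim.

    Weights are random here; a trained model would learn them.
    """

    def __init__(self, num_slots, dim, seed=0):
        rng = random.Random(seed)
        self.slots = [[0.0] * dim for _ in range(num_slots)]
        # One address key per slot, so each slot is directly addressable.
        self.keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]
        # Maps (token + read vector) to a candidate value to write.
        self.w_update = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

    def step(self, x):
        scores = [dot(k, x) for k in self.keys]

        # Read: softmax-weighted mix of the current slot contents.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        read = [sum(w * slot[d] for w, slot in zip(weights, self.slots))
                for d in range(len(x))]

        # Compute: candidate value from the token and what was read.
        mixed = [a + b for a, b in zip(x, read)]
        cand = [math.tanh(dot(row, mixed)) for row in self.w_update]

        # Write: each slot's own sigmoid gate blends old content with the candidate.
        for i, slot in enumerate(self.slots):
            g = sigmoid(scores[i])
            self.slots[i] = [(1.0 - g) * s + g * c for s, c in zip(slot, cand)]
        return read


mem = SlotMemory(num_slots=4, dim=3, seed=42)
rng = random.Random(1)
for _ in range(5):
    mem.step([rng.uniform(-1.0, 1.0) for _ in range(3)])
```

Because each gate depends on how well the token matches that slot's key, different tokens write mostly to different slots, and a slot whose gate stays near 0 keeps its value across many steps.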
Benchmark Setup
The synthetic program-state-tracking benchmark uses programs like x = 42; x += 17; x -= 8; x *= 2; ... where the model must predict the final value of x (an integer from 0 to 100, framed as 101-class classification).
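A small reference interpreter makes the task concrete. This is a sketch under assumptions the source does not state: that integer division is written `//=`, that operands are non-negative integers, and that every intermediate result is wrapped modulo 101 to stay within the 0–100 label range. The function name `run_program` is hypothetical.

```python
import re

MOD = 101  # assumption: values wrapped into 0..100 (the 101 classes)

OPS = {
    "=": lambda x, v: v,
    "+=": lambda x, v: x + v,
    "-=": lambda x, v: x - v,
    "*=": lambda x, v: x * v,
    "//=": lambda x, v: x // v,  # integer divide
    "%=": lambda x, v: x % v,    # modulo
}

STMT = re.compile(r"^\s*x\s*(=|\+=|-=|\*=|//=|%=)\s*(\d+)\s*$")


def run_program(src):
    """Execute 'x = 42; x += 17; ...' and return the final x in 0..100."""
    x = 0
    for stmt in src.split(";"):
        if not stmt.strip():
            continue
        m = STMT.match(stmt)
        if m is None:
            raise ValueError(f"unparseable statement: {stmt!r}")
        op, value = m.group(1), int(m.group(2))
        x = OPS[op](x, value) % MOD
    return x


final = run_program("x = 42; x += 17; x -= 8; x *= 2")  # 102 wraps to 1
```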
- Training data: 10,000 programs with 10–27 operations, hard difficulty (all ops: add, subtract, multiply, integer divide, modulo, set), seed 42
- Validation: 1,000 programs, same distribution
- Evaluation: test at 1× (in-distribution), 2×, 4×, 8×, 16×, and 32× the training program length
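The length-extrapolation splits can be sketched as a generator that scales a base length by each multiplier. Assumptions, none confirmed by the source: a base length of 10 operations (matching the 1× row of the results), the initial `x = …` assignment not counted as an operation, the operand ranges shown, wrapping modulo 101, and seed 42 reused for evaluation. `make_program` and `make_eval_sets` are hypothetical names.

```python
import random

MOD = 101  # assumption: values wrapped into 0..100
OP_NAMES = ["+=", "-=", "*=", "//=", "%=", "="]  # add, sub, mul, int-div, mod, set


def apply_op(op, x, v):
    if op == "+=":
        y = x + v
    elif op == "-=":
        y = x - v
    elif op == "*=":
        y = x * v
    elif op == "//=":
        y = x // v
    elif op == "%=":
        y = x % v
    else:  # "=" (set)
        y = v
    return y % MOD


def make_program(num_ops, rng):
    """Return (source, final value) for a random program with num_ops operations."""
    x = rng.randint(0, 100)
    stmts = [f"x = {x}"]
    for _ in range(num_ops):
        op = rng.choice(OP_NAMES)
        # Hypothetical operand ranges; small nonzero operands for *, //, %.
        v = rng.randint(1, 9) if op in ("*=", "//=", "%=") else rng.randint(0, 100)
        x = apply_op(op, x, v)
        stmts.append(f"x {op} {v}")
    return "; ".join(stmts), x


def make_eval_sets(base_ops=10, multipliers=(1, 2, 4, 8, 16, 32), n=1000, seed=42):
    """One split per length multiplier: base_ops * k operations per program."""
    rng = random.Random(seed)
    return {k: [make_program(base_ops * k, rng) for _ in range(n)] for k in multipliers}


sets = make_eval_sets(n=5)
src, label = sets[4][0]  # a 40-operation program and its final value
```

Labels are computed while generating, so no separate interpreter pass is needed to score predictions.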
Results
Exact Match Accuracy:
- 1× (10 ops): State Slots 99.9%, Transformer-Fair 100.0%, Transformer-Large 100.0%
- 2× (20 ops): State Slots 92.9%, Transformer-Fair 99.0%, Transformer-Large 99.5%
- 4× (40 ops): State Slots 62.0%, Transformer-Fair 1.9%, Transformer-Large 3.1%
- 8× (80 ops): State Slots 35.3%, Transformer-Fair 1.3%, Transformer-Large 1.0%
- 16× (160 ops): State Slots 5.1%, Transformer-Fair 0.9%, Transformer-Large 0.7%
- 32× (320 ops): State Slots 5.0%, Transformer-Fair 1.0%, Transformer-Large 0.8%
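With the task framed as 101-class classification, exact match plausibly means the highest-scoring class equals the true final value. A minimal sketch of that metric, assuming one logit per value 0–100; the function names are hypothetical.

```python
def predict_class(logits):
    """Index of the largest logit, i.e. the predicted value in 0..100."""
    return max(range(len(logits)), key=lambda i: logits[i])


def exact_match(batch_logits, targets):
    """Fraction of programs whose predicted class equals the true final value."""
    hits = sum(predict_class(row) == t for row, t in zip(batch_logits, targets))
    return hits / len(targets)


logits = [0.0] * 101
logits[37] = 5.0
pred = predict_class(logits)  # 37
```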
Generalization ratio (accuracy retention):
- State Slots: 4×/1× = 0.62×, 8×/1× = 0.35×
- Transformer-Fair: 4×/1× = 0.02×, 8×/1× = 0.01×
- Transformer-Large: 4×/1× = 0.03×, 8×/1× = 0.01×
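The retention ratios are simply extrapolation accuracy divided by in-distribution accuracy. Recomputing them from the reported exact-match numbers reproduces the list above; `ACCURACY` and `retention` are illustrative names.

```python
# Exact-match accuracy (%) as reported above, keyed by length multiplier.
ACCURACY = {
    "State Slots": {1: 99.9, 4: 62.0, 8: 35.3},
    "Transformer-Fair": {1: 100.0, 4: 1.9, 8: 1.3},
    "Transformer-Large": {1: 100.0, 4: 3.1, 8: 1.0},
}


def retention(acc, k):
    """Accuracy at k times the training length relative to 1x."""
    return acc[k] / acc[1]


for model, acc in ACCURACY.items():
    print(f"{model}: 4x/1x = {retention(acc, 4):.2f}, 8x/1x = {retention(acc, 8):.2f}")
```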
Mean Absolute Error at extrapolation lengths (scale 0–100):
- 4×: State Slots 14.03, Transformer-Fair 40.33, Transformer-Large 36.76
- 8×: State Slots 26.73, Transformer-Fair 41.71, Transformer-Large 41.19
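At 4× and beyond the transformers are essentially guessing: an MAE of roughly 40 on a 0–100 scale is about what blind guessing produces (for reference, a uniformly random guess against a uniformly distributed target has an expected error of about 33.7). State Slots, by contrast, continues to make meaningful predictions.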
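To put the MAE figures in context, the chance-level errors can be computed exactly, assuming the true final values were uniformly distributed over 0–100. The source does not report the target distribution, and integer divide and modulo may well skew it toward small values, so treat these as reference points rather than the benchmark's actual chance baseline.

```python
def mae(preds, targets):
    """Mean absolute error between predicted and true values."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


VALUES = range(101)  # the 101 classes, 0..100

# Expected error of a uniformly random guess against a uniform target.
uniform_guess = sum(abs(g - t) for g in VALUES for t in VALUES) / 101 ** 2

# Error of always guessing the midpoint 50 against a uniform target.
constant_guess = mae([50] * 101, list(VALUES))

print(f"{uniform_guess:.2f} {constant_guess:.2f}")  # 33.66 25.25
```

Under this assumption, the transformers' MAE of about 40 is worse than both baselines, while State Slots at 4× (14.03) is well below them.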
Model Parameters
State Slots uses 961K parameters, compared to Transformer-Fair (443K) and Transformer-Large (2.2M).
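Relating size to the 4× result shows that parameter count alone does not account for the gap: Transformer-Large is about 5× the size of Transformer-Fair but moves 4× accuracy only from 1.9% to 3.1%. The counts below are the rounded figures reported above, and with only three models this is an observation, not a scaling result.

```python
# Parameter counts and 4x exact-match accuracy (%) as reported above.
PARAMS = {"State Slots": 961_000, "Transformer-Fair": 443_000, "Transformer-Large": 2_200_000}
ACC_4X = {"State Slots": 62.0, "Transformer-Fair": 1.9, "Transformer-Large": 3.1}

base = PARAMS["Transformer-Fair"]
for name, n in PARAMS.items():
    print(f"{name}: {n / base:.2f}x the parameters of Transformer-Fair, {ACC_4X[name]}% at 4x")
```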
📖 Read the full source: r/LocalLLaMA