ANE Optimization Through Phone-Steered AI Experiments Shows Kernel Fusion Benefits

A developer conducted 55 optimization experiments on the autoresearch-ane fork, primarily steering the process from their phone on a Saturday. The work focused on Apple Neural Engine (ANE) performance improvements through kernel optimization and architectural changes.
Performance Improvements
The experiments yielded measurable gains across several metrics:
- Validation loss decreased from 3.75 (itself a regression from a previously optimized 3.2) to 2.49
- Step time improved from 176ms to 96ms
- ANE utilization increased from 3.6% to 6.5%
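To put the figures above on a common scale, here is a minimal sketch that turns each before/after pair into a relative change. Only the six numbers come from the post; the dictionary keys and the helper name `relative_change` are illustrative assumptions.

```python
# Before/after pairs as reported in the post. Key names are illustrative.
REPORTED = {
    "validation_loss": (3.75, 2.49),
    "step_time_ms": (176.0, 96.0),
    "ane_utilization_pct": (3.6, 6.5),
}


def relative_change(before: float, after: float) -> float:
    """Signed fractional change going from `before` to `after`."""
    return (after - before) / before


for name, (before, after) in REPORTED.items():
    change = relative_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.1%})")

# Per-step speedup is the ratio of the old step time to the new one.
before_ms, after_ms = REPORTED["step_time_ms"]
step_speedup = before_ms / after_ms
print(f"step speedup: {step_speedup:.2f}x")
```

On these numbers the validation loss drops by roughly a third, the step time by about 45% (close to 1.8x faster per step), and utilization rises to about 1.8x its starting value, though it remains in the single digits in absolute terms.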
Key Technical Change
The most significant improvement came from kernel fusion: "Fusing 3 ANE kernels into 1 mega-kernel eliminated 12 IOSurface round-trips per step - that single change beat every hyperparameter tweak combined." This architectural optimization proved more impactful than parameter adjustments.
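The general idea behind kernel fusion can be sketched with a toy model. This is not the fork's code and does not call any ANE or IOSurface API: the `DispatchCounter` class, the three stand-in kernels, and the assumption of one buffer round-trip per dispatch are all hypothetical. It shows only that chaining separately dispatched kernels pays a hand-off per kernel, while a fused kernel pays it once for the same result. It does not attempt to reproduce the figure of 12 quoted above.

```python
from typing import Callable, List

Kernel = Callable[[float], float]


class DispatchCounter:
    """Hypothetical stand-in for the host/accelerator boundary.

    Assumption: every dispatch costs one buffer round-trip.
    """

    def __init__(self) -> None:
        self.round_trips = 0

    def dispatch(self, kernel: Kernel, x: float) -> float:
        self.round_trips += 1
        return kernel(x)


def fuse(kernels: List[Kernel]) -> Kernel:
    """Compose kernels so intermediates never cross the boundary."""

    def fused(x: float) -> float:
        for kernel in kernels:
            x = kernel(x)
        return x

    return fused


def run_unfused(kernels: List[Kernel], x: float, counter: DispatchCounter) -> float:
    # Each kernel is dispatched on its own, so each one pays a round-trip.
    for kernel in kernels:
        x = counter.dispatch(kernel, x)
    return x


def run_fused(kernels: List[Kernel], x: float, counter: DispatchCounter) -> float:
    # One dispatch covers the whole chain.
    return counter.dispatch(fuse(kernels), x)


# Three illustrative kernels; the real ones are not described in the post.
kernels: List[Kernel] = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: v * v]

unfused_counter = DispatchCounter()
fused_counter = DispatchCounter()
unfused_result = run_unfused(kernels, 3.0, unfused_counter)
fused_result = run_fused(kernels, 3.0, fused_counter)

print(unfused_result, unfused_counter.round_trips)
print(fused_result, fused_counter.round_trips)
```

Both paths compute the same value; only the number of boundary crossings differs. On real hardware the saving depends on how many buffers each kernel reads and writes, which this sketch does not model.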
Workflow Details
The developer used an unconventional approach:
- Ran experiments remotely, steering from their phone in brief moments
- Used Claude for brainstorming and pulling insights from public sources listed in the repository README
- Approached the problem with "short attention and minimal token input," speculating on directions rather than dictating precise steps
- Completed 55 experiments with "several cases of actual typing"
- Worked only in a non-destructive mode because of permission constraints ("no rm -rf /* and such")
Main Learning
Beyond the technical improvements, the developer noted: "Main learning isn't the improvement itself. It's that short attention and minimal token input - brainstorming direction, not dictating steps - can produce real measurable gains on a hard systems problem."
The work ran on the developer's laptop. The developer also flagged an unresolved discrepancy in the acceptance-rate figures for the experiments: "55vs45 not quite mathing."
📖 Read the full source: r/LocalLLaMA