Apfel: Free CLI Tool to Access Apple's On-Device LLM on macOS

What Apfel Does
Apfel is a free tool that gives you direct access to the LLM Apple ships with macOS 26 (Tahoe) on Apple Silicon Macs. Apple normally restricts this model to Siri and system features, but Apfel exposes it through three interfaces: a UNIX command-line tool, an OpenAI-compatible HTTP server, and an interactive chat.
Technical Details
The tool is built in Swift 6.3 and wraps Apple's FoundationModels framework, specifically the LanguageModelSession API. All inference runs on the Neural Engine and GPU - no network calls, no cloud, and nothing leaves your machine.
Key specifications from the source:
- Version: v0.6.13
- Requirements: macOS 26+ (Tahoe), Apple Silicon, Apple Intelligence enabled
- Context window: 4,096 tokens (input and output combined)
- License: MIT
- Installation:
brew install Arthur-Ficial/tap/apfel
Three Usage Modes
1. CLI Tool
Pipe-friendly UNIX tool with stdin/stdout support, JSON output, file attachments, and proper exit codes:
$ apfel "What is the capital of Austria?"
The capital of Austria is Vienna.
$ apfel -o json "Translate to German: hello" | jq .content
"Hallo"
2. OpenAI-Compatible Server
Drop-in replacement at localhost:11434 that works with any OpenAI SDK:
$ apfel --serve
Server running on http://127.0.0.1:11434
any OpenAI client works
$ curl localhost:11434/v1/chat/completions
Supports streaming (SSE), tool calling, CORS, response formats, temperature, max_tokens, and seed parameters.
3. Interactive Chat
Multi-turn conversations with automatic context management and five trimming strategies:
$ apfel --chat -s "You are a coding assistant"
Chat started. Type /quit to exit.
> How do I reverse a list in Python?What Apfel Adds Over Apple's Raw API
- Proper exit codes for shell scripting
- JSON output format
- File attachment support
- Five context trimming strategies for the 4,096-token window
- Real token counting via the SDK
- Conversion of OpenAI tool schemas to Apple's native Transcript.ToolDefinition format
Included Power Tools
The demo/ folder includes several shell scripts:
cmd: Natural language to shell command conversiononeliner: Generates pipe chains from plain Englishmac-narrator: Narrates system activity like a nature documentaryexplain: Explains commands, error messages, or code snippetswtd: Instant project orientation for any codebasegitsum: Summarizes recent git commits
Who This Is For
Developers who want to experiment with Apple's on-device LLM without writing Swift applications or paying for cloud API calls.
📖 Read the full source: HN AI Agents
👀 See Also

Head-to-head code review experiment compares three AI tools on same codebase
A video experiment tests Codex, Claude Code, and Claude Code with Sextant on identical code review tasks, with Codex verifying findings and judging which report is more valuable. The focus is on how workflow and structure affect what AI notices and prioritizes.

Bespoke AI v0.8.1: VS Code Autocomplete Extension for Code and Text
Bespoke AI v0.8.1 is a VS Code extension providing autocomplete for both code and text, leveraging Claude Code subscriptions via Anthropic's Agent SDK to avoid API charges while supporting multiple backends including Ollama.
Claude Prototypes Real Estate Analysis App in 3 Hours Using Live Zillow Data via clawhub
A developer used Claude with the zillow-full clawhub tool to build a rental cash flow analysis app — pulling live Zillow API data, prototyping the UI around real JSON responses, and delivering a working prototype in one afternoon.

W2A — an open protocol for agent sensors: giving local agents real-time perception
W2A (World2Agent) is an open protocol standardizing the perception layer for AI agents — self-hostable, TS SDK, Apache 2.0. It lets agents receive real-time signals from sensors without one-off scripts.