Chapper: Native iOS Client for LM Studio, Ollama, and OpenAI-Compatible Local Models

Chapper is a native SwiftUI iOS client for connecting to local AI models running on LM Studio, Ollama, and any OpenAI-compatible server. The app runs entirely on-device with no cloud requirements, web views, or mandatory accounts.
Core Features
- Real-time token streaming with live inference speed display
- Full sampling controls: temperature, top-p, top-k, min-p, TFS-Z, repeat/presence/frequency penalty
- Structured output/JSON schema mode
- Markdown rendering with syntax-highlighted code blocks
Reasoning Model Support
- Collapsible thought process panel inline above each response
- Works with Qwen3, DeepSeek-R1, and any model using <think> tags
- Custom <think> tag parser for reasoning model output
Model Management
- In-app model management: browse, load, configure context length
- Flash attention support
- GPU KV-cache offload
Conversation Features
- Personas with persistent system prompts per chat
- Full-text search across all conversations + pinned chats
- Memory system that injects long-term context automatically
- Scratchpad for working notes while chatting
Output Options
- Export in 7 formats: PDF, HTML, Markdown, JSON, CSV, XML, TXT
- TTS in three modes: native iOS voices, local on-device Kokoro model (experimental), or custom TTS server
- Background playback support
Technical Implementation
- Native async streaming over SSE
- MCP tool integration for web search, file access, URL fetching
- iCloud sync (optional)
- On-device analytics dashboard
- 12 language support
- Custom haptics with toggle option
Pricing & Availability
Free + Pro model with one-time purchase, no subscription. Core chat is free. Pro unlocks advanced sampling, unlimited history, all export formats, custom icons, and unlimited personas. Works on iPhone and iPad.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Claude Code: How to Connect Your AI-Built Frontend to a Real Backend
Claude Code builds polished frontends but often uses hardcoded data. Here are four ways to connect it to real backends: raw APIs, SDKs, CLIs, and MCP.

Single-page chatbot interface for locally running Gemma 4 26B A4B
A developer built a single HTML page chatbot that connects to Gemma 4 26B A4B running locally with 32K context window at 50-65 tokens/second, sharded between a 7900 XT and 3060 Ti GPU. The interface includes full streaming, Markdown rendering, and parameter controls.

cc+ Desktop App for Claude Code: Multi-Session Management and Fleet Orchestration
cc+ is an open-source desktop application for Claude Code built on the Claude Agent SDK, available for macOS and Linux. It provides multi-session tabs, live activity tree visualization, security scoring, workflow enforcement, and fleet orchestration capabilities.

LLM Circuit Finder: Duplicate 3 layers to boost reasoning without training
A new toolkit finds 'reasoning circuits' in transformer models - contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicating these blocks (layers 12-14 in Devstral-24B) improves logical deduction from 0.22 to 0.76 on BBH benchmarks with no weight changes or training.