Qwen2-0.5B Fine-Tuned for Local Task Automation with llama.cpp

A developer has fine-tuned Qwen2-0.5B for task automation, producing a model that runs entirely locally on CPU, with no GPU or cloud APIs required. The project, named ACE, is available on GitHub.
What It Does
- Takes natural language tasks (e.g., "copy logs to backup")
- Detects task type: atomic, repetitive, or clarification
- Generates execution plans consisting of CLI commands and hotkeys
- Runs entirely locally on CPU (no GPU, no cloud APIs)
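
The source does not describe the format of the generated plans. As a rough illustration only, the sketch below assumes the model emits JSON with a `task_type` field and a list of `steps`; the field names, the `Step` and `Plan` classes, and the `parse_plan` helper are all hypothetical and are not taken from the ACE project.

```python
import json
from dataclasses import dataclass, field

# The three task types named in the post.
TASK_TYPES = {"atomic", "repetitive", "clarification"}
# Hypothetical labels for the two kinds of plan step (CLI command or hotkey).
STEP_KINDS = {"command", "hotkey"}


@dataclass
class Step:
    kind: str   # "command" or "hotkey" (assumed labels)
    value: str  # e.g. a shell command line or a key combination


@dataclass
class Plan:
    task_type: str
    steps: list = field(default_factory=list)


def parse_plan(raw: str) -> Plan:
    """Parse and validate a model response under the assumed JSON schema."""
    data = json.loads(raw)
    task_type = data["task_type"]
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    steps = [Step(kind=s["kind"], value=s["value"]) for s in data.get("steps", [])]
    for step in steps:
        if step.kind not in STEP_KINDS:
            raise ValueError(f"unknown step kind: {step.kind!r}")
    return Plan(task_type=task_type, steps=steps)


# Hypothetical response for the task "copy logs to backup".
example = (
    '{"task_type": "atomic", "steps": '
    '[{"kind": "command", "value": "cp /var/log/app.log /backup/app.log"}]}'
)
plan = parse_plan(example)
print(plan.task_type, len(plan.steps))
```

Validating the response before executing anything is a design choice worth keeping even if the real schema differs, since a 0.5B model can produce malformed output.
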
Technical Details
- Base model: Qwen2-0.5B
- Training: LoRA fine-tuning on approximately 1000 custom task examples
- Quantization: GGUF Q4_K_M format (300MB file size)
- Inference: llama.cpp
- Inference time: 3-10 seconds on i3/i5 processors
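
For readers who want to try a similar setup, here is a minimal sketch of building a `llama-cli` invocation for a quantized GGUF file. The model filename, prompt, and parameter values are assumptions for illustration; the post does not say how ACE invokes llama.cpp. The block only builds and prints the command, it does not run it.

```python
import shlex


def build_llama_cli_args(model_path, prompt, max_tokens=128, threads=4, temperature=0.2):
    """Build an argument list for llama.cpp's llama-cli binary.

    -m: model file, -p: prompt, -n: max tokens to generate,
    -t: CPU threads, --temp: sampling temperature.
    """
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(max_tokens),
        "-t", str(threads),
        "--temp", str(temperature),
    ]


# "ace-q4_k_m.gguf" is a hypothetical filename, not the project's actual artifact.
args = build_llama_cli_args("ace-q4_k_m.gguf", "copy logs to backup")
print(shlex.join(args))
```

The list can be passed to `subprocess.run(args, capture_output=True, text=True)` once the binary and model file are in place. A low temperature is chosen here on the assumption that command generation benefits from near-deterministic output.
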
Main Challenges During Training
- Data quality: The dataset had to be regenerated 2-3 times because of low-quality examples
- Overfitting: Stabilizing validation loss took multiple iterations
- EOS token handling: The model would not stop generating until the tokenizer config was fixed
- GGUF conversion: Stable outputs required BF16 dtype plus imatrix quantization
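
The GGUF conversion step can be sketched as three llama.cpp commands: convert the Hugging Face checkpoint to a BF16 GGUF, compute an importance matrix from calibration text, then quantize to Q4_K_M using that matrix. The paths, the calibration file, and the `gguf_pipeline` helper are assumptions; the post does not list the exact commands the developer ran. The block only assembles the command lists.

```python
def gguf_pipeline(hf_dir, calib_text, out_prefix):
    """Return the three command lines for a BF16 -> imatrix -> Q4_K_M conversion."""
    bf16 = f"{out_prefix}-bf16.gguf"
    imatrix = f"{out_prefix}.imatrix"
    quant = f"{out_prefix}-Q4_K_M.gguf"
    return [
        # 1. Convert the Hugging Face checkpoint to GGUF, keeping BF16 weights.
        ["python", "convert_hf_to_gguf.py", hf_dir, "--outtype", "bf16", "--outfile", bf16],
        # 2. Compute an importance matrix from calibration text.
        ["llama-imatrix", "-m", bf16, "-f", calib_text, "-o", imatrix],
        # 3. Quantize to Q4_K_M, guided by the importance matrix.
        ["llama-quantize", "--imatrix", imatrix, bf16, quant, "Q4_K_M"],
    ]


# All three arguments are hypothetical paths.
for cmd in gguf_pipeline("./ace-merged", "calibration.txt", "ace"):
    print(" ".join(cmd))
```

Calibration text that resembles the fine-tuning tasks is a reasonable default for the importance matrix, though the post does not say what the developer used.
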
Limitations (v0.1)
- Requires full file paths (no smart file search yet)
- CPU inference only (slower on older hardware)
- Basic execution (no visual understanding)
Performance Benchmarks
- i5 (2018+) with SSD: 3-5 seconds
- i3 (2015+) with SSD: 5-10 seconds
- Older hardware (Pentium + HDD): 30-90 seconds
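
Since the developer is asking for timings on other hardware, a small harness like the one below could make reports comparable. It is a sketch under assumptions: `time_runs` is a hypothetical helper, and `fake_inference` is a stand-in that should be replaced with a real call into the model.

```python
import statistics
import time


def time_runs(fn, runs=5):
    """Call fn() `runs` times and summarize wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_inference():
    # Placeholder workload; swap in the actual inference call.
    time.sleep(0.01)


stats = time_runs(fake_inference, runs=3)
print({name: round(value, 3) for name, value in stats.items()})
```

Reporting the median alongside the minimum and maximum helps separate first-run model loading (which is slow on an HDD) from steady-state inference.
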
The developer is seeking feedback on performance across different hardware, edge cases that break the model, and feature requests for v0.2.
📖 Read the full source: r/LocalLLaMA