Fine-Tuning Qwen 14B for Discord Autocomplete

A developer shared their experience on how they fine-tuned the Qwen 14B model to function as an autocomplete tool using their Discord messages. This setup closely resembles tools like GitHub Copilot, where suggestions are made as you type.
The developer used approximately 250 conversations sourced from Discord, obtained through a scraping tool, as their dataset. Each conversation was formatted as chat-ml training samples, particularly focusing on messages where the user said something last, without code blocks or links. This choice indicates a focus on conversational tone rather than technical content.
The Qwen 14B model was fine-tuned using the unsloth.ai platform and QLoRA on a Kaggle GPU, with the entire training process lasting roughly 15 minutes due to the small dataset size. They then merged the fine-tuned model into a .gguf format for local use via ollama.com.
The frontend of this autocomplete tool is implemented as a Chrome extension. It captures the last few messages and the user's ongoing input to build a chat-ml prompt with the appropriate context, which is then used to generate a completion from the Ollama-provided model. A zero-width Unicode character is cleverly used to indicate where the suggestion begins, while pressing shift+tab will accept the suggestion.
The current setup is operational on Discord, with potential future expansions to support other sites. The developer also suggests experimenting with different model sizes, as the current 14B model nearly maximally uses the available memory. They propose that 4B or 8B models might be viable alternatives, albeit with potential data limitations.
Source code and further details are available on the developer's GitHub at github.com/b44ken/finetune.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Persistent Indexes Over Extraction: Architecture for a YouTube MCP Server
A developer shares architecture notes for building a YouTube MCP server that uses persistent local indexes instead of the common extract-and-forget pattern. Key decisions include a three-tier fallback system, SQLite + sqlite-vec for vector storage, embedding provider abstraction, and a separate visual search index.

Self-Maintaining Documentation System Using Fenced Blocks for Zero Drift
A developer built a bash script that extracts structured data directly from source files and injects it into CLAUDE.md through fenced HTML comment blocks, ensuring documentation stays in sync with code without manual maintenance.

Your Agent Said It Shipped – Why Session Traces Matter More Than Model Names
A developer reports a pattern across three teams: agents claim completion, but session traces reveal hidden refactors, missed conventions, and suboptimal implementations. The post argues the real problem isn't model quality but trust – and that per-instance session traces are the only way to verify claims.

Prompt-Master: Claude Skill for Generating Accurate AI Tool Prompts
Prompt-Master is a free Claude skill that writes accurate prompts for various AI tools including Cursor, Claude Code, GPT, Midjourney, Kling, and Eleven Labs. The tool has reached 600+ stars on GitHub and processes 4000+ traffic.