Local Fine-Tuning of Llama 3.2-1B for Secret Detection Surpasses Wiz's Model

A developer has documented a local fine-tune of Llama 3.2-1B for detecting secrets in code that surpasses the reported metrics of a comparable model from Wiz. The project ran entirely on local AI tools, with no proprietary APIs.
Key Results and Approach
The developer set out to match or beat Wiz's reported results of 86% precision and 82% recall. After a few weekends of work, the fine-tuned Llama 3.2-1B model reached 88% precision and 84.4% recall, exceeding both targets at once. They also benchmarked Qwen 3.5-2B and 4B models, which outperformed the 1B model at the cost of higher VRAM usage and longer inference times.
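For readers less familiar with the two metrics, here is a minimal sketch of how precision and recall are computed from detection counts. The function name and the counts in the usage note are illustrative assumptions; the write-up does not publish its confusion-matrix counts.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw detection counts.

    tp: secrets the model flagged correctly
    fp: non-secrets the model flagged by mistake
    fn: real secrets the model missed
    """
    # Precision: of everything flagged, how much was a real secret?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real secrets, how many were flagged?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

As a made-up example, `precision_recall(tp=844, fp=115, fn=156)` gives roughly 0.88 precision and 0.844 recall. Raising both numbers together matters because a model can trivially trade one for the other by flagging more or fewer strings.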
Dataset and Training Process
The work started from publicly available data only. Because that data was insufficient on its own, procedural generation was used to augment and improve the dataset. All labeling was done locally with the Qwen3-Coder-Next model. A key training objective was to have the models output structured JSON. Before fine-tuning, both base models (Llama and Qwen) scored 0% on schema compliance; after training, compliance rose to 98-100%.
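A schema-compliance score like the one above can be measured by parsing each model output and checking its shape. The sketch below assumes a hypothetical schema (`has_secret` as a boolean plus a `secrets` list of `type`/`value` string pairs); the source does not describe the actual schema, so treat the field names as placeholders.

```python
import json


def is_compliant(raw: str) -> bool:
    """Check one model output against the assumed (hypothetical) schema."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        # Free-form text or truncated JSON counts as non-compliant.
        return False
    if not isinstance(obj, dict) or set(obj) != {"has_secret", "secrets"}:
        return False
    if not isinstance(obj["has_secret"], bool):
        return False
    if not isinstance(obj["secrets"], list):
        return False
    # Every finding must be an object with exactly two string fields.
    return all(
        isinstance(s, dict)
        and set(s) == {"type", "value"}
        and all(isinstance(s[k], str) for k in ("type", "value"))
        for s in obj["secrets"]
    )


def compliance_rate(outputs: list[str]) -> float:
    """Fraction of outputs that parse and match the assumed schema."""
    if not outputs:
        return 0.0
    return sum(is_compliant(o) for o in outputs) / len(outputs)
```

A base model that answers in prose, or wraps the JSON in extra commentary, fails the `json.loads` step outright, which is one plausible way an untuned model ends up at 0%.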
Challenges and Learnings
The developer ran into several issues along the way:
- The dataset initially included a high-entropy class that hurt training; it was identified and removed.
- About 4,500 of the 'negative' samples in the dataset turned out to contain real-world passwords, so the model was being trained to ignore secrets. Fixing these labels improved recall on passwords.
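Mislabeled negatives like those in the second item can be surfaced with a simple audit pass before training. The sketch below is not the developer's method: the regex, the placeholder list, and the `text`/`label` sample layout are all assumptions chosen for illustration, and a real audit would need broader patterns and manual review.

```python
import re

# Heuristic (assumed): a password-like key assigned a quoted, non-trivial value.
PASSWORD_ASSIGNMENT = re.compile(
    r"""(?:password|passwd|pwd)\w*\s*[:=]\s*["']([^"'\s]{6,})["']""",
    re.IGNORECASE,
)

# Obvious dummy values that should not count as leaked secrets (assumed list).
PLACEHOLDERS = {"changeme", "password", "example", "xxxxxx"}


def suspicious_negatives(samples: list[dict]) -> list[int]:
    """Return indices of 'negative' samples that look like they hold a password."""
    flagged = []
    for i, sample in enumerate(samples):
        if sample["label"] != "negative":
            continue
        match = PASSWORD_ASSIGNMENT.search(sample["text"])
        if match and match.group(1).lower() not in PLACEHOLDERS:
            flagged.append(i)
    return flagged
```

Flagged samples would then be relabeled or dropped by hand; the heuristic only narrows down where to look.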
The developer has published a full technical write-up with training stats, examples, and a step-by-step breakdown of the process.
📖 Read the full source: r/LocalLLaMA