How a Fine-Tuned Llama 3.2-1B Beat Wiz's Metrics for Secret Detection

A developer has documented their successful local fine-tuning of Llama 3.2-1B for secret detection in code, surpassing the metrics of a similar model from Wiz. The project was conducted entirely with local AI tools, avoiding proprietary APIs.

Key Results and Approach

The developer aimed to match or beat Wiz's reported results of 86% precision and 82% recall. After a few weekends of work, a fine-tuned Llama 3.2-1B model reached 88% precision and 84.4% recall together, exceeding both targets. They also benchmarked Qwen 3.5-2B and 4B models, which outperformed the 1B model at the cost of higher VRAM usage and longer inference times.
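For readers less familiar with the two metrics, here is a minimal sketch of how precision and recall are derived from confusion counts. The counts are made up for illustration and are not the developer's data.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts."""
    # Precision: of everything flagged as a secret, how much really was one.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real secrets, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only (hypothetical, not taken from the write-up).
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2%} recall={r:.2%}")
```

A detector has to do well on both: flagging everything maximizes recall but ruins precision, and flagging almost nothing does the opposite.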

Dataset and Training Process

The work used only publicly available data. Because that data alone was insufficient, procedural generation was used to augment and improve the dataset. All labeling was done locally using the Qwen3-Coder-Next model. A key training objective was to have the models output structured JSON. Before training, both the Llama and Qwen models scored 0% on schema compliance; after training, compliance rose to 98-100%.
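Schema compliance of the kind mentioned above could be measured roughly as follows. This is a sketch under stated assumptions: this summary does not give the exact JSON schema, so the field names `has_secret` and `secret_type` are hypothetical stand-ins.

```python
import json

# Hypothetical schema: the real field names are not given in this summary.
REQUIRED_FIELDS = {"has_secret": bool, "secret_type": str}


def is_schema_compliant(raw_output: str) -> bool:
    """True if the model output parses as a JSON object that contains
    every required field with the expected type."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    return all(
        key in parsed and isinstance(parsed[key], expected)
        for key, expected in REQUIRED_FIELDS.items()
    )


def compliance_rate(outputs: list[str]) -> float:
    """Fraction of model outputs that satisfy the schema."""
    if not outputs:
        return 0.0
    return sum(is_schema_compliant(o) for o in outputs) / len(outputs)
```

An untrained model that wraps its answer in free-form prose fails `json.loads` outright, which is one plausible way a baseline could score 0%.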

Challenges and Learnings

The developer encountered several issues during the process:

The dataset initially included a high-entropy class that was detrimental to training; it was identified and removed.
4,500 of the 'negative' samples in the dataset turned out to contain real-world passwords, meaning the model was being trained to ignore secrets. Fixing this improved recall on passwords.
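Mislabeled negatives like those described above can sometimes be surfaced with a simple audit pass. The sketch below is not the developer's method: the regex, the `(text, label)` sample format, and the convention that label `0` means "no secret" are all assumptions made for illustration.

```python
import re

# Assumed heuristic: a password-like variable name assigned a quoted
# string of six or more characters. Real audits would need broader rules.
PASSWORD_ASSIGNMENT = re.compile(
    r"""(?:password|passwd|pwd)\w*   # variable or key name
        \s*[:=]\s*                   # assignment or key/value separator
        ["']([^"']{6,})["']          # quoted value, at least 6 characters
    """,
    re.IGNORECASE | re.VERBOSE,
)


def suspicious_negatives(samples: list[tuple[str, int]]) -> list[int]:
    """Return indices of samples labeled 0 (assumed: no secret) whose
    text nevertheless looks like a hardcoded password assignment."""
    flagged = []
    for index, (text, label) in enumerate(samples):
        if label == 0 and PASSWORD_ASSIGNMENT.search(text):
            flagged.append(index)
    return flagged
```

Flagged samples would still need manual or model-assisted review, since a pattern like this also matches placeholders and test fixtures.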

The developer has published a full technical write-up with training stats, examples, and a step-by-step breakdown of the process.

📖 Read the full source: r/LocalLLaMA