PromptForest: Local-First Prompt Injection Detection

PromptForest is a new local-first library built to address common shortcomings of current prompt injection detectors. It detects prompt injections and jailbreaks efficiently and attaches a measure of uncertainty to its results to avoid overconfident verdicts. This sets it apart from traditional detectors: it keeps performance competitive while producing more nuanced outputs.

Key Details

A fundamental issue with existing injection detectors is their reliance on large models such as Llama 2 8B and Qualifire Sentinel 0.6B. These models are slow, and their overconfident scores can produce false positives that undermine trust in them in production. To address these limitations, PromptForest uses a voting ensemble of three smaller, specialized models:

Llama Prompt Guard (86M): Offers the best (lowest) pre-ensemble Expected Calibration Error (ECE) in its weight class.
Vijil Dome (ModernBERT): Delivers the highest accuracy per parameter.
Custom XGBoost: Trained on embeddings for architectural diversity.

The three models are combined with weighted soft voting: each model's predicted probability is weighted, with more accurate models given greater influence, and the weighted average determines the result. This keeps the decision rule simple while maintaining high accuracy and consistency.
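
A minimal sketch of weighted soft voting, assuming each member model outputs a probability that the prompt is an injection. The member labels, weights, 0.5 threshold, and the max-minus-min "spread" used as a rough disagreement signal are all hypothetical; the source does not describe PromptForest's actual weights or how it computes its uncertainty measure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MemberVote:
    name: str            # hypothetical label for an ensemble member
    p_injection: float   # member's probability that the prompt is an injection
    weight: float        # hypothetical weight, e.g. derived from validation accuracy


def weighted_soft_vote(votes: Sequence[MemberVote], threshold: float = 0.5):
    """Return (ensemble score, flagged?, spread) for one prompt."""
    if not votes:
        raise ValueError("need at least one vote")
    total_weight = sum(v.weight for v in votes)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Soft voting: weighted average of probabilities, not a count of hard labels.
    score = sum(v.weight * v.p_injection for v in votes) / total_weight
    # Rough (hypothetical) disagreement signal: how far apart the members' probabilities are.
    probs = [v.p_injection for v in votes]
    spread = max(probs) - min(probs)
    return score, score >= threshold, spread


votes = [
    MemberVote("prompt_guard", 0.92, 0.3),
    MemberVote("vijil_dome", 0.85, 0.4),
    MemberVote("xgboost", 0.40, 0.3),
]
score, flagged, spread = weighted_soft_vote(votes)
print(f"score={score:.3f} flagged={flagged} spread={spread:.2f}")
```

With weights 0.3, 0.4 and 0.3 and member probabilities 0.92, 0.85 and 0.40, the ensemble score is 0.736 (flagged at a 0.5 threshold), while the 0.52 spread shows that one member disagrees strongly with the other two.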

In benchmarks, PromptForest has a mean latency of ~141 ms versus ~225 ms for Qualifire Sentinel v2, trading some accuracy for that speed: 90% against Sentinel's 97%. Its calibration is better, with an ECE of 0.070 versus Sentinel's 0.096 (lower is better). Throughput reaches roughly 27 prompts per second on a consumer GPU using the pfranger CLI.
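
Expected Calibration Error measures the gap between a model's stated confidence and its actual accuracy, so lower is better. Below is a sketch of one common formulation, assuming equal-width confidence bins; the article does not say which binning PromptForest's benchmark used, so its reported numbers need not match this implementation exactly.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE = sum over bins of (bin size / n) * |bin accuracy - bin mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

For a binary detector, each prompt's confidence would be max(p, 1 − p) for its injection probability p, and `correct` records whether the thresholded prediction matched the true label.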

To try it out, developers can run PromptForest on Google Colab or audit prompts with the PFRanger tool, which runs entirely locally and parallelizes work to increase speed and throughput.
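
If prompts were scored strictly one after another at ~141 ms each, throughput would top out near 1 / 0.141 ≈ 7 prompts per second, so the ~27 prompts per second figure suggests some form of concurrency or batching, consistent with PFRanger's parallelization. The sketch below shows parallel auditing with a thread pool plus a throughput measurement; `score_prompt` and `audit` are hypothetical stand-ins (a keyword check with a simulated delay), not PFRanger's real interface, which the source does not document.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def score_prompt(prompt: str) -> float:
    """Hypothetical detector stand-in; NOT the PFRanger or PromptForest API."""
    time.sleep(0.01)  # simulate per-prompt model latency
    return 1.0 if "ignore previous instructions" in prompt.lower() else 0.0


def audit(prompts: Sequence[str], workers: int = 4) -> Tuple[List[float], float]:
    """Score prompts concurrently; return scores (in input order) and prompts/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though calls run concurrently.
        scores = list(pool.map(score_prompt, prompts))
    elapsed = time.perf_counter() - start
    throughput = len(prompts) / elapsed if elapsed > 0 else float("inf")
    return scores, throughput
```

Threads help here only when the real scoring call releases Python's GIL, as GPU inference libraries typically do during heavy computation; for pure-Python, CPU-bound scoring a process pool would be the better choice.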

📖 Read the full source: r/LocalLLaMA
