Smaller models leak privacy under adversarial probing
POLAR-Bench exposes that 1–30B open-weight models running as on-device agents leak over 50% of protected attributes, while frontier models withhold 99%+—forcing a choice between privacy and local inference.
May 27, 2026
Summary
If you're deploying LLM agents with user data locally or via private inference, your threat model now has a measured failure point. Smaller models that fit on-device consistently fail intent-following under adversarial pressure, making privacy-sensitive workloads risky at that scale.
Why it matters
If you're deploying LLM agents with user data locally or via private inference, your threat model now has a measured failure point. Smaller models that fit on-device consistently fail intent-following under adversarial pressure, making privacy-sensitive workloads risky at that scale.
Implementation verdict
POLAR-Bench is a diagnostic tool, not a solution. It localizes privacy breakdown across model size and attack strategy but doesn't replace your privacy architecture—it audits it. Worth running against your candidate models before shipping agents with PII access. Requires adapting their 5×5 diagnostic surface to your specific privacy policies and domains.
Sources
- 1.current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1--30B range, the class users most commonly run as their own trusted agent on-device or via private inference, score notably worse, with the weakest leaking over half
- 2.Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set-membership
- 3.providing a foothold for privacy alignment where it matters most
Dev Signal
Get briefs like this in your inbox — free, 3x a week.
100+ sources compressed into one 4-minute read. Ranked, cited, implementation-ready.