formal-verification proof-search llm-reasoning mcts benchmark

Verifier feedback steers formal proof search better

VERITAS routes syntax errors, type mismatches, and partial goal states back into proof search through a Best-of-N pass followed by critic-guided MCTS, replacing the collapse into a binary pass/fail bit with iterative negative-example conditioning.

June 23, 2026

Summary

Formal verification tooling wastes verifier output by reducing it to pass/fail. Recovering that signal lets a prover recover correct lemma names from error messages instead of guessing, exposes where unguided sampling fails, and improves theorem-solving rates on hard combinatorics problems.
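To make the contrast concrete, here is a minimal sketch, assuming verifier output can be split into the three signals named above (syntax errors, type mismatches, partial goal progress). Every name here (`FeedbackKind`, `VerifierFeedback`, `as_binary`, `build_retry_prompt`) and the prompt wording are hypothetical illustrations, not VERITAS's actual interface. `as_binary` is what a pass/fail pipeline keeps; `build_retry_prompt` is one possible way to feed failed attempts back as negative examples.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class FeedbackKind(Enum):
    # Hypothetical categories mirroring the signals named in the summary.
    SUCCESS = "success"
    SYNTAX_ERROR = "syntax_error"
    TYPE_MISMATCH = "type_mismatch"
    PARTIAL_PROGRESS = "partial_progress"  # some goals closed, others still open


@dataclass
class VerifierFeedback:
    kind: FeedbackKind
    message: str = ""  # raw diagnostic text from the checker
    remaining_goals: List[str] = field(default_factory=list)


def as_binary(fb: VerifierFeedback) -> bool:
    """What a pass/fail pipeline keeps: one bit; message and goals are dropped."""
    return fb.kind is FeedbackKind.SUCCESS


def build_retry_prompt(theorem: str, failures: List[Tuple[str, VerifierFeedback]]) -> str:
    """Condition the next attempt on earlier failed proofs and their diagnostics."""
    lines = [f"Prove: {theorem}"]
    for i, (proof, fb) in enumerate(failures, start=1):
        lines.append(f"Failed attempt {i} ({fb.kind.value}):")
        lines.append(proof)
        if fb.message:
            lines.append(f"Verifier said: {fb.message}")
        for goal in fb.remaining_goals:
            lines.append(f"Unsolved goal: {goal}")
    lines.append("Write a new proof that avoids these errors.")
    return "\n".join(lines)
```

Given one failed attempt, `build_retry_prompt` returns a prompt containing the theorem, the failed proof, the verifier's message, and any open goals, which is exactly the information `as_binary` throws away.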

Why it matters

Implementation verdict

Replaces naive Best-of-N sampling with a two-phase protocol: Phase 1 runs Best-of-N, and Phase 2 runs MCTS guided by a critic on the theorems Phase 1 failed to prove. Requires verifier integration and an MCTS implementation; artifacts are on GitHub. Worth testing now if you build formal proof assistants or LLM-guided verification, but evidence is limited to miniF2F and a new 55-theorem benchmark.
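The two-phase protocol can be sketched as follows. This is an illustration under stated assumptions, not the released VERITAS code: `sample`, `verify`, `expand`, and `critic` are hypothetical callables standing in for the LLM sampler, the proof checker, an LLM step proposer (which in a real system would presumably see the verifier's messages for the partial proof), and the critic model. Phase 2 here is plain UCT-based MCTS in which the critic's score replaces a random rollout at each leaf.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Proof = str  # a (partial) proof script; hypothetical representation


def best_of_n(sample: Callable[[], Proof], verify: Callable[[Proof], bool], n: int) -> Optional[Proof]:
    """Phase 1: draw up to n independent candidates; return the first that verifies."""
    for _ in range(n):
        proof = sample()
        if verify(proof):
            return proof
    return None


@dataclass(eq=False)
class Node:
    state: Proof
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0
    expanded: bool = False

    def uct(self, c: float) -> float:
        # Unvisited children are always tried first.
        if self.visits == 0:
            return math.inf
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts_with_critic(root_state, expand, critic, verify, iterations=200, c=1.4):
    """Phase 2: UCT search where a critic score replaces the random rollout."""
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: follow the highest-UCT child down to a frontier node.
        node = root
        while node.expanded and node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        if verify(node.state):
            return node.state
        # Expansion: propose next proof steps (an LLM call in a real system).
        if not node.expanded:
            node.children = [Node(s, parent=node) for s in expand(node.state)]
            node.expanded = True
        # Evaluation: critic score for this partial proof, assumed to lie in [0, 1].
        value = critic(node.state)
        # Backpropagation: update visit counts and value sums up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return None


def prove(sample, verify, expand, critic, n=8, iterations=200) -> Tuple[Optional[Proof], str]:
    """Run Phase 1; fall back to Phase 2 only if every Phase 1 sample failed."""
    proof = best_of_n(sample, verify, n)
    if proof is not None:
        return proof, "phase1"
    # Start the tree search from an empty proof script.
    proof = mcts_with_critic("", expand, critic, verify, iterations)
    return proof, ("phase2" if proof is not None else "unsolved")
```

One plausible reason for this ordering: the tree search is more expensive per theorem, so it only runs where cheap independent sampling has already failed.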

Sources

1. LLM-based formal provers often collapse rich verifier signals (syntax errors, type mismatches, partial goal progress) into a binary pass/fail bit
2. VERITAS reaches 40.6% on miniF2F (vs. an independently run Best-of-5 at 36.9%, Portfolio 26.2%)
3. Phase 2's additional solves are attributable to feedback-driven exploration
4. Unguided sampling hurts when correct lemma names must be recovered iteratively from verifier feedback
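The lemma-name finding can be illustrated with a deliberately tiny toy, assuming a model that picks uniformly among four candidate names of which exactly one exists. All names (`GUESSES`, `LIBRARY`, `verify`, `unguided`, `feedback_guided`) are hypothetical, and the uniform-guess model is a simplification, not how an LLM actually behaves. The point is only structural: independent sampling can repeat a name the verifier already rejected, while a loop conditioned on earlier errors cannot.

```python
import random
from typing import Optional

# Hypothetical toy setup: the model may guess any of these lemma names,
# but only one of them exists in the (equally hypothetical) library.
GUESSES = ["sum_swap", "swap_sum", "sum_commute", "sum_comm"]
LIBRARY = {"sum_comm"}


def verify(lemma: str) -> Optional[str]:
    """Toy verifier: None on success, otherwise an error message."""
    return None if lemma in LIBRARY else f"unknown identifier '{lemma}'"


def unguided(budget: int, rng: random.Random) -> Optional[str]:
    """Independent samples: each draw ignores every earlier error."""
    for _ in range(budget):
        guess = rng.choice(GUESSES)
        if verify(guess) is None:
            return guess
    return None


def feedback_guided(budget: int, rng: random.Random) -> Optional[str]:
    """Each retry excludes names the verifier already rejected (negative examples)."""
    rejected = set()
    for _ in range(budget):
        options = [g for g in GUESSES if g not in rejected]
        if not options:
            return None
        guess = rng.choice(options)
        if verify(guess) is None:
            return guess
        # A real system would put the error text itself into the next prompt.
        rejected.add(guess)
    return None
```

Under this toy model with a budget of four attempts, `unguided` succeeds with probability 1 - (3/4)^4 ≈ 0.68, whereas `feedback_guided` always succeeds because it works through the four candidates without repeats.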

Dev Signal
