Six ModernBERT rerankers ship with training recipe
Distilled cross-encoders (17M–1B params) built on Ettin encoders slot into retrieve-then-rerank pipelines; loading them with Flash Attention 2 gives a 1.7x–8.3x speedup over default loading, depending on model size and sequence length.
May 20, 2026
Summary
Rerankers run one forward pass per query-document pair, which makes them expensive at scale. In a retrieve-then-rerank pipeline, a fast embedding model first narrows the corpus to the top-K candidates, and one of these cross-encoders then reorders only those K, so latency stays bounded while result quality improves.
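A minimal sketch of that two-stage pattern, assuming both stages can be expressed as plain scoring functions. The names `retrieve_then_rerank`, `Scorer`, and `word_overlap` are hypothetical and do not come from the release. In a real pipeline the cheap stage would usually be an embedding model with a vector index, and the expensive stage would be a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: (query, document) -> relevance, higher is better.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    expensive_score: Scorer,
    top_k: int = 100,
    top_n: int = 10,
) -> List[Tuple[int, float]]:
    """Return up to top_n (corpus_index, expensive_score) pairs, best first."""
    # Stage 1: score every document with the cheap scorer, keep the best top_k.
    by_cheap = sorted(
        range(len(corpus)),
        key=lambda i: cheap_score(query, corpus[i]),
        reverse=True,
    )
    candidates = by_cheap[:top_k]

    # Stage 2: the expensive scorer runs on the candidates only,
    # so its cost is bounded by top_k instead of len(corpus).
    rescored = [(i, expensive_score(query, corpus[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in for the retriever: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))
```

The point of the design is that the number of expensive calls depends on `top_k`, not on corpus size, which is what keeps latency bounded as the corpus grows.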
Implementation verdict
Drop-in replacement for generic cross-encoders in production pipelines. Requires the sentence-transformers library; Flash Attention 2 and bfloat16 are optional and only needed for the speed gains. Ready now: three lines of code, up to 8K tokens of context, benchmarked on MTEB. Start with the 32M or 68M model for a typical speed/quality trade-off.
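A rough sketch of what loading and calling one of these rerankers might look like with sentence-transformers. The model id is a placeholder, since this brief does not name the checkpoints, and the helpers `speed_kwargs`, `load_reranker`, and `rerank` are hypothetical wrappers. It assumes a recent sentence-transformers release whose `CrossEncoder` accepts `model_kwargs`, plus a GPU with the flash-attn package installed for the fast path.

```python
from typing import Dict, List, Sequence

# Placeholder only: substitute a real model id from the release.
MODEL_ID = "your-org/your-reranker"


def speed_kwargs(use_flash_attention: bool = True) -> Dict[str, str]:
    """Optional loading arguments for the fast path; {} means default loading."""
    if not use_flash_attention:
        return {}
    # Assumption: these keys are forwarded to the underlying transformers model.
    return {"torch_dtype": "bfloat16", "attn_implementation": "flash_attention_2"}


def load_reranker(model_id: str = MODEL_ID, use_flash_attention: bool = True):
    """Load a CrossEncoder; the import is lazy so this file parses without the library."""
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_id, model_kwargs=speed_kwargs(use_flash_attention))


def rerank(model, query: str, docs: Sequence[str], top_n: int = 3) -> List[dict]:
    """Return the top_n candidates as dicts with 'corpus_id' and 'score' keys."""
    return model.rank(query, list(docs), top_k=top_n)
```

Typical use would be `model = load_reranker()` followed by `rerank(model, query, candidates)`, where `candidates` are the top-K documents returned by the retriever. Passing `use_flash_attention=False` falls back to default loading on machines without a supported GPU.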
Sources
1. six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes
2. retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy
3. a 1.7x-8.3x speedup over default loading depending on model size and sequence length
4. All six accept up to 8K tokens of context (useful for long-document reranking) thanks to ModernBERT's long-context pre-training
Dev Signal