reranking cross-encoders retrieval sentence-transformers modernbert

Six ModernBERT rerankers ship with training recipe

Six distilled cross-encoder rerankers (17M–1B params), built on Ettin encoders, slot into the rerank stage of retrieve-then-rerank pipelines. Loading them with Flash Attention 2 gives a 1.7x–8.3x speedup over default loading, depending on model size and sequence length.

May 20, 2026

Summary

Cross-encoder rerankers run one forward pass per query-document pair, which makes them expensive at scale. In a retrieve-then-rerank pipeline, these models score only the top-K candidates returned by a fast retriever instead of the entire corpus, so latency stays bounded while result quality improves.
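The two-stage idea can be sketched in plain Python. This is a minimal illustration, not the released models' pipeline: `retrieve_then_rerank`, `word_overlap`, and `counting_pair_score` are hypothetical names, and the toy scorers stand in for a real embedding retriever and a real cross-encoder.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    pair_score: Scorer,
    top_k: int = 3,
) -> List[str]:
    # Stage 1: the cheap scorer sees every document and keeps the top K.
    candidates = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:top_k]
    # Stage 2: the expensive pairwise scorer runs on those K candidates only.
    return sorted(candidates, key=lambda doc: pair_score(query, doc), reverse=True)


def word_overlap(query: str, doc: str) -> float:
    # Toy stand-in for a fast retriever: number of shared words.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


pair_calls = 0


def counting_pair_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder; counts calls to show the bounded cost.
    global pair_calls
    pair_calls += 1
    return word_overlap(query, doc) / len(doc.split())


corpus = [
    "how to train a reranker model",
    "reranker models reorder retrieved documents",
    "cooking pasta at home",
    "a long note that mentions a reranker once among many other unrelated words",
    "weather report for tuesday",
]

ranked = retrieve_then_rerank(
    "reranker model", corpus, word_overlap, counting_pair_score, top_k=3
)
print(ranked)
print("pairwise scorer calls:", pair_calls)
```

The pairwise scorer is called once per candidate, so its cost grows with K rather than with corpus size.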

Implementation verdict

Drop-in replacement for generic cross-encoders in production pipelines. Requires the sentence-transformers library; Flash Attention 2 and bfloat16 are optional and provide the speed gains. Ready now: three lines of code, up to 8K tokens of context, benchmarked on MTEB. Start with the 32M or 68M model for a typical speed/accuracy trade-off.
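A hedged sketch of what loading and ranking might look like with the sentence-transformers `CrossEncoder` class. The helper names `reranker_model_kwargs` and `rerank` are hypothetical, and `model_id` must be filled in with a real model ID, since the brief does not list one. The `model_kwargs` argument assumes a recent sentence-transformers release, and the fast path assumes a CUDA GPU with the flash-attn package installed.

```python
from typing import Any, Dict, List


def reranker_model_kwargs(fast: bool) -> Dict[str, Any]:
    # Default loading: pass nothing extra.
    if not fast:
        return {}
    # Optional fast path: bfloat16 weights plus Flash Attention 2.
    # Both keys are forwarded to the underlying transformers model loader.
    return {"torch_dtype": "bfloat16", "attn_implementation": "flash_attention_2"}


def rerank(
    model_id: str,
    query: str,
    documents: List[str],
    top_k: int,
    fast: bool = False,
):
    # Imported lazily so this module can be read without the library installed.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_id, model_kwargs=reranker_model_kwargs(fast))
    # rank() scores each (query, document) pair and returns the results
    # sorted by score, each with a corpus_id, a score, and the document text.
    return model.rank(query, documents, top_k=top_k, return_documents=True)
```

Calling `rerank(model_id, query, candidates, top_k=10, fast=True)` on the top-K output of a retriever would then return the reordered candidates. If Flash Attention 2 is unavailable, `fast=False` falls back to default loading.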

Sources

1. six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes
2. retrieve-then-rerank: a fast embedding model retrieves the top-K candidates (cheap), then a cross-encoder re-orders just those K with high accuracy
3. a 1.7x-8.3x speedup over default loading depending on model size and sequence length
4. All six accept up to 8K tokens of context (useful for long-document reranking) thanks to ModernBERT's long-context pre-training

Dev Signal
