llm-training reinforcement-learning agentic-ai long-context cursor

Composer 2.5 improves long-task execution and collaboration

Targeted textual feedback during RL training fixes localized failures (bad tool calls, style violations) that global reward signals miss, enabling better long-horizon behavior without full rollout retraining.

May 20, 2026

Summary

Composer now handles multi-step coding tasks more reliably with fewer false starts, reducing iteration cycles in sustained agentic work. Better instruction following and communication style cut friction in human-AI collaboration loops.

Why it matters

Implementation verdict

Drop-in replacement for Composer 2 at $0.50 input / $2.50 output per million tokens (standard) or $3.00 / $15.00 (fast tier). It requires no client-side changes; Cursor users get it automatically. Worth switching today if you're running long-context code tasks: training on 25x more synthetic tasks than Composer 2, combined with targeted textual feedback, is aimed at the false starts and retries that show up in multi-file editing.
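To make the pricing concrete, here is a minimal cost-estimation sketch in Python. The per-million-token prices are the ones quoted in this brief; reading the fast-tier pair as input/output in the same order as the standard pair is an assumption, and the `Tier` class, the `estimate_cost` helper, and the example token counts are hypothetical names and values chosen for illustration, not part of any Cursor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Prices as quoted in this brief. The fast-tier input/output ordering
# is assumed to match the standard tier.
TIERS = {
    "standard": Tier(input_per_m=0.50, output_per_m=2.50),
    "fast": Tier(input_per_m=3.00, output_per_m=15.00),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload on the given tier."""
    t = TIERS[tier]
    total = input_tokens * t.input_per_m + output_tokens * t.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-running task: 2M input tokens, 200k output tokens.
    for name in TIERS:
        cost = estimate_cost(name, 2_000_000, 200_000)
        print(f"{name}: ${cost:.2f}")
```

For long agentic sessions, input tokens tend to dominate because context is re-read across many steps, so the input price is usually the figure to watch when comparing tiers.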

Sources

1. It is better at sustained work on long-running tasks, follows complex instructions more reliably, and is more pleasant to collaborate with
2. Composer 2.5 is trained with 25x more synthetic tasks than Composer 2
3. we trained Composer 2.5 with targeted textual feedback
4. Credit assignment during RL is becoming an increasingly difficult challenge as rollouts can span hundreds of thousands of tokens
5. Composer 2.5 is priced at $0.50/M input and $2.50/M output tokens

Dev Signal
