embodied-ai mllm-planning norm-compliance benchmark social-constraints

MLLMs fail hidden social norms in embodied planning

State-of-the-art multimodal models achieve explicit goals in 67.3% of cases but comply with hidden social norms in only 26.4%. The gap stems from activating and grounding the relevant norms in context, not from missing social knowledge.
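The two headline rates are reported separately, so a plan can count toward the first while failing the second. With the reported numbers the gap is 67.3 minus 26.4, or 40.9 percentage points. Below is a minimal sketch of per-episode scoring. It assumes each episode reduces to two booleans. `EpisodeResult`, `rate`, and `score` are hypothetical names, not NormAct's evaluation code, and the toy data is not benchmark output.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Hypothetical per-episode record; field names are illustrative, not NormAct's schema."""
    goal_achieved: bool   # did the plan accomplish the explicitly stated goal?
    norm_compliant: bool  # did the plan respect the hidden (unstated) social norm?


def rate(results, field_name):
    """Fraction of episodes where the given boolean field is True."""
    if not results:
        raise ValueError("no episodes to score")
    return sum(getattr(r, field_name) for r in results) / len(results)


def score(results):
    """Goal rate, norm-compliance rate, and the gap between them."""
    goal = rate(results, "goal_achieved")
    norm = rate(results, "norm_compliant")
    return {"goal_rate": goal, "norm_rate": norm, "gap": goal - norm}


# Toy data: four made-up episodes, not benchmark results.
episodes = [
    EpisodeResult(goal_achieved=True, norm_compliant=False),
    EpisodeResult(goal_achieved=True, norm_compliant=True),
    EpisodeResult(goal_achieved=True, norm_compliant=False),
    EpisodeResult(goal_achieved=False, norm_compliant=False),
]
print(score(episodes))  # → {'goal_rate': 0.75, 'norm_rate': 0.25, 'gap': 0.5}
```

How the benchmark combines the two signals into its Task Success metric is not described in this brief, so the sketch stops at the two separate rates.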

Summary

If you're building embodied AI agents or egocentric planners, this benchmark exposes a critical failure mode: your model may solve the stated task while violating implicit constraints that users expect. NormPerceptor's cue-generation approach offers a concrete pattern for adding norm detection to planning pipelines.

Why it matters

Implementation verdict

The NormAct benchmark replaces hand-coded norm checks with systematic evaluation. NormPerceptor requires a separate norm-inference stage before action planning. Its Task Success gain from 24.2% to 46.7% (+22.5 points, nearly double) suggests the approach is worth integrating now, but only if your deployment involves repeated human interaction, where norm violations compound.
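The two-stage pattern (infer norms from the scene, then plan under them) can be sketched as a toy pipeline. This is not NormPerceptor. The source describes a context-conditioned cue generator. Here `infer_norm_cues` is a hypothetical keyword lookup standing in for it. `Scene`, `Action`, `CUE_RULES`, and `plan` are illustrative names. The point is the interface: cues are produced before planning, and the planner treats them as constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical scene summary; a real system would feed egocentric images to an MLLM."""
    objects: list[str]
    people_present: bool


@dataclass
class Action:
    name: str
    violates: set[str] = field(default_factory=set)  # norm cues this action would break


# Stage 1 stand-in: a hand-written cue table (placeholder for a learned cue generator).
CUE_RULES = {
    "sleeping person": "keep noise low",
    "stove": "do not leave the burner unattended",
    "private documents": "do not read or move personal papers",
}


def infer_norm_cues(scene):
    """Return the norm cues triggered by objects and people in the scene."""
    cues = [cue for obj, cue in CUE_RULES.items() if obj in scene.objects]
    if scene.people_present:
        cues.append("avoid blocking walkways")
    return cues


def plan(steps, cues):
    """Stage 2: for each step, pick the first option that violates no active cue.
    Returns None if some step has no compliant option."""
    active = set(cues)
    chosen = []
    for options in steps:
        compliant = [a for a in options if not (a.violates & active)]
        if not compliant:
            return None
        chosen.append(compliant[0].name)
    return chosen


scene = Scene(objects=["stove", "sleeping person"], people_present=True)
steps = [
    [Action("turn on stove and leave the room", {"do not leave the burner unattended"}),
     Action("turn on stove and wait beside it")],
    [Action("grind beans with electric grinder", {"keep noise low"}),
     Action("use pre-ground coffee")],
]
cues = infer_norm_cues(scene)
print(cues)  # → ['keep noise low', 'do not leave the burner unattended', 'avoid blocking walkways']
print(plan(steps, cues))  # → ['turn on stove and wait beside it', 'use pre-ground coffee']
```

Calling `plan(steps, [])` with no cues returns the first option at each step. That plan completes the goal but leaves the stove unattended and runs a noisy grinder next to a sleeping person, which is the failure mode the benchmark measures, in miniature.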

Sources

1. models achieve explicit goals in 67.3% of cases, but comply with hidden norms in only 26.4%
2. a context-conditioned cue generator that infers scene-relevant norms prior to planning, increasing Task Success from 24.2% to 46.7%
3. challenges in activating and grounding relevant norms in context

Dev Signal
