arXiv cs.AI · Wednesday, May 27, 2026

Advancing Creative Physical Intelligence in Large Multimodal Models

multimodal-models · benchmark · creative-reasoning · ai-research

A new arXiv paper (2605.26396) presents MM-CreativityBench, a benchmark that evaluates large multimodal models (LMMs) on affordance-grounded creative tool use. Existing benchmarks focus on pattern recognition or well-posed questions. MM-CreativityBench instead requires models to identify non-obvious yet physically feasible ways to repurpose elements of a scene. Each instance includes a scenario image along with structured views of candidate entities and their parts, which enables fine-grained, interactive evaluation. Experiments show that current LMMs often fail not for lack of generative capability but because they do not sustain grounded reasoning throughout the problem-solving process. The work underscores a gap between LMMs' perception and reasoning abilities and their capacity for creative physical intelligence in open-ended environments.

// why it matters

Highlights a critical gap in LMMs' ability to sustain grounded reasoning for creative problem-solving.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org
