Advancing Creative Physical Intelligence in Large Multimodal Models
A new paper on arXiv (2605.26396) presents MM-CreativityBench, a benchmark for evaluating large multimodal models (LMMs) on affordance-grounded creative tool use. Whereas existing benchmarks focus on pattern recognition or well-posed questions, MM-CreativityBench requires models to identify non-obvious yet physically feasible ways to repurpose elements of a scene. Each instance includes a scenario image together with structured views of candidate entities and their parts, which enables fine-grained, interactive evaluation. Experiments show that current LMMs often fail, not for lack of generative capability, but because they do not sustain grounded reasoning throughout the problem-solving process. The work underscores a gap between LMMs' perception and reasoning abilities on the one hand and their capacity for creative physical intelligence in open-ended environments on the other.
Highlights a critical gap in LMMs' ability to sustain grounded reasoning for creative problem-solving.