Vision language models (VLMs) exhibit vast knowledge of the physical world, including intuition of physical and spatial properties, affordances, and motion. With fine-tuning, VLMs can also natively ...
We tie our shoes, we put on neckties, we wrestle with power cords. Yet despite deep familiarity with knots, most people cannot tell a weak knot from a strong one by looking at them, Johns Hopkins ...
Humans are pretty good at guessing whether a towering stack of dishes in the sink will topple over or where a pool ball will go when a cue hits it. We evolved this kind of physical reasoning to ...