We introduce a novel task of 3D visual grounding in monocular RGB images using descriptions with appearance and geometry information, termed Mono3DVG. Mono3DVG aims to localize the true 3D extent of ...