Multimodal Visual Language ActionBook

Tightly Coupled Visual–Inertial SLAM With Dual-Modal Confidence Perception

Abstract: To address the challenges of diminished localization accuracy and reduced adaptability in intelligent agent systems caused by drastic viewpoint changes during rapid carrier movements or ...

GitHub

E-FineR: Vocabulary-free Fine-grained Visual Recognition

E-FineR is a training-free, fully automated framework for vocabulary-free fine-grained visual recognition. This repository accompanies the research paper: Vocabulary-free Fine-grained Visual ...

IEEE

CoGA: A Collaborative Gray-Box Adversarial Attack for Multimodal Language Models

Abstract: Multimodal language models (LMs) have shown significant potential for applications across various domains but remain vulnerable to adversarial attacks. Current research in white-box or black ...
