Abstract: To address the challenges of diminished localization accuracy and reduced adaptability in intelligent agent systems caused by drastic viewpoint changes during rapid carrier movements or ...
E-FineR is a training-free, fully automated framework for vocabulary-free fine-grained visual recognition. This repository accompanies the research paper: Vocabulary-free Fine-grained Visual ...
Abstract: Multimodal language models (LMs) have shown significant potential for applications across various domains but remain vulnerable to adversarial attacks. Current research in white-box or black ...