Matching Items (212)
Description
Visual Question Answering (VQA) is an increasingly important multi-modal task in which models must answer textual questions about visual image inputs. Numerous VQA datasets have been proposed to train and evaluate models; however, existing benchmarks focus unilaterally on textual distribution shifts rather than joint shifts across modalities, which is suboptimal for properly assessing model robustness and generalization. To address this gap, a novel multi-modal VQA benchmark dataset is introduced that combines both visual and textual distribution shifts across training and test sets. Evaluation on this challenging benchmark exposes vulnerabilities in existing models that rely on spurious correlations and overfit to dataset biases, and the dataset thereby enables more robust model training and rigorous evaluation of generalization under multi-modal distribution shift. In addition, a new few-shot multi-modal prompt fusion model is proposed to better adapt models to downstream VQA tasks. The model incorporates a prompt encoder module and a dual-path design to align and fuse image and text prompts, a prompt learning approach tailored for multi-modal learning across vision and language. Together, the introduced benchmark dataset and prompt fusion model address key limitations in evaluating and improving VQA model robustness, expanding the methodology for training models resilient to multi-modal distribution shifts.
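Below is a minimal, illustrative sketch of what a dual-path prompt fusion module of this kind might look like. The abstract does not specify the architecture, so the module names (PromptEncoder, DualPathPromptFusion), the embedding dimension, and the cross-attention alignment step are all assumptions for illustration, not the thesis's actual implementation.

```python
# Hypothetical sketch of a dual-path prompt fusion module, assuming
# pre-extracted image/text features in a shared embedding space.
# All names, dimensions, and the cross-attention fusion are assumptions.
import torch
import torch.nn as nn


class PromptEncoder(nn.Module):
    """Projects a bank of learnable prompt vectors into the shared space."""

    def __init__(self, num_prompts: int, dim: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Expand the shared prompts to the batch: (B, num_prompts, dim)
        return self.proj(self.prompts).unsqueeze(0).expand(batch_size, -1, -1)


class DualPathPromptFusion(nn.Module):
    """One prompt path per modality; prompts are aligned via cross-attention,
    then fused into a single multi-modal prompt stream."""

    def __init__(self, dim: int = 512, num_prompts: int = 8, heads: int = 8):
        super().__init__()
        self.image_prompts = PromptEncoder(num_prompts, dim)  # visual path
        self.text_prompts = PromptEncoder(num_prompts, dim)   # textual path
        self.align = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor) -> torch.Tensor:
        # img_feats: (B, N_img, dim), txt_feats: (B, N_txt, dim)
        batch = img_feats.size(0)
        ip = self.image_prompts(batch)
        tp = self.text_prompts(batch)
        # Align each modality's prompts against the other modality's features.
        ip_aligned, _ = self.align(ip, txt_feats, txt_feats)
        tp_aligned, _ = self.align(tp, img_feats, img_feats)
        # Fuse the two prompt streams: (B, num_prompts, dim)
        return self.fuse(torch.cat([ip_aligned, tp_aligned], dim=-1))
```

In a setup like this, the fused prompts would typically be prepended to the backbone's input sequence during few-shot adaptation, with only the prompt parameters trained while the backbone stays frozen.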
Contributors: Jyothi Unni, Suraj (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Bryan, Chris (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
In today’s world, artificial intelligence (AI) is increasingly becoming a part of our daily lives. For this integration to be successful, AI systems must be able to interact effectively with humans: the system’s behavior should be understandable to users, and users should be able to customize that behavior to match their preferences. Achieving this goal poses significant challenges. One major challenge is that modern AI systems, despite their great success, often make decisions based on learned representations. These representations, typically acquired through deep learning techniques, are inscrutable to users, which inhibits the explainability and customizability of the system. Additionally, because each user may have unique preferences and expertise, the interaction process must be tailored to the individual. This thesis addresses these challenges in human-AI interaction scenarios, especially in cases where the AI system is tasked with solving sequential decision-making problems. It does so by introducing a framework that uses a symbolic interface to facilitate communication between humans and AI agents. This shared vocabulary acts as a bridge, enabling the AI agent to provide explanations in terms that are easy for humans to understand and allowing users to express their preferences in the same common language. To address the need for personalization, the framework provides mechanisms that allow users to expand this shared vocabulary, enabling them to express their unique preferences effectively. Moreover, the AI systems are designed to take the user’s background knowledge into account when generating explanations tailored to their specific needs.
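Below is a minimal, illustrative sketch of a symbolic shared-vocabulary interface of the kind described. The class and method names (SharedVocabulary, add_concept, describe) and the representation of concepts as user-supplied predicates over the agent's state features are assumptions for illustration, not the thesis's actual framework.

```python
# Hypothetical sketch: a shared vocabulary mapping human-readable concept
# names to tests over the agent's internal state, which users can expand.
from typing import Callable, Dict, List

StateFeatures = List[float]
ConceptTest = Callable[[StateFeatures], bool]


class SharedVocabulary:
    """Maps human-readable concept names to tests over the agent's state."""

    def __init__(self) -> None:
        self._concepts: Dict[str, ConceptTest] = {}

    def add_concept(self, name: str, test: ConceptTest) -> None:
        # Users expand the vocabulary with their own concepts,
        # supporting personalized preferences and explanations.
        self._concepts[name] = test

    def describe(self, state: StateFeatures) -> List[str]:
        # Explain a state as the set of vocabulary concepts that hold in it.
        return [name for name, test in self._concepts.items() if test(state)]


# Usage: the agent reports states in the user's own terms.
vocab = SharedVocabulary()
vocab.add_concept("low_battery", lambda s: s[0] < 0.2)
vocab.add_concept("near_goal", lambda s: s[1] > 0.9)
print(vocab.describe([0.1, 0.95]))  # ['low_battery', 'near_goal']
```

In practice, each concept test might be a learned classifier over the agent's latent representations rather than a hand-written predicate, but the interface pattern is the same.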
Contributors: Soni, Utkarsh (Author) / Kambhampati, Subbarao (Thesis advisor) / Baral, Chitta (Committee member) / Bryan, Chris (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2024