Weakly-Supervised Visual-Retriever-Reader Pipeline for Knowledge-Based VQA Tasks

Full metadata

Title
Weakly-Supervised Visual-Retriever-Reader Pipeline for Knowledge-Based VQA Tasks
Description

Visual question answering (VQA) is the task of answering questions about a given image; solving it involves both language and vision methods, which makes VQA a frontier interdisciplinary field. In recent years, with the great progress made on simpler tasks (e.g., object recognition), researchers have begun to shift their interest toward questions that require knowledge and reasoning. Knowledge-based VQA requires answering questions with external knowledge in addition to the content of the image. The dataset most commonly used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge, and because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, this paper collects a natural-language knowledge base that can be used for any question answering (QA) system. Moreover, a Visual Retriever-Reader pipeline is proposed to approach knowledge-based VQA, where the visual retriever aims to retrieve relevant knowledge and the visual reader seeks to predict answers based on the given knowledge. The retriever is constructed in two versions: a term-based retriever that uses Best Matching 25 (BM25), and a neural retriever that introduces the latest dense passage retriever (DPR). To encode the visual information, the image and the caption are encoded separately in two kinds of neural retriever: Image-DPR and Caption-DPR. There are also two styles of reader: a classification reader and an extraction reader. Both the retriever and the reader are trained with weak supervision. The experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge.
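
The retrieve-then-read flow described above can be illustrated with a minimal sketch. The following is not code from the thesis: the knowledge passages, the caption, and the top-k value are invented placeholders, and the open-source rank_bm25 package stands in for the term-based BM25 retriever.

```python
# Minimal sketch of the term-based (BM25) visual retriever step.
# Requires `pip install rank_bm25`. Corpus and caption are placeholders.
from rank_bm25 import BM25Okapi

# Hypothetical natural-language knowledge corpus (one fact per passage).
corpus = [
    "A fire hydrant supplies water for firefighters to fight fires.",
    "Bananas are rich in potassium and grow in tropical climates.",
    "Stop signs are red octagons that tell drivers to halt.",
]

# BM25 scores tokenized text; plain whitespace tokenization for the sketch.
tokenized_corpus = [passage.lower().split() for passage in corpus]
bm25 = BM25Okapi(tokenized_corpus)

def retrieve(question: str, caption: str, k: int = 2) -> list[str]:
    """Score every knowledge passage against the question concatenated
    with the image caption (the visual signal in the caption-based
    variant) and return the top-k passages for the reader."""
    query = (question + " " + caption).lower().split()
    scores = bm25.get_scores(query)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:k]]

# The reader would take these passages plus the question and either
# classify over an answer vocabulary or extract an answer span.
print(retrieve("What is this red object on the sidewalk used for?",
               "a red fire hydrant on a city sidewalk"))
```

In the neural variants (Image-DPR and Caption-DPR), these lexical scores are replaced by dot products between a dense encoding of the question (with image or caption features) and dense encodings of the passages.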

Date Created
2021
Contributors
  • Zeng, Yankai (Author)
  • Baral, Chitta (Thesis advisor)
  • Yang, Yezhou (Committee member)
  • Ghayekhloo, Samira (Committee member)
  • Arizona State University (Publisher)
Topical Subject
  • Computer Science
  • Information retrieval
  • Knowledge Base
  • OK-VQA
  • Visual Question Answering
Resource Type
Text
Genre
  • Masters Thesis
  • Academic theses
Extent
57 pages
Language
eng
Copyright Statement
In Copyright
Reuse Permissions
All Rights Reserved
Primary Member of
ASU Electronic Theses and Dissertations
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.161838
Level of coding
minimal
Cataloging Standards
asu1
System Created
  • 2021-11-16 04:30:35
System Modified
  • 2021-11-30 12:51:28