Matching Items (2)
Filtering by
- All Subjects: Natural Language Processing
- Member of: Theses and Dissertations
Description
It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain
for a video captioning system to generate natural language descriptions focusing on
the prominent interest and aligning with the latent aspects beyond observations. This
work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed as
CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the
training of video captioning model with a novel paradigm for sentence-level semantic
alignment. Specifically, commonsense knowledge is queried to complement per training
caption by querying a generic knowledge atlas ATOMIC, and form the commonsense-
caption entailment corpus. A BERT based language entailment model trained from
this corpus then serves as a commonsense discriminator for the training of video
captioning model, and penalizes the model from generating semantically misaligned
captions. With extensive empirical evaluations on MSR-VTT, V2C and VATEX
datasets, CAVAN consistently improves the quality of generations and shows higher
keyword hit rate. Experimental results with ablations validate the effectiveness of
CAVAN and reveals that the use of commonsense knowledge contributes to the video
caption generation.
ContributorsShao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)
Created2022
Description
Machine learning (ML) and deep learning (DL) has become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improvised machine learning models. But this comes with a cost of high computation, which invariably increases power usage and cost of the hardware. In this thesis we explore applications of ML techniques, applied to two completely different fields - arts, media and theater and urban climate research using low-cost and low-powered edge devices. The multi-modal chatbot uses different machine learning techniques: natural language processing (NLP) and computer vision (CV) to understand inputs of the user and accordingly perform in the play and interact with the audience. This system is also equipped with other interactive hardware setups like movable LED systems, together they provide an experiential theatrical play tailored to each user. I will discuss how I used edge devices to achieve this AI system which has created a new genre in theatrical play. I will then discuss MaRTiny, which is an AI-based bio-meteorological system that calculates mean radiant temperature (MRT), which is an important parameter for urban climate research. It is also equipped with a vision system that performs different machine learning tasks like pedestrian and shade detection. The entire system costs around $200 which can potentially replace the existing setup worth $20,000. I will further discuss how I overcame the inaccuracies in MRT value caused by the system, using machine learning methods. These projects although belonging to two very different fields, are implemented using edge devices and use similar ML techniques. In this thesis I will detail out different techniques that are shared between these two projects and how they can be used in several other applications using edge devices.
ContributorsKulkarni, Karthik Kashinath (Author) / Jayasuriya, Suren (Thesis advisor) / Middel, Ariane (Thesis advisor) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)
Created2021