Search Content

Video Captioning with Commonsense Knowledge Anchors

Description

It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents…

It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed as CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried to complement per training caption by querying a generic knowledge atlas ATOMIC, and form the commonsense- caption entailment corpus. A BERT based language entailment model trained from this corpus then serves as a commonsense discriminator for the training of video captioning model, and penalizes the model from generating semantically misaligned captions. With extensive empirical evaluations on MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of generations and shows higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveals that the use of commonsense knowledge contributes to the video caption generation.

ContributorsShao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)

Created2022

Machine Learning and Vision Using Edge Devices for Multimodal Chatbots and Bio-meteorological Sensing

Description

Machine learning (ML) and deep learning (DL) has become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improvised machine learning models. But this comes with…

Machine learning (ML) and deep learning (DL) has become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improvised machine learning models. But this comes with a cost of high computation, which invariably increases power usage and cost of the hardware. In this thesis we explore applications of ML techniques, applied to two completely different fields - arts, media and theater and urban climate research using low-cost and low-powered edge devices. The multi-modal chatbot uses different machine learning techniques: natural language processing (NLP) and computer vision (CV) to understand inputs of the user and accordingly perform in the play and interact with the audience. This system is also equipped with other interactive hardware setups like movable LED systems, together they provide an experiential theatrical play tailored to each user. I will discuss how I used edge devices to achieve this AI system which has created a new genre in theatrical play. I will then discuss MaRTiny, which is an AI-based bio-meteorological system that calculates mean radiant temperature (MRT), which is an important parameter for urban climate research. It is also equipped with a vision system that performs different machine learning tasks like pedestrian and shade detection. The entire system costs around $200 which can potentially replace the existing setup worth $20,000. I will further discuss how I overcame the inaccuracies in MRT value caused by the system, using machine learning methods. These projects although belonging to two very different fields, are implemented using edge devices and use similar ML techniques. In this thesis I will detail out different techniques that are shared between these two projects and how they can be used in several other applications using edge devices.

ContributorsKulkarni, Karthik Kashinath (Author) / Jayasuriya, Suren (Thesis advisor) / Middel, Ariane (Thesis advisor) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created2021

Augmented Coach: An Augmented Reality Tool for Immersive Sports Coaching

Description

Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes will commonly record themselves using a phone or camera when practicing a sports movement, such as shooting a basketball, to then send to their coach for…

Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes will commonly record themselves using a phone or camera when practicing a sports movement, such as shooting a basketball, to then send to their coach for feedback on how to improve. In this work, we present Augmented Coach, an augmented reality tool for coaches to give spatiotemporal feedback through a 3-dimensional point cloud of the athlete. The system allows coaches to view a pre-recorded video of their athlete in point cloud form, and provides them with the proper tools in order to go frame by frame to both analyze the athlete’s form and correct it. The result is a fundamentally new concept of an interactive video player, where the coach can remotely view the athlete in a 3-dimensional form and create annotations to help improve their form. We then conduct a user study with subject matter experts to evaluate the usability and capabilities of our system. As indicated by the results, Augmented Coach successfully acts as a supplement to in-person coaching, since it allows coaches to break down the video recording in a 3-dimensional space and provide feedback spatiotemporally. The results also indicate that Augmented Coach can be a complete coaching solution in a remote setting. This technology will be extremely relevant in the future as coaches look for new ways to improve their feedback methods, especially in a remote setting.

ContributorsChannar, Sameer (Author) / Dbeis, Yasser (Co-author) / Richards, Connor (Co-author) / LiKamWa, Robert (Thesis director) / Jayasuriya, Suren (Committee member) / Barrett, The Honors College (Contributor) / Dean, W.P. Carey School of Business (Contributor) / Computer Science and Engineering Program (Contributor)

Created2022-05

Augmented Coach: An Augmented Reality Tool for Immersive Sports Coaching

Description

Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes will commonly record themselves using a phone or camera when practicing a sports movement, such as shooting a basketball, to then send to their coach for…

Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes will commonly record themselves using a phone or camera when practicing a sports movement, such as shooting a basketball, to then send to their coach for feedback on how to improve. In this work, we present Augmented Coach, an augmented reality tool for coaches to give spatiotemporal feedback through a 3-dimensional point cloud of the athlete. The system allows coaches to view a pre-recorded video of their athlete in point cloud form, and provides them with the proper tools in order to go frame by frame to both analyze the athlete’s form and correct it. The result is a fundamentally new concept of an interactive video player, where the coach can remotely view the athlete in a 3-dimensional form and create annotations to help improve their form. We then conduct a user study with subject matter experts to evaluate the usability and capabilities of our system. As indicated by the results, Augmented Coach successfully acts as a supplement to in-person coaching, since it allows coaches to break down the video recording in a 3-dimensional space and provide feedback spatiotemporally. The results also indicate that Augmented Coach can be a complete coaching solution in a remote setting. This technology will be extremely relevant in the future as coaches look for new ways to improve their feedback methods, especially in a remote setting.

ContributorsRichards, Connor (Author) / Dbeis, Yasser (Co-author) / Channar, Sameer (Co-author) / LiKamWa, Robert (Thesis director) / Jayasuriya, Suren (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of International Letters and Cultures (Contributor)

Created2022-05

Networked System for Volumetric Athletic Coaching in Augmented Reality

Description

Traditional sports coaching involves face-to-face instructions with athletes or playingback 2D videos of athletes’ training. However, if the coach is not in the same area as the athlete, then the coach will not be able to see the athlete’s full body and thus cannot give precise guidance to the athlete, limiting the…

Traditional sports coaching involves face-to-face instructions with athletes or playingback 2D videos of athletes’ training. However, if the coach is not in the same area as the athlete, then the coach will not be able to see the athlete’s full body and thus cannot give precise guidance to the athlete, limiting the athlete’s improvement. To address these challenges, this paper proposes Augmented Coach, an augmented reality platform where coaches can view, manipulate and comment on athletes’ movement volumetric video data remotely via the network. In particular, this work includes a). Capturing the athlete’s movement video data with Kinects and converting it into point cloud format b). Transmitting the point cloud data to the coach’s Oculus headset via 5G or wireless network c). Coach’s commenting on the athlete’s joints. In addition, the evaluation of Augmented Coach includes an assessment of its performance from five metrics via the wireless network and 5G network environment, but also from the coaches’ and athletes’ experience of using it. The result shows that Augmented Coach enables coaches to instruct athletes from a distance and provide effective feedback for correcting athletes’ motions under the network.

ContributorsQiao, Yunhan (Author) / LiKamWa, Robert (Thesis advisor) / Bansal, Ajay (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created2023

Filtering by

Video Captioning with Commonsense Knowledge Anchors

Machine Learning and Vision Using Edge Devices for Multimodal Chatbots and Bio-meteorological Sensing

Augmented Coach: An Augmented Reality Tool for Immersive Sports Coaching

Augmented Coach: An Augmented Reality Tool for Immersive Sports Coaching

Networked System for Volumetric Athletic Coaching in Augmented Reality