Matching Items (2)
Description
91% of smartphone and tablet users experience a problem with their device screen being oriented the wrong way during use [11]. The authors of [11] proposed iRotate, an earlier solution that uses computer vision to solve the orientation problem. We propose iLieDown, an improved method of automatically rotating the displays of smartphones, tablets, and other devices. This paper introduces a new algorithm that uses a convolutional neural network (CNN) to correctly orient the display relative to the user’s face. The CNN is trained with data augmentation to predict the rotation of faces in various environments; to be accurate and robust, it applies a confidence threshold and analyzes multiple images. iLieDown is battery and CPU efficient, causes no noticeable lag during use, and is 6x more accurate than iRotate.
Contributors: Tallman, Riley Paul (Author) / Yang, Yezhou (Thesis director) / Fang, Zhiyuan (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-12
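The abstract above mentions combining a confidence threshold with the analysis of multiple images. One way such a decision step could look is sketched below; this is an illustrative assumption, not the thesis's implementation. The function name `decide_rotation`, the four candidate angles, the threshold value, and the minimum vote count are all hypothetical, and the per-frame probabilities stand in for the output of some rotation classifier.

```python
from collections import Counter
from typing import Optional, Sequence

# Assumed candidate display orientations in degrees (hypothetical choice).
ROTATIONS = (0, 90, 180, 270)


def decide_rotation(
    frame_probs: Sequence[Sequence[float]],
    threshold: float = 0.8,
    min_votes: int = 2,
) -> Optional[int]:
    """Pick a rotation from per-frame class probabilities.

    Each entry of frame_probs holds one probability per angle in ROTATIONS.
    A frame votes only if its top probability reaches the threshold; the
    winning angle is returned only if it collects at least min_votes votes.
    Returns None when the evidence is too weak to change the orientation.
    """
    votes: Counter = Counter()
    for probs in frame_probs:
        best = max(range(len(ROTATIONS)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            votes[ROTATIONS[best]] += 1
    if not votes:
        return None
    rotation, count = votes.most_common(1)[0]
    return rotation if count >= min_votes else None


# Two confident frames agree on 90 degrees; the third is too uncertain to vote.
frames = [
    [0.05, 0.90, 0.03, 0.02],
    [0.10, 0.85, 0.03, 0.02],
    [0.40, 0.30, 0.20, 0.10],
]
print(decide_rotation(frames))
```

Returning `None` for weak evidence leaves the current orientation unchanged, which is one plausible way to avoid spurious rotations from a single uncertain frame.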
Description
An important objective of AI is to understand real-world observations and establish interactive communication with people. The ability to interpret and react to what is perceived makes it necessary to develop such systems across both modalities, Vision (V) and Language (L). Although there have been massive efforts on various VL tasks, e.g., Image/Video Captioning, Visual Question Answering, and Textual Grounding, very few of them focus on building VL models with increased efficiency under real-world scenarios. The main focus of this dissertation is to comprehensively investigate the largely uncharted area of efficient VL learning, aiming to build lightweight, data-efficient, and real-world applicable VL models. The proposed studies take three primary aspects of efficient VL into account: 1) Data Efficiency: collecting task-specific annotations is prohibitively expensive, so manual labeling is not always attainable. Techniques are developed to assist VL learning from implicit supervision, i.e., in a weakly-supervised fashion. 2) Continuing from that, efficient representation learning is further explored with increased scalability, leveraging a large image-text corpus without task-specific annotations. In particular, the knowledge distillation technique is studied for generic representation learning, where it proves to bring substantial performance gains over the regular representation learning schema. 3) Architectural Efficiency: deploying VL models on edge devices is notoriously challenging because of their cumbersome architectures. To further extend these advancements to the real world, a novel efficient VL architecture is designed to tackle the inference bottleneck and the inconvenient two-stage training. Several critical aspects that prominently influence the performance of compact VL models are discussed extensively.
Contributors: Fang, Zhiyuan (Author) / Yang, Yezhou (Thesis advisor) / Baral, Chitta (Committee member) / Liu, Huan (Committee member) / Liu, Zicheng (Committee member) / Arizona State University (Publisher)
Created: 2022