Building Vision and Language Models with Implicit Supervision and Increased Efficiency

Description

An important objective of AI is to understand real-world observations and to communicate interactively with people. Interpreting and reacting to perception requires developing such systems across both the Vision (V) and Language (L) modalities. Although massive effort has gone into various VL tasks, e.g., Image/Video Captioning, Visual Question Answering, and Textual Grounding, very little of it focuses on building VL models with increased efficiency under real-world scenarios. The main focus of this dissertation is to comprehensively investigate the largely uncharted area of efficient VL learning, aiming to build lightweight, data-efficient, and real-world applicable VL models. The proposed studies take three primary aspects of efficient VL into account. (1) Data efficiency: collecting task-specific annotations is prohibitively expensive, so manual annotation is not always attainable. Techniques are developed to assist VL learning from implicit supervision, i.e., in a weakly supervised fashion. (2) Efficient representation learning: building on this, representation learning is further explored with increased scalability, leveraging a large image-text corpus without task-specific annotations. In particular, knowledge distillation is studied for generic representation learning and is shown to bring a substantial performance gain over the regular representation learning schema. (3) Architectural efficiency: deploying VL models on edge devices is notoriously challenging because of their cumbersome architectures. To extend these advances to the real world, a novel efficient VL architecture is designed to tackle the inference bottleneck and the inconvenient two-stage training. Several critical aspects that strongly influence the performance of compact VL models are also discussed extensively.

Contributors: Fang, Zhiyuan (Author) / Yang, Yezhou (Thesis advisor) / Baral, Chitta (Committee member) / Liu, Huan (Committee member) / Liu, Zicheng (Committee member) / Arizona State University (Publisher)

Created: 2022

Simultaneous Two-Color Lasing in a Single CdSSe Heterostructure Nanosheet

Description

The ability of a single monolithic semiconductor structure to emit or lase over a broad spectral range is of great importance for many applications, such as solid-state lighting and multi-spectrum detection. However, the spectral range of a laser or light-emitting diode made of a given semiconductor is typically limited by its emission or gain bandwidth. Because of lattice mismatch, it is typically difficult to grow thin-film or bulk materials with very different bandgaps in a monolithic fashion. Nanomaterials such as nanowires, nanobelts, and nanosheets, however, provide a unique opportunity. Here we report experimental results demonstrating simultaneous lasing in two visible colors, at 526 and 623 nm, from a single CdSSe heterostructure nanosheet at room temperature. The 97 nm wavelength separation of the two colors is significantly larger than the gain bandwidth of a typical single II-VI semiconductor material. Such lasing and light emission over a wide spectral range from a single monolithic structure will be important for the applications mentioned above.

Contributors: Fan, Fan (Author) / Liu, Zicheng (Author) / Yin, Leijun (Author) / Nichols, Patricia L. (Author) / Ning, H. (Author) / Turkdogan, Sunay (Author) / Ning, Cun-Zheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2013-10-28