
Video Captioning with Commonsense Knowledge Anchors

Description

It is not merely an aggregation of static entities that a video clip carries, but also a variety of interactions and relations among these entities. Challenges remain for a video captioning system to generate natural language descriptions that focus on the content of primary interest and align with latent aspects beyond what is directly observed. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of the video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, each training caption is complemented with commonsense knowledge queried from the generic knowledge atlas ATOMIC, forming a commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator during training of the video captioning model, penalizing the model for generating semantically misaligned captions. In extensive empirical evaluations on the MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of the generated captions and achieves a higher keyword hit rate. Ablation experiments validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.
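The abstract does not give the exact training objective, so the following is only a minimal sketch of how an entailment discriminator could penalize misaligned captions. It assumes the captioning loss is a mean token negative log-likelihood and that the penalty is the negative log of a hypothetical entailment probability, scaled by an assumed weight `lam`. How gradients would flow through sampled text is not specified in the abstract and is outside this sketch.

```python
import math

def caption_loss(token_log_probs, entailment_prob, lam=0.5, eps=1e-8):
    """Hypothetical objective: caption NLL plus a commonsense penalty.

    token_log_probs: log-probabilities the captioning model assigned to each
        ground-truth token (assumed to be computed elsewhere).
    entailment_prob: probability, from an assumed BERT-style entailment model,
        that the commonsense statements entail the generated caption.
    lam: assumed weight of the penalty term (not taken from the source).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)  # mean token NLL
    penalty = -math.log(max(entailment_prob, eps))      # grows as alignment drops
    return nll + lam * penalty
```

With the same token likelihoods, a caption the discriminator judges misaligned (low entailment probability) receives a larger loss than an aligned one, and a fully entailed caption reduces to the plain NLL.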

Contributors: Shao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)

Created: 2022

Vehicle Re-identification Using a Multi-View Vehicle Dataset

Description

The amount of data on the internet, especially image data, has exploded with modern technology as a consequence of exponential growth in the number of cameras in the world: from ever more extensive surveillance camera systems to billions of people walking around with smartphones that have built-in cameras. With this sudden increase in the accessibility of cameras, most of the data captured by these devices ends up on the internet. Researchers soon took advantage of this data by creating large-scale datasets. However, generating a dataset, let alone a large-scale one, requires many man-hours. This work presents an algorithm that uses optical flow and feature matching, together with localization outputs from a Mask R-CNN, to generate large-scale vehicle datasets with little human supervision. Additionally, this work proposes a novel multi-view vehicle dataset (MVVdb) of 500 vehicles, which is also generated using this algorithm. Various research problems in computer vision can leverage a multi-view dataset, e.g., 3D pose estimation and 3D object detection. A multi-view vehicle dataset in particular can be used for 2D-image-to-3D-shape prediction, generation of 3D vehicle models, and even more robust vehicle make and model recognition. In this work, a ResNet is trained on the multi-view vehicle dataset to perform vehicle re-identification, which is fundamentally similar to vehicle make and model recognition, also showcasing the usability of the MVVdb dataset.
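One step such a pipeline needs is linking per-frame detections of the same vehicle into a single identity. The sketch below swaps in a simpler technique than the one described: greedy intersection-over-union (IoU) matching between consecutive frames, instead of optical flow and feature matching. The box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_detections(frames, threshold=0.5):
    """Greedily link per-frame boxes into tracks (one track per vehicle).

    frames: list of lists of boxes, one inner list per video frame, e.g. the
        localization output of a detector such as Mask R-CNN.
    Returns a list of tracks; each track is a list of (frame_index, box).
    """
    tracks = []
    active = []  # indices of tracks extended in the previous frame
    for t, boxes in enumerate(frames):
        next_active, used = [], set()
        for box in boxes:
            best, best_iou = None, threshold
            for k in active:
                if k in used:
                    continue  # each track takes at most one box per frame
                score = iou(tracks[k][-1][1], box)
                if score >= best_iou:
                    best, best_iou = k, score
            if best is None:
                tracks.append([(t, box)])  # start a new vehicle track
                next_active.append(len(tracks) - 1)
            else:
                tracks[best].append((t, box))
                used.add(best)
                next_active.append(best)
        active = next_active
    return tracks
```

Each resulting track groups crops of one vehicle across viewpoints, which is the kind of grouping a multi-view dataset needs; optical flow would make the association more robust to large motions than raw IoU.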

Contributors: Guha, Anubhab (Author) / Yang, Yezhou (Thesis advisor) / Lu, Duo (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)

Created: 2022

Context-aware rank-oriented recommender systems

Description

Recommender systems are a type of information filtering system that suggests items that may be of interest to a user. Most information retrieval systems have an overwhelmingly large number of entries, and most users would experience information overload if they were forced to explore the full set of results. The goal of recommender systems is to overcome this limitation by predicting how users will value certain items and returning the items that should be of the highest interest to the user. Most recommender systems collect explicit user feedback, such as a rating, and attempt to optimize their model to this rating value. However, a system can also collect implicit user feedback, such as purchases and clicks, to learn user preferences. Additionally, with implicit user feedback, the system can record the context of the feedback, that is, which other items the user was considering when making a decision. When considering implicit user feedback, only a subset of all evaluation techniques can be used, and sufficient techniques for evaluating implicit user feedback do not currently exist. In this thesis, I introduce a new model for recommendation that borrows the idea of opportunity cost from economics. There are two variations of the model: one that considers context and one that does not. Additionally, I propose a new evaluation measure designed specifically for implicit user feedback.
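To make the idea of context in implicit feedback concrete, here is a minimal sketch assuming each session records the items a user considered and the one they chose. It turns sessions into "chosen beats other" preference pairs and scores a model by the fraction of pairs it orders correctly. This generic pairwise accuracy is a stand-in for illustration only; it is not the opportunity-cost model or the new evaluation measure proposed in the thesis, and the session format is hypothetical.

```python
def preference_pairs(sessions):
    """Turn implicit-feedback sessions into (preferred, other) item pairs.

    Each session is (considered_items, chosen_item). Assumption: the chosen
    item is preferred over every other item considered in the same session.
    """
    pairs = []
    for considered, chosen in sessions:
        pairs.extend((chosen, other) for other in considered if other != chosen)
    return pairs

def pairwise_accuracy(scores, pairs):
    """Fraction of preference pairs that a scoring model orders correctly."""
    if not pairs:
        return 0.0
    correct = sum(1 for win, lose in pairs if scores[win] > scores[lose])
    return correct / len(pairs)
```

Because only relative orderings within a session are used, no rating values are needed, which is why rank-oriented evaluation suits clicks and purchases better than rating-error metrics.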

Contributors: Ackerman, Brian (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2012

A 1.2V 25MSPS pipelined ADC using split CLS with Op-amp sharing

Description

As technology feature sizes shrink, achieving high gain becomes very difficult in deep sub-micron technologies. As supply voltages drop, cascodes become very difficult to implement, and cascaded amplifiers are needed to achieve sufficient gain with the required output swing. This sets the fundamental limit on the SNR and hence on the maximum resolution the ADC can achieve. With the RSD algorithm and range overlap, the sub-ADC can tolerate large comparator offsets, leaving the linearity and accuracy requirements to the DAC and residue gain stage. Typically, the multiplying DAC requires a high-gain, wide-bandwidth op-amp, and the design of such an op-amp becomes challenging in deep sub-micron technologies. This work presents "A 12-bit 25MSPS 1.2V pipelined ADC using split CLS technique" in the IBM 130nm 8HP process, using only CMOS devices, for Large Hadron Collider (LHC) applications. The CLS technique relaxes the op-amp gain requirement and, with rail-to-rail swing, improves the signal-to-noise ratio without increasing power or the input sampling capacitance. An op-amp sharing technique has been combined with the split CLS technique, which reduces the number of op-amps and hence the power further. The entire pipelined converter has been implemented as six 2.5-bit RSD stages, which decreases the latency associated with the pipelined architecture, one of the main requirements for the LHC along with the power requirement. Two different OTAs have been designed for use in the split CLS technique. Bootstrap switches and pass-gate switches are used in the circuit, along with a low-power dynamic kick-back-compensated comparator.
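The offset tolerance that the RSD algorithm gives the sub-ADC can be checked with a behavioral model. The sketch below assumes textbook 2.5-bit stages: six comparators with nominal thresholds at ±Vref/8, ±3Vref/8 and ±5Vref/8, a code d in {-3, ..., 3}, and an ideal residue 4·Vin − d·Vref. It ignores finite op-amp gain, CLS, noise and the back-end details of the actual design, so it illustrates only the redundancy argument, not the thesis circuit.

```python
import random

def rsd_stage(vin, vref=1.0, offsets=(0.0,) * 6):
    """Behavioral 2.5-bit RSD pipeline stage with ideal residue gain of 4.

    Six comparators (nominal thresholds +/-1/8, +/-3/8, +/-5/8 of vref, each
    shifted by its entry in `offsets`) give a code d in {-3, ..., 3}.
    Returns (d, residue) with residue = 4*vin - d*vref.
    """
    thresholds = [k * vref / 8 for k in (-5, -3, -1, 1, 3, 5)]
    d = sum(vin > th + off for th, off in zip(thresholds, offsets)) - 3
    return d, 4 * vin - d * vref

def convert(vin, n_stages=6, vref=1.0, offsets=(0.0,) * 6):
    """Pass vin through n_stages identical stages and rebuild it digitally."""
    codes, v = [], vin
    for _ in range(n_stages):
        d, v = rsd_stage(v, vref, offsets)
        codes.append(d)
    # Digital correction: overlapping stage codes are summed with weight 4**-(i+1).
    estimate = sum(d * vref / 4 ** (i + 1) for i, d in enumerate(codes))
    return codes, estimate
```

As long as every comparator offset stays below Vref/8, each residue remains within ±Vref, so the reconstruction error after six stages is bounded by Vref/4⁶ regardless of the offsets. This is the sense in which accuracy is shifted from the comparators to the DAC and residue gain stage.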

Contributors: Swaminathan, Visu Vaithiyanathan (Author) / Barnaby, Hugh (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Christen, Jennifer Blain (Committee member) / Arizona State University (Publisher)

Created: 2012

Towards Robust VQA: Evaluations and Methods

Description

Visual Question Answering (VQA) is an increasingly important multi-modal task in which models must answer textual questions about visual image inputs. Numerous VQA datasets have been proposed to train and evaluate models. However, existing benchmarks focus one-sidedly on textual distribution shifts rather than joint shifts across modalities, which is suboptimal for properly assessing model robustness and generalization. To address this gap, a novel multi-modal VQA benchmark dataset is introduced that combines both visual and textual distribution shifts across training and test sets. This challenging benchmark exposes vulnerabilities in existing models that rely on spurious correlations and overfit to dataset biases. The dataset advances the field by enabling more robust model training and rigorous evaluation of generalization under multi-modal distribution shift. In addition, a new few-shot multi-modal prompt fusion model is proposed to better adapt models to downstream VQA tasks. The model incorporates a prompt encoder module and a dual-path design to align and fuse image and text prompts, a novel prompt learning approach tailored to multi-modal learning across vision and language. Together, the benchmark dataset and prompt fusion model address key limitations in evaluating and improving VQA model robustness, expanding the methodology for training models that are resilient to multi-modal distribution shifts.

Contributors: Jyothi Unni, Suraj (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Bryan, Chris (Committee member) / Arizona State University (Publisher)

Created: 2023

A Scenario-Based Test Selection and Scoring Methodology for Inclusion Into a Safety Case Framework for Automated Vehicles

Description

The need for robust verification and validation of automated vehicles (AVs) to ensure driving safety grows more urgent as increasing numbers of AVs are allowed to operate on open roads. To address this need, AV developers can present a safety case to regulators and the public that provides an evidence-based justification of their assertion that an AV is safe to operate on open roads. This work describes the development of a scenario-based testing methodology that contributes to this safety case. A high-level definition of this test selection and scoring methodology (TSSM) is first presented, along with an outline of its scope and key ideas. This is followed by a literature review detailing the current state of the art in AV testing, including the driving performance metrics and equations that provide a basis for the TSSM. A chart-based method for quantifying an AV’s operational design domain (ODD) and behavioral competency portfolio is then described; it provides the foundation for a scenario generation and filtration process. After a method is outlined for the AV to progress through increasingly robust test methods based on its current technology readiness level (TRL), the TSSM's generation and filtration of two sets of scenarios is described: a standardized set that can be used to compare the performance of vehicles with identical ODD and behavioral competency portfolios, and a set of high-relevance scenarios that is partially randomized to ensure test integrity. A related framework for incorporating testing on open roads is subsequently specified. An equation for an overall AV driving performance score is then defined that quantifies the aggregate performance of the AV across all generated scenarios. The TSSM proceeds iteratively, including a method for exploring edge and corner scenarios, until a stopping condition is reached.
Two proofs of concept are provided: a demonstration of the ability of the TSSM to pare scenarios from a preexisting database, and an example ODD and behavioral competency portfolio specification form. Finally, this work concludes by evaluating the TSSM and its proofs of concept and outlining possible future work on the methodology.
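The filtration and scoring steps can be pictured with a small sketch. It assumes each scenario lists the ODD attributes and behavioral competencies it requires, keeps only scenarios that fall entirely within the AV's ODD and competency portfolio, and aggregates per-scenario scores as a weighted mean. The `Scenario` fields, the relevance weights and the weighted-mean form are all hypothetical; the thesis defines its own overall performance equation, which this does not reproduce.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    odd_elements: set   # ODD attributes the scenario requires (hypothetical)
    competencies: set   # behavioral competencies it exercises (hypothetical)
    weight: float = 1.0  # assumed relevance weight

def filter_scenarios(database, av_odd, av_competencies):
    """Keep scenarios that lie entirely inside the AV's ODD and portfolio."""
    return [s for s in database
            if s.odd_elements <= av_odd and s.competencies <= av_competencies]

def overall_score(scenarios, scores):
    """Weighted mean of per-scenario scores in [0, 1] (assumed aggregation)."""
    total_weight = sum(s.weight for s in scenarios)
    if total_weight == 0:
        return 0.0
    return sum(s.weight * scores[s.name] for s in scenarios) / total_weight
```

For example, an AV whose ODD excludes snow would have a snowy-turn scenario pared from the database, and the remaining scores would be combined with the higher-relevance scenarios counting more.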

Contributors: O'Malley, Gavin (Author) / Wishart, Jeffrey (Thesis advisor) / Zhao, Junfeng (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2023

Built-in Self-Test for RF Impedance Measurement

Description

Impedance is one of the fundamental properties of electrical components, materials, and waves; therefore, impedance measurement and monitoring have a wide range of applications. The multi-port technique is a natural candidate for impedance measurement and monitoring because of its low overhead and ease of implementation in Built-in Self-Test (BIST) applications. The multi-port technique can measure complex reflection coefficients, and thus impedance, using scalar measurements provided by power detectors. These power detectors are strategically placed at different points (ports) of a passive network to produce a unique solution. Impedance measurement and monitoring are readily deployed in mobile phone radio-frequency (RF) front ends, where they are combined with antenna tuners to boost the signal reception capabilities of phones. These sensors can also be used in self-healing circuits to improve yield and performance under process, voltage, and temperature variations. Even though this work is primarily interested in low-overhead impedance measurement for RF circuit applications, the proposed methods can be used in a wide variety of metrology applications where impedance measurements are already used, such as determining material properties, plasma generation, and moisture detection. Additionally, multi-port applications extend beyond impedance measurement: multi-ports are also used as receivers in communication systems, RADARs, and remote sensing. The multi-port technique generally requires careful design of the test structure to produce a unique solution from the power detector measurements. It also requires nonlinear solvers during calibration and, depending on the calibration procedure, during measurement. Nonlinear solvers raise issues of convergence, computational complexity, and the resources needed to carry out calibrations and measurements in a timely manner.
In this work, periodic structures, in which a circuit block repeats itself, are proposed for multi-port measurements. The periodic structure introduces a new constraint that simplifies multi-port theory and leads to explicit calibration and measurement procedures. Unlike existing calibration procedures, which require at least five loads and various constraints on the loads for an explicit solution, the proposed method can use three loads for calibration. Multi-ports built with periodic structures always produce a unique measurement result, which increases the bandwidth of operation and simplifies the design procedure. The efficacy of the method is demonstrated in two embodiments. In the first, a multi-port is embedded directly into a matching network to measure the impedance of the load. In the second, periodic structures are used to compare two loads without requiring any calibration.
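The core of any multi-port measurement is recovering a complex reflection coefficient Γ from scalar power readings. The sketch below uses a generic detector model, P_i = |a_i + b_i·Γ|², with the complex coefficients a_i and b_i assumed already known from calibration and the incident wave normalized to 1. Treating |Γ|² as a third unknown makes the system linear. As an assumed illustration of a periodic arrangement, the usage picks detectors tapped at equal electrical spacing along a line (a_i = 1, b_i = e^{j2φi}). This is not the explicit calibration procedure developed in the thesis.

```python
import numpy as np

def solve_gamma(powers, a, b):
    """Recover a complex reflection coefficient from scalar detector readings.

    Assumed detector model (normalized incident wave):
        P_i = |a_i + b_i*gamma|**2
            = |a_i|**2 + |b_i|**2 * |gamma|**2
              + 2*Re(conj(a_i)*b_i)*Re(gamma) - 2*Im(conj(a_i)*b_i)*Im(gamma)
    Unknowns (|gamma|**2, Re(gamma), Im(gamma)) enter linearly, so three or
    more well-placed detectors give a direct least-squares solution.
    """
    a, b = np.asarray(a), np.asarray(b)
    powers = np.asarray(powers, dtype=float)
    c = np.conj(a) * b
    matrix = np.column_stack([np.abs(b) ** 2, 2 * c.real, -2 * c.imag])
    rhs = powers - np.abs(a) ** 2
    _, x, y = np.linalg.lstsq(matrix, rhs, rcond=None)[0]
    return complex(x, y)

def gamma_to_impedance(gamma, z0=50.0):
    """Convert a reflection coefficient to impedance for reference impedance z0."""
    return z0 * (1 + gamma) / (1 - gamma)

# Usage: three taps spaced 60 degrees apart electrically (assumed geometry).
a = np.ones(3)
b = np.exp(2j * (np.pi / 3) * np.arange(3))
gamma_true = 0.3 - 0.2j
readings = np.abs(a + b * gamma_true) ** 2
print(gamma_to_impedance(solve_gamma(readings, a, b)))
```

The hard part in practice is obtaining a_i and b_i, which normally requires nonlinear fitting against known loads; that calibration burden is what the abstract says periodic structures simplify.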

Contributors: Avci, Muslum Emir (Author) / Ozev, Sule (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Kitchen, Jennifer (Committee member) / Trichopoulos, Georgios (Committee member) / Arizona State University (Publisher)

Created: 2023

X-Band and K-Band Balanced Power Amplifiers for Small Satellite Applications

Description

This work presents two balanced power amplifier (PA) architectures, one at X-band and the other at K-band, designed for use in small satellite and CubeSat applications. The X-band PA employs wideband hybrid couplers to split the input power between two commercial off-the-shelf (COTS) Gallium Nitride (GaN) monolithic microwave integrated circuit (MMIC) PAs and to combine their output powers. Manufactured on a Rogers 4003C substrate, the X-band balanced PA under continuous-wave (CW) operation yields higher small-signal gain and saturated output power than the single MMIC PA used in the design does under pulsed operation. The PA operates from 7.5 GHz to 11.5 GHz, with a maximum small-signal gain of 36.3 dB, a maximum saturated output power of 40.0 dBm, and a maximum power-added efficiency (PAE) of 38%. Both Wilkinson and Gysel splitters and combiners are designed for use at K-band, and their performance is compared. The K-band balanced PA uses Gysel power dividers and combiners with a GaN MMIC PA that is soon to be released in production.
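Why a balanced topology helps can be shown with a wave-level model built from two ideal, frequency-independent 90° hybrids (the textbook quadrature-hybrid scattering matrix) and two identical, unilateral amplifiers. The amplifier gain `gain` and input reflection `gamma_in` are assumed parameters; real wideband couplers have amplitude and phase imbalance that this sketch ignores.

```python
import numpy as np

# Ideal 90-degree hybrid: port 1 input, 2 through, 3 coupled, 4 isolated.
HYBRID = -np.array([[0, 1j, 1, 0],
                    [1j, 0, 0, 1],
                    [1, 0, 0, 1j],
                    [0, 1, 1j, 0]]) / np.sqrt(2)

def balanced_amp(a_in, gain, gamma_in):
    """Wave-level model of a balanced PA made of two identical amplifiers.

    Returns (reflected wave at the input port, wave at the input hybrid's
    isolated port, wave at the output port, wave at the output dump port).
    """
    # Split: incident wave at port 1 of the input hybrid.
    split = HYBRID @ np.array([a_in, 0, 0, 0])
    upper, lower = split[1], split[2]  # waves driving the two PAs (90 deg apart)
    # Mismatch: each PA reflects gamma_in of its incident wave back.
    back = HYBRID @ np.array([0, gamma_in * upper, gamma_in * lower, 0])
    # Combine: the upper PA drives port 4 and the lower PA drives port 1 of an
    # identical output hybrid, so the two paths add in phase at port 2.
    out = HYBRID @ np.array([gain * lower, 0, 0, gain * upper])
    return back[0], back[3], out[1], out[2]

r_in, r_iso, y_out, y_dump = balanced_amp(1.0, 10.0, 0.3 + 0.1j)
```

The reflections from the two amplifiers cancel at the input port and are absorbed in the isolated port, so the balanced PA stays matched even when each MMIC is not. The overall gain equals that of one amplifier, while the output power is the sum of both devices' outputs, i.e. 3 dB more than either alone at the same per-device drive.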

Contributors: Pearson, Katherine Elizabeth (Author) / Kitchen, Jennifer (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Ozev, Sule (Committee member) / Arizona State University (Publisher)

Created: 2023

Investigating the Role of Silent Users on Social Media

Description

Social media platforms provide a rich environment for analyzing user behavior. Recently, deep learning-based methods have become a mainstream approach for social media analysis models involving complex patterns. However, these methods are susceptible to biases in the training data, such as participation inequality: a mere 1% of users generate the majority of the content on social networking sites, while the remaining users, though engaged to varying degrees, tend to be less active in content creation and largely silent. These silent users consume and listen to the information propagated on the platform. However, their voices, attitudes, and interests are not reflected in the online content, which predisposes the decisions of current methods towards the opinions of active users; models can thus mistake the loudest users for the majority. Making the silent majority heard reveals the true landscape of the platform. In this dissertation, to compensate for this bias in the data, which is related to user-level data scarcity, I introduce three pieces of research work. Two of the proposed solutions work with the data on hand, while the third augments the current data. Specifically, the first approach modifies the weight of users' activities and interactions in the input space, while the second re-weights the loss according to users' activity levels during downstream task training. Lastly, the third approach uses large language models (LLMs) to learn users' writing behavior and expand the current data; in other words, by using LLMs as a sophisticated knowledge base, this method aims to augment silent users' data.
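The second approach, re-weighting the loss by activity level, can be sketched as follows. The specific scheme here is an assumption for illustration: each user's weight is proportional to activity raised to the power −alpha (with a hypothetical `alpha`), normalized to mean 1, and each training example inherits its author's weight in a negative log-likelihood. The dissertation's actual weighting may differ.

```python
import numpy as np

def activity_weights(activity, alpha=0.5):
    """Per-user loss weights that grow as activity shrinks (assumed scheme).

    activity: posts/interactions per user (each assumed >= 1).
    alpha: hypothetical re-weighting strength; alpha = 0 gives uniform weights.
    Weights are normalized to average 1 so the overall loss scale is kept.
    """
    activity = np.asarray(activity, dtype=float)
    w = activity ** (-alpha)
    return w / w.mean()

def weighted_nll(probs_true_class, user_ids, weights):
    """Weighted mean negative log-likelihood; each example uses its user's weight."""
    p = np.clip(np.asarray(probs_true_class, dtype=float), 1e-12, 1.0)
    w = weights[np.asarray(user_ids)]
    return float(np.sum(w * -np.log(p)) / np.sum(w))
```

With this kind of weighting, an error on a post from a rarely active user costs more than the same error on a post from a prolific user, counteracting the dominance of the most active 1% in the training signal.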

Contributors: Karami, Mansooreh (Author) / Liu, Huan (Thesis advisor) / Sen, Arunabha (Committee member) / Davulcu, Hasan (Committee member) / Mancenido, Michelle V. (Committee member) / Arizona State University (Publisher)

Created: 2023

High Dynamic Range Power Amplifiers to Support Modern Communication Standards

Description

Recent advancements in communication standards such as 5G demand transmitter hardware that supports high data rates with high energy efficiency. As communication standards evolve, modulation schemes have become more complex and require signals with a high peak-to-average power ratio (PAPR). In wireless transceiver hardware, the power amplifier (PA) consumes most of the transceiver’s DC power and is typically the bottleneck for transmitter linearity, so the transmitter’s performance depends directly on the PA. To support high-PAPR signals, the PA must operate efficiently both at its saturated output power and at backoff, which is challenging. One effective technique for addressing this problem is load modulation. Prominent architectures for maintaining efficiency at backoff include load-modulated PAs such as outphasing PAs, load-modulated balanced amplifiers (LMBA), and Doherty power amplifiers (DPA), as well as supply-modulated approaches such as envelope elimination and restoration (EER), envelope tracking (ET), and polar transmitters. Among them, the DPA is the most popular for infrastructure applications because of its simpler architecture and its linearizability with digital pre-distortion (DPD). Another crucial characteristic of evolving communication standards is wide signal bandwidth. High-efficiency PAs such as class-J, class-F, and inverse class-F (F⁻¹) PAs, and load-modulated PAs such as the DPA, exhibit narrowband performance because they require precise output impedance terminations. It is therefore equally essential to develop adaptable PA solutions that process radio frequency (RF) signals with wide bandwidths. To support modern and future cellular infrastructure, RF PAs must be innovated to increase backoff power efficiency by two times or more and to support bandwidths ten times or more wider than current state-of-the-art PAs. This work presents five RF PA analyses and implementations to support future wireless communications transmitter hardware.
Chapter 2 presents an optimized output-matching network analysis and design to achieve extended output power backoff in the DPA. Chapters 3 and 4 unveil two bandwidth enhancement techniques for the DPA that maintain extended output power backoff. Chapter 5 presents a dual-band hybrid-mode PA design targeted at wideband applications. Chapter 6 presents a built-in self-test circuit integrated into a PA for output impedance monitoring, which can alleviate the PA performance degradation caused by variation of the PA's output load over frequency, process, and aging. All RF PAs in this dissertation are implemented using Gallium Nitride (GaN)-based high electron mobility transistors (HEMTs), and the realized designs validate the proposed PA theories and architectures.
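The backoff-efficiency advantage of load modulation can be made concrete with the classic idealized comparison between a class-B PA and a symmetric two-way Doherty. The model below assumes zero knee voltage, ideal class-B current waveforms, and a peaking device that turns on exactly at half the peak output voltage. It does not model the asymmetric or extended-backoff designs of the later chapters, which move the first efficiency peak beyond 6 dB.

```python
import math

def class_b_efficiency(v):
    """Ideal class-B drain efficiency at normalized output voltage v in (0, 1]."""
    return math.pi / 4 * v

def doherty_efficiency(v):
    """Ideal symmetric two-way Doherty efficiency at normalized output voltage v.

    Below v = 0.5 only the carrier amplifier conducts; it sees twice its
    optimum load, so it reaches peak efficiency at half the peak output
    voltage (about 6 dB backoff). Above v = 0.5 the peaking amplifier turns
    on and load-modulates the carrier back toward its optimum load.
    """
    if v <= 0.5:
        return math.pi / 2 * v
    return math.pi / 2 * v ** 2 / (3 * v - 1)

for v in (1.0, 0.75, 0.5, 0.25):
    backoff_db = 20 * math.log10(1 / v)
    print(f"{backoff_db:5.2f} dB backoff: class B {class_b_efficiency(v):.3f}, "
          f"Doherty {doherty_efficiency(v):.3f}")
```

At about 6 dB backoff the ideal class-B efficiency has fallen to π/8 (about 39%), while the ideal Doherty is back at its peak of π/4 (about 78.5%). This is why the DPA is attractive for high-PAPR signals, and it also shows how strongly the result relies on exact impedance transformations that are hard to hold over wide bandwidths.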

Contributors: Roychowdhury, Debatrayee (Author) / Kitchen, Jennifer (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Ozev, Sule (Committee member) / Aberle, James (Committee member) / Arizona State University (Publisher)

Created: 2024
