Matching Items (249)
Description
Modern technology has caused an explosion in the amount of data on the internet, especially image data, as a consequence of exponential growth in the number of cameras in the world, from more extensive surveillance camera systems to billions of people walking around with smartphones in their pockets that come with built-in cameras. With this sudden increase in the accessibility of cameras, most of the data captured through these devices ends up on the internet. Researchers soon took advantage of this data by creating large-scale datasets. However, generating a dataset, let alone a large-scale one, requires a lot of man-hours. This work presents an algorithm that uses optical flow and feature matching, along with localization outputs from a Mask R-CNN, to generate large-scale vehicle datasets without much human supervision. Additionally, this work proposes a novel multi-view vehicle dataset (MVVdb) of 500 vehicles, which is also generated using the aforementioned algorithm. Various research problems in computer vision can leverage a multi-view dataset, e.g., 3D pose estimation and 3D object detection. A multi-view vehicle dataset, in turn, can be used for 2D image to 3D shape prediction, generation of 3D vehicle models, and even more robust vehicle make and model recognition. In this work, a ResNet is trained on the multi-view vehicle dataset to perform vehicle re-identification, which is fundamentally similar to a vehicle make and model recognition problem, also showcasing the usability of the MVVdb dataset.
Contributors: Guha, Anubhab (Author) / Yang, Yezhou (Thesis advisor) / Lu, Duo (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
In recent years, there has been significant progress in deep learning and computer vision, with many proposed models achieving state-of-the-art results on various image recognition tasks. However, to explore the full potential of the advances in this field, there is an urgent need to push the processing of deep networks from the cloud to edge devices. Unfortunately, many deep learning models cannot be efficiently implemented on edge devices, as these devices are severely resource-constrained. In this thesis, I present QU-Net, a lightweight binary segmentation model based on the U-Net architecture. Traditionally, neural networks consider the entire image to be significant. However, in real-world scenarios, many regions in an image do not contain any objects of significance. These regions can be removed from the original input, allowing a network to focus on the relevant regions and thus reducing computational costs. QU-Net proposes the salient regions (as a binary mask) that deeper models can use as input. Experiments show that QU-Net helped achieve a computational reduction of 25% on the Microsoft Common Objects in Context (MS COCO) dataset and 57% on the Cityscapes dataset. Moreover, QU-Net is a generalizable model that outperforms other similar works, such as Dynamic Convolutions.
Contributors: Santhosh Kumar Varma, Rahul (Author) / Yang, Yezhou (Thesis advisor) / Fan, Deliang (Committee member) / Yang, Yingzhen (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The retinotopic map, the mapping between visual inputs on the retina and neuronal activation in the visual areas of the brain, is one of the central topics in visual neuroscience. For human observers, the map is typically obtained by analyzing functional magnetic resonance imaging (fMRI) signals of cortical responses to slowly moving visual stimuli on the retina. Biological evidence shows that retinotopic mapping is topology-preserving (topological) within each visual region, i.e., neighboring relationships are preserved after processing by the human brain. Unfortunately, due to the limited spatial resolution and signal-to-noise ratio of fMRI, state-of-the-art retinotopic maps are not topological. The aim of this work was to model the topology-preserving condition mathematically, fix non-topological retinotopic maps with numerical methods, and improve the quality of retinotopic maps. Imposing the topological condition benefits several applications. With topological retinotopic maps, one may gain better insight into human retinotopic maps, including better quantification of the cortical magnification factor, a more precise description of retinotopic maps, and potentially better examination methods in ophthalmology clinics.
Contributors: Tu, Yanshuai (Author) / Wang, Yalin (Thesis advisor) / Lu, Zhong-Lin (Committee member) / Crook, Sharon (Committee member) / Yang, Yezhou (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
A video clip carries not merely an aggregation of static entities but also a variety of interactions and relations among these entities. Challenges remain for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried from ATOMIC, a generic knowledge atlas, to complement each training caption and form the commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator for the training of the video captioning model and penalizes the model for generating semantically misaligned captions. With extensive empirical evaluations on the MSR-VTT, V2C, and VATEX datasets, CAVAN consistently improves the quality of generations and shows a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.
Contributors: Shao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Distributed self-assessments and reflections empower learners to take the lead in evaluating their own knowledge gains. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources of practice opportunities are available to learners, especially in the Computer Science (CS) and programming domain. Learners may choose to use these opportunities to self-assess their learning progress and practice their skills. My objective in this thesis is to understand to what extent the self-assessment process can impact novice programmers' learning and what advanced learning technologies I can provide to enhance learners' outcomes and progress. In this dissertation, I conducted a series of studies to investigate learning analytics and students' behaviors in working on self-assessment and reflection opportunities. To pursue this objective, I designed a personalized learning platform named QuizIT that provides daily quizzes to support learners in the computer science domain. QuizIT adopts an Open Social Student Model (OSSM) that supports personalized learning and serves as a self-assessment system. It aims to ignite self-regulating behavior and engage students in the self-assessment and reflective procedure. I designed a personalized practice recommender and integrated it into the platform to investigate the self-assessment process. I also evaluated the self-assessment behavioral trails as a predictor of students' performance. The statistical indicators suggested that the distributed reflections were associated with learners' performance. I proceeded to address whether distributed reflections enable self-regulating behavior and lead to better learning in introductory CS courses. From the student interactions with the system, I found distinct behavioral patterns that showed early signs of the learners' performance trajectory.
The use of the personalized recommender improved students' engagement and performance in the self-assessment procedure. When I focused on enhancing the impact of reflections during self-assessment sessions through weekly opportunities, learners in the CS domain showed better self-regulated learning behavior when using those opportunities. The weekly reflections provided by the learners captured more reflective features than the daily opportunities. Overall, this dissertation demonstrates the effectiveness of learning technologies, including adaptive recommendation and reflection, in supporting novice programming learners and their self-assessment processes.
Contributors: Alzaid, Mohammed (Author) / Hsiao, Ihan (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / VanLehn, Kurt (Committee member) / Nelson, Brian (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Multimodal reasoning is one of the most interesting research fields because of the ability to interact with systems and the explainability of the models' behavior. Traditional multimodal research problems do not focus on complex commonsense reasoning (such as physical interactions). Videos often capture objects, their motion, and the interactions between different objects. Although real-world objects have physical properties associated with them, many of these properties (such as mass and coefficient of friction) are not captured directly by the imaging pipeline. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. This thesis introduces a new video question-answering task for reasoning about the implicit physical properties of objects in a scene from videos. For this task, I introduce a dataset, CRIPP-VQA (Counterfactual Reasoning about Implicit Physical Properties - Video Question Answering), which contains videos of objects in motion, annotated with hypothetical/counterfactual questions about the effect of actions (such as removing, adding, or replacing objects), questions about planning (choosing actions to perform to reach a particular goal), and descriptive questions about the visible properties of objects. Further, I benchmark the performance of existing video question-answering models on two test settings of CRIPP-VQA: an i.i.d. setting and an out-of-distribution setting that contains objects with values of mass, coefficient of friction, and initial velocity that are not seen in the training distribution. Experiments reveal a surprising and significant performance gap between answering questions about implicit properties (the focus of this thesis) and explicit properties (the focus of prior work) of objects.
Contributors: Patel, Maitreya Jitendra (Author) / Yang, Yezhou (Thesis advisor) / Baral, Chitta (Committee member) / Lee, Kookjin (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
The increasing availability of data and advances in computation have spurred the development of data-driven approaches for modeling complex dynamical systems. These approaches are based on the idea that the underlying structure of a complex system can be discovered from data using mathematical and computational techniques. They also show promise for addressing the challenges of modeling high-dimensional, nonlinear systems with limited data. In this expository research, the state of the art in data-driven approaches for modeling complex dynamical systems is surveyed in a systematic way. First, the general formulation of data-driven modeling of dynamical systems is discussed. Then, several representative methods in feature engineering and system identification/prediction are reviewed, including recent advances and key challenges.
Contributors: Shi, Wenlong (Author) / Ren, Yi (Thesis advisor) / Hong, Qijun (Committee member) / Jiao, Yang (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
To date, there is no standardized method for consistently quantifying the performance of an automated driving system (ADS)-equipped vehicle (AV). The purpose of this dissertation is to contribute to a framework for such an approach, referred to throughout as the operational safety assessment (OSA) methodology. Through this research, safety metrics are identified, researched, and analyzed to capture aspects of the operational safety of AVs interacting with other salient objects. This dissertation outlines the approach for developing this methodology through a series of key steps, including: (1) a comprehensive literature review; (2) research and refinement of OSA metrics; (3) generation of a MATLAB script for metric calculations; (4) generation of simulated events for analysis; (5) collection of real-world data for analysis; (6) review of OSA methodology results; and (7) discussion of future work to expand the complexity, fidelity, and relevance aspects of the OSA methodology. The detailed literature review includes the identification of metrics historically used in both traditional and more recent evaluations of vehicle performance. Subsequently, the metric formulations are refined, and robust severity evaluations are proposed. A MATLAB script is then presented that calculates the metrics from any given source, assuming the data are properly formatted. To further refine the formulations and the MATLAB script, a variety of simulated scenarios are discussed, including car-following, intersection, and lane-change situations. Additionally, a data collection activity is presented, leveraging the SMARTDRIVE testbed operated by the Maricopa County Department of Transportation in Anthem, AZ, to collect real-world data from an active intersection. Lastly, the efficacy of the OSA methodology in evaluating vehicle performance for a set of scenarios is assessed using both simulated and real-world data.
This assessment demonstrates the ability and robustness of the methodology to evaluate vehicle performance for a given scenario. At the conclusion of this dissertation, additional factors including fidelity, complexity, and relevance are explored to contribute to a more comprehensive evaluation.
Contributors: Como, Steven Gerard (Author) / Wishart, Jeffrey (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Chen, Yan (Committee member) / Favaro, Francesca (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
With the rise of machine learning, a massive amount of data has been used in the training of machine learning models. A tremendous amount of this data is user-generated, which allows machine learning models to produce accurate results and personalized services. Nevertheless, I recognize the importance of preserving the privacy of individuals by protecting their information in the training process. One privacy attack that affects individuals is the private attribute inference attack: the process of inferring information that individuals do not explicitly reveal, such as age, gender, location, and occupation. The impact of this attack goes beyond the exposure of the information itself, as individuals face potential risks. Furthermore, some applications need sensitive data to train models and produce helpful insights, and figuring out how to build privacy-preserving machine learning models will increase the capabilities of these applications. However, improving privacy affects data utility, which leads to a dilemma between privacy and utility. The utility of the data is measured by the quality of the data for different tasks. This trade-off between privacy and utility needs to be balanced to satisfy both the privacy requirement and the result quality. To achieve more scalable privacy-preserving machine learning models, I investigate the privacy risks that affect individuals' private information in distributed machine learning. Even though distributed machine learning has been driven by privacy concerns, privacy issues that threaten individuals' privacy have been identified in the literature. In this dissertation, I investigate how to measure and protect individuals' privacy in centralized and distributed machine learning models. First, a privacy-preserving text representation learning method is proposed to protect users' private information that can be revealed from user-generated data.
Second, a novel privacy-preserving text classification method for split learning is presented to improve users' privacy and retain high utility by defending against private attribute inference attacks.
Contributors: Alnasser, Walaa (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Shu, Kai (Committee member) / Bao, Tiffany (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Social media platforms provide a rich environment for analyzing user behavior. Recently, deep learning-based methods have become a mainstream approach for social media analysis models involving complex patterns. However, these methods are susceptible to biases in the training data, such as participation inequality. A mere 1% of users generate the majority of the content on social networking sites, while the remaining users, though engaged to varying degrees, tend to be less active in content creation and largely silent. These silent users consume and listen to information that is propagated on the platform. However, their voices, attitudes, and interests are not reflected in the online content, which predisposes the decisions of current methods towards the opinions of the active users. As a result, models can mistake the loudest users for the majority. To make the silent majority heard is to reveal the true landscape of the platform. In this dissertation, to compensate for this bias in the data, which is related to user-level data scarcity, I introduce three pieces of research work. Two of the proposed solutions work with the data at hand, while the third augments the current data. Specifically, the first approach modifies the weight of users' activity/interaction in the input space, while the second approach re-weights the loss based on users' activity levels during downstream task training. Lastly, the third approach uses large language models (LLMs) and learns a user's writing behavior to expand the current data. In other words, by utilizing LLMs as a sophisticated knowledge base, this method aims to augment the silent users' data.
Contributors: Karami, Mansooreh (Author) / Liu, Huan (Thesis advisor) / Sen, Arunabha (Committee member) / Davulcu, Hasan (Committee member) / Mancenido, Michelle V. (Committee member) / Arizona State University (Publisher)
Created: 2023
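
As an illustration of the loss re-weighting idea mentioned in the abstract above (Karami), the following is a minimal sketch rather than the dissertation's actual method. It assumes a simple inverse-activity weighting scheme; the function names `activity_weights` and `weighted_log_loss` and the `smoothing` parameter are hypothetical and chosen only for this example.

```python
import math


def activity_weights(post_counts, smoothing=1.0):
    """Hypothetical inverse-activity weights, normalized to have mean 1.

    Users with fewer posts (the "silent" users) receive larger weights.
    """
    raw = [1.0 / (count + smoothing) for count in post_counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_log_loss(probs, labels, weights):
    """Binary cross-entropy where each user's term is scaled by its weight."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# Three users with 0, 9, and 99 posts: the silent user gets the largest weight.
weights = activity_weights([0, 9, 99])
loss = weighted_log_loss([0.7, 0.6, 0.9], [1, 0, 1], weights)
```

With equal weights this reduces to the ordinary mean cross-entropy, so the weighting only changes how much each user's error contributes, not the form of the loss.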