Towards Learning Representations in Visual Computing Tasks

Description

The performance of most visual computing tasks depends on the quality of the features extracted from the raw data. An insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for unobserved input. A good representation should also handle anomalies in the data, such as missing samples and noisy input caused by undesired external factors of variation, and it should reduce data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor; however, the resulting feature extraction process is interpretable and explainable. The second group contains the latent-feature extraction processes. While the original features lie in a high-dimensional space, the relevant factors for a task often lie on a lower-dimensional manifold. Latent-feature extraction employs hidden variables to expose underlying data properties that cannot be directly measured from the input. It imposes a specific structure, such as sparsity or low rank, on the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network whose parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work in which I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aesthetics. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For the last two tasks, I propose novel deep architectures and show significant improvement over the previous state-of-the-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks.

Contributors: Chandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2017

Content Detection in Handwritten Documents

Description

Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, a region of the document containing a mathematical expression would be labelled math. This differentiation makes it possible to run a specific recognition task for each content type. We hypothesize that the recognition accuracy of subsequent tasks such as text, math, and shape recognition will increase as a result, leading to a better analysis of the document.

Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To complete this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods used to collect, pre-process, and detect content types in the collected handwritten documents. A total of 4049 documents were extracted as images and JSON files and were labelled using object-labelling software with the tags text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to the TensorFlow Object Detection API to train a neural network model. We show our results from two neural network models: the Faster Region-based Convolutional Neural Network (Faster R-CNN) and the Single Shot Detector (SSD).

Contributors: Faizaan, Shaik Mohammed (Author) / VanLehn, Kurt (Thesis advisor) / Cheema, Salman Shaukat (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2018

Pain-Inspired Intrinsic Reward For Deep Reinforcement Learning

Description

Reinforcement learning (RL) is a powerful methodology for teaching autonomous agents complex behaviors and skills. A critical component in most RL algorithms is the reward function, a mathematical function that provides numerical estimates for desirable and undesirable states. Typically, the reward function must be hand-designed by a human expert and, as a result, the scope of a robot's autonomy and its ability to safely explore and learn in new and unforeseen environments are constrained by the specifics of the designed reward function. In this thesis, I design and implement a stateful collision anticipation model with powerful predictive capability, based upon my research on sequential data modeling and modern recurrent neural networks. I also develop deep reinforcement learning methods whose rewards are generated by self-supervised training and intrinsic signals. The main objective is to work towards the development of resilient robots that can learn to anticipate and avoid damaging interactions by combining visual and proprioceptive cues from internal sensors. The introduced solutions are inspired by pain pathways in humans and animals, because such pathways are known to guide decision-making processes and promote self-preservation. A new "robot dodge ball" benchmark is introduced in order to test the validity of the developed algorithms in dynamic environments.

Contributors: Richardson, Trevor W (Author) / Ben Amor, Heni (Thesis advisor) / Yang, Yezhou (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)

Created: 2018

Machine Learning of Real and Pseudo Physics: Modeling Dynamical Systems

Description

The research presented in this Honors Thesis develops machine learning models that predict future states of a system with unknown dynamics, based on observations of the system. Two case studies are presented: (1) a non-conservative pendulum and (2) a differential game describing a two-car uncontrolled intersection scenario. In the paper we investigate how learning architectures can be adapted to problem-specific geometry. The results show that these problem-specific models are valuable for accurately learning and predicting the dynamics of physical systems. In order to properly model the physics of a real pendulum, modifications were made to a prior architecture that was sufficient for modeling an ideal pendulum. The necessary modifications to the previous network [13] were problem-specific and not transferable to all other non-conservative physics scenarios. The modified architecture successfully models real pendulum dynamics. This case study provides a basis for future research in augmenting the symplectic gradient of a Hamiltonian energy function to provide a generalized, non-conservative physics model. A problem-specific architecture was also used to create an accurate model for the two-car intersection case. The Costate Network proved to be an improvement over the previously used Value Network [17], although this comparison should be read with caution because of slight implementation differences. The development of the Costate Network provides a basis for using characteristics to decompose functions and create a simplified learning problem. This paper creates new opportunities to develop physics models, and the sample cases can serve as a guide for modeling other real and pseudo physics. Although the models studied in this paper are not generalizable, these cases provide direction for future research.

Contributors: Merry, Tanner (Author) / Ren, Yi (Thesis director) / Zhang, Wenlong (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Learning Scalable Dynamical Models for Predicting Atomic Structures of High-Entropy Alloys

Description

High-entropy alloys possessing mechanical, chemical, and electrical properties that far exceed those of conventional alloys have the potential to make a significant impact on many areas of engineering. Identifying element combinations and configurations to form these alloys, however, is a difficult, time-consuming, computationally intensive task. Machine learning has revolutionized many different fields due to its ability to generalize well to different problems and produce computationally efficient, accurate predictions regarding the system of interest. In this thesis, we demonstrate the effectiveness of machine learning models applied to toy cases representative of simplified physics that are relevant to high-entropy alloy simulation. We show these models are effective at learning nonlinear dynamics for single and multi-particle cases and that more work is needed to accurately represent complex cases in which the system dynamics are chaotic. This thesis serves as a demonstration of the potential benefits of machine learning applied to high-entropy alloy simulations to generate fast, accurate predictions of nonlinear dynamics.

Contributors: Daly, John H (Author) / Ren, Yi (Thesis director) / Zhuang, Houlong (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Toward Reliable Graph Matching: from Deterministic Optimization to Combinatorial Learning

Description

Graph matching is a fundamental but notoriously difficult problem due to its NP-hard nature, and it serves as a cornerstone for a series of applications in machine learning and computer vision, such as image matching, dynamic routing, and drug design, to name a few. Although high-performance graph matching solvers have been investigated extensively, it remains challenging to tackle the matching problem in real-world scenarios with severe graph uncertainty (e.g., noise, outliers, and misleading or ambiguous links). The main focus of this dissertation is to investigate the essence of graph matching under such uncertainty and to propose solutions with higher reliability. To this end, the research was conducted from three perspectives related to reliable graph matching: modeling, optimization, and learning. For modeling, graph matching is extended from the typical quadratic assignment problem to a more generic mathematical model by introducing a specific family of separable functions, achieving higher capacity and reliability. In terms of optimization, a novel, highly gradient-efficient determinant-based regularization technique is proposed, showing high robustness against outliers. The learning paradigm for graph matching under intrinsic combinatorial characteristics is then explored. First, a study is conducted on how to fill the gap between the discrete problem and its continuous approximation under a deep learning framework. The dissertation then investigates the need for a more reliable latent topology of graphs for matching and proposes an effective and flexible framework to obtain it. The findings of this dissertation include a theoretical study and several novel algorithms, with extensive experiments demonstrating their effectiveness.

Contributors: Yu, Tianshu (Author) / Li, Baoxin (Thesis advisor) / Wang, Yalin (Committee member) / Yang, Yezhou (Committee member) / Yang, Yingzhen (Committee member) / Arizona State University (Publisher)

Created: 2021

Uncertainty Quantification and Prognostics using Bayesian Statistics and Machine Learning

Description

Uncertainty quantification is critical for engineering design and analysis. Determining appropriate ways of dealing with uncertainties has been a constant challenge in engineering. Statistical methods provide a powerful aid to describe and understand uncertainties. Among statistical methods, this work focuses on applying Bayesian methods and machine learning to uncertainty quantification and prognostics. The main engineering application is the mechanical properties of materials, both static and fatigue. The work can be summarized in the following items. First, maintaining the safety of vintage pipelines requires accurately estimating their strength. The objective is to predict the reliability-based strength using nondestructive multimodality surface information. Bayesian model averaging (BMA) is implemented for fusing multimodality nondestructive testing results for gas pipeline strength estimation. Several incremental improvements are proposed in the algorithm implementation. The second objective is to develop a statistical uncertainty quantification method for fatigue stress-life (S-N) curves with sparse data. Hierarchical Bayesian data augmentation (HBDA) is proposed to integrate hierarchical Bayesian modeling (HBM) and Bayesian data augmentation (BDA) to deal with sparse data problems for fatigue S-N curves. The third objective is to develop a physics-guided machine learning model to overcome limitations in parametric regression models and classical machine learning models for fatigue data analysis. A Probabilistic Physics-guided Neural Network (PPgNN) is proposed for probabilistic fatigue S-N curve estimation. This model is further developed for missing data and arbitrary output distribution problems. Fourth, multi-fidelity modeling combines the advantages of low- and high-fidelity models to achieve a required accuracy at a reasonable computation cost. The fourth objective is to develop a neural network approach for multi-fidelity modeling by learning the correlation between low- and high-fidelity models. Finally, conclusions are drawn, and future work is outlined based on the current study.

Contributors: Chen, Jie (Author) / Liu, Yongming (Thesis advisor) / Chattopadhyay, Aditi (Committee member) / Mignolet, Marc (Committee member) / Ren, Yi (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created: 2022

Machine Learning and Mario Speedruns

Description

Machine learning has a nearly unlimited number of applications, the potential of which has yet to be fully harnessed and realized. This thesis outlines two domains in which machine learning can be used and demonstrates the execution of one methodology in each domain. The first domain is self-play in video games, where a neural model is researched and described that teaches a computer to complete a level of Super Mario World (1990) on its own. The neural model in question was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, which was written by Kenneth O. Stanley and Risto Miikkulainen of the University of Texas at Austin. The model that is actually described is from YouTuber SethBling of the California Institute of Technology. The second domain is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm uses Python and the Volatility framework to detect malicious software in an infected system.

Contributors: Ballecer, Joshua (Author) / Yang, Yezhou (Thesis director) / Luo, Yiran (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Towards Reliable Semantic Vision

Description

Models that learn from data are widely and rapidly being deployed today for real-world use, and have become an integral and embedded part of human lives. While these technological advances are exciting and impactful, such data-driven computer vision systems often fail in inscrutable ways. This dissertation seeks to study and improve the reliability of machine learning models from several perspectives including the development of robust training algorithms to mitigate the risks of such failures, construction of new datasets that provide a new perspective on capabilities of vision models, and the design of evaluation metrics for re-calibrating the perception of performance improvements. I will first address distribution shift in image classification with the following contributions: (1) two methods for improving the robustness of image classifiers to distribution shift by leveraging the classifier's failures into an adversarial data transformation pipeline guided by domain knowledge, (2) an interpolation-based technique for flagging out-of-distribution samples, and (3) an intriguing trade-off between distributional and adversarial robustness resulting from data modification strategies. I will then explore reliability considerations for semantic vision models that learn from both visual and natural language data; I will discuss how logical and semantic sentence transformations affect the performance of vision-language models and my contributions towards developing knowledge-guided learning algorithms to mitigate these failures. Finally, I will describe the effort towards building and evaluating complex reasoning capabilities of vision-language models towards the long-term goal of robust and reliable computer vision models that can communicate, collaborate, and reason with humans.

Contributors: Gokhale, Tejas (Author) / Yang, Yezhou (Thesis advisor) / Baral, Chitta (Thesis advisor) / Ben Amor, Heni (Committee member) / Anirudh, Rushil (Committee member) / Arizona State University (Publisher)

Created: 2023

Three Facets of Online Political Networks: Communities, Antagonisms, and Polarization

Description

Millions of users leave digital traces of their political engagements on social media platforms every day. Users form networks of interactions, produce textual content, and like and share each other's content. This creates an invaluable opportunity to better understand the political engagements of internet users. In this proposal, I present three algorithmic solutions to three facets of online political networks: the detection of communities, the detection of antagonisms, and the impact of certain types of accounts on political polarization. First, I develop a multi-view community detection algorithm to find politically pure communities. I find that, among content types that also include hashtags and URLs, word usage best complements user interactions in accurately detecting communities.

Second, I focus on detecting negative linkages between politically motivated social media users. Major social media platforms do not provide their users with built-in negative interaction options. However, many political network analysis tasks rely on not only positive but also negative linkages. Here, I present the SocLSFact framework to detect negative linkages among social media users. It utilizes three pieces of information: sentiment cues of textual interactions, positive interactions, and socially balanced triads. I evaluate the contribution of each of the three aspects to negative link detection performance on multiple tasks.

Third, I propose an experimental setup that quantifies the polarization impact of automated accounts on Twitter retweet networks. I focus on a dataset covering the tragic Parkland shooting and its aftermath. I show that when automated accounts are removed from the retweet network, the network polarization decreases significantly, whereas when the same number of accounts is removed at random, the difference is not significant. I also find that the prominent predictors of engagement with automatically generated content are not very different from those that previous studies identify for engaging content on social media in general. Last but not least, I identify accounts that self-disclose their automated nature in their profiles by using expressions such as bot, chat-bot, or robot. I find that human engagement with self-disclosing accounts is much smaller than with non-disclosing automated accounts. This observational finding can motivate further efforts in automated account detection research to prevent their unintended impact.

Contributors: Ozer, Mert (Author) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Sen, Arunabha (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2019
