Matching Items (55)
Filtering by

Clear all filters

156084-Thumbnail Image.png
Description
The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data such as missing samples and noisy input caused by the undesired, external factors of variation. It should also reduce the data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor. However, the feature extraction process is interpretable and explainable. Next group contains the latent-feature extraction processes. While the original feature lies in a high-dimensional space, the relevant factors for a task often lie on a lower dimensional manifold. The latent-feature extraction employs hidden variables to expose the underlying data properties that cannot be directly measured from the input. Latent features seek a specific structure such as sparsity or low-rank into the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network. Its parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically-relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aestheticism. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For both these tasks, I propose novel deep architectures and show significant improvement over the previous state-of-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks.
ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2017
156586-Thumbnail Image.png
Description
Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques, that are used to locate (“where”), characterize and recognize (“what”) objects, regions, and their attributes in the image. However, the notion of “understanding” (and the goal of artificial intelligent machines) goes beyond

Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques, that are used to locate (“where”), characterize and recognize (“what”) objects, regions, and their attributes in the image. However, the notion of “understanding” (and the goal of artificial intelligent machines) goes beyond factual recall of the recognized components and includes reasoning and thinking beyond what can be seen (or perceived). Understanding is often evaluated by asking questions of increasing difficulty. Thus, the expected functionalities of an intelligent Image Understanding system can be expressed in terms of the functionalities that are required to answer questions about an image. Answering questions about images require primarily three components: Image Understanding, question (natural language) understanding, and reasoning based on knowledge. Any question, asking beyond what can be directly seen, requires modeling of commonsense (or background/ontological/factual) knowledge and reasoning.

Knowledge and reasoning have seen scarce use in image understanding applications. In this thesis, we demonstrate the utilities of incorporating background knowledge and using explicit reasoning in image understanding applications. We first present a comprehensive survey of the previous work that utilized background knowledge and reasoning in understanding images. This survey outlines the limited use of commonsense knowledge in high-level applications. We then present a set of vision and reasoning-based methods to solve several applications and show that these approaches benefit in terms of accuracy and interpretability from the explicit use of knowledge and reasoning. We propose novel knowledge representations of image, knowledge acquisition methods, and a new implementation of an efficient probabilistic logical reasoning engine that can utilize publicly available commonsense knowledge to solve applications such as visual question answering, image puzzles. Additionally, we identify the need for new datasets that explicitly require external commonsense knowledge to solve. We propose the new task of Image Riddles, which requires a combination of vision, and reasoning based on ontological knowledge; and we collect a sufficiently large dataset to serve as an ideal testbed for vision and reasoning research. Lastly, we propose end-to-end deep architectures that can combine vision, knowledge and reasoning modules together and achieve large performance boosts over state-of-the-art methods.
ContributorsAditya, Somak (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Aloimonos, Yiannis (Committee member) / Lee, Joohyung (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2018
157413-Thumbnail Image.png
Description
Rapid growth of internet and connected devices ranging from cloud systems to internet of things have raised critical concerns for securing these systems. In the recent past, security attacks on different kinds of devices have evolved in terms of complexity and diversity. One of the challenges is establishing secure communication

Rapid growth of internet and connected devices ranging from cloud systems to internet of things have raised critical concerns for securing these systems. In the recent past, security attacks on different kinds of devices have evolved in terms of complexity and diversity. One of the challenges is establishing secure communication in the network among various devices and systems. Despite being protected with authentication and encryption, the network still needs to be protected against cyber-attacks. For this, the network traffic has to be closely monitored and should detect anomalies and intrusions. Intrusion detection can be categorized as a network traffic classification problem in machine learning. Existing network traffic classification methods require a lot of training and data preprocessing, and this problem is more serious if the dataset size is huge. In addition, the machine learning and deep learning methods that have been used so far were trained on datasets that contain obsolete attacks. In this thesis, these problems are addressed by using ensemble methods applied on an up to date network attacks dataset. Ensemble methods use multiple learning algorithms to get better classification accuracy that could be obtained when the corresponding learning algorithm is applied alone. This dataset for network traffic classification has recent attack scenarios and contains over fifteen attacks. This approach shows that ensemble methods can be used to classify network traffic and detect intrusions with less training times of the model, and lesser pre-processing without feature selection. In addition, this thesis also shows that only with less than ten percent of the total features of input dataset will lead to similar accuracy that is achieved on whole dataset. This can heavily reduce the training times and classification duration in real-time scenarios.
ContributorsPonneganti, Ramu (Author) / Yau, Stephen (Thesis advisor) / Richa, Andrea (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2019
156771-Thumbnail Image.png
Description
Reinforcement learning (RL) is a powerful methodology for teaching autonomous agents complex behaviors and skills. A critical component in most RL algorithms is the reward function -- a mathematical function that provides numerical estimates for desirable and undesirable states. Typically, the reward function must be hand-designed by a human expert

Reinforcement learning (RL) is a powerful methodology for teaching autonomous agents complex behaviors and skills. A critical component in most RL algorithms is the reward function -- a mathematical function that provides numerical estimates for desirable and undesirable states. Typically, the reward function must be hand-designed by a human expert and, as a result, the scope of a robot's autonomy and ability to safely explore and learn in new and unforeseen environments is constrained by the specifics of the designed reward function. In this thesis, I design and implement a stateful collision anticipation model with powerful predictive capability based upon my research of sequential data modeling and modern recurrent neural networks. I also develop deep reinforcement learning methods whose rewards are generated by self-supervised training and intrinsic signals. The main objective is to work towards the development of resilient robots that can learn to anticipate and avoid damaging interactions by combining visual and proprioceptive cues from internal sensors. The introduced solutions are inspired by pain pathways in humans and animals, because such pathways are known to guide decision-making processes and promote self-preservation. A new "robot dodge ball' benchmark is introduced in order to test the validity of the developed algorithms in dynamic environments.
ContributorsRichardson, Trevor W (Author) / Ben Amor, Heni (Thesis advisor) / Yang, Yezhou (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)
Created2018
156869-Thumbnail Image.png
Description
Multimodal Representation Learning is a multi-disciplinary research field which aims to integrate information from multiple communicative modalities in a meaningful manner to help solve some downstream task. These modalities can be visual, acoustic, linguistic, haptic etc. The interpretation of ’meaningful integration of information from different modalities’ remains modality and task

Multimodal Representation Learning is a multi-disciplinary research field which aims to integrate information from multiple communicative modalities in a meaningful manner to help solve some downstream task. These modalities can be visual, acoustic, linguistic, haptic etc. The interpretation of ’meaningful integration of information from different modalities’ remains modality and task dependent. The downstream task can range from understanding one modality in the presence of information from other modalities, to that of translating input from one modality to another. In this thesis the utility of multimodal representation learning for understanding one modality vis-à-vis Image Understanding for Visual Reasoning given corresponding information in other modalities, as well as translating from one modality to the other, specifically, Text to Image Translation was investigated.

Visual Reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize and recognize objects, regions and their attributes in the image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system to answer questions about the image that requires attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer said questions correctly and provide a valid reasoning for the given answers. In this work how a system can be built by learning a multimodal representation between the stated image and the questions was investigated. Also, how background knowledge, specifically scene-graph information, if available, can be incorporated into existing image understanding models was demonstrated.

Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also gives way to learning a shared representation between these varied modalities and allows to provide meaning to what this shared representation should capture. In this work, using the surrogate task of text to image translation, neural network based architectures to learn a shared representation between these two modalities was investigated. Also, the ability that such a shared representation is capable of capturing parts of different modalities that are equivalent in some sense is proposed. Specifically, given an image and a semantic description of certain objects present in the image, a shared representation between the text and the image modality capable of capturing parts of the image being mentioned in the text was demonstrated. Such a capability was showcased on a publicly available dataset.
ContributorsSaha, Rudra (Author) / Yang, Yezhou (Thesis advisor) / Singh, Maneesh Kumar (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)
Created2018
133413-Thumbnail Image.png
Description
Catastrophe events occur rather infrequently, but upon their occurrence, can lead to colossal losses for insurance companies. Due to their size and volatility, catastrophe losses are often treated separately from other insurance losses. In fact, many property and casualty insurance companies feature a department or team which focuses solely on

Catastrophe events occur rather infrequently, but upon their occurrence, can lead to colossal losses for insurance companies. Due to their size and volatility, catastrophe losses are often treated separately from other insurance losses. In fact, many property and casualty insurance companies feature a department or team which focuses solely on modeling catastrophes. Setting reserves for catastrophe losses is difficult due to their unpredictable and often long-tailed nature. Determining loss development factors (LDFs) to estimate the ultimate loss amounts for catastrophe events is one method for setting reserves. In an attempt to aid Company XYZ set more accurate reserves, the research conducted focuses on estimating LDFs for catastrophes which have already occurred and have been settled. Furthermore, the research describes the process used to build a linear model in R to estimate LDFs for Company XYZ's closed catastrophe claims from 2001 \u2014 2016. This linear model was used to predict a catastrophe's LDFs based on the age in weeks of the catastrophe during the first year. Back testing was also performed, as was the comparison between the estimated ultimate losses and actual losses. Future research consideration was proposed.
ContributorsSwoverland, Robert Bo (Author) / Milovanovic, Jelena (Thesis director) / Zicarelli, John (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
133225-Thumbnail Image.png
Description
Speech nasality disorders are characterized by abnormal resonance in the nasal cavity. Hypernasal speech is of particular interest, characterized by an inability to prevent improper nasalization of vowels, and poor articulation of plosive and fricative consonants, and can lead to negative communicative and social consequences. It can be associated with

Speech nasality disorders are characterized by abnormal resonance in the nasal cavity. Hypernasal speech is of particular interest, characterized by an inability to prevent improper nasalization of vowels, and poor articulation of plosive and fricative consonants, and can lead to negative communicative and social consequences. It can be associated with a range of conditions, including cleft lip or palate, velopharyngeal dysfunction (a physical or neurological defective closure of the soft palate that regulates resonance between the oral and nasal cavity), dysarthria, or hearing impairment, and can also be an early indicator of developing neurological disorders such as ALS. Hypernasality is typically scored perceptually by a Speech Language Pathologist (SLP). Misdiagnosis could lead to inadequate treatment plans and poor treatment outcomes for a patient. Also, for some applications, particularly screening for early neurological disorders, the use of an SLP is not practical. Hence this work demonstrates a data-driven approach to objective assessment of hypernasality, through the use of Goodness of Pronunciation features. These features capture the overall precision of articulation of speaker on a phoneme-by-phoneme basis, allowing demonstrated models to achieve a Pearson correlation coefficient of 0.88 on low-nasality speakers, the population of most interest for this sort of technique. These results are comparable to milestone methods in this domain.
ContributorsSaxon, Michael Stephen (Author) / Berisha, Visar (Thesis director) / McDaniel, Troy (Committee member) / Electrical Engineering Program (Contributor, Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
135056-Thumbnail Image.png
Description
In this paper, I will show that news headlines of global events can predict changes in stock price by using Machine Learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to

In this paper, I will show that news headlines of global events can predict changes in stock price by using Machine Learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to the implicit filtering mechanism in the online community, these 25 posts are representative of the most popular news headlines and influential global events of the day. Hence, these posts shine a light on how large-scale social and political events affect the stock market. Using a Logistic Regression and a Naive Bayes classifier, I am able to predict with approximately 85% accuracy a binary change in stock price using term-feature vectors gathered from the news headlines. The accuracy, precision and recall results closely rival the best models in this field of research. In addition to the results, I will also describe the mathematical underpinnings of the two models; preceded by a general investigation of the intersection between the multiple academic disciplines related to this project. These range from social to computer science and from statistics to philosophy. The goal of this additional discussion is to further illustrate the interdisciplinary nature of the research and hopefully inspire a non-monolithic mindset when further investigations are pursued.
Created2016-12
155378-Thumbnail Image.png
Description
To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may cause damage to the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches that require knowledge

To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may cause damage to the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches that require knowledge about sources of perturbation to be encoded before deployment, our method is based on experiential learning. Robots learn to associate visual cues with subsequent physical perturbations and contacts. In turn, these extracted visual cues are then used to predict potential future perturbations acting on the robot. To this end, we introduce a novel deep network architecture which combines multiple sub- networks for dealing with robot dynamics and perceptual input from the environment. We present a self-supervised approach for training the system that does not require any labeling of training data. Extensive experiments in a human-robot interaction task show that a robot can learn to predict physical contact by a human interaction partner without any prior information or labeling. Furthermore, the network is able to successfully predict physical contact from either depth stream input or traditional video input or using both modalities as input.
ContributorsSur, Indranil (Author) / Amor, Heni B (Thesis advisor) / Fainekos, Georgios (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2017
155604-Thumbnail Image.png
Description
In recent years, several methods have been proposed to encode sentences into fixed length continuous vectors called sentence representation or sentence embedding. With the recent advancements in various deep learning methods applied in Natural Language Processing (NLP), these representations play a crucial role in tasks such as named entity recognition,

In recent years, several methods have been proposed to encode sentences into fixed length continuous vectors called sentence representation or sentence embedding. With the recent advancements in various deep learning methods applied in Natural Language Processing (NLP), these representations play a crucial role in tasks such as named entity recognition, question answering and sentence classification.

Traditionally, sentence vector representations are learnt from its constituent word representations, also known as word embeddings. Various methods to learn the distributed representation (embedding) of words have been proposed using the notion of Distributional Semantics, i.e. “meaning of a word is characterized by the company it keeps”. However, principle of compositionality states that meaning of a sentence is a function of the meanings of words and also the way they are syntactically combined. In various recent methods for sentence representation, the syntactic information like dependency or relation between words have been largely ignored.

In this work, I have explored the effectiveness of sentence representations that are composed of the representation of both, its constituent words and the relations between the words in a sentence. The word and relation embeddings are learned based on their context. These general-purpose embeddings can also be used as off-the- shelf semantic and syntactic features for various NLP tasks. Similarity Evaluation tasks was performed on two datasets showing the usefulness of the learned word embeddings. Experiments were conducted on three different sentence classification tasks showing that our sentence representations outperform the original word-based sentence representations, when used with the state-of-the-art Neural Network architectures.
ContributorsRath, Trideep (Author) / Baral, Chitta (Thesis advisor) / Li, Baoxin (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2017