Matching Items (101)
Filtering by

Clear all filters

157582-Thumbnail Image.png
Description
The rapid advancements of technology have greatly extended the ubiquitous nature of smartphones acting as a gateway to numerous social media applications. This brings an immense convenience to the users of these applications wishing to stay connected to other individuals through sharing their statuses, posting their opinions, experiences, suggestions, etc

The rapid advancements of technology have greatly extended the ubiquitous nature of smartphones acting as a gateway to numerous social media applications. This brings an immense convenience to the users of these applications wishing to stay connected to other individuals through sharing their statuses, posting their opinions, experiences, suggestions, etc on online social networks (OSNs). Exploring and analyzing this data has a great potential to enable deep and fine-grained insights into the behavior, emotions, and language of individuals in a society. This proposed dissertation focuses on utilizing these online social footprints to research two main threads – 1) Analysis: to study the behavior of individuals online (content analysis) and 2) Synthesis: to build models that influence the behavior of individuals offline (incomplete action models for decision-making).

A large percentage of posts shared online are in an unrestricted natural language format that is meant for human consumption. One of the demanding problems in this context is to leverage and develop approaches to automatically extract important insights from this incessant massive data pool. Efforts in this direction emphasize mining or extracting the wealth of latent information in the data from multiple OSNs independently. The first thread of this dissertation focuses on analytics to investigate the differentiated content-sharing behavior of individuals. The second thread of this dissertation attempts to build decision-making systems using social media data.

The results of the proposed dissertation emphasize the importance of considering multiple data types while interpreting the content shared on OSNs. They highlight the unique ways in which the data and the extracted patterns from text-based platforms or visual-based platforms complement and contrast in terms of their content. The proposed research demonstrated that, in many ways, the results obtained by focusing on either only text or only visual elements of content shared online could lead to biased insights. On the other hand, it also shows the power of a sequential set of patterns that have some sort of precedence relationships and collaboration between humans and automated planners.
ContributorsManikonda, Lydia (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Li, Baoxin (Committee member) / De Choudhury, Munmun (Committee member) / Kamar, Ece (Committee member) / Arizona State University (Publisher)
Created2019
157892-Thumbnail Image.png
Description
Machine learning (ML) and deep neural networks (DNNs) have achieved great success in a variety of application domains, however, despite significant effort to make these networks robust, they remain vulnerable to adversarial attacks in which input that is perceptually indistinguishable from natural data can be erroneously classified with high prediction

Machine learning (ML) and deep neural networks (DNNs) have achieved great success in a variety of application domains, however, despite significant effort to make these networks robust, they remain vulnerable to adversarial attacks in which input that is perceptually indistinguishable from natural data can be erroneously classified with high prediction confidence. Works on defending against adversarial examples can be broadly classified as correcting or detecting, which aim, respectively at negating the effects of the attack and correctly classifying the input, or detecting and rejecting the input as adversarial. In this work, a new approach for detecting adversarial examples is proposed. The approach takes advantage of the robustness of natural images to noise. As noise is added to a natural image, the prediction probability of its true class drops, but the drop is not sudden or precipitous. The same seems to not hold for adversarial examples. In other word, the stress response profile for natural images seems different from that of adversarial examples, which could be detected by their stress response profile. An evaluation of this approach for detecting adversarial examples is performed on the MNIST, CIFAR-10 and ImageNet datasets. Experimental data shows that this approach is effective at detecting some adversarial examples on small scaled simple content images and with little sacrifice on benign accuracy.
ContributorsSun, Lin (Author) / Bazzi, Rida (Thesis advisor) / Li, Baoxin (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2019
157840-Thumbnail Image.png
Description
Over the last decade, deep neural networks also known as deep learning, combined with large databases and specialized hardware for computation, have made major strides in important areas such as computer vision, computational imaging and natural language processing. However, such frameworks currently suffer from some drawbacks. For example, it is

Over the last decade, deep neural networks also known as deep learning, combined with large databases and specialized hardware for computation, have made major strides in important areas such as computer vision, computational imaging and natural language processing. However, such frameworks currently suffer from some drawbacks. For example, it is generally not clear how the architectures are to be designed for different applications, or how the neural networks behave under different input perturbations and it is not easy to make the internal representations and parameters more interpretable. In this dissertation, I propose building constraints into feature maps, parameters and and design of algorithms involving neural networks for applications in low-level vision problems such as compressive imaging and multi-spectral image fusion, and high-level inference problems including activity and face recognition. Depending on the application, such constraints can be used to design architectures which are invariant/robust to certain nuisance factors, more efficient and, in some cases, more interpretable. Through extensive experiments on real-world datasets, I demonstrate these advantages of the proposed methods over conventional frameworks.
ContributorsLohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2019
157653-Thumbnail Image.png
Description
The advent of commercial inexpensive sensors and the advances in information and communication technology (ICT) have brought forth the era of pervasive Quantified-Self. Automatic diet monitoring is one of the most important aspects for Quantified-Self because it is vital for ensuring the well-being of patients suffering from chronic diseases as

The advent of commercial inexpensive sensors and the advances in information and communication technology (ICT) have brought forth the era of pervasive Quantified-Self. Automatic diet monitoring is one of the most important aspects for Quantified-Self because it is vital for ensuring the well-being of patients suffering from chronic diseases as well as for providing a low cost means for maintaining the health for everyone else. Automatic dietary monitoring consists of: a) Determining the type and amount of food intake, and b) Monitoring eating behavior, i.e., time, frequency, and speed of eating. Although there are some existing techniques towards these ends, they suffer from issues of low accuracy and low adherence. To overcome these issues, multiple sensors were utilized because the availability of affordable sensors that can capture the different aspect information has the potential for increasing the available knowledge for Quantified-Self. For a), I envision an intelligent dietary monitoring system that automatically identifies food items by using the knowledge obtained from visible spectrum camera and infrared spectrum camera. This system is able to outperform the state-of-the-art systems for cooked food recognition by 25% while also minimizing user intervention. For b), I propose a novel methodology, IDEA that performs accurate eating action identification within eating episodes with an average F1-score of 0.92. This is an improvement of 0.11 for precision and 0.15 for recall for the worst-case users as compared to the state-of-the-art. IDEA uses only a single wrist-band which includes four sensors and provides feedback on eating speed every 2 minutes without obtaining any manual input from the user.
ContributorsLee, Junghyo (Author) / Gupta, Sandeep K.S. (Thesis advisor) / Banerjee, Ayan (Committee member) / Li, Baoxin (Committee member) / Chiou, Erin (Committee member) / Kudva, Yogish C. (Committee member) / Arizona State University (Publisher)
Created2019
157623-Thumbnail Image.png
Description
Feature embeddings differ from raw features in the sense that the former obey certain properties like notion of similarity/dissimilarity in it's embedding space. word2vec is a preeminent example in this direction, where the similarity in the embedding space is measured in terms of the cosine similarity. Such language embedding models

Feature embeddings differ from raw features in the sense that the former obey certain properties like notion of similarity/dissimilarity in it's embedding space. word2vec is a preeminent example in this direction, where the similarity in the embedding space is measured in terms of the cosine similarity. Such language embedding models have seen numerous applications in both language and vision community as they capture the information in the modality (English language) efficiently. Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks, 1. Image Hashing 2. Zero Shot Learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using several distance metrics like hamming / euclidean / cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task specific embeddings were learnt which can be used for Image Retrieval and Classification separately.

Image Hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work focuses on designing a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in several applications like Image Retrieval and Image Classification. Further, this framework requires no labelled data, leaving it very inexpensive. Results show that the proposed approach surpasses the state-of-art approaches by a significant margin.

Zero-shot classification is the task of classifying the test sample into a new class which was not seen during training. This is possible by establishing a relationship between the training and the testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains using the handcrafted attribute vectors and word vectors but doesn’t require the expensive attribute vectors during test time. More specifically, an intermediate space is learnt between the word vector space and the image feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore.
ContributorsGattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2019
158615-Thumbnail Image.png
Description
In recent years, Convolutional Neural Networks (CNNs) have been widely used in not only the computer vision community but also within the medical imaging community. Specifically, the use of pre-trained CNNs on large-scale datasets (e.g., ImageNet) via transfer learning for a variety of medical imaging applications, has become the de

In recent years, Convolutional Neural Networks (CNNs) have been widely used in not only the computer vision community but also within the medical imaging community. Specifically, the use of pre-trained CNNs on large-scale datasets (e.g., ImageNet) via transfer learning for a variety of medical imaging applications, has become the de facto standard within both communities.

However, to fit the current paradigm, 3D imaging tasks have to be reformulated and solved in 2D, losing rich 3D contextual information. Moreover, pre-trained models on natural images never see any biomedical images and do not have knowledge about anatomical structures present in medical images. To overcome the above limitations, this thesis proposes an image out-painting self-supervised proxy task to develop pre-trained models directly from medical images without utilizing systematic annotations. The idea is to randomly mask an image and train the model to predict the missing region. It is demonstrated that by predicting missing anatomical structures when seeing only parts of the image, the model will learn generic representation yielding better performance on various medical imaging applications via transfer learning.

The extensive experiments demonstrate that the proposed proxy task outperforms training from scratch in six out of seven medical imaging applications covering 2D and 3D classification and segmentation. Moreover, image out-painting proxy task offers competitive performance to state-of-the-art models pre-trained on ImageNet and other self-supervised baselines such as in-painting. Owing to its outstanding performance, out-painting is utilized as one of the self-supervised proxy tasks to provide generic 3D pre-trained models for medical image analysis.
ContributorsSodha, Vatsal Arvindkumar (Author) / Liang, Jianming (Thesis advisor) / Devarakonda, Murthy (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2020
158677-Thumbnail Image.png
Description
Convolutional Neural Network (CNN) has achieved state-of-the-art performance in numerous applications like computer vision, natural language processing, robotics etc. The advancement of High-Performance Computing systems equipped with dedicated hardware accelerators has also paved the way towards the success of compute intensive CNNs. Graphics Processing Units (GPUs), with massive processing capability,

Convolutional Neural Network (CNN) has achieved state-of-the-art performance in numerous applications like computer vision, natural language processing, robotics etc. The advancement of High-Performance Computing systems equipped with dedicated hardware accelerators has also paved the way towards the success of compute intensive CNNs. Graphics Processing Units (GPUs), with massive processing capability, have been of general interest for the acceleration of CNNs. Recently, Field Programmable Gate Arrays (FPGAs) have been promising in CNN acceleration since they offer high performance while also being re-configurable to support the evolution of CNNs. This work focuses on a design methodology to accelerate CNNs on FPGA with low inference latency and high-throughput which are crucial for scenarios like self-driving cars, video surveillance etc. It also includes optimizations which reduce the resource utilization by a large margin with a small degradation in performance thus making the design suitable for low-end FPGA devices as well.

FPGA accelerators often suffer due to the limited main memory bandwidth. Also, highly parallel designs with large resource utilization often end up achieving low operating frequency due to poor routing. This work employs data fetch and buffer mechanisms, designed specifically for the memory access pattern of CNNs, that overlap computation with memory access. This work proposes a novel arrangement of the systolic processing element array to achieve high frequency and consume less resources than the existing works. Also, support has been extended to more complicated CNNs to do video processing. On Intel Arria 10 GX1150, the design operates at a frequency as high as 258MHz and performs single inference of VGG-16 and C3D in 23.5ms and 45.6ms respectively. For VGG-16 and C3D the design offers a throughput of 66.1 and 23.98 inferences/s respectively. This design can outperform other FPGA 2D CNN accelerators by up to 9.7 times and 3D CNN accelerators by up to 2.7 times.
ContributorsRavi, Pravin Kumar (Author) / Zhao, Ming (Thesis advisor) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created2020
158792-Thumbnail Image.png
Description
Access to real-time situational information including the relative position and motion of surrounding objects is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the high resolution information the eyes are able to receive from

Access to real-time situational information including the relative position and motion of surrounding objects is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the high resolution information the eyes are able to receive from afar. As a sensory organ in particular, the eyes have an unparalleled ability to adjust to varying degrees of light, color, and distance. Therefore, in the case of a non-visual traveler, someone who is blind or low vision, access to visual information is unattainable if it is positioned beyond the reach of the preferred mobility device or outside the path of travel. Although, the area of assistive technology in terms of electronic travel aids (ETA’s) has received considerable attention over the last two decades; surprisingly, the field has seen little work in the area focused on augmenting rather than replacing current non-visual travel techniques, methods, and tools. Consequently, this work describes the design of an intuitive tactile language and series of wearable tactile interfaces (the Haptic Chair, HaptWrap, and HapBack) to deliver real-time spatiotemporal data. The overall intuitiveness of the haptic mappings conveyed through the tactile interfaces are evaluated using a combination of absolute identification accuracy of a series of patterns and subjective feedback through post-experiment surveys. Two types of spatiotemporal representations are considered: static patterns representing object location at a single time instance, and dynamic patterns, added in the HaptWrap, which represent object movement over a time interval. Results support the viability of multi-dimensional haptics applied to the body to yield an intuitive understanding of dynamic interactions occurring around the navigator during travel. Lastly, it is important to point out that the guiding principle of this work centered on providing the navigator with spatial knowledge otherwise unattainable through current mobility techniques, methods, and tools, thus, providing the \emph{navigator} with the information necessary to make informed navigation decisions independently, at a distance.
ContributorsDuarte, Bryan Joiner (Author) / McDaniel, Troy (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2020
158817-Thumbnail Image.png
Description
Over the past decade, machine learning research has made great strides and significant impact in several fields. Its success is greatly attributed to the development of effective machine learning algorithms like deep neural networks (a.k.a. deep learning), availability of large-scale databases and access to specialized hardware like Graphic Processing Units.

Over the past decade, machine learning research has made great strides and significant impact in several fields. Its success is greatly attributed to the development of effective machine learning algorithms like deep neural networks (a.k.a. deep learning), availability of large-scale databases and access to specialized hardware like Graphic Processing Units. When designing and training machine learning systems, researchers often assume access to large quantities of data that capture different possible variations. Variations in the data is needed to incorporate desired invariance and robustness properties in the machine learning system, especially in the case of deep learning algorithms. However, it is very difficult to gather such data in a real-world setting. For example, in certain medical/healthcare applications, it is very challenging to have access to data from all possible scenarios or with the necessary amount of variations as required to train the system. Additionally, the over-parameterized and unconstrained nature of deep neural networks can cause them to be poorly trained and in many cases over-confident which, in turn, can hamper their reliability and generalizability. This dissertation is a compendium of my research efforts to address the above challenges. I propose building invariant feature representations by wedding concepts from topological data analysis and Riemannian geometry, that automatically incorporate the desired invariance properties for different computer vision applications. I discuss how deep learning can be used to address some of the common challenges faced when working with topological data analysis methods. I describe alternative learning strategies based on unsupervised learning and transfer learning to address issues like dataset shifts and limited training data. Finally, I discuss my preliminary work on applying simple orthogonal constraints on deep learning feature representations to help develop more reliable and better calibrated models.
ContributorsSom, Anirudh (Author) / Turaga, Pavan (Thesis advisor) / Krishnamurthi, Narayanan (Committee member) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2020
158774-Thumbnail Image.png
Description
Technological advances have allowed for the assimilation of a variety of data, driving a shift away from the use of simpler and constrained patterns to more complex and diverse patterns in retrieval and analysis of such data. This shift has inundated the conventional techniques and has stressed the need for

Technological advances have allowed for the assimilation of a variety of data, driving a shift away from the use of simpler and constrained patterns to more complex and diverse patterns in retrieval and analysis of such data. This shift has inundated the conventional techniques and has stressed the need for intelligent mechanisms that can model the complex patterns in the data. Deep neural networks have shown some success at capturing complex patterns, including the so-called attentioned networks, have significant shortcomings in distinguishing what is important in data from what is noise. This dissertation observes that the traditional neural networks primarily rely solely on gradient-based learning to model deep features maps while ignoring the key insight in the data that can be leveraged as complementary information to help learn an accurate model. In particular, this dissertation shows that the localized multi-scale features (captured implicitly or explicitly) can be leveraged to help improve model performance as these features capture salient informative points in the data.

This dissertation focuses on “working with the data, not just on data”, i.e. leveraging feature saliency through pre-training, in-training, and post-training analysis of the data. In particular, non-neural localized multi-scale feature extraction, in images and time series, are relatively cheap to obtain and can provide a rough overview of the patterns in the data. Furthermore, localized features coupled with deep features can help learn a high performing network. A pre-training analysis of sizes, complexities, and distribution of these localized features can help intelligently allocate a user-provided kernel budget in the network as a single-shot hyper-parameter search. Additionally, these localized features can be used as a secondary input modality to the network for cross-attention. Retraining pre-trained networks can be a costly process, yet, a post-training analysis of model inferences can allow for learning the importance of individual network parameters to the model inferences thus facilitating a retraining-free network sparsification with minimal impact on the model performance. Furthermore, effective in-training analysis of the intermediate features in the network help learn the importance of individual intermediate features (neural attention) and this analysis can be achieved through simulating local-extrema detection or learning features simultaneously and understanding their co-occurrences. In summary, this dissertation argues and establishes that, if appropriately leveraged, localized features and their feature saliency can help learn high-accurate, yet cheaper networks.
ContributorsGarg, Yash (Author) / Candan, K. Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Sapino, Maria Luisa (Committee member) / Arizona State University (Publisher)
Created2020