Search Content

Exploring Deep Learning Vulnerability: Attack and Defense

Description

Deep neural networks have been shown to be vulnerable to adversarial attacks. Typical attack strategies alter authentic data subtly so as to obtain adversarial samples that resemble the original but otherwise would cause a network's misbehavior such as a high misclassification rate. Various attack approaches have been reported, with some…

Deep neural networks have been shown to be vulnerable to adversarial attacks. Typical attack strategies alter authentic data subtly so as to obtain adversarial samples that resemble the original but otherwise would cause a network's misbehavior such as a high misclassification rate. Various attack approaches have been reported, with some showing state-of-the-art performance in attacking certain networks. In the meanwhile, many defense mechanisms have been proposed in the literature, some of which are quite effective for guarding against typical attacks. Yet, most of these attacks fail when the targeted network modifies its architecture or uses another set of parameters and vice versa. Moreover, the emerging of more advanced deep neural networks, such as generative adversarial networks (GANs), has made the situation more complicated and the game between the attack and defense is continuing. This dissertation aims at exploring the venerability of the deep neural networks by investigating the mechanisms behind the success/failure of the existing attack and defense approaches. Therefore, several deep learning-based approaches have been proposed to study the problem from different perspectives. First, I developed an adversarial attack approach by exploring the unlearned region of a typical deep neural network which is often over-parameterized. Second, I proposed an end-to-end learning framework to analyze the images generated by different GAN models. Third, I developed a defense mechanism that can secure the deep neural network against adversarial attacks with a defense layer consisting of a set of orthogonal kernels. Substantial experiments are conducted to unveil the potential factors that contribute to attack/defense effectiveness. This dissertation also concludes with a discussion of possible future works of achieving a robust deep neural network.

ContributorsDing, Yuzhen (Author) / Li, Baoxin (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth Kumar Demakethepalli (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2022

Novel Deep Learning Algorithms for Enhancing Inference in Cross-Modal Applications

Description

With the exponential growth of multi-modal data in the field of computer vision, the ability to do inference effectively among multiple modalities—such as visual, textual, and auditory data—shows significant opportunities. The rapid development of cross-modal applications such as retrieval and association is primarily attributed to their ability to bridge the…

With the exponential growth of multi-modal data in the field of computer vision, the ability to do inference effectively among multiple modalities—such as visual, textual, and auditory data—shows significant opportunities. The rapid development of cross-modal applications such as retrieval and association is primarily attributed to their ability to bridge the gap between different modalities of data. However, the current mainstream cross-modal methods always heavily rely on the availability of fully annotated paired data, presenting a significant challenge due to the scarcity of precisely matched datasets in real-world scenarios. In response to this bottleneck, several sophisticated deep learning algorithms are designed to substantially improve the inference capabilities across a broad spectrum of cross-modal applications. This dissertation introduces novel deep learning algorithms aimed at enhancing inference capabilities in cross-modal applications, which take four primary aspects. Firstly, it introduces the algorithm for image retrieval by learning hashing codes. This algorithm only utilizes the other modality data in weakly supervised tags format rather than the supervised label. Secondly, it designs a novel framework for learning the joint embeddings of images and texts for the cross-modal retrieval tasks. It efficiently learns the binary codes from the continuous CLIP feature space and can even deliver competitive performance compared with the results from non-hashing methods. Thirdly, it conducts a method to learn the fragment-level embeddings that capture fine-grained cross-modal association in images and texts. This method uses the fragment proposals in an unsupervised manner. Lastly, this dissertation also outlines the algorithm to enhance the mask-text association ability of pre-trained semantic segmentation models with zero examples provided. Extensive future plans to further improve this algorithm for semantic segmentation tasks will be discussed.

ContributorsZhuo, Yaoxin (Author) / Li, Baoxin (Thesis advisor) / Wu, Teresa (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2024

Towards Addressing Key Visual Processing Challenges in Social Media Computing

Description

Visual processing in social media platforms is a key step in gathering and understanding information in the era of Internet and big data. Online data is rich in content, but its processing faces many challenges including: varying scales for objects of interest, unreliable and/or missing labels, the inadequacy of single…

Visual processing in social media platforms is a key step in gathering and understanding information in the era of Internet and big data. Online data is rich in content, but its processing faces many challenges including: varying scales for objects of interest, unreliable and/or missing labels, the inadequacy of single modal data and difficulty in analyzing high dimensional data. Towards facilitating the processing and understanding of online data, this dissertation primarily focuses on three challenges that I feel are of great practical importance: handling scale differences in computer vision tasks, such as facial component detection and face retrieval, developing efficient classifiers using partially labeled data and noisy data, and employing multi-modal models and feature selection to improve multi-view data analysis. For the first challenge, I propose a scale-insensitive algorithm to expedite and accurately detect facial landmarks. For the second challenge, I propose two algorithms that can be used to learn from partially labeled data and noisy data respectively. For the third challenge, I propose a new framework that incorporates feature selection modules into LDA models.

ContributorsZhou, Xu (Author) / Li, Baoxin (Thesis advisor) / Hsiao, Sharon (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2018

Towards Supporting Visual Question and Answering Applications

Description

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision, natural language processing, to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and…

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision, natural language processing, to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and to generate a textual answer to the input question. There are two key research problems in VQA: image understanding and the question answering. My research mainly focuses on developing solutions to support solving these two problems.

In image understanding, one important research area is semantic segmentation, which takes images as input and output the label of each pixel. As much manual work is needed to label a useful training set, typical training sets for such supervised approaches are always small. There are also approaches with relaxed labeling requirement, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, there are more and more user-uploaded images available

on-line. Such user-generated content often comes with labels like tags and may be coarsely labelled by various tools. To use these information for computer vision tasks, I propose a new graphic model by considering the neighborhood information and their interactions to obtain the pixel-level labels of the images with only incomplete image-level labels. The method was evaluated on both synthetic and real images.

In question answering, my research centers on best answer prediction, which addressed two main research topics: feature design and model construction. In the feature design part, most existing work discussed how to design effective features for answer quality / best answer prediction. However, little work mentioned how to design features by considering the relationship between answers of one given question. To fill this research gap, I designed new features to help improve the prediction performance. In the modeling part, to employ the structure of the feature space, I proposed an innovative learning-to-rank model by considering the hierarchical lasso. Experiments with comparison with the state-of-the-art in the best answer prediction literature have confirmed

that the proposed methods are effective and suitable for solving the research task.

ContributorsTian, Qiongjie (Author) / Li, Baoxin (Thesis advisor) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2017

Towards Learning Representations in Visual Computing Tasks

Description

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle…

The performance of most of the visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data such as missing samples and noisy input caused by the undesired, external factors of variation. It should also reduce the data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor. However, the feature extraction process is interpretable and explainable. Next group contains the latent-feature extraction processes. While the original feature lies in a high-dimensional space, the relevant factors for a task often lie on a lower dimensional manifold. The latent-feature extraction employs hidden variables to expose the underlying data properties that cannot be directly measured from the input. Latent features seek a specific structure such as sparsity or low-rank into the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network. Its parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically-relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aestheticism. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For both these tasks, I propose novel deep architectures and show significant improvement over the previous state-of-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks.

ContributorsChandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2017

Novel Image Representations and Learning Tasks

Description

Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is…

Computer Vision as a eld has gone through signicant changes in the last decade.

The eld has seen tremendous success in designing learning systems with hand-crafted

features and in using representation learning to extract better features. In this dissertation

some novel approaches to representation learning and task learning are studied.

Multiple-instance learning which is generalization of supervised learning, is one

example of task learning that is discussed. In particular, a novel non-parametric k-

NN-based multiple-instance learning is proposed, which is shown to outperform other

existing approaches. This solution is applied to a diabetic retinopathy pathology

detection problem eectively.

In cases of representation learning, generality of neural features are investigated

rst. This investigation leads to some critical understanding and results in feature

generality among datasets. The possibility of learning from a mentor network instead

of from labels is then investigated. Distillation of dark knowledge is used to eciently

mentor a small network from a pre-trained large mentor network. These studies help

in understanding representation learning with smaller and compressed networks.

ContributorsVenkatesan, Ragav (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2017

Robust and Generalizable Machine Learning through Generative Models,Adversarial Training, and Physics Priors

Description

Machine learning has demonstrated great potential across a wide range of applications such as computer vision, robotics, speech recognition, drug discovery, material science, and physics simulation. Despite its current success, however, there are still two major challenges for machine learning algorithms: limited robustness and generalizability.

The robustness of a neural network…

Machine learning has demonstrated great potential across a wide range of applications such as computer vision, robotics, speech recognition, drug discovery, material science, and physics simulation. Despite its current success, however, there are still two major challenges for machine learning algorithms: limited robustness and generalizability.

The robustness of a neural network is defined as the stability of the network output under small input perturbations. It has been shown that neural networks are very sensitive to input perturbations, and the prediction from convolutional neural networks can be totally different for input images that are visually indistinguishable to human eyes. Based on such property, hackers can reversely engineer the input to trick machine learning systems in targeted ways. These adversarial attacks have shown to be surprisingly effective, which has raised serious concerns over safety-critical applications like autonomous driving. In the meantime, many established defense mechanisms have shown to be vulnerable under more advanced attacks proposed later, and how to improve the robustness of neural networks is still an open question.

The generalizability of neural networks refers to the ability of networks to perform well on unseen data rather than just the data that they were trained on. Neural networks often fail to carry out reliable generalizations when the testing data is of different distribution compared with the training one, which will make autonomous driving systems risky under new environment. The generalizability of neural networks can also be limited whenever there is a scarcity of training data, while it can be expensive to acquire large datasets either experimentally or numerically for engineering applications, such as material and chemical design.

In this dissertation, we are thus motivated to improve the robustness and generalizability of neural networks. Firstly, unlike traditional bottom-up classifiers, we use a pre-trained generative model to perform top-down reasoning and infer the label information. The proposed generative classifier has shown to be promising in handling input distribution shifts. Secondly, we focus on improving the network robustness and propose an extension to adversarial training by considering the transformation invariance. Proposed method improves the robustness over state-of-the-art methods by 2.5% on MNIST and 3.7% on CIFAR-10. Thirdly, we focus on designing networks that generalize well at predicting physics response. Our physics prior knowledge is used to guide the designing of the network architecture, which enables efficient learning and inference. Proposed network is able to generalize well even when it is trained with a single image pair.

ContributorsYao, Houpu (Author) / Ren, Yi (Thesis advisor) / Liu, Yongming (Committee member) / Li, Baoxin (Committee member) / Yang, Yezhou (Committee member) / Marvi, Hamidreza (Committee member) / Arizona State University (Publisher)

Created2019

Three Facets of Online Political Networks: Communities, Antagonisms, and Polarization

Description

Millions of users leave digital traces of their political engagements on social media platforms every day. Users form networks of interactions, produce textual content, like and share each others' content. This creates an invaluable opportunity to better understand the political engagements of internet users. In this proposal, I present three…

Millions of users leave digital traces of their political engagements on social media platforms every day. Users form networks of interactions, produce textual content, like and share each others' content. This creates an invaluable opportunity to better understand the political engagements of internet users. In this proposal, I present three algorithmic solutions to three facets of online political networks; namely, detection of communities, antagonisms and the impact of certain types of accounts on political polarization. First, I develop a multi-view community detection algorithm to find politically pure communities. I find that word usage among other content types (i.e. hashtags, URLs) complement user interactions the best in accurately detecting communities.

Second, I focus on detecting negative linkages between politically motivated social media users. Major social media platforms do not facilitate their users with built-in negative interaction options. However, many political network analysis tasks rely on not only positive but also negative linkages. Here, I present the SocLSFact framework to detect negative linkages among social media users. It utilizes three pieces of information; sentiment cues of textual interactions, positive interactions, and socially balanced triads. I evaluate the contribution of each three aspects in negative link detection performance on multiple tasks.

Third, I propose an experimental setup that quantifies the polarization impact of automated accounts on Twitter retweet networks. I focus on a dataset of tragic Parkland shooting event and its aftermath. I show that when automated accounts are removed from the retweet network the network polarization decrease significantly, while a same number of accounts to the automated accounts are removed randomly the difference is not significant. I also find that prominent predictors of engagement of automatically generated content is not very different than what previous studies point out in general engaging content on social media. Last but not least, I identify accounts which self-disclose their automated nature in their profile by using expressions such as bot, chat-bot, or robot. I find that human engagement to self-disclosing accounts compared to non-disclosing automated accounts is much smaller. This observational finding can motivate further efforts into automated account detection research to prevent their unintended impact.

ContributorsOzer, Mert (Author) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Sen, Arunabha (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2019

ASU Electronic Theses and Dissertations

Filtering by

Exploring Deep Learning Vulnerability: Attack and Defense

Novel Deep Learning Algorithms for Enhancing Inference in Cross-Modal Applications

Towards Addressing Key Visual Processing Challenges in Social Media Computing

Towards Supporting Visual Question and Answering Applications

Towards Learning Representations in Visual Computing Tasks

Novel Image Representations and Learning Tasks

Robust and Generalizable Machine Learning through Generative Models,Adversarial Training, and Physics Priors

Three Facets of Online Political Networks: Communities, Antagonisms, and Polarization