Matching Items (207)
Description
Social media has become a primary means of communication and a prominent source of information about day-to-day happenings in the contemporary world. The rise in the popularity of social media platforms in recent decades has empowered people with an unprecedented level of connectivity. Despite the benefits social media offers, it also comes with disadvantages. A significant downside to staying connected via social media is the susceptibility to falsified information or Fake News. Easy access to social media and the lack of truth-verification tools have enabled miscreants on online platforms to spread false propaganda at scale, sowing chaos. The spread of misinformation on these platforms ultimately leads to mistrust and social unrest. Consequently, there is a need to counter the spread of misinformation, which could otherwise have a detrimental impact on society. A notable example is the COVID-19 pandemic, during which coordinated misinformation campaigns misled the public on vaccination and health safety. Advancements in Natural Language Processing have given rise to sophisticated language generation models that can generate realistic-looking text. Although the current Fake News generation process is manual, it is only a matter of time before it is automated at scale, producing Neural Fake News with language generation models such as Bidirectional Encoder Representations from Transformers (BERT) and the third-generation Generative Pre-trained Transformer (GPT-3). Moreover, given that fact verification is currently a manual process, there is an urgent need for reliable automated detection tools to counter Neural Fake News generated at scale. Existing tools demonstrate state-of-the-art performance in detecting Neural Fake News but exhibit black-box behavior.
Incorporating explainability into the Neural Fake News classification task will build trust and acceptance amongst different communities and decision-makers. Therefore, the current study proposes a new set of interpretable discriminatory features. These features capture statistical and stylistic idiosyncrasies, achieving an accuracy of 82% on Neural Fake News classification. Furthermore, this research investigates essential dependency relations contributing to the classification process. Lastly, the study concludes by providing directions for future research in building explainable tools for Neural Fake News detection.
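Features of this kind, statistical and stylistic idiosyncrasies that a human can inspect, can be sketched in a few lines. The specific features and function names below are illustrative assumptions, not the thesis's actual feature set:

```python
import re
from collections import Counter

def stylistic_features(text: str) -> dict:
    """Compute a handful of simple, human-interpretable statistical/stylistic
    features of a document (illustrative choices, not the study's own set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    return {
        # average words per sentence: a coarse syntactic-complexity signal
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # vocabulary richness: unique words over total words
        "type_token_ratio": len(counts) / max(len(words), 1),
        # punctuation habits differ between human and generated text
        "comma_rate": text.count(",") / max(len(words), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
    }

feats = stylistic_features(
    "This is real news. It cites sources, names dates, and quotes people."
)
```

A feature vector like this could feed any transparent classifier (e.g. logistic regression), so each feature's contribution to a "Neural Fake News" verdict remains inspectable.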
ContributorsKarumuri, Ravi Teja (Author) / Liu, Huan (Thesis advisor) / Corman, Steven (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2022
Description
Traditional Reinforcement Learning (RL) learns policies with respect to the reward available from the environment, but learning in a complex domain sometimes requires wisdom that comes from a wide range of experience. In behavior-based robotics, it is observed that a complex behavior can be described by a combination of simpler behaviors. It is tempting to apply a similar idea so that simpler behaviors can be combined in a meaningful way to tailor the complex combination. Such an approach would enable faster learning and modular design of behaviors. Complex behaviors can in turn be combined with other behaviors to create even more advanced behaviors, resulting in a rich set of possibilities. As in RL, the combined behavior can keep evolving by interacting with the environment. The requirement of this method is to specify a reasonable set of simple behaviors. In this research, I present an algorithm that aims at combining behaviors such that the resulting behavior has characteristics of each individual behavior. This approach is inspired by behavior-based robotics, such as the subsumption architecture and motor schema-based design. The combination algorithm outputs n weights to combine behaviors linearly. The weights are state-dependent and change dynamically at every step in an episode. The idea is tested on discrete and continuous environments such as OpenAI's "LunarLander" and "BipedalWalker". Results are compared with related domains such as Multi-objective RL, Hierarchical RL, Transfer Learning, and basic RL. It is observed that the combination of behaviors is a novel way of learning that helps the agent achieve the required characteristics. A combination is learned for a given state, so the agent is able to learn faster and more efficiently than with other similar approaches. The agent demonstrates characteristics of multiple behaviors simultaneously, which helps it learn and adapt to the environment.
Future directions are also suggested as possible extensions to this research.
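The state-dependent linear combination described above can be sketched as follows, assuming, purely for illustration, that the per-behavior weights come from a softmax over state-dependent scores; the thesis's actual weight-learning mechanism is not reproduced here:

```python
import math

def softmax(xs):
    """Turn raw per-behavior scores into weights that sum to 1."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def combine_actions(state_scores, behavior_actions):
    """Blend the continuous actions proposed by n simple behaviors using
    state-dependent softmax weights (one score per behavior for the
    current state). Re-evaluated every step, so the blend can change
    dynamically within an episode."""
    w = softmax(state_scores)
    dim = len(behavior_actions[0])
    return [
        sum(w[i] * behavior_actions[i][d] for i in range(len(w)))
        for d in range(dim)
    ]

# Two hypothetical behaviors (say, "hover" and "descend") proposing 2-D actions;
# equal scores give an even 50/50 blend of the two proposals:
blended = combine_actions([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

As one behavior's score dominates, the blended action converges to that behavior's proposal, so a single mechanism interpolates between "pure" behaviors and mixtures of them.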
ContributorsVora, Kevin Jatin (Author) / Zhang, Yu (Thesis advisor) / Yang, Yezhou (Committee member) / Praharaj, Sarbeswar (Committee member) / Arizona State University (Publisher)
Created2021
Description

Dyslexia is a learning disability that negatively affects reading, writing, and spelling development at the word level in 5%-9% of children. The phenotype is variable and complex, involving several potential cognitive and physical concomitants such as sensory dysregulation and immunodeficiencies. The biological pathogenesis is not well understood. Toward a better understanding of the biological drivers of dyslexia, we conducted the first joint exome and metabolome investigation in a pilot sample of 30 participants with dyslexia and 13 controls. In the metabolite analysis, eight metabolites of interest emerged (pyridoxine, kynurenic acid, citraconic acid, phosphocreatine, hippuric acid, xylitol, 2-deoxyuridine, and acetylcysteine). A metabolite-metabolite interaction analysis identified Krebs cycle intermediates that may be implicated in the development of dyslexia. Gene ontology analysis based on exome variants resulted in several pathways of interest, including the sensory perception of smell (olfactory) and immune system-related responses. In the joint exome and metabolite analysis, the olfactory transduction pathway emerged as the primary pathway of interest. Although the olfactory transduction and Krebs cycle pathways have not previously been described in the dyslexia literature, these pathways have been implicated in other neurodevelopmental disorders, including autism spectrum disorder and obsessive-compulsive disorder, suggesting the possibility that these pathways play a role in dyslexia as well. Immune system response pathways, on the other hand, have been implicated in both dyslexia and other neurodevelopmental disorders.

ContributorsNandakumar, Rohit (Author) / Dinu, Valentin (Thesis director) / Peter, Beate (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)
Created2022-05
Description
Distributed self-assessments and reflections empower learners to take the lead in evaluating their knowledge gains. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources of practice opportunities are available to learners, especially in the Computer Science (CS) and programming domain. Learners may choose to utilize these opportunities to self-assess their learning progress and practice their skills. My objective in this dissertation is to understand to what extent the self-assessment process can impact novice programmers' learning, and what advanced learning technologies I can provide to enhance learners' outcomes and progress. To this end, I conducted a series of studies to investigate learning analytics and students' behaviors in working on self-assessment and reflection opportunities. I designed a personalized learning platform named QuizIT that provides daily quizzes to support learners in the computer science domain. QuizIT adopts an Open Social Student Model (OSSM) that supports personalized learning and serves as a self-assessment system. It aims to ignite self-regulating behavior and engage students in the self-assessment and reflective procedure. I designed and integrated a personalized practice recommender into the platform to investigate the self-assessment process. I also evaluated self-assessment behavioral trails as a predictor of students' performance. The statistical indicators suggested that distributed reflections were associated with learner performance. I then proceeded to address whether distributed reflections enable self-regulating behavior and lead to better learning in CS introductory courses. From student interactions with the system, I found distinct behavioral patterns that showed early signs of the learners' performance trajectory.
Utilizing the personalized recommender improved students' engagement and performance in the self-assessment procedure. When I focused on enhancing the impact of reflections during self-assessment sessions through weekly opportunities, learners in the CS domain showed better self-regulated learning behavior when utilizing those opportunities. The weekly reflections provided by the learners captured more reflective features than the daily opportunities. Overall, this dissertation demonstrates the effectiveness of learning technologies, including adaptive recommendation and reflection, in supporting novice programming learners and their self-assessment processes.
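As a rough illustration of what a personalized practice recommender of this kind might do, the sketch below picks the topic with the lowest smoothed success rate from a learner's quiz history; the data layout and the smoothing choice are assumptions for illustration, not QuizIT's actual algorithm:

```python
def recommend_topic(history):
    """Pick the next practice topic from a learner's quiz history.

    history maps topic -> (correct, attempted). The topic with the lowest
    Laplace-smoothed success rate is recommended, so practice targets the
    weakest concept first and unattempted topics are not treated as mastered.
    """
    def mastery(stats):
        correct, attempted = stats
        # add-one smoothing keeps estimates sane for tiny attempt counts
        return (correct + 1) / (attempted + 2)

    return min(history, key=lambda topic: mastery(history[topic]))

# Hypothetical learner history: weakest on recursion, so that comes next.
topic = recommend_topic({"loops": (8, 10), "recursion": (2, 9), "arrays": (5, 6)})
```

An open student model in the spirit of OSSM could then display these per-topic mastery estimates back to the learner alongside peers', closing the self-assessment loop.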
ContributorsAlzaid, Mohammed (Author) / Hsiao, Ihan (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / VanLehn, Kurt (Committee member) / Nelson, Brian (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created2022
Description
High-throughput transcriptome data, such as Single-cell Ribonucleic Acid sequencing (scRNA-seq) and Circular Ribonucleic Acid (circRNA) data, have enabled significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying time point(s) where drastic changes in gene transcription are associated with the transition from homeostatic to non-homeostatic cellular states (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type-specific and co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, along with the time point when these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type-specific differentially co-expressed pairs (DCEPs). In the first part of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data of patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment time points. However, this pipeline does not account for correlations among multiple single cells from the same sample or correlations among multiple samples from the same patient. In the second part of Chapter 3, I present a solution to this problem using a mixed-effect model. In Chapter 4, I present a summary of my work on the cross-species analysis of circRNA transcriptome time series data.
I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.
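The tipping-point idea, a time point where TF-TG co-expression shifts sharply, can be illustrated with a minimal sliding-window correlation sketch. The windowing scheme and the toy expression series below are assumptions for illustration; the dissertation's cell-type-specific method is considerably more involved:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def tipping_point(tf, tg, window=4):
    """Correlate TF and TG expression in sliding windows of time points and
    return the index where co-expression shifts most between consecutive
    windows -- a crude proxy for a tipping point."""
    r = [pearson(tf[i:i + window], tg[i:i + window])
         for i in range(len(tf) - window + 1)]
    deltas = [abs(r[i + 1] - r[i]) for i in range(len(r) - 1)]
    return max(range(len(deltas)), key=deltas.__getitem__)

# Hypothetical series: TF rises steadily; TG tracks it early, then reverses.
t = tipping_point([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 3, 2, 1, 0])
```

Here the correlation flips from strongly positive to strongly negative between the windows starting at time points 1 and 2, so the detector flags that boundary as the candidate tipping point.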
ContributorsNyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)
Created2022
Description
With the boom of machine learning, a massive amount of data has been used in training machine learning models. A tremendous amount of this data is user-generated, which allows the models to produce accurate results and personalized services. Nevertheless, I recognize the importance of preserving individuals' privacy by protecting their information during the training process. One privacy attack that affects individuals is the private attribute inference attack: the process of inferring information that individuals do not explicitly reveal, such as age, gender, location, and occupation. The impact goes beyond mere exposure of the information, as individuals face potential downstream risks. Furthermore, some applications need sensitive data to train their models and produce helpful insights, and figuring out how to build privacy-preserving machine learning models will increase the capabilities of these applications. However, improving privacy affects data utility, which leads to a dilemma between privacy and utility. The utility of the data is measured by its quality for different tasks. This trade-off between privacy and utility must be managed to satisfy both the privacy requirement and the result quality. To achieve more scalable privacy-preserving machine learning models, I investigate the privacy risks that affect individuals' private information in distributed machine learning. Even though distributed machine learning has been driven by privacy concerns, the literature has identified privacy issues that still threaten individuals' privacy. In this dissertation, I investigate how to measure and protect individuals' privacy in centralized and distributed machine learning models. First, a privacy-preserving text representation learning approach is proposed to protect users' privacy that can be revealed from user-generated data.
Second, a novel privacy-preserving text classification for split learning is presented to improve users' privacy and retain high utility by defending against private attribute inference attacks.
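One simple way to hinder attribute inference at the client-server boundary in split learning is to perturb the intermediate ("smashed") representation before transmission. The sketch below uses plain Gaussian noise purely to illustrate the privacy/utility knob; it is not the dissertation's actual defense, which learns a privacy-preserving representation rather than injecting noise:

```python
import random

def protect_smashed_data(activation, noise_scale=0.5, seed=0):
    """Perturb the client-side intermediate representation before it is sent
    to the server in split learning. Larger noise_scale degrades a private
    attribute inference attack on the representation, at some cost in task
    utility -- an explicit privacy/utility trade-off."""
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, noise_scale) for a in activation]

# Hypothetical 3-dimensional activation leaving the client-side layers:
protected = protect_smashed_data([0.2, -1.3, 0.7])
```

With noise_scale set to 0 the representation passes through unchanged (maximum utility, no added privacy); tuning it upward trades utility for privacy.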
ContributorsAlnasser, Walaa (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Shu, Kai (Committee member) / Bao, Tiffany (Committee member) / Arizona State University (Publisher)
Created2022
Description
Social media platforms provide a rich environment for analyzing user behavior. Recently, deep learning-based methods have been the mainstream approach for social media analysis models involving complex patterns. However, these methods are susceptible to biases in the training data, such as participation inequality: a mere 1% of users generate the majority of the content on social networking sites, while the remaining users, though engaged to varying degrees, tend to be less active in content creation and largely silent. These silent users consume and listen to information that is propagated on the platform. However, their voice, attitude, and interests are not reflected in the online content, making the decisions of current methods predisposed towards the opinions of the active users. Models can thus mistake the loudest users for the majority, and making the silent majority heard is necessary to reveal the true landscape of the platform. In this dissertation, to compensate for this bias in the data, which is related to user-level data scarcity, I introduce three pieces of research work. Two of the proposed solutions work with the data on hand, while the third augments the current data. Specifically, the first approach modifies the weight of users' activity/interaction in the input space, while the second re-weights the loss based on users' activity levels during downstream task training. Lastly, the third approach uses large language models (LLMs) and learns the users' writing behavior to expand the current data. In other words, by utilizing LLMs as a sophisticated knowledge base, this method aims to augment the silent users' data.
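The second approach, re-weighting the loss by activity level, can be illustrated with inverse-frequency example weights; the exact formula and data layout here are assumptions for illustration, not the dissertation's scheme:

```python
def activity_weights(post_counts):
    """Compute a per-user loss weight inversely proportional to activity,
    so a prolific user's many examples do not drown out a silent user's few.

    post_counts maps user -> number of posts. Weights are normalized so a
    perfectly balanced platform would give every user weight 1.0.
    """
    total = sum(post_counts.values())
    n = len(post_counts)
    return {user: total / (n * count) for user, count in post_counts.items()}

# Hypothetical extreme participation inequality: one loud user, one silent one.
weights = activity_weights({"loud": 98, "quiet": 2})
```

During training, each example's loss would be multiplied by its author's weight, so the model's gradient reflects the user population rather than the post volume.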
ContributorsKarami, Mansooreh (Author) / Liu, Huan (Thesis advisor) / Sen, Arunabha (Committee member) / Davulcu, Hasan (Committee member) / Mancenido, Michelle V. (Committee member) / Arizona State University (Publisher)
Created2023
Description
Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches utilize single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical results. However, to capture multifaceted pathological mechanisms, integrative multi-omics analysis is needed to provide a comprehensive picture of the disease. Here, I present three novel approaches to integrative multi-omics analysis. I introduce a single-cell integrative clustering method that leverages multi-omics data to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-Seq) dataset from human Acute Myeloid Leukemia (AML) and control samples, this approach unveiled nuanced cell populations that would otherwise remain elusive. I then shift the focus to a computational framework for discovering transcriptional regulatory trios, in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data of multiple myeloma samples, this approach discovered synergistic cis-acting and trans-acting regulatory elements associated with tumorigenesis. The final part of this work introduces a novel methodology that leverages single-cell transcriptome and surface protein data produced by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.
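A minimal illustration of the integrative idea, fusing evidence from two omics layers into one cell-cell similarity that a clustering algorithm could consume, might look like the sketch below. The equal-weight cosine fusion is an assumption for illustration, not the thesis's method:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fused_similarity(rna_i, rna_j, protein_i, protein_j, w_rna=0.5):
    """Fuse per-modality similarities of two cells (RNA expression and
    surface-protein abundance, as in CITE-Seq) into one score. Cells that
    look alike in both modalities score higher than cells that agree in
    only one, which is what sharpens subpopulation resolution."""
    return (w_rna * cosine(rna_i, rna_j)
            + (1 - w_rna) * cosine(protein_i, protein_j))

# Hypothetical 2-gene / 2-protein profiles for two cells that agree in both layers:
s = fused_similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

A standard clustering algorithm run on the fused similarity matrix can then separate subpopulations that either modality alone would merge.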
ContributorsMudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)
Created2023
Description
The problem of monitoring complex networks for the detection of anomalous behavior is well known. Sensors are usually deployed to monitor these networks for anomalies, and Sensor Placement Optimization (SPO) is the problem of determining where these sensors should be placed (deployed) in the network. Prior works have utilized the well-known Set Cover formulation to determine the locations where sensors should be placed so that anomalies can be effectively detected. However, such works cannot address the problem when the objective is not only to detect the presence of anomalies but also to distinguish the source(s) of the detected anomalies, i.e., to uniquely monitor the network. In this dissertation, I fill this gap by utilizing the mathematical concept of Identifying Codes and illustrating how it can not only overcome the aforementioned limitation but also, along with its variants, be utilized to monitor complex networks modeled from multiple domains. Over the course of this dissertation, I make key contributions that further enhance the efficacy and applicability of Identifying Codes as a monitoring strategy. First, I show how Identifying Codes are superior not only to the Set Cover formulation but also to standard graph centrality metrics for uniquely monitoring complex networks. Second, I study novel problems such as the budget-constrained, scalable, and robust Identifying Code problems, and present algorithms and results for each. Third, I present useful Identifying Code results for restricted graph classes such as Unit Interval Bigraphs and Unit Disc Bigraphs. Finally, I show the universality of Identifying Codes by applying them to multiple domains.
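The defining property of an Identifying Code, that every vertex's closed neighborhood intersects the code in a nonempty set that is unique to that vertex, can be checked directly; the toy path graph below is illustrative:

```python
def is_identifying_code(adj, code):
    """Check whether `code` (a set of vertices carrying sensors) is an
    identifying code of the graph given by adjacency dict `adj`.

    For every vertex v, the set of code vertices in v's closed neighborhood
    (its "signature") must be nonempty (the anomaly is detected) and unique
    (the anomalous vertex is distinguished from every other vertex)."""
    seen = {}
    for v, nbrs in adj.items():
        sig = frozenset((set(nbrs) | {v}) & set(code))
        if not sig or sig in seen.values():
            return False  # undetectable, or indistinguishable from another vertex
        seen[v] = sig
    return True

# Hypothetical toy network: the path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

On this path, placing sensors on {a, b, c} yields distinct signatures {a,b}, {a,b,c}, {b,c}, {c} for a, b, c, d respectively, so any single anomalous vertex can be pinpointed; {a, c} fails because c and d share the signature {c}.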
ContributorsBasu, Kaustav (Author) / Sen, Arunabha (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created2022
Description
Deep neural networks have been shown to be vulnerable to adversarial attacks. Typical attack strategies alter authentic data subtly so as to obtain adversarial samples that resemble the original but otherwise cause a network to misbehave, for example with a high misclassification rate. Various attack approaches have been reported, some showing state-of-the-art performance in attacking certain networks. Meanwhile, many defense mechanisms have been proposed in the literature, some of which are quite effective in guarding against typical attacks. Yet most of these attacks fail when the targeted network modifies its architecture or uses another set of parameters, and vice versa. Moreover, the emergence of more advanced deep neural networks, such as generative adversarial networks (GANs), has made the situation more complicated, and the game between attack and defense continues. This dissertation aims at exploring the vulnerability of deep neural networks by investigating the mechanisms behind the success or failure of existing attack and defense approaches. To this end, several deep learning-based approaches have been proposed to study the problem from different perspectives. First, I developed an adversarial attack approach that explores the unlearned region of a typical deep neural network, which is often over-parameterized. Second, I proposed an end-to-end learning framework to analyze the images generated by different GAN models. Third, I developed a defense mechanism that can secure a deep neural network against adversarial attacks with a defense layer consisting of a set of orthogonal kernels. Substantial experiments are conducted to unveil the potential factors that contribute to attack/defense effectiveness. The dissertation concludes with a discussion of possible future work toward achieving robust deep neural networks.
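As a concrete instance of the "subtle alteration" attack template, here is a one-step FGSM-style perturbation, a classic attack primitive shown purely for illustration; the dissertation's own attack, which explores the network's unlearned regions, is not reproduced here:

```python
def fgsm_perturb(x, grad, eps=0.1):
    """One-step Fast-Gradient-Sign-Method-style perturbation: move each input
    feature by eps in the sign of the loss gradient with respect to that
    feature. The result stays within an eps-ball of the original (visually
    near-identical) while maximally increasing the loss to first order."""
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical 2-feature input and loss gradient:
adv = fgsm_perturb([0.5, 0.5], [1.0, -2.0], eps=0.1)
```

Because the step depends only on the gradient's sign, the perturbation is bounded by eps per feature regardless of gradient magnitude, which is what keeps adversarial samples close to the authentic data.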
ContributorsDing, Yuzhen (Author) / Li, Baoxin (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth Kumar Demakethepalli (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022