BioEve: user interface framework bridging IE and IR

Description

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and of literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary both for creating large-scale models of the relationships among biomedical entities and for generating hypotheses to guide biomedical research. To reduce the time and effort spent on these activities, an intelligent search system is required. Even though many systems aid in navigating this wide collection of documents, the sheer volume and depth of the information can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large-scale biomedical named entity recognition and the challenges faced in each. It also proposes BioEve, an integrative framework that fuses faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities (such as genes, diseases, drugs, and cell lines) and events in biomedical text from MEDLINE, the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search, navigate, and discover knowledge hidden in the life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise-quality search and discovery service that guides researchers through step-by-step query refinement by suggesting concepts enriched in intermediate results, thereby facilitating the "discover more as you search" paradigm.
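
The step of suggesting concepts enriched in intermediate results can be illustrated with a minimal Python sketch. It assumes each document has already been annotated with a set of concepts (for example, genes or diseases found by the extraction step) and ranks concepts by how over-represented they are in the current result set relative to the whole collection. The function name `enriched_concepts`, the ratio score, and the `min_count` cutoff are hypothetical choices for illustration, not BioEve's actual ranking.

```python
from collections import Counter

def enriched_concepts(collection, result_ids, min_count=2, top_k=5):
    """Rank concepts by enrichment in a result set (hypothetical scoring).

    collection: dict mapping document id -> set of concept labels
    result_ids: ids of the documents matching the current query
    Score = (share of result docs containing the concept)
          / (share of all docs containing the concept).
    """
    n_all = len(collection)
    n_res = len(result_ids)
    if n_all == 0 or n_res == 0:
        return []
    # Document frequency of each concept across the whole collection.
    background = Counter(c for concepts in collection.values() for c in concepts)
    # Document frequency of each concept within the current results.
    foreground = Counter(c for doc_id in result_ids for c in collection[doc_id])
    scored = []
    for concept, k in foreground.items():
        if k < min_count:
            continue  # ignore concepts seen in too few results
        ratio = (k / n_res) / (background[concept] / n_all)
        scored.append((ratio, k, concept))
    # Highest enrichment first; ties broken by raw count, then by name.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [concept for _, _, concept in scored[:top_k]]
```

In a faceted interface, the returned concepts would be offered as refinement facets; selecting one narrows `result_ids`, and the ranking is recomputed on the smaller set.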

Contributors: Kanwar, Pradeep (Author) / Davulcu, Hasan (Thesis advisor) / Dinu, Valentin (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2010

Coffee Cup Chaos

Description

Complex human control is a topic of much interest in fields such as robotics, manufacturing, and space exploration. Even simple tasks that humans perform with ease can be extremely complicated when viewed from a controls and complex-systems perspective. One such task is a human carrying and moving a coffee cup. Though mundane for humans, this task, when modeled and analyzed, can turn out to be quite chaotic. Understanding such systems is key to developing robots and autonomous systems that can perform these tasks themselves.

The coffee cup system can be simplified and modeled as a cart-and-pendulum system. Bazzi et al. and Maurice et al. present two different cart-and-pendulum systems to represent the coffee cup system [1],[2]. The purpose of this project was to build upon these systems to gain a better understanding of the coffee cup system and to determine where chaos exists within it. The honors thesis team first worked with their senior design group to develop a mathematical model of the cart-and-pendulum system based on the Bazzi and Maurice papers [1],[2]. The honors thesis team then analyzed this model and extended it to a cart-and-two-pendulum model that represents the coffee cup system more accurately.

Analysis of the single-pendulum model showed a low-frequency region where the pendulum and the cart remain in phase with each other and a high-frequency region where the cart and pendulum have a π phase difference between them. The transition between the low- and high-frequency regions is determined by the resonant frequency of the pendulum. Analysis of the two-pendulum system confirmed this result and revealed that, when the pendulums differ in length, they transition to the high-frequency region at different frequencies: each pendulum has its own resonant frequency and transitions according to it. This produces a range of driving frequencies over which the pendulums are out of phase with each other. After both pendulums have transitioned, they remain in phase with each other and out of phase with the cart.
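
The low- and high-frequency phase behavior can be illustrated with the textbook small-angle model of a damped pendulum hanging from a cart whose position is prescribed as x(t) = X cos ωt. This simplified model is an assumption made here, not the Bazzi or Maurice formulation: linearizing gives θ'' + 2ζω₀θ' + ω₀²θ = (ω²X/L) cos ωt with ω₀ = √(g/L), so in steady state the pendulum angle lags the cart position by φ = atan2(2ζω₀ω, ω₀² − ω²), which is near 0 well below resonance and near π well above it. The damping ratio ζ and the pendulum lengths used below are hypothetical values.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def resonant_frequency(length_m):
    """Small-angle natural frequency omega_0 = sqrt(g / L), in rad/s."""
    return math.sqrt(G / length_m)

def phase_lag(omega, length_m, zeta=0.05):
    """Steady-state lag (rad) of the pendulum angle behind the cart position.

    Assumes the linearized, damped pendulum on a cart driven as
    x(t) = X cos(omega t); zeta is a hypothetical damping ratio.
    """
    w0 = resonant_frequency(length_m)
    return math.atan2(2 * zeta * w0 * omega, w0**2 - omega**2)

def out_of_phase_band(length_a, length_b):
    """Driving-frequency band in which one pendulum has passed resonance
    and the other has not, i.e. the two pendulums are out of phase."""
    lo, hi = sorted((resonant_frequency(length_a), resonant_frequency(length_b)))
    return lo, hi
```

Because ω₀ grows as the length shrinks, the shorter pendulum crosses into the high-frequency region later, and `out_of_phase_band` returns the band in which the two pendulums sit on opposite sides of resonance. A linear model like this cannot reproduce the chaotic behavior reported for very short pendulums; that requires the full nonlinear equations.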

However, if the length of the pendulum is decreased too much, the system starts to exhibit chaotic behavior: the short pendulum behaves chaotically, and the phase relationship between the pendulums and the cart is no longer maintained. Since the pendulum length represents the distance between a particle of coffee and the top of the cup, this implies that coffee near the top of the cup would cause the system to act chaotically. Further analysis is needed to determine why the length affects the system in this way.

Contributors: Zindani, Abdul Rahman (Co-author) / Crane, Kari (Co-author) / Lai, Ying-Cheng (Thesis director) / Jiang, Junjie (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

Joint Exome and Metabolome Analysis in Individuals With Dyslexia: Evidence for Associated Dysregulations of Olfactory Perception and Autoimmune Functions

Description

Dyslexia is a learning disability that negatively affects reading, writing, and spelling development at the word level in 5%-9% of children. The phenotype is variable and complex, involving several potential cognitive and physical concomitants such as sensory dysregulation and immunodeficiencies, and its biological pathogenesis is not well understood. Toward a better understanding of the biological drivers of dyslexia, we conducted the first joint exome and metabolome investigation in a pilot sample of 30 participants with dyslexia and 13 controls. In the metabolite analysis, eight metabolites of interest emerged (pyridoxine, kynurenic acid, citraconic acid, phosphocreatine, hippuric acid, xylitol, 2-deoxyuridine, and acetylcysteine). A metabolite-metabolite interaction analysis identified Krebs cycle intermediates that may be implicated in the development of dyslexia. Gene ontology analysis based on exome variants identified several pathways of interest, including sensory perception of smell (olfaction) and immune system-related responses. In the joint exome and metabolite analysis, the olfactory transduction pathway emerged as the primary pathway of interest. Although the olfactory transduction and Krebs cycle pathways have not previously been described in the dyslexia literature, they have been implicated in other neurodevelopmental disorders, including autism spectrum disorder and obsessive-compulsive disorder, suggesting that they may play a role in dyslexia as well. Immune system response pathways, on the other hand, have been implicated in both dyslexia and other neurodevelopmental disorders.
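
Gene ontology and pathway analyses like the one above are commonly run as over-representation tests. The Python sketch below shows a one-sided hypergeometric test; it is a generic illustration, and neither the function name nor the counts mentioned afterwards come from the study, whose exact procedure is not described in this abstract.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """One-sided over-representation p-value P(X >= k).

    N: genes in the background (e.g. all genes tested)
    K: background genes annotated to the GO term or pathway
    n: genes in the hit list (e.g. genes carrying variants of interest)
    k: hit-list genes annotated to the term
    X follows a hypergeometric distribution with these parameters.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(K, n)
    # Exact integer arithmetic for the tail, converted to float at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, `hypergeom_enrichment_p(20000, 400, 150, 12)` would ask whether 12 hits in a hypothetical 400-gene olfactory term, out of 150 hit genes against a 20,000-gene background, is more than chance; in practice the p-values across all terms are then corrected for multiple testing.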

Contributors: Nandakumar, Rohit (Author) / Dinu, Valentin (Thesis director) / Peter, Beate (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)

Created: 2022-05

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High-throughput transcriptome data, such as single-cell ribonucleic acid sequencing (scRNA-seq) and circular ribonucleic acid (circRNA) sequencing data, have enabled significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is central to identifying the time point(s) at which drastic changes in gene transcription are associated with a transition from a homeostatic to a non-homeostatic cellular state (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type-specific, co-expression-based tipping point detection method that identifies target gene (TG) and transcription factor (TF) pairs whose differential co-expression across time points drives biological changes in different cell types, as well as the time point at which these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced, cell-type-specific differentially co-expressed pairs (DCEPs). In Part 1 of Chapter 3, I present a pipeline that uses a series of statistical tests to detect DCEPs; it was applied to scRNA-seq data from patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample or among multiple samples from the same patient. In Part 2 of Chapter 3, I present a solution to this problem using a mixed-effect model. In Chapter 4, I summarize my work on the cross-species analysis of circRNA transcriptome time series data: I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.
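
One standard way to test whether a TF-TG pair is differentially co-expressed between two time points or treatment conditions is to compare Pearson correlations using Fisher's z-transformation. The Python sketch below applies that test to each pair of adjacent time points and reports the transition with the largest change as a candidate tipping point. This is a simplified stand-in chosen for illustration, not the dissertation's method; like the Part 1 pipeline described above, it treats cells as independent observations, which is the limitation a mixed-effect model addresses.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for two independent samples.

    Returns (z statistic, p-value). Requires n1, n2 > 3 and |r| < 1.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return z, p

def candidate_tipping_point(tf_by_time, tg_by_time):
    """Adjacent time-point transition with the largest |z| change in TF-TG
    correlation. Inputs: lists (one per time point) of per-cell expression.
    Returns (index of the earlier time point, p-value of that change)."""
    stats = []
    for t in range(len(tf_by_time) - 1):
        r1 = pearson(tf_by_time[t], tg_by_time[t])
        r2 = pearson(tf_by_time[t + 1], tg_by_time[t + 1])
        z, p = fisher_z_test(r1, len(tf_by_time[t]), r2, len(tf_by_time[t + 1]))
        stats.append((abs(z), t, p))
    best = max(stats)
    return best[1], best[2]
```

Run over many TF-TG pairs within one cell type, the p-values would need multiple-testing correction before any pair is called a DCEP.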

Contributors: Nyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2022

Unveiling Cellular Heterogeneity, Genetic Regulation, and Protein Trafficking Dynamics Via Novel Integrative Multi-Omic Approaches

Description

Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning approaches use single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical outcomes. However, capturing multifaceted pathological mechanisms requires integrative multi-omics analysis that can provide a comprehensive picture of the disease. Here, I present three novel approaches to integrative multi-omics analysis. First, I introduce a single-cell integrative clustering method that leverages multiple omics layers to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes (CITE-Seq) dataset from human acute myeloid leukemia (AML) and control samples, this approach unveiled nuanced cell populations that would otherwise remain elusive. I then shift the focus to a computational framework for discovering transcriptional regulatory trios, in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data from multiple myeloma samples, this approach discovered synergistic cis-acting and trans-acting regulatory elements associated with tumorigenesis. Finally, I introduce a novel methodology that leverages the transcriptome and surface protein data produced at the single-cell level by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.
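
The idea of using a second modality to sharpen single-cell clusters can be sketched by fusing per-modality distances. The Python sketch below, written for illustration under stated assumptions, scales RNA and surface-protein (antibody-derived tag, ADT) distance matrices by their medians, averages them with a weight, and groups cells whose fused distance falls below a threshold (connected components, i.e. single linkage). The weighting, threshold, and toy data are hypothetical; this is not the dissertation's integrative clustering method.

```python
import math
from statistics import median

def pairwise(points):
    """Euclidean distance matrix for a list of feature vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def fused_distance(rna, adt, weight=0.5):
    """Weighted mean of median-scaled RNA and ADT distance matrices."""
    mats = []
    for d in (pairwise(rna), pairwise(adt)):
        off_diag = [d[i][j] for i in range(len(d)) for j in range(i + 1, len(d))]
        scale = median(off_diag) or 1.0  # avoid division by zero
        mats.append([[v / scale for v in row] for row in d])
    n = len(rna)
    return [[weight * mats[0][i][j] + (1 - weight) * mats[1][i][j]
             for j in range(n)] for i in range(n)]

def threshold_clusters(dist, threshold):
    """Single-linkage clustering: connected components of the graph linking
    cells closer than `threshold`. Returns one cluster label per cell."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

With toy data in which two subpopulations look identical in RNA but separate in surface protein, clustering on RNA alone merges them, while the fused distance keeps them apart.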

Contributors: Mudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)

Created: 2023

Quantum Scattering and Machine Learning in Dirac Materials

Description

A remarkable phenomenon in contemporary physics is quantum scarring in classically chaotic systems, where wave functions tend to concentrate on classical periodic orbits. Quantum scarring has been studied for more than four decades, but efficiently detecting quantum scars has remained challenging, relying mostly on human visual inspection of wave function patterns. This work develops a machine learning approach to detecting quantum scars in an automated and highly efficient manner; in particular, it exploits meta-learning. The first step is to construct a few-shot classification algorithm, under the requirement that the one-shot classification accuracy exceed 90%. A scheme based on a combination of neural networks is then proposed to improve the accuracy. This work shows that the machine learning scheme can find the correct quantum scars among thousands of images of wave functions, without any human intervention and regardless of the symmetry of the underlying classical system. This constitutes the first application of meta-learning to quantum systems. Interacting spin networks are fundamental to quantum computing. Data-based tomography of time-independent spin networks has been achieved, but an open challenge is to ascertain the structures of time-dependent spin networks using time series measurements taken locally from a small subset of the spins. Physically, the dynamical evolution of a spin network under time-dependent driving or perturbation is described by the Heisenberg equation of motion. Motivated by this basic fact, this work articulates a physics-enhanced machine learning framework whose core is Heisenberg neural networks. Applying Heisenberg neural networks to spin networks of a variety of structures, this work demonstrates that, from local measurements, not only can the local Hamiltonian be recovered, but the Hamiltonian reflecting the interacting structure of the whole system can also be faithfully reconstructed. In the extreme case where measurements are taken from only one spin, the achieved tomography fidelity can reach about 90%. The developed machine learning framework is applicable to any time-dependent system whose quantum dynamical evolution is governed by the Heisenberg equation of motion.
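
For reference, the Heisenberg equation of motion that the framework builds on is the standard one below, written with ħ = 1 for an operator Ô in the Heisenberg picture. As an illustration only, the second line works out the textbook case of a single spin in a time-dependent field h(t), assuming the Hamiltonian H(t) = h(t)·σ (an example chosen here, not a Hamiltonian from the dissertation). Using [σ^j, σ^a] = 2i ε_jak σ^k, each Pauli operator precesses about the instantaneous field; time series of such local observables are the kind of data from which a Hamiltonian is reconstructed.

```latex
\begin{aligned}
\frac{d\hat{O}}{dt} &= i\,\big[\hat{H}(t), \hat{O}\big] + \frac{\partial \hat{O}}{\partial t}, \\
H(t) = \mathbf{h}(t)\cdot\boldsymbol{\sigma}
\;\Rightarrow\;
\frac{d\sigma^{a}}{dt} &= i\sum_{j} h_{j}(t)\,\big[\sigma^{j}, \sigma^{a}\big]
= 2\sum_{j,k}\varepsilon_{ajk}\, h_{j}(t)\,\sigma^{k}
= 2\,\big(\mathbf{h}(t)\times\boldsymbol{\sigma}\big)_{a}.
\end{aligned}
```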

Contributors: Han, Chendi (Author) / Lai, Ying-Cheng (Thesis advisor) / Yu, Hongbin (Committee member) / Dasarathy, Gautam (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created: 2022

Fabrication, Characterization, and Device Applications of Few-Layer Black Phosphorus

Description

Few-layer black phosphorus (FLBP) is one of the most important two-dimensional (2D) materials owing to its strongly layer-dependent quantized band structure, which leads to wavelength-tunable optical and electrical properties. This thesis focuses on the preparation of stable, high-quality FLBP, the characterization of its optical properties, and its device applications.
Part I presents an approach to preparing high-quality, stable FLBP samples by combining O2 plasma etching, boron nitride (BN) sandwiching, and subsequent rapid thermal annealing (RTA). This strategy has produced FLBP samples with a record-long lifetime, retaining 80% of their photoluminescence (PL) intensity after 7 months. The improved material quality allows a more definitive relationship to be established between the layer number and the PL energies.
Part II presents a study of oxygen incorporation in FLBP. Natural oxidation in air is dominated by the formation of interstitial oxygen and dangling oxygen. Real-time PL and Raman spectroscopy show that continuous laser excitation breaks the bonds of interstitial oxygen, and the freed oxygen atoms can diffuse or form dangling oxygen under low heat. RTA at 450 °C converts interstitial oxygen into dangling oxygen more thoroughly. Such oxygen-containing samples show optical properties similar to those of pristine BP samples, and their bandgap increases with the concentration of incorporated oxygen.
Part III investigates the nature of the emission from the prepared samples. Power- and temperature-dependent measurements demonstrate that the PL emission is dominated by excitons and trions, which together account for more than 80% of the emission at room temperature. These measurements allow the determination of the trion and exciton binding energies of 2-, 3-, and 4-layer BP: around 33, 23, and 15 meV for trions and 297, 276, and 179 meV for excitons at 77 K, respectively.
Part IV presents an initial exploration of device applications of these FLBP samples. Coupling between photonic crystal cavity (PCC) modes and FLBP emission is realized by integrating the prepared sandwich structure onto a 2D PCC. Electroluminescence has also been achieved by integrating these materials onto interdigital electrodes driven by alternating electric fields.
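
A common way to extract a binding energy from temperature-dependent PL is to fit the thermal quenching of the integrated intensity to a single-channel Arrhenius model, I(T) = I₀ / (1 + A·exp(−E_b / k_B T)). The Python sketch below linearizes that model as ln(I₀/I − 1) = ln A − E_b/(k_B T) and fits it by least squares. The model choice and the assumption that the low-temperature intensity I₀ is known are illustrative assumptions; the thesis's actual extraction procedure, which also used power-dependent data, may differ.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def fit_arrhenius_binding_energy(temps_k, intensities, i0):
    """Estimate E_b (eV) and A from PL thermal quenching.

    Assumes I(T) = i0 / (1 + A * exp(-E_b / (k_B T))) and fits the
    linearized form y = ln(i0 / I - 1) against x = 1 / (k_B T),
    so slope = -E_b and intercept = ln A. Requires 0 < I < i0.
    """
    xs = [1.0 / (K_B_EV * t) for t in temps_k]
    ys = [math.log(i0 / i - 1.0) for i in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return -slope, math.exp(intercept)  # (E_b in eV, A)
```

A single activation channel is the simplest form of this model; fits with two quenching channels are also common when one channel cannot describe the whole temperature range.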

Contributors: Li, Dongying (Author) / Ning, Cun-Zheng (Thesis advisor) / Vasileska, Dragica (Committee member) / Lai, Ying-Cheng (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created: 2022

Characterizing EEG Data for Epileptic Seizures Using a Variety of Data Analysis Methods to Understand the Data and Enable Future Research

Description

In this research, I surveyed existing methods of characterizing epilepsy from electroencephalogram (EEG) data, including the Random Forest algorithm, which many researchers have claimed is the most effective at detecting epileptic seizures [7]. I observed that although many papers claimed detection accuracies above 99% using Random Forest, they did not specify “when” within the 23.6-second interval of the seizure event the detection was declared. I therefore created a time-series procedure to detect the seizure as early as possible within the 23.6-second epileptic seizure window and found that detection is effective (>92%) as early as the first few seconds of the epileptic episode. I intend to use this research as a stepping stone toward my upcoming master's thesis, in which I plan to extend the time-series detection mechanism to the pre-ictal stage, which will require a different dataset.
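
Declaring a detection as early as possible can be sketched as a sliding-window scan: compute a feature on each successive window from the start of the recording and report the time of the first window that is flagged. In the Python sketch below, a simple line-length feature with a fixed threshold stands in for the trained Random Forest; the feature, window length, threshold, and sampling rate are hypothetical choices, not the thesis's procedure.

```python
def line_length(window):
    """Sum of absolute sample-to-sample differences, a common EEG feature."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))

def earliest_detection(signal, fs, window_s, threshold, feature=line_length):
    """Return the time (s) at the end of the first window whose feature
    exceeds `threshold`, or None if no window does.

    signal: sequence of samples; fs: sampling rate in Hz;
    window_s: window length in seconds (non-overlapping windows).
    """
    step = int(round(window_s * fs))
    if step < 2:
        raise ValueError("window too short")
    for start in range(0, len(signal) - step + 1, step):
        if feature(signal[start:start + step]) > threshold:
            return (start + step) / fs
    return None
```

Replacing `feature` and `threshold` with a trained classifier's per-window decision, and shortening or overlapping the windows, trades detection latency against accuracy, which is the trade-off the time-series procedure above measures.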

Contributors: Bou-Ghazale, Carine (Author) / Lai, Ying-Cheng (Thesis director) / Berisha, Visar (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created: 2022-05

Pharmacogenomics of Selective Serotonin Reuptake Inhibitor Treatment for Major Depressive Disorder: a Genome Wide Association Study

Description

A genome-wide association study (GWAS) of treatment outcomes for citalopram and escitalopram, two frontline SSRI treatments for Major Depressive Disorder, was conducted with 529 subjects on an imputed dataset. While no variants reached genome-wide significance, several potentially interesting variants were identified that warrant further exploration. These findings have the potential to elucidate novel mechanisms underlying drug response to SSRIs. This work will be continued with machine learning and deep learning analyses to capture non-linear effects, and with the involvement of a biologist or geneticist to provide more specialized knowledge for interpreting the results.
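
As an illustration of the per-variant test underlying a GWAS, the Python sketch below runs a 1-degree-of-freedom allelic chi-square test on a 2×2 table of allele counts in responders versus non-responders and compares the p-value with the conventional genome-wide significance threshold of 5 × 10⁻⁸. Treating treatment outcome as a binary responder label and using an allelic test are assumptions for illustration; the study's actual outcome definition and model (for example, regression on imputed dosages with covariates) are not described in this abstract.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def allelic_chi2_test(resp_alt, resp_ref, non_alt, non_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2
    table of allele counts: rows = responders / non-responders,
    columns = alternate / reference allele. Returns (chi2, p)."""
    table = [[resp_alt, resp_ref], [non_alt, non_ref]]
    total = resp_alt + resp_ref + non_alt + non_ref
    row_sums = [sum(r) for r in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def is_genome_wide_significant(p):
    """True if p clears the 5e-8 genome-wide threshold."""
    return p < GENOME_WIDE_ALPHA
```

A nominally small p-value, such as one near 0.005, is far from genome-wide significance, which is why variants like those described above are reported as "potentially interesting" rather than as confirmed associations.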

Contributors: Leiter-Weintraub, Ethan (Author) / Dinu, Valentin (Thesis director) / Scotch, Matthew (Committee member) / Barrett, The Honors College (Contributor) / Dean, W.P. Carey School of Business (Contributor) / College of Health Solutions (Contributor) / School of Life Sciences (Contributor)

Created: 2024-05

Applications and Machine-Learning Prediction of Nonlinear Dynamical Systems

Description

Predicting nonlinear dynamical systems has been a long-standing challenge in science, and the field is currently witnessing a revolution with the advent of machine learning methods. Concurrently, the analysis of dynamics in various nonlinear complex systems continues to be crucial. Guided by these directions, I conduct the following studies. Predicting critical transitions and transient states in nonlinear dynamics is a complex problem. I developed a solution called parameter-aware reservoir computing, which uses machine learning to track how system dynamics change with a driving parameter. I show that the transition point can be accurately predicted even though the model is trained only in the sustained-functioning regime before the transition. Notably, it can also predict whether the system will enter a transient state, as well as the distribution of transient lifetimes and their average before a final collapse, which are crucial for management. I also introduce a machine-learning-based digital twin, built on reservoir computing, for monitoring and predicting the evolution of externally driven nonlinear dynamical systems. Extensive tests on various models, encompassing optics, ecology, and climate, verify the approach's effectiveness. The digital twins can extrapolate unknown system dynamics, continually forecast and monitor under non-stationary external driving, infer hidden variables, adapt to different driving waveforms, and extrapolate bifurcation behaviors across varying system sizes. Integrating engineered gene circuits into host cells poses a significant challenge in synthetic biology due to circuit-host interactions such as growth feedback. I conducted systematic studies of hundreds of circuit structures exhibiting various functionalities, identified a comprehensive categorization of growth-induced failures, and discerned three dynamical mechanisms behind these failures. Moreover, my comprehensive computations reveal a scaling law between circuit robustness and the intensity of growth feedback, and identify a class of circuits with optimal robustness. Chimera states, a symmetry-breaking phenomenon in oscillator networks, traditionally have transient lifetimes that grow exponentially with system size. However, my research on high-dimensional oscillators led to the discovery of 'short-lived' chimera states, whose lifetime increases logarithmically with system size and decreases logarithmically with the strength of random perturbations, indicating a unique fragility. To understand these states, I use a transverse stability analysis supported by simulations.
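
The parameter-aware reservoir computing idea can be sketched as an echo state network whose input vector carries both the observed state and the value of the driving parameter, so that one trained readout can later be queried at parameter values it was not trained on. The Python sketch below (standard library only) builds a small leaky reservoir, collects states for a training series, and fits a ridge-regression readout. Reservoir size, scaling, leak rate, input range, and ridge penalty are hypothetical choices, and this is a generic sketch rather than the dissertation's implementation.

```python
import math
import random

def make_reservoir(n, n_in, radius=0.9, density=0.2, seed=0):
    """Random sparse recurrent matrix W and input matrix W_in.
    W is scaled by its max absolute row sum, an upper bound on its
    spectral radius, so the spectral radius is at most `radius`."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) if rng.random() < density else 0.0
          for _ in range(n)] for _ in range(n)]
    norm = max(sum(abs(v) for v in row) for row in w) or 1.0
    w = [[radius * v / norm for v in row] for row in w]
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n)]
    return w, w_in

def run_reservoir(w, w_in, inputs, leak=0.5):
    """Drive the reservoir with input vectors such as [x(t), p]; return states."""
    r = [0.0] * len(w)
    states = []
    for u in inputs:
        pre = [sum(wij * rj for wij, rj in zip(wi, r)) +
               sum(vik * uk for vik, uk in zip(vi, u))
               for wi, vi in zip(w, w_in)]
        r = [(1 - leak) * ri + leak * math.tanh(x) for ri, x in zip(r, pre)]
        states.append(r)
    return states

def ridge_readout(states, targets, beta=1e-6):
    """Solve (R^T R + beta I) w = R^T y by Gaussian elimination with pivoting."""
    n = len(states[0])
    a = [[sum(s[i] * s[j] for s in states) + (beta if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, n):
            f = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= f * a[col][j]
            b[k] -= f * b[col]
    w_out = [0.0] * n
    for i in range(n - 1, -1, -1):
        w_out[i] = (b[i] - sum(a[i][j] * w_out[j] for j in range(i + 1, n))) / a[i][i]
    return w_out
```

In a parameter-aware setup, training series recorded at a few parameter values would be concatenated, each state paired with the next observed value as the target; the trained network would then be run in closed loop (feeding each prediction back as the next x) with a new value of p to probe how the dynamics change. That closed-loop step is omitted here.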

Contributors: Kong, Lingwei (Author) / Lai, Ying-Cheng (Thesis advisor) / Tian, Xiaojun (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Alkhateeb, Ahmed (Committee member) / Arizona State University (Publisher)

Created: 2023