Matching Items (13)

151939-Thumbnail Image.png
Description
Random peptide microarrays are a powerful tool for both the treatment and the diagnosis of infectious diseases. On the treatment side, selected random peptides on the microarray have either binding or lytic potency against certain pathogen cells, so they can be synthesized into new antimicrobial agents, denoted synbodies (synthetic antibodies). On the diagnostic side, serum containing specific infection-related antibodies creates unique "pathogen-immunosignatures" on the random peptide microarray that are distinct from those of healthy control serum, and this different mode of binding can be used as a more precise measurement than traditional ELISA tests. My thesis project is separated into these two parts: the first part falls on the treatment side and the second focuses on the diagnostic side. My first chapter shows that a substitution amino acid peptide library helps to improve the activity of a recently reported synthetic antimicrobial peptide selected by the random peptide microarray. By substituting one or two amino acids of the original lead peptide, the new substitutes show changed hemolytic effects against mouse red blood cells and changed potency against two pathogens: Staphylococcus aureus and Pseudomonas aeruginosa. Two new substitutes are then combined to form the synbody, which shows significant antimicrobial potency against Staphylococcus aureus (<0.5 µM). In the second chapter, I explore the possibility of using the 10K Ver.2 random peptide microarray to monitor the humoral immune response to dengue. Over 2.5 billion people (40% of the world's population) live in dengue-transmitting areas. However, there is currently no efficient dengue treatment or vaccine.
Here, with limited dengue patient serum samples, we show that the immunosignature has the potential to distinguish not only dengue-infected from non-infected people, but also primary from secondary dengue infections, dengue infection from West Nile Virus (WNV) infection, and even different dengue serotypes from one another. Through further bioinformatic analysis, we demonstrate that the significant peptides selected to distinguish dengue-infected from normal samples may indicate the epitopes responsible for the immune response.
Contributors: Wang, Xiao (Author) / Johnston, Stephen Albert (Thesis advisor) / Blattman, Joseph (Committee member) / Arntzen, Charles (Committee member) / Arizona State University (Publisher)
Created: 2013
149928-Thumbnail Image.png
Description
The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), the lack of a priori knowledge about undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins in Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves.
These curves were characterized with good empirical fit using nonlinear models with simple structures, which allowed the extraction of a large number of features. Applying a supervised classification model to these features and integrating experimental factors allowed for the identification of subtle patterns among different cell types, visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach, which showed promising results towards less complex approximations. The benefits of external information for improving the image representation were also explored.
Contributors: Torres Garcia, Wandaliz (Author) / Meldrum, Deirdre R. (Thesis advisor) / Runger, George C. (Thesis advisor) / Gel, Esma S. (Committee member) / Li, Jing (Committee member) / Zhang, Weiwen (Committee member) / Arizona State University (Publisher)
Created: 2011
151176-Thumbnail Image.png
Description
Rapid advances in sensor and information technology have resulted in environments that are data-rich both spatially and temporally, which creates a pressing need to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges in addressing these massive datasets lie in their complex structures, such as high-dimensionality, hierarchy, multi-modality, heterogeneity and data uncertainty. Besides the statistical challenges, the associated computational approaches are also considered essential in achieving efficiency, effectiveness, and numerical stability in practice. On the other hand, some recent developments in statistics and machine learning, such as sparse learning and transfer learning, and some traditional methodologies which still hold potential, such as multi-level models, all shed light on addressing these complex datasets in a statistically powerful and computationally efficient way. In this dissertation, we identify four kinds of general complex datasets, including "high-dimensional datasets", "hierarchically-structured datasets", "multimodality datasets" and "data uncertainties", which are ubiquitous in many domains, such as biology, medicine, neuroscience, health care delivery, manufacturing, etc. We describe the development of novel statistical models to analyze complex datasets which fall under these four categories, and we show how these models can be applied to some real-world applications, such as Alzheimer's disease research, nursing care processes, and manufacturing.
Contributors: Huang, Shuai (Author) / Li, Jing (Thesis advisor) / Askin, Ronald (Committee member) / Ye, Jieping (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)
Created: 2012
155994-Thumbnail Image.png
Description
Rewired biological pathways and/or rewired microRNA (miRNA)-mRNA interactions might also influence the activity of biological pathways. Here, rewired biological pathways are defined as the differential (rewiring) effects of genes on the topology of biological pathways between controls and cases. Similarly, rewired miRNA-mRNA interactions are defined as the differential (rewiring) effects of miRNAs on the topology of biological pathways between controls and cases. This dissertation discusses how rewired biological pathways (Chapter 1) and/or rewired miRNA-mRNA interactions (Chapter 2) aberrantly influence the activity of biological pathways, and their association with disease.

This dissertation proposes two PageRank-based analytical methods, Pathways of Topological Rank Analysis (PoTRA) and miR2Pathway, discussed in Chapter 1 and Chapter 2, respectively. PoTRA focuses on detecting pathways with an altered number of hub genes in corresponding pathways between two phenotypes. The basis for PoTRA is that the loss of connectivity is a common topological trait of cancer networks, as well as the prior knowledge that a normal biological network is a scale-free network whose degree distribution follows a power law, where a small number of nodes are hubs and a large number of nodes are non-hubs. However, from normal to cancer, the process of the network losing connectivity might be the process of disrupting the scale-free structure of the network; namely, the number of hub genes might be altered in cancer compared to that in normal samples. Hence, it is hypothesized that if the number of hub genes in a pathway differs between normal and cancer, this pathway might be involved in cancer. MiR2Pathway focuses on quantifying the differential effects of miRNAs on the activity of a biological pathway when miRNA-mRNA connections are altered from normal to disease, and on ranking the disease risk of rewired miRNA-mediated biological pathways. This dissertation explores how rewired gene-gene interactions and rewired miRNA-mRNA interactions lead to aberrant activity of biological pathways, and ranks pathways by their disease risk. The two methods proposed here can be used to complement existing genomics analysis methods to facilitate the study of biological mechanisms behind disease at the systems level.
Contributors: Li, Chaoxing (Author) / Dinu, Valentin (Thesis advisor) / Kuang, Yang (Thesis advisor) / Liu, Li (Committee member) / Wang, Xiao (Committee member) / Arizona State University (Publisher)
Created: 2017
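The hub-counting idea behind PoTRA in the record above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the toy networks, and the hub cutoff (a PageRank score above 1.5 times the uniform score) are all assumptions made for illustration, and any statistical test for comparing hub counts is omitted.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected graph given as node pairs."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without neighbors spread their score uniformly over all nodes.
        isolated = sum(rank[v] for v in nodes if not neighbors[v])
        new_rank = {v: (1.0 - damping) / n + damping * isolated / n for v in nodes}
        for v in nodes:
            for w in neighbors[v]:
                new_rank[w] += damping * rank[v] / len(neighbors[v])
        rank = new_rank
    return rank


def count_hubs(nodes, edges, hub_factor=1.5):
    """Count nodes whose PageRank exceeds hub_factor / n (hypothetical cutoff)."""
    rank = pagerank(nodes, edges)
    cutoff = hub_factor / len(nodes)
    return sum(1 for v in nodes if rank[v] > cutoff)


# Toy pathway with five genes (hypothetical data).
genes = ["A", "B", "C", "D", "E"]
# "Normal" network: a star, so gene A acts as a hub.
normal_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]
# "Cancer" network: a ring, so no gene stands out.
cancer_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]

normal_hubs = count_hubs(genes, normal_edges)
cancer_hubs = count_hubs(genes, cancer_edges)
print("hubs in normal:", normal_hubs, "hubs in cancer:", cancer_hubs)
```

In this toy case the star network yields one hub and the ring yields none, so the pathway would be flagged as having an altered hub count between the two phenotypes. A real analysis would build the networks from expression data and assess the difference statistically.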
156679-Thumbnail Image.png
Description
Recent technological advances enable the collection of various complex, heterogeneous and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates a need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns and improve decision making. All the proposed methods can generalize to other industrial fields.

The first topic of this dissertation focuses on data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.

The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.

The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both the accuracy and the interpretation. GCRNN contains a convolutional network component to extract high-level features, and a recurrent network component to enhance the modeling of the temporal characteristics. A feed-forward fully connected network with the sparse group lasso regularization is used to generate the final classification and provide good interpretability.

The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making and pattern visualization of time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error, but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Contributors: Lin, Sangdi (Author) / Runger, George C. (Thesis advisor) / Kocher, Jean-Pierre A (Committee member) / Pan, Rong (Committee member) / Escobedo, Adolfo R. (Committee member) / Arizona State University (Publisher)
Created: 2018
155019-Thumbnail Image.png
Description
In species with highly heteromorphic sex chromosomes, the degradation of one of the sex chromosomes can result in unequal gene expression between the sexes (e.g., between XX females and XY males) and between the sex chromosomes and the autosomes. Dosage compensation is a process whereby genes on the sex chromosomes achieve equal gene expression, which prevents deleterious side effects from having too much or too little expression of genes on sex chromosomes. The green anole is part of a group of species that recently underwent an adaptive radiation. The green anole has XX/XY sex determination, but the content of the X chromosome and its evolution have not been described. Given its status as a model species, better understanding the green anole genome could reveal insights into other species. Genomic analyses are crucial for a comprehensive picture of sex chromosome differentiation and dosage compensation, in addition to understanding speciation.

In order to address this, multiple comparative genomics and bioinformatics analyses were conducted to elucidate patterns of evolution in the green anole and across multiple anole species. Comparative genomics analyses were used to infer additional X-linked loci in the green anole, RNAseq data from male and female samples were analyzed to quantify patterns of sex-biased gene expression across the genome, and the extent of dosage compensation on the anole X chromosome was characterized, providing evidence that the sex chromosomes in the green anole are dosage compensated.

In addition, X-linked genes have a lower ratio of nonsynonymous to synonymous substitution rates than the autosomes when compared to other Anolis species, and pairwise rates of evolution in genes across the anole genome were analyzed. To conduct this analysis a new pipeline was created for filtering alignments and performing batch calculations for whole genome coding sequences. This pipeline has been made publicly available.
Contributors: Rupp, Shawn Michael (Author) / Wilson Sayres, Melissa A (Thesis advisor) / Kusumi, Kenro (Committee member) / DeNardo, Dale (Committee member) / Arizona State University (Publisher)
Created: 2016
156520-Thumbnail Image.png
Description
Study of canine cancer’s molecular underpinnings holds great potential for informing veterinary and human oncology. Sporadic canine cancers are highly abundant (~4 million diagnoses/year in the United States), and the dog’s unique genomic architecture due to selective inbreeding, together with the high similarity between dog and human genomes, confers power for improving understanding of cancer genes. However, characterization of canine cancer genome landscapes has been limited, hindered by a lack of canine-specific tools and resources. To enable robust and reproducible comparative genomic analysis of canine cancers, I have developed a workflow for somatic and germline variant calling in canine cancer genomic data. I first adapted a human cancer genomics pipeline to create a semi-automated canine pipeline used to map genomic landscapes of canine melanoma, lung adenocarcinoma, osteosarcoma and lymphoma. This pipeline also forms the backbone of my novel comparative genomics workflow.

Practical impediments to comparative genomic analysis of dog and human include challenges in identifying similarities in mutation type and function across species. For example, canine genes may have evolved functions that differ from those of their human orthologs. Hence, I undertook a systematic statistical evaluation of dog and human cancer genes and assessed functional similarities and differences between orthologs to improve understanding of the roles of these genes in cancer across species. I tested this pipeline on canine and human Diffuse Large B-Cell Lymphoma (DLBCL), given that canine DLBCL is the most comprehensively genomically characterized canine cancer. Logistic regression with genes bearing somatic coding mutations in each cancer was used to determine if conservation metrics (sequence identity, network placement, etc.) could explain co-mutation of genes in both species. Using this model, I identified 25 co-mutated and evolutionarily similar genes that may be compelling cross-species cancer genes. For example, PCLO was identified as a co-mutated conserved gene; it has previously been identified as recurrently mutated in human DLBCL, but with an unclear role in oncogenesis. Further investigation of these genes might shed new light on the biology of lymphoma in dogs and humans, and this approach may more broadly serve to prioritize new genes for comparative cancer biology studies.
Contributors: Sivaprakasam, Karthigayini (Author) / Dinu, Valentin (Thesis advisor) / Trent, Jeffrey (Thesis advisor) / Hendricks, William (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)
Created: 2018
155356-Thumbnail Image.png
Description
The past decade has seen a drastic increase in collaboration between Computer Science (CS) and Molecular Biology (MB). Current foci in CS such as deep learning require very large amounts of data, and MB research can often be rapidly advanced by analysis and models from CS. One of the places where CS could aid MB is in the analysis of sequences to find binding sites and in the prediction of protein folding patterns. Stem-like cells can be maintained and replicated over long periods, and can also be differentiated into various tissue types. These behaviors are made possible by controlling the expression of specific genes. These genes then cascade into a network effect by either promoting or repressing downstream gene expression. The expression level of all gene transcripts within a single cell can be analyzed using single cell RNA sequencing (scRNA-seq). A significant portion of the noise in scRNA-seq data results from extrinsic factors and can only be removed by a customized scRNA-seq analysis pipeline. scRNA-seq experiments utilize next-gen sequencing to measure genome-scale gene expression levels with single cell resolution.

Almost every step during analysis and quantification requires the use of an often empirically determined threshold, which makes quantification of noise less accurate. In addition, each research group often develops its own data analysis pipeline, making it impossible to compare data from different groups. To remedy this problem, a streamlined and standardized scRNA-seq data analysis and normalization protocol was designed and developed. After analyzing multiple experiments, we identified the possible pipeline stages and the tools needed. Our pipeline is capable of handling data with adapters and barcodes, which was not the case with pipelines from some experiments. Our pipeline can be used to analyze scRNA-seq data from a single experiment and also to compare scRNA-seq data across experiments. Various processes like data gathering, file conversion, and data merging were automated in the pipeline. The main focus was to standardize and normalize single-cell RNA-seq data to minimize technical noise introduced by disparate platforms.
Contributors: Balachandran, Parithi (Author) / Wang, Xiao (Thesis advisor) / Brafman, David (Committee member) / Lockhart, Thurmon (Committee member) / Arizona State University (Publisher)
Created: 2017
157806-Thumbnail Image.png
Description
The WNT signaling pathway plays numerous roles in development and maintenance of adult homeostasis. In concordance with its numerous roles, dysfunction of WNT signaling leads to a variety of human diseases ranging from developmental disorders to cancer. WNT signaling is composed of a family of 19 WNT soluble secreted glycoproteins, which are evolutionarily conserved across all phyla of the animal kingdom. WNT ligands interact most commonly with a family of receptors known as frizzled (FZ) receptors, composed of 10 independent genes. Specific interactions between WNT proteins and FZ receptors are not well characterized and are known to be promiscuous. Traditionally, canonical WNT signaling is described as a binary system in which WNT signaling is either off or on. In the ‘off’ state, in the absence of a WNT ligand, cytoplasmic β-catenin is continuously degraded by the action of the APC/Axin/GSK-3β destruction complex. In the ‘on’ state, when WNT binds to its Frizzled (Fz) receptor and LRP coreceptor, this protein destruction complex is disrupted, allowing β-catenin to translocate into the nucleus where it interacts with the DNA-bound T cell factor/lymphoid factor (TCF/LEF) family of proteins to regulate target gene expression. However, in a variety of systems in development and disease, canonical WNT signaling acts in a gradient fashion, suggesting more complex regulation of β-catenin transcriptional activity. As such, the traditional ‘binary’ view of WNT signaling does not clearly explain how this graded signal is transmitted intracellularly to control concentration-dependent changes in gene expression and cell identity. I have developed an in vitro human pluripotent stem cell (hPSC)-based model that recapitulates the same in vivo developmental effects of the WNT signaling gradient on the anterior-posterior (A/P) patterning of the neural tube observed during early development.
Using RNA-seq and ChIP-seq I have characterized β-catenin binding at different levels of WNT signaling and identified different classes of β-catenin peaks that bind cis-regulatory elements to influence neural cell fate. This work expands the traditional binary view of canonical WNT signaling and illuminates WNT/β-catenin activity in other developmental and diseased contexts.
Contributors: Cutts, Joshua Patrick (Author) / Brafman, David A (Thesis advisor) / Stabenfeldt, Sarah (Committee member) / Nikkhah, Mehdi (Committee member) / Wang, Xiao (Committee member) / Plaisier, Christopher (Committee member) / Arizona State University (Publisher)
Created: 2019
157879-Thumbnail Image.png
Description
Accounting for over a third of all emerging and re-emerging infections, viruses represent a major public health threat, which researchers and epidemiologists across the world have been attempting to contain for decades. Recently, genomics-based surveillance of viruses through methods such as virus phylogeography has grown into a popular tool for infectious disease monitoring. When conducting such surveillance studies, researchers need to manually retrieve geographic metadata denoting the location of the infected host (LOIH) of viruses from public sequence databases such as GenBank and from any publication related to their study. The large volume of semi-structured and unstructured information that must be reviewed for this task, along with the ambiguity of geographic locations, makes it especially challenging. Prior work has demonstrated that the majority of GenBank records lack sufficient geographic granularity concerning the LOIH of viruses. As a result, reviewing full-text publications is often necessary for conducting in-depth analysis of virus migration, which can be a very time-consuming process. Moreover, integrating geographic metadata pertaining to the LOIH of viruses from different sources, including different fields in GenBank records as well as full-text publications, and normalizing the integrated metadata to unique identifiers for subsequent analysis, are also challenging tasks, often requiring expert domain knowledge. Therefore, automated information extraction (IE) methods could help significantly accelerate this process, positively impacting public health research. However, very few research studies have attempted the use of IE methods in this domain.

This work explores the use of novel knowledge-driven geographic IE heuristics for extracting, integrating, and normalizing the LOIH of viruses based on information available in GenBank and related publications; when evaluated on manually annotated test sets, the methods were found to have a high accuracy and shown to be adequate for addressing this challenging problem. It also presents GeoBoost, a pioneering software system for georeferencing GenBank records, as well as a large-scale database containing over two million virus GenBank records georeferenced using the algorithms introduced here. The methods, database and software developed here could help support diverse public health domains focusing on sequence-informed virus surveillance, thereby enhancing existing platforms for controlling and containing disease outbreaks.
Contributors: Tahsin, Tasnia (Author) / Gonzalez, Graciela (Thesis advisor) / Scotch, Matthew (Thesis advisor) / Runger, George C. (Committee member) / Arizona State University (Publisher)
Created: 2019