Matching Items (105)
Description
As we migrate into an era of personalized medicine, understanding how biomolecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges, such as the dynamic nature of cellular systems, uncertainty due to environmental influences, and the heterogeneity between individual patients, render this a difficult task. In the last decade, several algorithms have been proposed to elucidate cellular systems from data, resulting in numerous data-driven hypotheses. However, due to the large number of variables involved in the process, many of which are unknown or not measurable, such computational approaches often lead to a high proportion of false positives. This renders interpretation of the data-driven hypotheses extremely difficult. Consequently, only a small proportion of these hypotheses are subject to further experimental validation, which limits their potential to augment existing biological knowledge. This dissertation develops a framework of computational methods for the analysis of such data-driven hypotheses leveraging existing biological knowledge. Specifically, I show how biological knowledge can be mapped onto these hypotheses and subsequently augmented through novel hypotheses. Biological hypotheses are learned at three levels of abstraction: individual interactions, functional modules, and relationships between pathways, corresponding to three complementary aspects of biological systems. The computational methods developed in this dissertation are applied to high-throughput cancer data, resulting in novel hypotheses with potentially significant biological impact.
Contributors: Ramesh, Archana (Author) / Kim, Seungchan (Thesis advisor) / Langley, Patrick W (Committee member) / Baral, Chitta (Committee member) / Kiefer, Jeffrey (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
This project analyzed the sequencing results of 230 bat samples to investigate the presence of novel coronaviruses (CoVs). A bioinformatics workflow was developed to process the Next-Generation Sequencing (NGS) data and identify novel CoV genomes, and a parallel computing scheme was implemented to improve performance. Among the 230 bat samples, 14 had previously tested positive for CoV by a pan-CoV quantitative polymerase chain reaction (qPCR) assay. Illumina NGS was used to generate the shotgun reads. With the newly developed bioinformatics pipeline, the sequencing reads from each bat sample and a positive control sample were quality controlled and assembled to generate longer viral contigs. The contigs were then queried with the Basic Local Alignment Search Tool X (BLASTx) against a customized CoV database derived from the National Center for Biotechnology Information (NCBI) databases. After further filtering with BLASTx and megaBLAST against the NCBI nucleotide collection (nr/nt) database, the confirmed CoV contigs were used to build bootstrapped phylogenetic trees with several representative Alpha-, Beta-, and Gamma-CoV genomes. Two bat samples contained potentially novel CoV fragments corresponding to the Open Reading Frame 1ab (ORF1ab), ORF7, and Nucleocapsid (N) gene regions. The phylogenetic trees showed that the fragments are Alpha-CoVs closely related to Eptesicus Bat Coronavirus, Pipistrellus Bat Coronavirus, and Tadarida Brasiliensis Bat Alphacoronavirus 1.
Contributors: Mu, Tianchen Nil (Author) / Lim, Efrem EL (Thesis advisor) / Lee, Kookjin KL (Thesis advisor) / Chung, Yunro YC (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Advancements in high-throughput biotechnologies have generated large-scale multi-omics datasets encompassing diverse dimensions such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, and phenomics. Traditionally, statistical and machine learning-based approaches utilize single-omics data sources to uncover molecular signatures, dissect complicated cellular mechanisms, and predict clinical results. However, capturing the multifaceted pathological mechanisms requires integrative multi-omics analysis that can provide a comprehensive picture of the disease. Here, I present three novel approaches to multi-omics integrative analysis. First, I introduce a single-cell integrative clustering method, which leverages multi-omics to enhance the resolution of cell subpopulations. Applied to a Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-Seq) dataset from human Acute Myeloid Leukemia (AML) and control samples, this approach unveiled nuanced cell populations that would otherwise remain elusive. I then shift the focus to a computational framework to discover transcriptional regulatory trios in which a transcription factor binds to a regulatory element harboring a genetic variant and subsequently differentially regulates the transcription level of a target gene. Applied to whole-exome, whole-genome, and transcriptome data of multiple myeloma samples, this approach discovered synergetic cis-acting and trans-acting regulatory elements associated with tumorigenesis. The final part of this work introduces a novel methodology that leverages the transcriptome and surface protein data at the single-cell level produced by CITE-Seq to model the intracellular protein trafficking process. Applied to COVID-19 samples, this approach revealed dysregulated protein trafficking associated with the severity of the infection.
Contributors: Mudappathi, Rekha (Author) / Liu, Li (Thesis advisor) / Dinu, Valentin (Committee member) / Sun, Zhifu (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequence artifacts and errors made in alignment reconstruction can impact downstream analyses, leading to erroneous conclusions in comparative and functional genomic studies. While such errors are eventually fixed in the reference genomes of model organisms, many genomes used by researchers contain these artifacts, often forcing researchers to discard large amounts of data to prevent artifacts from impacting results. I developed COATi, a statistical, codon-aware pairwise aligner designed to align protein-coding sequences in the presence of artifacts commonly introduced by sequencing or annotation errors, such as early stop codons and abiological frameshifts. Unlike common sequence aligners, which rely on amino acid translations, only model insertions and deletions between codons, or lack a statistical model, COATi combines a codon substitution model specifically designed for protein-coding regions, a complex insertion-deletion model, and a sequencing base calling error step. The alignment algorithm is based on finite state transducers (FSTs), computational machines well-suited for modeling sequence evolution. I show that COATi outperforms available methods using a simulated empirical pairwise alignment dataset as a benchmark. The FST-based model and alignment algorithm in COATi are resource-intensive for sequences longer than a few kilobases. To address this constraint, I developed an approximate model compatible with traditional dynamic programming alignment algorithms. I describe how the original codon substitution model is transformed to build an approximate model and how the alignment algorithm is implemented by modifying the popular Gotoh algorithm. I simulated a benchmark of alignments and measured how well the marginal models approximate the original method.
Finally, I present a novel tool for analyzing sequence alignments. Available metrics can measure the similarity between two alignments or the column uncertainty within an alignment but cannot produce a site-specific comparison of two or more alignments. AlnDotPlot is an R software package inspired by traditional dot plots that can provide valuable insights when comparing pairwise alignments. I describe AlnDotPlot and showcase its utility in displaying a single alignment, comparing different pairwise alignments, and summarizing alignment space.
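As a rough illustration of the affine-gap dynamic programming that the abstract above refers to, the following Python sketch computes a global alignment score with the standard three-matrix Gotoh recurrence. It is not COATi's modified algorithm or its codon model: the function name, the per-nucleotide match and mismatch scores, and the gap penalties are all assumptions chosen for illustration.

```python
import math


def gotoh_score(a, b, match=1.0, mismatch=-1.0, gap_open=-2.0, gap_extend=-0.5):
    """Global alignment score with affine gaps (textbook Gotoh recurrence).

    Hypothetical scoring: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg = -math.inf
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing the same gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(gotoh_score("ACGTACGT", "ACGTCGT"))
```

Keeping separate matrices for the "in a gap" states is what lets one long gap score differently from several short ones, which is the property an approximate model would need to preserve when it replaces a more expensive alignment procedure.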
Contributors: Garcia Mesa, Juan Jose (Author) / Cartwright, Reed A (Thesis advisor) / Taylor, Jesse (Committee member) / Pavlic, Theodore (Committee member) / Ozkan, Banu (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Metagenomics is the study of the structure and function of microbial communities through the application of the whole-genome shotgun (WGS) sequencing method. Providing high-resolution community profiles at species or even strain levels, metagenomics points to a new direction for microbiome research in understanding microbial gene function, microbial-microbial interactions, and host-microbe interactions. My thesis work contributes to metagenomic research in three ways: applying ChatGPT to assist beginning researchers, adapting an existing alpha diversity metric to metagenomic data to improve diversity calculation, and applying metagenomic data to Alzheimer’s disease research. Since the release of ChatGPT in March 2023, the role of AI in research has been actively debated. Through a prompted bioinformatic case study, I demonstrate the application of ChatGPT in conducting metagenomic analysis. I constructed and tested a working pipeline aimed at instructing GPT in completing shotgun metagenomic research. The pipeline includes instructions for various essential analytic steps: quality control, host filtering, read classification, abundance estimation, diversity calculation, and data visualization. The pipeline ran to completion and produced reproducible results. Alpha diversity measurement is critical to understanding microbiomes. The widely used Faith’s phylogenetic diversity (PD) metric is agnostic of feature abundance and therefore falls short in analyzing metagenomic data. BWPDθ, an abundance-weighted variant of Faith’s PD, was implemented in the scikit-bio alpha diversity metrics. My analysis shows that BWPDθ performs better than Faith’s PD, revealing more biological significance and maintaining its robustness at a lower sampling depth.
The progression of Alzheimer’s disease (AD) is known to be associated with alterations in the patient’s gut microbiome. Utilizing metagenomic data from the AlzBiom study, I explored the differential abundance of bacterial pncA genes among healthy and AD participants by age group. The analysis showed no significant difference in pncA abundance between the healthy and AD participants overall. However, when stratified by age group, AD participants aged 64 to 69 had significantly lower pncA abundance than the healthy control group. Pearson's correlation test showed a moderate positive association between age and pncA abundance.
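To make the contrast between Faith's PD and an abundance-weighted variant concrete, here is a small Python sketch on a toy tree. It is an illustration under stated assumptions, not the scikit-bio implementation: the tree, the counts, and the function names are hypothetical, and the weighting uses one common balance-weighted form, the sum over branches of length × [2·min(D, 1 − D)]^θ, where D is the fraction of reads below a branch. Whether this matches the thesis's exact definition is an assumption.

```python
# Hypothetical toy tree, not from the thesis: child -> (parent, branch length).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("n2", 0.5),
    "n1": ("root", 0.5),
    "n2": ("root", 1.0),
}


def branch_fractions(tree, counts):
    """Fraction of all reads found in the tips below each branch."""
    total = sum(counts.values())
    below = {node: 0 for node in tree}
    for tip, count in counts.items():
        node = tip
        while node in tree:  # climb toward the root, crediting each branch
            below[node] += count
            node = tree[node][0]
    return {node: below[node] / total for node in tree}


def faith_pd(tree, counts):
    """Sum of branch lengths leading to observed tips; abundances are ignored."""
    frac = branch_fractions(tree, counts)
    return sum(length for node, (_, length) in tree.items() if frac[node] > 0)


def bwpd(tree, counts, theta):
    """Assumed balance-weighted form: sum of length * (2 * min(D, 1 - D)) ** theta."""
    frac = branch_fractions(tree, counts)
    return sum(
        length * (2 * min(frac[node], 1 - frac[node])) ** theta
        for node, (_, length) in tree.items()
        if frac[node] > 0
    )


if __name__ == "__main__":
    even = {"A": 10, "B": 10, "C": 10, "D": 10}
    skewed = {"A": 37, "B": 1, "C": 1, "D": 1}
    for name, counts in (("even", even), ("skewed", skewed)):
        print(name, faith_pd(TREE, counts), bwpd(TREE, counts, 1.0))
```

Both toy communities contain the same four tips, so Faith's PD cannot tell them apart, while the weighted score drops for the community dominated by a single tip. With θ = 0 the weighted form reduces to Faith's PD in this sketch.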
Contributors: Xing, Zhu (Author) / Zhu, Qiyun (Thesis advisor) / Lim, Efrem (Committee member) / Snyder-Mackler, Noah (Committee member) / Arizona State University (Publisher)
Created: 2024