I first demonstrate how a GLM can be employed and how the support for the predictors can be measured, using influenza A/H5N1 in Egypt as an example. Second, I compare the GLM framework to two alternative frameworks of Bayesian phylogeography: one that uses an advanced computational technique and one that does not. For this assessment, I model the diffusion of influenza A/H3N2 in the United States during the 2014-15 flu season with five methods encapsulated by the three frameworks. I summarize metrics of the phylogenies created by each and demonstrate their reproducibility by performing analyses on several random sequence samples under a variety of population growth scenarios. Next, I demonstrate how discretization of the location trait for a given sequence set can influence phylogenies and support for predictors. That is, I perform several GLM analyses on a set of sequences while changing how the sequences are pooled, then show how aggregating predictors at four levels of spatial resolution alters posterior support. Finally, I provide a solution for researchers who wish to use the GLM framework but may be deterred by the tedious file manipulation it requires. My pipeline, which is publicly available, should alleviate concerns about the difficulty and time needed to create the files necessary for GLM analyses. This dissertation expands the knowledge of Bayesian phylogeographic GLMs and will facilitate the use of this framework, which may ultimately reveal the variables that drive the spread of pathogens.
This dissertation introduces novel machine learning methods based on parallelized cellomics to analyze interactions between cells, bacteria, and chemical compounds while reducing the use of fluorescent reagents. Machine learning analysis using image-based high-content screening (HCS) data is compartmentalized into three primary components: (1) Image Analytics, (2) Phenotypic Analytics, and (3) Compound Analytics. A novel software analytics tool called the Insights project is also introduced. The Insights project fully incorporates distributed processing, high performance computing, and database management that can rapidly and effectively utilize and store massive amounts of data generated using HCS biological assessments (bioassays). It is ideally suited for parallelized cellomics in high dimensional space.
Results demonstrate that a parallelized cellomics approach increases the quality of a bioassay while vastly decreasing the need for control data. The reduction in control data leads to less fluorescent reagent consumption. Furthermore, a novel proposed method that uses single-cell data points is shown to identify known active chemical compounds with a high degree of accuracy, even when traditional quality control measurements indicate that the bioassay is of poor quality. This ultimately decreases the time and resources needed to optimize bioassays while still accurately identifying active compounds.
navigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise-quality search and discovery service that guides researchers through step-by-step query refinement by suggesting concepts enriched in intermediate results, thereby facilitating the "discover more as you search" paradigm.
Idiopathic pulmonary fibrosis (IPF) is an interstitial lung disease (ILD) that results in the permanent scarring and damage of lung tissue. Currently, there is no known cause or viable treatment for this disease, and the majority of patients either receive a lung transplant or succumb to the disease within five years of diagnosis. This project centers on studying IPF by analyzing gene expression patterns in healthy vs. diseased lung tissue via spatial transcriptomics. Spatial transcriptomics is the study of individual RNA transcripts within cells on a spatial level. With the novel technology MERFISH, we can detect gene expression in a spatial context with single-cell resolution, allowing us to make inferences about certain patterns of gene expression that are solely driven by the pathology of the disease. A total of 120 cells were selected from 21 different lung samples (6 healthy, 15 ILD). Within those lung samples, the cells were selected from 4 different tissue features: control, less fibrotic, more fibrotic, and cystic. We built an analysis pipeline in R to analyze cell type composition around these features at different distances from the center cell (0-75, 76-150, and 150-225 μm). Cell types were annotated at both a broad (less specific) and fine (more specific) level. Upon analyzing the relationship between the proportions of various cell types and distance from tissue features, we found that, at the broad cell type annotation level, the proportion of airway epithelium cells had a negative relationship with distance that was statistically significant in linear regression models. At the fine cell type annotation level, ciliated/secretory cells displayed this same trend. These results support our current understanding of cystic tissue in the lung and provide a foundation for understanding disease pathology as a whole.
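The distance-based composition analysis described above can be illustrated with a minimal sketch. The thesis pipeline was written in R and is not reproduced here; this Python version is only an illustration under stated assumptions. The function names, the half-open treatment of the distance bins, and the toy coordinates are all hypothetical and do not come from the study's data or results.

```python
import math
from collections import Counter

# Distance bins in micrometres, following the ranges quoted in the abstract.
# Treating them as half-open intervals [lo, hi) is an assumption.
BINS = [(0, 75), (75, 150), (150, 225)]


def proportions_by_bin(cells, center, cell_type, bins=BINS):
    """Fraction of `cell_type` among all cells falling in each distance bin."""
    totals = Counter()
    hits = Counter()
    for x, y, label in cells:
        d = math.hypot(x - center[0], y - center[1])  # Euclidean distance
        for i, (lo, hi) in enumerate(bins):
            if lo <= d < hi:
                totals[i] += 1
                if label == cell_type:
                    hits[i] += 1
                break
    return [
        hits[i] / totals[i] if totals[i] else float("nan")
        for i in range(len(bins))
    ]


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented toy cells (x, y, label) around a center cell at the origin.
toy_cells = [
    (10, 0, "airway"), (0, 20, "airway"), (30, 0, "airway"), (0, 50, "other"),
    (100, 0, "airway"), (0, 120, "airway"), (90, 0, "other"), (0, 140, "other"),
    (200, 0, "airway"), (0, 160, "other"), (180, 0, "other"), (0, 220, "other"),
]

props = proportions_by_bin(toy_cells, (0, 0), "airway")
midpoints = [(lo + hi) / 2 for lo, hi in BINS]
slope = ols_slope(midpoints, props)  # negative when the proportion falls with distance
```

A negative slope in this sketch corresponds to the kind of "negative relationship with distance" reported for airway epithelium cells. Assessing statistical significance, as the study does, would additionally require a standard error and a test on the slope, which this sketch omits.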