Matching Items (3)
Filtering by

Clear all filters

151176-Thumbnail Image.png
Description
Rapid advance in sensor and information technology has resulted in both spatially and temporally data-rich environment, which creates a pressing need for us to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges for addressing these

Rapid advance in sensor and information technology has resulted in both spatially and temporally data-rich environment, which creates a pressing need for us to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges for addressing these massive datasets lay in their complex structures, such as high-dimensionality, hierarchy, multi-modality, heterogeneity and data uncertainty. Besides the statistical challenges, the associated computational approaches are also considered essential in achieving efficiency, effectiveness, as well as the numerical stability in practice. On the other hand, some recent developments in statistics and machine learning, such as sparse learning, transfer learning, and some traditional methodologies which still hold potential, such as multi-level models, all shed lights on addressing these complex datasets in a statistically powerful and computationally efficient way. In this dissertation, we identify four kinds of general complex datasets, including "high-dimensional datasets", "hierarchically-structured datasets", "multimodality datasets" and "data uncertainties", which are ubiquitous in many domains, such as biology, medicine, neuroscience, health care delivery, manufacturing, etc. We depict the development of novel statistical models to analyze complex datasets which fall under these four categories, and we show how these models can be applied to some real-world applications, such as Alzheimer's disease research, nursing care process, and manufacturing.
ContributorsHuang, Shuai (Author) / Li, Jing (Thesis advisor) / Askin, Ronald (Committee member) / Ye, Jieping (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)
Created2012
156200-Thumbnail Image.png
Description
Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate how other biomechanical parameters relating to the agility of animals in predator–prey pairs are better predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, new modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques combined with methods from industrial Design of Experiments will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.
ContributorsSeto, Christian (Author) / Pavlic, Theodore (Thesis advisor) / Li, Jing (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created2018
156575-Thumbnail Image.png
Description
Mathematical modeling and decision-making within the healthcare industry have given means to quantitatively evaluate the impact of decisions into diagnosis, screening, and treatment of diseases. In this work, we look into a specific, yet very important disease, the Alzheimer. In the United States, Alzheimer’s Disease (AD) is the 6th leading

Mathematical modeling and decision-making within the healthcare industry have given means to quantitatively evaluate the impact of decisions into diagnosis, screening, and treatment of diseases. In this work, we look into a specific, yet very important disease, the Alzheimer. In the United States, Alzheimer’s Disease (AD) is the 6th leading cause of death. Diagnosis of AD cannot be confidently confirmed until after death. This has prompted the importance of early diagnosis of AD, based upon symptoms of cognitive decline. A symptom of early cognitive decline and indicator of AD is Mild Cognitive Impairment (MCI). In addition to this qualitative test, Biomarker tests have been proposed in the medical field including p-Tau, FDG-PET, and hippocampal. These tests can be administered to patients as early detectors of AD thus improving patients’ life quality and potentially reducing the costs of the health structure. Preliminary work has been conducted in the development of a Sequential Tree Based Classifier (STC), which helps medical providers predict if a patient will contract AD or not, by sequentially testing these biomarker tests. The STC model, however, has its limitations and the need for a more complex, robust model is needed. In fact, STC assumes a general linear model as the status of the patient based upon the tests results. We take a simulation perspective and try to define a more complex model that represents the patient evolution in time.

Specifically, this thesis focuses on the formulation of a Markov Chain model that is complex and robust. This Markov Chain model emulates the evolution of MCI patients based upon doctor visits and the sequential administration of biomarker tests. Data provided to create this Markov Chain model were collected by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The data lacked detailed information of the sequential administration of the biomarker tests and therefore, different analytical approaches were tried and conducted in order to calibrate the model. The resulting Markov Chain model provided the capability to conduct experiments regarding different parameters of the Markov Chain and yielded different results of patients that contracted AD and those that did not, leading to important insights into effect of thresholds and sequence on patient prediction capability as well as health costs reduction.



The data in this thesis was provided from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI investigators did not contribute to any analysis or writing of this thesis. A list of the ADNI investigators can be found at: http://adni.loni.usc.edu/about/governance/principal-investigators/ .
ContributorsCamarena, Raquel (Author) / Pedrielli, Giulia (Thesis advisor) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Arizona State University (Publisher)
Created2018