Integrative analyses of diverse biological data sources

Description

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins for Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves.
These curves were characterized with good empirical fit using nonlinear models with simple structures which allowed extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for identification of subtle patterns among different cell types visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach which showcased promising results towards less complex approximations. Also, the benefits of external information were explored to improve the image representation.

Contributors: Torres Garcia, Wandaliz (Author) / Meldrum, Deirdre R. (Thesis advisor) / Runger, George C. (Thesis advisor) / Gel, Esma S. (Committee member) / Li, Jing (Committee member) / Zhang, Weiwen (Committee member) / Arizona State University (Publisher)

Created: 2011

The application of texture analysis pipeline on MRE imaging for HCC diagnosis

Description

Hepatocellular carcinoma (HCC) is a malignant tumor and the seventh most common cancer in humans. Every year there is a significant rise in the number of patients suffering from HCC. Most clinical research has focused on early detection of HCC so that patients have a high chance of survival. Emerging advancements in functional and structural imaging techniques have provided the ability to detect microscopic changes in the tumor microenvironment and microstructure. The prime focus of this thesis is to validate the applicability of an advanced imaging modality, Magnetic Resonance Elastography (MRE), for HCC diagnosis. The research was carried out on data from three HCC patients, and three sets of experiments were conducted. The main focus was on the quantitative aspect of MRE in conjunction with Texture Analysis, an advanced image processing pipeline, and a multivariate machine learning method for accurate HCC diagnosis. We analyzed techniques for handling unbalanced data and evaluated the efficacy of sampling techniques. We also studied different machine learning algorithms and developed models using them. Performance metrics such as Prediction Accuracy, Sensitivity and Specificity were used to evaluate the final model. We were able to identify the significant features in the dataset, and the selected classifier was robust in predicting the response class variable with high accuracy.

Contributors: Bansal, Gaurav (Author) / Wu, Teresa (Thesis advisor) / Mitchell, Ross (Thesis advisor) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2013

A P-value based approach for phase II profile monitoring

Description

A P-value based method is proposed for statistical monitoring of various types of profiles in phase II. The performance of the proposed method is evaluated by the average run length criterion under various shifts in the intercept, slope and error standard deviation of the model. In our proposed approach, P-values are computed at each level within a sample. If at least one of the P-values is less than a pre-specified significance level, the chart signals out-of-control. The primary advantage of our approach is that only one control chart is required to monitor several parameters simultaneously: the intercept, slope(s), and the error standard deviation. A comprehensive comparison of the proposed method and the existing KMW-Shewhart method for monitoring linear profiles is conducted. In addition, the effect that the number of observations within a sample has on the performance of the proposed method is investigated. The proposed method was also compared to the T^2 method discussed in Kang and Albin (2000) for multivariate, polynomial, and nonlinear profiles. A simulation study shows that overall the proposed P-value method performs satisfactorily for different profile types.
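The signaling rule described above can be illustrated with a minimal Python sketch. It assumes a simple linear profile with known in-control intercept, slope and error standard deviation, and normally distributed errors. These assumptions, the function names, and the per-level two-sided z-test are illustrative only; the abstract does not state how the thesis actually computes its P-values.

```python
import math


def level_p_values(xs, ys, intercept, slope, sigma):
    """Two-sided P-value at each level of one sampled profile.

    Assumption (not from the thesis): the in-control model is
    y = intercept + slope * x + e, with e ~ Normal(0, sigma^2) and all
    three parameters known from phase I.
    """
    p_values = []
    for x, y in zip(xs, ys):
        z = (y - (intercept + slope * x)) / sigma
        # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return p_values


def chart_signals(xs, ys, intercept, slope, sigma, alpha):
    """Signal out-of-control if at least one P-value falls below alpha."""
    return min(level_p_values(xs, ys, intercept, slope, sigma)) < alpha
```

Because several P-values are compared against the same threshold in every sample, the per-level significance level would normally be chosen smaller than the desired overall false-alarm rate; how the thesis sets it is not specified here.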

Contributors: Adibi, Azadeh (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Li, Jing (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2013

Novel statistical models for complex data structures

Description

Rapid advances in sensor and information technology have resulted in environments that are data-rich both spatially and temporally, which creates a pressing need to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges in addressing these massive datasets lie in their complex structures, such as high-dimensionality, hierarchy, multi-modality, heterogeneity and data uncertainty. Besides the statistical challenges, the associated computational approaches are also considered essential in achieving efficiency, effectiveness, and numerical stability in practice. On the other hand, some recent developments in statistics and machine learning, such as sparse learning and transfer learning, and some traditional methodologies that still hold potential, such as multi-level models, all shed light on addressing these complex datasets in a statistically powerful and computationally efficient way. In this dissertation, we identify four kinds of general complex datasets, including "high-dimensional datasets", "hierarchically-structured datasets", "multimodality datasets" and "data uncertainties", which are ubiquitous in many domains, such as biology, medicine, neuroscience, health care delivery, manufacturing, etc. We depict the development of novel statistical models to analyze complex datasets which fall under these four categories, and we show how these models can be applied to some real-world applications, such as Alzheimer's disease research, nursing care process, and manufacturing.

Contributors: Huang, Shuai (Author) / Li, Jing (Thesis advisor) / Askin, Ronald (Committee member) / Ye, Jieping (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)

Created: 2012

Cost Driven Agent Based Simulation of the Department of Defense Acquisition System

Description

The Department of Defense (DoD) acquisition system is a complex system riddled with cost and schedule overruns. These cost and schedule overruns are very serious issues as the acquisition system is responsible for aiding U.S. warfighters. Hence, if the acquisition process is failing, that could be a potential threat to our nation's security. Furthermore, the DoD acquisition system is responsible for proper allocation of billions of taxpayers' dollars and employs many civilians and military personnel. Much research has been done in the past on the acquisition system with little impact or success. One reason for this lack of success in improving the system is the lack of accurate models to test theories. This research is a continuation of the effort on the Enterprise Requirements and Acquisition Model (ERAM), a discrete event simulation modeling research effort on the DoD acquisition system. We propose to extend ERAM using agent-based simulation principles due to the many interactions among the subsystems of the acquisition system. We initially identify ten sub models needed to simulate the acquisition system. This research focuses on three sub models related to the budget of acquisition programs. In this thesis, we present the data collection, data analysis, initial implementation, and initial validation needed to facilitate these sub models and lay the groundwork for a full agent-based simulation of the DoD acquisition system.

Contributors: Bucknell, Sophia Robin (Author) / Wu, Teresa (Thesis director) / Li, Jing (Committee member) / Colombi, John (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Modeling supply chain dynamics with calibrated simulation using data fusion

Description

In today's global market, companies are facing unprecedented levels of uncertainty in supply, demand and the economic environment. A critical issue for companies to survive increasing competition is to monitor the changing business environment and manage disturbances and changes in real time. In this dissertation, an integrated framework is proposed using simulation and online calibration methods to enable the adaptive management of large-scale complex supply chain systems. The design, implementation and verification of the integrated approach are studied in this dissertation. The research contributions are two-fold. First, this work enriches symbiotic simulation methodology by proposing a framework of simulation and advanced data fusion methods to improve simulation accuracy. Data fusion techniques optimally calibrate the simulation state/parameters by considering errors in both the simulation models and in measurements of the real-world system. Data fusion methods - Kalman Filtering, Extended Kalman Filtering, and Ensemble Kalman Filtering - are examined and discussed under varied conditions of system chaotic levels, data quality and data availability. Second, the proposed framework is developed, validated and demonstrated in 'proof-of-concept' case studies on representative supply chain problems. In the case study of a simplified supply chain system, Kalman Filtering is applied to fuse simulation data and emulation data to effectively improve the accuracy of the detection of abnormalities. In the case study of the 'beer game' supply chain model, the system's chaotic level is identified as a key factor influencing simulation performance and the choice of data fusion method. Ensemble Kalman Filtering is found to be more robust than Extended Kalman Filtering in a highly chaotic system. With appropriate tuning, the improvement in simulation accuracy is up to 80% in a chaotic system, and 60% in a stable system.
In the last study, the integrated framework is applied to adaptive inventory control of a multi-echelon supply chain with non-stationary demand. It is worth pointing out that the framework proposed in this dissertation is not only useful in supply chain management, but also suitable for modeling other complex dynamic systems, such as healthcare delivery systems and energy consumption networks.
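As a rough illustration of how a Kalman filter weighs simulation error against measurement error, the sketch below performs one predict/update cycle for a single scalar state. The random-walk state model, the variable names, and the numbers in the usage lines are hypothetical choices for illustration; they are not the dissertation's supply chain model.

```python
def kalman_step(x_est, p_est, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    Assumptions (illustrative only): the simulated state follows a random
    walk with process-noise variance q, and the real-world measurement z
    observes the state directly with measurement-noise variance r.
    """
    # Predict: the state estimate carries over and its uncertainty grows by q.
    x_pred = x_est
    p_pred = p_est + q

    # Update: the gain k is the share of the disagreement between
    # measurement and prediction that is attributed to simulation error.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


# Hypothetical usage: the simulation predicts an inventory level of 10 units
# and a sensor reports 12 units, with equal predicted and measurement variance.
x, p = kalman_step(x_est=10.0, p_est=3.0, z=12.0, q=1.0, r=4.0)
```

With equal variances the gain is 0.5, so the fused estimate lies halfway between the simulated and measured values. The Extended and Ensemble variants named in the abstract generalize this update to nonlinear models and are not shown here.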

Contributors: Wang, Shanshan (Author) / Wu, Teresa (Thesis advisor) / Fowler, John (Thesis advisor) / Pfund, Michele (Committee member) / Li, Jing (Committee member) / Pavlicek, William (Committee member) / Arizona State University (Publisher)

Created: 2010

Merging Economics and Epidemiology to Improve the Prediction and Management of Infectious Disease

Description

Mathematical epidemiology, one of the oldest and richest areas in mathematical biology, has significantly enhanced our understanding of how pathogens emerge, evolve, and spread. Classical epidemiological models, the standard for predicting and managing the spread of infectious disease, assume that contacts between susceptible and infectious individuals depend on their relative frequency in the population. The behavioral factors that underpin contact rates are not generally addressed. There is, however, an emerging class of models that addresses the feedbacks between infectious disease dynamics and the behavioral decisions driving host contact. Referred to as “economic epidemiology” or “epidemiological economics,” the approach explores the determinants of decisions about the number and type of contacts made by individuals, using insights and methods from economics. We show how the approach has the potential both to improve predictions of the course of infectious disease, and to support development of novel approaches to infectious disease management.

Contributors: Perrings, Charles (Author) / Castillo-Chavez, Carlos (Author) / Chowell-Puente, Gerardo (Author) / Daszak, Peter (Author) / Fenichel, Eli P. (Author) / Finnoff, David (Author) / Horan, Richard D. (Author) / Kilpatrick, A. Marm (Author) / Kinzig, Ann (Author) / Kuminoff, Nicolai (Author) / Levin, Simon (Author) / Morin, Benjamin (Author) / Smith, Katherine F. (Author) / Springborn, Michael (Author) / Simon M. Levin Mathematical, Computational and Modeling Sciences Center (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / W.P. Carey School of Business (Contributor) / Economics (Contributor) / Julie Ann Wrigley Global Institute of Sustainability (Contributor)

Created: 2015-12-01

Resource Consumption, Sustainability, and Cancer

Description

Preserving a system’s viability in the presence of diversity erosion is critical if the goal is to sustainably support biodiversity. Reduction in population heterogeneity, whether inter- or intraspecies, may increase population fragility, either decreasing its ability to adapt effectively to environmental changes or facilitating the survival and success of ordinarily rare phenotypes. The latter may result in over-representation of individuals who may participate in resource utilization patterns that can lead to over-exploitation, exhaustion, and, ultimately, collapse of both the resource and the population that depends on it. Here, we aim to identify regimes that can signal whether a consumer–resource system is capable of supporting viable degrees of heterogeneity. The framework used here is an expansion of a previously introduced consumer–resource type system of a population of individuals classified by their resource consumption. Application of the Reduction Theorem to the system enables us to evaluate the health of the system through tracking both the mean value of the parameter of resource (over)consumption, and the population variance, as both change over time. The article concludes with a discussion that highlights applicability of the proposed system to investigation of systems that are affected by particularly devastating overly adapted populations, namely cancerous cells. Potential intervention approaches for system management are discussed in the context of cancer therapies.

Contributors: Kareva, Irina (Author) / Morin, Benjamin (Author) / Castillo-Chavez, Carlos (Author) / Simon M. Levin Mathematical, Computational and Modeling Sciences Center (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / Julie Ann Wrigley Global Institute of Sustainability (Contributor)

Created: 2015-02-01

Mass Media and the Contagion of Fear: The Case of Ebola in America

Description

Background
In the weeks following the first imported case of Ebola in the U.S. on September 29, 2014, coverage of the very limited outbreak dominated the news media, in a manner quite disproportionate to the actual threat to national public health; by the end of October, 2014, there were only four laboratory confirmed cases of Ebola in the entire nation. Public interest in these events was high, as reflected in the millions of Ebola-related Internet searches and tweets performed in the month following the first confirmed case. Use of trending Internet searches and tweets has been proposed in the past for real-time prediction of outbreaks (a field referred to as “digital epidemiology”), but accounting for the biases of public panic has been problematic. In the case of the limited U.S. Ebola outbreak, we know that the Ebola-related searches and tweets originating in the U.S. during the outbreak were due only to public interest or panic, providing an unprecedented means to determine how these dynamics affect such data, and how news media may be driving these trends.
Methodology
We examine daily Ebola-related Internet search and Twitter data in the U.S. during the six-week period ending Oct 31, 2014. TV news coverage data were obtained from the daily number of Ebola-related news videos appearing on two major news networks. We fit the parameters of a mathematical contagion model to the data to determine if the news coverage was a significant factor in the temporal patterns in Ebola-related Internet and Twitter data.
Conclusions
We find significant evidence of contagion, with each Ebola-related news video inspiring tens of thousands of Ebola-related tweets and Internet searches. Between 65% and 76% of the variance in all samples is described by the news media contagion model.

Contributors: Towers, Sherry (Author) / Afzal, Shehzad (Author) / Bernal, Gilbert (Author) / Bliss, Nadya (Author) / Brown, Shala (Author) / Espinoza, Baltazar (Author) / Jackson, Jasmine (Author) / Judson-Garcia, Julia (Author) / Khan, Maryam (Author) / Lin, Michael (Author) / Mamada, Robert (Author) / Moreno, Victor (Author) / Nazari, Fereshteh (Author) / Okuneye, Kamaldeen (Author) / Ross, Mary (Author) / Rodriguez, Claudia (Author) / Medlock, Jan (Author) / Ebert, David (Author) / Castillo-Chavez, Carlos (Author) / Simon M. Levin Mathematical, Computational and Modeling Sciences Center (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / Mary Lou Fulton Teachers College (Contributor) / Educational Leadership and Innovation (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Global Security Initiative (Contributor)

Created: 2015-06-11

Did Modeling Overestimate the Transmission Potential of Pandemic (H1N1-2009)? Sample Size Estimation for Post-Epidemic Seroepidemiological Studies

Description

Background
Seroepidemiological studies before and after the epidemic wave of H1N1-2009 are useful for estimating population attack rates with a potential to validate early estimates of the reproduction number, R, in modeling studies.
Methodology/Principal Findings
Since the final epidemic size, the proportion of individuals in a population who become infected during an epidemic, is not the result of a binomial sampling process because infection events are not independent of each other, we propose the use of an asymptotic distribution of the final size to compute approximate 95% confidence intervals of the observed final size. This allows the comparison of the observed final sizes against predictions based on the modeling study (R = 1.15, 1.40 and 1.90), which also yields simple formulae for determining sample sizes for future seroepidemiological studies. We examine a total of eleven published seroepidemiological studies of H1N1-2009 that took place after observing the peak incidence in a number of countries. Observed seropositive proportions in six studies appear to be smaller than that predicted from R = 1.40; four of the six studies sampled serum less than one month after the reported peak incidence. The comparison of the observed final sizes against R = 1.15 and 1.90 reveals that all eleven studies appear not to be significantly deviating from the prediction with R = 1.15, but final sizes in nine studies indicate overestimation if the value R = 1.90 is used.
Conclusions
Sample sizes of published seroepidemiological studies were too small to assess the validity of model predictions except when R = 1.90 was used. We recommend the use of the proposed approach in determining the sample size of post-epidemic seroepidemiological studies, calculating the 95% confidence interval of observed final size, and conducting relevant hypothesis testing instead of the use of methods that rely on a binomial proportion.
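To make the link between R and the predicted final size concrete, the following Python sketch solves the classical final size equation z = 1 - exp(-R z) of a homogeneously mixing epidemic model for the three R values quoted above. Using this particular equation is an assumption made for illustration; the abstract does not state the exact relation the authors used, and the asymptotic confidence intervals and sample size formulae it mentions are not reproduced. The function name and the bisection solver are likewise illustrative.

```python
import math


def final_size(r, tol=1e-12):
    """Positive root z of z = 1 - exp(-r * z) for r > 1, found by bisection.

    Assumption (not confirmed by the abstract): the final size follows the
    classical relation for a homogeneously mixing population.
    """
    if r <= 1.0:
        raise ValueError("a positive final size requires r > 1")

    def f(z):
        return z - 1.0 + math.exp(-r * z)

    # For r > 1, f is negative just above 0 and positive at z = 1,
    # so the positive root lies inside this bracket.
    lo, hi = 1e-6, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Predicted final sizes for the reproduction numbers named in the abstract.
predictions = {r: final_size(r) for r in (1.15, 1.40, 1.90)}
```

Under this assumed relation the predicted final size rises steeply with R, which is why observed seropositive proportions can discriminate between the candidate R values when the sample is large enough.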

Contributors: Nishiura, Hiroshi (Author) / Chowell-Puente, Gerardo (Author) / Castillo-Chavez, Carlos (Author) / Simon M. Levin Mathematical, Computational and Modeling Sciences Center (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor)

Created: 2011-03-24