Dynamics and implications of data-based disease models in public health and agriculture

Description

The increased number of novel pathogens that potentially threaten the human population has motivated the development of mathematical and computational modeling approaches for forecasting epidemic impact and understanding key environmental characteristics that influence the spread of diseases. Yet, when substantial uncertainty surrounds the transmission process during a rapidly developing infectious disease outbreak, complex mechanistic models may be too difficult to calibrate quickly enough for policy makers to make informed decisions. Simple phenomenological models that rely on a small number of parameters can provide an initial platform for assessing the epidemic trajectory, estimating the reproduction number, and quantifying the disease burden from early epidemic-phase data.

Chapter 1 provides background information and motivation for infectious disease forecasting and outlines the rest of the thesis.

In chapter 2, logistic patch models are used to assess and forecast the 2013-2015 West Africa Zaire ebolavirus epidemic. In particular, this chapter compares and contrasts the effects that spatial heterogeneity has on forecasting performance for the cumulative case counts reported during the epidemic.
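Chapter 2's logistic patch models build on the classical logistic growth equation for cumulative cases, dC/dt = rC(1 - C/K). As a minimal single-patch sketch, assuming growth rate r, final epidemic size K, and initial count C0 (the values in the usage block are hypothetical, not estimates from the Ebola or plague data), the closed-form curve and the time of peak incidence can be computed as follows:

```python
import math


def logistic_cumulative(t: float, r: float, K: float, C0: float) -> float:
    """Closed-form solution of dC/dt = r*C*(1 - C/K) with C(0) = C0.

    r  -- intrinsic growth rate (per unit time)
    K  -- final epidemic size (carrying capacity)
    C0 -- initial cumulative case count (0 < C0 < K)
    """
    A = (K - C0) / C0
    return K / (1.0 + A * math.exp(-r * t))


def peak_incidence_time(r: float, K: float, C0: float) -> float:
    """Time at which incidence dC/dt peaks, i.e. when C(t) = K/2."""
    return math.log((K - C0) / C0) / r


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    r, K, C0 = 0.1, 10_000.0, 10.0
    for t in (0, 25, 50, 75, 100):
        print(t, round(logistic_cumulative(t, r, K, C0)))
    print("peak incidence at t =", round(peak_incidence_time(r, K, C0), 1))
```

A patch model would couple several such equations, one per region, for example through migration terms; that coupling is not shown here.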

In chapter 3, two simple phenomenological models inspired by population biology are used to assess the Research and Policy for Infectious Disease Dynamics (RAPIDD) Ebola Challenge, a simulated epidemic that generated four infectious disease scenarios. Because the data were synthetically generated, model predictions can be compared with the exact epidemiological quantities used in the simulation.

In chapter 4, these models are applied to the 1904 plague epidemic in Bombay. This chapter provides evidence that these simple models may be applicable to infectious diseases regardless of the transmission mechanism.

Chapter 5 uses the patch models from chapter 2 to explore how migration during the 1904 plague epidemic changes the final epidemic size.

The final chapter is an interdisciplinary project concerning the within-host dynamics of cereal yellow dwarf virus-RPV, a plant pathogen from a virus group that infects over 150 grass species. Motivated by environmental nutrient enrichment due to anthropogenic activities, mathematical models are employed to investigate the relevance of resource competition to pathogen and host dynamics.

Contributors: Pell, Bruce (Author) / Kuang, Yang (Thesis advisor) / Chowell-Puente, Gerardo (Committee member) / Nagy, John (Committee member) / Kostelich, Eric (Committee member) / Gardner, Carl (Committee member) / Arizona State University (Publisher)

Created: 2016

Gene Network Inference via Sequence Alignment and Rectification

Description

While techniques for reading DNA in some capacity have been possible for decades, the ability to accurately edit genomes at scale has remained elusive. Novel techniques have been introduced recently to aid in writing DNA sequences. While writing DNA has become more accessible, it remains expensive, justifying the increased interest in in silico predictions of cell behavior. To accurately predict the behavior of cells, it is necessary to model the cell environment extensively, including gene-to-gene interactions, as completely as possible.

Significant algorithmic advances have been made in identifying these interactions, but despite these improvements, current techniques fail to infer some edges and fail to capture some complexities in the network. Much of this limitation stems from heavily underdetermined problems, in which tens of thousands of variables must be inferred from datasets with the power to resolve only a small fraction of them. Additionally, failure to correctly resolve gene isoforms using short reads contributes significantly to noise in gene quantification measures.

This dissertation introduces novel mathematical models, machine learning techniques, and biological techniques to address the problems described above. Mathematical models are proposed for simulating gene network motifs and raw sequencing reads. Machine learning techniques are presented for DNA sequence matching and DNA sequence correction.

Results provide novel insights into the low-level functionality of gene networks. Also shown is the ability to use normalization techniques to aggregate data for gene network inference, yielding larger data sets while minimizing increases in inter-experimental noise. Results also demonstrate that the high error rates of third-generation sequencing follow error profiles significantly different from those of previous technologies, and that these errors can be modeled, simulated, and rectified. Finally, techniques are provided for correcting these DNA errors while preserving the benefits of third-generation sequencing.
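The idea that sequencing errors can be modeled and simulated can be illustrated with a toy read simulator. This is a minimal sketch, assuming independent per-base deletion, insertion, and substitution probabilities; the function names are illustrative, and the rates in the usage block are hypothetical placeholders rather than the third-generation error profiles characterized in the dissertation. The edit distance between template and read gives a simple measure of how much a correction step would have to repair.

```python
import random

BASES = "ACGT"


def simulate_read(template: str, p_sub: float, p_ins: float, p_del: float,
                  rng: random.Random) -> str:
    """Copy `template` base by base, injecting independent errors.

    For each template base exactly one event is drawn:
      deletion     (prob p_del): the base is dropped;
      insertion    (prob p_ins): a random base is emitted, then the true base;
      substitution (prob p_sub): a different base is emitted;
      otherwise                : the base is copied correctly.
    """
    if min(p_sub, p_ins, p_del) < 0 or p_sub + p_ins + p_del > 1:
        raise ValueError("error probabilities must be non-negative and sum to <= 1")
    out = []
    for base in template:
        u = rng.random()
        if u < p_del:
            continue
        if u < p_del + p_ins:
            out.append(rng.choice(BASES))
            out.append(base)
        elif u < p_del + p_ins + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    template = "".join(rng.choice(BASES) for _ in range(200))
    # Hypothetical, indel-heavy rates for illustration only.
    read = simulate_read(template, p_sub=0.02, p_ins=0.05, p_del=0.05, rng=rng)
    print(len(template), len(read), edit_distance(template, read))
```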

Contributors: Faucon, Philippe Christophe (Author) / Liu, Huan (Thesis advisor) / Wang, Xiao (Committee member) / Crook, Sharon M (Committee member) / Wang, Yalin (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2017

Deep Learning based Classification of FDG-PET Data for Alzheimer's Disease

Description

Alzheimer’s disease (AD) is a progressive neurodegenerative disease that gradually affects the brain and worsens over time. Reliable and early diagnosis of AD and its prodromal stages (i.e., mild cognitive impairment (MCI)) is essential. Fluorodeoxyglucose (FDG) positron emission tomography (PET) measures the decline in the regional cerebral metabolic rate for glucose, offering a reliable metabolic biomarker even in presymptomatic AD patients. PET scans provide functional information that is unique and unavailable from other types of imaging. The computational efficacy of FDG-PET data alone for classifying the various Alzheimer’s diagnostic categories (AD, MCI (LMCI, EMCI), and control) has not been studied, which motivates classifying these categories using FDG-PET data. Deep learning has recently been applied to the analysis of structural and functional brain imaging data. This thesis introduces a deep learning-based classification technique that combines neural networks with dimensionality reduction to classify the different stages of AD based on FDG-PET image analysis.

This thesis develops a classification method to investigate the performance of FDG-PET as an effective biomarker for Alzheimer's clinical group classification. This involves dimensionality reduction using Probabilistic Principal Component Analysis on max-pooled and mean-pooled data, followed by a multilayer feed-forward neural network that performs binary classification. Max-pooled features yield better classification performance than mean-pooled features. Additionally, experiments investigate whether adding important demographic features, such as Functional Activities Questionnaire (FAQ) scores and gene information, improves performance. Classification results indicate that the designed classifiers achieve competitive results, which improve further with the addition of demographic features.
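To make the max-pooled versus mean-pooled distinction concrete, here is a minimal sketch, assuming a single 2D slice stored as nested lists and non-overlapping k x k windows. The thesis works with FDG-PET volumes, and its actual window size and PPCA settings are not given here; the pooled values would be flattened into a feature vector before dimensionality reduction.

```python
from typing import Callable, List

Image = List[List[float]]


def pool2d(img: Image, k: int, reduce: Callable[[List[float]], float]) -> Image:
    """Non-overlapping k x k pooling; rows/cols that don't fill a window are dropped."""
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(0, rows - k + 1, k):
        row = []
        for j in range(0, cols - k + 1, k):
            window = [img[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(reduce(window))
        out.append(row)
    return out


def max_pool(img: Image, k: int) -> Image:
    """Keep the strongest value (e.g. highest uptake) in each window."""
    return pool2d(img, k, max)


def mean_pool(img: Image, k: int) -> Image:
    """Average each window, smoothing out local peaks."""
    return pool2d(img, k, lambda w: sum(w) / len(w))


def flatten(img: Image) -> List[float]:
    """Row-major feature vector, e.g. as input to PPCA."""
    return [v for row in img for v in row]


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(flatten(max_pool(img, 2)))
    print(flatten(mean_pool(img, 2)))
```

Max pooling preserves local extremes while mean pooling blurs them; the abstract reports that the max-pooled variant classified better, and this sketch does not attempt to reproduce that comparison.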

Contributors: Singh, Shibani (Author) / Wang, Yalin (Thesis advisor) / Li, Baoxin (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)

Created: 2017

Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging

Description

Large-scale $\ell_1$-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply sparse learning models to large-scale problems with massive numbers of data samples and high-dimensional features. One popular and promising strategy is to scale up the optimization in parallel. Parallel solvers run on multiple cores in a shared-memory system or in a distributed environment to speed up computation, but their practical use is limited by the very high dimensionality of the feature space and by synchronization problems.
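For reference, the basic $\ell_1$-regularized least-squares problem, $\min_w \frac{1}{2n}\|y - Xw\|_2^2 + \lambda\|w\|_1$, can be solved sequentially by cyclic coordinate descent with soft-thresholding. The sketch below is that sequential baseline only, under the assumption of a dense design matrix stored as nested lists; it is not the asynchronous parallel or distributed solver proposed in the dissertation, which would update coordinates concurrently.

```python
from typing import List


def soft_threshold(a: float, lam: float) -> float:
    """Proximal operator of lam*|w|: shrink a toward zero by lam."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 100) -> List[float]:
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + lam*||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # r = y - Xw, kept up to date incrementally (w starts at 0)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # all-zero feature: leave its weight at zero
            # rho_j = (1/n) * x_j^T (r + x_j * w_j): correlation with the partial residual
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [2.0, -1.0, 2.0, -1.0]
    print(lasso_cd(X, y, lam=0.1))  # approximately [1.8, -0.8]
```

With orthogonal columns, as in the usage example, each weight is just the soft-thresholded least-squares coefficient, which makes the output easy to check by hand; a sufficiently large lam drives every weight to exactly zero.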

In this dissertation, I pursue this direction, focusing on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For supervised learning, I first propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreading environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is stored across different machines. The proposed model is then extended to study genetic risk factors for Alzheimer's Disease (AD) across different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For unsupervised learning, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), that scales up the optimization of dictionary learning and sparse coding problems. A common issue in medical imaging research is that longitudinal features of patients at different time points benefit from being studied together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method that learns the different tasks simultaneously and uses shared and individual dictionaries to encode both consistent and changing imaging features.

Contributors: Li, Qingyang (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Thesis advisor) / He, Jingrui (Committee member) / Wang, Yalin (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2017

The Fusion of Multimodal Brain Imaging Data from Geometry Perspectives

Description

The rapid development in acquiring multimodal neuroimaging data provides opportunities to systematically characterize human brain structures and functions. For example, in brain magnetic resonance imaging (MRI), a typical non-invasive imaging technique, different acquisition sequences (modalities) yield different descriptions of brain functional activities or anatomical biomarkers. Nowadays, in addition to traditional voxel-level image analysis, there is a trend to process and investigate cross-modality relationships in higher-level representations of images, e.g., surfaces and networks.

In this study, I aim to achieve multimodal brain image fusion by exploiting intrinsic properties of the data, e.g., the geometry of the embedding structures in which commonly used image features reside. Since the image features investigated in this study share an identical embedding space, defined either on a brain surface or on a brain atlas, where a graph structure is easy to define, it is natural to consider the mathematically meaningful properties of the shared structures from a geometric perspective.
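One standard geometric object that becomes available once features live on a surface mesh with a graph structure is the graph Laplacian L = D - A. The sketch below is a generic illustration, not one of the frameworks proposed in the dissertation; it assumes a triangle mesh given as vertex-index triples and uniform (combinatorial) edge weights, whereas cotangent weights are a common alternative on brain surfaces. The Dirichlet energy f^T L f measures how smoothly a per-vertex feature f varies over the surface.

```python
from typing import List, Set, Tuple


def mesh_adjacency(n_vertices: int, faces: List[Tuple[int, int, int]]) -> List[Set[int]]:
    """Undirected vertex adjacency built from triangle faces."""
    nbrs: List[Set[int]] = [set() for _ in range(n_vertices)]
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def laplacian_apply(nbrs: List[Set[int]], f: List[float]) -> List[float]:
    """(L f)_i = deg(i) * f_i - sum_{j ~ i} f_j, with L = D - A (uniform weights)."""
    return [len(nbrs[i]) * f[i] - sum(f[j] for j in nbrs[i]) for i in range(len(f))]


def dirichlet_energy(nbrs: List[Set[int]], f: List[float]) -> float:
    """f^T L f = sum over edges (f_i - f_j)^2; small when f is smooth on the mesh."""
    return sum((f[i] - f[j]) ** 2 for i in range(len(f)) for j in nbrs[i] if i < j)


if __name__ == "__main__":
    # A tetrahedron: every vertex is adjacent to the other three.
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    nbrs = mesh_adjacency(4, faces)
    f = [1.0, 0.0, 0.0, 0.0]  # hypothetical per-vertex feature
    print(laplacian_apply(nbrs, f), dirichlet_energy(nbrs, f))
```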

I first introduce the background of multimodal fusion of brain image data and the potential role of geometric properties in linking different modalities. Then, several proposed computational frameworks, using either solid and efficient geometric algorithms or current geometric deep learning models, are discussed in full. I show how these frameworks deal with distinct geometric properties and how they apply in real healthcare scenarios, e.g., enhanced detection of fetal brain diseases or abnormal brain development.

Contributors: Zhang, Wen (Author) / Wang, Yalin (Thesis advisor) / Liu, Huan (Committee member) / Li, Baoxin (Committee member) / Braden, B. Blair (Committee member) / Arizona State University (Publisher)

Created: 2020

Transportation Techniques for Geometric Clustering

Description

This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem, which seeks the globally minimal cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on a variational principle for differentiating hard cluster assignments, which was previously missing from the literature. This thesis presents multiple techniques to regularize and generalize OT to cope with various tasks, including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.
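The OT and power-diagram connection can be illustrated with semi-discrete OT. Each sample x is assigned to the cell j minimizing the power distance ||x - y_j||^2 - w_j, and the weights w_j are adjusted by gradient ascent on the OT dual until each cell's mass matches its target. This is a minimal sketch, assuming uniformly weighted sample points, squared Euclidean cost, fixed site locations, and a hypothetical fixed step size. It illustrates the general connection, not the variational formulation developed in the thesis.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(x: Point, y: Point) -> float:
    """Squared Euclidean cost c(x, y) = ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def power_assign(points: List[Point], sites: List[Point], w: List[float]) -> List[int]:
    """Assign each point to the power cell minimizing ||x - y_j||^2 - w_j (ties -> lowest j)."""
    return [min(range(len(sites)), key=lambda j: sq_dist(x, sites[j]) - w[j])
            for x in points]


def fit_weights(points: List[Point], sites: List[Point], target: List[float],
                lr: float = 0.5, n_iter: int = 200) -> Tuple[List[float], List[int]]:
    """Fixed-step gradient ascent on the semi-discrete OT dual.

    The dual gradient with respect to w_j is target_j - mass_j, so an
    under-filled cell has its weight raised (it grows) and vice versa.
    """
    n, k = len(points), len(sites)
    w = [0.0] * k
    for _ in range(n_iter):
        labels = power_assign(points, sites, w)
        mass = [labels.count(j) / n for j in range(k)]
        w = [wj + lr * (tj - mj) for wj, tj, mj in zip(w, target, mass)]
    return w, power_assign(points, sites, w)


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (2.0,), (3.0,)]
    sites = [(0.0,), (3.0,)]
    # Ask site 0 to capture three quarters of the points instead of half.
    w, labels = fit_weights(points, sites, target=[0.75, 0.25])
    print(w, labels)
```

With a fixed step, the weights can oscillate when the target masses cannot be matched exactly by a finite point set; decaying step sizes or entropic regularization are common remedies.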

Contributors: Mi, Liang (Author) / Wang, Yalin (Thesis advisor) / Chen, Kewei (Committee member) / Karam, Lina (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2020

Super-resolution for Natural Images and Magnetic Resonance Images

Description

Image super-resolution (SR) is a low-level image processing task with many applications, such as medical imaging, satellite image processing, and video enhancement. Given a low-resolution image, SR aims to reconstruct a high-resolution image. The problem is ill-posed, since more than one high-resolution image can correspond to the same low-resolution image. To address this problem, a number of machine learning-based approaches have been proposed.

In this dissertation, I present my work on single image super-resolution (SISR) and accelerated magnetic resonance imaging (MRI) (a.k.a. super-resolution on MR images), followed by an investigation of transfer learning for accelerated MRI reconstruction. For SISR, a dictionary-based approach and two reconstruction-based approaches are presented. Specifically, a convex dictionary learning (CDL) algorithm is proposed that constrains the dictionary atoms to be nonnegative linear combinations of the training data, which is a natural, desired property. Two reconstruction-based single-image methods are also presented, which make use of (i) joint regularization, combining a group-residual-based regularization (GRR) and a ridge-regression-based regularization (3R), and (ii) collaborative representation and non-local self-similarity. After that, two deep learning approaches are proposed that aim to reconstruct high-quality images from accelerated MRI acquisitions. Residual Dense Blocks (RDB) and feedback connections are introduced in the proposed models. In the last chapter, the feasibility of transfer learning for accelerated MRI reconstruction is discussed.
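The ill-posedness of SR can be made concrete with a toy degradation model. In this minimal sketch, the low-resolution image is assumed to be the 2x2 block average of the high-resolution one; real SR pipelines typically use blur kernels and noise, and accelerated MRI instead undersamples k-space. Three visibly different high-resolution patches produce the same low-resolution pixel.

```python
from typing import List

Image = List[List[float]]


def downsample2x(hr: Image) -> Image:
    """Assumed degradation model: average each non-overlapping 2x2 block."""
    return [[(hr[i][j] + hr[i][j + 1] + hr[i + 1][j] + hr[i + 1][j + 1]) / 4.0
             for j in range(0, len(hr[0]) - 1, 2)]
            for i in range(0, len(hr) - 1, 2)]


if __name__ == "__main__":
    smooth = [[2.0, 2.0], [2.0, 2.0]]   # flat patch
    edge = [[4.0, 0.0], [4.0, 0.0]]     # vertical edge
    checker = [[0.0, 4.0], [4.0, 0.0]]  # high-frequency texture
    for hr in (smooth, edge, checker):
        print(downsample2x(hr))  # prints [[2.0]] for all three
```

Because the forward model discards this information, an SR method has to choose among the candidates using a prior, which is the role played by learned dictionaries, regularizers, or deep networks.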

Contributors: Ding, Pak Lun Kevin (Author) / Li, Baoxin (Thesis advisor) / Wu, Teresa (Committee member) / Wang, Yalin (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020

Towards Robust Machine Learning Models for Data Scarcity

Description

Recently, well-designed and well-trained neural networks have yielded state-of-the-art results across many domains, including data mining, computer vision, and medical image analysis. But progress has been limited for tasks where labels are difficult or impossible to obtain. This reliance on exhaustive labeling is a critical limitation to the rapid deployment of neural networks. Moreover, current research scales poorly to large numbers of unseen concepts and is passively spoon-fed with data and supervision.

To overcome these data scarcity and generalization issues, in my dissertation I first propose two unsupervised conventional machine learning algorithms, hyperbolic stochastic coding and multi-resemble multi-target low-rank coding, to solve the incomplete data and missing label problems. I further introduce a deep multi-domain adaptation network that leverages the power of deep learning by transferring rich knowledge from a large labeled source dataset. I also develop a novel time-sequence dynamically hierarchical network that adaptively simplifies the network to cope with scarce data.

To learn a large number of unseen concepts, lifelong machine learning offers many advantages, including abstracting knowledge from prior learning and using that experience to help future learning, regardless of how much data is currently available. Incorporating this capability and making it versatile, I propose deep multi-task weight consolidation to accumulate knowledge continuously and significantly reduce data requirements in a variety of domains. Inspired by recent breakthroughs in automatically learning suitable neural network architectures (AutoML), I develop a nonexpansive AutoML framework to train an online model without abundant labeled data. This framework automatically expands the network to increase model capacity when necessary, then compresses the model to maintain efficiency.
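The general idea behind weight consolidation, which keeps parameters that mattered for earlier tasks close to their old values while learning a new one, can be sketched with the quadratic penalty of elastic weight consolidation (EWC). EWC is a well-known technique used here as a stand-in; it is not necessarily how the dissertation's deep multi-task weight consolidation works. Parameters are plain lists, and the per-parameter importance values (the diagonal Fisher information in EWC) are assumed to be given.

```python
from typing import List


def consolidation_penalty(theta: List[float], theta_old: List[float],
                          importance: List[float], lam: float) -> float:
    """EWC-style penalty: (lam/2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(f * (t - t0) ** 2
                           for t, t0, f in zip(theta, theta_old, importance))


def penalty_grad(theta: List[float], theta_old: List[float],
                 importance: List[float], lam: float) -> List[float]:
    """Gradient of the penalty: lam * F_i * (theta_i - theta_old_i)."""
    return [lam * f * (t - t0) for t, t0, f in zip(theta, theta_old, importance)]


def sgd_step(theta: List[float], task_grad: List[float], theta_old: List[float],
             importance: List[float], lam: float, lr: float) -> List[float]:
    """One gradient step on (new-task loss + consolidation penalty)."""
    pg = penalty_grad(theta, theta_old, importance, lam)
    return [t - lr * (g + p) for t, g, p in zip(theta, task_grad, pg)]


if __name__ == "__main__":
    theta_old = [0.0, 0.0]   # weights after the previous task
    importance = [1.0, 0.0]  # hypothetical: only parameter 0 mattered before
    theta = [1.0, 2.0]
    # With no new-task gradient, only the important parameter is pulled back.
    print(sgd_step(theta, [0.0, 0.0], theta_old, importance, lam=1.0, lr=0.1))
```

Unimportant parameters remain free to adapt to the new task, which is how consolidation trades plasticity against forgetting.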

In ongoing work, I propose an alternative method of supervised learning that does not require direct labels. It uses various forms of supervision derived from an image or object as target values for supervising the target tasks without labels, and it turns out to be surprisingly effective. The proposed method requires only few-shot labeled data for training, can learn the information it needs in a self-supervised manner, and generalizes to datasets not seen during training.

Contributors: Zhang, Jie (Author) / Wang, Yalin (Thesis advisor) / Liu, Huan (Committee member) / Stonnington, Cynthia (Committee member) / Liang, Jianming (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020

The Geological Nature of Dark Material on Vesta and Implications for the Subsurface Structure

Description

Deposits of dark material appear on Vesta’s surface as features of relatively low albedo in the visible wavelength range of Dawn’s camera and spectrometer. Mixed with the regolith and partially excavated by younger impacts, the material is exposed as individual layered outcrops in crater walls or ejecta patches, having been uncovered and broken up by the impact. Dark fans on crater walls and dark deposits on crater floors are the result of gravity-driven mass wasting triggered by steep slopes and impact seismicity. The fact that dark material is mixed with impact ejecta indicates that it has been processed together with the ejected material. Some small craters display continuous dark ejecta similar to lunar dark-halo impact craters, indicating that the impact excavated the material from beneath a higher-albedo surface. The asymmetric distribution of dark material in impact craters and ejecta suggests a discontinuous distribution in the local subsurface. Some positive-relief dark edifices appear to be impact-sculpted hills with dark material distributed over the hill slopes.

Dark features inside and outside of craters are in some places arranged as linear outcrops along scarps or as dark streaks perpendicular to the local topography. The spectral characteristics of the dark material resemble those of Vesta’s regolith. Dark material is distributed unevenly across Vesta’s surface, with clusters of all types of dark material exposures. On a local scale, some craters expose or are associated with dark material, while others in the immediate vicinity show no evidence of dark material. While the variety of surface exposures of dark material, their different geological correlations with surface features, and their uneven distribution indicate a globally inhomogeneous distribution in the subsurface, the dark material seems to be correlated with the rim and ejecta of the older Veneneia south polar basin structure. The origin of the dark material is still being debated; however, the geological analysis suggests that it is exogenic, from carbon-rich low-velocity impactors, rather than endogenic, from freshly exposed mafic material or melt exposed or created by impacts.

Contributors: Jaumann, R. (Author) / Nass, A. (Author) / Otto, K. (Author) / Krohn, K. (Author) / Stephan, K. (Author) / McCord, T. B. (Author) / Williams, David (Author) / Raymond, C. A. (Author) / Blewett, D. T. (Author) / Hiesinger, H. (Author) / Yingst, R. A. (Author) / De Sanctis, M. C. (Author) / Palomba, E. (Author) / Roatsch, T. (Author) / Matz, K-D. (Author) / Preusker, F. (Author) / Scholten, F. (Author) / Russell, C. T. (Author) / College of Liberal Arts and Sciences (Contributor)

Created: 2014-09-15

Genetic Influence of Apolipoprotein E4 Genotype on Hippocampal Morphometry: An N=725 Surface-Based Alzheimer's Disease Neuroimaging Initiative Study

Description

The apolipoprotein E (APOE) e4 allele is the most prevalent genetic risk factor for Alzheimer's disease (AD). Hippocampal volumes are generally smaller in AD patients carrying the e4 allele compared to e4 noncarriers. Here we examined the effect of APOE e4 on hippocampal morphometry in a large imaging database, the Alzheimer's Disease Neuroimaging Initiative (ADNI). We automatically segmented and constructed hippocampal surfaces from the baseline MR images of 725 subjects with known APOE genotype information, including 167 with AD, 354 with mild cognitive impairment (MCI), and 204 normal controls. High-order correspondences between hippocampal surfaces were enforced across subjects with a novel inverse consistent surface fluid registration method. Multivariate statistics consisting of multivariate tensor-based morphometry (mTBM) and radial distance were computed for surface deformation analysis. Using Hotelling's T² test, we found significant morphological deformation in APOE e4 carriers relative to noncarriers in the entire cohort as well as in the nondemented (pooled MCI and control) subjects, affecting the left hippocampus more than the right, and this effect was more pronounced in e4 homozygotes than heterozygotes. Our findings are consistent with previous studies that showed e4 carriers exhibit accelerated hippocampal atrophy; we extend these findings to a novel measure of hippocampal morphometry. Hippocampal morphometry has significant potential as an imaging biomarker of early stage AD.
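The carrier versus noncarrier comparison rests on the two-sample Hotelling's T² statistic, T² = (n1·n2/(n1+n2)) · d^T S_p^{-1} d, where d is the difference of group means and S_p the pooled covariance. This is a minimal sketch of that classical statistic for one location, assuming each subject contributes a small feature vector (for example, mTBM components plus radial distance) and using a plain Gaussian-elimination solve. Any per-vertex testing or multiple-comparison correction used in the study is not reproduced here.

```python
from typing import List

Vec = List[float]


def mean(rows: List[Vec]) -> Vec:
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def scatter(rows: List[Vec], m: Vec) -> List[Vec]:
    """Sum of outer products (x - m)(x - m)^T over the group."""
    p = len(m)
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[k] - m[k] for k in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return S


def solve(A: List[Vec], b: Vec) -> Vec:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def hotelling_t2(g1: List[Vec], g2: List[Vec]) -> float:
    """Two-sample Hotelling's T^2 with pooled covariance."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    S1, S2 = scatter(g1, m1), scatter(g2, m2)
    p = len(m1)
    Sp = [[(S1[a][b] + S2[a][b]) / (n1 + n2 - 2) for b in range(p)] for a in range(p)]
    d = [m1[k] - m2[k] for k in range(p)]
    x = solve(Sp, d)  # x = Sp^{-1} d
    return n1 * n2 / (n1 + n2) * sum(dk * xk for dk, xk in zip(d, x))


def t2_to_f(t2: float, n1: int, n2: int, p: int) -> float:
    """Convert T^2 to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom."""
    return (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
```

For p = 1 the statistic reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check. If SciPy is available, `scipy.stats.f.sf(F, p, n1 + n2 - p - 1)` gives the corresponding p-value.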

Contributors: Shi, Jie (Author) / Lepore, Natasha (Author) / Gutman, Boris A. (Author) / Thompson, Paul M. (Author) / Baxter, Leslie C. (Author) / Caselli, Richard J. (Author) / Wang, Yalin (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2014-08-01
