Applying distributional approaches to understand patterns of urban differentiation

Description

Urban scaling analysis has introduced a new scientific paradigm to the study of cities. With it, the notions of size, heterogeneity and structure have taken a leading role. These notions are assumed to explain why cities differ from one another, sometimes wildly. However, the mechanisms by which size, heterogeneity and structure shape the general statistical patterns that describe urban economic output are still unclear. Given the rapid rate of urbanization around the globe, a precise and formal mathematical understanding of these matters is needed. In this dissertation, I carry out probabilistic, distributional and computational explorations of (i) how the broadness, or narrowness, of the distribution of individual productivities within cities determines what and how we measure urban systemic output, (ii) how urban scaling may be expressed as a statistical statement when urban metrics display strong stochasticity, (iii) how the processes of aggregation constrain the variability of total urban output, and (iv) how the structure of urban skills diversification within cities induces a multiplicative process in the production of urban output.
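
For readers unfamiliar with urban scaling, the relation it usually refers to is a power law between city population N and an aggregate output Y, of the form Y = Y0 * N**beta. The sketch below is not taken from the dissertation; it only illustrates, under that assumed functional form, how the exponent can be estimated by ordinary least squares on log-transformed data. The function name and the synthetic city data are hypothetical.

```python
import math


def fit_scaling_exponent(populations, outputs):
    """Estimate (y0, beta) in Y = y0 * N**beta by least squares on logs.

    Assumes strictly positive populations and outputs. This is a generic
    log-log regression, not the dissertation's own estimator.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outputs]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    y0 = math.exp(mean_y - beta * mean_x)
    return y0, beta


# Hypothetical, noise-free cities generated with y0 = 2.0 and beta = 1.15.
populations = [1e4, 1e5, 1e6, 1e7]
outputs = [2.0 * n ** 1.15 for n in populations]
y0, beta = fit_scaling_exponent(populations, outputs)
```

With noisy data the point estimate alone says little about the variability across cities, which is the kind of question a distributional treatment is meant to address.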

Contributors: Gómez-Liévano, Andrés (Author) / Lobo, Jose (Thesis advisor) / Muneepeerakul, Rachata (Thesis advisor) / Bettencourt, Luis M. A. (Committee member) / Chowell-Puente, Gerardo (Committee member) / Arizona State University (Publisher)

Created: 2014

Graph-based sparse learning: models, algorithms, and applications

Description

Sparse learning is a powerful tool to generate models of high-dimensional data with high interpretability, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, a priori structural information has been shown to be powerful for improving the performance of sparse learning models. A graph is a fundamental way to represent structural information of features. This dissertation focuses on graph-based sparse learning. The first part of this dissertation aims to integrate a graph into sparse learning to improve performance. Specifically, the problem of feature grouping and selection over a given undirected graph is considered. Three models are proposed along with efficient solvers to achieve simultaneous feature grouping and selection, enhancing estimation accuracy. One major challenge is that solving large-scale graph-based sparse learning problems remains computationally expensive. An efficient, scalable, and parallel algorithm is therefore proposed for one widely used graph-based sparse learning approach, anisotropic total variation regularization, by explicitly exploring the structure of the graph. The second part of this dissertation focuses on uncovering the graph structure from the data. Two issues in graphical modeling are considered. One is the joint estimation of multiple graphical models using a fused lasso penalty and the other is the estimation of hierarchical graphical models. The key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, a simple screening rule is presented, which reduces the size of the optimization problem and dramatically reduces the computational cost.
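
To make the idea of a graph-based sparsity penalty concrete, the sketch below evaluates a generic penalty that combines an ℓ1 term with an anisotropic total variation term over graph edges, i.e. the sum of |w_i - w_j| over each edge (i, j). The abstract does not specify the three proposed models, so this is only an illustrative penalty of the same family; the function name, the weights `lam_sparse` and `lam_graph`, and the toy graph are assumptions.

```python
def graph_sparse_penalty(w, edges, lam_sparse, lam_graph):
    """Return lam_sparse * ||w||_1 + lam_graph * sum over edges of |w_i - w_j|.

    `edges` is a list of (i, j) index pairs of an undirected feature graph.
    Illustrative only; not one of the dissertation's three models.
    """
    l1_term = sum(abs(v) for v in w)
    tv_term = sum(abs(w[i] - w[j]) for i, j in edges)
    return lam_sparse * l1_term + lam_graph * tv_term


# Hypothetical chain graph over three features: 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
grouped = [1.0, 1.0, 0.0]     # connected features share a value
scattered = [1.0, -1.0, 0.0]  # connected features disagree

penalty_grouped = graph_sparse_penalty(grouped, edges, 1.0, 1.0)
penalty_scattered = graph_sparse_penalty(scattered, edges, 1.0, 1.0)
```

Both vectors have the same ℓ1 norm, but the one whose connected features agree is penalized less, which is how a penalty of this kind encourages feature grouping along the graph.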

Contributors: Yang, Sen (Author) / Ye, Jieping (Thesis advisor) / Wonka, Peter (Thesis advisor) / Wang, Yalin (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2014

Dynamics and implications of data-based disease models in public health and agriculture

Description

The increased number of novel pathogens that potentially threaten the human population has motivated the development of mathematical and computational modeling approaches for forecasting epidemic impact and understanding key environmental characteristics that influence the spread of diseases. Yet, when substantial uncertainty surrounds the transmission process during a rapidly developing infectious disease outbreak, complex mechanistic models may be too difficult to calibrate quickly enough for policy makers to make informed decisions. Simple phenomenological models that rely on a small number of parameters can provide an initial platform for assessing the epidemic trajectory, estimating the reproduction number and quantifying the disease burden from the early epidemic phase.

Chapter 1 provides background information and motivation for infectious disease forecasting and outlines the rest of the thesis.

In chapter 2, logistic patch models are used to assess and forecast the 2013-2015 West Africa Zaire ebolavirus epidemic. In particular, this chapter is concerned with comparing and contrasting the effects that spatial heterogeneity has on the forecasting performance of the cumulative infected case counts reported during the epidemic.

In chapter 3, two simple phenomenological models inspired by population biology are used to assess the Research and Policy for Infectious Disease Dynamics (RAPIDD) Ebola Challenge, a simulated epidemic that generated 4 infectious disease scenarios. Because of the nature of the synthetically generated data, model predictions are compared to exact epidemiological quantities used in the simulation.

In chapter 4, these models are applied to the 1904 Plague epidemic that occurred in Bombay. This chapter provides evidence that these simple models may be applicable to infectious diseases no matter the disease transmission mechanism.

Chapter 5 uses the patch models from chapter 2 to explore how migration in the 1904 Plague epidemic changes the final epidemic size.

The final chapter is an interdisciplinary project concerning within-host dynamics of cereal yellow dwarf virus-RPV, a plant pathogen from a virus group that infects over 150 grass species. Motivated by environmental nutrient enrichment due to anthropogenic activities, mathematical models are employed to investigate the relevance of resource competition to pathogen and host dynamics.
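
As an illustration of the kind of few-parameter phenomenological model the abstract describes, the sketch below evaluates the closed-form solution of single-patch logistic growth, dC/dt = r * C * (1 - C / K), for cumulative cases C(t). It is an assumption-laden simplification: it ignores the coupling between patches used in chapters 2 and 5, and the parameter values are hypothetical rather than fitted to any outbreak.

```python
import math


def logistic_cumulative_cases(t, r, K, c0):
    """Closed-form logistic curve C(t) = K / (1 + a * exp(-r * t)).

    r  : intrinsic growth rate per unit time (hypothetical value below)
    K  : final epidemic size, i.e. the curve's upper asymptote
    c0 : cumulative cases at t = 0, with 0 < c0 < K
    """
    a = (K - c0) / c0
    return K / (1.0 + a * math.exp(-r * t))


# Hypothetical parameters, for illustration only.
trajectory = [logistic_cumulative_cases(t, r=0.3, K=1000.0, c0=10.0)
              for t in range(0, 61, 10)]
```

The model has only three parameters, which is what makes it practical to calibrate from early case counts when mechanistic detail is unavailable.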

Contributors: Pell, Bruce (Author) / Kuang, Yang (Thesis advisor) / Chowell-Puente, Gerardo (Committee member) / Nagy, John (Committee member) / Kostelich, Eric (Committee member) / Gardner, Carl (Committee member) / Arizona State University (Publisher)

Created: 2016

Evaluating the effectiveness of tree locations and arrangements for improving urban thermal environment

Description

Trees serve as a natural umbrella to mitigate insolation absorbed by features of the urban environment, especially building structures and pavements. For a desert community, trees are a particularly valuable asset because they contribute to energy conservation efforts, improve home values, allow for cost savings, and promote enhanced health and well-being. The main obstacle to creating a sustainable urban community with trees in a desert city is the scarcity and cost of irrigation water. Thus, strategically locating and arranging desert trees, using as few trees as possible, potentially translates into significant energy, water and long-term cost savings as well as conservation, economic, and health benefits. The objective of this dissertation is to achieve this research goal with integrated methods from both theoretical and empirical perspectives.

This dissertation includes three main parts. The first part proposes a spatial optimization method to optimize tree locations, with the objective of maximizing shade coverage on building facades and open structures and minimizing shade coverage on building rooftops in a 3-dimensional environment. Second, an outdoor urban physical scale model with field measurement is presented to understand the cooling and locational benefits of tree shade. The third part implements a microclimate numerical simulation model to analyze how specific tree locations and arrangements influence outdoor microclimates and improve human thermal comfort. These three parts of the dissertation attempt to fill the research gap of how to strategically locate trees at the building to neighborhood scale, and how to quantify the impact of such arrangements.

Results highlight the significance of arranging residential shade trees across different geographical scales. At both the building and neighborhood scales, research results recommend that trees should be arranged in the central part of the building's south front yard. More cooling benefits are provided to the building structures and outdoor microclimates with a cluster tree arrangement without canopy overlap; however, if residents are interested in creating a better outdoor thermal environment, open space between trees is needed to enhance the wind environment for better human thermal comfort. Considering the rapid urbanization process, limited water supply, and the severe heat stress in urban areas, judicious design and planning of trees is of increasing importance for improving quality of life and sustaining the urban environment.

Contributors: Zhao, Qunshan (Author) / Wentz, Elizabeth (Thesis advisor) / Sailor, David (Committee member) / Wang, Zhi-Hua (Committee member) / Arizona State University (Publisher)

Created: 2017

Gene Network Inference via Sequence Alignment and Rectification

Description

While techniques for reading DNA in some capacity have been available for decades, the ability to accurately edit genomes at scale has remained elusive. Novel techniques have been introduced recently to aid in the writing of DNA sequences. While writing DNA is more accessible, it still remains expensive, justifying the increased interest in in silico predictions of cell behavior. In order to accurately predict the behavior of cells it is necessary to extensively model the cell environment, including gene-to-gene interactions, as completely as possible.

Significant algorithmic advances have been made for identifying these interactions, but despite these improvements current techniques fail to infer some edges, and fail to capture some complexities in the network. Much of this limitation is due to heavily underdetermined problems, whereby tens of thousands of variables are to be inferred using datasets with the power to resolve only a small fraction of the variables. Additionally, failure to correctly resolve gene isoforms using short reads contributes significantly to noise in gene quantification measures.

This dissertation introduces novel mathematical models, machine learning techniques, and biological techniques to solve the problems described above. Mathematical models are proposed for simulation of gene network motifs and for raw read simulation. Machine learning techniques are shown for DNA sequence matching and DNA sequence correction.

Results provide novel insights into the low-level functionality of gene networks. Also shown is the ability to use normalization techniques to aggregate data for gene network inference, leading to larger data sets while minimizing increases in inter-experimental noise. Results also demonstrate that the high error rates of third-generation sequencing differ significantly from previous error profiles, and that these errors can be modeled, simulated, and rectified. Finally, techniques are provided for amending this DNA error that preserve the benefits of third-generation sequencing.

Contributors: Faucon, Philippe Christophe (Author) / Liu, Huan (Thesis advisor) / Wang, Xiao (Committee member) / Crook, Sharon M (Committee member) / Wang, Yalin (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2017

Deep Learning based Classification of FDG-PET Data for Alzheimer's Disease

Description

Alzheimer’s Disease (AD) is a progressive neurodegenerative disease that affects the brain gradually and worsens with time. Reliable and early diagnosis of AD and its prodromal stages (i.e., Mild Cognitive Impairment (MCI)) is essential. Fluorodeoxyglucose (FDG) positron emission tomography (PET) measures the decline in the regional cerebral metabolic rate for glucose, offering a reliable metabolic biomarker even in presymptomatic AD patients. PET scans provide functional information that is unique and unavailable using other types of imaging. The computational efficacy of FDG-PET data alone for the classification of the various Alzheimer’s diagnostic categories (AD, MCI (LMCI, EMCI), Control) has not been studied. This serves as motivation to correctly classify the various diagnostic categories using FDG-PET data. Deep learning has recently been applied to the analysis of structural and functional brain imaging data. This thesis introduces a deep learning based classification technique that uses neural networks with dimensionality reduction techniques to classify the different stages of AD based on FDG-PET image analysis.

This thesis develops a classification method to investigate the performance of FDG-PET as an effective biomarker for Alzheimer's clinical group classification. This involves dimensionality reduction using Probabilistic Principal Component Analysis on max-pooled data and mean-pooled data, followed by a Multilayer Feed Forward Neural Network which performs binary classification. Max-pooled features result in better classification performance than mean-pooled features. Additionally, experiments are done to investigate whether adding important demographic features, such as the Functional Activities Questionnaire (FAQ) and gene information, helps improve performance. Classification results indicate that the designed classifiers achieve competitive results, and better results with the addition of demographic features.

Contributors: Singh, Shibani (Author) / Wang, Yalin (Thesis advisor) / Li, Baoxin (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)

Created: 2017

Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging

Description

Large-scale $\ell_1$-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to scale up the optimization problem in parallel. Parallel solvers run on multiple cores in a shared memory system or a distributed environment to speed up the computation, but their practical use is limited by the huge dimension of the feature space and by synchronization problems.
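
For orientation, a small serial example of $\ell_1$-regularized least squares is sketched below. It uses the iterative soft-thresholding algorithm (ISTA) on the objective (1/2)||Xw - y||^2 + lam * ||w||_1, which is a standard textbook solver and not the asynchronous parallel or distributed solvers proposed in the dissertation. The function names, the step-size choice (one over the squared Frobenius norm of X, a conservative bound) and the toy data are assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||Xw - y||^2 + lam * ||w||_1 with serial ISTA.

    X is a list of rows. The step 1 / ||X||_F^2 never exceeds the inverse
    Lipschitz constant of the gradient, so the iteration is stable.
    """
    n_samples, n_features = len(X), len(X[0])
    step = 1.0 / sum(x * x for row in X for x in row)
    w = [0.0] * n_features
    for _ in range(n_iter):
        residual = [sum(X[i][j] * w[j] for j in range(n_features)) - y[i]
                    for i in range(n_samples)]
        grad = [sum(X[i][j] * residual[i] for i in range(n_samples))
                for j in range(n_features)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam)
             for j in range(n_features)]
    return w


# Toy problem with an identity design, where the exact solution is the
# soft-thresholded response vector.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
w = ista_lasso(X, y, lam=1.0)
```

Every iteration touches all samples and all features, which is the cost that parallel and distributed solvers aim to spread across cores or machines.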

In this dissertation, I pursue this direction with a particular focus on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For supervised learning, I first propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreading environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is stored in a distributed manner across different machines. The proposed model is then extended to the study of genetic risk factors for Alzheimer's Disease (AD) across different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For the unsupervised learning problem, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), that scales up the optimization of dictionary learning and sparse coding problems. A common consideration in medical imaging research is that the longitudinal features of patients at different time points are beneficial to study together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method that learns the different tasks simultaneously and uses shared and individual dictionaries to encode both consistent and changing imaging features.

Contributors: Li, Qingyang (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Thesis advisor) / He, Jingrui (Committee member) / Wang, Yalin (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2017

The Fusion of Multimodal Brain Imaging Data from Geometry Perspectives

Description

The rapid development in acquiring multimodal neuroimaging data provides opportunities to systematically characterize human brain structures and functions. For example, in brain magnetic resonance imaging (MRI), a typical non-invasive imaging technique, different acquisition sequences (modalities) lead to different descriptions of brain functional activities or anatomical biomarkers. Nowadays, in addition to the traditional voxel-level analysis of images, there is a trend to process and investigate the cross-modality relationship at a high-dimensional level of images, e.g., surfaces and networks.

In this study, I aim to achieve multimodal brain image fusion by referring to some intrinsic properties of data, e.g. geometry of embedding structures where the commonly used image features reside. Since the image features investigated in this study share an identical embedding space, i.e. either defined on a brain surface or brain atlas, where a graph structure is easy to define, it is straightforward to consider the mathematically meaningful properties of the shared structures from the geometry perspective.

I first introduce the background of multimodal fusion of brain image data and insights into how geometric properties can play a role in linking different modalities. Then, several proposed computational frameworks, using either solid and efficient geometric algorithms or current geometric deep learning models, are discussed in full. I show how these frameworks deal with distinct geometric properties, and their applications in real healthcare scenarios, e.g., enhanced detection of fetal brain diseases or abnormal brain development.

Contributors: Zhang, Wen (Author) / Wang, Yalin (Thesis advisor) / Liu, Huan (Committee member) / Li, Baoxin (Committee member) / Braden, B. Blair (Committee member) / Arizona State University (Publisher)

Created: 2020

Transportation Techniques for Geometric Clustering

Description

This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem, which seeks the global minimum cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, which was missing in the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.

Contributors: Mi, Liang (Author) / Wang, Yalin (Thesis advisor) / Chen, Kewei (Committee member) / Karam, Lina (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2020

Super-resolution for Natural Images and Magnetic Resonance Images

Description

Image super-resolution (SR) is a low-level image processing task with many applications, such as medical imaging, satellite image processing, and video enhancement. Given a low-resolution image, it aims to reconstruct a high-resolution image. The problem is ill-posed since there can be more than one high-resolution image corresponding to the same low-resolution image. To address this problem, a number of machine learning-based approaches have been proposed.

In this dissertation, I present my work on single image super-resolution (SISR) and accelerated magnetic resonance imaging (MRI) (a.k.a. super-resolution on MR images), followed by an investigation of transfer learning for accelerated MRI reconstruction. For SISR, a dictionary-based approach and two reconstruction-based approaches are presented. To be precise, a convex dictionary learning (CDL) algorithm is proposed by constraining the dictionary atoms to be formed by nonnegative linear combinations of the training data, which is a natural, desired property. Also, two reconstruction-based single-image methods are presented, which make use of (i) joint regularization, where a group-residual-based regularization (GRR) and a ridge-regression-based regularization (3R) are combined; and (ii) collaborative representation and non-local self-similarity. After that, two deep learning approaches are proposed, aiming at reconstructing high-quality images from accelerated MRI acquisition. Residual Dense Block (RDB) and feedback connection are introduced in the proposed models. In the last chapter, the feasibility of transfer learning for accelerated MRI reconstruction is discussed.

Contributors: Ding, Pak Lun Kevin (Author) / Li, Baoxin (Thesis advisor) / Wu, Teresa (Committee member) / Wang, Yalin (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020
