Matching Items (3)
Description
As urban populations become increasingly dense, massive amounts of new 'big' data that characterize human activity are becoming available. These data are typically large in volume, produced in real time or near real time, and diverse in the information they contain. In particular, spatial interaction (SI) data - a collection of human interactions across a set of origin and destination locations - present unique challenges for distilling big data into insight. Therefore, this dissertation identifies some of the potential and pitfalls associated with new sources of big SI data. It also evaluates methods for modeling SI to investigate the relationships that drive SI processes in order to focus on human behavior rather than data description.

A critical review of the existing SI modeling paradigms is first presented, which also highlights features of big data that are particular to SI data. Next, a simulation experiment is carried out to evaluate three different statistical modeling frameworks for SI data that are supported by different underlying conceptual frameworks. Then, two approaches are taken to identify the potential and pitfalls associated with two newer sources of data from New York City - bike-share cycling trips and taxi trips. The first approach builds a model of commuting behavior using a traditional census data set and then compares the results for the same model when it is applied to these newer data sources. The second approach examines how the increased temporal resolution of big SI data may be incorporated into SI models.

Several important results are obtained through this research. First, it is demonstrated that different SI models account for different types of spatial effects and that the Competing Destination framework seems to be the most robust for capturing spatial structure effects. Second, newer sources of big SI data are shown to be very useful for complementing traditional sources of data, though they are not sufficient substitutes. Finally, it is demonstrated that the increased temporal resolution of new data sources may usher in a new era of SI modeling that allows us to better understand the dynamics of human behavior.
Contributors: Oshan, Taylor Matthew (Author) / Fotheringham, A. S. (Thesis advisor) / Farmer, Carson J.Q. (Committee member) / Rey, Sergio S.J. (Committee member) / Nelson, Trisalyn (Committee member) / Arizona State University (Publisher)
Created: 2017
Description

Factors that explain human mobility and active transportation include built environment and infrastructure features, though few studies incorporate specific geographic detail into examinations of mobility. Little is understood, for example, about the specific paths people take in urban areas or the influence of neighborhoods on their activity. Detailed analysis of human activity has been limited by the sampling strategies employed by conventional data sources. New crowdsourced datasets, or data gathered from smartphone applications, present an opportunity to examine factors that influence human activity in ways that have not been possible before; they typically contain more detail and are gathered more frequently than conventional sources. Questions remain, however, about the utility and representativeness of crowdsourced data. The overarching aim of this dissertation research is to identify how crowdsourced data can be used to better understand human mobility. Bicycling activity is used as a case study to examine human mobility because smartphone apps aimed at collecting bicycle routes are readily available and bicycling is understudied in comparison to other modes. The research contributes to the knowledge base on crowdsourced data and human mobility in three ways. First, it examines how conventional data (e.g., counts, travel surveys) and crowdsourced data correspond in representing bicycling activity. Results identify where the data correspond and where they differ significantly, which has implications for using crowdsourced data for planning and policy decisions. Second, it examines the factors that influence cycling activity recorded by smartphone cycling apps. The best predictors of activity were median weekly rent, percentage of residential land, and the number of people using two or more modes to commute in an area. Finally, the third part of the dissertation seeks to understand the impact of bicycle lanes and bicycle ridership on residential housing prices. Results confirmed that bicycle lanes in the neighborhood of a home positively influence sale prices, though ridership was only marginally related to house price. This research demonstrates that knowledge obtained through crowdsourced data informs us about smaller geographic areas and provides details on where people bicycle, who uses bicycles, and the impact of the built environment on bicycling activity.

Contributors: Conrow, Lindsey (Author) / Wentz, Elizabeth (Thesis advisor) / Nelson, Trisalyn (Committee member) / Mooney, Sian (Committee member) / Pettit, Christopher (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Spatial regression is one of the central topics in spatial statistics. Depending on the goal, interpretation or prediction, spatial regression models can be classified into two categories: linear mixed regression models and nonlinear regression models. This dissertation explores these models and their real-world applications. New methods and models are proposed to overcome the challenges that arise in practice. The dissertation has three major parts.

In the first part, nonlinear regression models were embedded into a multistage workflow to predict the spatial abundance of reef fish species in the Gulf of Mexico. There were two challenges: zero-inflated data and out-of-sample prediction. The methods and models in the workflow could effectively handle the zero-inflated sampling data without strong assumptions. Three strategies were proposed to solve the out-of-sample prediction problem. The results and discussion showed that the nonlinear prediction offered high accuracy and low bias and performed well at multiple resolutions.

In the second part, a two-stage spatial regression model was proposed for analyzing soil carbon stock (SOC) data. In the first stage, a spatial linear mixed model captured the linear and stationary effects. In the second stage, a generalized additive model was used to explain the nonlinear and nonstationary effects. The results illustrated that the two-stage model offered good interpretability for understanding the effects of covariates while maintaining high prediction accuracy, competitive with popular machine learning models such as random forest, XGBoost, and support vector machines.

A new nonlinear regression model, Gaussian process BART (Bayesian additive regression trees), was proposed in the third part. Combining the advantages of both BART and Gaussian processes, the model could capture the nonlinear effects of both observed and latent covariates. To develop the model, the traditional BART was first generalized to accommodate correlated errors. Then, the failure of likelihood-based Markov chain Monte Carlo (MCMC) in parameter estimation was discussed. Based on the idea of analysis of variation, back comparing and tuning range were proposed to tackle this failure. Finally, the effectiveness of the new model was examined through experiments on both simulated and real data.
Contributors: Lu, Xuetao (Author) / McCulloch, Robert (Thesis advisor) / Hahn, Paul (Committee member) / Lan, Shiwei (Committee member) / Zhou, Shuang (Committee member) / Saul, Steven (Committee member) / Arizona State University (Publisher)
Created: 2020