Matching Items (6)

Description
As urban populations become increasingly dense, massive amounts of new 'big' data that characterize human activity are being made available. These data may be characterized as having a large volume of observations, being produced in real time or near real time, and including a diverse variety of information. In particular, spatial interaction (SI) data - a collection of human interactions across a set of origin and destination locations - present unique challenges for distilling big data into insight. Therefore, this dissertation identifies some of the potential and pitfalls associated with new sources of big SI data. It also evaluates methods for modeling SI in order to investigate the relationships that drive SI processes, focusing on human behavior rather than data description.

A critical review of the existing SI modeling paradigms is first presented, which also highlights features of big data that are particular to SI data. Next, a simulation experiment is carried out to evaluate three different statistical modeling frameworks for SI data, each supported by a different underlying conceptual framework. Then, two approaches are taken to identify the potential and pitfalls associated with two newer sources of data from New York City - bike-share cycling trips and taxi trips. The first approach builds a model of commuting behavior using a traditional census data set and then compares the results for the same model when it is applied to these newer data sources. The second approach examines how the increased temporal resolution of big SI data may be incorporated into SI models.
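
To ground the comparison, the gravity model family that underlies most SI frameworks can be written as a Poisson regression of flows on origin attributes, destination attributes, and distance. The sketch below is a minimal illustration of that family on simulated data, not the specification evaluated in the dissertation; all variable names and coefficient values are hypothetical.

```python
# Minimal sketch of a gravity-type spatial interaction model, fit as a
# Poisson GLM: flows ~ origin mass, destination mass, distance decay.
# Data are simulated for illustration; real applications would use
# observed origin-destination flow tables (e.g., commuting counts).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200  # number of origin-destination pairs

# Hypothetical covariates: log origin population, log destination
# employment, and log distance between the pair.
log_o = rng.normal(10, 1, n)
log_d = rng.normal(9, 1, n)
log_dist = rng.normal(2, 0.5, n)

# Simulate flows from a known gravity process so the fit can be checked.
mu = np.exp(-3 + 0.8 * log_o + 0.7 * log_d - 1.2 * log_dist)
flows = rng.poisson(mu)

X = sm.add_constant(np.column_stack([log_o, log_d, log_dist]))
model = sm.GLM(flows, X, family=sm.families.Poisson()).fit()
print(model.params)  # recovers the mass and distance-decay coefficients
```

The competing destinations variant would extend this design with an accessibility term for each destination as an additional covariate, which is the kind of spatial structure effect the simulation experiment evaluates.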

Several important results are obtained through this research. First, it is demonstrated that different SI models account for different types of spatial effects and that the Competing Destinations framework seems to be the most robust for capturing spatial structure effects. Second, newer sources of big SI data are shown to be very useful for complementing traditional sources of data, though they are not sufficient substitutes. Finally, it is demonstrated that the increased temporal resolution of new data sources may usher in a new era of SI modeling that allows us to better understand the dynamics of human behavior.
Contributors: Oshan, Taylor Matthew (Author) / Fotheringham, A. S. (Thesis advisor) / Farmer, Carson J.Q. (Committee member) / Rey, Sergio S.J. (Committee member) / Nelson, Trisalyn (Committee member) / Arizona State University (Publisher)
Created: 2017

Description
Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and successfully generalizes to identify 74.3% of trends in the time period of interest. The model also offers an explanation of why particular words trend on Twitter.
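
As a rough illustration of the approach described above, the sketch below pairs a simple dynamic time warping distance with a 1-nearest-neighbor rule to classify time series of topic activity. It is a generic reconstruction of the technique, not the paper's implementation; the reference series and labels are made up.

```python
# Sketch: classify activity time series as trending / not trending using
# dynamic time warping (DTW) distance and a 1-nearest-neighbor rule.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def predict(query, references):
    """1-NN: label of the reference series with the smallest DTW distance."""
    return min(references, key=lambda r: dtw_distance(query, r[0]))[1]

# Hypothetical labeled reference series of per-hour tweet volumes.
references = [
    (np.array([1, 1, 2, 5, 12, 30, 55]), "trending"),
    (np.array([3, 2, 3, 3, 2, 3, 2]), "not trending"),
]
print(predict(np.array([2, 2, 4, 9, 20, 41]), references))
```

DTW is a natural fit here because two topics can follow the same burst shape at different speeds, and warping aligns them before the nearest-neighbor comparison.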
Contributors: Marshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2015-05

Description
Investment real estate is unique among similar financial instruments by nature of each property's internal complexities and interaction with the external economy. Where a majority of tradable assets are static goods within a dynamic market, real estate investments are dynamic goods within a dynamic market. Furthermore, investment real estate, particularly commercial property, not only interacts with the surrounding economy, it reflects it. Alive with tenancy, each and every commercial investment property provides a microeconomic view of the businesses that make up the local economy. Management of commercial investment real estate captures this economic snapshot in a unique abundance of untapped statistical data. While analysis of such data is undeniably valuable, the effort involved in this process is time consuming. Given this unutilized potential, our team has developed proprietary software to analyze this data and communicate the results automatically through an easy-to-use interface. We worked with a local real estate property management and ownership firm, Reliance Management, to develop this system using their current, historical, and future data. Our team has also built a relationship with the executives of Reliance Management to review the functionality and pertinence of the system, which we have dubbed the Reliance Dashboard.
Contributors: Burton, Daryl (Co-author) / Workman, Jack (Co-author) / LePine, Marcie (Thesis director) / Atkinson, Robert (Committee member) / Barrett, The Honors College (Contributor) / Department of Finance (Contributor) / Department of Management (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2015-05

Description
The advent of big data analytics tools and frameworks has allowed for a plethora of new approaches to research and analysis, making data sets that were previously too large or complex more accessible and providing methods to collect, store, and investigate non-traditional data. These tools are starting to be applied in more creative ways and are being used to improve upon traditional computation methods through distributed computing. Statistical analysis of expression quantitative trait loci (eQTL) data has classically been performed using the open source tool PLINK, which runs on high performance computing (HPC) systems. However, progress has been made in running the statistical analysis in the ecosystem of the big data framework Hadoop, resulting in decreased run time, reduced storage footprint, reduced job micromanagement, and increased data accessibility. Now that the data can be more readily manipulated, analyzed, and accessed, there are opportunities to use the modularity and power of Hadoop to further process the data. This project focuses on adding a component to the data pipeline that performs graph analysis on the data. This will provide more insight into the relationship between various genetic differences in individuals with breast cancer and the resulting variation, if any, in gene expression. Further, the investigation will look to see if there is anything to be garnered from a perspective shift: applying tools used in classical networking contexts (such as the Internet) to genetically derived networks.
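
One hedged example of the kind of graph analysis such a component might perform: build a co-expression network from pairwise correlations, then apply the centrality measures commonly used on Internet-style networks. The sketch below uses networkx on simulated data; the gene names, threshold, and data are hypothetical, and the project's actual pipeline runs inside the Hadoop ecosystem rather than on a single machine.

```python
# Sketch: derive a network from gene expression data and apply classic
# network-analysis tools to it. Gene names, data, and the correlation
# threshold are hypothetical, purely to illustrate treating genetic
# data as a graph.
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
genes = [f"gene_{i}" for i in range(20)]
expr = rng.normal(size=(50, 20))  # 50 samples x 20 genes

# Connect genes whose expression profiles are strongly correlated.
corr = np.corrcoef(expr.T)
G = nx.Graph()
G.add_nodes_from(genes)
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        if abs(corr[i, j]) > 0.3:  # arbitrary illustrative threshold
            G.add_edge(genes[i], genes[j], weight=corr[i, j])

# Centrality measures of the sort used to study communication networks.
degree = nx.degree_centrality(G)
between = nx.betweenness_centrality(G)
hubs = sorted(degree, key=degree.get, reverse=True)[:5]
print("most connected genes:", hubs)
print("max betweenness:", max(between.values()))
```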
Contributors: Randall, Jacob Christopher (Author) / Buetow, Kenneth (Thesis director) / Meuth, Ryan (Committee member) / Almalih, Sara (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12

Description
Spatial regression is one of the central topics in spatial statistics. Depending on the goal, interpretation or prediction, spatial regression models can be classified into two categories: linear mixed regression models and nonlinear regression models. This dissertation explored these models and their real-world applications, and new methods and models were proposed to overcome challenges encountered in practice. The dissertation has three major parts.

In the first part, nonlinear regression models were embedded into a multistage workflow to predict the spatial abundance of reef fish species in the Gulf of Mexico. There were two challenges: zero-inflated data and out-of-sample prediction. The methods and models in the workflow could effectively handle the zero-inflated sampling data without strong assumptions. Three strategies were proposed to solve the out-of-sample prediction problem. The results and discussion showed that the nonlinear prediction had the advantages of high accuracy, low bias, and good performance across multiple resolutions.
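
A common way to handle zero-inflated abundance data without strong distributional assumptions is a two-part (hurdle, or "delta") approach: model presence/absence first, then model abundance given presence. The sketch below illustrates that general idea with random forests; it is not the dissertation's workflow, and all data and variable names are hypothetical.

```python
# Sketch of a two-part ("delta" / hurdle) model for zero-inflated
# abundance data: stage 1 predicts presence/absence, stage 2 predicts
# abundance where the species is present; expected abundance is the
# product of the two predictions. Data are simulated for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(500, 3))  # e.g., depth, temperature, habitat index
presence = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 500)) > 0.8
abundance = np.where(presence, rng.gamma(2.0, 5.0, 500) * (1 + X[:, 2]), 0.0)

# Stage 1: presence/absence classifier handles the excess zeros.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, presence)

# Stage 2: abundance regressor trained only on positive observations.
pos = abundance > 0
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X[pos], abundance[pos])

X_new = rng.uniform(size=(5, 3))
expected = clf.predict_proba(X_new)[:, 1] * reg.predict(X_new)
print(expected)  # zeros are handled without strong parametric assumptions
```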

In the second part, a two-stage spatial regression model was proposed for analyzing soil carbon stock (SOC) data. In the first stage, a spatial linear mixed model captured the linear and stationary effects. In the second stage, a generalized additive model was used to explain the nonlinear and nonstationary effects. The results illustrated that the two-stage model had good interpretability for understanding the effects of covariates while maintaining prediction accuracy competitive with popular machine learning models such as random forest, XGBoost, and support vector machines.
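
A rough sketch of this kind of two-stage fit is shown below: a linear model captures the stationary covariate effects, and a generalized additive model fit to the stage-one residuals absorbs smooth nonstationary spatial structure. It uses ordinary least squares and statsmodels' GAM interface as stand-ins for the dissertation's spatial linear mixed model; all data, names, and spline settings are hypothetical.

```python
# Sketch of a two-stage spatial regression: stage 1 is a linear fit for
# stationary covariate effects; stage 2 is a GAM on the residuals with
# B-spline smooths over the spatial coordinates, absorbing nonlinear
# and nonstationary structure. Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(3)
n = 400
lon, lat = rng.uniform(size=n), rng.uniform(size=n)
x1, x2 = rng.normal(size=n), rng.normal(size=n)  # e.g., soil covariates
y = (2 + 1.5 * x1 - 0.5 * x2
     + np.sin(4 * lon) * np.cos(4 * lat)   # smooth spatial signal
     + rng.normal(0, 0.3, n))

# Stage 1: linear fit (a spatial linear mixed model in the dissertation).
X = sm.add_constant(np.column_stack([x1, x2]))
stage1 = sm.OLS(y, X).fit()
resid = y - stage1.predict(X)

# Stage 2: GAM on the residuals with spatial spline smooths.
coords = np.column_stack([lon, lat])
bs = BSplines(coords, df=[8, 8], degree=[3, 3])
stage2 = GLMGam(resid, exog=np.ones((n, 1)), smoother=bs).fit()

y_hat = stage1.predict(X) + stage2.fittedvalues
print(np.corrcoef(y, y_hat)[0, 1])  # in-sample fit of the combined model
```

The appeal of the two-stage design is interpretability: the stage-one coefficients retain their usual linear-model reading, while the stage-two surface shows where and how the covariates fail to explain the spatial pattern.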

A new nonlinear regression model, Gaussian process BART (Bayesian additive regression tree), was proposed in the third part. Combining the advantages of both BART and Gaussian processes, the model can capture the nonlinear effects of both observed and latent covariates. To develop the model, the traditional BART was first generalized to accommodate correlated errors. Then, the failure of likelihood-based Markov chain Monte Carlo (MCMC) in parameter estimation was discussed. Based on the idea of analysis of variation, two techniques, back comparing and tuning range, were proposed to tackle this failure. Finally, the effectiveness of the new model was examined through experiments on both simulated and real data.
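
Read from this description, the model structure can plausibly be written as a BART sum-of-trees mean with Gaussian-process-correlated errors; the notation below is an interpretive sketch, not the dissertation's exact formulation.

```latex
% Interpretive sketch: BART mean function with GP-correlated errors
y_i = \sum_{t=1}^{T} g(x_i;\, \mathcal{T}_t, M_t) + \varepsilon_i,
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\big(\mathbf{0},\; \sigma^{2}\,\Sigma(\theta)\big),
\qquad
\Sigma(\theta)_{ij} = k_{\theta}(s_i, s_j)
```

Here each g(x; T_t, M_t) is a regression tree with structure T_t and leaf values M_t, and k_theta is a Gaussian process covariance kernel over inputs s_i carrying the latent (for example, spatial) structure that the observed covariates x_i cannot explain.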
Contributors: Lu, Xuetao (Author) / McCulloch, Robert (Thesis advisor) / Hahn, Paul (Committee member) / Lan, Shiwei (Committee member) / Zhou, Shuang (Committee member) / Saul, Steven (Committee member) / Arizona State University (Publisher)
Created: 2020

Description
Unstructured data management is an increasingly valuable capability for organizations as the amount of data they own grows every year. The purpose of this project is to detail the process by which ServiceNow and CommonSpirit Health developed their new IntelliRoute model, which aims to classify and auto-resolve a significant portion of CommonSpirit Health's more than 3,000,000 HR service-related cases. This paper examines typical strategies used to manage unstructured data alongside ServiceNow's approach, which focuses on labelling unstructured data by attaching a criticality sentiment and relating helpful knowledge base articles. The labelled data is then used to train an artificial intelligence model that automatically labels cases and refers appropriate knowledge articles.
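
The IntelliRoute model itself is proprietary, but the general pattern it follows (label historical cases, then train a classifier that tags new cases and routes them) can be sketched with standard tools. The example below uses TF-IDF features and logistic regression purely as an illustration; the case texts and labels are hypothetical, and this is not ServiceNow's implementation.

```python
# Generic sketch of labelling unstructured HR cases and training a
# model that auto-labels new ones. TF-IDF + logistic regression stand
# in for whatever proprietary model IntelliRoute actually uses; all
# case text and labels here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled training cases: (case text, criticality label).
cases = [
    ("I was not paid for my overtime hours last month", "high"),
    ("How do I update my direct deposit information?", "low"),
    ("My health insurance claim was denied incorrectly", "high"),
    ("Where can I find the holiday calendar?", "low"),
]
texts, labels = zip(*cases)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# A new incoming case gets a label automatically; high-criticality
# cases could route to an agent, low ones to a knowledge base article.
print(model.predict(["I have not received my paycheck"]))
```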
Contributors: De Waard, Jan (Author) / Bergsagel, Matteo (Co-author) / Chavez-Echeagaray, Maria Elena (Thesis director) / Burns, Christopher (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2022-05