Matching Items (4)
Description
Researchers and practitioners have widely studied road network traffic data in areas such as urban planning, traffic prediction, and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale, high-quality urban traffic data requires tremendous effort because participating vehicles must install Global Positioning System (GPS) receivers and administrators must continuously monitor these devices. Several urban traffic simulators attempt to generate such data with different features. However, they suffer from two critical issues: (1) Scalability: most of them offer only a single-machine solution, which is not adequate for producing large-scale data. Some simulators can generate traffic in parallel but do not balance the load well among the machines in a cluster. (2) Granularity: many simulators do not consider microscopic traffic situations, including traffic lights, lane changing, and car following. This paper proposes GeoSparkSim, a scalable traffic simulator that extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation. The proposed system integrates seamlessly with a Spark-based spatial data management system, GeoSpark, to deliver a holistic approach that allows data scientists to simulate, analyze, and visualize large-scale urban traffic data. To implement microscopic traffic models, GeoSparkSim employs a simulation-aware vehicle partitioning method that distributes vehicles among machines so that each machine has a balanced workload. The experimental analysis shows that GeoSparkSim can simulate the movements of 200 thousand cars over an extensive road network (250 thousand road junctions and 300 thousand road segments).
Contributors: Fu, Zishan (Author) / Sarwat, Mohamed (Thesis advisor) / Pedrielli, Giulia (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Social media has become popular in the past decade. Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating large amounts of data, everyone is constantly looking for ways to leverage the knowledge in social networks to make their systems more personalized for their end users. With the rapid increase in the use of mobile phones and wearables, social media data is also being tied to spatial networks. This research document proposes an efficient technique that answers social k-Nearest Neighbor queries with a Spatial Range Filter. The proposed approach performs a joint search on both the social and spatial domains, which radically improves performance compared to straightforward solutions. The research document proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner so that it can be filtered by spatial constraints (a region of interest) and social constraints (the top-k closest vertices) at query time. This prunes unnecessary paths during the social graph traversal and returns only the top-k socially closest venues. The research document then shows experimentally that the proposed approach outperforms existing baseline approaches by at least three times, and compares how each of the proposed algorithms performs under various conditions on a real geo-social dataset extracted from Yelp.
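The query type described above can be illustrated with a minimal sketch. This is not the combined social-spatial index proposed in the thesis; it is only a straightforward baseline of the kind the abstract compares against, assuming an unweighted social graph in which social closeness is the hop count. The function name, the adjacency-dictionary layout, and the rectangular range are all hypothetical choices made for illustration.

```python
from collections import deque

def social_knn_in_range(graph, locations, source, k, box):
    """Baseline sketch: breadth-first search from `source`, keeping the
    first k vertices whose location falls inside the rectangle `box`.

    graph:     dict mapping a vertex to an iterable of neighbours (assumed layout)
    locations: dict mapping a vertex to an (x, y) pair
    box:       (min_x, min_y, max_x, max_y) spatial range filter
    """
    min_x, min_y, max_x, max_y = box
    seen = {source}
    queue = deque([source])
    result = []
    while queue and len(result) < k:
        node = queue.popleft()
        # Sorting neighbours only makes the visiting order deterministic.
        for nbr in sorted(graph.get(node, ())):
            if nbr in seen:
                continue
            seen.add(nbr)
            queue.append(nbr)
            loc = locations.get(nbr)
            inside = (loc is not None
                      and min_x <= loc[0] <= max_x
                      and min_y <= loc[1] <= max_y)
            if inside:
                result.append(nbr)
                if len(result) == k:
                    return result
    return result

# Toy data: "u" is the query user; the other vertices are venues or friends.
graph = {"u": ["a", "b"], "a": ["c"], "b": ["d"]}
locations = {"a": (1, 1), "b": (9, 9), "c": (2, 2), "d": (3, 3)}
nearest = social_knn_in_range(graph, locations, "u", 2, (0, 0, 5, 5))
```

Because this baseline visits every vertex on the way, including those outside the region of interest, it shows why an index that filters on both domains at once can avoid much of the traversal.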
Contributors: Pasumarthy, Nitin (Author) / Sarwat, Mohamed (Thesis advisor) / Papotti, Paolo (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The use of spatial data has become fundamental in today's world. From fitness trackers to food delivery services, almost all applications record users' location information and require clean geospatial data to enhance various application features. As spatial data flows in from heterogeneous sources, various problems arise. Entity matching is an important step in the process of producing clean, usable data. It is an amalgamation of various sub-processes, including blocking and matching. At the end of an entity matching pipeline, we get deduplicated records of the same real-world entity. Identifying various mentions of the same real-world location is known as spatial entity matching. While relational entity matching has received significant interest, the same cannot be said of spatial entity matching. In this dissertation, I build an end-to-end Geospatial Entity Matching framework, GEM, that explores spatial entity matching from a novel perspective. In current state-of-the-art systems, spatial entity matching is performed on only one type of geometry. Instead of confining the matching to spatial entities of point geometry type, I extend the boundaries of spatial entity matching to cover the more generic polygon geometry entities as well. I propose a methodology that supports three entity matching scenarios across different geometry types: point X point, point X polygon, and polygon X polygon. As mentioned above, entity matching consists of various steps, but blocking, feature vector creation, and classification are the core steps of the system. GEM comprises an efficient and lightweight blocking technique, GeoPrune, that uses the geohash encoding mechanism to prune away the obviously non-matching spatial entities. Geohashing is a technique that converts the coordinates of a point location to an alphanumeric code string. This technique proves to be very effective and fast for blocking. I leverage the Apache Sedona engine to create the feature vectors. Apache Sedona is a spatial database management system capable of processing spatial SQL queries over multiple geometry types without compromising their original coordinate vector representation. In this step, I repurpose the spatial proximity operators (SQL queries) in Apache Sedona to create spatial feature dimensions that capture the proximity between a pair of geospatial entities. The last step of an entity matching process is matching, or classification. The classification step in GEM is a pluggable component that consumes the feature vector for a spatial entity pair and determines whether the geolocations match. The component provides three machine learning models that consume the same feature vector and label the test data based on the training. I conduct experiments with the three classifiers on multiple large-scale geospatial datasets consisting of both spatial and relational attributes. The data considered for the experiments comes from heterogeneous sources, and its schema is pre-aligned manually. GEM achieves an F-measure of 1.0 on a point X point dataset with 176k total pairs, which is 42% higher than a state-of-the-art spatial EM baseline. It achieves F-measures of 0.966 and 0.993 on the point X polygon dataset with 302M total pairs and the polygon X polygon dataset with 16M total pairs, respectively.
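To make the geohash-based blocking idea concrete, the following is a minimal sketch under stated assumptions. It implements the standard public geohash encoding (interleaved longitude and latitude bits, base-32 alphabet) and a simple prefix-bucket blocker. The abstract does not describe how GeoPrune works internally, so `block_by_geohash`, the `precision` parameter, and the dictionary input layout are hypothetical and should not be read as the dissertation's method.

```python
from collections import defaultdict
from itertools import combinations

# Standard geohash base-32 alphabet (digits plus letters without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

def block_by_geohash(points, precision=6):
    """Hypothetical blocker: only entities sharing a geohash cell become
    candidate pairs; all other pairs are pruned before matching."""
    buckets = defaultdict(list)
    for name, (lat, lon) in points.items():
        buckets[geohash_encode(lat, lon, precision)].append(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs

points = {
    "a": (57.64911, 10.40744),
    "b": (57.64912, 10.40745),
    "c": (0.0, 0.0),
}
candidates = block_by_geohash(points)
```

One limitation of this simple variant is that two nearby points on opposite sides of a cell border get different codes, so a practical blocker would typically also compare neighbouring cells or use a shorter prefix.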
Contributors: Shah, Setu Nilesh (Author) / Sarwat, Mohamed (Thesis advisor) / Pedrielli, Giulia (Committee member) / Boscovic, Dragan (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
A volunteered geographic information system, e.g., OpenStreetMap (OSM), collects data from volunteers to generate geospatial maps. To keep the map consistent, volunteers are expected to perform the tedious task of updating the underlying geospatial data at regular intervals. Such a map curation step takes time and considerable human effort. In this thesis, we propose a framework that improves the process of updating geospatial maps by automatically identifying road changes from user-generated GPS traces. Since GPS traces can be sparse and noisy, the proposed framework validates the map changes with the users before propagating them to a publishable version of the map. The proposed framework achieves up to four times faster map matching performance than the state-of-the-art algorithms with only 0.1-0.3% accuracy loss.
Contributors: Vementala, Nikhil (Author) / Papotti, Paolo (Thesis advisor) / Sarwat, Mohamed (Thesis advisor) / Kasim, Selçuk Candan (Committee member) / Arizona State University (Publisher)
Created: 2017