Matching Items (19)
Description
This thesis encompasses a comprehensive research effort dedicated to overcoming the critical bottlenecks that hinder the current generation of neural networks, thereby significantly advancing their reliability and performance. Deep neural networks, with their millions of parameters, suffer from over-parameterization and a lack of constraints, leading to limited generalization capabilities. In other words, the complex architecture and millions of parameters make it challenging to find the right balance between capturing useful patterns and avoiding noise in the data. To address these issues, this thesis explores novel solutions based on knowledge distillation, enabling the learning of robust representations. Leveraging the capabilities of large-scale networks, effective learning strategies are developed. Moreover, the distillation process's dependency on external networks, which often requires large-scale models, is effectively overcome by proposing a self-distillation strategy. The proposed approach empowers the model to generate high-level knowledge within a single network, pushing the boundaries of knowledge distillation. The effectiveness of the proposed method is not only demonstrated across diverse applications, including image classification, object detection, and semantic segmentation, but also explored in practical settings such as handling data scarcity and assessing the transferability of the model to other learning tasks. Another major obstacle hindering the development of reliable and robust models lies in their black-box nature, which impedes clear insight into the contributions toward the final predictions and yields uninterpretable feature representations. To address this challenge, this thesis introduces techniques that incorporate simple yet powerful deep constraints rooted in Riemannian geometry.
These constraints confer geometric qualities upon the latent representation, thereby fostering a more interpretable and insightful representation. In addition to its primary focus on general tasks like image classification and activity recognition, this strategy offers significant benefits in real-world applications where data scarcity is prevalent. Moreover, its robustness in feature removal showcases its potential for edge applications. By successfully tackling these challenges, this research contributes to advancing the field of machine learning and provides a foundation for building more reliable and robust systems across various application domains.
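As a concrete illustration of the distillation objective underlying this line of work, here is a minimal sketch of a temperature-scaled distillation loss. The function names, default temperature, and mixing weight are illustrative assumptions, not the thesis's actual formulation; in the self-distillation setting described above, the "teacher" logits would come from the same network (e.g. a deeper exit) rather than an external model.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.7):
    """Blend hard-label cross-entropy with soft-target KL divergence.

    Illustrative sketch: temperature and alpha are assumed values.
    """
    # Hard-label cross-entropy on the unscaled student distribution.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[true_label])
    # Soft targets: KL(teacher || student) at temperature T, scaled by
    # T^2 to keep gradient magnitudes comparable across temperatures.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s))
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl
```

When teacher and student agree exactly, the KL term vanishes and only the (down-weighted) cross-entropy remains.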
Contributors: Choi, Hongjun (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Wenwen (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Urban economic modeling and effective spatial planning are critical tools towards achieving urban sustainability. However, in practice, many technical obstacles, such as information islands, poor documentation of data and lack of software platforms to facilitate virtual collaboration, are challenging the effectiveness of decision-making processes. In this paper, we report on our efforts to design and develop a geospatial cyberinfrastructure (GCI) for urban economic analysis and simulation. This GCI provides an operational graphic user interface, built upon a service-oriented architecture to allow (1) widespread sharing and seamless integration of distributed geospatial data; (2) an effective way to address the uncertainty and positional errors encountered in fusing data from diverse sources; (3) the decomposition of complex planning questions into atomic spatial analysis tasks and the generation of a web service chain to tackle such complex problems; and (4) capturing and representing provenance of geospatial data to trace its flow in the modeling task. The Greater Los Angeles Region serves as the test bed. We expect this work to contribute to effective spatial policy analysis and decision-making through the adoption of advanced GCI and to broaden the application coverage of GCI to include urban economic simulations.
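Point (3) above, decomposing a complex planning question into atomic tasks chained as services, can be illustrated with a toy composition that also records provenance, echoing point (4). This is a hypothetical simplified stand-in for the GCI's actual service-chaining machinery; all task names and the study region are invented for illustration.

```python
def chain(*tasks):
    """Compose atomic spatial-analysis tasks into a service chain,
    recording provenance (which task ran, in what order)."""
    def run(data):
        provenance = []
        for task in tasks:
            data = task(data)
            provenance.append(task.__name__)
        return data, provenance
    return run

# Hypothetical atomic tasks for one decomposed planning question.
def reproject(points):
    """Stand-in for a coordinate-transformation service."""
    return [(float(x), float(y)) for x, y in points]

def clip_to_region(points):
    """Keep only points inside an assumed 10 x 10 study region."""
    return [p for p in points if 0 <= p[0] <= 10 and 0 <= p[1] <= 10]

def mean_center(points):
    """Aggregation service: mean center of the clipped points."""
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)

pipeline = chain(reproject, clip_to_region, mean_center)
result, provenance = pipeline([(1, 1), (3, 5), (20, 20)])
```

The provenance list is the minimal analogue of the data-lineage capture the GCI performs across distributed web services.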
Created: 2013-05-21
Description
This article reviews the range of delivery platforms that have been developed for the PySAL open source Python library for spatial analysis. This includes traditional desktop software (with a graphical user interface, command line or embedded in a computational notebook), open spatial analytics middleware, and web, cloud and distributed open geospatial analytics for decision support. A common thread throughout the discussion is the emphasis on openness, interoperability, and provenance management in a scientific workflow. The code base of the PySAL library provides the common computing framework underlying all delivery mechanisms.
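As an example of the kind of spatial-analysis primitive PySAL delivers across these platforms, the sketch below builds queen-contiguity neighbor lists for grid cells (two cells are neighbors if they share an edge or a corner). It is an illustrative re-implementation of the concept only, not PySAL's own code or API.

```python
def queen_weights(cells):
    """Queen-contiguity neighbor lists for (row, col) grid cells.

    Illustrative sketch of a lattice spatial-weights structure;
    returns a dict mapping each cell to its sorted neighbors.
    """
    cell_set = set(cells)
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    return {
        (r, c): sorted((r + dr, c + dc) for dr, dc in offsets
                       if (r + dr, c + dc) in cell_set)
        for r, c in cell_set
    }

# A 3 x 3 lattice: interior cells have 8 neighbors, corners have 3.
w = queen_weights([(r, c) for r in range(3) for c in range(3)])
```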
Contributors: Rey, Sergio (Author) / Anselin, Luc (Author) / Li, Xun (Author) / Pahle, Robert (Author) / Laura, Jason (Author) / Li, Wenwen (Author) / Koschinsky, Julia (Author) / College of Liberal Arts and Sciences (Contributor) / School of Geographical Sciences and Urban Planning (Contributor) / Computational Spatial Science (Contributor)
Created: 2015-06-01
Description

A measure of shape compactness is a numerical quantity representing the degree to which a shape is compact. Accurate compactness measures have received great attention due to their application in a broad range of GIS problems, such as detecting clustering patterns from remote-sensing images, understanding urban sprawl, and redrawing electoral districts to avoid gerrymandering. In this article, we propose an effective and efficient approach to computing shape compactness based on the moment of inertia (MI), a well-known concept in physics. The mathematical framework and the computer implementation for both raster and vector models are discussed in detail. In addition to computing compactness for a single shape, we propose a computational method that is capable of calculating the variations in compactness as a shape grows or shrinks, which is a typical application found in regionalization problems. We conducted a number of experiments that demonstrate the superiority of the MI over the popular isoperimetric quotient approach in terms of (1) computational efficiency; (2) tolerance of positional uncertainty and irregular boundaries; (3) ability to handle shapes with holes and multiple parts; and (4) applicability and efficacy in districting/zonation/regionalization problems.
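An MI-based compactness can be sketched for the raster model as follows: normalize the shape's area squared by 2π times its second moment of area about the centroid, so a perfect circle scores 1 and less compact shapes score lower. This is an illustrative reading of the approach, not the authors' exact implementation; the unit-cell treatment below is an assumption.

```python
import math

def mi_compactness(cells):
    """Moment-of-inertia compactness of a raster shape.

    cells: iterable of (row, col) centers of unit grid cells.
    Returns A**2 / (2*pi*I): 1 for a perfect circle, smaller for
    less compact shapes. Each unit cell contributes its own polar
    second moment (1/6) plus its squared distance to the shape
    centroid (parallel-axis theorem).
    """
    cells = list(cells)
    area = float(len(cells))          # unit cells: area = cell count
    cx = sum(c[0] for c in cells) / area
    cy = sum(c[1] for c in cells) / area
    inertia = sum(1.0 / 6.0 + (x - cx) ** 2 + (y - cy) ** 2
                  for x, y in cells)
    return area ** 2 / (2 * math.pi * inertia)
```

For a filled square this formula gives exactly 3/π ≈ 0.955 regardless of size, illustrating the measure's tolerance of resolution; an elongated strip scores much lower.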

Contributors: Li, Wenwen (Author) / Goodchild, Michael F. (Author) / Church, Richard (Author) / College of Liberal Arts and Sciences (Contributor)
Created: 2013-08-15
Description
The volume of available spatial data has increased tremendously. Such data includes but is not limited to: weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. These applications need a powerful data management platform to support scalable and interactive analytics on big spatial data. Even though existing single-node spatial database systems (DBMSs) provide support for spatial data, they suffer from performance issues when dealing with big spatial data. Challenges to building large-scale spatial data systems are as follows: (1) System scalability: The massive scale of available spatial data hinders making sense of it using traditional spatial database management systems. Moreover, large-scale spatial data, besides its tremendous storage footprint, may be extremely difficult to manage and maintain due to heterogeneous shapes, skewed data distributions, and complex spatial relationships. (2) Fast analytics: When the user runs spatial data analytics applications using graphical analytics tools, she does not tolerate delays introduced by the underlying spatial database system. Instead, the user needs to see useful information quickly.

In this dissertation, I focus on designing efficient data systems and data indexing mechanisms to bolster scalable and interactive analytics on large-scale geospatial data. I first propose a cluster computing system, GeoSpark, which extends the core engine of Apache Spark and Spark SQL to support spatial data types, indexes, and geometrical operations at scale. To reduce the indexing overhead, I propose Hippo, a fast yet scalable sparse database indexing approach. In contrast to existing tree index structures, Hippo stores disk page ranges (each serving as a pointer to one or more pages) instead of tuple pointers in the indexed table, reducing the storage space occupied by the index. Moreover, I present Tabula, a middleware framework that sits between a SQL data system and a spatial visualization dashboard to make the user experience with the dashboard more seamless and interactive. Tabula adopts a materialized sampling cube approach, which pre-materializes samples, not for the entire table as in the SampleFirst approach, but for the results of potentially unforeseen queries (represented by an OLAP cube cell).
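The page-range idea behind Hippo can be illustrated with a toy sketch: the index maps coarse value buckets to the candidate disk pages that may contain matching tuples, rather than keeping one pointer per tuple, trading a little extra page scanning at query time for a much smaller index. The bucket layout and page size below are invented for illustration; this is not the actual Hippo implementation.

```python
from collections import defaultdict

PAGE_SIZE = 4  # assumed tuples per disk page, for illustration

def build_sparse_index(table, buckets):
    """Map each value bucket to the set of page ids that may hold
    matching tuples (a Hippo-like sparse index sketch)."""
    index = defaultdict(set)
    for page_id in range(0, len(table), PAGE_SIZE):
        for value in table[page_id:page_id + PAGE_SIZE]:
            for b, (lo, hi) in enumerate(buckets):
                if lo <= value < hi:
                    index[b].add(page_id)
    return index

def lookup(table, index, buckets, value):
    """Scan only the candidate pages for `value`, not the whole table."""
    hits = []
    for b, (lo, hi) in enumerate(buckets):
        if lo <= value < hi:
            for page_id in sorted(index[b]):
                page = table[page_id:page_id + PAGE_SIZE]
                hits.extend(v for v in page if v == value)
    return hits
```

The index holds one entry per (bucket, page) pair, so its size grows with page count, not tuple count, which is the space saving the abstract describes.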
Contributors: Yu, Jia (Author) / Sarwat Abdelghany Aly Elsayed, Mohamed (Thesis advisor) / Candan, Kasim (Committee member) / Zhao, Ming (Committee member) / Li, Wenwen (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Geographically Weighted Regression (GWR) has been broadly used in various fields to model spatially non-stationary relationships. Classic GWR is considered a single-scale model based on one bandwidth parameter, which controls the amount of distance decay in weighting neighboring data around each location. The single bandwidth in GWR assumes that all processes (relationships between the response variable and the predictor variables) operate at the same scale. However, this poses a limitation in modeling potentially multi-scale processes, which are more often seen in the real world. For example, the measured ambient temperature of a location is affected by the built environment, regional weather, and global warming, all of which operate at different scales. A recent advancement to GWR, termed Multiscale GWR (MGWR), removes the single-bandwidth assumption and allows the bandwidth for each covariate to vary. As a result, each parameter surface is allowed to have a different degree of spatial variation, reflecting variation across covariate-specific processes. In this way, MGWR has the capability to differentiate local, regional, and global processes by using varying bandwidths for covariates. Additionally, bandwidths in MGWR become explicit indicators of the scales at which various processes operate. This dissertation covers three perspectives centering on MGWR: computation, inference, and application. The first component focuses on addressing computational issues in MGWR, allowing MGWR models to be calibrated more efficiently and applied to large datasets. The second component aims to statistically differentiate the spatial scales at which different processes operate by quantifying the uncertainty associated with each bandwidth obtained from MGWR. In the third component, an empirical study models the changing relationships between county-level socio-economic factors and voter preferences in the 2008-2016 United States presidential elections using MGWR.
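The single-bandwidth calibration that MGWR generalizes can be sketched as a locally weighted least-squares fit with a Gaussian distance-decay kernel (one predictor shown). This is an illustrative sketch, not the dissertation's calibration code; bandwidth choice and kernel form are assumptions.

```python
import math

def gwr_coefficients(coords, x, y, bandwidth):
    """Calibrate a one-predictor GWR model with a Gaussian kernel.

    At each location i, observations j are weighted by
    w_ij = exp(-0.5 * (d_ij / bandwidth)**2), then a weighted
    least-squares fit yields a local (intercept, slope). The single
    bandwidth shared by all covariates is exactly the restriction
    MGWR relaxes.
    """
    results = []
    for (xi, yi) in coords:
        w = [math.exp(-0.5 * ((xi - xj) ** 2 + (yi - yj) ** 2)
                      / bandwidth ** 2) for (xj, yj) in coords]
        sw = sum(w)
        # Weighted means, then weighted slope/intercept.
        xbar = sum(wi * xv for wi, xv in zip(w, x)) / sw
        ybar = sum(wi * yv for wi, yv in zip(w, y)) / sw
        sxx = sum(wi * (xv - xbar) ** 2 for wi, xv in zip(w, x))
        sxy = sum(wi * (xv - xbar) * (yv - ybar)
                  for wi, xv, yv in zip(w, x, y))
        slope = sxy / sxx
        results.append((ybar - slope * xbar, slope))
    return results
```

When the underlying process is actually stationary, every local fit recovers the same global coefficients, which is a useful sanity check on the weighting scheme.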
Contributors: Li, Ziqi (Author) / Fotheringham, A. Stewart (Thesis advisor) / Goodchild, Michael F. (Committee member) / Li, Wenwen (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
The role of movement data is essential to understanding how geographic context influences movement patterns in urban areas. Owing to the growth in ubiquitous data collection platforms like smartphones, fitness trackers, and health monitoring apps, researchers are now able to collect movement data at increasingly fine spatial and temporal resolution. Despite the surge in volumes of fine-grained movement data, there is a gap in the availability of quantitative and analytical tools to extract actionable insights from such big datasets and tease out the role of context in movement pattern analysis. As cities aim to be safer and healthier, policymakers require methods to generate efficient strategies for urban planning, utilizing high-frequency movement data to make targeted decisions for infrastructure investments without compromising the safety of residents. The objective of this Ph.D. dissertation is to develop quantitative methods that combine big spatial-temporal data from crowdsourced platforms with geographic context to analyze movement patterns over space and time. Knowledge about the role of context can help in assessing why changes in movement patterns occur and how those changes are affected by the immediate natural and built environment. In this dissertation, I contribute to the rapidly expanding body of quantitative movement pattern analysis research by 1) developing a bias-correction framework for improving the representativeness of crowdsourced movement data by modeling bias with training data and geographical variables, 2) understanding spatial-temporal changes in movement patterns at different periods and how context influences those changes by generating hourly and monthly change maps of bicycle ridership patterns, and 3) quantifying the variation in accuracy and generalizability of transportation mode detection models using GPS (Global Positioning System) data upon adding geographic context.
Using statistical models, supervised classification algorithms, and functional data analysis approaches, I develop modeling frameworks that address each of the research objectives. The results are presented as street-level maps and predictive models, which are reproducible. The methods developed in this dissertation can serve as analytical tools for policymakers to plan infrastructure changes and facilitate data collection efforts that represent movement patterns for all ages and abilities.
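The ingredients of GPS-based mode detection mentioned in objective 3) can be sketched as feature extraction from raw fixes followed by classification. The sketch below derives per-segment speed features and applies toy thresholds; the dissertation's actual models are supervised classifiers enriched with geographic context, and the thresholds here are purely illustrative assumptions.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between (lat, lon) points."""
    R = 6371000.0
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = math.radians(q[0] - p[0])
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def speed_features(track):
    """Per-segment speeds (m/s) from (lat, lon, t_seconds) fixes,
    summarized as mean and max -- typical classifier inputs."""
    speeds = [haversine_m(a[:2], b[:2]) / (b[2] - a[2])
              for a, b in zip(track, track[1:])]
    return {"mean": sum(speeds) / len(speeds), "max": max(speeds)}

def rule_based_mode(features):
    """Toy threshold classifier standing in for the supervised
    models; the cutoffs are illustrative, not calibrated."""
    if features["max"] < 3:      # roughly walking pace
        return "walk"
    if features["max"] < 8:      # roughly cycling pace
        return "bike"
    return "motorized"
```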
Contributors: Roy, Avipsa (Author) / Nelson, Trisalyn A. (Thesis advisor) / Kedron, Peter J. (Committee member) / Li, Wenwen (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Integrated water resources management for flood control, water distribution, conservation, and food security requires understanding hydrological spatial and temporal trends. The proliferation of monitoring and sensor data has boosted data-driven simulation and evaluation. Developing data-driven models for such physical-process-related phenomena, with meaningful interpretability, necessitates an inventive methodology. In this dissertation, I developed time series and deep learning models that connect rainfall, runoff, and fish species abundances. I also investigated the underlying explainability of hydrological processes and their impacts on fish species. First, I created a streamflow simulation model using computer vision and natural language processing techniques as an alternative to physics-based routing. I tested it on seven US river network sections and showed it outperformed time series models, deep learning baselines, and novel variants. In addition, my model explained flow routing without physical parameter input or time-consuming calibration. Building on this model, I expanded it from accepting dispersed spatial inputs to adopting comprehensive 2D grid data, constructing a spatial-temporal deep learning model for rainfall-runoff simulation. I tested it against a semi-distributed hydrological model and found superior results. Furthermore, I investigated the potential interpretability of the rainfall-runoff process in both space and time. To understand the impacts of flow variation on fish species, I applied a frequency-based model framework for long-term time series simulation. First, I discovered that the timing of hydrological anomalies was as crucial as their size: flooding and drought, when properly timed, were both linked with excellent fish productivity. To identify the responses of various fish trait groups, I used this model to assess mitigated hydrological variation across fish attributes.
Longitudinal migratory fish species were more impacted by flow variance, whereas other migratory-strategy species reacted in the same direction but to varying degrees. Finally, I investigated future fish population changes under alternative design flow scenarios and showed that a protracted low flow with a powerful, well-timed flood pulse would benefit fish. In this dissertation, I constructed three data-driven models that link the hydrological cycle to the stream environment and give insight into the underlying physical processes, which is vital for quantitative, efficient, and integrated water resource management.
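For intuition about the rainfall-runoff dynamics these data-driven models emulate, here is a minimal linear-reservoir simulation, a classic conceptual hydrology model, not the dissertation's method; the recession coefficient and initial storage are illustrative assumptions.

```python
def linear_reservoir(rainfall, k=0.2, s0=0.0):
    """Single linear reservoir: at each step, add rainfall P_t to
    storage, release runoff Q_t = k * S_t, and carry the rest over.

    Returns (runoff series, final storage). k and s0 are assumed
    illustrative parameters, not calibrated values.
    """
    storage, runoff = s0, []
    for p in rainfall:
        storage += p          # rainfall enters storage
        q = k * storage       # release proportional to storage
        storage -= q
        runoff.append(q)
    return runoff, storage
```

A single rainfall pulse produces the characteristic exponentially recessing hydrograph, and mass is conserved: total runoff plus remaining storage equals total rainfall plus initial storage.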
Contributors: Deng, Qi (Author) / Sabo, John (Thesis advisor) / Grimm, Nancy (Thesis advisor) / Ganguly, Auroop (Committee member) / Li, Wenwen (Committee member) / Mascaro, Giuseppe (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
In the field of Geographic Information Science (GIScience), we have witnessed an unprecedented data deluge brought about by the rapid advancement of high-resolution data observing technologies. For example, with the advancement of Earth Observation (EO) technologies, a massive amount of EO data, including remote sensing data and other sensor observation data about earthquakes, climate, oceans, hydrology, volcanoes, glaciers, etc., is being collected on a daily basis by a wide range of organizations. In addition to the observation data, human-generated data including microblogs, photos, consumption records, evaluations, unstructured webpages, and other Volunteered Geographic Information (VGI) are incessantly generated and shared on the Internet.

Meanwhile, the emerging cyberinfrastructure rapidly increases our capacity for handling such massive data with regard to data collection and management, data integration and interoperability, data transmission and visualization, high-performance computing, etc. Cyberinfrastructure (CI) consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs that are not otherwise possible.

The Geospatial CI (GCI, or CyberGIS), as the synthesis of CI and GIScience, has inherent advantages in enabling computationally intensive spatial analysis and modeling (SAM) and collaborative geospatial problem solving and decision making.

This dissertation is dedicated to addressing several critical issues and improving the performance of existing methodologies and systems in the field of CyberGIS. It consists of three parts. The first part focuses on developing methodologies to help public researchers efficiently and effectively find appropriate open geospatial datasets from millions of records provided by thousands of organizations scattered around the world; machine learning and semantic search methods are utilized in this research. The second part develops an interoperable and replicable geoprocessing service by synthesizing the high-performance computing (HPC) environment, the core spatial statistics/analysis algorithms from the widely adopted open-source Python package, the Python Spatial Analysis Library (PySAL), and rich datasets acquired from the first part. The third part is dedicated to studying optimization strategies for feature data transmission and visualization, intended to solve the performance issues in transmitting large feature data through the Internet and visualizing them on the client (browser) side.
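The dataset-discovery problem in the first part can be illustrated with a TF-IDF baseline that ranks metadata records against a free-text query. This keyword-matching sketch merely stands in for, and is far simpler than, the machine-learning and semantic search methods the dissertation develops; the catalog entries below are invented.

```python
import math
from collections import Counter

def tfidf_rank(query, catalog):
    """Rank metadata records against a query by TF-IDF cosine
    similarity; returns record indices, best match first."""
    docs = [rec.lower().split() for rec in catalog]
    n = len(docs)
    # Document frequency of each term across the catalog.
    df = Counter(w for d in docs for w in set(d))

    def vec(tokens):
        tf = Counter(tokens)
        return {w: tf[w] * math.log((1 + n) / (1 + df[w])) for w in tf}

    def cos(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query.lower().split())
    scores = [(cos(q, vec(d)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scores, reverse=True)]
```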

Taken together, the three parts constitute an endeavor towards the methodological improvement and implementation practice of the data-driven, high-performance and intelligent CI to advance spatial sciences.
Contributors: Shao, Hu (Author) / Li, Wenwen (Thesis advisor) / Rey, Sergio (Thesis advisor) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2018