Matching Items (10)
Description
Researchers across a variety of fields are often interested in determining if data are of a random nature or if they exhibit patterning which may be the result of some alternative and potentially more interesting process. This dissertation explores a family of statistical methods, i.e. space-time interaction tests, designed to detect structure within three-dimensional event data. These tests, widely employed in the fields of spatial epidemiology, criminology, ecology and beyond, are used to identify synergistic interaction across the spatial and temporal dimensions of a series of events. Exploration is needed to better understand these methods and determine how their results may be affected by data quality problems commonly encountered in their implementation; specifically, how inaccuracy and/or uncertainty in the input data analyzed by the methods may impact subsequent results. Additionally, known shortcomings of the methods must be ameliorated. The contributions of this dissertation are twofold: it develops a more complete understanding of how input data quality problems impact the results of a number of global and local tests of space-time interaction and it formulates an improved version of one global test which accounts for the previously identified problem of population shift bias. A series of simulation experiments reveal the global tests of space-time interaction explored here to be dramatically affected by the aforementioned deficiencies in the quality of the input data. It is shown that in some cases, a conservative degree of these common data problems can completely obscure evidence of space-time interaction and in others create it where it does not exist. Conversely, a local metric of space-time interaction examined here demonstrates a surprising robustness in the face of these same deficiencies. This local metric is revealed to be only minimally affected by the inaccuracies and incompleteness introduced in these experiments. 
Finally, enhancements to one of the global tests are presented which solve the problem of population shift bias associated with the test and better contextualize and visualize its results, thereby enhancing its utility for practitioners.
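
As a rough illustration of what a global space-time interaction test computes, the sketch below implements a Knox-style statistic with a permutation p-value. The abstract does not say which tests the dissertation studies, so the choice of the Knox statistic, the function names, and the thresholds `delta` (space) and `tau` (time) are assumptions made for illustration only.

```python
import math
import random
from itertools import combinations


def knox_statistic(coords, times, delta, tau):
    """Count event pairs that are close in both space and time."""
    count = 0
    for i, j in combinations(range(len(times)), 2):
        close_in_space = math.dist(coords[i], coords[j]) <= delta
        close_in_time = abs(times[i] - times[j]) <= tau
        if close_in_space and close_in_time:
            count += 1
    return count


def knox_test(coords, times, delta, tau, permutations=999, seed=0):
    """Return the observed statistic and a one-sided pseudo p-value.

    The null distribution is built by shuffling the time stamps over the
    fixed locations, which breaks any space-time interaction.
    """
    rng = random.Random(seed)
    observed = knox_statistic(coords, times, delta, tau)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if knox_statistic(coords, shuffled, delta, tau) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value


# Toy data (hypothetical): two small clusters in both space and time.
coords = [(0, 0), (0, 1), (10, 10), (10, 11)]
times = [0, 1, 10, 11]
observed, p_value = knox_test(coords, times, delta=1.5, tau=1.5)
```

Because the statistic depends directly on recorded locations and time stamps, perturbing either input changes which pairs fall under the thresholds. That is one way to see why input data quality could matter for tests of this kind.
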
Contributors: Malizia, Nicholas (Author) / Anselin, Luc (Thesis advisor) / Murray, Alan (Committee member) / Rey, Sergio (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
There exist many facets of error and uncertainty in digital spatial information. As error or uncertainty will not likely ever be completely eliminated, a better understanding of its impacts is necessary. Spatial analytical approaches, in particular, must somehow address data quality issues. This can range from evaluating impacts of potential data uncertainty in planning processes that make use of methods to devising methods that explicitly account for error/uncertainty. To date, little has been done to structure methods accounting for error. This research focuses on developing methods to address geographic data uncertainty in spatial optimization. An integrated approach that characterizes uncertainty impacts by constructing and solving a new multi-objective model that explicitly incorporates facets of data uncertainty is developed. Empirical findings illustrate that the proposed approaches can be applied to evaluate the impacts of data uncertainty with statistical confidence, which moves beyond popular practices of simulating errors in data. Spatial uncertainty impacts are evaluated in two contexts: harvest scheduling and sex offender residency. Owing to the integration of spatial uncertainty, the detailed multi-objective models are more complex and computationally challenging to solve. As a result, a new multi-objective evolutionary algorithm is developed to address the computational challenges posed. The proposed algorithm incorporates problem-specific spatial knowledge to significantly enhance the capability of the evolutionary algorithm for solving the model.  
Contributors: Wei, Ran (Author) / Murray, Alan T. (Thesis advisor) / Anselin, Luc (Committee member) / Rey, Sergio J (Committee member) / Mack, Elizabeth A. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Choropleth maps are a common form of online cartographic visualization. They reveal patterns in spatial distributions of a variable by associating colors with data values measured at areal units. Although this capability of pattern revelation has popularized the use of choropleth maps, existing methods for their online delivery are limited in supporting dynamic map generation from large areal data. This limitation has become increasingly problematic in online choropleth mapping as access to small area statistics, such as high-resolution census data and real-time aggregates of geospatial data streams, has never been easier due to advances in geospatial web technologies. The current literature shows that the challenge of large areal data can be mitigated through tiled maps where pre-processed map data are hierarchically partitioned into tiny rectangular images or map chunks for efficient data transmission. Various approaches have emerged lately to enable this tile-based choropleth mapping, yet little empirical evidence exists on their ability to handle spatial data with large numbers of areal units, thus complicating technical decision making in the development of online choropleth mapping applications. To fill this knowledge gap, this dissertation study conducts a scalability evaluation of three tile-based methods discussed in the literature: raster, scalable vector graphics (SVG), and HTML5 Canvas. For the evaluation, the study develops two test applications, generates map tiles from five different boundaries of the United States, and measures the response times of the applications under multiple test operations. While specific to the experimental setups of the study, the evaluation results show that the raster method scales better across various types of user interaction than the other methods. Empirical evidence also points to the superior scalability of Canvas to SVG in dynamic rendering of vector tiles, but not necessarily for partial updates of the tiles. 
These findings indicate that the raster method is better suited for dynamic choropleth rendering from large areal data, while Canvas would be more suitable than SVG when such rendering frequently involves complete updates of vector shapes.

Contributors: Hwang, Myunghwa (Author) / Anselin, Luc (Thesis advisor) / Rey, Sergio J. (Committee member) / Wentz, Elizabeth (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Regional differences in inventive activity and economic growth are important in economic geography. These differences are generally explained by the theory of localized knowledge spillovers, which argues that geographical proximity among economic actors fosters invention and innovation. However, knowledge production involves an increasing number of actors connecting to non-local partners. The space of knowledge flows is not tightly bounded in a given territory, but functions as a network-based system where knowledge flows circulate around alignments of actors in different and distant places. The purpose of this dissertation is to understand the dynamics of network aspects of knowledge flows in American biotechnology. The first research task assesses both spatial and network-based dependencies of biotechnology co-invention across 150 large U.S. metropolitan areas over four decades (1979, 1989, 1999, and 2009). Spatial and social network analyses are explicitly applied and compared within an integrated methodology. Results show that network-based proximity better defines the U.S. biotechnology co-invention urban system in recent years. Co-patenting relationships of major biotechnology centers have demonstrated national and regional association since the 1990s. Associations retain features of spatial proximity, especially in some Midwestern and Northeastern cities, but these are no longer the strongest features affecting co-inventive links. The second research task examines how biotechnology knowledge flows circulate over space by focusing on the structural properties of intermetropolitan co-invention networks. All analyses in this task are conducted using social network analysis. Evidence shows that the architecture of the U.S. co-invention networks reveals a trend toward more organized structures and less fragmentation over the four years of analysis. Metropolitan areas are increasingly interconnected in a large networked web.
Knowledge flows are less likely to be controlled by a small number of intermediaries. San Francisco, New York, Boston, and San Diego monopolize the central positions of the intermetropolitan co-invention network as major American biotechnology concentrations. The overall network-based system comes close to a relational core/periphery structure where core metropolitan areas are strongly connected to one another and to some peripheral areas. Peripheral metropolitan areas are loosely connected to, or even disconnected from, each other. This dissertation provides empirical evidence to support the argument that technological collaboration reveals a network-based system associated with different or even distant geographical places, which is somewhat different from the conventional theory of localized knowledge spillovers that once dominated understanding of the role of geography in technological advance.
Contributors: Lee, Der-Shiuan (Author) / Ó Huallacháin, Breandán (Thesis advisor) / Anselin, Luc (Committee member) / Kuby, Michael (Committee member) / Lobo, Jose (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Decades ago in the U.S., clear lines delineated which neighborhoods were acceptable for certain people and which were not. Techniques such as steering and biased mortgage practices continue to perpetuate a segregated outcome for many residents. In contrast, ethnic enclaves and age-restricted communities are viewed as voluntary segregation based on cultural and social amenities. This diversity in the causes of segregation is not just a region-wide characteristic, but can vary within a region. Local segregation analysis aims to uncover this local variation, and hence open the door to policy solutions not visible at the global scale. The centralization index, originally introduced as a global measure of segregation focused on spatial concentration of two population groups relative to a region's urban center, has lost relevancy in recent decades as regions have become polycentric, and the index's magnitude is sensitive to the particular point chosen as the center. These attributes, which make it a poor global measure, are leveraged here to repurpose the index as a local measure. The index's ability to differentiate minority from majority segregation, and its focus on a particular location within a region, make it an ideal local segregation index. Based on the local centralization index for two groups, a local multigroup variation is defined, and a local space-time redistribution index is presented capturing change in concentration of a single population group over two time periods. Permutation-based inference approaches are used to test the statistical significance of measured index values. Applications to the Phoenix, Arizona metropolitan area show persistent cores of black and white segregation over the years 1990, 2000 and 2010, and a trend of white segregated neighborhoods increasing at a faster rate than black.
An analysis of the Phoenix area's recently opened light rail system shows that its 28 stations are located in areas of significant white, black and Hispanic segregation, and there is a clear concentration of renters over owners around most stations. There is little indication of statistically significant change in segregation or population concentration around the stations, indicating a lack of near term impact of light rail on the region's overall demographics.
Contributors: Folch, David C. (Author) / Rey, Sergio J (Thesis advisor) / Anselin, Luc (Committee member) / Murray, Alan T. (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Gerrymandering is a central problem for many representative democracies. Formally, gerrymandering is the manipulation of spatial boundaries to provide political advantage to a particular group (Warf, 2006). The term often refers to political district design, where the boundaries of political districts are “unnaturally” manipulated by redistricting officials to generate durable advantages for one group or party. Since free and fair elections are possibly the critical part of representative democracy, it is important for the cresting tide of reform to have scientifically validated tools. This dissertation supports a current wave of reform by developing a general inferential technique to “localize” inferential bias measures, generating a new type of district-level score. The new method relies on the statistical intuition behind jackknife methods to construct relative local indicators. I find that existing statewide indicators of partisan bias can be localized using this technique, providing an estimate of how strongly a district impacts statewide partisan bias over an entire decade. When compared to measures of shape compactness (a common gerrymandering detection statistic), I find that weirdly-shaped districts have no consistent relationship with impact in many states during the 2000 and 2010 redistricting plans. To ensure that this work is valid, I examine existing seats-votes modeling strategies and develop a novel method for constructing seats-votes curves. I find that, while the empirical structure of electoral swing shows significant spatial dependence (even in the face of spatial heterogeneity), existing seats-votes specifications are more robust than anticipated to spatial dependence. Centrally, this dissertation contributes to the much larger social aim to resist electoral manipulation: that individuals and organizations suffer no undue burden on political access from partisan gerrymandering.
Contributors: Wolf, Levi (Author) / Rey, Sergio J (Thesis advisor) / Anselin, Luc (Committee member) / Fotheringham, A. Stewart (Committee member) / Tam Cho, Wendy K (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
There is a serious need for early childhood intervention practices for children who are living at or below the poverty line. Since 1965 Head Start has provided a federally funded, free preschool program for children in this population. The City of Phoenix Head Start program consists of nine delegate agencies, seven of which reside in school districts. These agencies are currently not conducting local longitudinal evaluations of their preschool graduates. The purpose of this study was to recommend initial steps the City of Phoenix grantee and the delegate agencies can take to begin a longitudinal evaluation process of their Head Start programs. Seven City of Phoenix Head Start agency directors were interviewed. These interviews provided information about the attitudes of the directors when considering longitudinal evaluations and how Head Start already evaluates their programs through internal assessments. The researcher also took notes on the Third Grade Follow-Up to the Head Start Executive Summary in order to make recommendations to the City of Phoenix Head Start programs about the best practices for longitudinal student evaluations.
Created: 2014-05
Description
Science instructors need questions for use in exams, homework assignments, class discussions, reviews, and other instructional activities. Textbooks never have enough questions, so instructors must find them from other sources or generate their own questions. In order to supply instructors with biology questions, a semantic network approach was developed for generating open response biology questions. The generated questions were compared to professionally authored questions.

To boost students’ learning experience, adaptive selection was built on the generated questions. Bayesian Knowledge Tracing was used as an embedded assessment of the student’s current competence so that a suitable question could be selected based on the student’s previous performance. A between-subjects experiment with 42 participants was performed, where half of the participants studied with adaptively selected questions and the rest studied with a mal-adaptive ordering of questions. Both groups significantly improved their test scores, and the participants in the adaptive group registered larger learning gains than participants in the control group.
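
For readers unfamiliar with Bayesian Knowledge Tracing, the following minimal sketch shows the standard update of a mastery estimate after one observed answer. The parameter values (`p_slip`, `p_guess`, `p_learn`), the skill names, and the `pick_question` selection rule are hypothetical and are not taken from the dissertation, which does not describe its selection policy in this abstract.

```python
def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """Return P(skill known) after observing one answer (standard BKT)."""
    if correct:
        evidence_known = p_know * (1.0 - p_slip)
        evidence_unknown = (1.0 - p_know) * p_guess
    else:
        evidence_known = p_know * p_slip
        evidence_unknown = (1.0 - p_know) * (1.0 - p_guess)
    # Bayes' rule: condition the mastery estimate on the observed answer.
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Learning transition: an unmastered skill may become mastered.
    return posterior + (1.0 - posterior) * p_learn


def predict_correct(p_know, p_slip=0.1, p_guess=0.2):
    """Probability that the next answer on this skill is correct."""
    return p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess


def pick_question(mastery_by_skill, target=0.7):
    """Hypothetical rule: choose the skill whose predicted success
    probability is closest to a target difficulty level."""
    return min(
        mastery_by_skill,
        key=lambda skill: abs(predict_correct(mastery_by_skill[skill]) - target),
    )


# Example with made-up skills and mastery estimates.
mastery = {"osmosis": 0.9, "mitosis": 0.6, "enzymes": 0.2}
next_skill = pick_question(mastery)
mastery[next_skill] = bkt_update(mastery[next_skill], correct=True)
```

A correct answer raises the mastery estimate and an incorrect one lowers it, so the next selection reflects the student's history. Any real system would need to fit the slip, guess, and learn parameters to data.
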

To explore the possibility of generating rich instructional feedback for machine-generated questions, a question-paragraph mapping task was identified. Given a set of questions and a list of paragraphs for a textbook, the goal of the task was to map the related paragraphs to each question. An algorithm was developed whose performance was comparable to human annotators.

A multiple-choice question with high-quality distractors (incorrect answers) can be pedagogically valuable as well as being much easier to grade than open-response questions. Thus, an algorithm was developed to generate good distractors for multiple-choice questions. The machine-generated multiple-choice questions were compared to human-generated questions in terms of three measures: question difficulty, question discrimination and distractor usefulness. In a study with 200 participants recruited from Amazon Mechanical Turk, the two types of questions performed very similarly on all three measures.
Contributors: Zhang, Lishang (Author) / VanLehn, Kurt (Thesis advisor) / Baral, Chitta (Committee member) / Hsiao, Ihan (Committee member) / Wright, Christian (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
A major challenge in health-related policy and program evaluation research is attributing underlying causal relationships where complicated processes may exist in natural or quasi-experimental settings. Spatial interaction and heterogeneity between units at individual or group levels can violate both components of the Stable-Unit-Treatment-Value-Assumption (SUTVA) that are core to the counterfactual framework, making treatment effects difficult to assess. New approaches are needed in health studies to develop spatially dynamic causal modeling methods to both derive insights from data that are sensitive to spatial differences and dependencies, and also be able to rely on a more robust, dynamic technical infrastructure needed for decision-making. To address this gap with a focus on causal applications theoretically, methodologically and technologically, I (1) develop a theoretical spatial framework (within single-level panel econometric methodology) that extends existing theories and methods of causal inference, which tend to ignore spatial dynamics; (2) demonstrate how this spatial framework can be applied in empirical research; and (3) implement a new spatial infrastructure framework that integrates and manages the required data for health systems evaluation.

The new spatially explicit counterfactual framework considers how spatial effects impact treatment choice, treatment variation, and treatment effects. To illustrate this new methodological framework, I first replicate a classic quasi-experimental study that evaluates the effect of drinking age policy on mortality in the United States from 1970 to 1984, and further extend it with a spatial perspective. In another example, I evaluate food access dynamics in Chicago from 2007 to 2014 by implementing advanced spatial analytics that better account for the complex patterns of food access, and quasi-experimental research design to distill the impact of the Great Recession on the foodscape. Inference interpretation is sensitive to both research design framing and underlying processes that drive geographically distributed relationships. Finally, I advance a new Spatial Data Science Infrastructure to integrate and manage data in dynamic, open environments for public health systems research and decision-making. I demonstrate an infrastructure prototype in a final case study, developed in collaboration with health department officials and community organizations.
Contributors: Kolak, Marynia Aniela (Author) / Anselin, Luc (Thesis advisor) / Rey, Sergio (Committee member) / Koschinsky, Julia (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
In natural language processing, language models have achieved remarkable success over the last few years. Transformers are at the core of most of these models. Their success can be mainly attributed to the enormous amount of curated data they are trained on. Even though such language models are trained on massive curated data, they often need specific extracted knowledge to understand and reason better. This is because relevant knowledge is often implicit or missing, which hampers machine reasoning. Apart from that, manual knowledge curation is time-consuming and error-prone. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models. This leads to finding ideal ways to utilize such knowledge by incorporating it into language models. Successful knowledge extraction and integration lead to an important question of knowledge evaluation of such models by developing tools or introducing challenging test suites to learn about their limitations and improve them further. So to improve transformer-based models, understanding the role of knowledge becomes important. In the pursuit of improving language models with knowledge, in this dissertation I study three broad research directions spanning the natural language, biomedical and cybersecurity domains: (1) Knowledge Extraction (KX) - How can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI) - How can such specific knowledge be used to improve such models? (3) Knowledge Evaluation (KE) - How can language models be evaluated for specific skills, and how can their limitations be understood? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models.
I develop ways to improve the language model’s multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets which assess their numerical reasoning skills in both in-domain and out-of-domain settings.
Contributors: Pal, Kuntal Kumar (Author) / Baral, Chitta (Thesis advisor) / Wang, Ruoyu (Committee member) / Blanco, Eduardo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2023