Description
This dissertation addresses the research challenge of developing efficient new methods for discovering useful patterns and knowledge in large volumes of electronically collected spatiotemporal activity data. I propose to analyze three types of such data in a methodological framework that integrates spatial analysis, data mining, machine learning, and geovisualization techniques. The three types of spatiotemporal activity data were collected through different approaches: (1) crowdsourced geo-tagged digital photos, representing people's travel activity, were retrieved from the website Panoramio.com through information retrieval techniques; (2) the same techniques were used to crawl crowdsourced GPS trajectory data, and related metadata on contributors' daily activities, from the website OpenStreetMap.org; and (3) preschool children's daily activities and interactions, tagged with time and geographic location, were collected with a novel TabletPC-based behavioral coding system. The proposed methodology is applied to these data to (1) automatically recommend optimal multi-day and multi-stay travel itineraries based on attractions discovered from geo-tagged photos, (2) automatically detect the movement types of unknown moving objects from GPS trajectories, and (3) explore dynamic social and socio-spatial patterns of preschool children's behavior from both geographic and social perspectives.
Contributors: Li, Xun (Author) / Anselin, Luc (Thesis advisor) / Koschinsky, Julia (Committee member) / Maciejewski, Ross (Committee member) / Rey, Sergio (Committee member) / Griffin, William (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
There exist many facets of error and uncertainty in digital spatial information. Because error and uncertainty are unlikely ever to be completely eliminated, a better understanding of their impacts is necessary. Spatial analytical approaches, in particular, must somehow address data quality issues. This can range from evaluating the impacts of potential data uncertainty on planning processes that rely on analytical methods to devising methods that explicitly account for error and uncertainty. To date, little has been done to structure methods that account for error. This research focuses on developing methods to address geographic data uncertainty in spatial optimization. It develops an integrated approach that characterizes uncertainty impacts by constructing and solving a new multi-objective model that explicitly incorporates facets of data uncertainty. Empirical findings illustrate that the proposed approaches can be applied to evaluate the impacts of data uncertainty with statistical confidence, moving beyond the popular practice of simulating errors in data. Spatial uncertainty impacts are evaluated in two contexts: harvest scheduling and sex offender residency. Because they integrate spatial uncertainty, the detailed multi-objective models are more complex and computationally challenging to solve. A new multi-objective evolutionary algorithm is therefore developed to address these computational challenges. The proposed algorithm incorporates problem-specific spatial knowledge, which significantly enhances the evolutionary algorithm's capability to solve the model.
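Multi-objective evolutionary algorithms generally compare candidate solutions by Pareto dominance, keeping the nondominated set as the current trade-off front. The sketch below shows only that generic building block, assuming every objective is minimized; the objective names and scores are hypothetical, and this is not the problem-specific algorithm developed in the dissertation.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b`: no worse on every objective and
    strictly better on at least one. All objectives are assumed minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (the nondominated set).
    A point never dominates itself, so comparing against the full list is safe."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (cost, uncertainty exposure) scores for five candidate plans.
    plans = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
    print(pareto_front(plans))  # [(1, 5), (2, 3), (4, 1)]
```

In an evolutionary loop, a front like this would typically guide which candidates survive to the next generation; how spatial knowledge shapes the operators is left to the original work.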
Contributors: Wei, Ran (Author) / Murray, Alan T. (Thesis advisor) / Anselin, Luc (Committee member) / Rey, Sergio J. (Committee member) / Mack, Elizabeth A. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Choropleth maps are a common form of online cartographic visualization. They reveal patterns in spatial distributions of a variable by associating colors with data values measured at areal units. Although this capability of pattern revelation has popularized the use of choropleth maps, existing methods for their online delivery are limited in supporting dynamic map generation from large areal data. This limitation has become increasingly problematic in online choropleth mapping as access to small area statistics, such as high-resolution census data and real-time aggregates of geospatial data streams, has never been easier due to advances in geospatial web technologies. The current literature shows that the challenge of large areal data can be mitigated through tiled maps, where pre-processed map data are hierarchically partitioned into tiny rectangular images or map chunks for efficient data transmission. Various approaches have recently emerged to enable this tile-based choropleth mapping, yet little empirical evidence exists on their ability to handle spatial data with large numbers of areal units, which complicates technical decision making in the development of online choropleth mapping applications. To fill this knowledge gap, this dissertation conducts a scalability evaluation of three tile-based methods discussed in the literature: raster, Scalable Vector Graphics (SVG), and HTML5 Canvas. For the evaluation, the study develops two test applications, generates map tiles from five different boundaries of the United States, and measures the response times of the applications under multiple test operations. While specific to the experimental setups of the study, the evaluation results show that the raster method scales better across various types of user interaction than the other methods. Empirical evidence also points to the superior scalability of Canvas over SVG in dynamic rendering of vector tiles, but not necessarily for partial updates of the tiles.
These findings indicate that the raster method is better suited for dynamic choropleth rendering from large areal data, while Canvas would be more suitable than SVG when such rendering frequently involves complete updates of vector shapes.
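Tiled maps depend on a hierarchical grid in which every tile splits into four at the next zoom level. As a hedged illustration, the sketch below uses the widely used XYZ ("slippy map") indexing on Web Mercator; that scheme is an assumption here, since the abstract does not say which tiling scheme the test applications used.

```python
import math
from typing import List, Tuple


def lon_lat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Map a WGS84 point to (x, y) tile indices in the XYZ / Web Mercator scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon == 180, latitudes beyond the Mercator limit) into range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def children(x: int, y: int, zoom: int) -> List[Tuple[int, int, int]]:
    """The four tiles at zoom + 1 that cover tile (x, y, zoom)."""
    return [(2 * x + dx, 2 * y + dy, zoom + 1) for dy in (0, 1) for dx in (0, 1)]


def parent(x: int, y: int, zoom: int) -> Tuple[int, int, int]:
    """The single tile at zoom - 1 that contains tile (x, y, zoom)."""
    return x // 2, y // 2, zoom - 1


if __name__ == "__main__":
    # Approximate location of Phoenix, AZ, used only as sample input.
    for z in range(4):
        print(z, lon_lat_to_tile(-112.07, 33.45, z))
```

Because a client only requests the tiles intersecting its viewport at the current zoom, the amount of data transferred per view stays bounded even when the underlying boundary file has very many areal units.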

Contributors: Hwang, Myunghwa (Author) / Anselin, Luc (Thesis advisor) / Rey, Sergio J. (Committee member) / Wentz, Elizabeth (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Nearly 25 years ago, parallel computing techniques were first applied to vector spatial analysis methods. This initial research was driven by the desire to reduce computing times in order to support scaling to larger problem sets. Since then, rapid technological advancement has increased the availability of High Performance Computing (HPC) resources in the form of multi-core desktop computers, distributed geographic information processing systems (e.g., computational grids), and single-site HPC clusters. In step with these increases in computational resources, significant advances have been made in the capability to capture and store large quantities of spatially enabled data. Scalable algorithms, a key component for utilizing vast data quantities in HPC environments, have failed to keep pace. The National Science Foundation has identified scalable algorithms in codified frameworks, which are currently lacking, as an essential research product. Fulfilling this goal is challenging given the lack of a codified theoretical framework mapping atomic numeric operations from the spatial analysis stack to parallel programming paradigms, the diversity in vernacular used by research groups, the propensity for implementations to couple tightly to underlying hardware, and the general difficulty of realizing scalable parallel algorithms. This dissertation develops a taxonomy of parallel vector spatial analysis algorithms in which classification is defined by root mathematical operation and communication pattern, a computational dwarf. Six computational dwarfs are identified: three drawn directly from an existing parallel computing taxonomy and three created to capture characteristics unique to spatial analysis algorithms. The taxonomy provides a high-level classification decoupled from low-level implementation details such as hardware, communication protocols, implementation language, decomposition method, or file input and output.
Because the approach is high-level, implementation specifics are proposed only broadly, breadth of coverage is achieved, and extensibility is ensured. The taxonomy both informs and is informed by five case studies implemented across multiple, divergent hardware environments. A major contribution of this dissertation is a theoretical framework to support the future development of concrete parallel vector spatial analysis frameworks through the identification of computational dwarfs and, by extension, successful implementation strategies.
Contributors: Laura, Jason (Author) / Rey, Sergio J. (Thesis advisor) / Anselin, Luc (Committee member) / Wang, Shaowen (Committee member) / Li, Wenwen (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
There is a serious need for early childhood intervention practices for children who are living at or below the poverty line. Since 1965, Head Start has provided a federally funded, free preschool program for children in this population. The City of Phoenix Head Start program consists of nine delegate agencies, seven of which reside in school districts. These agencies do not currently conduct local longitudinal evaluations of their preschool graduates. The purpose of this study was to recommend initial steps that the City of Phoenix grantee and the delegate agencies can take to begin a longitudinal evaluation process for their Head Start programs. Seven City of Phoenix Head Start agency directors were interviewed. These interviews provided information about the directors' attitudes toward longitudinal evaluations and about how Head Start already evaluates its programs through internal assessments. The researcher also took notes on the Third Grade Follow-Up to the Head Start Executive Summary in order to make recommendations to the City of Phoenix Head Start programs about best practices for longitudinal student evaluations.
Created: 2014-05
Description
Science instructors need questions for use in exams, homework assignments, class discussions, reviews, and other instructional activities. Textbooks never have enough questions, so instructors must find them from other sources or generate their own. To supply instructors with biology questions, a semantic network approach was developed for generating open-response biology questions. The generated questions were compared to professionally authored questions.

To boost students’ learning experience, adaptive question selection was built on top of the generated questions. Bayesian Knowledge Tracing was used as an embedded assessment of the student’s current competence so that a suitable question could be selected based on the student’s previous performance. A between-subjects experiment with 42 participants was performed, in which half of the participants studied with adaptively selected questions and the rest studied with a maladaptive ordering of questions. Both groups significantly improved their test scores, and participants in the adaptive group registered larger learning gains than participants in the control group.
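Bayesian Knowledge Tracing keeps, for each skill, a probability that the student has mastered it and revises that probability after every answer. The sketch below implements the standard four-parameter BKT update; the parameter values, the skill names, and the "lowest estimated mastery first" selection rule are hypothetical placeholders, not the settings or selection policy used in the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BKTParams:
    """Standard BKT parameters (the default values are hypothetical placeholders)."""
    p_init: float = 0.2     # P(L0): prior probability the skill is already known
    p_transit: float = 0.1  # P(T): probability of learning the skill at a practice step
    p_slip: float = 0.1     # P(S): probability of a wrong answer despite knowing the skill
    p_guess: float = 0.2    # P(G): probability of a right answer without knowing the skill


def bkt_update(p_known: float, correct: bool, prm: BKTParams) -> float:
    """Return P(known) after observing one answer, including the learning transition."""
    if correct:
        evidence_known = p_known * (1 - prm.p_slip)
        evidence_unknown = (1 - p_known) * prm.p_guess
    else:
        evidence_known = p_known * prm.p_slip
        evidence_unknown = (1 - p_known) * (1 - prm.p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)  # Bayes' rule
    return posterior + (1 - posterior) * prm.p_transit  # chance of learning on this step


def select_question(mastery: Dict[str, float], question_skill: Dict[str, str]) -> str:
    """Hypothetical policy: pick the question whose skill has the lowest mastery estimate."""
    return min(question_skill, key=lambda q: mastery[question_skill[q]])


if __name__ == "__main__":
    prm = BKTParams()
    mastery = {"osmosis": prm.p_init, "mitosis": prm.p_init}  # hypothetical skills
    mastery["mitosis"] = bkt_update(mastery["mitosis"], correct=True, prm=prm)
    print(round(mastery["mitosis"], 4))  # 0.5765
    print(select_question(mastery, {"q1": "osmosis", "q2": "mitosis"}))  # q1
```

A correct answer raises the mastery estimate sharply when guessing is unlikely, while a wrong answer lowers it; any adaptive policy then reads these per-skill estimates to decide what to ask next.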

To explore the possibility of generating rich instructional feedback for machine-generated questions, a question-paragraph mapping task was identified. Given a set of questions and a list of paragraphs from a textbook, the goal of the task was to map each question to its related paragraphs. An algorithm was developed whose performance was comparable to that of human annotators.

A multiple-choice question with high-quality distractors (incorrect answers) can be pedagogically valuable while also being much easier to grade than an open-response question. Thus, an algorithm was developed to generate good distractors for multiple-choice questions. The machine-generated multiple-choice questions were compared to human-generated questions on three measures: question difficulty, question discrimination, and distractor usefulness. In a study with 200 participants recruited from Amazon Mechanical Turk, the two types of questions performed very similarly on all three measures.
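The abstract does not state how the three measures were computed. One common option, assumed here, is classical test theory: difficulty as the proportion of examinees answering correctly, discrimination as the difference in proportion correct between the top and bottom 27% of examinees ranked by total score, and distractor usefulness approximated by how often each incorrect option is chosen. The response data below are made up for illustration.

```python
from typing import Dict, Sequence


def item_difficulty(correct: Sequence[int]) -> float:
    """Classical difficulty index: proportion of examinees who answered correctly."""
    return sum(correct) / len(correct)


def item_discrimination(correct: Sequence[int], totals: Sequence[float],
                        frac: float = 0.27) -> float:
    """Upper-lower index: proportion correct in the top group minus the bottom
    group, where groups are the `frac` highest and lowest total scorers."""
    n = len(correct)
    k = max(1, int(round(n * frac)))
    order = sorted(range(n), key=lambda i: totals[i])  # ascending by total score
    lower, upper = order[:k], order[-k:]
    return sum(correct[i] for i in upper) / k - sum(correct[i] for i in lower) / k


def distractor_rates(choices: Sequence[str], key: str) -> Dict[str, float]:
    """Share of all examinees who picked each incorrect option."""
    counts: Dict[str, int] = {}
    for c in choices:
        if c != key:
            counts[c] = counts.get(c, 0) + 1
    return {opt: cnt / len(choices) for opt, cnt in sorted(counts.items())}


if __name__ == "__main__":
    choices = ["A", "B", "A", "C", "A", "B", "A", "A", "D", "A"]  # key is "A"
    correct = [int(c == "A") for c in choices]
    totals = [9, 3, 8, 2, 7, 4, 10, 6, 1, 5]  # hypothetical whole-test scores
    print(item_difficulty(correct))               # 0.6
    print(item_discrimination(correct, totals))   # 1.0
    print(distractor_rates(choices, "A"))         # {'B': 0.2, 'C': 0.1, 'D': 0.1}
```

Computing the same indices for machine-generated and human-generated items would allow the kind of side-by-side comparison the abstract describes, though the study may have used different formulas.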
Contributors: Zhang, Lishang (Author) / VanLehn, Kurt (Thesis advisor) / Baral, Chitta (Committee member) / Hsiao, Ihan (Committee member) / Wright, Christian (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
In natural language processing, language models have achieved remarkable success over the last few years. Transformers are at the core of most of these models. Their success can be attributed mainly to the enormous amount of curated data they are trained on. Even though such language models are trained on massive curated data, they often need specific extracted knowledge to better understand and reason, because relevant knowledge is often implicit or missing, which hampers machine reasoning. Moreover, manual knowledge curation is time-consuming and error-prone. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models. This in turn calls for finding ideal ways to utilize such knowledge by incorporating it into language models. Successful knowledge extraction and integration then raise the important question of how to evaluate such models, by developing tools or introducing challenging test suites, in order to learn about their limitations and improve them further. So, to improve transformer-based models, understanding the role of knowledge becomes important. In pursuit of improving language models with knowledge, in this dissertation I study three broad research directions spanning the natural language, biomedical, and cybersecurity domains: (1) Knowledge Extraction (KX): How can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI): How can such specific knowledge be used to improve these models? (3) Knowledge Evaluation (KE): How can language models be evaluated for specific skills, and how can their limitations be understood? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models.
I develop ways to improve language models’ multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets that assess their numerical reasoning skills in both in-domain and out-of-domain settings.
Contributors: Pal, Kuntal Kumar (Author) / Baral, Chitta (Thesis advisor) / Wang, Ruoyu (Committee member) / Blanco, Eduardo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2023