Matching Items (51)

Description

This work explores the development of a visual analytics tool for geodemographic exploration in an online environment. We mine 78 million records from the United States white pages, link the location data to demographic data (specifically income) from the United States Census Bureau, and allow users to interactively compare distributions of names with regard to spatial location similarity and income. To enable interactive similarity exploration, we investigate methods of pre-processing the data as well as on-the-fly lookups. As data becomes larger and more complex, the development of appropriate data storage and analytics solutions has become even more critical for enabling online visualization. We discuss problems faced in implementation, design decisions, and directions for future work.
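As an illustration of the kind of pre-processing alluded to above, the sketch below (hypothetical records and function names, not the thesis implementation) pre-computes one spatial distribution per surname so that interactive similarity lookups do not have to rescan the 78 million raw records:

```python
# Illustrative sketch (not the thesis implementation): pre-compute per-surname
# location distributions so pairwise similarity lookups stay interactive.
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical records mined from white-pages data: (surname, census_tract_id)
records = [
    ("Ibarra", "04013-1125"), ("Ibarra", "04013-1125"),
    ("Smith", "04013-1125"), ("Smith", "06037-2071"),
]

# Pre-processing step: one spatial count distribution per surname.
dist = defaultdict(Counter)
for surname, tract in records:
    dist[surname][tract] += 1

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Spatial similarity of two surnames over shared census tracts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# On-the-fly lookup: a single cached comparison instead of a full rescan.
print(cosine_similarity(dist["Ibarra"], dist["Smith"]))
```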
Contributors: Ibarra, Jose Luis (Author) / Maciejewski, Ross (Thesis director) / Mack, Elizabeth (Committee member) / Longley, Paul (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2014-05
Description

Education has been at the forefront of many issues in Arizona over the past several years, with concerns over lack of funding sparking the Red for Ed movement. However, despite the push for educational change, there remain many barriers to education, including a lack of visibility into how Arizona schools are performing at a legislative district level. While there are sources of information released at a school district level, many of these are limited and can become obscure to legislators when such school districts lie on the boundary between two different legislative districts. Moreover, much of this information is in the form of raw spreadsheets and is often fragmented between government websites and educational organizations. As such, a visualization dashboard that clearly identifies schools and their relative performance within each legislative district would be an extremely valuable tool for legislative bodies and the Arizona public. Although this dashboard and research are rough drafts of a larger concept, they would ideally increase transparency regarding public information about these districts and allow legislators to utilize the dashboard as a tool for greater understanding and more effective policymaking.

Contributors: Colyar, Justin Dallas (Author) / Michael, Katina (Thesis director) / Maciejewski, Ross (Committee member) / Tate, Luke (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description

Multi-view learning, a subfield of machine learning that aims to improve model performance by training on multiple views of the data, has been studied extensively in the past decades. It is typically applied in contexts where the input features naturally form multiple groups or views. An example of a naturally multi-view context is a data set of websites, where each website is described not only by the text on the page, but also by the text of hyperlinks pointing to the page. More recently, various studies have demonstrated the initial success of applying multi-view learning to single-view data with multiple artificially constructed views. However, a systematic study of the effectiveness of such artificially constructed views has been lacking. To bridge this gap, this thesis begins by providing a high-level overview of multi-view learning with the co-training algorithm. Co-training is a classic semi-supervised learning algorithm that takes advantage of both labelled and unlabelled examples in the data set for training. Then, the thesis presents a web-based tool developed in Python that allows users to experiment with and compare the performance of multiple view construction approaches on various data sets. The view construction approaches supported in the web-based tool include subsampling, Optimal Feature Set Partitioning, and the genetic algorithm. Finally, the thesis presents an empirical comparison of the performance of these approaches, not only against one another but also against traditional single-view models. The findings show that a simple subsampling approach combined with co-training often outperforms both the other view construction approaches and traditional single-view methods.
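To make the co-training idea concrete, here is a minimal sketch, assuming synthetic data and scikit-learn, of co-training over two views built by subsampling a single feature set; it illustrates the general technique rather than the thesis's web-based tool:

```python
# Minimal co-training sketch with artificially constructed views (subsampling).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, :5].sum(axis=1) > 0).astype(int)
labelled = np.arange(30)                      # small labelled pool
unlabelled = np.arange(30, 300)               # large unlabelled pool

# Two "views" built by subsampling the single feature set into halves.
view_a = rng.permutation(20)[:10]
view_b = np.setdiff1d(np.arange(20), view_a)

L_a, L_b = list(labelled), list(labelled)
y_a, y_b = list(y[labelled]), list(y[labelled])

for _ in range(5):                            # a few co-training rounds
    clf_a = LogisticRegression().fit(X[np.array(L_a)][:, view_a], y_a)
    clf_b = LogisticRegression().fit(X[np.array(L_b)][:, view_b], y_b)
    if len(unlabelled) == 0:
        break
    # Each view labels the unlabelled example it is most confident about
    # and hands it to the *other* view's training set.
    p_a = clf_a.predict_proba(X[unlabelled][:, view_a]).max(axis=1)
    p_b = clf_b.predict_proba(X[unlabelled][:, view_b]).max(axis=1)
    pick_a, pick_b = unlabelled[p_a.argmax()], unlabelled[p_b.argmax()]
    L_b.append(pick_a); y_b.append(int(clf_a.predict(X[[pick_a]][:, view_a])[0]))
    L_a.append(pick_b); y_a.append(int(clf_b.predict(X[[pick_b]][:, view_b])[0]))
    unlabelled = np.setdiff1d(unlabelled, [pick_a, pick_b])

print("co-trained view A accuracy:", clf_a.score(X[:, view_a], y))
```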
Contributors: Aksoy, Kaan (Author) / Maciejewski, Ross (Thesis director) / He, Jingrui (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-12
Description

Global violent conflict has become an increasing problem in recent decades, especially on the African continent. Civil wars, terrorism, riots, and political violence have wrought havoc not only on civilian lives, but also on economic foundations. Trade networks are a way to measure these economic foundations. To summarize trade networks, the clustering coefficient as well as trade quantity/value summation measures are used. To understand the effects of global trade on violent conflict, Pearson product-moment correlations are utilized. This work details a comparison of African national economies and violent conflict events using the clustering coefficient, trade summation measures, and the Pearson correlation coefficient.
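As an illustrative sketch of the two summary steps named above (using networkx and made-up yearly figures, not the thesis's data), one can compute a clustering coefficient per yearly trade network and then correlate that series with conflict-event counts:

```python
# Summarize each year's trade network, then correlate with conflict counts.
import networkx as nx
import numpy as np

# Hypothetical yearly trade networks (nodes = countries, edges = trade links)
# and yearly counts of violent-conflict events (made-up numbers).
yearly_edges = {
    2014: [("NGA", "ZAF"), ("NGA", "EGY"), ("ZAF", "EGY")],
    2015: [("NGA", "ZAF"), ("ZAF", "EGY")],
    2016: [("NGA", "ZAF"), ("NGA", "EGY"), ("ZAF", "EGY"), ("NGA", "KEN")],
}
conflict_events = {2014: 120, 2015: 180, 2016: 95}

years = sorted(yearly_edges)
clustering = [nx.average_clustering(nx.Graph(yearly_edges[y])) for y in years]
conflicts = [conflict_events[y] for y in years]

# Pearson product-moment correlation between the two yearly series.
r = np.corrcoef(clustering, conflicts)[0, 1]
print(f"Pearson r between clustering coefficient and conflict events: {r:.2f}")
```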
Contributors: Kadambi, Sagarika Sanjay (Author) / Maciejewski, Ross (Thesis director) / Shutters, Shade (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-05
Description

Molecular Dynamics (MD) simulations are ubiquitous throughout the physical sciences; they are critical in understanding how particle structures evolve over time given a particular energy function. A software package called ParSplice introduced a new method to generate these simulations in parallel that has significantly inflated their length. Typically, simulations are short discrete Markov chains, only capturing a few microseconds of a particle's behavior and containing tens of thousands of transitions between states; in contrast, a typical ParSplice simulation can be as long as a few milliseconds, containing tens of millions of transitions. Naturally, sifting through data of this size is impossible by hand, and there are a number of visualization systems that provide comprehensive and intuitive analyses of particle structures throughout MD simulations. However, no visual analytics systems have been built that can manage the simulations that ParSplice produces. To analyze these large datasets, I built a visual analytics system that provides multiple coordinated views that simultaneously describe the data temporally, within its structural context, and based on its properties. The system provides fluid and powerful user interactions regardless of the size of the data, allowing the user to drill down into the dataset to get detailed insights, as well as run and save various calculations, most notably the Nudged Elastic Band method. The system also allows the comparison of multiple trajectories, revealing more information about the general behavior of particles at different temperatures, energy states, etc.
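As a small, hypothetical sketch of the kind of summarization such a system relies on (not the thesis implementation), a long state trajectory can be collapsed into transition and residence counts so that tens of millions of transitions become a weighted state graph:

```python
# Collapse a long state trajectory into summary counts for exploration.
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Hypothetical ParSplice-style output: a long sequence of visited state ids.
trajectory = [0, 0, 1, 0, 2, 2, 1, 1, 0, 3, 3, 2, 0, 1]

transitions = Counter(pairwise(trajectory))   # (from_state, to_state) -> count
residence = Counter(trajectory)               # time spent in each state

print("most visited states:", residence.most_common(3))
print("most frequent transitions:", transitions.most_common(3))
```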
Contributors: Hnatyshyn, Rostyslav (Author) / Maciejewski, Ross (Thesis advisor) / Bryan, Chris (Committee member) / Ahrens, James (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Machine learning models are increasingly being deployed in real-world applications where their predictions are used to make critical decisions in a variety of domains. The proliferation of such models has led to a burgeoning need to ensure the reliability and safety of these models, given the potential negative consequences of model vulnerabilities. The complexity of machine learning models, along with the extensive data sets they analyze, can result in unpredictable and unintended outcomes. Model vulnerabilities may manifest due to errors in data input, algorithm design, or model deployment, which can have significant implications for both individuals and society. To prevent such negative outcomes, it is imperative to identify model vulnerabilities at an early stage in the development process. This will aid in guaranteeing the integrity, dependability, and safety of the models, thus mitigating potential risks and enabling the full potential of these technologies to be realized. However, enumerating vulnerabilities can be challenging due to the complexity of the real-world environment. Visual analytics, situated at the intersection of human-computer interaction, computer graphics, and artificial intelligence, offers a promising approach for achieving high interpretability of complex black-box models, thus reducing the cost of obtaining insights into potential vulnerabilities of models. This research is devoted to designing novel visual analytics methods to support the identification and analysis of model vulnerabilities. Specifically, generalizable visual analytics frameworks are instantiated to explore vulnerabilities in machine learning models concerning security (adversarial attacks and data perturbation) and fairness (algorithmic bias). Finally, a visual analytics approach is proposed to enable domain experts to explain and diagnose model improvements that address identified vulnerabilities of machine learning models in a human-in-the-loop fashion. The proposed methods hold the potential to enhance the security and fairness of machine learning models deployed in critical real-world applications.
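As a purely illustrative sketch of the two vulnerability classes mentioned above, robustness to data perturbation and group fairness (generic checks, not the thesis's frameworks), one can probe a trained model with a prediction flip rate under noise and a demographic parity gap between groups:

```python
# Generic vulnerability probes: perturbation sensitivity and a fairness gap.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
group = (rng.random(500) < 0.5).astype(int)   # hypothetical protected attribute
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Security angle: how often do predictions flip under small random perturbations?
X_noisy = X + rng.normal(scale=0.3, size=X.shape)
flip_rate = np.mean(model.predict(X) != model.predict(X_noisy))

# Fairness angle: demographic parity gap between the two groups.
pred = model.predict(X)
parity_gap = abs(pred[group == 0].mean() - pred[group == 1].mean())

print(f"prediction flip rate under perturbation: {flip_rate:.2%}")
print(f"demographic parity gap: {parity_gap:.2f}")
```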
Contributors: Xie, Tiankai (Author) / Maciejewski, Ross (Thesis advisor) / Liu, Huan (Committee member) / Bryan, Chris (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created: 2023
Description

Coastal areas are susceptible to man-made disasters, such as oil spills, which not only have a dreadful impact on the lives of coastal communities and businesses but also have lasting and hazardous consequences. The United States coastal areas, especially the Gulf of Mexico, have witnessed devastating oil spills of varied sizes and durations that resulted in major economic and ecological losses. These disasters affected the oil, housing, forestry, tourism, and fishing industries, with overall costs exceeding billions of dollars (Baade et al., 2007; Smith et al., 2011). Extensive research has been done with respect to oil spill simulation techniques, spatial optimization models, and innovative strategies to deal with spill response and planning efforts. However, most of the research in those areas has been conducted independently, leaving a conceptual void between them.

In the following work, this thesis presents a Spatial Decision Support System (SDSS), which efficiently integrates the independent facets of spill modeling techniques and spatial optimization to enable officials to investigate and explore the various options to clean up an offshore oil spill and make a more informed decision. This thesis utilizes the Blowout and Spill Occurrence Model (BLOSOM) developed by Sim et al. (2015) to simulate hypothetical oil spill scenarios, followed by the Oil Spill Cleanup and Operational Model (OSCOM) developed by Grubesic et al. (2017) to spatially optimize the response efforts. The results of this combination are visualized in the SDSS, featuring geographical maps, so the boat ramps from which the response should be launched can be easily identified, along with the amount of oil that hits the shore, thereby conveying the intensity of the spill's impact on coastal areas for various cleanup targets.
Contributors: Pydi Medini, Prannoy Chandra (Author) / Maciejewski, Ross (Thesis advisor) / Grubesic, Anthony (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

When looking at drawings of graphs, questions about graph density, community structures, local clustering, and other graph properties may be of critical importance for analysis. While graph layout algorithms have focused on minimizing edge crossings, preserving symmetry, and other such layout properties, little is known about how these algorithms relate to a user's ability to perceive graph properties for a given graph layout. This study applies previously established methodologies for perceptual analysis to identify which graph drawing layout best helps the user perceive a particular graph property. A large-scale (n = 588) crowdsourced experiment is conducted to investigate whether the perception of two graph properties (graph density and average local clustering coefficient) can be modeled using Weber's law. Three graph layout algorithms from three representative classes (Force Directed - FD, Circular, and Multi-Dimensional Scaling - MDS) are studied, and the results of this experiment establish the precision of judgment for these graph layouts and properties. The findings demonstrate that the perception of graph density can be modeled with Weber's law. Furthermore, the perception of the average clustering coefficient can be modeled as an inverse of Weber's law, and the MDS layout showed a significantly different precision of judgment than the FD layout.
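For context on the modeling idea, Weber's law states that the just-noticeable difference (JND) grows in proportion to the base stimulus, JND = k * I, where k is the Weber fraction. The hedged sketch below (made-up JND numbers, not the study's data) shows how k could be estimated from judgments at several base densities:

```python
# Estimate a Weber fraction k from (base density, JND) pairs: JND = k * I.
import numpy as np

# Hypothetical JNDs for graph density at several base densities.
base_density = np.array([0.05, 0.10, 0.20, 0.40])
jnd = np.array([0.006, 0.011, 0.024, 0.045])

# Least-squares fit of JND = k * I through the origin.
k = float(base_density @ jnd / (base_density @ base_density))
print(f"estimated Weber fraction k ~ {k:.3f}")
# For the inverse relationship reported for the clustering coefficient, one
# would instead fit JND against 1 / I (again, purely illustrative).
```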
Contributors: Soni, Utkarsh (Author) / Maciejewski, Ross (Thesis advisor) / Kobourov, Stephen (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

In the last few years, there has been a tremendous increase in the use of big data. Most of this data is hard to understand because of its size and dimensions. The importance of this problem can be emphasized by the fact that the Big Data Research and Development Initiative was announced by the United States administration in 2012 to address problems faced by the government. Various states and cities in the US gather spatial data about incidents like police calls for service.

When we query large amounts of data, it may lead to a lot of questions. For example, when we look at arithmetic relationships between queries in heterogeneous data, there are a lot of differences. How can we explain what factors account for these differences? If we define the observation as an arithmetic relationship between queries, this kind of problem can be solved by aggravation or intervention. Aggravation views the value of our observation for different sets of tuples, while intervention looks at the value of the observation after removing sets of tuples. We call the predicates that represent these tuples explanations. Observations by themselves have limited importance. For example, if we observe a large number of taxi trips in a specific area, we might ask the question: Why are there so many trips here? Explanations attempt to answer these kinds of questions.

While aggravation and intervention are designed for non-spatial data, we propose a new approach for explaining spatially heterogeneous data. Our approach expands on aggravation and intervention while using spatial partitioning/clustering to improve explanations for spatial data. Our proposed approach was evaluated against a real-world taxi dataset as well as synthetic disease outbreak datasets. The approach was found to outperform aggravation in precision and recall while outperforming intervention in precision.
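A minimal sketch of the two scoring notions defined above, using hypothetical taxi tuples and a hand-written predicate (illustrative only, not the thesis implementation): the observation is an aggregate over a query result, aggravation evaluates it on the tuples an explanation selects, and intervention evaluates it after removing them:

```python
# Observation = aggregate over tuples; explanation = predicate over tuples.
trips = [
    {"zone": "airport", "hour": 8,  "fare": 40.0},
    {"zone": "airport", "hour": 9,  "fare": 55.0},
    {"zone": "downtown", "hour": 8, "fare": 12.0},
    {"zone": "downtown", "hour": 22, "fare": 18.0},
]

def observation(rows):
    """Arithmetic relationship between queries: here, the average fare."""
    return sum(r["fare"] for r in rows) / len(rows) if rows else 0.0

explanation = lambda r: r["zone"] == "airport"        # candidate predicate

baseline = observation(trips)
intervention = observation([r for r in trips if not explanation(r)])  # remove tuples
aggravation = observation([r for r in trips if explanation(r)])       # only those tuples

print(f"baseline={baseline:.1f}  after removal={intervention:.1f}  on subset={aggravation:.1f}")
```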
Contributors: Tahir, Anique (Author) / Elsayed, Mohamed (Thesis advisor) / Hsiao, Ihan (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

In recent years, the rise in social media usage, both vertically in terms of the number of users per platform and horizontally in terms of the number of platforms per user, has led to a data explosion.

User-generated social media content provides an excellent opportunity to mine data of interest and to build resourceful applications. The rise in the number of healthcare-related social media platforms and the volume of healthcare knowledge available online in the last decade has resulted in increased social media usage for personal healthcare. In the United States, nearly ninety percent of adults in the age group 50-75 have used social media to seek and share health information. Motivated by the growth of social media usage, this thesis focuses on healthcare-related applications, studies various challenges posed by social media data, and addresses them through novel and effective machine learning algorithms.

The major challenges for effectively and efficiently mining social media data to build functional applications include: (1) Data reliability and acceptance: most social media data (especially in the context of healthcare-related social media) is not regulated, and little has been studied on the benefits of healthcare-specific social media; (2) Data heterogeneity: social media data is generated by users with both demographic and geographic diversity; (3) Model transparency and trustworthiness: most existing machine learning models for addressing heterogeneity are considered black-box models, and few of them provide explanations for why they behave the way they do, which makes them difficult to trust.

In response to these challenges, three main research directions have been investigated in this thesis: (1) Analyzing social media influence on healthcare: to study the real-world impact of social media as a source to offer or seek support for patients with chronic health conditions; (2) Learning from task heterogeneity: to propose various models and algorithms that are adaptable to new social media platforms and robust to dynamic social media data, specifically for modeling user behaviors, identifying similar actors across platforms, and adapting black-box models to a specific learning scenario; (3) Explaining heterogeneous models: to interpret predictive models in the presence of task heterogeneity. In this thesis, novel algorithms with theoretical analysis from various aspects (e.g., time complexity, convergence properties) have been proposed. The effectiveness and efficiency of the proposed algorithms are demonstrated by comparison with state-of-the-art methods and relevant case studies.
Contributors: Nelakurthi, Arun Reddy (Author) / He, Jingrui (Thesis advisor) / Cook, Curtiss B (Committee member) / Maciejewski, Ross (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created: 2019