For this creative project, I created a visually immersive and artistic data visualization of global space-related activities. The project aims to create a sense of wonder and creativity for space exploration through unconventional data visualization. By focusing mainly on the artistic elements of the visualization, the project will have a larger emotional impact on its viewers, as opposed to a traditional data visualization. The project uses a comprehensive dataset of space-related articles, all of which include the location of the activity discussed in the article, as well as keywords and other fields. The dataset will serve as material to create a narrative that shows not only how space-related activities are distributed around the globe but also the overarching themes of the activities. To create the final project, I used the JavaScript library p5.js.
Visual Question Answering (VQA) is an increasingly important multi-modal task where models must answer textual questions based on visual image inputs. Numerous VQA datasets have been proposed to train and evaluate models. However, existing benchmarks exhibit a unilateral focus on textual distribution shifts rather than joint shifts across modalities. This is suboptimal for properly assessing model robustness and generalization. To address this gap, a novel multi-modal VQA benchmark dataset is introduced for the first time. This dataset combines both visual and textual distribution shifts across training and test sets. Using this challenging benchmark exposes vulnerabilities in existing models relying on spurious correlations and overfitting to dataset biases. The novel dataset advances the field by enabling more robust model training and rigorous evaluation of multi-modal distribution shift generalization. In addition, a new few-shot multi-modal prompt fusion model is proposed to better adapt models for downstream VQA tasks. The model incorporates a prompt encoder module and dual-path design to align and fuse image and text prompts. This represents a novel prompt learning approach tailored for multi-modal learning across vision and language. Together, the introduced benchmark dataset and prompt fusion model address key limitations around evaluating and improving VQA model robustness. The work expands the methodology for training models resilient to multi-modal distribution shifts.
Gerrymandering involves the purposeful manipulation of district boundaries to gain a political advantage. Because legislators have a vested interest in continuing their tenure, they can easily hijack the redistricting process each decade for their own and their party's benefit. This threatens the cornerstone of democracy: a voter's ability to select an elected official who accurately represents their interests. Instead, gerrymandering allows legislators to choose their voters.
In recent years, the Supreme Court has heard challenges to state legislature-drawn districts, most recently in Allen v. Milligan for Alabama and Moore v. Harper for North Carolina. The highest court of the United States ruled that the two state maps were gerrymandered, and in coming to its decision, the nine justices relied on a plethora of amicus briefs, one of which included the Markov Chain Monte Carlo method, a computational method used to detect gerrymandering.
Because gerrymandering has become so widespread on both sides of the political aisle, states have moved to create independent redistricting commissions. Qualitative research on the efficacy of independent commissions exists, but there is little research using the quantitative computational methods from these SCOTUS cases. As a result, my thesis uses the Markov Chain Monte Carlo method to determine whether impartial redistricting commissions (such as the one in Arizona) actually preclude unfair redistricting practices.
My completed project is located here:
https://dheetideliwala.github.io/honors-thesis/
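To make the Markov Chain Monte Carlo idea concrete, the following is a minimal sketch of a flip-style random walk over districting plans on a toy grid. It is an illustration under stated assumptions, not the method used in the thesis or in the amicus briefs: the 4x4 grid, the vote pattern, the population tolerance `max_dev`, and every function name are hypothetical, and the proposal is not reweighted to target any particular distribution. A real analysis would use precinct graphs, population and election data, and a properly weighted proposal.

```python
import random
from collections import deque

def neighbors(cell, rows, cols):
    """Yield the grid cells that share an edge with `cell`."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)

def is_contiguous(cells, rows, cols):
    """Breadth-first search: True if `cells` form one connected piece."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nb in neighbors(cur, rows, cols):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)

def flip_step(plan, rows, cols, max_dev, rng):
    """Propose moving one cell into an adjacent district; keep the old plan if invalid."""
    cell = rng.choice(sorted(plan))
    other = [plan[nb] for nb in neighbors(cell, rows, cols) if plan[nb] != plan[cell]]
    if not other:
        return plan  # interior cell: nothing to flip
    new_plan = dict(plan)
    new_plan[cell] = rng.choice(other)
    districts = set(plan.values())
    ideal = rows * cols / len(districts)
    for d in districts:
        members = [c for c, dist in new_plan.items() if dist == d]
        # Reject plans that are unbalanced or that split a district in two.
        if abs(len(members) - ideal) > max_dev or not is_contiguous(members, rows, cols):
            return plan
    return new_plan

def seats_won(plan, votes):
    """Count districts in which party A holds a strict majority of cells."""
    tally = {}
    for cell, d in plan.items():
        a, n = tally.get(d, (0, 0))
        tally[d] = (a + votes[cell], n + 1)
    return sum(1 for a, n in tally.values() if 2 * a > n)

def sample_ensemble(plan, votes, rows, cols, steps, max_dev=1, seed=0):
    """Run the chain and record party A's seat count after every step."""
    rng = random.Random(seed)
    counts = []
    for _ in range(steps):
        plan = flip_step(plan, rows, cols, max_dev, rng)
        counts.append(seats_won(plan, votes))
    return plan, counts

# Hypothetical toy data: four row-shaped districts; party A wins cells where (r + 2c) % 3 != 0.
ROWS, COLS = 4, 4
initial_plan = {(r, c): r for r in range(ROWS) for c in range(COLS)}
votes = {(r, c): 1 if (r + 2 * c) % 3 != 0 else 0 for r in range(ROWS) for c in range(COLS)}

if __name__ == "__main__":
    _, counts = sample_ensemble(initial_plan, votes, ROWS, COLS, steps=500)
    histogram = {s: counts.count(s) for s in sorted(set(counts))}
    print("seats for party A across sampled plans:", histogram)
```

The resulting histogram plays the role of a baseline: an enacted map whose seat count falls far in the tail of the sampled distribution would be flagged as an outlier worth scrutinizing.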
Machine learning models are increasingly being deployed in real-world applications where their predictions are used to make critical decisions in a variety of domains. The proliferation of such models has led to a burgeoning need to ensure the reliability and safety of these models, given the potential negative consequences of model vulnerabilities. The complexity of machine learning models, along with the extensive data sets they analyze, can result in unpredictable and unintended outcomes. Model vulnerabilities may manifest due to errors in data input, algorithm design, or model deployment, which can have significant implications for both individuals and society. To prevent such negative outcomes, it is imperative to identify model vulnerabilities at an early stage in the development process. This will aid in guaranteeing the integrity, dependability, and safety of the models, thus mitigating potential risks and enabling the full potential of these technologies to be realized. However, enumerating vulnerabilities can be challenging due to the complexity of the real-world environment. Visual analytics, situated at the intersection of human-computer interaction, computer graphics, and artificial intelligence, offers a promising approach for achieving high interpretability of complex black-box models, thus reducing the cost of obtaining insights into potential vulnerabilities of models. This research is devoted to designing novel visual analytics methods to support the identification and analysis of model vulnerabilities. Specifically, generalizable visual analytics frameworks are instantiated to explore vulnerabilities in machine learning models concerning security (adversarial attacks and data perturbation) and fairness (algorithmic bias). Finally, a visual analytics approach is proposed that enables domain experts to explain and diagnose, in a human-in-the-loop fashion, the model improvements obtained by addressing the identified vulnerabilities.
The proposed methods hold the potential to enhance the security and fairness of machine learning models deployed in critical real-world applications.
Distributed databases, such as Log-Structured Merge-Tree Key-Value Stores (LSM-KVS), are widely used in modern infrastructure. One of the primary challenges in these databases is ensuring consistency, meaning that all nodes have the same view of data at any given time. However, maintaining consistency requires a trade-off: the stronger the consistency, the more resources are necessary to replicate data across replicas, which decreases database performance. Addressing this trade-off poses two challenges: first, developing and managing multiple consistency levels within a single system, and second, assigning consistency levels to effectively balance the consistency-performance trade-off. This thesis introduces Self-configuring Consistency In Distributed LSM-KVS (SCID), a service that leverages unique properties of LSM-KVS to manage consistency levels and automates level assignment with ML. To address the first challenge, SCID combines dynamic read-only instances and logical KV-based partitions to enable on-demand updates of read-only instances and facilitate the logical separation of groups of key-value pairs. SCID uses logical partitions as consistency levels and on-demand updates in dynamic read-only instances to allow for multiple consistency levels. To address the second challenge, the thesis presents an ML-based solution, SCID-ML, to manage the consistency-performance trade-off more effectively. We evaluate SCID and find that it improves write throughput by up to 50% and achieves 62% accuracy for consistency-level predictions.
Component-based models are commonly employed to simulate discrete dynamical systems. These models lend themselves to formalizing the structures of systems at multiple levels of granularity. Visual development of component-based models serves to simplify the iterative and incremental model specification activities. The Parallel Discrete Event System Specification (DEVS) formalism offers a flexible yet rigorous approach for decomposing a whole model into its components or, alternatively, composing a whole model from components. While different concepts, frameworks, and tools offer a variety of visual modeling capabilities, most have limitations, for example in visualizing multiple model hierarchies at any level and of arbitrary depth. The visual and persistent layout of any number of hierarchy levels of models can be maintained and navigated seamlessly. Persistent storage is another capability needed across the modeling, simulation, verification, and validation lifecycle. These are important features for easing the demanding task of creating and changing modular, hierarchical simulation models. This thesis proposes a new approach and develops a tool for the visual development of models. This tool supports storing and reconstructing graphical models using a NoSQL database. It offers unique capabilities important for developing the increasingly large and complex models essential for analyzing, designing, and building Digital Twins.
As people begin to live longer and the population shifts to having more older adults on Earth than young children, radical solutions will be needed to ease the
burden on society. It will be essential to develop technology that can age with the
individual. One solution is to keep older adults in their homes longer through smart
home and smart living technology, allowing them to age in place. People have many
choices when choosing where to age in place, including their own homes, assisted
living facilities, nursing homes, or family members. No matter where people choose to
age, they may face isolation and financial hardships. It is crucial to keep finances in
mind when developing Smart Home technology.
Smart home technologies seek to allow individuals to stay inside their homes for
as long as possible, yet little work looks at how we can use technology in different
life stages. Robots are poised to impact society and ease burdens at home and in the
workforce. Special attention has been given to social robots to ease isolation. As
social robots become accepted into society, researchers need to understand how these
robots should mimic natural conversation. My work attempts to answer this question
within social robotics by investigating how to make conversational robots natural and
reciprocal.
I investigated this through a 2x2 Wizard of Oz between-subjects user study. The
study lasted four months, testing four different levels of interactivity with the robot.
None of the levels were significantly different from the others, an unexpected result. I
then investigated the robot’s personality, the participant’s trust, and the participant’s
acceptance of the robot and how that influenced the study.
Molecular Dynamics (MD) simulations are ubiquitous throughout the physical sciences; they are critical in understanding how particle structures evolve over time
given a particular energy function. A software package called ParSplice introduced a
new method to generate these simulations in parallel that has significantly inflated
their length. Typically, simulations are short discrete Markov chains, capturing only
a few microseconds of a particle’s behavior and containing tens of thousands of
transitions between states; in contrast, a typical ParSplice simulation can be as long
as a few milliseconds, containing tens of millions of transitions. Naturally, sifting
through data of this size is impossible by hand, and there are a number of visualization
systems that provide comprehensive and intuitive analyses of particle structures
throughout MD simulations. However, no visual analytics systems have been built
that can manage the simulations that ParSplice produces. To analyze these large
data-sets, I built a visual analytics system that provides multiple coordinated views
that simultaneously describe the data temporally, within its structural context, and
based on its properties. The system provides fluid and powerful user interactions
regardless of the size of the data, allowing the user to drill down into the data-set to
get detailed insights, as well as run and save various calculations, most notably the
Nudged Elastic Band method. The system also allows the comparison of multiple
trajectories, revealing more information about the general behavior of particles at different temperatures, energy states, etc.
Data integration involves the reconciliation of data from diverse data sources in order to obtain a unified data repository, upon which an end user such as a data analyst can run analytics sessions to explore the data and obtain useful insights. Supervised Machine Learning (ML) for data integration tasks such as ontology (schema) or entity (instance) matching requires many training examples in the form of manually curated, pre-labeled matching and non-matching schema concept or entity pairs, which are hard to obtain. Similarly, an analytics system that cannot predict the impending workload can incur huge query latencies, while leaving the user with the onus of understanding the underlying database schema and writing a meaningful query at every step of a data exploration session. In this dissertation, I will describe the human-in-the-loop Machine Learning (ML) systems that I have built for data integration and predictive analytics. I alleviate the need for extensive prior labeling by utilizing active learning (AL) for data integration. In each AL iteration, I detect the unlabeled entity or schema concept pairs that would strengthen the ML classifier and selectively query the human oracle for such labels in a budgeted fashion. Thus, I make use of human assistance for ML-based data integration. On the other hand, when the human is an end user exploring data through Online Analytical Processing (OLAP) queries, my goal is to proactively assist the human by predicting the top-K next queries that they are likely to be interested in. I will describe my proposed SQL-predictor, a Business Intelligence (BI) query predictor, and a geospatial query cardinality estimator, with an emphasis on schema abstraction, query representation, and how I adapt the ML models for these tasks. For each system, I will discuss the evaluation metrics and how the proposed systems compare to the state-of-the-art baselines on multiple datasets and query workloads.
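The budgeted active-learning loop described above can be sketched as follows. This is a minimal illustration under strong assumptions, not the dissertation's system: each candidate entity pair is reduced to a single hypothetical similarity score in [0, 1], the classifier is a one-dimensional threshold, the query strategy is plain uncertainty sampling (ask about the pair closest to the current decision boundary), and `human_oracle` stands in for the human annotator. All names and values are invented for the example.

```python
def fit_threshold(labeled):
    """Place the boundary midway between the highest known non-match and the lowest known match."""
    non_matches = [score for score, is_match in labeled if not is_match]
    matches = [score for score, is_match in labeled if is_match]
    low = max(non_matches) if non_matches else 0.0
    high = min(matches) if matches else 1.0
    return (low + high) / 2

def select_query(pool, threshold):
    """Uncertainty sampling: pick the unlabeled pair closest to the decision boundary."""
    return min(pool, key=lambda score: abs(score - threshold))

def active_learn(pool, oracle, budget):
    """Query the oracle at most `budget` times, refitting the classifier after every label."""
    pool = list(pool)
    labeled = []
    threshold = 0.5  # uninformed starting boundary
    for _ in range(budget):
        if not pool:
            break
        query = select_query(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))
        threshold = fit_threshold(labeled)
    return threshold, labeled

# Hypothetical data: 101 candidate pairs with evenly spaced similarity scores.
candidate_scores = [i / 100 for i in range(101)]

def human_oracle(score):
    """Stand-in for the annotator: pairs scoring above 0.615 are true matches."""
    return score > 0.615

threshold, labeled = active_learn(candidate_scores, human_oracle, budget=12)

if __name__ == "__main__":
    print(f"labels requested: {len(labeled)}, learned threshold: {threshold:.3f}")
```

With only twelve labels out of 101 candidates, the learned threshold separates the remaining pairs in this toy setting, which is the kind of labeling saving that motivates querying selectively rather than labeling everything up front.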
The drone industry is worth nearly 50 billion dollars in the public sector, and drone flight anomalies can cost up to 12 million dollars per drone. The project's objective is to explore various machine-learning techniques to identify anomalies in drone flight and to express these anomalies effectively through relevant visualizations. The research goal is to detect anomalies in drone flight data and determine their severity levels. The solution combines visualizations with statistical models, and the contributions are the visualizations, the identified patterns, the models, and the interface.
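As one possible illustration of a statistical model that flags anomalies and assigns severity levels, the sketch below scores each reading with a modified z-score based on the median and the median absolute deviation. It is an assumption-laden example rather than the project's actual model: the altitude readings are invented, and the severity cutoffs (3.5 and 7.0) and level names are hypothetical choices made only for this example.

```python
from statistics import median

def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the median absolute deviation."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread: nothing can be ranked as unusual
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores for normal data.
    return [0.6745 * (v - center) / mad for v in values]

def severity(score, warn=3.5, critical=7.0):
    """Map a score to a severity level; the cutoffs are hypothetical."""
    magnitude = abs(score)
    if magnitude >= critical:
        return "critical"
    if magnitude >= warn:
        return "warning"
    return "normal"

# Hypothetical altitude readings in meters, with one sudden drop.
altitude_m = [50.1, 50.3, 49.8, 50.0, 50.2, 49.9, 50.1, 42.0, 50.0, 50.2, 49.7, 50.1]
levels = [severity(z) for z in modified_z_scores(altitude_m)]

if __name__ == "__main__":
    for reading, level in zip(altitude_m, levels):
        if level != "normal":
            print(f"altitude {reading} m flagged as {level}")
```

The median-based score is used here because a single extreme reading barely shifts the median, whereas it would inflate a mean and standard deviation and partly mask itself.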