Space adaptation techniques for preference oriented skyline processing

Description

Skyline queries are a well-established technique used in multi-criteria decision applications. The research community has recently shown interest in computing skylines efficiently, but the problem of presenting a skyline that takes the user's preferences into account is still open. Each user has varying interests towards each attribute, and hence a "one size fits all" methodology might not satisfy all users. True user satisfaction can be obtained only when the skyline is tailored specifically for each user based on his preferences.

This research investigates the problem of preference-aware skyline processing, which consists of inferring the preferences of a user and computing a skyline specific to that user, taking into account his preferences. This research proposes a model that transforms the data from a given space to a user preferential space where each attribute represents the preference of the user. This study proposes two techniques, "Preferential Skyline Processing" and "Latent Skyline Processing", to efficiently compute preference-aware skylines in the user preferential space. Finally, extensive experiments and performance analysis confirm the correctness of the recommendations and the algorithms' ability to outperform the naïve ones.
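To make the idea of a skyline in a preferential space concrete, here is a minimal sketch. It is not the thesis's "Preferential Skyline Processing" or "Latent Skyline Processing" algorithm. The transform used here (distance of each attribute from a user-supplied ideal value) and the names `dominates`, `skyline`, and `preference_skyline` are assumptions chosen for illustration, and smaller values are assumed to be better.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse on every attribute and strictly
    better on at least one (smaller is assumed to be better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def preference_skyline(points: Sequence[Point], ideal: Point) -> List[Point]:
    """Skyline computed in a hypothetical preferential space.

    Assumption: each attribute is mapped to its absolute distance from the
    user's ideal value. The original points are returned, not the mapped ones.
    """
    mapped = [tuple(abs(x - t) for x, t in zip(p, ideal)) for p in points]
    return [p for p, m in zip(points, mapped)
            if not any(dominates(other, m) for other in mapped)]


# Hypothetical hotels described as (price, distance to the beach).
hotels = [(50, 8), (100, 3), (150, 1), (120, 5), (200, 4)]
generic = skyline(hotels)
tailored = preference_skyline(hotels, ideal=(120, 5))
```

In this toy data set the generic skyline keeps three hotels, while a user whose ideal is exactly `(120, 5)` gets a single result that the generic skyline would have discarded, which is the kind of difference a preference-aware skyline is meant to capture.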

Contributors: Rathinavelu, Sriram (Author) / Candan, Kasim Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2014

SearchViz: an interactive visual interface to navigate search-results in online discussion forums

Description

Online programming communities are widely used by programmers for troubleshooting and various problem-solving tasks. The large and ever-increasing volume of posts in these communities demands more effort to read and comprehend, making it harder to find relevant information. In my thesis, I designed and studied an alternative approach that uses interactive network visualization to represent relevant search results for online programming discussion forums.

I conducted a user study to evaluate the effectiveness of this approach. Results show that users were able to identify relevant information more precisely via the visual interface than with the traditional list-based approach. Network visualization provided effective search-result navigation support to facilitate users' tasks and improved query quality for successive queries. Subjective evaluation also showed that visualizing search results conveys more semantic information in an efficient manner and makes searching more effective.

Contributors: Mehta, Vishal Vimal (Author) / Hsiao, Ihan (Thesis advisor) / Walker, Erin (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2015

Stochastic optimization for feasibility determination: an application to water pump operation in water distribution network

Description

Energy consumption by public drinking water and wastewater utilities represents up to 30%-40% of a municipality's energy bill. The largest share of that energy is used to operate motors for pumping. As a result, the engineering and control community developed Variable Speed Pumps (VSPs), which allow for regulating valves in the network instead of using the traditional binary ON/OFF pumps. Potentially, VSPs save up to 90% of annual energy cost compared to the binary pump. The control problem has been tackled in the literature as "Pump Scheduling Optimization" (PSO), with a main focus on cost minimization. Nonetheless, engineering literature is mostly concerned with the problem of understanding "healthy working conditions" (e.g., leakages, breakages) for a water infrastructure rather than the costs. This is critical because a network operated under stress may satisfy the demand at present but will likely have its functionality hindered in the future.

This research addresses the problem of analyzing working conditions of large water systems by means of a detailed hydraulic simulation model (e.g., EPANet) to gain insights into feasibility with respect to pressure, tank level, etc. This work presents a new framework called Feasible Set Approximation – Probabilistic Branch and Bound (FSA-PBnB) for the definition and determination of feasible solutions in terms of pump regulation. We propose the concept of feasibility distance, which is measured as the distance of the current solution from the feasibility frontier, to estimate the distribution of the feasibility values across the solution space. Based on this estimate, we propose pruning the infeasible regions and maintaining the feasible regions to identify the desired feasible solutions. We test the proposed algorithm with both theoretical and real water networks. The results demonstrate that FSA-PBnB can identify the feasibility profile efficiently. Additionally, the feasibility distance lets us assess the quality of a sub-region in terms of feasibility.

The present work provides a basic feasibility determination framework for low-dimensional problems. When FSA-PBnB is extended to large-scale constraint optimization problems, a more intelligent sampling method may be developed to further reduce the computational effort.

Contributors: Tsai, Yi-An (Author) / Pedrielli, Giulia (Thesis advisor) / Mirchandani, Pitu (Committee member) / Mascaro, Giuseppe (Committee member) / Zabinsky, Zelda (Committee member) / Candelieri, Antonio (Committee member) / Arizona State University (Publisher)

Created: 2018

GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark

Description

Researchers and practitioners have widely studied road network traffic data in different areas such as urban planning, traffic prediction and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale high-quality urban traffic data requires tremendous effort because participating vehicles must install Global Positioning System (GPS) receivers and administrators must continuously monitor these devices. Some urban traffic simulators try to generate such data with different features. However, they suffer from two critical issues: (1) Scalability: most of them only offer a single-machine solution, which is not adequate to produce large-scale data. Some simulators can generate traffic in parallel but do not balance the load well among machines in a cluster. (2) Granularity: many simulators do not consider microscopic traffic situations including traffic lights, lane changing, and car following. This paper proposes GeoSparkSim, a scalable traffic simulator which extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation. The proposed system seamlessly integrates with a Spark-based spatial data management system, GeoSpark, to deliver a holistic approach that allows data scientists to simulate, analyze and visualize large-scale urban traffic data. To implement microscopic traffic models, GeoSparkSim employs a simulation-aware vehicle partitioning method to partition vehicles among different machines such that each machine has a balanced workload. The experimental analysis shows that GeoSparkSim can simulate the movements of 200 thousand cars over an extensive road network (250 thousand road junctions and 300 thousand road segments).

Contributors: Fu, Zishan (Author) / Sarwat, Mohamed (Thesis advisor) / Pedrielli, Giulia (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)

Created: 2019

Multiobjective Optimization Based Approach for Truth Discovery

Description

There are many applications where the truth is unknown. The truth values are guessed by different sources, and the values of different properties can be obtained from various sources. This leads to disagreement among sources. An important task is to obtain the truth from these sometimes contradictory sources. As an extension of computing the truth, the reliability of sources needs to be computed. There are models which compute the precision values. In those earlier models (Banerjee et al., 2005; Dong and Naumann, 2009; Kasneci et al., 2011; Li et al., 2012; Marian and Wu, 2011; Zhao and Han, 2012; Zhao et al., 2012), multiple properties are modeled individually. In one of the existing works, the heterogeneous properties are modeled jointly. In that work, the Conflict Resolution on Heterogeneous Data (CRH) framework is based on single objective optimization. Because the problem is a non-convex, single objective optimization, only one local optimal solution is found, and the optimal point depends upon the initial point. In this work, the single objective optimization problem is converted into a multi-objective optimization problem, for which the Pareto optimal points are computed. As an extension, the single objective optimization problem is also solved with numerous initial points. These two approaches are used to search for a solution better than the one obtained by CRH with the median as the initial point for the continuous variables and majority voting as the initial point for the categorical variables. In the experiments, the solution coming from CRH lies among the Pareto optimal points of the multiobjective optimization and is the optimum solution.
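The dependence on the initial point can be illustrated with a simplified, CRH-style alternating scheme for continuous values only. This is a sketch under stated assumptions, not the thesis's formulation: the squared-error loss, the negative-log weight update, the weighted-mean truth update, and the function name `crh_continuous` are choices made here for illustration, and at least two sources are assumed.

```python
import math
from statistics import median
from typing import List, Optional, Sequence, Tuple


def crh_continuous(
    claims: Sequence[Sequence[float]],
    init: Optional[Sequence[float]] = None,
    iterations: int = 20,
    eps: float = 1e-12,
) -> Tuple[List[float], List[float]]:
    """Alternate between estimating source weights and truths.

    claims[k][i] is the value that source k reports for object i.
    init is the starting truth vector; the per-object median is used if omitted.
    """
    n_objects = len(claims[0])
    if init is not None:
        truths = list(init)
    else:
        truths = [median(src[i] for src in claims) for i in range(n_objects)]
    weights = [1.0] * len(claims)

    for _ in range(iterations):
        # Assumed loss: squared distance of each source from the current truths.
        losses = [sum((v - t) ** 2 for v, t in zip(src, truths)) + eps
                  for src in claims]
        total = sum(losses)
        # Assumed weight update: sources with a smaller share of loss weigh more.
        weights = [-math.log(loss / total) for loss in losses]
        weight_sum = sum(weights)
        if weight_sum <= 0:
            break  # degenerate case, e.g. a single source
        # Truth update: weighted mean of the claims for each object.
        truths = [sum(w * src[i] for w, src in zip(weights, claims)) / weight_sum
                  for i in range(n_objects)]
    return truths, weights


# Hypothetical data: two consistent sources and one noisy source.
claims = [
    [10.0, 20.0, 30.0, 40.0],
    [10.2, 19.8, 30.1, 40.1],
    [14.0, 25.0, 24.0, 47.0],
]
truths, weights = crh_continuous(claims)
```

Calling `crh_continuous` repeatedly with different `init` vectors corresponds to the multi-start idea mentioned in the abstract, where the single objective problem is solved from numerous initial points.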

Contributors: Jain, Karan (Author) / Xue, Guoliang (Thesis advisor) / Sen, Arunabha (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2019

Stochastic Modeling and Optimization to Improve Identification and Treatment of Alzheimer’s Disease

Description

Mathematical modeling and decision-making within the healthcare industry have provided the means to quantitatively evaluate the impact of decisions on the diagnosis, screening, and treatment of diseases. In this work, we look into a specific, yet very important disease, Alzheimer's Disease (AD). In the United States, AD is the 6th leading cause of death. Diagnosis of AD cannot be confidently confirmed until after death. This has raised the importance of early diagnosis of AD based upon symptoms of cognitive decline. A symptom of early cognitive decline and indicator of AD is Mild Cognitive Impairment (MCI). In addition to this qualitative test, biomarker tests have been proposed in the medical field, including p-Tau, FDG-PET, and hippocampal. These tests can be administered to patients as early detectors of AD, thus improving patients' quality of life and potentially reducing the costs of the health structure. Preliminary work has been conducted on the development of a Sequential Tree Based Classifier (STC), which helps medical providers predict whether a patient will contract AD by sequentially administering these biomarker tests. The STC model, however, has its limitations, and a more complex, robust model is needed. In fact, STC assumes a general linear model for the status of the patient based upon the test results. We take a simulation perspective and try to define a more complex model that represents the patient's evolution in time.

Specifically, this thesis focuses on the formulation of a Markov Chain model that is complex and robust. This Markov Chain model emulates the evolution of MCI patients based upon doctor visits and the sequential administration of biomarker tests. The data used to create this Markov Chain model were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data lacked detailed information on the sequential administration of the biomarker tests, and therefore different analytical approaches were tried in order to calibrate the model. The resulting Markov Chain model made it possible to conduct experiments on different parameters of the Markov Chain and yielded different results for patients that contracted AD and those that did not, leading to important insights into the effect of thresholds and sequence on patient prediction capability as well as health cost reduction.
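As a rough illustration of how a discrete-time Markov chain can track a patient cohort across doctor visits, the sketch below propagates a state distribution through a transition matrix. The three states and every probability in `P` are hypothetical placeholders, not values calibrated from ADNI data, and the thesis's model (which also covers the sequence of biomarker tests) is not reproduced here.

```python
from typing import List

# Hypothetical states: still MCI, progressed to AD, or ended follow-up without AD.
STATES = ["MCI", "AD", "no_AD"]

# Hypothetical per-visit transition probabilities; each row sums to 1.
# "AD" and "no_AD" are treated as absorbing states in this sketch.
P = [
    [0.80, 0.15, 0.05],  # from MCI
    [0.00, 1.00, 0.00],  # from AD
    [0.00, 0.00, 1.00],  # from no_AD
]


def step(dist: List[float], matrix: List[List[float]]) -> List[float]:
    """One visit: multiply the row vector dist by the transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist: List[float], matrix: List[List[float]],
                       visits: int) -> List[float]:
    """State distribution after the given number of visits."""
    for _ in range(visits):
        dist = step(dist, matrix)
    return dist


# Everyone starts in the MCI state.
after_two_visits = distribution_after([1.0, 0.0, 0.0], P, visits=2)
```

Changing the entries of `P` and rerunning `distribution_after` is a small-scale analogue of the parameter experiments described above.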

The data in this thesis were provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI investigators did not contribute to any analysis or writing of this thesis. A list of the ADNI investigators can be found at: http://adni.loni.usc.edu/about/governance/principal-investigators/ .

Contributors: Camarena, Raquel (Author) / Pedrielli, Giulia (Thesis advisor) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2018

Performance Analysis of a Double Crane with Finite Interoperational Buffer Capacity with Multiple Fidelity Simulations

Description

With globalization on the rise, most trade happens by sea, and experts have predicted an increase in trade volumes over the next few years. With increasing trade volumes, container ships are being upsized to meet the demand. The problem with upsizing is that sea port terminals must be adequately equipped to improve the turnaround time; otherwise, upsizing would not yield the anticipated benefits. This thesis focuses on a special type of double automated crane set-up with a finite interoperational buffer capacity. The buffer is placed between the cranes, and the idea behind this research is to analyze the performance of the crane operations when this technology is adopted. This thesis proposes an approximation of this complex system, thereby addressing the computational time issue and allowing efficient analysis of the system's performance. The system is modeled in two phases. The first phase consists of developing a discrete event simulation model that makes the system evolve over time. The challenge of this model is its high processing time, which stems from performing a large number of experimental runs; this lays the foundation for the development of an analytical model of the system, for which a continuous-time Markov process approach has been adopted. Further, to improve the efficiency of the analytical model, a state aggregation approach is proposed. This thesis thus gives insight into the outcomes of the two approaches and the behavior of the error space, and the performance of the models for varying buffer capacities reflects the scope for improvement in these kinds of operational set-ups.
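To give a feel for what an analytical continuous-time Markov model of a finite buffer can deliver, the sketch below computes the steady-state occupancy of a much simpler system than the one studied in the thesis. It assumes a single birth-death chain in which one crane fills the buffer at a constant rate and the other empties it at a constant rate, both with exponential times. These assumptions, the rates, and the name `buffer_steady_state` are illustrative only, and the state aggregation approach is not shown.

```python
from typing import List


def buffer_steady_state(fill_rate: float, empty_rate: float,
                        capacity: int) -> List[float]:
    """Steady-state probabilities of holding 0..capacity containers.

    For a birth-death chain with constant rates, the probability of n
    containers is proportional to (fill_rate / empty_rate) ** n.
    """
    ratio = fill_rate / empty_rate
    unnormalized = [ratio ** n for n in range(capacity + 1)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical rates in containers per minute.
balanced = buffer_steady_state(fill_rate=2.0, empty_rate=2.0, capacity=3)
slow_fill = buffer_steady_state(fill_rate=1.0, empty_rate=2.0, capacity=1)

# Probability that the buffer is full, so the filling crane would be blocked.
blocking_probability = slow_fill[-1]
```

Evaluating such a closed form for several buffer capacities takes negligible time compared with repeated simulation runs, which is the motivation the abstract gives for pairing the simulation with an analytical model.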

Contributors: Rengarajan, Sundaravaradhan (Author) / Pedrielli, Giulia (Thesis advisor) / Ju, Feng (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2018

SPSR efficient processing of socially k-nearest neighbors with spatial range filter

Description

Social media has become popular in the past decade. Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating a lot of data, everyone is constantly looking for ways of leveraging the knowledge from social networks to make their systems more personalized to their end users. With the rapid increase in the usage of mobile phones and wearables, social media data is being tied to spatial networks. This research document proposes an efficient technique that answers socially k-Nearest Neighbors with Spatial Range Filter. The proposed approach performs a joint search on both the social and spatial domains, which radically improves the performance compared to straightforward solutions. The research document proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner to filter it based on spatial (region of interest) and social constraints (top-k closest vertices) at query time. That leads to pruning unnecessary paths during the social graph traversal procedure, and only the top-k socially close venues are returned. The research document then shows experimentally that the proposed approach outperforms existing baseline approaches by at least three times and also compares how each of the algorithms performs under various conditions on a real geo-social dataset extracted from Yelp.
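For reference, the query itself can be written as a straightforward baseline of the kind the proposed index is compared against: traverse the social graph outward from the query user and keep the first k vertices whose location falls inside the region of interest. This sketch is not the SPSR index. It assumes an unweighted graph where social closeness is hop count, a rectangular region, and hypothetical names such as `social_knn_in_range`.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def social_knn_in_range(
    graph: Dict[Hashable, Sequence[Hashable]],
    locations: Dict[Hashable, Tuple[float, float]],
    source: Hashable,
    k: int,
    rect: Rect,
) -> List[Hashable]:
    """Return up to k vertices closest to source by hop count that lie in rect.

    Breadth-first search discovers vertices in non-decreasing hop distance,
    so the first k in-range vertices discovered are the answer.
    """
    xmin, ymin, xmax, ymax = rect
    seen = {source}
    queue = deque([source])
    result: List[Hashable] = []
    while queue and len(result) < k:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            queue.append(neighbor)
            location = locations.get(neighbor)
            if location is None:
                continue  # vertices without a location cannot pass the filter
            x, y = location
            if xmin <= x <= xmax and ymin <= y <= ymax:
                result.append(neighbor)
                if len(result) == k:
                    return result
    return result


# Hypothetical geo-social graph.
graph = {"u": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
locations = {"a": (0.0, 0.0), "b": (10.0, 10.0), "c": (1.0, 1.0), "d": (2.0, 2.0)}
nearest = social_knn_in_range(graph, locations, "u", k=2, rect=(0.0, 0.0, 5.0, 5.0))
```

This baseline visits vertices regardless of where they are located, which is the wasted traversal that a combined social and spatial index aims to prune.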

Contributors: Pasumarthy, Nitin (Author) / Sarwat, Mohamed (Thesis advisor) / Papotti, Paolo (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2016

Query Workload-Aware Index Structures for Range Searches in 1D, 2D, and High-Dimensional Spaces

Description

Most current database management systems are optimized for single query execution. Yet, often, queries come as part of a query workload. Therefore, there is a need for index structures that can take into consideration existence of multiple queries in a query workload and efficiently produce accurate results for the entire query workload. These index structures should be scalable to handle large amounts of data as well as large query workloads.

The main objective of this dissertation is to create and design scalable index structures that are optimized for range query workloads. Range queries are an important type of queries with wide-ranging applications. There are no existing index structures that are optimized for efficient execution of range query workloads. There are also unique challenges that need to be addressed for range queries in 1D, 2D, and high-dimensional spaces. In this work, I introduce novel cost models, index selection algorithms, and storage mechanisms that can tackle these challenges and efficiently process a given range query workload in 1D, 2D, and high-dimensional spaces. In particular, I introduce the index structures, HCS (for 1D spaces), cSHB (for 2D spaces), and PSLSH (for high-dimensional spaces) that are designed specifically to efficiently handle range query workload and the unique challenges arising from their respective spaces. I experimentally show the effectiveness of the above proposed index structures by comparing with state-of-the-art techniques.

Contributors: Nagarkar, Parth (Author) / Candan, Kasim S (Thesis advisor) / Davulcu, Hasan (Committee member) / Sapino, Maria Luisa (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2017

Proactive Real-time Control of Multiple Interdependent Water Quality Variables in Buildings Water Networks

Description

Efforts to enhance the quality of life and promote better health have led to improved water quality standards. Adequate daily fluid intake, primarily from tap water, is crucial for human health. By improving drinking water quality, negative health effects associated with consuming inadequate water can be mitigated. Although the United States Environmental Protection Agency (EPA) sets and enforces federal water quality limits at water treatment plants, water quality reaching end users degrades during the water delivery process, emphasizing the need for proactive control systems in buildings to ensure safe drinking water.

Future commercial and institutional buildings are anticipated to feature real-time water quality sensors, automated flushing and filtration systems, temperature control devices, and chemical boosters. Integrating these technologies with a reliable water quality control system that optimizes the use of chemical additives, filtration, flushing, and temperature adjustments ensures users consistently have access to water of adequate quality. Additionally, existing buildings can be retrofitted with these technologies at a reasonable cost, guaranteeing user safety. In the absence of smart buildings with the required technology, Chapter 2 describes developing an EPANET-MSX (a multi-species extension of EPA’s water simulation tool) model for a typical 5-story building. Chapter 3 involves creating accurate nonlinear approximation models of EPANET-MSX’s complex fluid dynamics and chemical reactions and developing an open-loop water quality control system that can regulate the water quality based on the approximated state of water quality. To address potential sudden changes in water quality, improve predictions, and reduce the gap between approximated and true state of water quality, a feedback control loop is developed in Chapter 4.

Lastly, this dissertation includes the development of a reinforcement learning (RL) based water quality control system for cases where the approximation models prove inadequate and cause instability during implementation with a real building water network. The RL-based control system can be implemented in various buildings without the need to develop new hydraulic models and can handle the stochastic nature of water demand, ensuring the proactive control system’s effectiveness in maintaining water quality within safe limits for consumption.

Contributors: Ghasemzadeh, Kiarash (Author) / Mirchandani, Pitu (Thesis advisor) / Boyer, Treavor (Committee member) / Ju, Feng (Committee member) / Pedrielli, Giulia (Committee member) / Arizona State University (Publisher)

Created: 2023
