Matching Items (17)
Description
Function-as-a-Service (FaaS) is emerging as an important cloud computing service model because it can improve scalability and usability for a wide range of applications, especially Machine Learning (ML) inference tasks that require scalable computation resources and complicated configurations. Many applications, including ML inference, rely on Graphics Processing Units (GPUs) to achieve high performance; however, support for GPUs is currently lacking in existing FaaS solutions. The unique event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must consider the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory.

This thesis presents a new GPU-enabled FaaS solution that enables functions to efficiently utilize GPUs to accelerate computations such as model inference. First, the work extends existing open-source FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it provides caching of ML models in GPU memory to improve the performance of model inference functions, and global management of GPU memories to improve cache utilization. Third, it offers co-designed GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the thesis proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing.

A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and the proposed locality-aware scheduler achieves a speedup of 34x compared to the default, load-balancing-only scheduler.
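To make the locality-aware scheduling idea concrete, here is a minimal sketch of a two-tier placement policy: prefer a GPU whose memory already caches the requested model (a cache hit avoids transferring model parameters from host memory), break ties by load, and otherwise fall back to the least-loaded GPU. The class and function names and the policy details are illustrative assumptions, not the thesis's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    """A GPU worker with a model cache and a load counter (illustrative)."""
    gpu_id: int
    cached_models: set = field(default_factory=set)
    active_tasks: int = 0

def schedule(gpus, model_id):
    """Locality-aware placement sketch: prefer cache hits, then balance load.

    1. Among GPUs that already hold `model_id` in memory, pick the least
       loaded (cache hit: no host-to-GPU transfer of model parameters).
    2. Otherwise fall back to the globally least-loaded GPU and cache there.
    """
    hits = [g for g in gpus if model_id in g.cached_models]
    target = min(hits or gpus, key=lambda g: g.active_tasks)
    target.cached_models.add(model_id)  # a real system would also evict (e.g., LRU)
    target.active_tasks += 1
    return target.gpu_id

# Example: repeated requests for "resnet50" stick to the GPU that caches it,
# while an unrelated model lands on an idle GPU.
cluster = [GPU(i) for i in range(4)]
print([schedule(cluster, m) for m in ["resnet50", "bert", "resnet50", "resnet50"]])
```

A production scheduler would also bound per-GPU load before honoring locality, which is the tension the thesis's co-designed scheduler and cache manager address.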
ContributorsHong, Sungho (Author) / Zhao, Ming (Thesis advisor) / Cao, Zhichao (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2022
Description
Similarity search in high-dimensional spaces is popular for applications like image processing, time series, and genome data. In higher dimensions, the curse of dimensionality undermines the effectiveness of most index structures, giving way to approximate methods like Locality Sensitive Hashing (LSH) for answering similarity searches. In addition to range searches and k-nearest neighbor searches, there is a need to answer negative queries formed by excluded regions in high-dimensional data. Though there has been a slew of LSH variants to improve efficiency, reduce storage, and provide better accuracy, none of these techniques can answer queries in the presence of excluded regions.

This thesis provides a novel approach to handle such negative queries, achieved by creating a prefix-based hierarchical index structure. First, the higher-dimensional space is projected to a lower-dimensional space. Then, a one-dimensional ordering is developed while retaining the hierarchical traits. The algorithm intelligently prunes irrelevant candidates while answering queries in the presence of excluded regions. While naive LSH would need to filter the negative query results out of the main results, the new algorithm minimizes the need to fetch the redundant results in the first place. Experimental results show that this reduces post-processing cost, thereby reducing query processing time.
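As a rough illustration of pruning via a one-dimensional ordering, the sketch below projects high-dimensional points onto a single dimension, sorts the keys, and skips candidate ranges that fall entirely inside an excluded interval, so rejected candidates are never fetched rather than filtered out afterward. The random projection, the interval representation of excluded regions, and the bisect-based scan are simplifying assumptions; the thesis's prefix-based hierarchical index is considerably more elaborate.

```python
import bisect
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((1000, 64))   # high-dimensional data
proj = rng.standard_normal(64)             # random projection to 1-D
keys = points @ proj                       # one-dimensional ordering
order = np.argsort(keys)
sorted_keys = keys[order]

def range_query(lo, hi, excluded):
    """Return indices whose projected key lies in [lo, hi] but outside every
    excluded interval. Excluded intervals are pruned from the scan itself,
    so rejected candidates are never fetched (unlike post-filtering)."""
    results = []
    cursor = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    for elo, ehi in sorted(excluded):
        stop = bisect.bisect_left(sorted_keys, elo)
        results.extend(order[cursor:min(stop, end)])   # keep the gap before the exclusion
        cursor = max(cursor, bisect.bisect_right(sorted_keys, ehi))
    results.extend(order[cursor:end])                  # remainder after the last exclusion
    return results

# Query [-1, 1] while excluding the sub-interval [-0.2, 0.2] up front.
print(len(range_query(-1.0, 1.0, [(-0.2, 0.2)])))
```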
ContributorsBhat, Aneesha (Author) / Candan, Kasim Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Sapino, Maria Luisa (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2016
Description
Component simulation models, such as agent-based models, may depend on spatial data associated with geographic locations. Composition of such models can be achieved using a Geographic Knowledge Interchange Broker (GeoKIB) enabled with spatial-temporal data transformation functions, each of which is responsible for a set of interactions between two independent models. The use of autonomous interaction models allows model composition without alteration of the composed component models. An interaction model must handle differences in the spatial resolutions between models, in addition to differences in their temporal input/output data types and resolutions.

A generalized GeoKIB was designed that regulates unidirectional, spatially-based interactions between composed models. Different input and output data types are used for the interaction model, depending on whether data transfer should be passive or active. Synchronization of time-tagged input/output values is achieved by making them dependent on a discrete simulation clock. An algorithm supporting spatial conversion is developed to transform any two-dimensional geographic data map between different region specifications; maps belonging to the composed models can have different regions, map cell sizes, or boundaries. The GeoKIB can be extended according to the specifications of the models to be composed and the target application domain.
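As a minimal illustration of the kind of two-dimensional map transformation such an algorithm performs, the sketch below resamples a gridded map onto a target grid with different bounds and cell sizes by nearest-cell lookup. The uniform rectangular grids and the `regrid` helper are illustrative assumptions; the actual GeoKIB conversion handles general region specifications.

```python
import numpy as np

def regrid(src, src_bounds, dst_shape, dst_bounds):
    """Resample a 2-D gridded map onto a grid with different bounds/cell size.

    Each destination cell takes the value of the source cell containing its
    center (nearest-cell resampling; an area-weighted scheme would preserve
    totals instead). Bounds are (xmin, ymin, xmax, ymax) in map coordinates.
    """
    sh, sw = src.shape
    dh, dw = dst_shape
    sx0, sy0, sx1, sy1 = src_bounds
    dx0, dy0, dx1, dy1 = dst_bounds
    # Map coordinates of destination cell centers.
    xs = dx0 + (np.arange(dw) + 0.5) * (dx1 - dx0) / dw
    ys = dy0 + (np.arange(dh) + 0.5) * (dy1 - dy0) / dh
    # Index of the source cell covering each center, clipped to the grid.
    cols = np.clip(((xs - sx0) / (sx1 - sx0) * sw).astype(int), 0, sw - 1)
    rows = np.clip(((ys - sy0) / (sy1 - sy0) * sh).astype(int), 0, sh - 1)
    return src[np.ix_(rows, cols)]

# A 4x4 source map resampled onto a finer 6x6 grid over a shifted sub-region.
src = np.arange(16).reshape(4, 4)
print(regrid(src, (0, 0, 4, 4), (6, 6), (1, 1, 3, 3)))
```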

Two separate, simplistic models were created to demonstrate model composition via the GeoKIB. An interaction model was created for each of the two directions in which the composed models interact. This exemplar is developed to demonstrate composition and simulation of geographic-based component models.
ContributorsBoyd, William Angelo (Author) / Sarjoughian, Hessam S. (Thesis advisor) / Maciejewski, Ross (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2019
Description
With the rise of big data analysis, large volumes of data need to be systematically indexed to support analytical tasks such as feature engineering, pattern recognition, data mining, and query processing. The volume, variety, and velocity of these data necessitate sophisticated systems to help researchers understand, analyze, and discover insights from heterogeneous, multidimensional data sources. Many analytical frameworks have been proposed in the literature in recent years, but challenges to accuracy, speed, and effectiveness remain; hence, a systematic approach to data signature computation and query processing in multi-dimensional space is needed. In particular, real-time and near real-time queries pose significant challenges when working with large data sets.

To address these challenges, I develop an innovative, robust multi-variate feature extraction algorithm over multi-dimensional temporal datasets, which helps understand and analyze various real-world applications. Furthermore, to answer queries over these features, I develop a novel resource-aware indexing framework that approximately solves top-k queries by leveraging onion-layer indexing in conjunction with locality sensitive hashing. The proposed indexing scheme answers top-k queries by accessing only a bounded amount of data, effectively making big data small for query processing.
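For intuition about onion-layer indexing: points are peeled into nested convex-hull layers, and because a linear scoring function attains its maximum at a hull vertex, the i-th best answer always lies within the first i layers, so a top-k query can stop after scanning k layers. The sketch below is a plain two-dimensional rendition under that assumption; the thesis combines such layers with locality sensitive hashing and resource-awareness that this toy omits.

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError

def onion_layers(points):
    """Peel points into convex-hull layers (outermost first)."""
    remaining = np.arange(len(points))
    layers = []
    while len(remaining) >= 3:
        try:
            hull = ConvexHull(points[remaining])
        except QhullError:            # degenerate (e.g., collinear) remainder
            break
        layers.append(remaining[hull.vertices])
        remaining = np.delete(remaining, hull.vertices)
    if len(remaining):
        layers.append(remaining)      # leftover points form the innermost layer
    return layers

def top_k(points, layers, weights, k):
    """Top-k under a linear score, scanning only the outermost k layers.

    The best unseen point of a linear function always lies on the current
    outermost unscanned layer, so k layers suffice for an exact top-k."""
    if k <= len(layers):
        scanned = np.concatenate(layers[:k])
    else:
        scanned = np.arange(len(points))
    scores = points[scanned] @ weights
    return scanned[np.argsort(scores)[::-1][:k]]

rng = np.random.default_rng(1)
pts = rng.random((200, 2))
print(top_k(pts, onion_layers(pts), np.array([0.7, 0.3]), k=3))
```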
ContributorsLiu, Sicong (Author) / Candan, Kasim Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Sapino, Maria Luisa (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2020
Description
Spatial data is fundamental in many applications, such as map services and land resource management. Meanwhile, spatial data inherently comes with abundant context information, because spatial entities themselves possess different properties, e.g., graph or textual information. Among such compound spatial data, geospatial graph data is one of the most challenging, owing to the complexity of graph data. Graph data is commonly used to model real-world scenarios, and searching for matching subgraphs is fundamental to retrieving and analyzing graph data. With the ubiquity of spatial data, vertices or edges in graphs are enriched with spatial location attributes alongside other non-spatial attributes. Graph-based applications integrate spatial data into the graph model and provide more spatial-aware services. The co-existence of graph and spatial data in the same geospatial graph enables new applications. To solve new problems in these applications, existing solutions build integrated systems that incorporate both graph database and spatial database engines. However, such approaches suffer from an architecture in which graph data and spatial data are isolated.

In this dissertation, I present two indexing frameworks, GeoReach and RisoTree, which can significantly accelerate queries over geospatial graphs. GeoReach includes a query operator that adds spatial-data awareness to a graph database management system. In GeoReach, neighborhood spatial information is summarized and stored on each vertex in the graph. The summarization includes three different structures, chosen according to the location distribution. These spatial summaries are used to terminate the graph search early.

RisoTree is a hierarchical tree structure in which each node is represented by a minimum bounding rectangle (MBR). The MBR of a node is a rectangle that encloses all its children. A key difference between RisoTree and the R-tree is that RisoTree attaches pre-materialized subgraph information to each index node. The subgraph information is used during the spatial index search phase to prune search paths that cannot satisfy the query graph pattern, so the RisoTree index reduces the search space at relatively light cost during spatial filtering.
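To make the vertex-summary idea concrete, the sketch below stores on each vertex an MBR over the locations of all spatial vertices reachable from it, and a reachability query prunes any branch whose summary cannot intersect the query rectangle. The single-MBR summary (one simple stand-in for the three summary structures the abstract mentions), the adjacency-list graph, and all names are illustrative assumptions.

```python
def mbr_union(a, b):
    """Union of two MBRs given as (xmin, ymin, xmax, ymax); None is empty."""
    if a is None: return b
    if b is None: return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def intersects(mbr, rect):
    return (mbr is not None and mbr[0] <= rect[2] and rect[0] <= mbr[2]
            and mbr[1] <= rect[3] and rect[1] <= mbr[3])

def build_summaries(adj, location):
    """Per-vertex MBR over all spatial vertices reachable from that vertex,
    computed by iterating to a fixed point (handles cycles; a reverse
    topological pass would be cheaper on DAGs)."""
    summary = {v: location.get(v) and (*location[v], *location[v]) for v in adj}
    changed = True
    while changed:
        changed = False
        for v in adj:
            for w in adj[v]:
                merged = mbr_union(summary[v], summary[w])
                if merged != summary[v]:
                    summary[v], changed = merged, True
    return summary

def reach_query(adj, summary, location, start, rect):
    """Is any spatial vertex inside `rect` reachable from `start`?
    Branches whose summary MBR misses `rect` are pruned immediately."""
    stack, seen = [start], set()
    while stack:
        v = stack.pop()
        if v in seen or not intersects(summary[v], rect):
            continue                  # early termination via the summary
        seen.add(v)
        if v in location and intersects((*location[v], *location[v]), rect):
            return True
        stack.extend(adj[v])
    return False

adj = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
location = {"c": (0.0, 0.0), "d": (5.0, 5.0)}
summary = build_summaries(adj, location)
print(reach_query(adj, summary, location, "a", (4.0, 4.0, 6.0, 6.0)))  # True, via d
```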
ContributorsSun, Yuhan (Author) / Sarwat, Mohamed (Thesis advisor) / Tong, Hanghang (Committee member) / Candan, Kasim S (Committee member) / Zhao, Ming (Committee member) / Arizona State University (Publisher)
Created2021
Description
The use of spatial data has become fundamental in today's world. From fitness trackers to food delivery services, almost all applications record users' location information and require clean geospatial data to enhance various features. As spatial data flows in from heterogeneous sources, various problems arise. Entity matching has been a key step in the process of producing clean, usable data. It is an amalgamation of several sub-processes, including blocking and matching; at the end of an entity matching pipeline, we obtain deduplicated records of the same real-world entity. Identifying various mentions of the same real-world location is known as spatial entity matching. While entity matching has received significant interest in the relational setting, the same cannot be said about spatial entity matching.

In this dissertation, I build an end-to-end Geospatial Entity Matching framework, GEM, exploring spatial entity matching from a novel perspective. Current state-of-the-art systems perform spatial entity matching on only one geometrical data variant. Instead of confining matching to spatial entities of point geometry type, I extend the boundaries of spatial entity matching to the more generic polygon geometry entities as well, proposing a methodology that supports three entity matching scenarios across geometrical data types: point X point, point X polygon, and polygon X polygon.

Blocking, feature vector creation, and classification are the core steps of the system. GEM comprises an efficient and lightweight blocking technique, GeoPrune, that uses the geohash encoding mechanism to prune away obvious non-matching spatial entities. Geohashing converts a point's location coordinates to an alphanumeric code string, and it proves very effective and swift as a blocking mechanism. I leverage the Apache Sedona engine to create the feature vectors. Apache Sedona is a spatial database management system capable of processing spatial SQL queries over multiple geometry types without compromising their original coordinate vector representation. In this step, I re-purpose the spatial proximity operators (SQL queries) in Apache Sedona to create spatial feature dimensions that capture the proximity between a geospatial entity pair. The last step is matching, or classification. The classification step in GEM is a pluggable component that consumes the feature vector for a spatial entity pair and determines whether the geolocations match. The component provides three machine learning models that consume the same feature vector and label the test data based on the training.

I conduct experiments with the three classifiers on multiple large-scale geospatial datasets consisting of both spatial and relational attributes. The data arrives from heterogeneous sources, and its schemas are pre-aligned manually. GEM achieves an F-measure of 1.0 for a point X point dataset with 176K total pairs, which is 42% higher than a state-of-the-art spatial EM baseline. It achieves F-measures of 0.966 and 0.993 for a point X polygon dataset with 302M total pairs and a polygon X polygon dataset with 16M total pairs, respectively.
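To illustrate GeoPrune's style of blocking, the sketch below implements the standard geohash encoding and groups point entities by a shared geohash prefix, so candidate pairs are generated only within a block. The prefix length, the point-only setting, and the in-block pairing are simplifying assumptions; the actual GeoPrune must also handle polygons and entities that fall near block boundaries.

```python
from collections import defaultdict
from itertools import combinations

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the geohash alphabet

def geohash(lat, lon, precision=6):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, bit_count, even, code = 0, 0, True, []
    while len(code) < precision:
        if even:                      # longitude bit
            mid = (lon_lo + lon_hi) / 2
            bits = bits * 2 + (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:                         # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bits = bits * 2 + (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        bit_count += 1
        if bit_count == 5:            # 5 bits -> one base-32 character
            code.append(BASE32[bits])
            bits = bit_count = 0
    return "".join(code)

def block_candidates(entities, prefix_len=5):
    """Group entities by geohash prefix; emit pairs only within a block."""
    blocks = defaultdict(list)
    for eid, (lat, lon) in entities.items():
        blocks[geohash(lat, lon)[:prefix_len]].append(eid)
    for members in blocks.values():
        yield from combinations(members, 2)

entities = {"A": (33.4255, -111.9400),  # two nearby Tempe points, one far away
            "B": (33.4260, -111.9405),
            "C": (40.7128, -74.0060)}
print(list(block_candidates(entities)))  # [('A', 'B')]; C is pruned
```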
ContributorsShah, Setu Nilesh (Author) / Sarwat, Mohamed (Thesis advisor) / Pedrielli, Giulia (Committee member) / Boscovic, Dragan (Committee member) / Arizona State University (Publisher)
Created2021
Description
In this work, I propose a novel unsupervised framework, SATLAB, to label satellite images for a given classification task. Existing models for satellite image classification, such as DeepSAT and DeepSAT-V2, rely on deep learning models that are label-hungry and require a significant amount of training data. Since manual curation of labels is expensive, I ensure that SATLAB requires zero training labels. SATLAB can work in conjunction with several generative and unsupervised machine learning models by allowing them to be seamlessly plugged into its architecture. I devise three operating modes for SATLAB (manual, semi-automatic, and automatic) which require varying levels of human intervention in creating the domain-specific labeling functions for each image; these functions can be utilized by candidate generative models such as Snorkel, as well as by other unsupervised learners in SATLAB.

Unlike existing supervised learning baselines, which extract only textural features from satellite images, SATLAB supports the extraction of both textural and geospatial features, and I empirically show that geospatial features enhance the classification F1-score by 33%. I build SATLAB on top of Apache Sedona in order to leverage its rich set of spatial query processing operators for extracting geospatial features from satellite raster images.

I evaluate SATLAB on a target binary classification task that distinguishes slum from non-slum areas, over a repository of 100K satellite images captured by the Sentinel satellite program. My 5-fold cross-validation experiments show that SATLAB achieves competitive F1-scores (0.6) using 0% labeled data, while the best supervised learning baseline achieves a 0.74 F1-score using 80% labeled data. I also show that Snorkel outperforms the alternative generative and unsupervised candidate models that can be plugged into SATLAB by 33% to 71% w.r.t. F1-score, and by 3x to 73x w.r.t. latency. Finally, downstream classifiers trained on the labels generated by SATLAB are comparable in quality (0.63 F1) to counterpart classifiers trained on manually curated labels (0.74 F1).
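For flavor, here is a minimal sketch of the weak-supervision pattern the abstract describes, using Snorkel's labeling-function API (as of Snorkel 0.9). The feature names (ndvi, building_density) and the thresholds are entirely hypothetical stand-ins for SATLAB's textural and geospatial features; only the Snorkel plumbing (labeling_function, PandasLFApplier, LabelModel) is real library API.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

SLUM, NON_SLUM, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_low_vegetation(x):
    # Hypothetical textural cue: sparse vegetation hints at dense informal housing.
    return SLUM if x.ndvi < 0.2 else ABSTAIN

@labeling_function()
def lf_building_density(x):
    # Hypothetical geospatial cue, of the kind derivable via spatial queries.
    if x.building_density > 0.8:
        return SLUM
    if x.building_density < 0.3:
        return NON_SLUM
    return ABSTAIN

df = pd.DataFrame({"ndvi": [0.1, 0.5, 0.15, 0.6],
                   "building_density": [0.9, 0.2, 0.85, 0.1]})

# Apply the labeling functions, then denoise their votes with a generative model.
L = PandasLFApplier(lfs=[lf_low_vegetation, lf_building_density]).apply(df)
model = LabelModel(cardinality=2, verbose=False)
model.fit(L, n_epochs=200, seed=0)
print(model.predict(L))  # labels usable to train a downstream classifier
```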
ContributorsAggarwal, Shantanu (Author) / Sarwat, Mohamed (Thesis advisor) / Zou, Jia (Committee member) / Boscovic, Dragan (Committee member) / Arizona State University (Publisher)
Created2022