Search Content

Enhancing the usability of complex structured data by supporting keyword searches

Description

As H. V. Jagadish pointed out in his SIGMOD'07 keynote, and as the database community commonly agrees, the usability of structured data for casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information from it easily given a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users: not only are these languages sophisticated, but users also need to know the data schema. Keyword search offers a convenient way to access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging because of several types of ambiguity, namely structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), and user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as efficiency challenges caused by the large search space. This dissertation presents an extensive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on relevant matches and return information. (2) Resolving structural, keyword, and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, and result summarization/query expansion. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.

Contributors: Liu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created: 2011

Moving into the Digital Age: Creating a Digital Presence for Alpha Homes Management, Inc.

Description

Businesses face many uncertainties from the moment they start up onward. A business can try to recognize them and plan ahead, react to them as they occur, or be rocked by a black swan it never saw coming. How a business deals with unforeseen events can increase its potential for success or failure. With this in mind, there is no better bridge between the present and the future than planning for change, so that a company can prepare for change, adapt to it, and achieve optimal results. Interested in taking a step toward the digital age, Alpha Homes Management, Inc. (Alpha Homes) sought our help to explore ideas and options for taking the company to a new level. This Barrett Creative Project centered on designing a system for Alpha Homes that replaces its outdated paper-based system with a digital one. The project also served as the capstone project required by the information technology degree. As a supplement to the capstone, and for the Barrett Creative Project, the final product was presented to the owners of Alpha Homes Management, Inc. to be used by the business. The end goal is to provide a platform that offers a paperless environment for documentation and brings the company a step closer to having a robust internet presence. Now that the web-based application has been created and presented, the testing phase can begin to evaluate its efficacy.

Contributors: Brice-Nash, Tristan (Co-author) / Alfawzan, Mohammad (Co-author) / Doheny, Damien (Thesis director) / Rodriguez, Carlos (Committee member) / Information Technology (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

SPSR efficient processing of socially k-nearest neighbors with spatial range filter

Description

Social media has become popular in the past decade. Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating a lot of data, everyone is constantly looking for ways to leverage the knowledge from social networks to make their systems more personalized for their end users. With the rapid increase in the use of mobile phones and wearables, social media data is also being tied to spatial networks. This research document proposes an efficient technique that answers socially k-Nearest Neighbors with Spatial Range Filter queries. The proposed approach performs a joint search on both the social and spatial domains, which radically improves performance compared to straightforward solutions. The research document proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner so that it can be filtered by spatial (region of interest) and social (top-k closest vertices) constraints at query time. This prunes unnecessary paths during the social graph traversal and returns only the top-k socially close venues. The research document then shows experimentally that the proposed approach outperforms existing baseline approaches by at least three times, and also compares how each of the proposed algorithms performs under various conditions on a real geo-social dataset extracted from Yelp.
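
To make the query type concrete, here is a minimal Python sketch of a straightforward baseline for it, not the combined index the thesis proposes. It assumes social closeness is hop distance in an unweighted graph and that the spatial range is an axis-aligned rectangle; the function name `social_knn_in_range` and its parameters are hypothetical.

```python
from collections import deque


def social_knn_in_range(graph, locations, source, k, box):
    """Return up to k (vertex, hops) pairs, closest to `source` first,
    restricted to vertices located inside the rectangle `box`.

    graph:     dict mapping a vertex to an iterable of neighbors (assumed)
    locations: dict mapping a vertex to an (x, y) point; vertices that
               have no location are traversed but never returned
    box:       (min_x, min_y, max_x, max_y)
    """
    min_x, min_y, max_x, max_y = box
    seen = {source}
    queue = deque([(source, 0)])
    results = []
    # Breadth-first search visits vertices in non-decreasing hop distance,
    # so the search can stop as soon as k in-range vertices are found.
    while queue and len(results) < k:
        vertex, hops = queue.popleft()
        if vertex != source and vertex in locations:
            x, y = locations[vertex]
            if min_x <= x <= max_x and min_y <= y <= max_y:
                results.append((vertex, hops))
        for neighbor in graph.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return results
```

This baseline still walks through every socially close vertex that lies outside the region, which is the kind of wasted traversal a joint social and spatial index is meant to avoid.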

Contributors: Pasumarthy, Nitin (Author) / Sarwat, Mohamed (Thesis advisor) / Papotti, Paolo (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2016

Semantic keyword search on large-scale semi-structured data

Description

Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, two open issues of keyword search on structured and semi-structured data are not yet well addressed by existing work.

First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high-quality results from a semantic perspective. The majority of existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily "minimal". However, such result construction methods suffer from the same problem of semantic misalignment between the data and the user query. This work studies how to define results that capture users' search intention and how to generate such search-intention-aware results.

Second, most existing research is incapable of handling large-scale structured data. However, as data volume has grown rapidly in recent years, efficiently processing keyword queries on large-scale structured data has become important. MapReduce is widely acknowledged as an effective programming model for processing big data. For keyword query processing on a data graph, graph algorithms are first proposed that efficiently return query results consistent with users' search intention. These algorithms are then migrated to MapReduce to support big data. Keyword query processing on a schema graph first transforms a keyword query into multiple SQL queries and then runs all generated SQL queries on the structured data. It is therefore crucial to find the optimal way to execute a SQL query using MapReduce so as to minimize the processing time. In this work, a system called SOSQL is developed that generates the optimal MapReduce query execution plan for a SQL query Q with time complexity O(n^2), where n is the number of input tables of Q.
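
The step of turning one keyword query into several SQL queries can be illustrated with a deliberately simplified Python sketch. It is not the thesis's method: it produces one single-table query per table and ignores the joins along the schema graph that a real system would enumerate. The function name `keyword_query_to_sql` and the schema format (table name mapped to its text columns) are assumptions made for illustration.

```python
def keyword_query_to_sql(schema, keywords):
    """Generate one parameterized SQL query per table.

    schema:   dict mapping a table name to a list of text column names
              (hypothetical format; identifiers are assumed to be trusted)
    keywords: non-empty list of keyword strings
    Returns a list of (sql, params) pairs using '?' placeholders.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    queries = []
    for table, columns in schema.items():
        if not columns:
            continue  # nothing to match against in this table
        clauses, params = [], []
        for keyword in keywords:
            # A row matches a keyword if any of its text columns contains it.
            any_column = " OR ".join(f"{column} LIKE ?" for column in columns)
            clauses.append(f"({any_column})")
            params.extend([f"%{keyword}%"] * len(columns))
        # A row is a result only if it matches every keyword.
        sql = f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
        queries.append((sql, params))
    return queries
```

Each returned pair can be passed to a DB-API cursor, for example `cursor.execute(sql, params)` with Python's `sqlite3` module. Binding keywords as parameters keeps user input out of the SQL text.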

Contributors: Shan, Yi (Author) / Chen, Yi (Thesis advisor) / Bansal, Srividya (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2016

Predicting minimum control speed on the ground (VMCG) and minimum control airspeed (VMCA) of engine inoperative flight using aerodynamic database and propulsion database generators

Description

There are many computer-aided engineering tools and software packages used by aerospace engineers to design and predict specific parameters of an airplane. These tools help a design engineer predict and calculate parameters such as lift, drag, pitching moment, takeoff range, maximum takeoff weight, maximum flight range, and much more. However, there are very few ways to predict and calculate the minimum control speeds of an airplane in engine inoperative flight. There are simple solutions as well as complicated ones, yet there is neither a standard technique nor consistency throughout the aerospace industry. To further complicate this subject, airplane designers have the option of using an Automatic Thrust Control System (ATCS), which directly alters the minimum control speeds of an airplane.

This work addresses this issue with a tool used to predict and calculate the Minimum Control Speed on the Ground (VMCG) as well as the Minimum Control Airspeed (VMCA) of any existing or design-stage airplane. With simple line art of an airplane, a program called VORLAX is used to generate an aerodynamic database used to calculate the stability derivatives of an airplane. Using another program called Numerical Propulsion System Simulation (NPSS), a propulsion database is generated to use with the aerodynamic database to calculate both VMCG and VMCA.

This tool was tested using two airplanes, the Airbus A320 and the Lockheed Martin C130J-30 Super Hercules. The A320 does not use an Automatic Thrust Control System (ATCS), whereas the C130J-30 does use an ATCS. The tool was able to properly calculate and match known values of VMCG and VMCA for both of the airplanes. The fact that this tool was able to calculate the known values of VMCG and VMCA for both airplanes means that this tool would be able to predict the VMCG and VMCA of an airplane in the preliminary stages of design. This would allow design engineers the ability to use an Automatic Thrust Control System (ATCS) as part of the design of an airplane and still have the ability to predict the VMCG and VMCA of the airplane.

Contributors: Hadder, Eric Michael (Author) / Takahashi, Timothy (Thesis advisor) / Mignolet, Marc (Committee member) / White, Daniel (Committee member) / Arizona State University (Publisher)

Created: 2016

Data driven framework for prognostics

Description

Prognostics and health management (PHM) is a method that permits the reliability of a system to be evaluated in its actual application conditions. This work involved developing a robust system to determine the advent of failure. Using the data from the PHM experiment, a model was developed to estimate the prognostic features and build a condition-based system based on measured prognostics. To enable prognostics, a framework was developed to extract the load parameters required for damage assessment from irregular time-load data. As part of the methodology, a database engine was built to maintain and monitor the experimental data. This framework significantly reduces the time-load data without compromising features that are essential for damage estimation. A failure-precursor-based approach was used for remaining-life prognostics. The developed system has a throughput of 4 MB/sec, with 90% of latencies within 100 msec. This work thus provides a survey of prognostic frameworks, a prognostics framework architecture and design approach, and a robust system implementation.
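
The abstract does not say how the time-load data is reduced. One common preprocessing step for damage assessment, shown here purely as an assumed illustration, is to keep only the load reversals (local peaks and valleys) together with the endpoints. The function name `extract_reversals` and the `(time, load)` sample format are hypothetical.

```python
def extract_reversals(samples):
    """Reduce (time, load) samples to the first sample, every turning
    point (local peak or valley), and the last sample.

    Sample times may be irregularly spaced; only the order matters.
    On a flat peak or valley, the last sample of the plateau is kept.
    """
    if len(samples) < 2:
        return list(samples)
    reversals = [samples[0]]
    direction = 0  # +1 while the load rises, -1 while it falls, 0 unknown
    for previous, current in zip(samples, samples[1:]):
        delta = current[1] - previous[1]
        if delta == 0:
            continue  # a plateau does not change direction
        new_direction = 1 if delta > 0 else -1
        if direction != 0 and new_direction != direction:
            reversals.append(previous)  # the load turned at `previous`
        direction = new_direction
    reversals.append(samples[-1])
    return reversals
```

Intermediate samples on a monotonic ramp are dropped, which shrinks the data while keeping the extreme values that cycle-based damage estimates depend on.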

Contributors: Varadarajan, Gayathri (Author) / Liu, Huan (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2010

A Framework for Top-k queries over weighted RDF graphs

Description

The Resource Description Framework (RDF) is a specification that aims to support the conceptual modeling of metadata or information about resources in the form of a directed graph composed of triples of knowledge (facts). RDF also provides mechanisms to encode meta-information (such as source, trust, and certainty) about facts already existing in a knowledge base through a process called reification. In this thesis, an extension to the current RDF specification is proposed in order to enhance RDF triples with an application-specific weight (cost). Unlike reification, this extension treats these additional weights as first-class knowledge attributes in the RDF model, which can be leveraged by the underlying query engine. Additionally, current RDF query languages, such as SPARQL, have limited expressive power, which limits the capabilities of applications that use them. Moreover, even in the presence of language extensions, current RDF stores cannot provide methods and tools to process extended queries efficiently and effectively. To overcome these limitations, a set of novel primitives for the SPARQL language is proposed to express Top-k queries using traditional query patterns as well as novel predicates inspired by those of the XPath language. An extended query processor engine is also developed to support efficient ranked path search, join, and indexing. In addition, several query optimization strategies are proposed, which employ heuristics, advanced indexing tools, and two graph metrics: proximity and sub-result inter-arrival time. These strategies aim to find join orders that reduce the total query execution time while avoiding worst-case pattern combinations. Finally, extensive experimental evaluation shows that using these two metrics in query optimization has a significant impact on the performance and efficiency of Top-k queries. Further experiments also show that proximity and inter-arrival time have an even greater, although sometimes undesirable, impact when combined through aggregation functions. Based on these results, a hybrid algorithm is proposed which acknowledges that proximity is more important than inter-arrival time, due to its more complete nature, and performs a fine-grained combination of both metrics by analyzing the differences between their individual scores and performing the aggregation only if these differences are negligible.
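
As a rough illustration of ranked path search over weighted triples, the following Python sketch enumerates the k cheapest simple paths between two resources with a best-first search. It is a generic technique, not the thesis's query processor or its SPARQL primitives; the function name `top_k_paths` and the `(subject, predicate, object, weight)` tuple format are assumptions, and weights are assumed to be non-negative.

```python
import heapq


def top_k_paths(triples, source, target, k):
    """Return up to k (cost, path) pairs from `source` to `target`,
    cheapest first, where cost is the sum of triple weights.

    triples: iterable of (subject, predicate, object, weight) tuples
             with non-negative weights (assumed format)
    """
    adjacency = {}
    for subject, predicate, obj, weight in triples:
        adjacency.setdefault(subject, []).append((obj, predicate, weight))

    heap = [(0.0, [source])]  # partial paths ordered by accumulated cost
    results = []
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            # With non-negative weights, paths leave the heap in
            # non-decreasing cost order, so this is the next-best result.
            results.append((cost, path))
            continue
        for neighbor, _predicate, weight in adjacency.get(node, ()):
            if neighbor not in path:  # keep paths simple (no cycles)
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return results
```

Because results are produced in cost order, the search stops as soon as k paths have reached the target, which is the property a Top-k query needs. A production engine would add indexing and pruning, which this sketch omits.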

Contributors: Cedeno, Juan Pablo (Author) / Candan, Kasim S (Thesis advisor) / Chen, Yi (Committee member) / Sapino, Maria L (Committee member) / Arizona State University (Publisher)

Created: 2010

Project Stallion

Description

Creation of a database and Python API to clean, organize, and streamline data collection from an updated Qualtrics survey used to capture applicant information for the Fleischer Scholars Program run by the W. P. Carey UG Admissions Office.

Contributors: Moreno, Luciano (Co-author) / Gordan, Nicholas (Co-author) / Sopha, Matt (Thesis director) / Moser, Kathleen (Committee member) / Stark, Karen (Committee member) / Department of Information Systems (Contributor) / Department of Supply Chain Management (Contributor) / Dean, W.P. Carey School of Business (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Project Stallion

Description

Creation of a database and Python API to clean, organize, and streamline data collection from an updated Qualtrics survey used to capture applicant information for the Fleischer Scholars Program run by the W. P. Carey UG Admissions Office.

Contributors: Gordon, Nicolas A (Co-author) / Moreno, Luciano (Co-author) / Sopha, Matthew (Thesis director) / Moser, Kathleen (Committee member) / Department of Supply Chain Management (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Byte-Sized Politics: A Recontextualization of International Relations

Description

Our thesis is a cross-disciplinary collaboration between international relations and industrial engineering. We used a combination of database logic, programming, and Microsoft Visual Studio to organize and analyze Middle Eastern politics. Not only does the final product show raw data entry, but it can also answer complex questions about Middle Eastern relations: queries so complex that Google can't answer them. We organized and analyzed geopolitical data to make it more accessible and easier to use. We hope you enjoy it!

Contributors: Granillo-Walker, Erin (Co-author) / Gomez, Livingstone (Co-author) / Wu, Teresa (Thesis director) / Thomson, Henry (Committee member) / Historical, Philosophical & Religious Studies (Contributor) / School of Social Transformation (Contributor) / Historical, Philosophical & Religious Studies, Sch (Contributor) / School of Public Affairs (Contributor) / School of Politics and Global Studies (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05