Matching Items (5)

GCKEngine - An Algorithm for Automatic Ontology Building

Description

To facilitate the development of the Semantic Web, we propose in this thesis a general automatic ontology-building algorithm which, given a pool of potential terms and a set of relationships to include in the ontology, can use information gathered from Google queries to build a full ontology for a given domain. We used this ontology-building algorithm as part of a larger system that tags computer tutorials for three systems with different kinds of metadata and indexes the tagged documents into a search engine. Our evaluation of the resulting search engine indicates that the algorithm builds ontologies of relatively good quality and that these ontologies can be used to apply metadata to documents effectively.
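The abstract does not say how query results are turned into ontology edges. The sketch below shows one plausible reading, not GCKEngine itself: each relationship is expressed as a textual pattern (here a Hearst-style "{parent} such as {child}" pattern, an assumption), and each term is attached to the candidate parent whose filled-in pattern is most frequent relative to the term's own hit count. The `hits` callable, the scoring ratio, and the `min_score` threshold are hypothetical, and the toy counts stand in for real search results.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hit-count lookup: maps a quoted query string to a result count.
# The thesis gathers this kind of information from Google queries; this
# interface is an assumption made for illustration.
HitCounter = Callable[[str], int]


def build_taxonomy(
    terms: List[str],
    pattern: str,
    hits: HitCounter,
    min_score: float = 0.01,
) -> Dict[str, Optional[str]]:
    """Attach each term to the candidate parent with the highest pattern score.

    score(child, parent) = hits('"<pattern filled in>"') / hits('"<child>"')
    Terms whose best score does not exceed min_score stay roots (None).
    """
    parents: Dict[str, Optional[str]] = {}
    for child in terms:
        child_hits = hits(f'"{child}"')
        best: Optional[str] = None
        best_score = min_score
        for parent in terms:
            if parent == child or child_hits == 0:
                continue
            phrase = pattern.format(child=child, parent=parent)
            score = hits(f'"{phrase}"') / child_hits
            if score > best_score:
                best, best_score = parent, score
        parents[child] = best
    return parents


# Toy counts standing in for search results (invented for the example).
counts = {
    '"software"': 10000,
    '"editor"': 2000,
    '"vim"': 500,
    '"editor such as vim"': 50,
    '"software such as vim"': 10,
    '"software such as editor"': 100,
}
taxonomy = build_taxonomy(
    ["software", "editor", "vim"], "{parent} such as {child}", lambda q: counts.get(q, 0)
)
print(taxonomy)  # {'software': None, 'editor': 'software', 'vim': 'editor'}
```

Each additional relationship type would get its own pattern and a separate pass; normalizing by the child's hit count keeps very common terms from winning every comparison.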

Date Created
  • 2013-05

Leveraging metadata for extracting robust multi-variate temporal features

Description

In recent years, an increasing number of applications use multi-variate time series data, in which multiple uni-variate time series coexist. However, there is a lack of systematic study of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing a robust multi-variate temporal (RMT) feature extraction algorithm that can be used for locating, filtering, and describing salient features in multi-variate time series data sets. The proposed RMT features can also support multiple analysis tasks, such as visualization, segmentation, and search and retrieval based on multi-variate time series similarity. Experiments confirm that the proposed feature extraction algorithm is highly efficient and effective in identifying robust multi-scale temporal features of multi-variate time series.

Date Created
  • 2013

Multi-variate time series similarity measures and their robustness against temporal asynchrony

Description

The amount of time series data generated is increasing due to the integration of sensor technologies with everyday applications, such as gesture recognition, energy optimization, health care, and video surveillance. The simultaneous use of multiple sensors to capture different aspects of real-world attributes has also led to an increase in dimensionality, from uni-variate to multi-variate time series. This has enabled richer data representations but has also created a need for algorithms that determine the similarity between two multi-variate time series for search and analysis.

Various algorithms have been extended from the uni-variate to the multi-variate case, such as multi-variate versions of Euclidean distance, edit distance, and dynamic time warping. However, it has not been studied how these algorithms account for asynchrony in time series. Human gestures, for example, exhibit asynchrony because different subjects perform the same gesture with varying movements and at different speeds. In this thesis, we propose several algorithms, some of which leverage metadata describing the relationships among the variates. In particular, we present several techniques that leverage the contextual relationships among the variates when measuring multi-variate time series similarity. Because giving the same weight to each dimension can lead to misclassification, various weighting mechanisms are proposed, based on the way correlation is leveraged, that determine the importance of each dimension for discriminating between time series. We then study the robustness of the considered techniques against different temporal asynchronies, including shifts and stretching.
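To make the asynchrony problem concrete, the sketch below contrasts a lock-step weighted Euclidean distance with a weighted multi-variate DTW on two copies of the same bump shifted by one time step. It is a minimal illustration, not the thesis's algorithms: it uses the "dependent" DTW variant (one warping path shared by all variates), and the per-dimension `weights` are supplied by hand rather than derived from correlation or external metadata as the thesis proposes.

```python
import math
from typing import Sequence

# series[t][d] is the value of variate d at time step t.
Series = Sequence[Sequence[float]]


def _local_cost(x: Sequence[float], y: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted squared Euclidean distance between two multi-variate samples."""
    return sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y))


def weighted_euclidean(a: Series, b: Series, weights: Sequence[float]) -> float:
    """Lock-step distance: compares a[t] with b[t]; assumes equal lengths."""
    if len(a) != len(b):
        raise ValueError("lock-step distance needs equal-length series")
    return math.sqrt(sum(_local_cost(x, y, weights) for x, y in zip(a, b)))


def weighted_mdtw(a: Series, b: Series, weights: Sequence[float]) -> float:
    """Dependent multi-variate DTW: one warping path shared by all variates."""
    n, m = len(a), len(b)
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = _local_cost(a[i - 1], b[j - 1], weights)
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return math.sqrt(dp[n][m])


# The same gesture-like bump, with b shifted one step later in time.
a = [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0)]
b = [(0, 0), (0, 0), (1, 0), (2, 0), (1, 0)]
print(weighted_euclidean(a, b, (1.0, 1.0)))  # 2.0
print(weighted_mdtw(a, b, (1.0, 1.0)))       # 1.0
```

Warping absorbs most of the shift, while the weights decide how much each variate contributes; setting a noisy variate's weight to zero removes it from the comparison entirely.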

Exhaustive experiments were carried out on datasets with multiple types and amounts of temporal asynchrony. We observed that the accuracy of algorithms that rely on the data to discover variate relationships can be low in the presence of temporal asynchrony, whereas algorithms that rely on external metadata tend to be more robust against asynchronous distortions. Specifically, algorithms using external metadata achieve better classification accuracy and cluster separation than existing state-of-the-art work, such as EROS, PCA, and naive dynamic time warping.

Date Created
  • 2015

A Framework for Top-k queries over weighted RDF graphs

Description

The Resource Description Framework (RDF) is a specification that aims to support the conceptual modeling of metadata, or information about resources, in the form of a directed graph composed of triples of knowledge (facts). RDF also provides mechanisms to encode meta-information (such as source, trust, and certainty) about facts already existing in a knowledge base through a process called reification. In this thesis, an extension to the current RDF specification is proposed in order to enhance RDF triples with an application-specific weight (cost). Unlike reification, this extension treats these additional weights as first-class knowledge attributes in the RDF model, which can be leveraged by the underlying query engine. Additionally, current RDF query languages, such as SPARQL, have limited expressive power, which restricts the capabilities of applications that use them. Moreover, even in the presence of language extensions, current RDF stores do not provide methods and tools to process extended queries efficiently and effectively. To overcome these limitations, a set of novel primitives for the SPARQL language is proposed to express Top-k queries using traditional query patterns as well as novel predicates inspired by those of the XPath language. In addition, an extended query processing engine is developed to support efficient ranked path search, join, and indexing. Several query optimization strategies are also proposed, which employ heuristics, advanced indexing tools, and two graph metrics: proximity and sub-result inter-arrival time. These strategies aim to find join orders that reduce the total query execution time while avoiding worst-case pattern combinations. Finally, an extensive experimental evaluation shows that using these two metrics in query optimization has a significant impact on the performance and efficiency of Top-k queries.
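As a rough picture of what weighted triples and ranked path search involve, the sketch below stores triples as `(subject, predicate, object, cost)` tuples and enumerates the k cheapest simple paths between two resources with a best-first search. The tuple representation and the `top_k_paths` function are hypothetical; they do not reflect the thesis's RDF extension syntax, its SPARQL primitives, or its indexing and join optimizations.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str, float]  # (subject, predicate, object, cost)
Edge = Tuple[str, str, str]           # (subject, predicate, object)


def top_k_paths(
    triples: List[Triple], source: str, target: str, k: int
) -> List[Tuple[float, List[Edge]]]:
    """Return up to k cheapest simple paths from source to target.

    Costs must be non-negative: partial paths are expanded best-first,
    so complete paths leave the heap in non-decreasing total cost.
    """
    out: Dict[str, List[Tuple[str, str, float]]] = defaultdict(list)
    for s, p, o, c in triples:
        if c < 0:
            raise ValueError("costs must be non-negative")
        out[s].append((p, o, c))

    # Heap entries: (cost so far, current node, nodes on the path, edges taken).
    heap = [(0.0, source, (source,), ())]
    results: List[Tuple[float, List[Edge]]] = []
    while heap and len(results) < k:
        cost, node, nodes, edges = heapq.heappop(heap)
        if node == target:
            results.append((cost, list(edges)))
            continue
        for p, o, c in out[node]:
            if o not in nodes:  # keep paths simple (no cycles)
                heapq.heappush(heap, (cost + c, o, nodes + (o,), edges + ((node, p, o),)))
    return results


triples = [
    ("alice", "knows", "bob", 1.0),
    ("bob", "knows", "carol", 1.0),
    ("alice", "knows", "carol", 3.0),
    ("alice", "worksWith", "dave", 0.5),
    ("dave", "knows", "carol", 2.0),
]
for cost, path in top_k_paths(triples, "alice", "carol", k=2):
    print(cost, path)
# 2.0 [('alice', 'knows', 'bob'), ('bob', 'knows', 'carol')]
# 2.5 [('alice', 'worksWith', 'dave'), ('dave', 'knows', 'carol')]
```

Because the weight is an ordinary field of each triple rather than a reified statement, the search can read it directly; this exhaustive enumeration can be exponential in the worst case, which is the kind of cost that indexing and join-order optimization aim to avoid.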
Further experiments also show that proximity and inter-arrival time have an even greater, although sometimes undesirable, impact when combined through aggregation functions. Based on these results, a hybrid algorithm is proposed that treats proximity as more important than inter-arrival time, because proximity is the more complete metric, and combines the two metrics in a fine-grained way: it analyzes the differences between their individual scores and aggregates them only if these differences are negligible.
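The hybrid rule can be sketched as a small scoring function. Several details here are assumptions rather than the thesis's definitions: both metrics are taken to be normalized to [0, 1] with higher meaning a better join order, "negligible" is a fixed threshold `eps`, and the aggregation function is an arithmetic mean. The names `hybrid_score` and `rank_candidates` are hypothetical.

```python
from typing import List, Tuple

# (candidate id, proximity score, inter-arrival score); scores assumed
# normalized to [0, 1] with higher meaning "better join order".
Candidate = Tuple[str, float, float]


def hybrid_score(proximity: float, inter_arrival: float, eps: float = 0.05) -> float:
    """Aggregate the two metrics only when they roughly agree.

    If the scores differ by at most eps, return their mean (an assumed
    aggregation function); otherwise trust proximity alone.
    """
    if abs(proximity - inter_arrival) <= eps:
        return (proximity + inter_arrival) / 2
    return proximity


def rank_candidates(cands: List[Candidate], eps: float = 0.05) -> List[str]:
    """Order candidate join orders by descending hybrid score."""
    ranked = sorted(cands, key=lambda c: hybrid_score(c[1], c[2], eps), reverse=True)
    return [c[0] for c in ranked]


cands = [("A", 0.60, 0.62), ("B", 0.70, 0.10), ("C", 0.65, 0.67)]
print(rank_candidates(cands))  # ['B', 'C', 'A']
```

Candidate B keeps its proximity score of 0.70 because its inter-arrival score disagrees sharply, so a poor inter-arrival estimate cannot drag down the metric the abstract treats as more reliable.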

Date Created
  • 2010

The impact of subject indexes on semantic indeterminacy in enterprise document retrieval

Description

Ample evidence exists to support the conclusion that enterprise search is failing its users. This failure is costing corporate America billions of dollars every year. Most enterprise search engines are built using web search engines as their foundations. These search engines are optimized for web use and are inadequate when used inside the firewall. Without the ability to use popularity-based measures for ranking documents returned to the searcher, these search engines must rely on full-text search technologies. The Information Science literature explains why full-text search, by itself, fails to adequately discriminate relevant from irrelevant documents. This failure in discrimination results in far too many documents being returned to the searcher, which causes enterprise searchers to abandon their searches in favor of re-creating the documents or information they seek. This dissertation describes and evaluates a potential solution to the problem of failed enterprise search derived from the Information Science literature: subject-aided search. In subject-aided search, full-text search is augmented with a search of subject metadata coded into each document based upon a hierarchically structured subject index. Using the Design Science methodology, this dissertation develops and evaluates three IT artifacts in the search for a solution to the wicked problem of enterprise search failure.
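A minimal sketch of the subject-aided idea, assuming a toy corpus: full-text matching alone returns every document containing "budget", while restricting matches to documents coded under a subject (or any of its descendants in the hierarchical index) removes the off-topic hit. The data structures, whitespace tokenization, and term-count scoring are illustrative assumptions, not the three IT artifacts the dissertation evaluates.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical hierarchical subject index: subject -> child subjects.
SubjectIndex = Dict[str, List[str]]
# Hypothetical corpus: doc id -> (full text, subject codes assigned to the doc).
Corpus = Dict[str, Tuple[str, Set[str]]]


def subtree(index: SubjectIndex, root: str) -> Set[str]:
    """Return root and every subject beneath it in the index."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        subject = stack.pop()
        if subject not in seen:
            seen.add(subject)
            stack.extend(index.get(subject, []))
    return seen


def subject_aided_search(
    docs: Corpus, index: SubjectIndex, terms: List[str], subject: Optional[str] = None
) -> List[str]:
    """Full-text match, optionally restricted to a subject subtree."""
    allowed = subtree(index, subject) if subject is not None else None
    hits = []
    for doc_id, (text, subjects) in docs.items():
        words = set(text.lower().split())
        score = sum(term.lower() in words for term in terms)  # matched query terms
        if score == 0:
            continue
        if allowed is not None and not (subjects & allowed):
            continue  # text matches, but the document is coded under another subject
        hits.append((score, doc_id))
    # Highest score first; ties broken by doc id for a stable order.
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


index = {"finance": ["accounting", "payroll"], "engineering": ["hardware"]}
docs = {
    "d1": ("Quarterly budget report", {"accounting"}),
    "d2": ("Budget for server hardware", {"hardware"}),
    "d3": ("Payroll budget policy", {"payroll"}),
}
print(subject_aided_search(docs, index, ["budget"]))             # ['d1', 'd2', 'd3']
print(subject_aided_search(docs, index, ["budget"], "finance"))  # ['d1', 'd3']
```

Searching the subtree rather than the exact subject code lets a broad subject such as "finance" retrieve documents coded with narrower subjects beneath it.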

Date Created
  • 2012