Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in the keynote speech by H. V. Jagadish at SIGMOD'07, and as commonly agreed in the database community, the usability of structured data by casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information from it easily given a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users: not only are these languages sophisticated, but users also need to know the data schema. Keyword search offers a convenient way to access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging because of several types of ambiguity: structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), and user preference ambiguity (the user may have implicit preferences that are not indicated in the query). There are also efficiency challenges caused by the large search space. This dissertation presents an extensive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, and result summarization/query expansion. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views.
These works deliver significant technical contributions towards building a full-fledged search engine for structured data.

Contributors: Liu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created: 2011

Association based prioritization of genes

Description

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially no bias in performance was seen when it was applied to diseases of diverse etiology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of association-based gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms, and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our network-based model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophrenia-specific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets is very positive.
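The association-based ranking idea can be illustrated with a minimal sketch. This is not the method developed in the thesis: it only scores each candidate gene by the summed weight of its edges to known disease genes in a small weighted network, with the weight standing in for the reliability of the evidence. The gene labels, edge weights, and the function name prioritize are hypothetical.

```python
def prioritize(edges, seeds):
    """Rank candidate genes by total edge weight to known (seed) genes.

    edges: dict mapping an undirected pair (gene_a, gene_b) to a weight
           (hypothetical stand-in for evidence reliability).
    seeds: set of genes already known to be related to the disease.
    Returns a list of (gene, score), highest score first.
    """
    scores = {}
    for (u, v), weight in edges.items():
        # Treat each edge as undirected: check both orientations.
        for gene, neighbor in ((u, v), (v, u)):
            if neighbor in seeds and gene not in seeds:
                scores[gene] = scores.get(gene, 0.0) + weight
    # Sort by descending score, breaking ties by gene name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: S1 and S2 are known disease genes.
example_edges = {
    ("G1", "S1"): 0.75,
    ("S2", "G1"): 0.5,
    ("G2", "S1"): 0.5,
    ("G3", "G2"): 1.0,  # G3 has no direct link to a seed, so it is not scored
    ("S1", "S2"): 1.0,  # seed-to-seed edges are ignored
}
for gene, score in prioritize(example_edges, {"S1", "S2"}):
    print(gene, score)
```

In a sparse network many candidates, like G3 here, have no direct link to a known gene and receive no score at all, which is the limitation the abstract raises.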

Contributors: Lee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created: 2011

Modern Latin American repertoire for classical saxophone: a recording project and performance guide

Description

During the twentieth century, the dual influence of nationalism and modernism in the eclectic music from Latin America promoted an idiosyncratic style which naturally combined traditional themes, popular genres and secular music. The saxophone, commonly used as a popular instrument, started to develop a prominent role in Latin American classical music beginning in 1970. The lack of exposure and distribution of the Latin American repertoire has created a general perception that composers are not interested in the instrument, and that Latin American repertoire for classical saxophone is minimal. However, there are more than 1100 works originally written for saxophone in the region, and the number continues to grow. This document, Modern Latin American Repertoire for Classical Saxophone: Recording Project and Performance Guide, establishes and exhibits seven works by seven representative Latin American composers. The recording includes works by Carlos Gonzalo Guzman (Colombia), Ricardo Tacuchian (Brazil), Roque Cordero (Panama), Luis Naón (Argentina), Andrés Alén-Rodriguez (Cuba), Alejandro César Morales (Mexico) and Jose-Luis Maúrtua (Peru), featuring a range of works for solo alto saxophone to alto saxophone with piano, alto saxophone with vibraphone, and tenor saxophone with electronic tape; thus forming an important selection of Latin American repertoire. Complete recorded performances of all seven pieces are supplemented by biographical, historical, and performance practice suggestions. The result is a written and audio guide to some of the most important pieces composed for classical saxophone in Latin America, with an emphasis on fostering interest in, and research into, composers who have contributed to the development and creation of the instrument in Latin America.

Contributors: Ocampo Cardona, Javier Andrés (Author) / McAllister, Timothy (Thesis advisor) / Spring, Robert (Committee member) / Hill, Gary (Committee member) / Pilafian, Sam (Committee member) / Rogers, Rodney (Committee member) / Gardner, Joshua (Committee member) / Arizona State University (Publisher)

Created: 2011

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. 
Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.
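The idea of streaming deltas to update a materialized view, rather than recomputing it from the base sources, can be sketched as follows. This is only an illustrative assumption of how such maintenance might look for a simple equi-join; it does not use LINQ or the magic sets approach from the dissertation, and the class name MaterializedJoinView and its methods are hypothetical.

```python
from collections import defaultdict


class MaterializedJoinView:
    """Materialized equi-join of two sources, maintained from deltas.

    Each delta is (side, op, key, value), where side is "left" or "right"
    and op is "insert" or "delete". Only rows affected by the delta are
    touched; the join is never recomputed from scratch.
    """

    def __init__(self):
        self.left = defaultdict(list)   # key -> values from the left source
        self.right = defaultdict(list)  # key -> values from the right source
        self.rows = []                  # materialized (key, left, right) rows

    def apply_delta(self, side, op, key, value):
        if side not in ("left", "right") or op not in ("insert", "delete"):
            raise ValueError(f"unsupported delta: {side!r}, {op!r}")
        if side == "left":
            own, other = self.left, self.right
        else:
            own, other = self.right, self.left

        # Update the stored copy of the changed source.
        if op == "insert":
            own[key].append(value)
        else:
            own[key].remove(value)

        # Propagate the change to the view using only matching rows.
        for match in other.get(key, []):
            row = (key, value, match) if side == "left" else (key, match, value)
            if op == "insert":
                self.rows.append(row)
            else:
                self.rows.remove(row)


# Hypothetical usage: a relational order joined with XML item fragments.
view = MaterializedJoinView()
view.apply_delta("left", "insert", 1, "order-A")
view.apply_delta("right", "insert", 1, "<item>x</item>")
view.apply_delta("right", "insert", 1, "<item>y</item>")
view.apply_delta("right", "delete", 1, "<item>x</item>")
print(view.rows)
```

Each delta costs work proportional to the number of matching rows on the other side, which is the saving that incremental maintenance aims for over recomputing the whole view.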

Contributors: Chaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)

Created: 2011

Transcribing Al Grey: a legacy defined by thirteen improvisations

Description

The study of artist transcriptions is an effective vehicle for assimilating the language and style of jazz. Pairing transcriptions with historical context provides further insight into the back story of an artist's life and method. Innovators are often the subject of published studies of this kind, but transcriptions of plunger-mute master Al Grey have been overlooked. This document fills that void, combining historical context with thirteen transcriptions of Grey's trombone features and improvisations. Selection of transcribed materials was based on an examination of historically significant solos in Al Grey's fifty-five-year career. The result is a series of open-horn and plunger solos that showcase Grey's sound, technical brilliance, and wide range of dynamics and articulation. This collection includes performances from a mix of widely available and obscure recordings, the majority coming from engagements with the Count Basie Orchestra. Methods learned from the study of Al Grey's book Plunger Techniques were vital in the realization of his work. The digital transcription software Amazing Slow Downer by Roni Music aided in deciphering some of Grey's more complicated passages and, with octave displacement, helped bring previously inaudible moments to the foreground.

Contributors: Hopkins, Charles E (Author) / Pilafian, Sam (Thesis advisor) / Stauffer, Sandra (Committee member) / Solís, Ted (Committee member) / Ericson, John (Committee member) / Kocour, Michael (Committee member) / Arizona State University (Publisher)

Created: 2011

Learning in an online jazz history class

Description

This study examines the experiences of participants enrolled in an online community college jazz history course. I surveyed the participants before the course began and observed them in the online space through the duration of the course. Six students also participated in interviews during and after the course. Coded data from the interviews, surveys, and recorded discussion posts and journal entries provided evidence about the nature of interaction and engagement in learning in an online environment. I looked for evidence either supporting or detracting from a democratic online learning environment, concentrating on the categories of student engagement, freedom of expression, and accessibility. The data suggested that the participants' behaviors in and abilities to navigate the online class were influenced by their pre-existing native media habits. Participants' reasons for enrolling in the online course, which included convenience and schedule flexibility, informed their actions and behaviors in the class. Analysis revealed that perceived positive student engagement did not contribute to a democratic learning environment but rather to an easy, convenient experience in the online class. Finally, the data indicated that participants' behaviors in their future lives would not be affected by the online class in that their learning experiences were not potent enough to alter or inform their behavior in society. As online classes gain popularity, the ability of these classes to provide meaningful learning experiences must be questioned. Students in this online jazz history class presented, at times, a façade of participation and community building but demonstrated a lack of sincerity and interest in the course. The learning environment supported accessibility and freedom of expression to an extent, but students' engagement with their peers was limited. 
Overall, this study found a need for more research into the quality of online classes as learning platforms that support democracy, student-to-student interaction, and community building.

Contributors: Hunter, Robert W. (Author) / Stauffer, Sandra L (Thesis advisor) / Tobias, Evan (Thesis advisor) / Bush, Jeffrey (Committee member) / Kocour, Michael (Committee member) / Pilafian, Sam (Committee member) / Arizona State University (Publisher)

Created: 2011

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing the engineer's ability to see the role that domain-level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source-level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code, and collecting metrics and classifications into a standard file format. We developed an Eclipse plugin to act as a concept identifier that visually indicates a class as a domain concept or not. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7.5% to 10% is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one-to-one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction.
We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.
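As a rough sketch of the kind of classification the abstract describes, the following applies a plain k-nearest-neighbors vote to per-class source metrics. The feature choice (method count, field count, fan-in), the sample values, and the labels are hypothetical and are not the feature set or data used in the thesis.

```python
import math
from collections import Counter


def knn_classify(training_set, metrics, k=3):
    """Label a class by majority vote among its k nearest training examples.

    training_set: list of (metric_vector, label) pairs.
    metrics: metric vector of the class to label, same length as the
             training vectors.
    """
    nearest = sorted(
        training_set,
        key=lambda example: math.dist(example[0], metrics),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical manually classified classes:
# (method count, field count, fan-in) -> label
training = [
    ((12, 8, 5), "concept"),
    ((10, 7, 6), "concept"),
    ((11, 9, 4), "concept"),
    ((2, 1, 0), "other"),
    ((3, 0, 1), "other"),
    ((1, 2, 1), "other"),
]
print(knn_classify(training, (9, 6, 5)))
print(knn_classify(training, (2, 1, 1)))
```

In practice the metrics would usually be normalized first so that no single metric dominates the distance; that step is omitted here for brevity.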

Contributors: Carey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Application of a temporal database framework for processing event queries

Description

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to support temporal queries. The language features and formal semantic extensions to extend the relational framework are identified. The extended framework supports continuous, step-wise evaluation of temporal queries. The incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Through the integration of temporal database operators with event languages, a new class of temporal queries is made possible for querying event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence.
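To give a feel for interval-based event co-occurrence, here is a minimal sketch that is not TEQL syntax or semantics. It models each event with a half-open validity interval and reports the period during which two events hold at the same time. The IntervalEvent type, the co_occurrence function, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntervalEvent:
    """An event valid over the half-open interval [start, end)."""
    name: str
    start: int
    end: int


def co_occurrence(a: IntervalEvent, b: IntervalEvent) -> Optional[Tuple[int, int]]:
    """Return the interval during which both events hold, or None."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return (start, end) if start < end else None


# Hypothetical events with integer timestamps.
door_open = IntervalEvent("door_open", 10, 20)
alarm_on = IntervalEvent("alarm_on", 15, 30)
lights_on = IntervalEvent("lights_on", 20, 25)

print(co_occurrence(door_open, alarm_on))
print(co_occurrence(door_open, lights_on))
```

With point-based timestamps, two events co-occur only if their timestamps match exactly; intervals allow the overlap itself to be reported as temporal data in the output.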

Contributors: Shiva, Foruhar Ali (Author) / Urban, Susan D (Thesis advisor) / Chen, Yi (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2012

Representing, reasoning and answering questions about biological pathways: various applications

Description

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and drug development. Scientists studying these interconnected processes have identified various pathways involved in drug metabolism, diseases, and signal transduction, etc. High-throughput technologies, new algorithms and speed improvements over the last decade have resulted in deeper knowledge about biological systems, leading to more refined pathways. Such pathways tend to be large and complex, making it difficult for an individual to remember all aspects. Thus, computer models are needed to represent and analyze them. The refinement activity itself requires reasoning with a pathway model by posing queries against it and comparing the results against the real biological system. Many existing models focus on structural and/or factoid questions, relying on surface-level information. These are generally not the kind of questions that a biologist may ask someone to test their understanding of biological processes. Examples of questions requiring understanding of biological processes are available in introductory college level biology text books. Such questions serve as a model for the question answering system developed in this thesis. Thus, the main goal of this thesis is to develop a system that allows the encoding of knowledge about biological pathways to answer questions demonstrating understanding of the pathways. To that end, a language is developed to specify a pathway and pose questions against it. Some existing tools are modified and used to accomplish this goal. The utility of the framework developed in this thesis is illustrated with applications in the biological domain. 
Finally, the question answering system is used in real world applications by extracting pathway knowledge from text and answering questions related to drug development.

Contributors: Anwar, Saadat (Author) / Baral, Chitta (Thesis advisor) / Inoue, Katsumi (Committee member) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created: 2014

The Cavaillé-Coll organ and César Franck's Six pièces

Description

Nineteenth-century French organ builder Aristide Cavaillé-Coll and organist-composer César Franck established a foundation for the revival of organ music in France. Following the French Revolution, organ culture had degenerated because of the instrument's association with the church. Beginning with his instrument at St. Dénis, Cavaillé-Coll created a new symphonic organ that made it possible for composers to write organ music in the new Romantic aesthetic. In 1859, Franck received a new Cavaillé-Coll organ at the Parisian church where he served as organist, Sainte-Clotilde. He began experimenting with the innovations of this instrument: an expressive division, mechanical assists, new types of tone color, and an expanded pedal division. From about 1860, Franck began composing his first pieces for the Cavaillé-Coll organ; these were published in 1868 as the Six Pièces. With these compositions, Franck led the way in adapting the resources of the French symphonic organ to Romantic music. In this paper, I provide an analysis of the structure of each of the Six Pièces as a foundation for exploring ways in which Franck exploited the new features of his Cavaillé-Coll organ. I have made sound recordings to demonstrate specific examples of how the music fits the organ. Thanks to Cavaillé-Coll's innovations in organ building, Franck was able to write large-scale, multi-thematic works with the sonorous resources necessary to render them convincingly. The Six Pièces reveal a strong creative exchange between organist and organ builder, and they portend many of the subsequent developments of the French symphonic organ school.

Contributors: Sung, Anna (Author) / Marshall, Kimberly (Thesis advisor) / Ryan, Russell (Committee member) / Rogers, Rodney (Committee member) / Pagano, Caio (Committee member) / Arizona State University (Publisher)

Created: 2012
