Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in H. V. Jagadish's keynote speech at SIGMOD'07, and as commonly agreed in the database community, the usability of structured data by casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information from it easily, given a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users: not only are these languages sophisticated, but users also need to know the data schema. Keyword search offers a convenient way to access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging because of several types of ambiguity, namely structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), and user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as efficiency challenges due to the large search space. This dissertation performs an expansive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, and result summarization/query expansion. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views.
These works deliver significant technical contributions towards building a full-fledged search engine for structured data.
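
As a rough illustration of issue (1), the sketch below returns the smallest XML subtrees that contain every query keyword. This "smallest subtree" heuristic is a common starting point in the XML keyword search literature and is an assumption here, not necessarily the method developed in the dissertation. The function name, the substring matching rule, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


def smallest_matching_subtrees(root, keywords):
    """Return elements whose subtree holds every keyword while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(elem):
        # Keywords matched by this element's own tag name and direct text.
        own_text = f"{elem.tag} {elem.text or ''}".lower()
        found = {k for k in wanted if k in own_text}
        child_is_complete = False
        for child in elem:
            child_found, child_complete = visit(child)
            found |= child_found
            child_is_complete = child_is_complete or child_complete
        complete = found == wanted
        # Keep only the lowest element that covers all keywords.
        if complete and not child_is_complete:
            results.append(elem)
        return found, complete

    visit(root)
    return results


# Hypothetical sample data for illustration only.
SAMPLE = (
    "<bib>"
    "<book><title>Databases</title><author>Chen</author></book>"
    "<book><title>Networks</title><author>Liu</author></book>"
    "</bib>"
)
matches = smallest_matching_subtrees(ET.fromstring(SAMPLE), ["databases", "chen"])
```

Here the query carries no structure, yet the result is a single `book` element rather than the whole `bib` document, which is one simple way of resolving structural ambiguity.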

Contributors: Liu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created: 2011

Association based prioritization of genes

Description

Genes differ widely in their pertinence to the etiology and pathology of diseases. Thus, they can be ranked according to their disease significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis for determining the significance of other candidate genes, which are then ranked by the association they exhibit with the set of known genes. Experimental and computational data of various kinds differ in their reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance than two well-known gene prioritization algorithms. Essentially no bias in performance was seen when it was applied to diseases of diverse etiology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of association-based gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcoming this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms, and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy.
We developed computational methods for the prediction and improvement of transcription factor binding patterns. Tests of the improvement method using synthetic patterns under various conditions showed that the method is very robust and that the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our network-based model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophrenia-specific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets is very positive.
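
To make the idea of association-based ranking concrete, the sketch below scores candidate genes by a random walk with restart from the known disease genes over a weighted network. Random walk with restart is a widely used stand-in chosen for illustration; the abstract does not say which scoring scheme the dissertation uses. The function name, the edge weights (standing for source reliability), and the toy gene names are hypothetical.

```python
def prioritize(edges, seeds, restart=0.3, iterations=50):
    """Rank non-seed genes by random walk with restart from the seed genes.

    edges: dict mapping (gene_a, gene_b) -> positive weight (assumed reliability).
    seeds: set of genes already known to be related to the disease.
    """
    # Build a symmetric weighted adjacency map.
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight
    genes = sorted(neighbors)
    seeds_in_graph = [g for g in genes if g in seeds]
    # The walk restarts uniformly over the seed genes present in the network.
    start = {g: (1.0 / len(seeds_in_graph) if g in seeds else 0.0) for g in genes}
    score = dict(start)
    for _ in range(iterations):
        nxt = {g: restart * start[g] for g in genes}
        for g in genes:
            total = sum(neighbors[g].values())
            for n, weight in neighbors[g].items():
                # Spread this gene's score to neighbors in proportion to edge weight.
                nxt[n] += (1 - restart) * score[g] * weight / total
        score = nxt
    candidates = [g for g in genes if g not in seeds]
    return sorted(candidates, key=lambda g: score[g], reverse=True)


# Toy chain network: G1 - G2 - G3 - G4, with G1 as the known disease gene.
toy_edges = {("G1", "G2"): 1.0, ("G2", "G3"): 1.0, ("G3", "G4"): 1.0}
ranking = prioritize(toy_edges, {"G1"})
```

On this toy chain the candidates come out ordered by how strongly they are associated with the seed gene. A sparse network leaves many candidates with near-zero scores, which is the limitation the abstract raises.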

Contributors: Lee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created: 2011

Multi-carrier communications over underwater acoustic channels

Description

Underwater acoustic communications face significant challenges unprecedented in terrestrial radio communications, including long multipath delay spreads, strong Doppler effects, and stringent bandwidth requirements. Recently, multi-carrier communications based on orthogonal frequency division multiplexing (OFDM) have seen significant growth in underwater acoustic (UWA) communications, thanks to their well-known robustness against severely time-dispersive channels. However, the performance of OFDM systems over UWA channels deteriorates significantly due to severe intercarrier interference (ICI) resulting from rapid time variations of the channel. With the motivation of developing enabling techniques for OFDM over UWA channels, the major contributions of this thesis include (1) two effective frequency-domain equalizers that provide general means to counteract the ICI; (2) a family of multiple-resampling receiver designs dealing with distortions caused by user and/or path specific Doppler scaling effects; (3) a proposal to use orthogonal frequency division multiple access (OFDMA) as an effective multiple access scheme for UWA communications; (4) the capacity evaluation for single-resampling versus multiple-resampling receiver designs. All of the proposed receiver designs have been verified through both simulations and emulations based on data collected in real-life UWA communications experiments. In particular, the frequency-domain equalizers are shown to be effective with significantly reduced pilot overhead and to offer robustness against Doppler and timing estimation errors. The multiple-resampling designs, where each branch is tasked with the Doppler distortion of different paths and/or users, overcome the disadvantages of the commonly used single-resampling receivers and yield significant performance gains. Multiple-resampling receivers are also demonstrated to be necessary for UWA OFDMA systems.
The unique design effectively mitigates interuser interference (IUI), opening up the possibility of exploiting advanced user subcarrier assignment schemes. Finally, the benefits of the multiple-resampling receivers are further demonstrated through channel capacity evaluation results.

Contributors: Tu, Kai (Author) / Duman, Tolga M. (Thesis advisor) / Zhang, Junshan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created: 2011

Practical coding schemes for multi-user communications

Description

Many wireless communication and networking applications require high transmission rates and reliability with only limited resources in terms of bandwidth, power, hardware complexity, etc. Real-time video streaming, gaming and social networking are a few such examples. Over the years many problems have been addressed towards the goal of enabling such applications; however, significant challenges remain, particularly in the context of multi-user communications. With the motivation of addressing some of these challenges, the main focus of this dissertation is the design and analysis of capacity-approaching coding schemes for several (wireless) multi-user communication scenarios. Specifically, three main themes are studied: superposition coding over broadcast channels, practical coding for binary-input binary-output broadcast channels, and signalling schemes for two-way relay channels. As the first contribution, we propose an analytical tool that allows for reliable comparison of different practical codes and decoding strategies over degraded broadcast channels, even for very low error rates for which simulations are impractical. The second contribution deals with binary-input binary-output degraded broadcast channels, for which an optimal encoding scheme that achieves the capacity boundary is found, and a practical coding scheme is given by concatenation of an outer low-density parity-check code and an inner (non-linear) mapper that induces the desired distribution of "one" in a codeword. The third contribution considers two-way relay channels where the information exchange between two nodes takes place in two transmission phases using a coding scheme called physical-layer network coding. At the relay, a near-optimal decoding strategy is derived using a list decoding algorithm, and an approximation is obtained by a joint decoding approach.
For the latter scheme, an analytical approximation of the word error rate based on a union bounding technique is computed under the assumption that linear codes are employed at the two nodes exchanging data. Further, when the wireless channel is frequency selective, two decoding strategies at the relay are developed, namely, a near-optimal decoding scheme implemented using list decoding, and a reduced-complexity detection/decoding scheme utilizing a linear minimum mean squared error based detector followed by a network coded sequence decoder.
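
The two-phase exchange over a two-way relay channel can be sketched at the bit level as follows. This is an idealized, noiseless illustration of the network coding principle only: in physical-layer network coding the relay has to estimate the combined sequence from the superimposed received signal, which is where the decoding strategies described above come in. The function names and bit sequences are hypothetical.

```python
def relay_combine(bits_a, bits_b):
    """Network-coded sequence the relay broadcasts: the bitwise XOR of both messages."""
    return [a ^ b for a, b in zip(bits_a, bits_b)]


def recover(own_bits, relay_bits):
    """Each node removes its own message from the broadcast to get the other's."""
    return [own ^ coded for own, coded in zip(own_bits, relay_bits)]


# Hypothetical messages held by the two nodes.
message_a = [1, 0, 1, 1]
message_b = [0, 0, 1, 0]

# Phase 1: both nodes transmit and the relay obtains the XOR (assumed error-free here).
broadcast = relay_combine(message_a, message_b)

# Phase 2: the relay broadcasts, and each node recovers the other node's message.
received_at_a = recover(message_a, broadcast)
received_at_b = recover(message_b, broadcast)
```

Because each node already knows its own message, one broadcast serves both directions, which is why the exchange fits in two phases.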

Contributors: Bhat, Uttam (Author) / Duman, Tolga M. (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Li, Baoxin (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created: 2011

Incorporating auditory models in speech/audio applications

Description

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly or indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem addresses the high computational complexity involved in solving perceptual objective functions that require repeated application of the auditory model to evaluate different candidate solutions. In this dissertation, a frequency pruning algorithm and a detector pruning algorithm are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to an 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals; it employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns.
The second problem involves obtaining an estimate of the auditory representation that minimizes a perceptual objective function and transforming the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages, which ensures that a time/frequency mapping corresponding to the estimated auditory representation is obtained. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.

Contributors: Krishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2011

The ensemble étude for violins: an examination with an annotated survey of violin trios and quartets and an original étude for four violins

Description

Études written for violin ensemble, which include violin duets, trios, and quartets, are less numerous than solo études. These works rarely go by the title "étude," and have not been the focus of much scholarly research. Ensemble études have much to offer students, teachers and composers, however, because they add an extra dimension to the learning, teaching, and composing processes. This document establishes the value of ensemble études in pedagogy and explores applications of the repertoire currently available. Rather than focus on violin duets, the most common form of ensemble étude, it mainly considers works for three and four violins without accompaniment. Concentrating on the pedagogical possibilities of studying études in a group, this document introduces creative ways that works for violin ensemble can be used as both études and performance pieces. The first two chapters explore the history and philosophy of the violin étude and multiple-violin works, the practice of arranging solo études for multiple instruments, and the benefits of group learning and cooperative learning that distinguish ensemble étude study from solo étude study. The third chapter is an annotated survey of works for three and four violins without accompaniment, and serves as a pedagogical guide to some of the available repertoire. Representing a wide variety of styles, techniques and levels, it illuminates an historical association between violin ensemble works and pedagogy. The fourth chapter presents an original composition by the author, titled Variations on a Scottish Folk Song: Étude for Four Violins, with an explanation of the process and techniques used to create this ensemble étude. This work is an example of the musical and technical integration essential to étude study, and demonstrates various compositional traits that promote cooperative learning.
Ensemble études are valuable pedagogical tools that deserve wider exposure. It is my hope that the information and ideas about ensemble études in this paper and the individual descriptions of the works presented will increase interest in and application of violin trios and quartets at the university level.

Contributors: Lundell, Eva Rachel (Contributor) / Swartz, Jonathan (Thesis advisor) / Rockmaker, Jody (Committee member) / Buck, Nancy (Committee member) / Koonce, Frank (Committee member) / Norton, Kay (Committee member) / Arizona State University (Publisher)

Created: 2011

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. 
Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.
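
The difference between recomputing a view and maintaining it incrementally from streamed deltas can be sketched as below. This is a minimal in-memory illustration under stated assumptions: it uses plain Python dictionaries rather than LINQ, it does not implement the magic sets optimization, and the class name, the `(operation, order_id, customer_id)` delta layout, and the sample data are all hypothetical.

```python
class MaterializedJoinView:
    """Stores the join of orders with customers and updates it from deltas."""

    def __init__(self, customers, orders):
        self.customers = dict(customers)  # customer_id -> name
        self.orders = dict(orders)        # order_id -> customer_id
        # The full join is computed once, when the view is materialized.
        self.rows = {
            order_id: (order_id, customer_id, self.customers[customer_id])
            for order_id, customer_id in self.orders.items()
            if customer_id in self.customers
        }

    def apply_delta(self, delta):
        """Apply one change to the orders source without recomputing the view."""
        operation, order_id, customer_id = delta
        if operation == "insert":
            self.orders[order_id] = customer_id
            if customer_id in self.customers:
                self.rows[order_id] = (order_id, customer_id, self.customers[customer_id])
        elif operation == "delete":
            self.orders.pop(order_id, None)
            self.rows.pop(order_id, None)
        else:
            raise ValueError(f"unknown delta operation: {operation}")


# Hypothetical base data and a short stream of deltas.
view = MaterializedJoinView({1: "Ann", 2: "Raj"}, {10: 1})
for change in [("insert", 11, 2), ("delete", 10, 1)]:
    view.apply_delta(change)
```

Each delta touches only the affected row, so the cost of an update depends on the size of the change rather than the size of the base data sources.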

Contributors: Chaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)

Created: 2011

An investigation of finger motion and hand posture during clarinet performance

Description

Finger motion and hand posture of six professional clarinetists (defined by entrance into or completion of a doctorate of musical arts degree in clarinet performance) were recorded using a pair of CyberGloves® in Arizona State University's Center for Cognitive Ubiquitous Computing Laboratory. Performance tasks included performing a slurred three-octave chromatic scale in sixteenth notes, at sixty quarter-note beats per minute, three times, with a metronome and a short pause between repetitions, and forming three pedagogical hand postures. Following the CyberGloves® tasks, each subject completed a questionnaire about equipment, playing history, practice routines, health practices, and hand usage during computer and sports activities. CyberGlove® data were analyzed to find average hand/finger postures and differences for each pitch across subjects, subject variance in the performance task, and differences between ascending and descending postures of the chromatic scale. The data were also analyzed to describe generalized finger posture characteristics based on hand size, whether right hand thumb position affects finger flexion, and whether professional clarinetists use similar finger/hand postures when performing on clarinet, holding a tennis ball, allowing hands to hang freely by the sides, or forming a "C" shape. The findings of this study suggest that an individual approach based on hand size is necessary for teaching clarinet hand posture.

Contributors: Harger, Stefanie (Author) / Spring, Robert (Thesis advisor) / Hill, Gary (Committee member) / Koonce, Frank (Committee member) / Norton, Kay (Committee member) / Stauffer, Sandy (Committee member) / Arizona State University (Publisher)

Created: 2011

Waveform mapping and time-frequency processing of biological sequences and structures

Description

Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry, physics, computer science and electrical engineering. In particular, signal processing techniques have been applied to the problems of sequence querying and alignment, which compare and classify regions of similarity in the sequences based on their composition. However, although current approaches obtain results that can be attributed to key biological properties, they require pre-processing and lack robustness to sequence repetitions. In addition, these approaches do not provide much support for efficiently querying sub-sequences, a process that is essential for tracking localized database matches. In this work, a query-based alignment method for biological sequences is first proposed that maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as time-domain Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach is demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. The alignment localization of WAVEQuery is specifically evaluated over repetitive database segments, and the method is operable in real time without pre-processing. It is demonstrated that WAVEQuery significantly outperforms the biological sequence alignment method BLAST for DNA sequence queries with repetitive segments.
A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction. For protein alignment, it is often necessary to compare not only the one-dimensional (1-D) primary sequence structure but also the secondary and tertiary three-dimensional (3-D) space structures. This is done after considering the conformations in 3-D space due to the degrees of freedom of these structures. As a result, a novel directionality-based 3-D waveform mapping for 3-D protein structures is also proposed, and it is used to compare protein structures using a matched filter approach. By incorporating a 3-D time axis, a highly localized Gaussian-windowed chirp waveform is defined, and the amino acid information is mapped to the chirp parameters, which are then directly used to obtain directionality in 3-D space. This mapping is unique in that additional characteristic protein information such as hydrophobicity, which relates the sequence to the structure, can be added as another representation parameter. The additional parameter helps track similarities over local segments of the structure, thus enabling classification of distantly related proteins that have partial structural similarities. This approach is successfully tested for pairwise alignments over full-length structures, alignments over multiple structures to form phylogenetic trees, and alignments over local segments. Basic classification over protein structural classes using directional descriptors for the protein structure is also performed.

Contributors: Ravichandran, Lakshminarayan (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Spanias, Andreas S (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Lacroix, Zoé (Committee member) / Arizona State University (Publisher)

Created: 2011

Survey of selected contemporary Taiwanese female composers of music for solo piano

Description

The purpose of this project was to examine the lives and solo piano works of four members of the early generation of female composers in Taiwan. These four women were born between 1950 and 1960, began to appear on the Taiwanese musical scene after 1980, and were still active as composers at the time of this study. They include Fan-Ling Su (b. 1955), Hwei-Lee Chang (b. 1956), Shyh-Ji Pan-Chew (b. 1957), and Kwang-I Ying (b. 1960). Detailed biographical information on the four composers is presented and discussed. In addition, the musical form and features of all solo piano works at all levels by the four composers are analyzed, and the musical characteristics of each composer's work are discussed. The biography of a fifth composer, Wei-Ho Dai (b. 1950), is also discussed but is placed in the Appendices because her piano music could not be located. This research paper is presented in six chapters: (1) Prologue; the life and music of (2) Fan-Ling Su, (3) Hwei-Lee Chang, (4) Shyh-Ji Pan-Chew, and (5) Kwang-I Ying; and (6) Conclusion. The Prologue provides an overview of the development of Western classical music in Taiwan, a review of extant literature on the selected composers and their music, and the development of piano music in Taiwan. The Conclusion is comprised of comparisons of the four composers' music, including their personal interests and preferences as exhibited in their music. For example, all of the composers have used atonality in their music. Two of the composers, Fan-Ling Su and Kwang-I Ying, openly apply Chinese elements in their piano works, while Hwei-Lee Chang tries to avoid direct use of the Chinese pentatonic scale. The piano works of Hwei-Lee Chang and Shyh-Ji Pan-Chew are chromatic and atonal, and show an economical usage of material. Biographical information on Wei-Ho Dai and an overview of Taiwanese history are presented in the Appendices.

Contributors: Wang, Jinding (Author) / Pagano, Caio (Thesis advisor) / Campbell, Andrew (Committee member) / Humphreys, Jere T. (Committee member) / Meyer-Thompson, Janice (Committee member) / Norton, Kay (Committee member) / Arizona State University (Publisher)

Created: 2011