Search Content

Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily…

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.

ContributorsLiu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created2011

Sensing and knowledge mining for structural health management

Description

Current economic conditions necessitate the extension of service lives for a variety of aerospace systems. As a result, there is an increased need for structural health management (SHM) systems to increase safety, extend life, reduce maintenance costs, and minimize downtime, lowering life cycle costs for these aging systems. The implementation…

Current economic conditions necessitate the extension of service lives for a variety of aerospace systems. As a result, there is an increased need for structural health management (SHM) systems to increase safety, extend life, reduce maintenance costs, and minimize downtime, lowering life cycle costs for these aging systems. The implementation of such a system requires a collaborative research effort in a variety of areas such as novel sensing techniques, robust algorithms for damage interrogation, high fidelity probabilistic progressive damage models, and hybrid residual life estimation models. This dissertation focuses on the sensing and damage estimation aspects of this multidisciplinary topic for application in metallic and composite material systems. The primary means of interrogating a structure in this work is through the use of Lamb wave propagation which works well for the thin structures used in aerospace applications. Piezoelectric transducers (PZTs) were selected for this application since they can be used as both sensors and actuators of guided waves. Placement of these transducers is an important issue in wave based approaches as Lamb waves are sensitive to changes in material properties, geometry, and boundary conditions which may obscure the presence of damage if they are not taken into account during sensor placement. The placement scheme proposed in this dissertation arranges piezoelectric transducers in a pitch-catch mode so the entire structure can be covered using a minimum number of sensors. The stress distribution of the structure is also considered so PZTs are placed in regions where they do not fail before the host structure. In order to process the data from these transducers, advanced signal processing techniques are employed to detect the presence of damage in complex structures. To provide a better estimate of the damage for accurate life estimation, machine learning techniques are used to classify the type of damage in the structure. A data structure analysis approach is used to reduce the amount of data collected and increase computational efficiency. In the case of low velocity impact damage, fiber Bragg grating (FBG) sensors were used with a nonlinear regression tool to reconstruct the loading at the impact site.

ContributorsCoelho, Clyde (Author) / Chattopadhyay, Aditi (Thesis advisor) / Dai, Lenore (Committee member) / Wu, Tong (Committee member) / Das, Santanu (Committee member) / Rajadas, John (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created2011

The detection of reliability prediction cues in manufacturing data from statistically controlled processes

Description

Many products undergo several stages of testing ranging from tests on individual components to end-item tests. Additionally, these products may be further "tested" via customer or field use. The later failure of a delivered product may in some cases be due to circumstances that have no correlation with the product's…

Many products undergo several stages of testing ranging from tests on individual components to end-item tests. Additionally, these products may be further "tested" via customer or field use. The later failure of a delivered product may in some cases be due to circumstances that have no correlation with the product's inherent quality. However, at times, there may be cues in the upstream test data that, if detected, could serve to predict the likelihood of downstream failure or performance degradation induced by product use or environmental stresses. This study explores the use of downstream factory test data or product field reliability data to infer data mining or pattern recognition criteria onto manufacturing process or upstream test data by means of support vector machines (SVM) in order to provide reliability prediction models. In concert with a risk/benefit analysis, these models can be utilized to drive improvement of the product or, at least, via screening to improve the reliability of the product delivered to the customer. Such models can be used to aid in reliability risk assessment based on detectable correlations between the product test performance and the sources of supply, test stands, or other factors related to product manufacture. As an enhancement to the usefulness of the SVM or hyperplane classifier within this context, L-moments and the Western Electric Company (WECO) Rules are used to augment or replace the native process or test data used as inputs to the classifier. As part of this research, a generalizable binary classification methodology was developed that can be used to design and implement predictors of end-item field failure or downstream product performance based on upstream test data that may be composed of single-parameter, time-series, or multivariate real-valued data. Additionally, the methodology provides input parameter weighting factors that have proved useful in failure analysis and root cause investigations as indicators of which of several upstream product parameters have the greater influence on the downstream failure outcomes.

ContributorsMosley, James (Author) / Morrell, Darryl (Committee member) / Cochran, Douglas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Roberts, Chell (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created2011

Association based prioritization of genes

Description

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them…

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse ethnology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of associationbased gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our networkbased model for prioritization, with encouraging results. Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse ethnology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of associationbased gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our networkbased model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophreniaspecific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets are very positive.

ContributorsLee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created2011

Multi-carrier communications over underwater acoustic channels

Description

Underwater acoustic communications face significant challenges unprecedented in radio terrestrial communications including long multipath delay spreads, strong Doppler effects, and stringent bandwidth requirements. Recently, multi-carrier communications based on orthogonal frequency division multiplexing (OFDM) have seen significant growth in underwater acoustic (UWA) communications, thanks to their well well-known robustness against severely…

Underwater acoustic communications face significant challenges unprecedented in radio terrestrial communications including long multipath delay spreads, strong Doppler effects, and stringent bandwidth requirements. Recently, multi-carrier communications based on orthogonal frequency division multiplexing (OFDM) have seen significant growth in underwater acoustic (UWA) communications, thanks to their well well-known robustness against severely time-dispersive channels. However, the performance of OFDM systems over UWA channels significantly deteriorates due to severe intercarrier interference (ICI) resulting from rapid time variations of the channel. With the motivation of developing enabling techniques for OFDM over UWA channels, the major contributions of this thesis include (1) two effective frequencydomain equalizers that provide general means to counteract the ICI; (2) a family of multiple-resampling receiver designs dealing with distortions caused by user and/or path specific Doppler scaling effects; (3) proposal of using orthogonal frequency division multiple access (OFDMA) as an effective multiple access scheme for UWA communications; (4) the capacity evaluation for single-resampling versus multiple-resampling receiver designs. All of the proposed receiver designs have been verified both through simulations and emulations based on data collected in real-life UWA communications experiments. Particularly, the frequency domain equalizers are shown to be effective with significantly reduced pilot overhead and offer robustness against Doppler and timing estimation errors. The multiple-resampling designs, where each branch is tasked with the Doppler distortion of different paths and/or users, overcome the disadvantages of the commonly-used single-resampling receivers and yield significant performance gains. Multiple-resampling receivers are also demonstrated to be necessary for UWA OFDMA systems. The unique design effectively mitigates interuser interference (IUI), opening up the possibility to exploit advanced user subcarrier assignment schemes. Finally, the benefits of the multiple-resampling receivers are further demonstrated through channel capacity evaluation results.

ContributorsTu, Kai (Author) / Duman, Tolga M. (Thesis advisor) / Zhang, Junshan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created2011

Incorporating auditory models in speech/audio applications

Description

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding…

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of auditory model for evaluation of different candidate solutions. In this dissertation, a frequency pruning and a detector pruning algorithm is developed that efficiently implements the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to 80-90 % reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals and employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns. The second problem obtains an estimate of the auditory representation that minimizes a perceptual objective function and transforms the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages that ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.

ContributorsKrishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created2011

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query…

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.

ContributorsChaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)

Created2011

Waveform mapping and time-frequency processing of biological sequences and structures

Description

Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry,…

Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry, physics, computer science and electrical engineering. In particular, signal processing techniques were applied to the problems of sequence querying and alignment, that compare and classify regions of similarity in the sequences based on their composition. However, although current approaches obtain results that can be attributed to key biological properties, they require pre-processing and lack robustness to sequence repetitions. In addition, these approaches do not provide much support for efficiently querying sub-sequences, a process that is essential for tracking localized database matches. In this work, a query-based alignment method for biological sequences that maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane is first proposed. The mapping uses waveforms, such as time-domain Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach is demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. The alignment localization of WAVEQuery is specifically evaluated over repetitive database segments, and operable in real-time without pre-processing. It is demonstrated that WAVEQuery significantly outperforms the biological sequence alignment method BLAST for queries with repetitive segments for DNA sequences. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction. For protein alignment, it is often necessary to not only compare the one-dimensional (1-D) primary sequence structure but also the secondary and tertiary three-dimensional (3-D) space structures. This is done after considering the conformations in the 3-D space due to the degrees of freedom of these structures. As a result, a novel directionality based 3-D waveform mapping for the 3-D protein structures is also proposed and it is used to compare protein structures using a matched filter approach. By incorporating a 3-D time axis, a highly-localized Gaussian-windowed chirp waveform is defined, and the amino acid information is mapped to the chirp parameters that are then directly used to obtain directionality in the 3-D space. This mapping is unique in that additional characteristic protein information such as hydrophobicity, that relates the sequence with the structure, can be added as another representation parameter. The additional parameter helps tracking similarities over local segments of the structure, this enabling classification of distantly related proteins which have partial structural similarities. This approach is successfully tested for pairwise alignments over full length structures, alignments over multiple structures to form a phylogenetic trees, and also alignments over local segments. Also, basic classification over protein structural classes using directional descriptors for the protein structure is performed.

ContributorsRavichandran, Lakshminarayan (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Spanias, Andreas S (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Lacroix, Zoé (Committee member) / Arizona State University (Publisher)

Created2011

Multiscale Modeling & Virtual Sensing for Structural Health Monitoring

Description

Damage assessment and residual useful life estimation (RULE) are essential for aerospace, civil and naval structures. Structural Health Monitoring (SHM) attempts to automate the process of damage detection and identification. Multiscale modeling is a key element in SHM. It not only provides important information on the physics of failure, such…

Damage assessment and residual useful life estimation (RULE) are essential for aerospace, civil and naval structures. Structural Health Monitoring (SHM) attempts to automate the process of damage detection and identification. Multiscale modeling is a key element in SHM. It not only provides important information on the physics of failure, such as damage initiation and growth, the output can be used as "virtual sensing" data for detection and prognosis. The current research is part of an ongoing multidisciplinary effort to develop an integrated SHM framework for metallic aerospace components. In this thesis a multiscale model has been developed by bridging the relevant length scales, micro, meso and macro (or structural scale). Micro structural representations obtained from material characterization studies are used to define the length scales and to capture the size and orientation of the grains at the micro level. Parametric studies are conducted to estimate material parameters used in this constitutive model. Numerical and experimental simulations are performed to investigate the effects of Representative Volume Element (RVE) size, defect area fraction and distribution. A multiscale damage criterion accounting for crystal orientation effect is developed. This criterion is applied for fatigue crack initial stage prediction. A damage evolution rule based on strain energy density is modified to incorporate crystal plasticity at the microscale (local). Optimization approaches are used to calculate global damage index which is used for the RVE failure prediciton. Potential cracking directions are provided from the damage criterion simultaneously. A wave propagation model is incorporated with the damage model to detect changes in sensing signals due to plastic deformation and damage growth.

ContributorsLuo, Chuntao (Author) / Chattopadhyay, Aditi (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Jiang, Hanqing (Committee member) / Dai, Lenore (Committee member) / Li, Jian (Committee member) / Arizona State University (Publisher)

Created2011

Nonlinear inelastic mechanical behavior of epoxy resin polymeric materials

Description

Polymer and polymer matrix composites (PMCs) materials are being used extensively in different civil and mechanical engineering applications. The behavior of the epoxy resin polymers under different types of loading conditions has to be understood before the mechanical behavior of Polymer Matrix Composites (PMCs) can be accurately predicted. In many…

Polymer and polymer matrix composites (PMCs) materials are being used extensively in different civil and mechanical engineering applications. The behavior of the epoxy resin polymers under different types of loading conditions has to be understood before the mechanical behavior of Polymer Matrix Composites (PMCs) can be accurately predicted. In many structural applications, PMC structures are subjected to large flexural loadings, examples include repair of structures against earthquake and engine fan cases. Therefore it is important to characterize and model the flexural mechanical behavior of epoxy resin materials. In this thesis, a comprehensive research effort was undertaken combining experiments and theoretical modeling to investigate the mechanical behavior of epoxy resins subject to different loading conditions. Epoxy resin E 863 was tested at different strain rates. Samples with dog-bone geometry were used in the tension tests. Small sized cubic, prismatic, and cylindrical samples were used in compression tests. Flexural tests were conducted on samples with different sizes and loading conditions. Strains were measured using the digital image correlation (DIC) technique, extensometers, strain gauges, and actuators. Effects of triaxiality state of stress were studied. Cubic, prismatic, and cylindrical compression samples undergo stress drop at yield, but it was found that only cubic samples experience strain hardening before failure. Characteristic points of tensile and compressive stress strain relation and load deflection curve in flexure were measured and their variations with strain rate studied. Two different stress strain models were used to investigate the effect of out-of-plane loading on the uniaxial stress strain response of the epoxy resin material. The first model is a strain softening with plastic flow for tension and compression. The influence of softening localization on material behavior was investigated using the DIC system. It was found that compression plastic flow has negligible influence on flexural behavior in epoxy resins, which are stronger in pre-peak and post-peak softening in compression than in tension. The second model was a piecewise-linear stress strain curve simplified in the post-peak response. Beams and plates with different boundary conditions were tested and analytically studied. The flexural over-strength factor for epoxy resin polymeric materials were also evaluated.

ContributorsYekani Fard, Masoud (Author) / Chattopadhyay, Aditi (Thesis advisor) / Dai, Lenore (Committee member) / Li, Jian (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Rajadas, John (Committee member) / Arizona State University (Publisher)

Created2011

Filtering by