Search Content

Waveform mapping and time-frequency processing of biological sequences and structures

Description

Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry,…

Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry, physics, computer science and electrical engineering. In particular, signal processing techniques were applied to the problems of sequence querying and alignment, that compare and classify regions of similarity in the sequences based on their composition. However, although current approaches obtain results that can be attributed to key biological properties, they require pre-processing and lack robustness to sequence repetitions. In addition, these approaches do not provide much support for efficiently querying sub-sequences, a process that is essential for tracking localized database matches. In this work, a query-based alignment method for biological sequences that maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane is first proposed. The mapping uses waveforms, such as time-domain Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach is demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. The alignment localization of WAVEQuery is specifically evaluated over repetitive database segments, and operable in real-time without pre-processing. It is demonstrated that WAVEQuery significantly outperforms the biological sequence alignment method BLAST for queries with repetitive segments for DNA sequences. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction. For protein alignment, it is often necessary to not only compare the one-dimensional (1-D) primary sequence structure but also the secondary and tertiary three-dimensional (3-D) space structures. This is done after considering the conformations in the 3-D space due to the degrees of freedom of these structures. As a result, a novel directionality based 3-D waveform mapping for the 3-D protein structures is also proposed and it is used to compare protein structures using a matched filter approach. By incorporating a 3-D time axis, a highly-localized Gaussian-windowed chirp waveform is defined, and the amino acid information is mapped to the chirp parameters that are then directly used to obtain directionality in the 3-D space. This mapping is unique in that additional characteristic protein information such as hydrophobicity, that relates the sequence with the structure, can be added as another representation parameter. The additional parameter helps tracking similarities over local segments of the structure, this enabling classification of distantly related proteins which have partial structural similarities. This approach is successfully tested for pairwise alignments over full length structures, alignments over multiple structures to form a phylogenetic trees, and also alignments over local segments. Also, basic classification over protein structural classes using directional descriptors for the protein structure is performed.

ContributorsRavichandran, Lakshminarayan (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Spanias, Andreas S (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Lacroix, Zoé (Committee member) / Arizona State University (Publisher)

Created2011

Sparse learning package with stability selection and application to alzheimer's disease

Description

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of…

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of focus. In supervised learning like regression, the data consists of many features and only a subset of the features may be responsible for the result. Also, the features might require special structural requirements, which introduces additional complexity for feature selection. The sparse learning package, provides a set of algorithms for learning a sparse set of the most relevant features for both regression and classification problems. Structural dependencies among features which introduce additional requirements are also provided as part of the package. The features may be grouped together, and there may exist hierarchies and over- lapping groups among these, and there may be requirements for selecting the most relevant groups among them. In spite of getting sparse solutions, the solutions are not guaranteed to be robust. For the selection to be robust, there are certain techniques which provide theoretical justification of why certain features are selected. The stability selection, is a method for feature selection which allows the use of existing sparse learning methods to select the stable set of features for a given training sample. This is done by assigning probabilities for the features: by sub-sampling the training data and using a specific sparse learning technique to learn the relevant features, and repeating this a large number of times, and counting the probability as the number of times a feature is selected. Cross-validation which is used to determine the best parameter value over a range of values, further allows to select the best parameter value. This is done by selecting the parameter value which gives the maximum accuracy score. With such a combination of algorithms, with good convergence guarantees, stable feature selection properties and the inclusion of various structural dependencies among features, the sparse learning package will be a powerful tool for machine learning research. Modular structure, C implementation, ATLAS integration for fast linear algebraic subroutines, make it one of the best tool for a large sparse setting. The varied collection of algorithms, support for group sparsity, batch algorithms, are a few of the notable functionality of the SLEP package, and these features can be used in a variety of fields to infer relevant elements. The Alzheimer Disease(AD) is a neurodegenerative disease, which gradually leads to dementia. The SLEP package is used for feature selection for getting the most relevant biomarkers from the available AD dataset, and the results show that, indeed, only a subset of the features are required to gain valuable insights.

ContributorsThulasiram, Ramesh (Author) / Ye, Jieping (Thesis advisor) / Xue, Guoliang (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created2011

Directional information flow and applications

Description

In the late 1960s, Granger published a seminal study on causality in time series, using linear interdependencies and information transfer. Recent developments in the field of information theory have introduced new methods to investigate the transfer of information in dynamical systems. Using concepts from Chaos and Markov theory, much of…

In the late 1960s, Granger published a seminal study on causality in time series, using linear interdependencies and information transfer. Recent developments in the field of information theory have introduced new methods to investigate the transfer of information in dynamical systems. Using concepts from Chaos and Markov theory, much of these methods have evolved to capture non-linear relations and information flow between coupled dynamical systems with applications to fields like biomedical signal processing. This thesis deals with the application of information theory to non-linear multivariate time series and develops measures of information flow to identify significant drivers and response (driven) components in networks of coupled sub-systems with variable coupling in strength and direction (uni- or bi-directional) for each connection. Transfer Entropy (TE) is used to quantify pairwise directional information. Four TE-based measures of information flow are proposed, namely TE Outflow (TEO), TE Inflow (TEI), TE Net flow (TEN), and Average TE flow (ATE). First, the reliability of the information flow measures on models, with and without noise, is evaluated. The driver and response sub-systems in these models are identified. Second, these measures are applied to electroencephalographic (EEG) data from two patients with focal epilepsy. The analysis showed dominant directions of information flow between brain sites and identified the epileptogenic focus as the system component typically with the highest value for the proposed measures (for example, ATE). Statistical tests between pre-seizure (preictal) and post-seizure (postictal) information flow also showed a breakage of the driving of the brain by the focus after seizure onset. The above findings shed light on the function of the epileptogenic focus and understanding of ictogenesis. It is expected that they will contribute to the diagnosis of epilepsy, for example by accurate identification of the epileptogenic focus from interictal periods, as well as the development of better seizure detection, prediction and control methods, for example by isolating pathologic areas of excessive information flow through electrical stimulation.

ContributorsPrasanna, Shashank (Author) / Jassemidis, Leonidas (Thesis advisor) / Tsakalis, Konstantinos (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created2011

Multi-task learning via structured regularization: formulations, algorithms, and applications

Description

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It…

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; they are respectively formulated as mathematical optimization problems of minimizing the empirical loss regularized by different structures. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solution efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches on different real-world applications: (1) Automated annotation of the Drosophila gene expression pattern images; (2) Categorization of the Yahoo web pages. Our experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.

ContributorsChen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created2011

Adaptive parameter estimation, modeling and patient-specific classification of electrocardiogram signals

Description

Adaptive processing and classification of electrocardiogram (ECG) signals are important in eliminating the strenuous process of manually annotating ECG recordings for clinical use. Such algorithms require robust models whose parameters can adequately describe the ECG signals. Although different dynamic statistical models describing ECG signals currently exist, they depend considerably on…

Adaptive processing and classification of electrocardiogram (ECG) signals are important in eliminating the strenuous process of manually annotating ECG recordings for clinical use. Such algorithms require robust models whose parameters can adequately describe the ECG signals. Although different dynamic statistical models describing ECG signals currently exist, they depend considerably on a priori information and user-specified model parameters. Also, ECG beat morphologies, which vary greatly across patients and disease states, cannot be uniquely characterized by a single model. In this work, sequential Bayesian based methods are used to appropriately model and adaptively select the corresponding model parameters of ECG signals. An adaptive framework based on a sequential Bayesian tracking method is proposed to adaptively select the cardiac parameters that minimize the estimation error, thus precluding the need for pre-processing. Simulations using real ECG data from the online Physionet database demonstrate the improvement in performance of the proposed algorithm in accurately estimating critical heart disease parameters. In addition, two new approaches to ECG modeling are presented using the interacting multiple model and the sequential Markov chain Monte Carlo technique with adaptive model selection. Both these methods can adaptively choose between different models for various ECG beat morphologies without requiring prior ECG information, as demonstrated by using real ECG signals. A supervised Bayesian maximum-likelihood (ML) based classifier uses the estimated model parameters to classify different types of cardiac arrhythmias. However, the non-availability of sufficient amounts of representative training data and the large inter-patient variability pose a challenge to the existing supervised learning algorithms, resulting in a poor classification performance. In addition, recently developed unsupervised learning methods require a priori knowledge on the number of diseases to cluster the ECG data, which often evolves over time. In order to address these issues, an adaptive learning ECG classification method that uses Dirichlet process Gaussian mixture models is proposed. This approach does not place any restriction on the number of disease classes, nor does it require any training data. This algorithm is adapted to be patient-specific by labeling or identifying the generated mixtures using the Bayesian ML method, assuming the availability of labeled training data.

ContributorsEdla, Shwetha Reddy (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created2012

Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups…

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real world applications which can benefit from the group structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting an unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with the spatial information, sparse learning techniques can be used to construct effective representation of the expression images. In the third application, I present a new computational approach to annotate developmental stage for Drosophila embryos in the gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help us to illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to better interpret results when expression pattern matches are discovered between genes.

ContributorsYuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created2013

On asynchronous communication systems: capacity bounds and relaying schemes

Description

Practical communication systems are subject to errors due to imperfect time alignment among the communicating nodes. Timing errors can occur in different forms depending on the underlying communication scenario. This doctoral study considers two different classes of asynchronous systems; point-to-point (P2P) communication systems with synchronization errors, and asynchronous cooperative systems.…

Practical communication systems are subject to errors due to imperfect time alignment among the communicating nodes. Timing errors can occur in different forms depending on the underlying communication scenario. This doctoral study considers two different classes of asynchronous systems; point-to-point (P2P) communication systems with synchronization errors, and asynchronous cooperative systems. In particular, the focus is on an information theoretic analysis for P2P systems with synchronization errors and developing new signaling solutions for several asynchronous cooperative communication systems. The first part of the dissertation presents several bounds on the capacity of the P2P systems with synchronization errors. First, binary insertion and deletion channels are considered where lower bounds on the mutual information between the input and output sequences are computed for independent uniformly distributed (i.u.d.) inputs. Then, a channel suffering from both synchronization errors and additive noise is considered as a serial concatenation of a synchronization error-only channel and an additive noise channel. It is proved that the capacity of the original channel is lower bounded in terms of the synchronization error-only channel capacity and the parameters of both channels. On a different front, to better characterize the deletion channel capacity, the capacity of three independent deletion channels with different deletion probabilities are related through an inequality resulting in the tightest upper bound on the deletion channel capacity for deletion probabilities larger than 0.65. Furthermore, the first non-trivial upper bound on the 2K-ary input deletion channel capacity is provided by relating the 2K-ary input deletion channel capacity with the binary deletion channel capacity through an inequality. The second part of the dissertation develops two new relaying schemes to alleviate asynchronism issues in cooperative communications. The first one is a single carrier (SC)-based scheme providing a spectrally efficient Alamouti code structure at the receiver under flat fading channel conditions by reducing the overhead needed to overcome the asynchronism and obtain spatial diversity. The second one is an orthogonal frequency division multiplexing (OFDM)-based approach useful for asynchronous cooperative systems experiencing excessive relative delays among the relays under frequency-selective channel conditions to achieve a delay diversity structure at the receiver and extract spatial diversity.

ContributorsRahmati, Mojtaba (Author) / Duman, Tolga M. (Thesis advisor) / Zhang, Junshan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Reisslein, Martin (Committee member) / Arizona State University (Publisher)

Created2013

Multipath mitigating correlation kernels

Description

Autonomous vehicle control systems utilize real-time kinematic Global Navigation Satellite Systems (GNSS) receivers to provide a position within two-centimeter of truth. GNSS receivers utilize the satellite signal time of arrival estimates to solve for position; and multipath corrupts the time of arrival estimates with a time-varying bias. Time of arrival…

Autonomous vehicle control systems utilize real-time kinematic Global Navigation Satellite Systems (GNSS) receivers to provide a position within two-centimeter of truth. GNSS receivers utilize the satellite signal time of arrival estimates to solve for position; and multipath corrupts the time of arrival estimates with a time-varying bias. Time of arrival estimates are based upon accurate direct sequence spread spectrum (DSSS) code and carrier phase tracking. Current multipath mitigating GNSS solutions include fixed radiation pattern antennas and windowed delay-lock loop code phase discriminators. A new multipath mitigating code tracking algorithm is introduced that utilizes a non-symmetric correlation kernel to reject multipath. Independent parameters provide a means to trade-off code tracking discriminant gain against multipath mitigation performance. The algorithm performance is characterized in terms of multipath phase error bias, phase error estimation variance, tracking range, tracking ambiguity and implementation complexity. The algorithm is suitable for modernized GNSS signals including Binary Phase Shift Keyed (BPSK) and a variety of Binary Offset Keyed (BOC) signals. The algorithm compensates for unbalanced code sequences to ensure a code tracking bias does not result from the use of asymmetric correlation kernels. The algorithm does not require explicit knowledge of the propagation channel model. Design recommendations for selecting the algorithm parameters to mitigate precorrelation filter distortion are also provided.

ContributorsMiller, Steven (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created2013

Audio processing and loudness estimation algorithms with iOS simulations

Description

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters…

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS App 'iJDSP." These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced for providing the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have also been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have been also developed for use in classes and training of practitioners and students.

ContributorsKalyanasundaram, Girish (Author) / Spanias, Andreas S (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created2013

Low velocity impact properties of sandwich insulated panels with textile - reinforced concrete skin and aerated concrete core

Description

The main objective of this study is to develop an innovative system in the form of a sandwich panel type composite with textile reinforced skins and aerated concrete core. Existing theoretical concepts along with extensive experimental investigations were utilized to characterize the behavior of cement based systems in the presence…

The main objective of this study is to develop an innovative system in the form of a sandwich panel type composite with textile reinforced skins and aerated concrete core. Existing theoretical concepts along with extensive experimental investigations were utilized to characterize the behavior of cement based systems in the presence of individual fibers and textile yarns. Part of this thesis is based on a material model developed here in Arizona State University to simulate experimental flexural response and back calculate tensile response. This concept is based on a constitutive law consisting of a tri-linear tension model with residual strength and a bilinear elastic perfectly plastic compression stress strain model. This parametric model was used to characterize Textile Reinforced Concrete (TRC) with aramid, carbon, alkali resistant glass, polypropylene TRC and hybrid systems of aramid and polypropylene. The same material model was also used to characterize long term durability issues with glass fiber reinforced concrete (GFRC). Historical data associated with effect of temperature dependency in aging of GFRC composites were used. An experimental study was conducted to understand the behavior of aerated concrete systems under high stain rate impact loading. Test setup was modeled on a free fall drop of an instrumented hammer using three point bending configuration. Two types of aerated concrete: autoclaved aerated concrete (AAC) and polymeric fiber-reinforced aerated concrete (FRAC) were tested and compared in terms of their impact behavior. The effect of impact energy on the mechanical properties was investigated for various drop heights and different specimen sizes. Both materials showed similar flexural load carrying capacity under impact, however, flexural toughness of fiber-reinforced aerated concrete was proved to be several degrees higher in magnitude than that provided by plain autoclaved aerated concrete. Effect of specimen size and drop height on the impact response of AAC and FRAC was studied and discussed. Results obtained were compared to the performance of sandwich beams with AR glass textile skins with aerated concrete core under similar impact conditions. After this extensive study it was concluded that this type of sandwich composite could be effectively used in low cost sustainable infrastructure projects.

ContributorsDey, Vikram (Author) / Mobasher, Barzin (Thesis advisor) / Rajan, Subramaniam D. (Committee member) / Neithalath, Narayanan (Committee member) / Arizona State University (Publisher)

Created2012