Matching Items (58)

Description
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) good robustness, as speech features need to generalize across different corpora, and (2) high repeatability, as speech features need to be invariant to all confounding factors except the pathological state of targets. This thesis presents two research thrusts in the context of speech signals in clinical applications that focus on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework to generate acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and generates good embeddings sensitive to voice quality and robust across different corpora. The second thrust introduces using the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. A novel regularizer, the ICC regularizer, is proposed to regularize deep neural networks to produce embeddings with higher repeatability. This ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.
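As a concrete illustration of the repeatability measure referenced above, the sketch below computes a one-way random-effects intra-class correlation coefficient, ICC(1,1), for a single embedding dimension measured across repeated sessions per subject. The specific ICC variant, data layout, and training setup used in the thesis are not given here, so this is an illustrative Python example, not the thesis's implementation.

```python
import numpy as np

def icc_1_1(scores):
    """One-way random-effects ICC(1,1) for a (subjects x repeats) matrix of one
    embedding dimension; values near 1 indicate high repeatability."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)    # between-subject mean square
    msw = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))  # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)

# Toy check: a repeatable versus a noisy "feature" for 20 subjects, 5 sessions each.
rng = np.random.default_rng(0)
subject_effect = rng.normal(0.0, 1.0, (20, 1))
repeatable = subject_effect + 0.1 * rng.normal(size=(20, 5))
noisy = subject_effect + 2.0 * rng.normal(size=(20, 5))
print(icc_1_1(repeatable), icc_1_1(noisy))   # high ICC vs. low ICC
```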
ContributorsZhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2023
Description
Quantum computing is becoming more accessible through modern noisy intermediate-scale quantum (NISQ) devices. These devices require substantial error correction and scaling before they become capable of fulfilling many of the promises that quantum computing algorithms make. This work investigates the current state of NISQ devices by implementing multiple classical computing scenarios with a quantum analog to observe how current quantum technology can be leveraged to achieve different tasks. First, quantum homomorphic encryption (QHE) is applied to the quantum teleportation protocol to show that this form of algorithm security is possible to implement with modern quantum computing simulators. QHE is capable of completely obscuring a teleported state with a linear increase in the number of qubit gates, O(n). Additionally, the circuit depth increases by only a constant factor, O(c), when using only stabilizer circuits. Quantum machine learning (QML) is another potential application of NISQ technology that can be used to modify classical AI. QML is investigated using quantum hybrid neural networks for the classification of spoken commands on live audio data. Additionally, an edge computing scenario is examined to profile the interactions between a quantum simulator acting as a cloud server and an embedded processor board at the network edge. It is not practical to embed NISQ processors at a network edge, so this paradigm is important to study for practical quantum computing systems. The quantum hybrid neural network (QNN) learned to classify audio with accuracy (~94%) equivalent to a classical recurrent neural network. Introducing quantum simulation slows the system's responsiveness because quantum simulations take significantly longer to process than a classical neural network. This work shows that it is viable to implement classical computing techniques with quantum algorithms, but that current NISQ processing is sub-optimal when compared to classical methods.
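For readers unfamiliar with the teleportation protocol referenced above, the sketch below builds the standard three-qubit teleportation circuit in Qiskit, using the deferred-measurement form so the result can be checked with a statevector simulation. It does not include the QHE layer studied in the thesis; the Qiskit-based setup and the rotation angle are illustrative assumptions.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector, partial_trace

# Qubit 0 carries the state to teleport; qubits 1 and 2 share a Bell pair.
qc = QuantumCircuit(3)
qc.ry(0.8, 0)        # arbitrary single-qubit state to send (angle chosen for illustration)
qc.h(1)
qc.cx(1, 2)          # entangle the sender's ancilla (1) with the receiver (2)
qc.cx(0, 1)
qc.h(0)              # Bell-basis interaction on the sender side
qc.cx(1, 2)          # X correction, deferred-measurement form
qc.cz(0, 2)          # Z correction, deferred-measurement form

state = Statevector.from_instruction(qc)
receiver = partial_trace(state, [0, 1])      # reduced state of the receiver qubit

# The receiver's reduced state should equal the original single-qubit state.
ref = QuantumCircuit(1)
ref.ry(0.8, 0)
target = Statevector.from_instruction(ref)
print(np.allclose(receiver.data, np.outer(target.data, target.data.conj())))  # True
```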
ContributorsYarter, Maxwell (Author) / Spanias, Andreas (Thesis advisor) / Arenz, Christian (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)
Created2023
Description
The presence of strategic agents can pose unique challenges to data collection and distributed learning. This dissertation first explores the social network dimension of data collection markets, and then focuses on how strategic agents can be efficiently and effectively incentivized to cooperate in distributed machine learning frameworks. The first problem explores the impact of social learning in collecting and trading unverifiable information, where a data collector purchases data from users through a payment mechanism. Each user starts with a personal signal that represents his knowledge about the underlying state the data collector desires to learn. Through social interactions, each user also acquires additional information from his neighbors in the social network. It is revealed that both the data collector and the users can benefit from social learning, which drives down the privacy costs and helps to improve the state estimation for a given total payment budget. In the second half, a federated learning scheme to train a global learning model with strategic agents, who are not bound to contribute their resources unconditionally, is considered. Since the agents are not obliged to provide their true stochastic gradient updates and the server is not capable of directly validating the authenticity of reported updates, the learning process may reach a noncooperative equilibrium. First, the actions of the agents are assumed to be binary: cooperative or defective. If the cooperative action is taken, the agent sends a privacy-preserved version of its stochastic gradient signal. If the defective action is taken, the agent sends an arbitrary uninformative noise signal. Furthermore, this setup is extended to scenarios with more general action spaces, where the quality of the stochastic gradient updates has a range of discrete levels. The proposed methodology evaluates each agent's stochastic gradient against a reference gradient estimate constructed from the gradients provided by other agents, and rewards the agent based on that evaluation.
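As a rough illustration of the reference-gradient idea described above, the sketch below scores each agent's reported update against a leave-one-out average of the other agents' reports. Cosine similarity is used purely as a stand-in for the dissertation's evaluation and reward rule, and the random gradients are placeholder data.

```python
import numpy as np

def score_agents(gradients):
    """Score each agent's reported gradient against a reference built from the
    other agents' reports (here: cosine similarity to the leave-one-out mean)."""
    g = np.asarray(gradients, dtype=float)          # shape: (num_agents, dim)
    total = g.sum(axis=0)
    scores = []
    for i in range(len(g)):
        reference = (total - g[i]) / (len(g) - 1)   # mean of the other agents' gradients
        cos = g[i] @ reference / (np.linalg.norm(g[i]) * np.linalg.norm(reference) + 1e-12)
        scores.append(cos)
    return np.array(scores)                          # higher score -> larger reward

# Toy example: 9 cooperative agents reporting noisy copies of a common gradient,
# plus one defective agent reporting pure noise.
rng = np.random.default_rng(0)
true_grad = rng.normal(size=100)
reports = [true_grad + 0.3 * rng.normal(size=100) for _ in range(9)]
reports.append(rng.normal(size=100))                 # uninformative noise signal
print(np.round(score_agents(reports), 2))            # the last score is near zero
```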
ContributorsAkbay, Abdullah Basar (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Committee member) / Kosut, Oliver (Committee member) / Ewaisha, Ahmed (Committee member) / Arizona State University (Publisher)
Created2023
Description
Quantum computing has the potential to revolutionize the signal-processing field by providing more efficient methods for analyzing signals. This thesis explores the application of quantum computing in signal analysis-synthesis for compression applications. More specifically, the study focuses on two key approaches: the quantum Fourier transform (QFT) and quantum linear prediction (QLP). The research is motivated by the potential advantages offered by quantum computing in massive signal processing tasks and presents novel quantum circuit designs for the QFT, quantum autocorrelation, and QLP, enabling signal analysis-synthesis using quantum algorithms. The two approaches are explained as follows. The quantum Fourier transform demonstrates the potential for improved speed in quantum computing compared to classical methods. This thesis focuses on quantum encoding of signals and the design of quantum algorithms for signal analysis-synthesis and signal compression using QFTs. Comparative studies are conducted to evaluate quantum computations for Fourier transform applications, considering signal-to-noise ratio (SNR) results. The effects of qubit precision and quantum noise are also analyzed. The QFT algorithm is also developed in the J-DSP simulation environment, providing hands-on laboratory experiences for signal-processing students. User-friendly J-DSP simulation programs are developed for QFT-based signal analysis-synthesis using peak picking and perceptual selection based on psychoacoustics. Further, this research is extended to analyze the autocorrelation of the signal using QFTs and to develop a quantum linear prediction (QLP) algorithm for speech processing applications. QFTs and inverse QFTs (IQFTs) are used to compute the quantum autocorrelation of the signal, and the HHL algorithm is modified and used to compute the solutions of the linear equations using quantum computing. The performance of the QLP algorithm is evaluated for system identification, spectral estimation, and speech analysis-synthesis, and comparisons are performed between QLP and classical linear prediction (CLP) results. The results demonstrate the following: effective quantum circuits for accurate QFT-based speech analysis-synthesis, evaluation of performance with quantum noise, design of accurate quantum autocorrelation, and development of a modified HHL algorithm for efficient QLP. Overall, this thesis contributes to the research on quantum computing for signal processing applications and provides a foundation for further exploration of quantum algorithms for signal analysis-synthesis.
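The textbook QFT circuit underlying this line of work can be written compactly; the sketch below builds an n-qubit QFT in Qiskit from Hadamards, controlled phase rotations, and a final qubit-order reversal. Qubit-ordering conventions, the signal-encoding step, and the noise analysis are omitted, so treat this as a generic reference rather than the circuits designed in the thesis.

```python
import numpy as np
from qiskit import QuantumCircuit

def qft_circuit(n: int) -> QuantumCircuit:
    """Textbook n-qubit QFT: a Hadamard on each qubit followed by controlled
    phase rotations from later qubits, then swaps to reverse the qubit order."""
    qc = QuantumCircuit(n)
    for j in range(n):
        qc.h(j)
        for k in range(j + 1, n):
            qc.cp(np.pi / 2 ** (k - j), k, j)   # controlled rotation by pi / 2^(k-j)
    for q in range(n // 2):
        qc.swap(q, n - 1 - q)
    return qc

# The circuit uses O(n^2) gates on 2^n amplitudes, versus O(n * 2^n) for a classical FFT.
print(qft_circuit(3))
```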
ContributorsSharma, Aradhita (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2023
Description
The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to (i) the availability of large amounts of training data; (ii) access to domain-aware knowledge; (iii) the i.i.d. assumption between the train and target distributions; and (iv) belief in existing metrics as reliable indicators of performance. When any of these assumptions are violated, the models exhibit brittleness, producing adversely varied behavior. This dissertation focuses on methods for accurate model design and characterization that enhance process reliability when certain assumptions are not met. With the need to safely adopt artificial intelligence tools in practice, it is vital to build reliable failure detectors that indicate regimes where the model must not be invoked. To that end, an error predictor trained with a self-calibration objective is developed to estimate loss consistent with the underlying model. The properties of the error predictor are described, and their utility in supporting introspection via feature importances and counterfactual explanations is elucidated. While such an approach can signal data regime changes, it is critical to calibrate models using regimes of inlier (training) and outlier data to prevent under- and over-generalization in models, i.e., incorrectly identifying inliers as outliers and vice versa. By identifying the space for specifying inliers and outliers, an anomaly detector that can effectively flag data of varying semantic complexities in medical imaging is next developed. Uncertainty quantification in deep learning models involves identifying sources of failure and characterizing model confidence to enable actionability. A training strategy is developed that allows the accurate estimation of model uncertainties, and its benefits are demonstrated for active learning and generalization gap prediction. This helps identify insufficiently sampled regimes and representation insufficiency in models. In addition, the task of deep inversion under data-scarce scenarios is considered, which in practice requires a prior to control the optimization. By identifying limitations in existing work, data priors powered by generative models and deep model priors are designed for audio restoration. With relevant empirical studies on a variety of benchmarks, the need for such design strategies is demonstrated.
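To make the error-predictor idea concrete, the sketch below trains a small network to regress a classifier's per-sample loss from the input, so that a high predicted loss flags inputs on which the classifier should not be trusted. The architecture, synthetic data, and plain regression objective are placeholder assumptions; the dissertation's self-calibration objective is not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_dim, n_classes = 20, 3

# Stand-ins for the trained task model and its data (both illustrative only).
model = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))
x = torch.randn(256, in_dim)
y = torch.randint(0, n_classes, (256,))

# Error predictor: regresses the task model's per-sample loss from the input.
error_net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(error_net.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss(reduction="none")

for _ in range(200):
    with torch.no_grad():
        target_loss = ce(model(x), y)            # per-sample loss the detector should match
    pred_loss = error_net(x).squeeze(-1)
    fit = torch.mean((pred_loss - target_loss) ** 2)
    opt.zero_grad(); fit.backward(); opt.step()

# At test time, inputs with large error_net(x) are flagged as potential failures.
```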
ContributorsNarayanaswamy, Vivek Sivaraman (Author) / Spanias, Andreas (Thesis advisor) / J. Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2023
Description
Researchers have observed that the frequencies of leading digits in many man-made and naturally occurring datasets follow a logarithmic curve, with numbers whose leading digit is 1 accounting for about 30% of all numbers in the dataset and numbers whose leading digit is 9 accounting for about 5%. This phenomenon, known as Benford's Law, is highly repeatable and appears in lists of numbers from electricity bills, stock prices, tax returns, house prices, death rates, lengths of rivers, and naturally occurring images. This paper demonstrates that human speech spectra also follow Benford's Law. This observation is used to motivate a new set of features that can be efficiently extracted from speech, and it is demonstrated that these features can be used to distinguish between human speech and synthetic speech.
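The logarithmic leading-digit distribution has a closed form, P(d) = log10(1 + 1/d), which gives roughly 30.1% for d = 1 and 4.6% for d = 9. The sketch below compares this prediction with the empirical leading-digit frequencies of a synthetic dataset spanning many orders of magnitude; the data here are illustrative, not the speech spectra analyzed in the thesis.

```python
import numpy as np

def leading_digit(x):
    """First significant digit (1-9) of each positive number in x."""
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0]
    first = np.floor(x / 10 ** np.floor(np.log10(x))).astype(int)
    return np.clip(first, 1, 9)

digits = np.arange(1, 10)
benford = np.log10(1 + 1 / digits)            # Benford's Law: P(d) = log10(1 + 1/d)

# Example data whose leading digits roughly follow Benford's Law (many orders of magnitude).
rng = np.random.default_rng(0)
data = np.exp(rng.uniform(0, 20, 100_000))
counts = np.bincount(leading_digit(data), minlength=10)[1:]
observed = counts / counts.sum()

for d, p, q in zip(digits, benford, observed):
    print(f"digit {d}: Benford {p:.3f}  observed {q:.3f}")
```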
ContributorsHsu, Leo (Author) / Berisha, Visar (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created2022
Description
The primary objective of this thesis is to identify locations or regions where COVID-19 transmission is more prevalent, termed “hotspots,” assess the likelihood of contracting the virus after visiting crowded areas or potential hotspots, and make predictions on confirmed COVID-19 cases and recoveries. A consensus algorithm is used to identify such hotspots; the SEIR epidemiological model tracks COVID-19 cases, allowing for a better understanding of the disease dynamics and enabling informed decision-making in public health strategies. Consensus-based distributed methodologies have been developed to estimate the magnitude, density, and locations of COVID-19 hotspots to provide well-informed alerts based on continuous data risk assessments. Assuming each agent owns a mobile device, transmission hotspots are identified using information from user devices equipped with Bluetooth and WiFi. In a consensus-based distributed clustering algorithm, users are divided into smaller groups, and then the number of users is estimated in each group. This process allows for the determination of the population of an outdoor site and the distances between individuals. The proposed algorithm demonstrates versatility by being applicable not only in outdoor environments but also in indoor settings. To adapt to indoor environments, considerations are made for signal attenuation caused by walls and other barriers, and a wall detection algorithm is employed for this purpose. The clustering mechanism is designed to dynamically choose the appropriate clustering technique based on data-dependent patterns, ensuring that every node undergoes proper clustering. After networks have been established and clustered, the output of the consensus algorithm is fed as one of many inputs into the SEIR model. SEIR, representing Susceptible, Exposed, Infectious, and Removed, forms the basis of a model designed to assess the probability of infection at a Point of Interest (POI). The SEIR model utilizes calculated parameters such as β (contact), σ (latency), γ (recovery), and ω (loss of immunity), along with current COVID-19 case data, to precisely predict the infection spread in a specific area. The SEIR model is implemented with diverse methodologies for transitioning populations between compartments. Hence, the model identifies optimal parameter values under different conditions and scenarios and forecasts the number of infected and recovered cases for the upcoming days.
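To illustrate the compartmental dynamics referenced above, the sketch below steps a discrete-time SEIRS model with contact (β), latency (σ), recovery (γ), and loss-of-immunity (ω) rates. All parameter values, the population size, and the simple Euler update are placeholder assumptions, not the calibrated model or case data used in the thesis.

```python
import numpy as np

# Illustrative SEIRS simulation; parameter values are placeholders.
N = 100_000                                            # population at the point of interest
beta, sigma, gamma, omega = 0.4, 1 / 5.0, 1 / 10.0, 1 / 180.0
S, E, I, R = N - 10, 0.0, 10.0, 0.0                    # initial compartments
dt, days = 1.0, 200
history = []

for _ in range(int(days / dt)):
    new_exposed    = beta * S * I / N * dt             # S -> E (contact)
    new_infectious = sigma * E * dt                    # E -> I (latency)
    new_removed    = gamma * I * dt                    # I -> R (recovery)
    lost_immunity  = omega * R * dt                    # R -> S (loss of immunity)
    S += lost_immunity - new_exposed
    E += new_exposed - new_infectious
    I += new_infectious - new_removed
    R += new_removed - lost_immunity
    history.append((S, E, I, R))

peak_day = int(np.argmax([h[2] for h in history]))
print(f"peak infectious count ~{history[peak_day][2]:.0f} on day {peak_day}")
```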
ContributorsPatel, Bhavikkumar (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Thesis advisor) / Banavar, Mahesh (Committee member) / Arizona State University (Publisher)
Created2024
Description
Human movement is a complex process influenced by physiological and psychological factors. The execution of movement varies from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees of quality, and the quality can be defined with regard to aesthetic, athletic, or health-related ratings. It is useful to measure and track the quality of a person's movements for various applications, especially with the prevalence of low-cost, portable cameras and sensors today. Furthermore, based on such measurements, feedback systems can be designed for people to practice their movements towards certain goals. In this dissertation, I introduce symmetry as a family of measures for movement quality, and utilize recent advances in computer vision and differential geometry to model and analyze different types of symmetry in human movements. Movements are modeled as trajectories on different types of manifolds, according to the representations of movements obtained from sensor data. The benefit of such a universal framework is that it can accommodate different existing and future features that describe human movements. The theory and tools developed in this dissertation will also be useful in other scientific areas to analyze symmetry from high-dimensional signals.
ContributorsWang, Qiao (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Srivastava, Anuj (Committee member) / Sha, Xin Wei (Committee member) / Arizona State University (Publisher)
Created2018
Description
There has been tremendous technological advancement in the past two decades. Faster computers and improved sensing devices have broadened the research scope in computer vision. With these developments, the task of assessing the quality of human actions is considered an important problem that needs to be tackled. Movement quality assessment finds a wide range of applications in motor control, health care, rehabilitation, and physical therapy. Home-based interactive physical therapy requires the ability to monitor, inform, and assess the quality of everyday movements. Obtaining labeled data from trained therapists/experts is the main limitation, since it is both expensive and time-consuming.

Motivated by recent studies in motor control and therapy, this thesis uses an existing computational framework to assess balance impairment and disease severity in people suffering from Parkinson's disease. The framework uses high-dimensional shape descriptors of the reconstructed phase space of the subjects' center-of-pressure (CoP) tracings while performing dynamic postural shifts. The framework's performance is evaluated using a dataset collected from 43 healthy and 17 Parkinson's disease-impaired subjects; it outperforms other methods, such as dynamical shift indices and the use of chaotic invariants, in the assessment of balance impairment.
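As background on the phase-space reconstruction step, the sketch below performs a standard time-delay (Takens) embedding of a one-dimensional trace such as a CoP signal. The embedding dimension, delay, and synthetic signal are illustrative choices, and the shape-descriptor stage of the framework is not shown.

```python
import numpy as np

def delay_embed(signal, dim=3, tau=10):
    """Time-delay (Takens) reconstruction of the phase space of a 1-D signal,
    returning trajectory points in R^dim."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) - (dim - 1) * tau
    return np.column_stack([signal[i * tau : i * tau + n] for i in range(dim)])

# Example: reconstruct a noisy oscillatory CoP-like trace.
rng = np.random.default_rng(0)
t = np.linspace(0, 20, 2000)
cop = np.sin(t) + 0.1 * rng.standard_normal(len(t))
points = delay_embed(cop, dim=3, tau=25)
print(points.shape)    # (n_points, 3): input to shape descriptors of the reconstructed attractor
```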

This thesis also proposes an unsupervised method that assesses the movement quality of simple actions, such as sit-to-stand and dynamic posture shifts, by modeling the deviation of a given movement from an ideal movement path in the configuration space; i.e., the quality of a movement is directly related to its similarity to the ideal trajectory between the start and end pose. The S^1xS^1 configuration space was used to model the interaction of two joint angles in sit-to-stand actions, and the R^2 space was used to model the subject's CoP while performing dynamic posture shifts, for application in movement quality estimation.
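A minimal version of the configuration-space deviation idea is sketched below: two joint-angle trajectories are compared on S^1xS^1 by wrapping the angular differences before averaging. The toy trajectories and the choice of mean pointwise distance are assumptions for illustration; the thesis's exact deviation measure may differ.

```python
import numpy as np

def angular_error(a, b):
    """Smallest signed difference between two angle arrays, in radians."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def torus_deviation(movement, ideal):
    """Mean pointwise deviation between two (T, 2) joint-angle trajectories
    treated as curves on the torus S^1 x S^1."""
    err = angular_error(np.asarray(movement), np.asarray(ideal))
    return float(np.mean(np.linalg.norm(err, axis=1)))

# Toy example: a noisy sit-to-stand execution compared against an "ideal" path.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)[:, None]
ideal = np.hstack([np.pi * t, 0.5 * np.pi * t])            # two joint angles over time
movement = ideal + 0.05 * rng.standard_normal(ideal.shape)  # noisy execution
print(torus_deviation(movement, ideal))                     # smaller value -> higher quality
```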
ContributorsSom, Anirudh (Author) / Turaga, Pavan (Thesis advisor) / Krishnamurthi, Narayanan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2016
Description
Buck converters are electronic devices that change a voltage from one level to a lower one and are present in many everyday applications. However, due to factors like aging, degradation, or failures, these devices require a system identification process to track and diagnose their parameters. The system identification process should be performed on-line so as not to affect the normal operation of the device. Identifying the parameters of the system is essential to design and tune an adaptive proportional-integral-derivative (PID) controller.

Three techniques were used to design the PID controller. Phase and gain margin design remains one of the easiest methods for designing controllers. Pole-zero cancellation is another technique, which is based on pole placement. However, although these controllers can be easily designed, they did not provide the best response compared to the Frequency Loop Shaping (FLS) technique. Since FLS showed better frequency and time responses than the other two controllers, it was selected to perform the adaptation of the system.

An on-line system identification process was performed for the buck converter using indirect adaptation and the least-squares algorithm. The estimation error and the parameter error were computed to determine the rate of convergence of the system. The indirect adaptation required about 2000 points to converge to the true parameters prior to designing the controller. These results were compared to the adaptation executed using a robust stability condition (RSC) and a switching controller. Two different scenarios were studied, consisting of five plants that defined the percentage of deterioration of the capacitor and inductor within the buck converter. The switching logic did not always select the optimal controller in the first scenario because the frequency responses of the different plants were not significantly different. However, the second scenario consisted of plants with more noticeably different frequency responses, and the switching logic selected the optimal controller every time, within about 500 points. Additionally, a disturbance was introduced at the plant input to observe its effect on the switching controller. However, for reasonably low disturbances, no change was detected in the proper selection of controllers.
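For context on the least-squares identification step, the sketch below runs a recursive least-squares estimator on a second-order discrete-time model of the kind often used for converter dynamics. The model order, the "true" parameters, and the forgetting factor are placeholders, and the controller redesign step of indirect adaptation is omitted.

```python
import numpy as np

# Recursive least squares (RLS) for the ARX-style model
#   y[k] = t1*y[k-1] + t2*y[k-2] + t3*u[k-1] + t4*u[k-2]
# The "true" parameters below are illustrative, not values from the thesis.
rng = np.random.default_rng(0)
theta_true = np.array([1.6, -0.7, 0.05, 0.04])

theta = np.zeros(4)                  # running parameter estimate
P = 1e3 * np.eye(4)                  # estimate covariance (large: uninformative start)
lam = 0.995                          # forgetting factor to track slow parameter drift

u = rng.uniform(-1.0, 1.0, 3000)     # persistently exciting input perturbation
y = np.zeros_like(u)
for k in range(2, len(u)):
    phi = np.array([y[k - 1], y[k - 2], u[k - 1], u[k - 2]])
    y[k] = phi @ theta_true + 1e-3 * rng.standard_normal()   # simulated measurement

    # RLS update
    err = y[k] - phi @ theta
    gain = P @ phi / (lam + phi @ P @ phi)
    theta = theta + gain * err
    P = (P - np.outer(gain, phi) @ P) / lam

print("estimated:", np.round(theta, 3), " true:", theta_true)
```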
ContributorsSerrano Rodriguez, Victoria Melissa (Author) / Tsakalis, Konstantinos (Thesis advisor) / Bakkaloglu, Bertan (Thesis advisor) / Rodriguez, Armando (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2016