Description
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) robustness, as features need to generalize across different corpora, and (2) repeatability, as features need to be invariant to all confounding factors except the pathological state of the target subjects. This thesis presents two research thrusts on speech signals in clinical applications, focusing on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework that generates acoustic feature embeddings sensitive to voice quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of the embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and produces embeddings that are sensitive to voice quality and robust across corpora. The second thrust introduces the intra-class correlation coefficient (ICC) as a measure of the repeatability of embeddings. A novel regularizer, the ICC regularizer, is proposed to train deep neural networks to produce embeddings with higher repeatability. The ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.
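To make the joint training objective concrete, the sketch below combines a classification loss with a pairwise contrastive loss over the embeddings in a batch, in PyTorch. The margin, the weighting term `alpha`, and the batch-wise pairing are illustrative assumptions; the thesis's exact loss formulation, data-warping pipeline, and ICC regularizer are not reproduced here.

```python
import torch
import torch.nn.functional as F

def joint_loss(embeddings, logits, labels, margin=1.0, alpha=0.5):
    """Classification loss plus a pairwise contrastive term on the embeddings.

    Pairs with the same label are pulled together; pairs with different
    labels are pushed apart by at least `margin`. `alpha` balances the two
    terms (an illustrative choice, not the thesis's setting).
    """
    ce = F.cross_entropy(logits, labels)

    # All pairwise Euclidean distances within the batch: (B, B).
    dists = torch.cdist(embeddings, embeddings)
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos = same * dists.pow(2)                       # pull same-class pairs together
    neg = (1 - same) * F.relu(margin - dists).pow(2)  # push different-class pairs apart
    contrastive = (pos + neg).mean()

    return ce + alpha * contrastive
```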
ContributorsZhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2023
Description
Visual Question Answering (VQA) is an increasingly important multi-modal task in which models must answer textual questions based on visual image inputs. Numerous VQA datasets have been proposed to train and evaluate models. However, existing benchmarks exhibit a unilateral focus on textual distribution shifts rather than joint shifts across modalities, which is suboptimal for properly assessing model robustness and generalization. To address this gap, a novel multi-modal VQA benchmark dataset is introduced that combines visual and textual distribution shifts across training and test sets. Using this challenging benchmark exposes vulnerabilities in existing models that rely on spurious correlations and overfit to dataset biases. The dataset advances the field by enabling more robust model training and rigorous evaluation of generalization under multi-modal distribution shift. In addition, a new few-shot multi-modal prompt fusion model is proposed to better adapt models to downstream VQA tasks. The model incorporates a prompt encoder module and a dual-path design to align and fuse image and text prompts, representing a novel prompt learning approach tailored for multi-modal learning across vision and language. Together, the introduced benchmark dataset and prompt fusion model address key limitations in evaluating and improving VQA model robustness, and the work expands the methodology for training models resilient to multi-modal distribution shifts.
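As a rough illustration of the dual-path prompt fusion idea, the sketch below keeps a separate learnable prompt and projection for the image and text paths and fuses the two paths by concatenation. The dimensions, the way the prompt is folded into pooled features, and the fusion operator are assumptions made for illustration; they do not reproduce the dissertation's architecture.

```python
import torch
import torch.nn as nn

class PromptFusion(nn.Module):
    """Toy dual-path fusion of pooled image and text features.

    Each modality has its own learnable prompt vector and projection (the
    "dual paths"); the paths are fused by concatenation plus a linear layer.
    A real prompt-learning model would prepend prompt tokens to a transformer
    encoder; averaging the prompt into pooled features is a toy stand-in.
    """
    def __init__(self, img_dim=768, txt_dim=512, prompt_len=8, fused_dim=512):
        super().__init__()
        self.img_prompt = nn.Parameter(torch.randn(prompt_len, img_dim) * 0.02)
        self.txt_prompt = nn.Parameter(torch.randn(prompt_len, txt_dim) * 0.02)
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.txt_proj = nn.Linear(txt_dim, fused_dim)
        self.fuse = nn.Linear(2 * fused_dim, fused_dim)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, img_dim), txt_feats: (B, txt_dim) pooled backbone features.
        img_path = self.img_proj(img_feats + self.img_prompt.mean(dim=0))
        txt_path = self.txt_proj(txt_feats + self.txt_prompt.mean(dim=0))
        return self.fuse(torch.cat([img_path, txt_path], dim=-1))
```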
ContributorsJyothi Unni, Suraj (Author) / Liu, Huan (Thesis advisor) / Davalcu, Hasan (Committee member) / Bryan, Chris (Committee member) / Arizona State University (Publisher)
Created2023
Description
In this thesis, applications of sparsity, specifically sparse tensors, are motivated in physics. An algorithm is introduced to natively compute partial traces of sparse tensors, along with direct implementations in popular Python libraries for immediate use. These applications include the infamous exponentially scaling (with system size) quantum many-body problems (both Heisenberg/spin-chain-like and chemical Hamiltonian models). Sparsity is stressed as an important and essential feature for solving many real-world physical problems approximately and numerically, including the original motivation of addressing radiation-damage questions for ultrafast light and electron sources.
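For readers unfamiliar with the operation, the following sketch computes a generic partial trace of a density matrix over all but one subsystem, using dense NumPy arrays and `einsum`. It only illustrates what the operation produces; the thesis's contribution, performing this natively on sparse tensors, is not reproduced here.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Trace out all subsystems except `keep` from a density matrix.

    rho  : (D, D) array with D = prod(dims)
    dims : tuple of subsystem dimensions, e.g. (2, 2) for two qubits
    keep : index of the subsystem to keep
    """
    n = len(dims)
    # One row index and one column index per subsystem.
    r = rho.reshape(*dims, *dims)
    letters = "abcdefghijklmnopqrstuvwxyz"
    row = list(letters[:n])
    col = list(letters[n:2 * n])
    for k in range(n):
        if k != keep:
            col[k] = row[k]            # identify row/col index -> trace it out
    spec = "".join(row + col) + "->" + row[keep] + col[keep]
    return np.einsum(spec, r)

# Example: a two-qubit Bell state; tracing out either qubit gives I/2.
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(partial_trace(bell, (2, 2), keep=0))   # [[0.5, 0.], [0., 0.5]]
```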
ContributorsCandanedo, Julio (Author) / Beckstein, Oliver (Thesis advisor) / Arenz, Christian (Thesis advisor) / Keeler, Cynthia (Committee member) / Erten, Onur (Committee member) / Arizona State University (Publisher)
Created2023
Description
The proposed research is motivated by a colon cancer biomarker study, which recruited case (colon cancer) and healthy control samples and quantified a large number of candidate biomarkers using a high-throughput technology called the nucleic acid programmable protein array (NAPPA). The study aimed to identify a panel of biomarkers that accurately distinguishes between cases and controls. A major challenge in analyzing this study was biomarker heterogeneity, where biomarker responses differ from sample to sample. The goal of this research is to improve prediction accuracy for the motivating study and similar studies. Most machine learning (ML) algorithms, developed under a one-size-fits-all strategy, were not able to analyze such heterogeneous data. Failing to capture the individuality of each subject, several standard ML algorithms tested on this dataset performed poorly, achieving 55-61% accuracy. Alternatively, the proposed personalized ML (PML) strategy tailors the optimal ML model to each subject according to their individual characteristics, yielding a best accuracy of 72%.
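One simple way to picture per-subject model tailoring is sketched below with scikit-learn: for each test subject, candidate classifiers are scored on that subject's nearest training neighbors, and the locally best one makes the prediction. This nearest-neighbor flavor of personalization is an illustrative assumption, not the PML procedure developed in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_classification

def personalized_predict(X_train, y_train, X_test, k=15):
    """For each test subject, pick whichever candidate model scores best on
    that subject's k nearest training neighbors, then use it to predict."""
    candidates = [
        LogisticRegression(max_iter=1000).fit(X_train, y_train),
        RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train),
    ]
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    preds = []
    for x in X_test:
        _, idx = nn.kneighbors(x.reshape(1, -1))
        idx = idx.ravel()
        # Score each candidate on this subject's local neighborhood.
        scores = [m.score(X_train[idx], y_train[idx]) for m in candidates]
        best = candidates[int(np.argmax(scores))]
        preds.append(best.predict(x.reshape(1, -1))[0])
    return np.array(preds)

# Synthetic usage example.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
print(personalized_predict(X[:250], y[:250], X[250:])[:10])
```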
ContributorsShah, Nishtha (Author) / Chung, Yunro (Thesis advisor) / Lee, Kookjin (Thesis advisor) / Ghasemzadeh, Hassan (Committee member) / Arizona State University (Publisher)
Created2023
Description
In contrast to traditional chemotherapy for cancer, which fails to address tumor heterogeneity, raises patients' levels of toxicity, and selects for drug-resistant cells, adaptive therapy applies ideas from cancer ecology, employing low-dose drugs to encourage competition between cancerous cells, reducing toxicity and potentially prolonging the time to disease progression. Despite promising results in some clinical trials, optimizing adaptive therapy routines involves navigating a vast space of combinatorial possibilities, including the number of drugs, drug holiday duration, and drug dosages. Computational models can serve as precursors to efficiently explore this space, narrowing the scope of possibilities for in vivo and in vitro experiments, which are time-consuming, expensive, and specific to tumor types. Among existing modeling techniques, agent-based models are particularly suited for studying the spatial interactions critical to successful adaptive therapy. In this thesis, I introduce CancerSim, a three-dimensional agent-based model fully implemented in C++ that is designed to simulate tumorigenesis, angiogenesis, drug resistance, and resource competition within a tissue. Additionally, the model is equipped to assess the effectiveness of various adaptive therapy regimens. The thesis provides detailed insights into the biological motivation and calibration of different model parameters. Lastly, I propose a series of research questions and experiments for adaptive therapy that CancerSim can address in the pursuit of advancing cancer treatment strategies.
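The threshold-based dosing logic that adaptive therapy models typically explore can be sketched with a toy two-clone competition model: the drug is switched on when the tumor burden exceeds an upper fraction of its initial value and off when it falls below a lower one. All rates and thresholds below are illustrative assumptions; this is a non-spatial Python toy, not CancerSim, which is a spatial three-dimensional agent-based model in C++.

```python
def adaptive_therapy(steps=300, upper=1.0, lower=0.5):
    """Toy two-clone tumor under threshold-based adaptive dosing.

    Sensitive cells grow faster than resistant ones when the drug is off;
    the drug kills only sensitive cells. Dosing turns on when total burden
    exceeds `upper` * initial burden and off below `lower` * initial.
    """
    sensitive, resistant = 900.0, 100.0
    initial = sensitive + resistant
    capacity = 4000.0
    drug_on = False
    for t in range(steps):
        total = sensitive + resistant
        if total >= upper * initial:
            drug_on = True
        elif total <= lower * initial:
            drug_on = False
        room = max(0.0, 1.0 - total / capacity)      # logistic-style crowding
        sensitive += 0.05 * sensitive * room          # faster grower
        resistant += 0.03 * resistant * room          # pays a fitness cost
        if drug_on:
            sensitive *= 0.93                         # drug kills only sensitive cells
        if t % 50 == 0:
            print(f"t={t:3d} drug={'on ' if drug_on else 'off'} "
                  f"S={sensitive:8.1f} R={resistant:8.1f}")
    return sensitive, resistant

adaptive_therapy()
```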
ContributorsShah, Sanjana Saurin (Author) / Daymude, Joshua J (Thesis advisor) / Forrest, Stephanie (Committee member) / Maley, Carlo C (Committee member) / Arizona State University (Publisher)
Created2023
Description
This thesis introduces a requirement-based regression test selection approach in an agile development context. Regression testing is critical in ensuring software quality but demands substantial time and resources. The rise of agile methodologies emphasizes the need for swift, iterative software delivery, requiring efficient regression testing. Although executing all existing test cases is the most thorough approach, it becomes impractical and resource-intensive for large real-world projects. Regression test selection emerges as a solution to this challenge, focusing on identifying a subset of test cases that efficiently uncover potential faults due to changes in the existing code. Existing literature on regression test selection in agile settings presents strategies that may only partially embrace agile characteristics. This research proposes a regression test selection method by utilizing data from user stories—agile's equivalent of requirements—and the associated business value spanning successive releases to pinpoint regression test cases. Given that value is a chief metric in agile, and testing—particularly regression testing—is often viewed more as value preservation than creation, the approach in this thesis demonstrates that integrating user stories and business value can lead to notable advancements in agile regression testing efficiency.
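A minimal sketch of the value-driven selection idea is given below: test cases are linked to user stories, and the stories changed in the current release are ranked by business value to assemble the next regression suite. The data model, ranking rule, and budget cutoff are illustrative assumptions rather than the thesis's algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class UserStory:
    story_id: str
    business_value: int            # e.g. value points assigned by the product owner
    changed_this_release: bool
    tests: list = field(default_factory=list)   # linked regression test IDs

def select_regression_tests(stories, budget=10):
    """Rank stories changed in this release by business value and collect their
    linked tests until the budget is exhausted."""
    changed = [s for s in stories if s.changed_this_release]
    changed.sort(key=lambda s: s.business_value, reverse=True)
    selected, seen = [], set()
    for story in changed:
        for test in story.tests:
            if test not in seen and len(selected) < budget:
                seen.add(test)
                selected.append(test)
    return selected

stories = [
    UserStory("US-101", 8, True, ["T1", "T2"]),
    UserStory("US-102", 3, True, ["T3"]),
    UserStory("US-103", 13, False, ["T4"]),
    UserStory("US-104", 5, True, ["T2", "T5"]),
]
print(select_regression_tests(stories, budget=3))   # ['T1', 'T2', 'T5']
```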
ContributorsMondal, Aniruddha (Author) / Gary, Kevin KG (Thesis advisor) / Bansal, Srividya SB (Thesis advisor) / Tuzmen, Ayca AT (Committee member) / Arizona State University (Publisher)
Created2023
Description
From the earliest operatic spectacles to the towering Coachella-esque stages that dominate today’s music industry, there is no shortage of successful examples of artists combining music and visual art. The advancement of technology has created greater potential for these combinations today. Music curricula that wish to produce well-rounded graduates capable of realizing this potential need to adapt and teach how to incorporate technology into performance. This paper presents two new courses that integrate technology with performance: Sound & Sight: A Practical Approach to Audio-Visual Performances; and Phase Music: An Introduction to Design and Fabrication. In Sound & Sight, students will learn how to “storyboard” pieces of music, realize that vision through object-oriented programming in Processing, and synchronize audio and visual elements in live performance settings using Ableton Live and Max. In Phase Music, students will be introduced to phase music, learn how to use Ableton Live to perform one of Steve Reich’s phase pieces or to compose and perform their own piece of phase music, and design and build a custom Musical Instrument Digital Interface (MIDI) controller using Arduino, Adobe Illustrator, and Max. The document includes complete fifteen-week lesson plans for each course, detailing learning objectives, assignments, use of class time, original video coding tutorials, and lecture notes.
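To illustrate the phasing concept itself, independent of the Ableton Live and Max toolchain used in the course, the toy Python sketch below schedules the same looped pattern for two voices at slightly different tempos, so their onsets drift apart and eventually realign. The pattern length and tempi are arbitrary assumptions.

```python
def phase_schedule(pattern_len=12, beat_a=0.25, beat_b=0.2525, bars=8):
    """Onset times (seconds) for two voices playing the same looped pattern,
    one slightly slower than the other, so they gradually drift out of and
    back into alignment -- the core idea of Reich-style phasing."""
    total = pattern_len * bars
    voice_a = [round(i * beat_a, 4) for i in range(total)]
    voice_b = [round(i * beat_b, 4) for i in range(total)]
    return voice_a, voice_b

a, b = phase_schedule()
# The offset between the two voices grows by 0.0025 s per note:
print([round(tb - ta, 4) for ta, tb in zip(a[:6], b[:6])])
```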
ContributorsNguyen, Julian Tuan Anh (Author) / Swartz, Jonathan (Thesis advisor) / Thorn, Seth (Thesis advisor) / Navarro, Fernanda (Committee member) / Arizona State University (Publisher)
Created2023
Description
The management of underground utilities is a complex and challenging task due to the uncertainty regarding the location of existing infrastructure. The lack of accurate information often leads to excavation-related damages, which pose a threat to public safety. In recent years, advanced underground utilities management systems have been developed to improve the safety and efficiency of excavation work. This dissertation aims to explore the potential applications of blockchain technology in the management of underground utilities and reduction of excavation-related damage. The literature review provides an overview of the current systems for managing underground infrastructure, including Underground Infrastructure Management (UIM) and 811, and highlights the benefits of advanced underground utilities management systems in enhancing safety and efficiency on construction sites. The review also examines the limitations and challenges of the existing systems and identifies the opportunities for integrating blockchain technology to improve their performance. The proposed application involves the creation of a shared database of information about the location and condition of pipes, cables, and other underground infrastructure, which can be updated in real time by authorized users such as utility companies and government agencies. The use of blockchain technology can provide an additional layer of security and transparency to the system, ensuring the reliability and accuracy of the information. Contractors and excavation companies can access this information before commencing work, reducing the risk of accidental damage to underground utilities.
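A toy sketch of the ledger idea is shown below: each utility record commits to the hash of the previous record, so tampering with any earlier entry is detectable. This hash-linked list only illustrates why a blockchain-style record improves integrity; it is not the dissertation's proposed system, a real blockchain network, or a consensus protocol, and the record fields are hypothetical.

```python
import hashlib
import json
import time

def add_record(chain, record):
    """Append a utility record (e.g. a pipe location or condition update) to a
    toy hash-linked ledger. Each block commits to the previous block's hash,
    so altering any earlier record invalidates everything after it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"timestamp": time.time(), "record": record, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    chain.append(block)
    return block

chain = []
add_record(chain, {"asset": "water-main-17", "lat": 33.4255, "lon": -111.9400,
                   "depth_m": 1.8, "updated_by": "utility-co"})
add_record(chain, {"asset": "water-main-17", "condition": "relined",
                   "updated_by": "city-inspector"})
print(chain[1]["prev_hash"] == chain[0]["hash"])   # True: records are linked
```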
ContributorsAlnahari, Mohammed S (Author) / Ariaratnam, Samuel T (Thesis advisor) / El Asmar, Mounir (Committee member) / Czerniawski, Thomas (Committee member) / Arizona State University (Publisher)
Created2023
Description
In image classification tasks, images are often corrupted by spatial transformations like translations and rotations. In this work, I utilize an existing method that uses the Fourier series expansion to generate a rotation- and translation-invariant representation of closed contours found in sketches, aiming to attenuate the effects of distribution shift caused by these transformations. I use this technique to transform input images into one of two invariant representations, a Fourier series representation and a corrected raster image representation, prior to passing them to a neural network for classification. The architectures used include convolutional neural networks (CNNs), multi-layer perceptrons (MLPs), and graph neural networks (GNNs). I compare the performance of this method to data augmentation during training, the standard approach for addressing distribution shift, to see which strategy yields the best performance when evaluated against a test set with rotations and translations applied. I include experiments where the augmentations applied during training both do and do not accurately reflect the transformations encountered at test time. Additionally, I investigate the robustness of both approaches to high-frequency noise. In each experiment, I also compare training efficiency across models. I conduct experiments on three datasets: the MNIST handwritten digit dataset, a custom dataset (QD-3) consisting of three classes of geometric figures from the Quick, Draw! hand-drawn sketch dataset, and another custom dataset (QD-345) featuring sketches from all 345 classes found in Quick, Draw!. On the smaller problem spaces of MNIST and QD-3, the networks utilizing the Fourier-based technique to attenuate distribution shift perform competitively with the standard data augmentation strategy. On the more complex problem space of QD-345, the networks using the Fourier technique do not achieve the same test performance as correctly applied data augmentation; however, they still outperform instances where train-time augmentations mis-predict test-time transformations, and they outperform a naive baseline model where no strategy is used to attenuate distribution shift. Overall, this work provides evidence that strategies which attempt to directly mitigate distribution shift, rather than simply increasing the diversity of the training data, can be successful when certain conditions hold.
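A standard construction of translation- and rotation-invariant Fourier descriptors for a closed contour is sketched below: contour points are treated as complex numbers, the DC coefficient is dropped to remove translation, and magnitudes are taken to remove rotation (and starting-point phase). The number of coefficients and the scale normalization are illustrative choices and may differ from the exact representation used in the thesis.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=16):
    """Rotation/translation-invariant descriptor of a closed contour.

    contour : (N, 2) array of (x, y) points along the closed curve.
    Dropping the DC term removes translation; taking magnitudes removes
    rotation and the starting-point phase; dividing by the first retained
    coefficient normalizes scale.
    """
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z)
    coeffs = coeffs[1:n_coeffs + 1]       # drop DC -> translation invariant
    mags = np.abs(coeffs)                 # drop phase -> rotation invariant
    return mags / (mags[0] + 1e-12)       # normalize scale

# A circle and the same circle rotated and translated give equal descriptors.
t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = circle @ rot.T + np.array([3.0, -2.0])
print(np.allclose(fourier_descriptors(circle), fourier_descriptors(moved), atol=1e-6))
```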
ContributorsWatson, Matthew (Author) / Yang, Yezhou YY (Thesis advisor) / Kerner, Hannah HK (Committee member) / Yang, Yingzhen YY (Committee member) / Arizona State University (Publisher)
Created2023
Description
Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that concerns converting natural language into structured, machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format, which comprises a subject, predicate, and object. The extraction of RDF triples from natural language is an essential step towards importing data into web ontologies as part of the linked open data cloud on the Semantic Web. There have been a number of related techniques for extracting triples from plain natural language text, including but not limited to ClausIE, OLLIE, Reverb, and DeepEx. The proposed study aims to reduce the dependency on conventional machine learning models, since they require training datasets and are not easily customizable or explainable. By leveraging a context-free grammar (CFG) based model, this thesis aims to address some of these issues while minimizing the trade-offs in performance and accuracy. Furthermore, a deep dive is conducted to analyze the strengths and limitations of the proposed approach.
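As a toy illustration of grammar-based extraction, the sketch below parses a sentence with a tiny hand-written CFG in NLTK and reads a (subject, predicate, object) triple off the parse tree. The grammar, lexicon, and mapping from tree nodes to triple slots are deliberately simplistic assumptions, far smaller than the CFG-based model proposed in the thesis.

```python
import nltk

# A deliberately tiny grammar: S -> NP VP, VP -> V NP.
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    VP -> V NP
    NP -> Det N | N
    Det -> 'the' | 'a'
    N  -> 'dog' | 'ball' | 'cat'
    V  -> 'chases' | 'sees'
""")
parser = nltk.ChartParser(grammar)

def extract_triple(sentence):
    """Parse with the toy CFG and read (subject, predicate, object) off the
    first parse tree: subject = NP under S, predicate = V under VP,
    object = NP under VP."""
    tokens = sentence.lower().split()
    for tree in parser.parse(tokens):
        subj = " ".join(tree[0].leaves())          # NP under S
        pred = " ".join(tree[1][0].leaves())       # V under VP
        obj = " ".join(tree[1][1].leaves())        # NP under VP
        return (subj, pred, obj)
    return None

print(extract_triple("the dog chases a ball"))     # ('the dog', 'chases', 'a ball')
```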
ContributorsSingh, Varun (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)
Created2023