Matching Items (11)
Filtering by

Clear all filters

156587-Thumbnail Image.png
Description
Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is at the core to achieve improved model robustness and inferencing performance.

Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is at the core to achieve improved model robustness and inferencing performance. This dissertation focuses on the representation learning approaches as the fusion strategy. Specifically, the objective is to learn the shared latent representation which jointly exploit the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction.

We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Targeted to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision etc. Utilizing the MKL formulation, next we describe an auto-context algorithm for learning image context via the fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems.

In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently, special design of the learning architecture is needed. In order to improve the temporal modeling for multivariate sequences, we developed two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model coupled with triplet ranking loss as metric learning framework is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having a lower computational complexity. Finally, in order to perform community detection on multilayer graphs, a fusion algorithm is described to derive node embedding from word embedding techniques and also exploit the complementary relational information contained in each layer of the graph.
ContributorsSong, Huan (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2018
156610-Thumbnail Image.png
Description
Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement.

To overcome these challenges, recent works have extensively investigated model compression techniques such

Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity and quantization. While most of these works have applied these compression techniques in isolation, there have been very few studies on application of quantization and structured sparsity together on a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains optimal setting of 2-bit weight and 2-bit activation coupled with 4X structured compression by performing combined exploration of quantization and structured compression settings. The optimal DNN model achieves 50X weight memory reduction compared to floating-point uncompressed DNN. This memory saving is significant since applying only structured sparsity constraints achieves 2X memory savings and only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high and low capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNN outperforms shallow-dense DNN with varying level of memory savings depending on DNN precision and sparsity levels. This work further proposed a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory which includes activation memory and weight memory. It was found that there is only a small change in the memory footprint of the optimal designs corresponding to the low sparsity DNNs. However, activation memory cannot be ignored for high sparsity DNNs.
ContributorsSrivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2018
156936-Thumbnail Image.png
Description
In recent years, conventional convolutional neural network (CNN) has achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNN ignores important spatial information which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves the capabilities of traditional

In recent years, conventional convolutional neural network (CNN) has achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNN ignores important spatial information which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves the capabilities of traditional CNN. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.

In this work, we first use capsule network for overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence and training time per epoch. We show that capsule network achieves higher accuracy when training set size is small. When training set size is larger, capsule network and conventional CNN have comparable recognition accuracy. The training time per epoch for capsule network is longer than conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can help decrease the time complexity of the dynamic routing algorithm significantly.

Next, we design a capsule network for speech recognition, specifically, overlapping word recognition. We use both capsule network and conventional CNN to recognize 2 overlapping words in speech files created from 5 word classes. We show that capsule network achieves a considerably higher recognition accuracy (96.92%) compared to conventional CNN (85.19%). Our results show that capsule network recognizes overlapping word by recognizing each individual word in the speech. We also verify the scalability of capsule network by increasing the number of word classes from 5 to 10. Capsule network still shows a high recognition accuracy of 95.42% in case of 10 words while the accuracy of conventional CNN decreases sharply to 73.18%.
ContributorsXiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)
Created2018
155155-Thumbnail Image.png
Description
Compressed sensing (CS) is a novel approach to collecting and analyzing data of all types. By exploiting prior knowledge of the compressibility of many naturally-occurring signals, specially designed sensors can dramatically undersample the data of interest and still achieve high performance. However, the generated data are pseudorandomly mixed and

Compressed sensing (CS) is a novel approach to collecting and analyzing data of all types. By exploiting prior knowledge of the compressibility of many naturally-occurring signals, specially designed sensors can dramatically undersample the data of interest and still achieve high performance. However, the generated data are pseudorandomly mixed and must be processed before use. In this work, a model of a single-pixel compressive video camera is used to explore the problems of performing inference based on these undersampled measurements. Three broad types of inference from CS measurements are considered: recovery of video frames, target tracking, and object classification/detection. Potential applications include automated surveillance, autonomous navigation, and medical imaging and diagnosis.



Recovery of CS video frames is far more complex than still images, which are known to be (approximately) sparse in a linear basis such as the discrete cosine transform. By combining sparsity of individual frames with an optical flow-based model of inter-frame dependence, the perceptual quality and peak signal to noise ratio (PSNR) of reconstructed frames is improved. The efficacy of this approach is demonstrated for the cases of \textit{a priori} known image motion and unknown but constant image-wide motion.



Although video sequences can be reconstructed from CS measurements, the process is computationally costly. In autonomous systems, this reconstruction step is unnecessary if higher-level conclusions can be drawn directly from the CS data. A tracking algorithm is described and evaluated which can hold target vehicles at very high levels of compression where reconstruction of video frames fails. The algorithm performs tracking by detection using a particle filter with likelihood given by a maximum average correlation height (MACH) target template model.



Motivated by possible improvements over the MACH filter-based likelihood estimation of the tracking algorithm, the application of deep learning models to detection and classification of compressively sensed images is explored. In tests, a Deep Boltzmann Machine trained on CS measurements outperforms a naive reconstruct-first approach.



Taken together, progress in these three areas of CS inference has the potential to lower system cost and improve performance, opening up new applications of CS video cameras.
ContributorsBraun, Henry Carlton (Author) / Turaga, Pavan K (Thesis advisor) / Spanias, Andreas S (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2016
148040-Thumbnail Image.png
Description

Purpose: This qualitative research aimed to create a developmentally and gender-appropriate game-based intervention to promote Human Papillomavirus (HPV) vaccination in adolescents. <br/>Background: Ranking as the most common sexually transmitted infection, about 80 million Americans are currently infected by HPV, and it continues to increase with an estimated 14 million new

Purpose: This qualitative research aimed to create a developmentally and gender-appropriate game-based intervention to promote Human Papillomavirus (HPV) vaccination in adolescents. <br/>Background: Ranking as the most common sexually transmitted infection, about 80 million Americans are currently infected by HPV, and it continues to increase with an estimated 14 million new cases yearly. Certain types of HPV have been significantly associated with cervical, vaginal, and vulvar cancers in women; penile cancers in men; and oropharyngeal and anal cancers in both men and women. Despite HPV vaccination being one of the most effective methods in preventing HPV-associated cancers, vaccination rates remain suboptimal in adolescents. Game-based intervention, a novel medium that is popular with adolescents, has been shown to be effective in promoting health behaviors. <br/>Methods: Sample/Sampling. We used purposeful sampling to recruit eight adolescent-parent dyads (N = 16) which represented both sexes (4 boys, 4 girls) and different racial/ethnic groups (White, Black, Latino, Asian American) in the United States. The inclusion criteria for the dyads were: (1) a child aged 11-14 years and his/her parent, and (2) ability to speak, read, write, and understand English. Procedure. After eligible families consented to their participation, semi-structured interviews (each 60-90 minutes long) were conducted with each adolescent-parent dyad in a quiet and private room. Each dyad received $50 to acknowledge their time and effort. Measure. The interview questions consisted of two parts: (a) those related to game design, functioning, and feasibility of implementation; (b) those related to theoretical constructs of the Health Belief Model (HBM) and the Theory of Planned Behavior (TPB). Data analysis. The interviews were audio-recorded with permission and manually transcribed into textual data. Two researchers confirmed the verbatim transcription. We use pre-developed codes to identify each participant’s responses and organize data and develop themes based on the HBM and TPB constructs. After the analysis was completed, three researchers in the team reviewed the results and discussed the discrepancies until a consensus is reached.<br/>Results: The findings suggested that the most common motivating factors for adolescents’ HPV vaccination were its effectiveness, benefits, convenience, affordable cost, reminders via text, and recommendation by a health care provider. Regarding the content included in the HPV game, participants suggested including information about who and when should receive the vaccine, what is HPV and the vaccination, what are the consequences if infected, the side effects of the vaccine, and where to receive the vaccine. The preferred game design elements were: 15 minutes long, stories about fighting or action, option to choose characters/avatars, motivating factors (i.e., rewards such as allowing users to advance levels and receive coins when correctly answering questions), use of a portable electronic device (e.g., tablet) to deliver the education. Participants were open to multiplayer function which assists in a facilitated conversation about HPV and the HPV vaccine. Overall, the participants concluded enthusiasm for an interactive yet engaging game-based intervention to learn about the HPV vaccine with the goal to increase HPV vaccination in adolescents. <br/>Implications: Tailored educational games have the potential to decrease the stigma of HPV and HPV vaccination, increasing communication between the adolescent, parent, and healthcare provider, as well as increase the overall HPV vaccination rate.

ContributorsBeaman, Abigail Marie (Author) / Chen, Angela Chia-Chen (Thesis director) / Amresh, Ashish (Committee member) / Edson College of Nursing and Health Innovation (Contributor) / Barrett, The Honors College (Contributor)
Created2021-05
168287-Thumbnail Image.png
Description
Dealing with relational data structures is central to a wide-range of applications including social networks, epidemic modeling, molecular chemistry, medicine, energy distribution, and transportation. Machine learning models that can exploit the inherent structural/relational bias in the graph structured data have gained prominence in recent times. A recurring idea that appears

Dealing with relational data structures is central to a wide-range of applications including social networks, epidemic modeling, molecular chemistry, medicine, energy distribution, and transportation. Machine learning models that can exploit the inherent structural/relational bias in the graph structured data have gained prominence in recent times. A recurring idea that appears in all approaches is to encode the nodes in the graph (or the entire graph) as low-dimensional vectors also known as embeddings, prior to carrying out downstream task-specific learning. It is crucial to eliminate hand-crafted features and instead directly incorporate the structural inductive bias into the deep learning architectures. In this dissertation, deep learning models that directly operate on graph structured data are proposed for effective representation learning. A literature review on existing graph representation learning is provided in the beginning of the dissertation. The primary focus of dissertation is on building novel graph neural network architectures that are robust against adversarial attacks. The proposed graph neural network models are extended to multiplex graphs (heterogeneous graphs). Finally, a relational neural network model is proposed to operate on a human structural connectome. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed graph neural network models, approaches, and architectures demonstrate significant performance improvements in comparison to the existing state-of-the-art graph embedding strategies.
ContributorsShanthamallu, Uday Shankar (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman J (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2021
187456-Thumbnail Image.png
Description
The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to the (i) availability of large amounts of training data; (ii) access of domain aware knowledge; (iii) i.i.d assumption between the train and target

The past decade witnessed the success of deep learning models in various applications of computer vision and natural language processing. This success can be predominantly attributed to the (i) availability of large amounts of training data; (ii) access of domain aware knowledge; (iii) i.i.d assumption between the train and target distributions and (iv) belief on existing metrics as reliable indicators of performance. When any of these assumptions are violated, the models exhibit brittleness producing adversely varied behavior. This dissertation focuses on methods for accurate model design and characterization that enhance process reliability when certain assumptions are not met. With the need to safely adopt artificial intelligence tools in practice, it is vital to build reliable failure detectors that indicate regimes where the model must not be invoked. To that end, an error predictor trained with a self-calibration objective is developed to estimate loss consistent with the underlying model. The properties of the error predictor are described and their utility in supporting introspection via feature importances and counterfactual explanations is elucidated. While such an approach can signal data regime changes, it is critical to calibrate models using regimes of inlier (training) and outlier data to prevent under- and over-generalization in models i.e., incorrectly identifying inliers as outliers and vice-versa. By identifying the space for specifying inliers and outliers, an anomaly detector that can effectively flag data of varying semantic complexities in medical imaging is next developed. Uncertainty quantification in deep learning models involves identifying sources of failure and characterizing model confidence to enable actionability. A training strategy is developed that allows the accurate estimation of model uncertainties and its benefits are demonstrated for active learning and generalization gap prediction. This helps identify insufficiently sampled regimes and representation insufficiency in models. In addition, the task of deep inversion under data scarce scenarios is considered, which in practice requires a prior to control the optimization. By identifying limitations in existing work, data priors powered by generative models and deep model priors are designed for audio restoration. With relevant empirical studies on a variety of benchmarks, the need for such design strategies is demonstrated.
ContributorsNarayanaswamy, Vivek Sivaraman (Author) / Spanias, Andreas (Thesis advisor) / J. Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2023
158318-Thumbnail Image.png
Description
Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically-relevant changes and in evaluating these algorithms against the existing gold-standard: perceptual evaluation by trained speech and language pathologists. Hypernasality, the result

Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically-relevant changes and in evaluating these algorithms against the existing gold-standard: perceptual evaluation by trained speech and language pathologists. Hypernasality, the result of poor control of the velopharyngeal flap---the soft palate regulating airflow between the oral and nasal cavities---is one such speech symptom of interest, as precise velopharyngeal control is difficult to achieve under neuromuscular disorders. However, a host of co-modulating variables give hypernasal speech a complex and highly variable acoustic signature, making it difficult for skilled clinicians to assess and for automated systems to evaluate. Previous work in rating hypernasality from speech relies on either engineered features based on statistical signal processing or machine learning models trained end-to-end on clinical ratings of disordered speech examples. Engineered features often fail to capture the complex acoustic patterns associated with hypernasality, while end-to-end methods tend to overfit to the small datasets on which they are trained. In this thesis, I present a set of acoustic features, models, and strategies for characterizing hypernasality in dysarthric speech that split the difference between these two approaches, with the aim of capturing the complex perceptual character of hypernasality without overfitting to the small datasets available. The features are based on acoustic models trained on a large corpus of healthy speech, integrating expert knowledge to capture known perceptual characteristics of hypernasal speech. They are then used in relatively simple linear models to predict clinician hypernasality scores. These simple models are robust, generalizing across diseases and outperforming comprehensive set of baselines in accuracy and correlation. This novel approach represents a new state-of-the-art in objective hypernasality assessment.
ContributorsSaxon, Michael Stephen (Author) / Berisha, Visar (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2020
164815-Thumbnail Image.png
Description

This paper serves to report the research performed towards detecting PD and the effects of medication through the use of machine learning and finger tapping data collected through mobile devices. The primary objective for this research is to prototype a PD classification model and a medication classification model that predict

This paper serves to report the research performed towards detecting PD and the effects of medication through the use of machine learning and finger tapping data collected through mobile devices. The primary objective for this research is to prototype a PD classification model and a medication classification model that predict the following: the individual’s disease status and the medication intake time relative to performing the finger-tapping activity, respectively.

ContributorsGin, Taylor (Author) / McCarthy, Alexandra (Co-author) / Berisha, Visar (Thesis director) / Baumann, Alicia (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)
Created2022-05
164816-Thumbnail Image.png
Description

This paper serves to report the research performed towards detecting PD and the effects of medication through the use of machine learning and finger tapping data collected through mobile devices. The primary objective for this research is to prototype a PD classification model and a medication classification model that predict

This paper serves to report the research performed towards detecting PD and the effects of medication through the use of machine learning and finger tapping data collected through mobile devices. The primary objective for this research is to prototype a PD classification model and a medication classification model that predict the following: the individual’s disease status and the medication intake time relative to performing the finger-tapping activity, respectively.

ContributorsMcCarthy, Alexandra (Author) / Gin, Taylor (Co-author) / Berisha, Visar (Thesis director) / Baumann, Alicia (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)
Created2022-05