Description
Motion capture with cost-effective sensing technology is challenging, and the success of Microsoft Kinect has attracted researchers to explore the potential of this technology in computer vision applications. This thesis presents an upper-body motion analysis for a home-based stroke rehabilitation system using a novel RGB-D camera, the Kinect. We address this problem by first conducting a systematic analysis of the usability of Kinect for motion analysis in stroke rehabilitation. We then propose a hybrid upper-body tracking approach that combines off-the-shelf skeleton tracking with a novel depth-fused mean shift tracking method. We also propose several kinematic features that can be reliably extracted from this inexpensive and portable motion capture system, along with classifiers that correlate torso movement with clinical measures distinguishing unimpaired from impaired movement. Experimental results show that the proposed sensing and analysis work reliably for measuring torso movement quality and are promising for end-point tracking. The system is currently being deployed for large-scale evaluations.
Contributors: Du, Tingfang (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Rikakis, Thanassis (Committee member) / Arizona State University (Publisher)
Created: 2012
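The depth-fused mean shift tracking mentioned in the Kinect rehabilitation thesis above can be roughly illustrated with flat-kernel mean shift on a per-pixel weight map. The depth fusion shown here (multiplying a colour likelihood by a Gaussian depth-similarity term centred on the target's expected depth) is an assumption for illustration only, since the abstract does not describe the actual formulation; `depth_fused_weights`, `mean_shift`, `target_depth`, and `sigma` are hypothetical names.

```python
import math


def depth_fused_weights(color_prob, depth, target_depth, sigma=0.1):
    """Hypothetical fusion: scale each pixel's colour likelihood by a Gaussian
    term that favours pixels near the target's expected depth."""
    return [
        [c * math.exp(-((d - target_depth) ** 2) / (2 * sigma**2))
         for c, d in zip(c_row, d_row)]
        for c_row, d_row in zip(color_prob, depth)
    ]


def mean_shift(weights, start, half_width, max_iter=20, tol=1e-3):
    """Flat-kernel mean shift: move a square window to the weighted centroid
    of the pixels it covers until the shift falls below `tol`."""
    rows, cols = len(weights), len(weights[0])
    y, x = start
    for _ in range(max_iter):
        cy, cx = int(round(y)), int(round(x))
        total = sum_y = sum_x = 0.0
        for r in range(max(0, cy - half_width), min(rows, cy + half_width + 1)):
            for c in range(max(0, cx - half_width), min(cols, cx + half_width + 1)):
                w = weights[r][c]
                total += w
                sum_y += w * r
                sum_x += w * c
        if total == 0.0:
            break  # no support under the window: keep the last estimate
        new_y, new_x = sum_y / total, sum_x / total
        shift = math.hypot(new_y - y, new_x - x)
        y, x = new_y, new_x
        if shift < tol:
            break
    return y, x


# Toy 10x10 scene: uniform colour likelihood, background at 2.0 m,
# and a 2x2 target at 1.0 m occupying rows/cols 6-7.
depth = [[2.0] * 10 for _ in range(10)]
for r in (6, 7):
    for c in (6, 7):
        depth[r][c] = 1.0
color = [[1.0] * 10 for _ in range(10)]
weights = depth_fused_weights(color, depth, target_depth=1.0)
print(mean_shift(weights, start=(4, 4), half_width=3))  # converges near (6.5, 6.5)
```

Because the depth term suppresses the background almost completely, the window locks onto the near-depth target even though colour alone cannot separate it here.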
Description
This thesis presents robust and novel solutions that use knowledge distillation with geometric approaches and multimodal data to address current challenges in deep learning, and it provides a comprehensive understanding of the learning process involved in knowledge distillation. Deep learning has attained significant success in various applications, such as health and wellness promotion, smart homes, and intelligent surveillance. In general, stacking more layers or increasing the number of trainable parameters tends to improve the performance of deep networks. However, it also makes the model larger, increasing the computing and power resources required for training, storage, and deployment. These are the core challenges in deploying such models on small devices with limited power and computational resources. This thesis presents robust solutions to these challenges; the proposed methodologies and algorithmic contributions enhance the performance and efficiency of deep learning models. The thesis comprehensively explores knowledge distillation, an approach that holds promise for creating compact models from high-capacity ones while preserving their performance. This exploration covers diverse datasets, including both time-series and image data, and sheds light on the pivotal role of augmentation methods in knowledge distillation; the effects of these methods are rigorously examined through empirical experiments. Furthermore, this thesis investigates the efficient utilization of features derived from two different teacher models, each trained on a different data representation, namely time-series and image data. Through these investigations, I present novel approaches to knowledge distillation that leverage geometric techniques for the analysis of multimodal data.
These solutions not only address real-world challenges but also offer valuable insights and recommendations for modeling in new applications.
Contributors: Jeon, Eunsom (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Lee, Hyunglae (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2023
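For readers new to knowledge distillation, the sketch below shows the widely used soft-target formulation introduced by Hinton et al.: the student matches the teacher's temperature-softened class probabilities while also fitting the hard label. It is a generic baseline, not the geometric or multimodal distillation methods proposed in the thesis above; `temperature=4.0` and `alpha=0.5` are arbitrary illustrative defaults.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Multiplying by T^2 keeps the soft-target gradients on a comparable
    # scale as the temperature changes.
    return alpha * hard + (1 - alpha) * temperature**2 * kl


student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
print(kd_loss(student, teacher, label=0))
```

When the student already reproduces the teacher's logits, the KL term vanishes and only the weighted hard-label loss remains.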
Description
This thesis presents a comprehensive research effort dedicated to overcoming critical bottlenecks that hinder the current generation of neural networks, thereby significantly advancing their reliability and performance. Deep neural networks, with their millions of parameters, suffer from over-parameterization and a lack of constraints, which limits their generalization capabilities. In other words, their complex architectures and millions of parameters make it difficult to strike the right balance between capturing useful patterns and fitting noise in the data. To address these issues, this thesis explores novel solutions based on knowledge distillation that enable the learning of robust representations, developing effective learning strategies that leverage the capabilities of large-scale networks. Moreover, because distillation usually depends on an external, often large-scale, teacher network, a self-distillation strategy is proposed to remove this dependency. The proposed approach enables the model to generate high-level knowledge within a single network, pushing the boundaries of knowledge distillation. The effectiveness of the proposed method is demonstrated across diverse applications, including image classification, object detection, and semantic segmentation, and is further examined under practical considerations such as data scarcity and the transferability of the model to other learning tasks. Another major obstacle to developing reliable and robust models is their black-box nature, which obscures what contributes to the final predictions and yields uninterpretable feature representations. To address this challenge, this thesis introduces techniques that incorporate simple yet powerful deep constraints rooted in Riemannian geometry.
These constraints confer geometric qualities upon the latent representation, fostering a more interpretable and insightful representation. In addition to its primary focus on general tasks such as image classification and activity recognition, this strategy offers significant benefits in real-world applications where data scarcity is prevalent. Moreover, its robustness to feature removal shows its potential for edge applications. By tackling these challenges, this research advances the field of machine learning and provides a foundation for building more reliable and robust systems across various application domains.
Contributors: Choi, Hongjun (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Wenwen (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)
Created: 2023
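The thesis above does not spell out its Riemannian constraints, so the following is only an illustration of the general idea: one simple way to give latent features a Riemannian structure is to constrain them to the unit hypersphere and compare them by geodesic (great-circle) distance instead of Euclidean distance. `project_to_sphere` and `geodesic_distance` are hypothetical helpers, not the thesis's method.

```python
import math


def project_to_sphere(v, eps=1e-12):
    """Normalise a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def geodesic_distance(u, v):
    """Great-circle (arc-length) distance between two features on the sphere."""
    pu, pv = project_to_sphere(u), project_to_sphere(v)
    cos_angle = sum(a * b for a, b in zip(pu, pv))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


# Rescaling a feature does not move it on the sphere, so its distance is ~0.
print(geodesic_distance([1.0, 1.0], [2.0, 2.0]))
print(geodesic_distance([1.0, 0.0], [0.0, 3.0]))  # pi / 2
```

On this manifold only the direction of a feature matters, which is one way a geometric constraint can make representations easier to compare and interpret.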
Description
Severe forms of mental illness, such as schizophrenia and bipolar disorder, are debilitating conditions that negatively impact an individual's quality of life. Additionally, they are often difficult and expensive to diagnose and manage, placing a large burden on society. Mental illness is typically diagnosed through clinical interviews and a set of neuropsychiatric batteries; a key component of nearly all of these evaluations is a spoken language task. Clinicians have long used speech and language production as a proxy for neurological health, but most of these assessments are subjective. Meanwhile, speech and natural language processing technology has advanced rapidly over the past decade, increasing the capacity of computer models to assess particular aspects of speech and language. For this reason, many researchers have seen an opportunity to apply signal processing and machine learning to clinical speech samples in order to automatically compute objective measures of neurological health. This document summarizes several contributions that expand upon this body of research. Chiefly, a large gap remains between the theoretical power of computational language models and their actual use in clinical applications. One of the largest concerns is the limited and inconsistent reliability of the speech and language features used in models that assess specific aspects of mental health; numerous methods may exist to measure the same or similar constructs, leading researchers to different conclusions in different studies. To address this, a novel measurement model based on a theoretical framework of speech production is used to motivate feature selection while also smoothing features across several domains of interest.
Then, these composite features are used to perform a much wider range of analyses than is typical of previous studies, looking at everything from diagnosis to functional competency assessments. Lastly, potential improvements to address practical implementation challenges associated with the use of speech and language technology in a real-world environment are investigated. The goal of this work is to demonstrate the ability of speech and language technology to aid clinical practitioners toward improvements in quality of life outcomes for their patients.
Contributors: Voleti, Rohit Nihar Uttam (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie M (Thesis advisor) / Turaga, Pavan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2022
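The measurement model in the thesis above is not described in enough detail to reproduce, but one generic way to combine several features that measure the same construct into a smoother composite is to z-score each feature across speakers and average the results. The sketch shows only that simplified stand-in; the feature names and values in the example are made up for illustration.

```python
import statistics


def composite_scores(features):
    """Combine features measuring one construct into a per-speaker composite.

    features: dict mapping feature name -> list of per-speaker values
    (all lists the same length). Each feature is z-scored across speakers,
    then the z-scores are averaged so no single measure dominates.
    """
    z_scored = []
    for values in features.values():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        z_scored.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    n_speakers = len(z_scored[0])
    return [statistics.mean(z[i] for z in z_scored) for i in range(n_speakers)]


# Hypothetical example: two lexical-diversity measures for four speakers.
features = {
    "type_token_ratio": [0.42, 0.55, 0.61, 0.38],
    "moving_avg_ttr": [0.48, 0.60, 0.66, 0.45],
}
print(composite_scores(features))
```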
Description
Realistic lighting is important for improving immersion and making mixed reality applications more plausible. To properly blend AR objects into the real scene, it is important to study the lighting of the environment. The existing illumination frameworks provided by Google’s ARCore (Google’s Augmented Reality Software Development Kit) and Apple’s ARKit (Apple’s Augmented Reality Software Development Kit) are computationally expensive and have very slow refresh rates, which makes them unsuitable for dynamic environments and low-end mobile devices. Recently, other illumination estimation frameworks, such as GLEAM and Xihe, have aimed to provide better illumination with faster refresh rates. GLEAM is an illumination estimation framework that understands the real scene by collecting pixel data from a reflective spherical light probe. GLEAM uses this data to form environment cubemaps, which are then mapped onto a reflection probe to generate illumination for AR objects. However, from a single viewpoint only half of the light probe can be observed at a time, which does not give complete information about the environment. This motivates multi-viewpoint estimation for better performance. This thesis analyzes the multi-viewpoint capabilities of AR illumination frameworks that use physical light probes to understand the environment. The work adds networking to GLEAM using the TCP and UDP protocols. It also documents how processing load is shared among networked devices and how this benefits the performance of GLEAM on mobile devices. Finally, multi-threading enhancements are made to the existing GLEAM model to improve its performance.
Contributors: Gurram, Sahithi (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2022
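To make the multi-viewpoint idea in the GLEAM thesis above concrete, the sketch shows one hypothetical way two networked devices could exchange partial light-probe observations and merge them: each sample is packed into a fixed-size binary record suitable for a UDP datagram, and texels seen by more than one device are averaged. The record layout (`SAMPLE`), the function names, and the averaging rule are assumptions for illustration; GLEAM's actual wire format and fusion strategy are not given in the abstract.

```python
import struct

# Hypothetical record: cubemap face (0-5), texel u, v, then linear RGB,
# packed in network byte order ("!") with no padding: 1 + 2 + 2 + 3*4 = 17 bytes.
SAMPLE = struct.Struct("!BHHfff")


def pack_samples(samples):
    """Serialise (face, u, v, r, g, b) tuples into one datagram payload."""
    return b"".join(SAMPLE.pack(*s) for s in samples)


def unpack_samples(payload):
    """Inverse of pack_samples."""
    return [SAMPLE.unpack_from(payload, off) for off in range(0, len(payload), SAMPLE.size)]


def fuse_viewpoints(*sample_lists):
    """Average the colour of every texel observed by one or more devices."""
    acc = {}
    for samples in sample_lists:
        for face, u, v, r, g, b in samples:
            sr, sg, sb, n = acc.get((face, u, v), (0.0, 0.0, 0.0, 0))
            acc[(face, u, v)] = (sr + r, sg + g, sb + b, n + 1)
    return {key: (sr / n, sg / n, sb / n) for key, (sr, sg, sb, n) in acc.items()}


# Device A sees one texel; device B overlaps on it and also sees another face.
device_a = [(0, 1, 1, 1.0, 0.5, 0.25)]
device_b = [(0, 1, 1, 0.0, 0.5, 0.25), (1, 2, 3, 0.5, 0.5, 0.5)]
received = unpack_samples(pack_samples(device_b))  # e.g. after a UDP hop
print(fuse_viewpoints(device_a, received))
```

A real implementation would also need to split payloads to fit datagram size limits and tolerate lost or reordered packets.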
Description
Statistical Shape Modeling is widely used to study the morphometrics of deformable objects in computer vision and biomedical studies. There are two main viewpoints for understanding shapes. On one hand, the outer surface of a shape can be treated as a two-dimensional embedding in space. On the other hand, the outer surface together with its enclosed internal volume can be treated as a three-dimensional embedding of interest. Most studies take the surface-based perspective, leveraging intrinsic features on the tangent plane. However, a two-dimensional model may fail to fully capture shapes that have both intrinsic and extrinsic properties. In this thesis, several Stochastic Partial Differential Equations (SPDEs) are thoroughly investigated, and several methods derived from them are developed to address both two-dimensional and three-dimensional shape analysis. The distinct physical meanings of these SPDEs inspired the features, shape descriptors, metrics, and kernels developed in this series of works. First, the generation of high-dimensional shape data, here tetrahedral meshes, is introduced: taking the cerebral cortex as the study target, an automatic pipeline for generating the gray matter tetrahedral mesh is presented. Then, a discretized Laplace-Beltrami operator (LBO) and a Hamiltonian operator (HO) on the tetrahedral domain are derived with the Finite Element Method (FEM). Two high-dimensional shape descriptors are defined based on the solutions of the heat equation and Schrödinger’s equation. Because high-dimensional shape models usually contain massive redundancy, and many applications demand effective landmarks, Gaussian process landmarking on tetrahedral meshes is further studied. A SIWKS-based metric space is used to define a geometry-aware Gaussian process.
The study of the periodic potential diffusion process further inspired a new kernel called the geometry-aware convolutional kernel. A series of Bayesian learning methods are then introduced to tackle shape retrieval and classification, and experiments are presented for each component. From popular SPDEs such as the heat equation and Schrödinger’s equation to the general potential diffusion equation and the specific periodic potential diffusion equation, these works show that classical SPDEs play an important role in discovering new features, metrics, shape descriptors, and kernels. I hope this thesis can serve as an example of using interdisciplinary knowledge to solve problems.
Contributors: Fan, Yonghui (Author) / Wang, Yalin (Thesis advisor) / Lepore, Natasha (Committee member) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2021
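The heat-equation-based descriptor in the thesis above is in the spirit of the heat kernel signature, HKS(x, t) = k_t(x, x): the heat remaining at a point x after time t when a unit of heat starts there. The sketch approximates it on a small graph by forward-Euler time stepping of du/dt = -Lu. As a simplifying assumption, it uses the combinatorial graph Laplacian of the vertex adjacency instead of the FEM-discretized Laplace-Beltrami operator (with mass and stiffness matrices) on tetrahedral meshes derived in the thesis; the function names are hypothetical.

```python
def apply_laplacian(adj, u):
    """(L u)_i = sum over neighbours j of (u_i - u_j): combinatorial graph Laplacian."""
    return [sum(u[i] - u[j] for j in adj[i]) for i in range(len(adj))]


def heat_kernel_diagonal(adj, x, t, steps=1000):
    """Approximate k_t(x, x) = (exp(-t L))_{xx} by forward-Euler integration
    of du/dt = -L u starting from a unit heat source at vertex x."""
    dt = t / steps
    max_degree = max(len(nbrs) for nbrs in adj)
    # The largest eigenvalue of L is at most 2 * max_degree, and forward Euler
    # is stable when dt * lambda_max < 2.
    if dt * 2 * max_degree >= 2:
        raise ValueError("time step too large; increase `steps`")
    u = [0.0] * len(adj)
    u[x] = 1.0
    for _ in range(steps):
        lu = apply_laplacian(adj, u)
        u = [ui - dt * li for ui, li in zip(u, lu)]
    return u[x]


def heat_kernel_signature(adj, times, steps=1000):
    """One descriptor row per vertex: heat retained at each diffusion time."""
    return [[heat_kernel_diagonal(adj, x, t, steps) for t in times]
            for x in range(len(adj))]


# Path graph 0 - 1 - 2: the endpoints retain more heat than the middle vertex.
path = [[1], [0, 2], [1]]
print(heat_kernel_signature(path, times=[0.5, 1.0]))
```

Small t captures local geometry and large t more global structure, which is why such descriptors are evaluated over several diffusion times.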
Description
This project explores whether basketball shooting posture can be accurately predicted with machine learning (ML) algorithms using data collected by an Internet of Things (IoT) based motion capture system. Specifically, the research addresses the question "Can I develop an ML model to generalize a decent basketball shot pattern?" by introducing a supervised learning paradigm in which the ML method takes acceleration attributes as input to predict basketball shot efficiency. The solution presented in this study places motion capture devices on the right upper limb: a motion sensor unit built from a BNO080 and an ESP32 is attached to each of the right wrist, right forearm, and right shoulder. By observing the rate of change of speed during the shooting movement and comparing the performance of ML models based on the K-Nearest Neighbor and Decision Tree algorithms, the study identifies the best range of acceleration for each location on the arm.
Contributors: Liang, Chengxu (Author) / Ingalls, Todd (Thesis advisor) / Turaga, Pavan (Thesis advisor) / De Luca, Gennaro (Committee member) / Arizona State University (Publisher)
Created: 2023
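The K-Nearest Neighbor classifier compared in the basketball project above can be sketched in a few lines: a new shot is labelled by majority vote among the k training shots whose acceleration features are closest in Euclidean distance. The three-feature layout (peak acceleration at wrist, forearm, and shoulder), the labels, and the numbers below are hypothetical toy values, not data from the study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: peak acceleration (m/s^2) at wrist, forearm, shoulder.
train_x = [
    (30.0, 18.0, 6.0), (32.0, 19.0, 6.5), (29.0, 17.5, 5.8),  # made shots
    (45.0, 25.0, 9.0), (47.0, 26.0, 9.5), (20.0, 10.0, 3.0),  # missed shots
]
train_y = ["made", "made", "made", "missed", "missed", "missed"]
print(knn_predict(train_x, train_y, query=(31.0, 18.5, 6.2)))  # made
```

In practice the features would usually be standardized first so that no single sensor location dominates the distance.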
Description
Millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems aim to utilize the large bandwidth available at these frequencies. This has the potential to enable several future applications that require high data rates, such as autonomous vehicles and digital twins. These systems, however, face several challenges that must be addressed to realize their gains in practice. First, they need to deploy large antenna arrays and use narrow beams to guarantee sufficient receive power, and adjusting the narrow beams of large antenna arrays incurs massive beam training overhead. Second, sensitivity to blockages is a key challenge for mmWave and THz networks. Since these networks rely mainly on line-of-sight (LOS) links, sudden link blockages severely threaten network reliability. Further, when the LOS link is blocked, the network typically needs to hand off the user to another LOS base station, which may incur critical latency, especially if a search over a large codebook of narrow beams is needed. A promising way to tackle both challenges is to leverage additional side information such as visual, LiDAR, radar, and position data. These sensors provide rich information about the wireless environment, which can be utilized for fast beam and blockage prediction. This dissertation presents a machine learning framework for sensing-aided beam and blockage prediction. In particular, for beam prediction, this work proposes to utilize visual and positional data to predict the optimal beam indices. For the first time, this work investigates the sensing-aided beam prediction task in real-world vehicle-to-infrastructure and drone communication scenarios. Similarly, for blockage prediction, this dissertation proposes a multi-modal wireless communication solution that uses bimodal machine learning to perform proactive blockage prediction and user hand-off.
Evaluations on both real-world and synthetic datasets illustrate the promising performance of the proposed solutions and highlight their potential for next-generation communication and sensing systems.
Contributors: Charan, Gouranga (Author) / Alkhateeb, Ahmed (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Turaga, Pavan (Committee member) / Michelusi, Nicolò (Committee member) / Arizona State University (Publisher)
Created: 2024
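The beam-training overhead described in the dissertation above, and the way position data can sidestep it, can be illustrated with a toy line-of-sight model. The sketch assumes a uniform linear array with half-wavelength spacing, a codebook of steering vectors on a uniform angle grid, and a pure LOS channel. Exhaustive training measures the gain of every beam, whereas the position-aided rule picks the beam from geometry alone. These modelling choices and function names are assumptions; the dissertation's predictors use machine learning on visual and positional data rather than this closed-form rule.

```python
import cmath
import math


def steering_vector(theta, n_antennas):
    """ULA with half-wavelength spacing: element n has phase pi * n * sin(theta)."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_antennas)]


def make_codebook(n_antennas, n_beams):
    """Beams steered to a uniform grid of angles across (-pi/2, pi/2)."""
    angles = [-math.pi / 2 + math.pi * (k + 0.5) / n_beams for k in range(n_beams)]
    return angles, [steering_vector(a, n_antennas) for a in angles]


def exhaustive_beam_search(channel, codebook):
    """Beam training: evaluate |w^H h| for every beam (one measurement each)."""
    gains = [abs(sum(w.conjugate() * h for w, h in zip(beam, channel))) for beam in codebook]
    return max(range(len(gains)), key=gains.__getitem__)


def position_aided_beam(bs_xy, ue_xy, angles):
    """No beam training: pick the codebook angle closest to the LOS direction.
    The array broadside is assumed to point along +y."""
    dx, dy = ue_xy[0] - bs_xy[0], ue_xy[1] - bs_xy[1]
    theta = math.atan2(dx, dy)
    return min(range(len(angles)), key=lambda k: abs(angles[k] - theta))


angles, codebook = make_codebook(n_antennas=16, n_beams=32)
ue = (10.0, 20.0)
channel = steering_vector(math.atan2(ue[0], ue[1]), 16)  # pure LOS, BS at origin
print(exhaustive_beam_search(channel, codebook), position_aided_beam((0.0, 0.0), ue, angles))
```

Both methods agree here, but the exhaustive search needed 32 measurements while the position-aided rule needed none, which is the overhead that sensing-aided prediction targets.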
Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolutions and high frame rates. The content needs to represent a truly immersive experience, in which the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions only capture the content and offload the computation to a server. However, offloading large amounts of raw camera feeds incurs long latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires at least a three-orders-of-magnitude reduction in power. We believe that close integration of the capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterizing the end-to-end system bottlenecks of power and performance. The prototype has six IMX274 cameras and uses an Nvidia Jetson TX2 development board for capture and computation. We found that capture is bottlenecked by sensor power and by data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computation lead to high power, a huge memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across interfaces and expensive computations within individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-designing the different subsystems to reduce and reuse data. For example, reusing the motion vectors from the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching.
Contributors: Gunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2018
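The data-reuse idea in the VR capture thesis above can be illustrated with a simplified one-dimensional block-matching example. If an earlier stage already supplies an approximate disparity (standing in here for the ISP motion vectors mentioned in the abstract), the stereo search can be restricted to a small window around that hint, cutting the number of cost evaluations and the intermediate cost data that must be kept. The function names, the SAD cost, and the `hint`/`radius` parameters are illustrative assumptions, not the prototype's implementation.

```python
def sad(left_row, right_row, x, d, half):
    """Sum of absolute differences between a block at x in the left row and
    the block shifted by disparity d in the right row."""
    return sum(abs(left_row[x + i] - right_row[x + i - d]) for i in range(-half, half + 1))


def match_disparity(left_row, right_row, x, half=2, max_disp=32, hint=None, radius=2):
    """Return (best disparity, number of SAD evaluations performed).

    With hint=None every disparity in [0, max_disp] is searched; with a hint
    (e.g. reused from an earlier pipeline stage) only hint +/- radius is.
    """
    if hint is None:
        candidates = range(0, max_disp + 1)
    else:
        candidates = range(max(0, hint - radius), min(max_disp, hint + radius) + 1)
    # Keep only disparities whose blocks lie fully inside both rows.
    valid = [d for d in candidates
             if x - half - d >= 0 and x + half < len(left_row) and x + half - d < len(right_row)]
    best = min(valid, key=lambda d: sad(left_row, right_row, x, d, half))
    return best, len(valid)


left = [(i * 37) % 101 for i in range(80)]  # distinct synthetic intensities
right = left[5:] + [0] * 5                  # right view shifted by a true disparity of 5
print(match_disparity(left, right, x=40))          # full search: (5, 33)
print(match_disparity(left, right, x=40, hint=4))  # hinted search: (5, 5)
```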
Description
Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that assigns weights used to combine the expert predictions into the final output. Mixture of experts models do not currently see wide use because of the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements.

First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks at classifying distorted images, and then a model, MixQualNet, that is more robust to distortions is proposed. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models' outputs, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing, which reduces the number of parameters and also increases performance.

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.
Contributors: Dodge, Samuel Fuller (Author) / Karam, Lina (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2018
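The gated combination used in MixQualNet and the saliency models above follows the standard mixture-of-experts pattern, sketched here with plain Python callables: a gating function produces one logit per expert, a softmax turns the logits into weights, and the output is the weighted sum of the experts' outputs. Weight sharing is modelled by having every expert read features from one shared extractor. The function names and toy experts are hypothetical stand-ins for the thesis's deep networks.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, shared, experts, gate):
    """Weighted sum of expert outputs with weights from a softmax gate.

    shared:  feature extractor reused by every expert (parameter sharing)
    experts: heads mapping shared features -> class scores
    gate:    maps the raw input -> one logit per expert
    """
    features = shared(x)
    weights = softmax(gate(x))
    outputs = [expert(features) for expert in experts]
    return [sum(w * out[c] for w, out in zip(weights, outputs))
            for c in range(len(outputs[0]))]


# Toy setup: two "distortion experts" and a gate that trusts expert 0 three times more.
def shared(x):
    return [sum(x)]


def gate(x):
    return [math.log(3.0), 0.0]


experts = [lambda f: [1.0, 0.0], lambda f: [0.0, 1.0]]
print(mixture_of_experts([0.2, 0.4], shared, experts, gate))  # approximately [0.75, 0.25]
```

Because the experts share one feature extractor, adding an expert costs only one extra head, which mirrors the parameter-sharing motivation in the abstract.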