Diversity Promoting Online Sampling for Streaming Video Summarization

Description

Video summarization is gaining popularity in the technological culture, where positioning the mouse pointer on top of a video results in a quick overview of what the video is about. The algorithm usually selects frames in a time sequence through systematic sampling. Other applications, such as video surveillance, web-based video surfing, and video archival, can also benefit from efficient and concise video summaries. In this project, we explored several clustering algorithms and how they can be combined and deconstructed to make the summarization algorithm more efficient and relevant. We focused on two objectives for summarization: reducing error and reducing redundancy in the summary. To reduce the error, an online k-means clustering algorithm was used; to reduce redundancy, we applied two different methods: the volume of convex hulls and the true diversity measure that is usually used in biological disciplines. The algorithm was efficient and computationally cost-effective due to its online nature. Diversity maximization (or redundancy reduction) using the volume of convex hulls showed better results than other conventional methods on 50 different videos. For the true diversity measure, little work has been done on the nature of the measure in the context of video summarization. When we applied it, the algorithm stalled because the true diversity saturated, owing to the initialization inherent in the algorithm. We explored the nature of this measure to gain a better understanding of how it can help make summarization more intuitive and give the user a handle to customize the summary.
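
As a rough illustration of the two ingredients named in this abstract, the sketch below pairs a sequential (one-pass) k-means update with the true diversity (Hill number) of the resulting cluster sizes. It is not the authors' implementation: the function names, the choice to seed centers with the first k feature vectors, and the use of cluster counts as the "abundances" in the diversity measure are all assumptions made for the example.

```python
import math


def online_kmeans(stream, k):
    """One-pass k-means: each incoming vector updates only its nearest center."""
    centers, counts = [], []
    for x in stream:
        if len(centers) < k:
            # Assumption: the first k vectors seed the centers.
            centers.append([float(v) for v in x])
            counts.append(1)
            continue
        # Index of the nearest center by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((c - v) ** 2 for c, v in zip(centers[i], x)),
        )
        counts[j] += 1
        step = 1.0 / counts[j]  # running-mean step size
        centers[j] = [c + step * (v - c) for c, v in zip(centers[j], x)]
    return centers, counts


def true_diversity(counts, q=1.0):
    """Hill number of order q: the effective number of equally common groups."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical 2-D frame features arriving one at a time.
frames = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centers, counts = online_kmeans(frames, k=2)
diversity = true_diversity(counts)
```

With the running-mean step size, each center equals the mean of the vectors assigned to it so far, so no frame needs to be stored after it has been processed. The diversity value ranges from 1 (all frames in one cluster) up to k (frames spread evenly across clusters).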

Contributors: Masroor, Ahnaf (Co-author) / Anirudh, Rushil (Co-author) / Turaga, Pavan (Thesis director) / Spanias, Andreas (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05

Diagrammatic Media / Subjectivity—Ecology—Event / Generating Organizational Techniques Through Creative Practice for a Post-Media Era

Description

This dissertation charts another path for Media Arts and Sciences (MAS) by generating institutional and creative research practices working against logics of integration and extraction. Drawing on activist, psychoanalyst, and philosopher Félix Guattari, I use institutional analysis to model how MAS came to inherit legacies of 1970s cyberlibertarianism and digital utopianism, which disavow politics in favor of technocratic interventions. I also identify the homogenizing and reactionary political and disciplinary consequences of MAS’s embrace of integrative modes of interdisciplinarity. Responding to integrative and technocratic MAS, I argue for reconsideration of politics in MAS through an approach to research, creation, and practice informed by Guattari’s concept of diagrammatics. Diagrammatics emphasizes the centrality of subjectivity in crises of mental, social, and environmental ecology. Through creative practice with computational media, art and technology, and social design, I work towards a practice-driven notion of diagrammatic media. I outline media diagrammatics as an intertwining of extensive engineering of concrete machines (artmaking, systems building, bookmaking, event making) and a speculative engineering of abstract machines (dreaming, conceptualizing, modeling, critiquing, analyzing, actualizing, virtualizing). In this sense, diagrammatics mediates mental and social individuations between a preindividual and an individuation. Diagrammatic media objects (e.g., a radiophonic aberrance in the electromagnetic field, a book, an autumn leaf) are lures for thinking-feeling embedded into a diagram. Diagrammatic media proposes we stop thinking in terms of computational media systems altogether and begin thinking about diagrammatic assemblages of concrete and abstract machines.
A prototype of a tangible media-rich operating system called diagrammatic elucidates the complexities of the relationship between lateral thinking, moving, and feeling in learning and writing. I outline ways the prototype could be brought into a slow network that speculates on new modes of collaborative writing. Portacular Resonances, a radiophonic media installation, drives a Sci-Phi endeavor orbiting contemporary anxiety differently: as a clue for cosmic becoming spiraling out of the reactive affect of alienations and emotional capitalistic exploitation and into a potential collectivizing force. Finally, through the Guattarian concept of the machine, I ask how potential becomings are embedded through gathering events such as SloMoCo, a slow conference for artist researchers.

Contributors: Johnson, Garrett Laroy (Author) / Sha, Xin Wei (Thesis advisor) / Nocek, Adam J (Committee member) / Hayes, Lauren S (Committee member) / Arizona State University (Publisher)

Created: 2022

A Wearable Real-Time Auditory Feedback System to Improve Gait and Posture in Parkinson’s Disease

Description

Nearly one percent of the population over 65 years of age is living with Parkinson’s disease (PD) and this population worldwide is projected to be approximately nine million by 2030. PD is a progressive neurological disease characterized by both motor and cognitive impairments. One of the most serious challenges for an individual as the disease progresses is the increasing severity of gait and posture impairments since they result in debilitating conditions such as freezing of gait, increased likelihood of falls, and poor quality of life. Although dopaminergic therapy and deep brain stimulation are generally effective, they often fail to improve gait and posture deficits. Several recent studies have employed real-time feedback (RTF) of gait parameters to improve walking patterns in PD. In earlier work, results from the investigation of the effects of RTF of step length and back angle during treadmill walking demonstrated that people with PD could follow the feedback and utilize it to modulate movements favorably in a manner that transferred, at least acutely, to overground walking. In this work, recent advances in wearable technologies were leveraged to develop a wearable real-time feedback (WRTF) system that can monitor and evaluate movements and provide feedback during daily activities that involve overground walking. Specifically, this work addressed the challenges of obtaining accurate gait and posture measures from wearable sensors in real-time and providing auditory feedback on the calculated real-time measures for rehabilitation. An algorithm was developed to calculate gait and posture variables from wearable sensor measurements, which were then validated against gold-standard measurements. The WRTF system calculates these measures and provides auditory feedback in real-time. The WRTF system was evaluated as a potential rehabilitation tool for use by people with mild to moderate PD. 
Results from the study indicated that the system can accurately measure step length and back angle, and that subjects could respond to real-time auditory feedback in a manner that improved their step length and uprightness. These improvements were exhibited while using the system that provided feedback and were sustained in subsequent trials immediately thereafter in which subjects walked without receiving feedback from the system.

Contributors: Muthukrishnan, Niveditha (Author) / Abbas, James (Thesis advisor) / Krishnamurthi, Narayanan (Thesis advisor) / Shill, Holly A (Committee member) / Honeycutt, Claire (Committee member) / Turaga, Pavan (Committee member) / Ingalls, Todd (Committee member) / Arizona State University (Publisher)

Created: 2022

Effective Prior Selection and Knowledge Transfer for Deep Learning Applications

Description

In recent years, deep learning has gained popularity for its ability to be utilized for several computer vision applications without any a priori knowledge. However, to introduce a better inductive bias, incorporating prior knowledge along with learned information is critical. To that end, human intervention, including the choice of algorithm, data, and model in deep learning pipelines, can be considered a prior. Thus, it is extremely important to select effective priors for a given application. This dissertation explores different aspects of a deep learning pipeline and provides insights as to why a particular prior is effective for the corresponding application. For analyzing the effect of model priors, three applications that involve sequential modelling problems are chosen, i.e., Audio Source Separation, Clinical Time-series (Electroencephalogram (EEG)/Electrocardiogram (ECG)) based Differential Diagnosis, and Global Horizontal Irradiance Forecasting for Photovoltaic (PV) Applications. For data priors, the application of image classification is chosen, and a new algorithm titled “Invenio” that can effectively use data semantics for both task and distribution shift scenarios is proposed. Finally, the effectiveness of a data selection prior is shown using the application of object tracking, wherein the aim is to maintain the tracking performance while prolonging the battery life of image sensors by optimizing the data selected for reading from the environment. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed design choices demonstrate significant performance improvements in comparison to the existing application-specific state-of-the-art deep learning strategies.

Contributors: Katoch, Sameeksha (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Thiagarajan, Jayaraman J. (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)

Created: 2022

Knowledge Distillation with Geometric Approaches for Multimodal Data Analysis

Description

This thesis presents robust and novel solutions using knowledge distillation with geometric approaches and multimodal data that can address the current challenges in deep learning, providing a comprehensive understanding of the learning process involved in knowledge distillation. Deep learning has attained significant success in various applications, such as health and wellness promotion, smart homes, and intelligent surveillance. In general, stacking more layers or increasing the number of trainable parameters causes deep networks to exhibit improved performance. However, this causes the model to become large, resulting in an additional need for computing and power resources for training, storage, and deployment. These are the core challenges in incorporating such models into small devices with limited power and computational resources. In this thesis, robust solutions aimed at addressing the aforementioned challenges are presented. These proposed methodologies and algorithmic contributions enhance the performance and efficiency of deep learning models. The thesis encompasses a comprehensive exploration of knowledge distillation, an approach that holds promise for creating compact models from high-capacity ones, while preserving their performance. This exploration covers diverse datasets, including both time series and image data, shedding light on the pivotal role of augmentation methods in knowledge distillation. The effects of these methods are rigorously examined through empirical experiments. Furthermore, the study within this thesis delves into the efficient utilization of features derived from two different teacher models, each trained on dissimilar data representations, including time-series and image data. Through these investigations, I present novel approaches to knowledge distillation, leveraging geometric techniques for the analysis of multimodal data. 
These solutions not only address real-world challenges but also offer valuable insights and recommendations for modeling in new applications.
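
For readers unfamiliar with knowledge distillation, the sketch below shows the widely used temperature-softened formulation, in which a compact student is trained to match a larger teacher's softened class probabilities in addition to the ground-truth label. It is a generic illustration and not the geometric or multimodal method developed in the thesis; the function names and the default values of `temperature` and `alpha` are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Cross-entropy of the student against the true label at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Hypothetical logits for a three-class problem.
loss = distillation_loss([2.0, 1.0, 0.0], [3.0, 1.0, -1.0], label=0)
```

A higher temperature flattens both distributions so that the teacher's relative preferences among the incorrect classes contribute more to the loss, while `alpha` trades off imitation of the teacher against fitting the label.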

Contributors: Jeon, Eunsom (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Lee, Hyunglae (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created: 2023

Building Reliable and Robust Deep Neural Networks with Improved Representations using Model Distillation and Deep Constraints

Description

This thesis encompasses a comprehensive research effort dedicated to overcoming the critical bottlenecks that hinder the current generation of neural networks, thereby significantly advancing their reliability and performance. Deep neural networks, with their millions of parameters, suffer from over-parameterization and lack of constraints, leading to limited generalization capabilities. In other words, the complex architecture and millions of parameters present challenges in finding the right balance between capturing useful patterns and avoiding noise in the data. To address these issues, this thesis explores novel solutions based on knowledge distillation, enabling the learning of robust representations. Leveraging the capabilities of large-scale networks, effective learning strategies are developed. Moreover, the limitations of dependency on external networks in the distillation process, which often require large-scale models, are effectively overcome by proposing a self-distillation strategy. The proposed approach empowers the model to generate high-level knowledge within a single network, pushing the boundaries of knowledge distillation. The effectiveness of the proposed method is not only demonstrated across diverse applications, including image classification, object detection, and semantic segmentation but also explored in practical considerations such as handling data scarcity and assessing the transferability of the model to other learning tasks. Another major obstacle hindering the development of reliable and robust models lies in their black-box nature, impeding clear insights into the contributions toward the final predictions and yielding uninterpretable feature representations. To address this challenge, this thesis introduces techniques that incorporate simple yet powerful deep constraints rooted in Riemannian geometry. 
These constraints confer geometric qualities upon the latent representation, thereby fostering a more interpretable and insightful representation. In addition to its primary focus on general tasks like image classification and activity recognition, this strategy offers significant benefits in real-world applications where data scarcity is prevalent. Moreover, its robustness in feature removal showcases its potential for edge applications. By successfully tackling these challenges, this research contributes to advancing the field of machine learning and provides a foundation for building more reliable and robust systems across various application domains.

Contributors: Choi, Hongjun (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Wenwen (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)

Created: 2023

Robust and Controllable Generative Models by Leveraging Physics-Based, Probabilistic, and Geometric Methods

Description

Generative models are deep neural network-based models trained to learn the underlying distribution of a dataset. Once trained, these models can be used to sample novel data points from this distribution. Their impressive capabilities have been manifested in various generative tasks, encompassing areas like image-to-image translation, style transfer, image editing, and more. One notable application of generative models is data augmentation, aimed at expanding and diversifying the training dataset to augment the performance of deep learning models for a downstream task. Generative models can be used to create new samples similar to the original data but with different variations and properties that are difficult to capture with traditional data augmentation techniques. However, the quality, diversity, and controllability of the shape and structure of the generated samples from these models are often directly proportional to the size and diversity of the training dataset. A more extensive and diverse training dataset allows the generative model to capture overall structures present in the data and generate more diverse and realistic-looking samples. In this dissertation, I present innovative methods designed to enhance the robustness and controllability of generative models, drawing upon physics-based, probabilistic, and geometric techniques. These methods help improve the generalization and controllability of the generative model without necessarily relying on large training datasets. I enhance the robustness of generative models by integrating classical geometric moments for shape awareness and minimizing trainable parameters. Additionally, I employ non-parametric priors for the generative model's latent space through basic probability and optimization methods to improve the fidelity of interpolated images. 
I adopt a hybrid approach to address domain-specific challenges with limited data and controllability, combining physics-based rendering with generative models for more realistic results. These approaches are particularly relevant in industrial settings, where the training datasets are small and class imbalance is common. Through extensive experiments on various datasets, I demonstrate the effectiveness of the proposed methods over conventional approaches.

Contributors: Singh, Rajhans (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Berisha, Visar (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)

Created: 2023

Modeling and Exploiting the Structure of Data via Meta-Features for Robust and Efficient Machine Learning

Description

In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide which features to extract, which hypothesis class to condition on, which hyperparameters to select, and how to train the model. The design process is iterative with the developer trying different classifiers, feature sets, and hyper-parameters and using cross-validation to pick the model with the lowest error. As there are no guidelines for when to stop searching, developers can continue "optimizing" the model to the point where they begin to "fit to the dataset". These problems are amplified in the active learning setting, where the initial dataset may be unlabeled and label acquisition is costly. The aim in this dissertation is to develop algorithms that provide ML developers with additional information about the complexity of the underlying problem to guide downstream model development. I introduce the concept of "meta-features" - features extracted from a dataset that characterize the complexity of the underlying data generating process. In the context of classification, the complexity of the problem can be characterized by understanding two complementary meta-features: (a) the amount of overlap between classes, and (b) the geometry/topology of the decision boundary. Across three complementary works, I present a series of estimators for the meta-features that characterize overlap and geometry/topology of the decision boundary, and demonstrate how they can be used in algorithm development.

Contributors: Li, Weizhi (Author) / Berisha, Visar (Thesis advisor) / Dasarathy, Gautam (Thesis advisor) / Natesan Ramamurthy, Karthikeyan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2022

Addressing the Challenges of Automated Speech and Language Analysis for the Assessment of Mental Health and Functional Competency

Description

Severe forms of mental illness, such as schizophrenia and bipolar disorder, are debilitating conditions that negatively impact an individual's quality of life. Additionally, they are often difficult and expensive to diagnose and manage, placing a large burden on society. Mental illness is typically diagnosed by the use of clinical interviews and a set of neuropsychiatric batteries; a key component of nearly all of these evaluations is some spoken language task. Clinicians have long used speech and language production as a proxy for neurological health, but most of these assessments are subjective in nature. Meanwhile, technological advancements in speech and natural language processing have grown exponentially over the past decade, increasing the capacity of computer models to assess particular aspects of speech and language. For this reason, many have seen an opportunity to leverage signal processing and machine learning applications to objectively assess clinical speech samples in order to automatically compute objective measures of neurological health. This document summarizes several contributions to expand upon this body of research. Mainly, there is still a large gap between the theoretical power of computational language models and their actual use in clinical applications. One of the largest concerns is the limited and inconsistent reliability of speech and language features used in models for assessing specific aspects of mental health; numerous methods may exist to measure the same or similar constructs and lead researchers to different conclusions in different studies. To address this, a novel measurement model based on a theoretical framework of speech production is used to motivate feature selection, while also performing a smoothing operation on features across several domains of interest. 
Then, these composite features are used to perform a much wider range of analyses than is typical of previous studies, looking at everything from diagnosis to functional competency assessments. Lastly, potential improvements to address practical implementation challenges associated with the use of speech and language technology in a real-world environment are investigated. The goal of this work is to demonstrate the ability of speech and language technology to aid clinical practitioners toward improvements in quality of life outcomes for their patients.

Contributors: Voleti, Rohit Nihar Uttam (Author) / Berisha, Visar (Thesis advisor) / Liss, Julie M (Thesis advisor) / Turaga, Pavan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2022

Analyzing Multi-viewpoint Capabilities of Light Estimation Frameworks for Augmented Reality Using TCP/IP and UDP

Description

Realistic lighting is important to improve immersion and make mixed reality applications seem more plausible. To properly blend the AR objects in the real scene, it is important to study the lighting of the environment. The existing illumination frameworks proposed by Google’s ARCore (Google’s Augmented Reality Software Development Kit) and Apple’s ARKit (Apple’s Augmented Reality Software Development Kit) are computationally expensive and have very slow refresh rates, which make them incompatible with dynamic environments and low-end mobile devices. Recently, other illumination estimation frameworks, such as GLEAM and Xihe, have aimed at providing better illumination with faster refresh rates. GLEAM is an illumination estimation framework that understands the real scene by collecting pixel data from a reflecting spherical light probe. GLEAM uses this data to form environment cubemaps, which are later mapped onto a reflection probe to generate illumination for AR objects. From a single viewpoint, only one half of the light probe can be observed at a time, which does not give complete information about the environment. This leads to the idea of multi-viewpoint estimation for better performance. This thesis analyzes the multi-viewpoint capabilities of AR illumination frameworks that use physical light probes to understand the environment. The current work adds networking to GLEAM using the TCP and UDP protocols. This thesis also documents how processor load sharing has been done while networking devices and how that benefits the performance of GLEAM on mobile devices. Some enhancements using multi-threading have also been made to the existing GLEAM model to improve its performance.

Contributors: Gurram, Sahithi (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2022