Data Representation for Predicting Harmonic Clusters with LSTM

Description

The purpose of this project is to create a useful tool for musicians that utilizes the harmonic content of their playing to recommend new, relevant chords to play. This is done by training various Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) on the lead sheets of 100 different jazz standards. A total of 200 unique datasets were produced and tested, resulting in the prediction of nearly 51 million chords. A note-prediction accuracy of 82.1% and a chord-prediction accuracy of 34.5% were achieved across all datasets. Methods of data representation that were rooted in valid music theory frameworks were found to increase the efficacy of harmonic prediction by up to 6%. Optimal LSTM input sizes were also determined for each method of data representation.

Contributors: Rangaswami, Sriram Madhav (Author) / Lalitha, Sankar (Thesis director) / Jayasuriya, Suren (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Thermal noise analysis of near-sensor image processing

Description

Commonly, image processing is handled on a CPU that is connected to the image sensor by a wire. In these far-sensor processing architectures, there is energy loss associated with sending data across an interconnect from the sensor to the CPU. In an effort to increase energy efficiency, near-sensor processing architectures have been developed, in which the sensor and processor are stacked directly on top of each other. This reduces energy loss associated with sending data off-sensor. However, processing near the image sensor causes the sensor to heat up. Reports of thermal noise in near-sensor processing architectures motivated us to study how temperature affects image quality on a commercial image sensor and how thermal noise affects computer vision task accuracy. We analyzed image noise across nine different temperatures and three sensor configurations to determine how image noise responds to an increase in temperature. Ultimately, our team used this information, along with transient analysis of a stacked image sensor’s thermal behavior, to advise thermal management strategies that leverage the benefits of near-sensor processing and prevent accuracy loss at problematic temperatures.

Contributors: Jones, Britton Steele (Author) / LiKamWa, Robert (Thesis director) / Jayasuriya, Suren (Committee member) / Watts College of Public Service & Community Solut (Contributor) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-12

Investigating Methods of Achieving Photorealistic Materials for Augmented Reality Applications on Mobile Devices

Description

As the prevalence of augmented reality (AR) technology continues to increase, so too do methods for improving the appearance and behavior of computer-generated objects. This is especially significant as AR applications now expand to territories outside of the entertainment sphere and can be utilized for numerous purposes including, but not limited to, education, specialized occupational training, retail and online shopping, design, marketing, and manufacturing. Because AR technology places computer-generated objects into a real-world environment, a decision has to be made regarding the visual connection between the tangible and the intangible. Should the objects blend seamlessly into their environment or purposefully stand out? It is not purely a stylistic choice. A developer must consider how their application will be used. In many instances an optimal user experience is facilitated by mimicking the real world as closely as possible; even simpler applications, such as those built primarily for mobile devices, can benefit from realistic AR. The struggle here lies in creating an immersive user experience that is not reliant on computationally expensive graphics or heavy-duty models. The research contained in this thesis provides several ways of achieving photorealistic rendering in AR applications using a range of techniques, all of which are supported on mobile devices. These methods can be employed within the Unity Game Engine and incorporate shaders, render pipelines, node-based editors, post-processing, and light estimation.

Contributors: Schanberger, Schuyler Catherine (Author) / LiKamWa, Robert (Thesis director) / Jayasuriya, Suren (Committee member) / Arts, Media and Engineering Sch T (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Portable and Low-Cost Detection Platform for Hepatitis B Virus Infections

Description

Approximately 248 million people in the world are currently living with chronic Hepatitis B virus (HBV) infection. HBV and HCV infections are the primary cause of liver diseases such as cirrhosis and hepatocellular carcinomas worldwide, with an estimated 1.4 million deaths annually. HBV in the Republic of Peru was used as a case study of an emerging and rapidly spreading disease in a developing nation. There, clinical diagnosis of HBV infections in at-risk communities such as the Amazon Region and the Andes Mountains is challenging for a myriad of reasons. High prices of clinical diagnosis and limited access to treatment are the most significant deterrents keeping individuals living in at-risk communities from getting much-needed help. Additionally, limited testing facilities, lack of adequate testing policies or national guidelines, poor laboratory capacity, resource-limited settings, geographical isolation, and public mistrust are among the chief reasons for low HBV testing. Preventative vaccination programs deployed by Peruvian health officials have nonetheless reduced the number of infected individuals by year and region. To significantly reduce or eradicate HBV in hyperendemic areas and countries such as Peru, preventative clinical diagnosis and vaccination programs are an absolute necessity. Consequently, the need for a portable, low-priced diagnostic platform for the detection of HBV and other diseases is substantial and urgent not only in Peru but worldwide. Some of these concerns were addressed by designing a low-cost, rapid detection platform. An immunosignature technology (IMST) slide, which is used to test for reactivity against the presence of antibodies in a serum sample, was used to test picture resolution and clarity. IMST slides were scanned using a smartphone camera placed on top of the designed device, which houses a circuit of 32 LED lights at 647 nm, an optical magnifier at 15X, and a linear polarizing film sheet. Two 9V batteries powered the scanning device's LED circuit, ensuring enough lighting. The resulting pictures from the first prototype showed that, by lighting the device at 647 nm, a smartphone camera could capture high-resolution images. These results conclusively indicate that with any modern smartphone camera, a small box lighted to 647 nm, and an optical magnifier, a powerful and expensive laboratory scanning machine can be replaced by one that is inexpensive, portable, and ready to use anywhere.

Contributors: Makimaa, Heyde (Author) / Holechek, Susan (Thesis director) / Stafford, Phillip (Committee member) / Jayasuriya, Suren (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Video Captioning with Commonsense Knowledge Anchors

Description

A video clip carries not merely an aggregation of static entities, but also a variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is retrieved to complement each training caption by querying ATOMIC, a generic knowledge atlas, to form the commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator for the training of the video captioning model, and penalizes the model for generating semantically misaligned captions. With extensive empirical evaluations on the MSR-VTT, V2C, and VATEX datasets, CAVAN consistently improves the quality of generations and shows a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.

Contributors: Shao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)

Created: 2022

Software-Defined Imaging for Embedded Computer Vision: Adaptive Subsampling and Event-based Visual Navigation

Description

Huge advancements have been made over the years in terms of modern image-sensing hardware and visual computing algorithms (e.g. computer vision, image processing, computational photography). However, to this day, there still exists a gap between the hardware and software design in an imaging system, which silos one research domain from another. Bridging this gap is the key to unlocking new visual computing capabilities for end applications in commercial photography, industrial inspection, and robotics. This thesis explores avenues where hardware-software co-design of image sensors can be leveraged to replace conventional hardware components in an imaging system with software for enhanced reconfigurability. As a result, the user can program the image sensor in a way best suited to the end application. This is referred to as software-defined imaging (SDI), where image sensor behavior can be altered by the system software depending on the user's needs. The scope of this thesis covers the development and deployment of SDI algorithms for low-power computer vision. Strategies for sparse spatial sampling have been developed in this thesis for power optimization of the vision sensor. This dissertation shows how a hardware-compatible state-of-the-art object tracker can be coupled with a Kalman filter for energy gains at the sensor level. Extensive experiments reveal how adaptive spatial sampling of image frames with this hardware-friendly framework offers attractive energy-accuracy tradeoffs. Another thrust of this thesis is to demonstrate the benefits of reinforcement learning in this research avenue. A major finding reported in this dissertation shows how neural-network-based reinforcement learning can be exploited for the adaptive subsampling framework to achieve improved sampling performance, thereby optimizing the energy efficiency of the image sensor.
The last thrust of this thesis is to leverage emerging event-based SDI technology for building a low-power navigation system. A homography estimation pipeline has been proposed in this thesis which couples the right data representation with a differential scale-invariant feature transform (SIFT) module to extract rich visual cues from event streams. Positional encoding is leveraged with a multilayer perceptron (MLP) network to get robust homography estimation from event data.

Contributors: Iqbal, Odrika (Author) / Jayasuriya, Suren (Thesis advisor) / Spanias, Andreas (Thesis advisor) / LiKamWa, Robert (Committee member) / Owens, Chris (Committee member) / Arizona State University (Publisher)

Created: 2023

Knowledge Distillation with Geometric Approaches for Multimodal Data Analysis

Description

This thesis presents robust and novel solutions using knowledge distillation with geometric approaches and multimodal data that can address the current challenges in deep learning, providing a comprehensive understanding of the learning process involved in knowledge distillation. Deep learning has attained significant success in various applications, such as health and wellness promotion, smart homes, and intelligent surveillance. In general, stacking more layers or increasing the number of trainable parameters causes deep networks to exhibit improved performance. However, this causes the model to become large, resulting in an additional need for computing and power resources for training, storage, and deployment. These are the core challenges in incorporating such models into small devices with limited power and computational resources. In this thesis, robust solutions aimed at addressing the aforementioned challenges are presented. These proposed methodologies and algorithmic contributions enhance the performance and efficiency of deep learning models. The thesis encompasses a comprehensive exploration of knowledge distillation, an approach that holds promise for creating compact models from high-capacity ones, while preserving their performance. This exploration covers diverse datasets, including both time series and image data, shedding light on the pivotal role of augmentation methods in knowledge distillation. The effects of these methods are rigorously examined through empirical experiments. Furthermore, the study within this thesis delves into the efficient utilization of features derived from two different teacher models, each trained on dissimilar data representations, including time-series and image data. Through these investigations, I present novel approaches to knowledge distillation, leveraging geometric techniques for the analysis of multimodal data. 
These solutions not only address real-world challenges but also offer valuable insights and recommendations for modeling in new applications.

Contributors: Jeon, Eunsom (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Lee, Hyunglae (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created: 2023

Building Reliable and Robust Deep Neural Networks with Improved Representations using Model Distillation and Deep Constraints

Description

This thesis encompasses a comprehensive research effort dedicated to overcoming the critical bottlenecks that hinder the current generation of neural networks, thereby significantly advancing their reliability and performance. Deep neural networks, with their millions of parameters, suffer from over-parameterization and lack of constraints, leading to limited generalization capabilities. In other words, the complex architecture and millions of parameters present challenges in finding the right balance between capturing useful patterns and avoiding noise in the data. To address these issues, this thesis explores novel solutions based on knowledge distillation, enabling the learning of robust representations. Leveraging the capabilities of large-scale networks, effective learning strategies are developed. Moreover, the limitations of dependency on external networks in the distillation process, which often require large-scale models, are effectively overcome by proposing a self-distillation strategy. The proposed approach empowers the model to generate high-level knowledge within a single network, pushing the boundaries of knowledge distillation. The effectiveness of the proposed method is not only demonstrated across diverse applications, including image classification, object detection, and semantic segmentation but also explored in practical considerations such as handling data scarcity and assessing the transferability of the model to other learning tasks. Another major obstacle hindering the development of reliable and robust models lies in their black-box nature, impeding clear insights into the contributions toward the final predictions and yielding uninterpretable feature representations. To address this challenge, this thesis introduces techniques that incorporate simple yet powerful deep constraints rooted in Riemannian geometry. 
These constraints confer geometric qualities upon the latent representation, thereby fostering a more interpretable and insightful representation. In addition to its primary focus on general tasks like image classification and activity recognition, this strategy offers significant benefits in real-world applications where data scarcity is prevalent. Moreover, its robustness in feature removal showcases its potential for edge applications. By successfully tackling these challenges, this research contributes to advancing the field of machine learning and provides a foundation for building more reliable and robust systems across various application domains.

Contributors: Choi, Hongjun (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Wenwen (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)

Created: 2023

Robust and Controllable Generative Models by Leveraging Physics-Based, Probabilistic, and Geometric Methods

Description

Generative models are deep neural network-based models trained to learn the underlying distribution of a dataset. Once trained, these models can be used to sample novel data points from this distribution. Their impressive capabilities have been manifested in various generative tasks, encompassing areas like image-to-image translation, style transfer, image editing, and more. One notable application of generative models is data augmentation, aimed at expanding and diversifying the training dataset to augment the performance of deep learning models for a downstream task. Generative models can be used to create new samples similar to the original data but with different variations and properties that are difficult to capture with traditional data augmentation techniques. However, the quality, diversity, and controllability of the shape and structure of the generated samples from these models are often directly proportional to the size and diversity of the training dataset. A more extensive and diverse training dataset allows the generative model to capture overall structures present in the data and generate more diverse and realistic-looking samples. In this dissertation, I present innovative methods designed to enhance the robustness and controllability of generative models, drawing upon physics-based, probabilistic, and geometric techniques. These methods help improve the generalization and controllability of the generative model without necessarily relying on large training datasets. I enhance the robustness of generative models by integrating classical geometric moments for shape awareness and minimizing trainable parameters. Additionally, I employ non-parametric priors for the generative model's latent space through basic probability and optimization methods to improve the fidelity of interpolated images. 
I adopt a hybrid approach to address domain-specific challenges with limited data and controllability, combining physics-based rendering with generative models for more realistic results. These approaches are particularly relevant in industrial settings, where the training datasets are small and class imbalance is common. Through extensive experiments on various datasets, I demonstrate the effectiveness of the proposed methods over conventional approaches.

Contributors: Singh, Rajhans (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Berisha, Visar (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)

Created: 2023

Toward an Ethic of Queerness for Engineering Education Research

Description

This dissertation features three pieces of scholarship which showcase and demonstrate an ethic of queerness for engineering education research (EER). The concept of an ethic of queerness is introduced and constructed in Chapter 1 using tenets from the philosophy of pragmatism, systems thinking, critical theory, and the personal and collective experiences of queered communities immersed in normative spaces, such as engineering and engineering education. Chapter 2 is a scoping literature review on the state of research on the LGBTQIA+ engineering student experience compared to other relevant fields, revealing that EER is still nascent on the topic. Chapter 3 leverages arts-based qualitative inquiry to explore the opportunities and limitations of mixed-initiative creative interfaces (MICIs) when used as a tool for self care by queer(ed) subjects. Chapter 4 connects Patricia Hill Collins’ insider/outsider paradox framework to recent engineering education research through collaborative autoethnographies, illuminating the ways in which normative, oppressive social discourses are embedded within the EER system. Although Chapters 2-4 feature their own unique methodology and topic of inquiry, they are united through a motivation to deconstruct and re-imagine sociotechnical systems throughout engineering and EER through the lens of radical queerness. Chapter 5 summarizes how each of the prior chapters aligns with queerness as an ethic and explores avenues of future work from this dissertation. More specifically, each chapter represents a way of queering engineering education research methodology through the embrace of ambiguity and ephemerality, particularly with regard to the ways in which the author’s subjectivity and relationality to the roles of researcher, student, engineer, and engineering education researcher emerged throughout their doctoral education.

Contributors: Jennings, Madeleine (Author) / Kellam, Nadia (Thesis advisor) / Jayasuriya, Suren (Thesis advisor) / Roscoe, Rod (Committee member) / Brunhaver, Samantha (Committee member) / Arizona State University (Publisher)

Created: 2023