Study of Knowledge Transfer Techniques For Deep Learning on Edge Devices

Description

With the emergence of the edge computing paradigm, many applications, such as image recognition and augmented reality, need to perform machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computationally heavy, whereas edge devices usually have limited computational and storage resources. Such models can be compressed and reduced to fit on edge devices, but they may lose capability and may not generalize or perform as well as large models. Recent work has used knowledge transfer techniques to transfer information from a large network (the teacher) to a small one (the student) in order to improve the performance of the latter. This approach seems promising for learning on edge devices, but a thorough investigation of its effectiveness is lacking.

The purpose of this work is to provide an extensive study on the performance (both in terms of accuracy and convergence speed) of knowledge transfer, considering different student-teacher architectures, datasets and different techniques for transferring knowledge from teacher to student.

A good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student. Other architectures and transfer techniques do not fare as well, and some even have a negative impact on performance. For example, a smaller and shorter network trained with knowledge transfer on Caltech 101 achieved a significant accuracy improvement of 7.36% and converged 16 times faster than the same network trained without knowledge transfer. On the other hand, a smaller network that is thinner than the teacher network performed worse, with an accuracy drop of 9.48% on Caltech 101, even with knowledge transfer.
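One common way to transfer knowledge from the last layer of a teacher is to train the student against the teacher's temperature-softened output distribution. The sketch below illustrates that idea only; the abstract does not state which loss the thesis used, and it does not cover transfer from intermediate layers. The function names and the `temperature` and `alpha` defaults are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a teacher-matching KL term.

    alpha weights the hard-label term; (1 - alpha) weights the soft term.
    Both defaults are illustrative assumptions, not values from the thesis.
    """
    # Hard-label term: ordinary cross-entropy at temperature 1.
    student_probs = softmax(student_logits)
    hard = -math.log(student_probs[true_label])

    # Soft-label term: KL(teacher || student) on the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)

    # Scaling by temperature**2 is a common convention that keeps the
    # soft term's gradient magnitude comparable across temperatures.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

When the student's logits equal the teacher's, the soft term is zero and only the hard-label cross-entropy remains.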

Contributors: Sistla, Ragini (Author) / Zhao, Ming (Thesis advisor, Committee member) / Li, Baoxin (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created: 2018

Convolutional Neural Networks for Facial Expression Recognition

Description

This paper presents work done to create a system capable of facial expression recognition (FER) using deep convolutional neural networks (CNNs) and to test multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors. The extracted information can be used to understand the image better by recognizing different features present within it. Deep CNNs, however, require training sets that can exceed a million pictures in order to fine-tune their feature detectors, and no facial expression datasets of that size are available. Because of this limited availability of training data, the idea of using naïve domain adaptation is explored. Instead of creating a new CNN trained specifically to extract features related to FER, a CNN previously trained for another computer vision task is used. Work for this research involved creating a system that can run a CNN, extract feature vectors from the CNN, and classify these extracted features. Once this system was built, different aspects of it were tested and tuned. These aspects include the pre-trained CNN that was used, the layer from which features were extracted, the normalization applied to input images, and the training data for the classifier. Once properly tuned, the system returned results more accurate than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.

Contributors: Eusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Raspberry Pi Radio: Programming a Multiple Source Music Player

Description

The purpose of this project was to program a Raspberry Pi to play music both from local storage on the Pi and from internet radio stations such as Pandora. The Pi also needs to be able to play various file formats, such as mp3 and FLAC. Finally, the project is driven by a mobile app running on a smartphone or tablet. To achieve this, a client-server design was employed in which the Raspberry Pi acts as the server and the mobile app is the client. The server functionality was achieved using a Python script that listens on a socket and calls various executables that handle the different formats of music being played. The client functionality was achieved by programming an Android app in Java that sends encoded commands to the server; the server decodes each command and begins playing the music it specifies. The designs for both the client and the server are easily extensible, so future modifications to the project can be made easily.
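The server-side step of decoding a command and choosing an executable by file format might look roughly like the sketch below. The wire format (`VERB|argument`), the verb names, and the executable names (`mp3-player`, `flac-player`, `radio-client`) are all hypothetical placeholders; the abstract does not describe the project's actual encoding or which programs it invokes.

```python
import os

# Hypothetical mapping from file extension to a player executable.
# These names are placeholders, not the programs used in the project.
PLAYERS = {".mp3": "mp3-player", ".flac": "flac-player"}

def decode_command(message: bytes):
    """Split an assumed 'VERB|argument' message into (verb, argument)."""
    text = message.decode("utf-8").strip()
    verb, _, argument = text.partition("|")
    return verb.upper(), argument

def build_argv(message: bytes):
    """Translate a client message into the argument list the server would run."""
    verb, argument = decode_command(message)
    if verb == "PLAY_FILE":
        # Pick the executable from the file extension, case-insensitively.
        ext = os.path.splitext(argument)[1].lower()
        player = PLAYERS.get(ext)
        if player is None:
            raise ValueError(f"unsupported format: {ext!r}")
        return [player, argument]
    if verb == "PLAY_STATION":
        return ["radio-client", argument]
    raise ValueError(f"unknown command: {verb!r}")
```

A server loop would read bytes from the socket, pass them to `build_argv`, and hand the resulting list to something like `subprocess.Popen`. Supporting a new format in this sketch means adding one entry to `PLAYERS`, which is one way the extensibility described above could be realized.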

Contributors: Storto, Michael Olson (Author) / Burger, Kevin (Thesis director) / Meuth, Ryan (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2015-05

Software for Agent-Based Computational Economics

Description

Agent-based modeling has been used in computer science to simulate complex phenomena. The introduction of agent-based models into the field of economics (Agent-Based Computational Economics, ACE) is not new; however, work on making model environments simpler to design for individuals without a background in computer science or computer engineering is a constantly evolving topic. The issue is a trade-off between how much is handled by the framework and how much control the modeler has, as well as what tools exist to allow the user to develop insights from the behavior of the model. The solutions examined in this thesis are the construction of a simplified grammar for model construction, the design of an economics-based library to assist in ACE modeling, and examples of how to construct interactive models.

Contributors: Anderson, Brandon David (Author) / Bazzi, Rida (Thesis director) / Kuminoff, Nicolai (Committee member) / Roberts, Nancy (Committee member) / Computer Science and Engineering Program (Contributor) / Economics Program in CLAS (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Learning Generalized Heuristics Using Deep Neural Networks

Description

Classical planning is a field of Artificial Intelligence concerned with allowing autonomous agents to make reasonable decisions in complex environments. This work investigates the application of deep learning and planning techniques, with the aim of constructing generalized plans capable of solving multiple problem instances. We construct a Deep Neural Network that, given an abstract problem state, predicts both (i) the best action to be taken from that state and (ii) the generalized “role” of the object being manipulated. The neural network was tested on two classical planning domains: the blocks world domain and the logistics domain. Results indicate that neural networks are capable of making such predictions with high accuracy, indicating a promising new framework for approaching generalized planning problems.

Contributors: Nakhleh, Julia Blair (Author) / Srivastava, Siddharth (Thesis director) / Fainekos, Georgios (Committee member) / Computer Science and Engineering Program (Contributor) / School of International Letters and Cultures (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Prescription Information Extraction from Electronic Health Records using BiLSTM-CRF and Word Embeddings

Description

Medical records are increasingly kept in the form of electronic health records (EHRs), with a significant amount of patient data recorded as unstructured natural language text. Consequently, being able to extract and utilize clinical data present within these records is an important step in furthering clinical care. One important aspect of these records is the presence of prescription information. Existing techniques for extracting prescription information (which includes medication names, dosages, frequencies, reasons for taking, and mode of administration) from unstructured text have focused on the application of rule- and classifier-based methods. While state-of-the-art systems can be effective in extracting many types of information, they require significant effort to develop hand-crafted rules and conduct effective feature engineering. This paper presents the use of a bidirectional LSTM with a CRF tagging model, initialized with precomputed word embeddings, for extracting prescription information from sentences without requiring significant feature engineering. The experimental results, run on the i2b2 2009 dataset, achieve a macro F1 score of 0.8562 and scores above 0.9449 on four of the six categories, indicating significant potential for this model.
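A macro-averaged F1 score is the unweighted mean of the per-category F1 scores, so weak categories pull the average down regardless of how often they occur. That is consistent with an overall score below the score reached on four of the six categories. The sketch below shows the calculation; the category names and counts in the usage note are made up for illustration and are not results from the paper, and the exact evaluation protocol used on i2b2 2009 is not stated in the abstract.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(counts):
    """Unweighted mean of per-category F1.

    counts maps a category name to a (tp, fp, fn) tuple.
    """
    scores = [f1_score(*c) for c in counts.values()]
    return sum(scores) / len(scores)
```

For example, with illustrative counts `{"medication": (8, 2, 2), "dosage": (5, 0, 5)}`, the per-category scores are 0.8 and 2/3, and `macro_f1` returns their mean.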

Contributors: Rawal, Samarth Chetan (Author) / Baral, Chitta (Thesis director) / Anwar, Saadat (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Immersion of Choice

Description

The topic of my creative project centers on the question, "How can the audience's choices influence dancers' improvisation?" This dance work seeks to redefine the relationship between audience and performers through the integration of audience, technology, and movement in real time. The topic was derived from the fields of Computer Science and Dance. To answer my main question, I need to explore how I can interconnect the theory of Computer Science and the fundamentals of a web application with the elements of dance improvisation. This topic interests me because it combines two studies that do not seem related. However, I find that when I am coding a web application, I can insert blocks of code. This relates to dance improvisation, where I have a movement vocabulary and can insert different moves based on the context. The idea of gathering data from an audience in real time also interests me. I find that data is most useful when a story can be deduced from it, and I am intrigued by the question of how I can use dance to create and tell a story about the data that is collected. The main goals of my Creative Project are to learn the skills needed to develop a web application using the knowledge and theory that I am acquiring through Computer Science, and to learn the skills needed to produce a performance piece. My objective for the overall project is to create an audience-interactive experience that presents choices for dancers and creates a connection between two completely different studies: Computer Science and Dance. My project will consist of having the audience enter their answers to preset questions via an online voting application. The stage background screen will be used to show the results for each question as percentages in a chart. The dancers will then serve as a live interpretation of these results.
This Creative Project will serve as a gateway between the work that has been cultivated in my studies and the real world. The methods involve exploring movement qualities in improvisation, communicating with my cast about what worked best for the transitions between each section of the piece, and testing the web applications. I learned the importance of having structure within improvisational movement for the purpose of choreography. The significance of structure is that it provides direction, clarity, and a sense of unification for the dancers. I also learned the basics of the Python programming language in order to develop the two real-time web applications. The significance of learning Python is that I will be able to add it to my set of programming languages, build upon my knowledge of Computer Science, and develop more real-world applications in the future.

Contributors: Ngai, Courtney Taylor (Author) / Britt, Melissa (Thesis director) / Standley, Eileen (Committee member) / Computer Science and Engineering Program (Contributor) / School of Film, Dance and Theatre (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Comparison of Machine Learning Algorithms for Predicting Breast Cancer Malignancy

Description

Breast cancer is one of the most common types of cancer worldwide. Early detection and diagnosis are crucial for improving the chances of successful treatment and survival. In this thesis, many different machine learning algorithms were evaluated and compared to predict breast cancer malignancy from diagnostic features extracted from digitized images of breast tissue samples, called fine-needle aspirates. Breast cancer diagnosis typically involves a combination of mammography, ultrasound, and biopsy. However, machine learning algorithms can assist in the detection and diagnosis of breast cancer by analyzing large amounts of data and identifying patterns that may not be discernible to the human eye. By using these algorithms, healthcare professionals can potentially detect breast cancer at an earlier stage, leading to more effective treatment and better patient outcomes. The results showed that the gradient boosting classifier performed the best, achieving an accuracy of 96% on the test set. This indicates that this algorithm can be a useful tool for healthcare professionals in the early detection and diagnosis of breast cancer, potentially leading to improved patient outcomes.

Contributors: Mallya, Aatmik (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Effects of Training Dataset Variance on Artificial Intelligence Image Generation

Description

This research paper explores the effects of data variance on the quality of Artificial Intelligence image generation models and the impact on a viewer's perception of the generated images. The study examines how the quality and accuracy of the images produced by these models are influenced by factors such as size, labeling, and format of the training data. The findings suggest that reducing the training dataset size can lead to a decrease in image coherence, indicating that AI models get worse as the training dataset gets smaller. Moreover, the study makes surprising discoveries regarding AI image generation models that are trained on highly varied datasets. In addition, the study involves a survey in which people were asked to rate the subjective realism of the generated images on a scale ranging from 1 to 5 as well as sorting the images into their respective classes. The findings of this study emphasize the importance of considering dataset variance and size as a critical aspect of improving image generation models as well as the implications of using AI technology in the future.

Contributors: Punyamurthula, Rushil (Author) / Carter, Lynn (Thesis director) / Sarmento, Rick (Committee member) / Barrett, The Honors College (Contributor) / School of Sustainability (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Machine Learning and Mario Speedruns

Description

Machine learning has a nearly unlimited number of applications, the potential of which has yet to be fully harnessed and realized. This thesis outlines two areas in which machine learning can be used and demonstrates the execution of one methodology in each area. The first area is self-play in video games, where a neural model is researched and described that teaches a computer to complete a level of Super Mario World (1990) on its own. The neural model in question was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, written by Kenneth O. Stanley and Risto Miikkulainen of the University of Texas at Austin. The model that is actually described is from YouTuber SethBling of the California Institute of Technology. The second area is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm uses Python and the Volatility framework to detect malicious software in an infected system.

Contributors: Ballecer, Joshua (Author) / Yang, Yezhou (Thesis director) / Luo, Yiran (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05
