Matching Items (36)
Description
Computer Vision as a field has gone through significant changes in the last decade. The field has seen tremendous success in designing learning systems with hand-crafted features and in using representation learning to extract better features. In this dissertation, some novel approaches to representation learning and task learning are studied.

Multiple-instance learning, which is a generalization of supervised learning, is one example of task learning that is discussed. In particular, a novel non-parametric k-NN-based multiple-instance learning approach is proposed, which is shown to outperform other existing approaches. This solution is applied effectively to a diabetic retinopathy pathology detection problem.

In the case of representation learning, the generality of neural features is investigated first. This investigation leads to some critical understanding of, and results in, feature generality among datasets. The possibility of learning from a mentor network instead of from labels is then investigated. Distillation of dark knowledge is used to efficiently mentor a small network from a pre-trained large mentor network. These studies help in understanding representation learning with smaller and compressed networks.
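To make the distillation idea concrete, here is a minimal PyTorch-style sketch of a dark-knowledge distillation loss in the spirit of Hinton et al. (2015). The function name, temperature T, and weight alpha are illustrative assumptions; the dissertation's exact training setup is not reproduced here.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, mentor_logits, labels, T=4.0, alpha=0.9):
        # Soft targets from the mentor carry the "dark knowledge": the relative
        # probabilities the mentor assigns to the wrong classes.
        soft_targets = F.softmax(mentor_logits / T, dim=1)
        log_student = F.log_softmax(student_logits / T, dim=1)
        # The KL term is scaled by T^2 so gradient magnitudes stay comparable
        # across temperatures.
        kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
        # A small amount of ordinary cross-entropy on the hard labels.
        ce_term = F.cross_entropy(student_logits, labels)
        return alpha * kd_term + (1.0 - alpha) * ce_term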
Contributors: Venkatesan, Ragav (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
The performance of most visual computing tasks depends on the quality of the features extracted from the raw data. Insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data, such as missing samples and noisy input caused by undesired, external factors of variation. It should also reduce data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.

The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor; however, the resulting feature extraction process is interpretable and explainable. The next group contains latent-feature extraction processes. While the original features lie in a high-dimensional space, the relevant factors for a task often lie on a lower-dimensional manifold. Latent-feature extraction employs hidden variables to expose underlying data properties that cannot be directly measured from the input. Latent-feature methods impose a specific structure, such as sparsity or low rank, on the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data with minimal pre-processing through a deep network whose parameters are computed by iteratively minimizing a task-based loss.

In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aesthetic quality. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For the latter two tasks, I propose novel deep architectures and show significant improvement over previous state-of-the-art approaches. A suitable combination of feature representations, augmented with an appropriate learning approach, can increase performance for most visual computing tasks.
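As one simple instance of latent-feature extraction of the kind described above, the sketch below computes a low-rank representation via a truncated SVD. This is a generic baseline under assumed data shapes, not the dissertation's method; the function name and the choice of k are illustrative.

    import numpy as np

    def lowrank_latent(X, k=32):
        # Center the data, then project onto the top-k right singular vectors:
        # a classic low-rank latent representation of high-dimensional features.
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T  # (n_samples, k) latent features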
Contributors: Chandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Rapid intraoperative diagnosis of brain tumors is of great importance for planning treatment and guiding the surgeon about the extent of resection. Currently, the standard for preliminary intraoperative tissue analysis is the frozen section biopsy, which has major limitations such as tissue freezing and cutting artifacts, sampling errors, lack of immediate interaction between the pathologist and the surgeon, and long turnaround times.

Handheld, portable confocal laser endomicroscopy (CLE) is being explored in neurosurgery for its ability to image histopathological features of tissue at cellular resolution in real time during brain tumor surgery. Over the course of examination of the surgical tumor resection, hundreds to thousands of images may be collected. The high number of images requires significant time and storage load for subsequent reviewing, which motivated several research groups to employ deep convolutional neural networks (DCNNs) to improve its utility during surgery. DCNNs have proven to be useful in natural and medical image analysis tasks such as classification, object detection, and image segmentation.

This thesis proposes using DCNNs for analyzing CLE images of brain tumors. In particular, it explores the practicality of DCNNs in three main tasks. First, off-the-shelf DCNNs were used to classify images as diagnostic or non-diagnostic. Further experiments showed that both ensemble modeling and transfer learning improved the classifier’s accuracy in evaluating the diagnostic quality of new images at the test stage. Second, a weakly-supervised learning pipeline was developed for localizing key features of diagnostic CLE images from gliomas. Third, image style transfer was used to improve the diagnostic quality of CLE images from glioma tumors by transforming the histology patterns in CLE images of fluorescein sodium-stained tissue into those of conventional hematoxylin and eosin-stained tissue slides.

These studies suggest that DCNNs are well suited to the analysis of CLE images. They may assist surgeons by sorting out non-diagnostic images, highlighting key regions, and enhancing their appearance through pattern transformation in real time. With recent advances in deep learning such as generative adversarial networks and semi-supervised learning, new research directions should be pursued to uncover the further potential of DCNNs in CLE image analysis.
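A minimal sketch of the ensemble-modeling step mentioned above: average the softmax probabilities of several trained classifiers and take the most probable class. The function name is illustrative, and the thesis's actual ensembling scheme is described only at a high level in the abstract.

    import torch

    @torch.no_grad()
    def ensemble_predict(models, images):
        # Stack each model's class probabilities, average them across the
        # ensemble, and return the most probable label per image.
        probs = torch.stack([m(images).softmax(dim=1) for m in models])
        return probs.mean(dim=0).argmax(dim=1)  # one label per image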
Contributors: Izady Yazdanabadi, Mohammadhassan (Author) / Preul, Mark (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Nakaji, Peter (Committee member) / Vernon, Brent (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
This paper presents work that was done to create a system capable of facial expression recognition (FER) using deep convolutional neural networks (CNNs) and to test multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors. The extracted information can be used to understand the image better through recognizing different features present within it. Deep CNNs, however, can require training sets of more than a million pictures in order to fine-tune their feature detectors, and for facial expression recognition no datasets of that size are available. Due to this limited availability of training data, the idea of using naïve domain adaptation is explored. Instead of creating and using a new CNN trained specifically to extract features related to FER, a CNN previously trained for another computer vision task is reused. Work for this research involved creating a system that can run a CNN, extract feature vectors from it, and classify those extracted features. Once this system was built, different aspects of it were tested and tuned: the pre-trained CNN that was used, the layer from which features were extracted, the normalization applied to input images, and the training data for the classifier. Once properly tuned, the system returned more accurate results than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.
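The pipeline described above, reusing a pre-trained CNN as a fixed feature extractor and training a conventional classifier on top, might look like the following sketch. VGG-16, the penultimate layer, and the linear SVM are illustrative assumptions (the thesis compares several networks, layers, and classifier settings), and torchvision 0.13+ is assumed for the weights argument.

    import torch
    import torchvision.models as tvm
    from sklearn.svm import LinearSVC

    # A CNN pre-trained on another vision task serves as a fixed feature extractor.
    backbone = tvm.vgg16(weights="IMAGENET1K_V1").eval()

    @torch.no_grad()
    def extract_features(images):
        # images: (N, 3, 224, 224) tensor, normalized the way the backbone expects.
        x = backbone.avgpool(backbone.features(images))
        x = torch.flatten(x, 1)
        # Stop one layer short of the classification layer: the penultimate
        # activations serve as the transferred feature vectors.
        return backbone.classifier[:-1](x).numpy()

    # A conventional classifier is then trained on the extracted vectors:
    # clf = LinearSVC().fit(extract_features(train_images), train_labels)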
Contributors: Eusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description
Technical innovation has always played a part in live theatre, from mechanical pieces like lifts and trapdoors to the more recent integration of digital media. The advances of the art form encourage the development of technology, and at the same time, technological development enables the advancement of theatrical expression. As mechanics, lighting, sound, and visual media have made their way into the spotlight, advances in theatrical robotics continue to push for inclusion in the director's toolbox. However, much of the available technology is gated by high prices and unintuitive interfaces, designed for large troupes and specialized engineers, making it difficult to access for small schools and students new to the medium. As a group of engineering students with a vested interest in the development of the arts, this thesis team designed a system that will enable troupes from any background to participate in the advent of affordable automation. The intended result of this thesis project was to create a robotic platform that interfaces with custom software, receiving commands and transmitting position data, and to design that software so that a user can define intuitive cues for their shows. In addition, a new pathfinding algorithm was developed to support free-roaming automation in a 2D space. The final product consisted of a relatively inexpensive (under $2,000) free-roaming platform, made entirely with commercial off-the-shelf (COTS) and standard materials, and a corresponding control system with cue design, wireless path following, and position tracking. The platform was built to support 1,000 lbs and includes integrated emergency stopping. The software allows for custom cue design, speed variation, and dynamic path following. Both the blueprints and the source code for the platform and control system have been released to open-source repositories to encourage further development in the area of affordable automation. The platform itself was donated to the ASU School of Theater.
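The abstract describes its free-roaming pathfinding only at a high level, so as context, here is a textbook A* search on a 2D occupancy grid. This is a generic baseline, not the thesis's algorithm; the grid encoding (0 = free, 1 = blocked) and unit step costs are assumptions.

    import heapq
    import itertools

    def astar(grid, start, goal):
        # Generic A* over grid cells with a Manhattan-distance heuristic.
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        tie = itertools.count()  # tie-breaker so the heap never compares nodes
        frontier = [(h(start), next(tie), 0, start, None)]
        parents = {}
        while frontier:
            _, _, g, node, parent = heapq.heappop(frontier)
            if node in parents:
                continue  # already expanded via a cheaper path
            parents[node] = parent
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]  # start -> goal
            r, c = node
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                        and grid[nxt[0]][nxt[1]] == 0 and nxt not in parents):
                    heapq.heappush(frontier, (g + 1 + h(nxt), next(tie), g + 1, nxt, node))
        return None  # no path exists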
Contributors: Hollenbeck, Matthew D. (Co-author) / Wiebel, Griffin (Co-author) / Winnemann, Christopher (Thesis director) / Christensen, Stephen (Committee member) / Computer Science and Engineering Program (Contributor) / School of Film, Dance and Theatre (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Agent-based modeling has been used in computer science to simulate complex phenomena. The introduction of agent-based models into the field of economics (Agent-Based Computational Economics, or ACE) is not new; however, making model environments simpler to design for individuals without a background in computer science or computer engineering is a constantly evolving topic. The issue is a trade-off between how much is handled by the framework and how much control the modeler has, as well as what tools exist to allow the user to develop insights from the behavior of the model. The solutions examined in this thesis are the construction of a simplified grammar for model construction, the design of an economics-based library to assist in ACE modeling, and examples of how to construct interactive models.
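To show the kind of model such a library targets, here is a minimal agent-based sketch: agents with private reservation values generate an emergent demand curve. All names and parameters are illustrative; the thesis's grammar and library are not reproduced here.

    import random

    class Consumer:
        # A minimal economic agent: buys one unit whenever the posted price
        # is at or below its private reservation value.
        def __init__(self, reservation):
            self.reservation = reservation

        def buys(self, price):
            return price <= self.reservation

    def demand_curve(prices, n_agents=100, seed=0):
        rng = random.Random(seed)
        agents = [Consumer(rng.uniform(0, 10)) for _ in range(n_agents)]
        # Count how many agents buy at each price; demand falls as price rises.
        return {p: sum(a.buys(p) for a in agents) for p in prices}

    print(demand_curve([2, 4, 6, 8]))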
Contributors: Anderson, Brandon David (Author) / Bazzi, Rida (Thesis director) / Kuminoff, Nicolai (Committee member) / Roberts, Nancy (Committee member) / Computer Science and Engineering Program (Contributor) / Economics Program in CLAS (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description
In order to introduce students to computer science and robotics in an exciting and engaging manner, certain teaching techniques should be used. In recent years, some of the most popular paradigms are Visual Programming Languages, which are meant to introduce problem-solving skills and basic programming constructs inherent to all modern languages by allowing users to write programs visually rather than textually. By bypassing the need to learn syntax, students can focus on the thinking behind developing an algorithm and see immediate results, which helps generate excitement for the field and reduces disinterest due to startup complexity and burnout. The Introduction to Engineering course at Arizona State University supports this approach by teaching students the basics of autonomous maze-traversing algorithms and using ASU VIPLE, a Visual Programming Language developed to connect with and direct real-world robots. However, some startup time is needed to learn how to interface with these robots using ASU VIPLE. That is why the HTML5 Autonomous Robot Web Simulator was created: by encouraging students to use the simulator, the problem solving behind autonomous maze-traversing algorithms can be introduced more quickly and with immediate affirmation. Our goal was to improve this simulator and add features so that it could be accessed and used for a wider variety of introductory computer science lessons. Features scattered across past implementations of robotic simulators were aggregated in a cross-platform solution. Upon initial development, a classroom test group revealed usability concerns and demonstrated students' mental models. Mean time for task completion was 8.1 minutes, compared to 2 minutes for the authors. The simulator was updated in response to test group feedback and new instructor requirements. The new implementation reduces programming overhead while maintaining a learning environment with support for even the most complex applications.
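As a flavor of the maze-traversing algorithms the course introduces, the sketch below implements one decision step of the classic right-hand-rule wall follower. This is a generic illustration under an assumed sensor interface, not VIPLE code or the simulator's implementation.

    def wall_follow_step(heading, right_blocked, front_blocked):
        # Right-hand rule: hug the wall on the robot's right.
        # Headings: 0 = N, 1 = E, 2 = S, 3 = W. Returns (new_heading, move_forward).
        if not right_blocked:
            return (heading + 1) % 4, True   # opening on the right: turn right, advance
        if not front_blocked:
            return heading, True             # wall on the right: go straight
        return (heading + 3) % 4, False      # dead ahead blocked: turn left, re-check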
Contributors: Rodewald, Spencer (Co-author) / Patel, Ankit (Co-author) / Chen, Yinong (Thesis director) / Chattin, Linda (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
Description
Classical planning is a field of Artificial Intelligence concerned with allowing autonomous agents to make reasonable decisions in complex environments. This work investigates the application of deep learning and planning techniques, with the aim of constructing generalized plans capable of solving multiple problem instances. We construct a Deep Neural Network that, given an abstract problem state, predicts both (i) the best action to be taken from that state and (ii) the generalized “role” of the object being manipulated. The neural network was tested on two classical planning domains: the Blocks World domain and the Logistics domain. Results indicate that neural networks are capable of making such predictions with high accuracy, suggesting a promising new framework for approaching generalized planning problems.
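A network with the two outputs described above might be structured as follows. This is a minimal PyTorch sketch under assumed layer sizes and a fixed-size state encoding; it is not the thesis's architecture.

    import torch
    import torch.nn as nn

    class PlanPolicy(nn.Module):
        # Shared trunk with two prediction heads: (i) the next action and
        # (ii) the generalized role of the manipulated object.
        def __init__(self, state_dim, n_actions, n_roles, hidden=128):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.action_head = nn.Linear(hidden, n_actions)
            self.role_head = nn.Linear(hidden, n_roles)

        def forward(self, state):
            z = self.trunk(state)
            return self.action_head(z), self.role_head(z)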
Contributors: Nakhleh, Julia Blair (Author) / Srivastava, Siddharth (Thesis director) / Fainekos, Georgios (Committee member) / Computer Science and Engineering Program (Contributor) / School of International Letters and Cultures (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
Medical records are increasingly being kept in the form of electronic health records (EHRs), with a significant amount of patient data recorded as unstructured natural language text. Consequently, being able to extract and utilize the clinical data present within these records is an important step in furthering clinical care. One important aspect of these records is prescription information. Existing techniques for extracting prescription information, which includes medication names, dosages, frequencies, reasons for taking, and mode of administration, from unstructured text have focused on rule- and classifier-based methods. While state-of-the-art systems can be effective in extracting many types of information, they require significant effort to develop hand-crafted rules and conduct effective feature engineering. This paper presents a bidirectional LSTM with a CRF tagging layer, initialized with pre-computed word embeddings, for extracting prescription information from sentences without requiring significant feature engineering. Experiments on the i2b2 2009 dataset achieve a macro F1 score of 0.8562, with scores above 0.9449 on four of the six categories, indicating significant potential for this model.
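The tagging model might be sketched as below, with a per-token linear output standing in for the CRF decoding layer, which is omitted here for brevity. Layer sizes and names are illustrative assumptions rather than the paper's configuration.

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        # Bidirectional LSTM producing per-token tag scores for prescription
        # fields (drug, dosage, frequency, ...). The paper decodes these
        # emission scores with a CRF layer, not shown in this sketch.
        def __init__(self, vocab_size, n_tags, emb_dim=100, hidden=128, pretrained=None):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            if pretrained is not None:
                # Initialize from pre-computed word embeddings, as described above.
                self.emb.weight.data.copy_(pretrained)
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * hidden, n_tags)

        def forward(self, token_ids):            # (batch, seq_len)
            h, _ = self.lstm(self.emb(token_ids))
            return self.out(h)                   # (batch, seq_len, n_tags)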
Contributors: Rawal, Samarth Chetan (Author) / Baral, Chitta (Thesis director) / Anwar, Saadat (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Since the release of Discord in March of 2015, it has become the VoIP client of choice for 25 million users, pulling in more each day from many sources, including online video games with no built-in voice chat, such as League of Legends. With such an expansive user base and many servers hosting multiple users at all times of the day, it is unreasonable to expect a server admin to be monitoring users constantly. AhriBot aims to solve this problem by providing general administration of a server through a command system while it is logged onto that server. Specifically, AhriBot is tailored for use on servers where League of Legends is primarily being played. Using commands issued to AhriBot, users can get statistics about their current game. By providing a set of features for general users, and a more specific set of features for League of Legends, AhriBot provides a better experience and gives players quicker access to information about the game without having to visit multiple outside sources.
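A command system of this kind could be sketched with the discord.py library as below. The command names are hypothetical, AhriBot's source is not shown in the abstract, and a real statistics command would additionally query an external game-data API.

    import discord
    from discord.ext import commands

    intents = discord.Intents.default()
    intents.message_content = True  # required for prefix commands in discord.py 2.x
    bot = commands.Bot(command_prefix="!", intents=intents)

    @bot.command()
    async def ping(ctx):
        # General-purpose health check available to every user.
        await ctx.send("pong")

    @bot.command()
    async def game(ctx, summoner: str):
        # Placeholder: a real implementation would look up the summoner's
        # current match via an external API and report its statistics.
        await ctx.send(f"Looking up the current game for {summoner}...")

    # bot.run(TOKEN)  # the server admin supplies the bot token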
Contributors: Koehler, Brendan Joseph (Author) / Balasooriya, Janaka (Thesis director) / Faucon, Philippe (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-12