Matching Items (10)

132574-Thumbnail Image.png

Pose Estimation with Convolutional Neural Networks

Description

Convolutional neural networks boast a myriad of applications in artificial intelligence, but one of the most common uses for such networks is image extraction. The ability of convolutional layers to extract and combine data features for the purpose of image

Convolutional neural networks boast a myriad of applications in artificial intelligence, but one of the most common uses for such networks is image extraction. The ability of convolutional layers to extract and combine data features for the purpose of image analysis can be leveraged for pose estimation on an object - detecting the presence and attitude of corners and edges allows a convolutional neural network to identify how an object is positioned. This task can assist in working to grasp an object correctly in robotics applications, or to track an object more accurately in 3D space. However, the effectiveness of pose estimation may change based on properties of the object; the pose of a complex object, complexity being determined by internal occlusions, similar faces, etcetera, can be difficult to resolve.
This thesis is part of a collaboration between ASU’s Interactive Robotics Laboratory and NASA’s Jet Propulsion Laboratory. In this thesis, the training pipeline from Sharma’s paper “Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks” was modified to perform pose estimation on a complex object - specifically, a segment of a hollow truss. After initial attempts to replicate the architecture used in the paper and train solely on synthetic images, a combination of synthetic dataset generation and transfer learning on an ImageNet-pretrained AlexNet model was implemented to mitigate the difficulty of gathering large amounts of real-world data. Experimentation with pose estimation accuracy and hyperparameters of the model resulted in gradual test accuracy improvement, and future work is suggested to improve pose estimation for complex objects with some form of rotational symmetry.

Contributors

Created

Date Created
2019-05

132708-Thumbnail Image.png

PRACTICAL APPLICATION OF CONVOLUTIONAL NEURAL NETWORKS FOR SKIN LESION CLASSIFICATION

Description

In this paper, I explore practical applications of neural networks for automated skin lesion identification. The visual characteristics are of primary importance in the recognition of skin diseases, hence, the development of deep neural network models proven capable of classifying

In this paper, I explore practical applications of neural networks for automated skin lesion identification. The visual characteristics are of primary importance in the recognition of skin diseases, hence, the development of deep neural network models proven capable of classifying skin lesions can potentially change the face of modern medicine by extending the availability and lowering the cost of diagnostic care. Previous work has demonstrated the effectiveness of convolutional neural networks in image classification in general, with even higher accuracy achievable by data augmentation techniques, such as cropping, rotating, and flipping input images, along with more advanced computationally intensive approaches. In this research, I provide an overview of Convolutional Neural Networks (CNN) and CNN implementation with TensorFlow and Keras API in context of image recognition and classification. I also experiment with custom convolutional neural network model architecture trained using HAM10000 dataset. The dataset used for the case study is obtained from Harvard Dataverse and is maintained by Medical University of Vienna. The HAM10000 dataset is a large collection of multi-source dermatoscopic images of common pigmented skin lesions and is available for academic research under Creative Commons Attribution-Noncommercial 4.0 International Public License. With over ten thousand dermatoscopic images of seven classes of benign and malignant skin lesions, the dataset is substantial for academic machine learning purposes for multiclass image classification. I discuss the successes and shortcomings of the model in respect to its application to the dataset.

Contributors

Agent

Created

Date Created
2019-05

134000-Thumbnail Image.png

Big Data Analytics for Pipe Damage and Risk Identification

Description

In this thesis, Inception V3, a convolutional neural network model from Google, was partially retrained to categorize pipeline images based on their damage modes. The images for different damage modes of the pipeline were simulated through MATLAB to represent image

In this thesis, Inception V3, a convolutional neural network model from Google, was partially retrained to categorize pipeline images based on their damage modes. The images for different damage modes of the pipeline were simulated through MATLAB to represent image data collected from in-line pipe inspection. The final convolutional layer of the model was retrained with the simulated pipeline images using TensorFlow as the base platform. First, a small-scale retraining was done with real images and simulated images to compare the differences in performance. Then, using simulated images, a 2^5 full factorial design of experiment and individual parametric studies were performed on five different chosen parameters, including training steps, learning rate, batch size, training data size and image noise. The effect of each parameter on the performance of the model was evaluated and analyzed. It is crucial to understand that due to the nature of the experiment, the learnings may or may not apply to neural network models trained for other tasks. After analyzing the results, the effects and trade-offs for each parameter are discussed in detail. In addition, a method of predicting the training time was proposed. Based on the findings, an optimized model was proposed for this training exercise, with 1180 training steps, a learning rate of 0.01, a batch size of 100 and a training data set of 200 images. The optimized model reached 87.2% accuracy with a training time of 2 minutes and 6 seconds. This study will enhance our understanding in applying machine learning techniques in damage and risk identification.

Contributors

Agent

Created

Date Created
2018-05

131212-Thumbnail Image.png

Facial Expression Recognition Using Machine Learning

Description

In recent years, the development of new Machine Learning models has allowed for new technological advancements to be introduced for practical use across the world. Multiple studies and experiments have been conducted to create new variations of Machine Learning models

In recent years, the development of new Machine Learning models has allowed for new technological advancements to be introduced for practical use across the world. Multiple studies and experiments have been conducted to create new variations of Machine Learning models with different algorithms to determine if potential systems would prove to be successful. Even today, there are still many research initiatives that are continuing to develop new models in the hopes to discover potential solutions for problems such as autonomous driving or determining the emotional value from a single sentence. One of the current popular research topics for Machine Learning is the development of Facial Expression Recognition systems. These Machine Learning models focus on classifying images of human faces that are expressing different emotions through facial expressions. In order to develop effective models to perform Facial Expression Recognition, researchers have gone on to utilize Deep Learning models, which are a more advanced implementation of Machine Learning models, known as Neural Networks. More specifically, the use of Convolutional Neural Networks has proven to be the most effective models for achieving highly accurate results at classifying images of various facial expressions. Convolutional Neural Networks are Deep Learning models that are capable of processing visual data, such as images and videos, and can be used to identify various facial expressions. The purpose of this project, I focused on learning about the important concepts of Machine Learning, Deep Learning, and Convolutional Neural Networks to implement a Convolutional Neural Network that was previously developed by a recommended research paper.

Contributors

Created

Date Created
2020-05

154703-Thumbnail Image.png

A unified framework based on convolutional neural networks for interpreting carotid intima-media thickness videos

Description

Cardiovascular disease (CVD) is the leading cause of mortality yet largely preventable, but the key to prevention is to identify at-risk individuals before adverse events. For predicting individual CVD risk, carotid intima-media thickness (CIMT), a noninvasive ultrasound method, has proven

Cardiovascular disease (CVD) is the leading cause of mortality yet largely preventable, but the key to prevention is to identify at-risk individuals before adverse events. For predicting individual CVD risk, carotid intima-media thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable, offering several advantages over CT coronary artery calcium score. However, each CIMT examination includes several ultrasound videos, and interpreting each of these CIMT videos involves three operations: (1) select three enddiastolic ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI) in each selected frame, and (3) trace the lumen-intima interface and the media-adventitia interface in each ROI to measure CIMT. These operations are tedious, laborious, and time consuming, a serious limitation that hinders the widespread utilization of CIMT in clinical practice. To overcome this limitation, this paper presents a new system to automate CIMT video interpretation. Our extensive experiments demonstrate that the suggested system significantly outperforms the state-of-the-art methods. The superior performance is attributable to our unified framework based on convolutional neural networks (CNNs) coupled with our informative image representation and effective post-processing of the CNN outputs, which are uniquely designed for each of the above three operations.

Contributors

Agent

Created

Date Created
2016

154885-Thumbnail Image.png

A computational approach to relative image aesthetics

Description

Computational visual aesthetics has recently become an active research area. Existing state-of-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement,

Computational visual aesthetics has recently become an active research area. Existing state-of-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their aesthetic quality instead of binary-categorizing them. Furthermore, in such applications, it may be possible that all images belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, a novel problem of ranking images with respect to their aesthetic quality is formulated in this work. A new data-set of image pairs with relative labels is constructed by carefully selecting images from the popular AVA data-set. Unlike in aesthetics classification, there is no single threshold which would determine the ranking order of the images across the entire data-set.

This problem is attempted using a deep neural network based approach that is trained on image pairs by incorporating principles from relative learning. Results show that such relative training procedure allows the network to rank the images with a higher accuracy than a state-of-art network trained on the same set of images using binary labels. Further analyzing the results show that training a model using the image pairs learnt better aesthetic features than training on same number of individual binary labelled images.

Additionally, an attempt is made at enhancing the performance of the system by incorporating saliency related information. Given an image, humans might fixate their vision on particular parts of the image, which they might be subconsciously intrigued to. I therefore tried to utilize the saliency information both stand-alone as well as in combination with the global and local aesthetic features by performing two separate sets of experiments. In both the cases, a standard saliency model is chosen and the generated saliency maps are convoluted with the images prior to passing them to the network, thus giving higher importance to the salient regions as compared to the remaining. Thus generated saliency-images are either used independently or along with the global and the local features to train the network. Empirical results show that the saliency related aesthetic features might already be learnt by the network as a sub-set of the global features from automatic feature extraction, thus proving the redundancy of the additional saliency module.

Contributors

Agent

Created

Date Created
2016

156845-Thumbnail Image.png

Hardware Acceleration of Deep Convolutional Neural Networks on FPGA

Description

The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low latency processing of one

The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low latency processing of one image with strict power consumption requirement, which reduces the efficiency of GPU and other general-purpose platform, bringing opportunities for specific acceleration hardware, e.g. FPGA, by customizing the digital circuit specific for the deep learning algorithm inference. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility.

As convolution contributes most operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply and accumulate (MAC) operations with four levels of loops. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit the data reuse and manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture of CNN acceleration are proposed to minimize the data communication while maximizing the resource utilization to achieve high performance.

Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant efforts and expertise are required leading to long development time, which makes it difficult to catch up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGA and still keep the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model different operations at each layer. The integration and dataflow of physical modules are predefined in the top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule so that even highly irregular and complex network topology, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs achieving state-of-the art performance.

Based on the optimized acceleration strategy, there are still a lot of design options, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which impact the utilization of computation resources and data communication efficiency, and finally affect the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. By this means, the performance bottleneck and design bound can be identified and the optimal design option can be explored early in the design phase.

Contributors

Agent

Created

Date Created
2018

156611-Thumbnail Image.png

Content Detection in Handwritten Documents

Description

Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, one such

Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, one such aspect could be a region on the document with a mathematical expression; in this case, the label would be math. This differentiation facilitates the performance of specific recognition tasks depending on the content type. We hypothesize that the recognition accuracy of the subsequent tasks such as textual, math, and shape recognition will increase, further leading to a better analysis of the document.

Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To complete this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods to collect, pre-process and detect content type in the collected handwritten documents. A total of 4049 documents were extracted in the form of image, and json format; and were labelled using an object labelling software with tags being text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to the Tensorflow’s object detection API to learn a neural network model. We show our results from two neural networks models, Faster Region-based Convolutional Neural Network (Faster R-CNN) and Single Shot detection model (SSD).

Contributors

Agent

Created

Date Created
2018

157202-Thumbnail Image.png

Identifying critical regions for robot planning using convolutional neural networks

Description

In this thesis, a new approach to learning-based planning is presented where critical regions of an environment with low probability measure are learned from a given set of motion plans. Critical regions are learned using convolutional neural networks (CNN) to

In this thesis, a new approach to learning-based planning is presented where critical regions of an environment with low probability measure are learned from a given set of motion plans. Critical regions are learned using convolutional neural networks (CNN) to improve sampling processes for motion planning (MP).

In addition to an identification network, a new sampling-based motion planner, Learn and Link, is introduced. This planner leverages critical regions to overcome the limitations of uniform sampling while still maintaining guarantees of correctness inherent to sampling-based algorithms. Learn and Link is evaluated against planners from the Open Motion Planning Library (OMPL) on an extensive suite of challenging navigation planning problems. This work shows that critical areas of an environment are learnable, and can be used by Learn and Link to solve MP problems with far less planning time than existing sampling-based planners.

Contributors

Agent

Created

Date Created
2019

157226-Thumbnail Image.png

Confocal Laser Endomicroscopy Image Analysis with Deep Convolutional Neural Networks

Description

Rapid intraoperative diagnosis of brain tumors is of great importance for planning treatment and guiding the surgeon about the extent of resection. Currently, the standard for the preliminary intraoperative tissue analysis is frozen section biopsy that has major limitations such

Rapid intraoperative diagnosis of brain tumors is of great importance for planning treatment and guiding the surgeon about the extent of resection. Currently, the standard for the preliminary intraoperative tissue analysis is frozen section biopsy that has major limitations such as tissue freezing and cutting artifacts, sampling errors, lack of immediate interaction between the pathologist and the surgeon, and time consuming.

Handheld, portable confocal laser endomicroscopy (CLE) is being explored in neurosurgery for its ability to image histopathological features of tissue at cellular resolution in real time during brain tumor surgery. Over the course of examination of the surgical tumor resection, hundreds to thousands of images may be collected. The high number of images requires significant time and storage load for subsequent reviewing, which motivated several research groups to employ deep convolutional neural networks (DCNNs) to improve its utility during surgery. DCNNs have proven to be useful in natural and medical image analysis tasks such as classification, object detection, and image segmentation.

This thesis proposes using DCNNs for analyzing CLE images of brain tumors. Particularly, it explores the practicality of DCNNs in three main tasks. First, off-the shelf DCNNs were used to classify images into diagnostic and non-diagnostic. Further experiments showed that both ensemble modeling and transfer learning improved the classifier’s accuracy in evaluating the diagnostic quality of new images at test stage. Second, a weakly-supervised learning pipeline was developed for localizing key features of diagnostic CLE images from gliomas. Third, image style transfer was used to improve the diagnostic quality of CLE images from glioma tumors by transforming the histology patterns in CLE images of fluorescein sodium-stained tissue into the ones in conventional hematoxylin and eosin-stained tissue slides.

These studies suggest that DCNNs are opted for analysis of CLE images. They may assist surgeons in sorting out the non-diagnostic images, highlighting the key regions and enhancing their appearance through pattern transformation in real time. With recent advances in deep learning such as generative adversarial networks and semi-supervised learning, new research directions need to be followed to discover more promises of DCNNs in CLE image analysis.

Contributors

Agent

Created

Date Created
2019