
Content Detection in Handwritten Documents

Description

Handwritten documents have gained popularity in various domains, including education and business. A key task in analyzing a complex document is to distinguish between content types such as text, math, graphics, and tables. For example, a region of the document containing a mathematical expression would be labelled math. This differentiation allows a recognition task specific to each content type to be applied. We hypothesize that the accuracy of subsequent tasks such as text, math, and shape recognition will increase as a result, leading to a better analysis of the document.

Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To complete this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods used to collect, pre-process, and detect content types in the collected handwritten documents. A total of 4049 documents were extracted as images and in JSON format, and were labelled using object-labelling software with the tags text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to TensorFlow's object detection API to train a neural network model. We show results from two neural network models: the Faster Region-based Convolutional Neural Network (Faster R-CNN) and the Single Shot Detection model (SSD).

Contributors: Faizaan, Shaik Mohammed (Author) / VanLehn, Kurt (Thesis advisor) / Cheema, Salman Shaukat (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2018

Performance Evaluation of Object Proposal Generators for Salient Object Detection

Description

The detection and segmentation of objects appearing in a natural scene, often referred to as Object Detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods can detect the salient objects in the scene when the number of proposals that can be generated is limited by timing or computational constraints during execution. Salient objects are those that human subjects tend to fixate on more. The detection of salient objects is important in applications such as image collection browsing, image display on small devices, and perceptual compression.

This thesis proposes a novel evaluation framework that analyses the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals and thus the computational cost can be decreased significantly for a target true positive detection rate (TPR).

As part of the proposed framework, salient ground-truth masks are generated from the given original ground-truth masks for a given dataset. Given an object detection dataset, this work constructs salient object location ground-truth data, referred to here as salient ground-truth data for short, that only denotes the locations of salient objects. This is obtained by first computing a saliency map for the input image and then using it to assign a saliency score to each object in the image. Objects whose saliency scores are sufficiently high are referred to as salient objects. The detection rates are analyzed for existing object proposal generators with respect to the original ground-truth masks and the generated salient ground-truth masks.
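The scoring step described above can be illustrated with a minimal sketch. The abstract does not say how the saliency map is aggregated into a per-object score or what cutoff counts as "sufficiently high", so the mean over the object mask and the `threshold=0.5` default below are assumptions, and the function names `saliency_score` and `select_salient_objects` are hypothetical.

```python
def saliency_score(saliency_map, mask):
    """Mean saliency over the pixels covered by a binary object mask.

    Assumption: the score is the mean of the saliency map inside the mask;
    the source only says a score is assigned from the saliency map.
    """
    total = 0.0
    count = 0
    for sal_row, mask_row in zip(saliency_map, mask):
        for sal, inside in zip(sal_row, mask_row):
            if inside:
                total += sal
                count += 1
    return total / count if count else 0.0


def select_salient_objects(saliency_map, masks, threshold=0.5):
    """Return indices of objects whose score reaches the (assumed) threshold."""
    return [
        index
        for index, mask in enumerate(masks)
        if saliency_score(saliency_map, mask) >= threshold
    ]


# Toy 3x3 saliency map with values in [0, 1] and two object masks.
saliency_map = [
    [0.9, 0.8, 0.1],
    [0.7, 0.6, 0.0],
    [0.1, 0.0, 0.0],
]
mask_a = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]  # covers the bright top-left block
mask_b = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # covers the dim right column

salient_indices = select_salient_objects(saliency_map, [mask_a, mask_b])
```

In this toy example only the first object is kept, so the salient ground truth would contain `mask_a` alone.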

As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in the development of new object detection methods and evaluating their performance in terms of successful detection of salient objects.

Contributors: Kotamraju, Sai Prajwal (Author) / Karam, Lina J (Thesis advisor) / Yu, Hongbin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created: 2019

Enhancing Object Detection In An Augmented Reality Learning System

Description

The goal of the ANGLE Lab's AR assembly project is to create and save assemblies and to replicate them later with real-time AR feedback. In this iteration of the project, the SURF algorithm was used to provide object detection for 5 featureful objects (a Lego girl piece, a Lego guy piece, a blue Lego car piece, a window piece, and a fence piece). Functionality was also added to determine the location of these 5 objects within a frame by using the SURF keypoints associated with detection. Finally, the feedback mechanism by which the system detects connections between objects was improved to consider the size of the blocks in determining connections rather than using static values. Additional user features, such as adding a new object and using voice commands, were also implemented to make the system more user-friendly.

Contributors: Selvam, Nikil Panneer (Author) / Atkinson, Robert (Thesis director) / Runger, George (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Visual Saliency Application in Object Detection for Search Space Reduction

Description

Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs. Its complexity can be understood from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount of sensory information from the retina – estimated at up to 100 Mbps per optic nerve. Parallel processing of the entire visual field in real time is likely impossible for even the most sophisticated brains due to the high computational complexity of the task [1]. Yet, organisms can efficiently process this information to parse complex scenes in real time. This amazing feat of nature relies on selective attention which allows the brain to filter sensory information to select only a small subset of it for further processing.

Today, computer vision has become ubiquitous in our society, with several applications in image understanding, medicine, drones, self-driving cars, and many more. With the advent of GPUs and the availability of huge datasets like ImageNet, Convolutional Neural Networks (CNNs) have come to play a very important role in solving computer vision tasks, e.g., object detection. However, the size of the networks becomes prohibitive when higher accuracies are needed, which in turn demands more hardware. This hinders the application of CNNs to mobile platforms and stops them from hitting the real-time mark. The computational efficiency of a computer vision task, like object detection, can be enhanced by adopting a selective attention mechanism into the algorithm. In this work, this idea is explored by using the Visual Proto Object Saliency algorithm [1] to crop out the areas of an image without relevant objects before a computationally intensive network like the Faster R-CNN [2] processes it.
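To make the cropping idea concrete, here is a minimal sketch of reducing the search space with a saliency map before running a detector. It is not the thesis's implementation: the saliency map is taken as given rather than computed by the proto-object algorithm, a single bounding box around all above-threshold pixels is an assumed cropping rule, and `salient_bounding_box`, `crop`, and the `threshold` value are hypothetical names and settings.

```python
def salient_bounding_box(saliency_map, threshold=0.5):
    """Smallest box (top, left, bottom, right) enclosing all salient pixels.

    Bottom and right are exclusive. Returns None if nothing is salient.
    Assumption: a pixel is salient when its value is >= threshold.
    """
    rows = [r for r, row in enumerate(saliency_map)
            if any(value >= threshold for value in row)]
    cols = [c for row in saliency_map
            for c, value in enumerate(row) if value >= threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


def crop(image, box):
    """Crop a 2D image (list of rows) to the given box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x4 image (pixel values are just indices) and its saliency map.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
saliency_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.2, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

box = salient_bounding_box(saliency_map)
cropped = crop(image, box)  # only this region would be passed to the detector
```

Here the detector would see a 2x2 region instead of the full 4x4 image; any detections would then need to be shifted back by the box's top-left offset to recover coordinates in the original image.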

Contributors: Gorthy, Sai Rama Srivatsava (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Vrudhula, Sarma (Committee member) / Arizona State University (Publisher)

Created: 2017

Robust Object Detection under Varying Illuminations and Distortions

Description

Object detection is an interesting computer vision area that is concerned with the detection of object instances belonging to specific classes of interest as well as the localization of these instances in images and/or videos. Object detection serves as a vital module in many computer vision based applications. This work focuses on the development of object detection methods that exhibit increased robustness to varying illuminations and image quality. In this work, two methods for robust object detection are presented.

In the context of varying illumination, this work focuses on robust generic obstacle detection and collision warning in Advanced Driver Assistance Systems (ADAS) under varying illumination conditions. The highlight of the first method is the ability to detect all obstacles without prior knowledge and detect partially occluded obstacles including the obstacles that have not completely appeared in the frame (truncated obstacles). It is first shown that the angular distortion in the Inverse Perspective Mapping (IPM) domain belonging to obstacle edges varies as a function of their corresponding 2D location in the camera plane. This information is used to generate object proposals. A novel proposal assessment method based on fusing statistical properties from both the IPM image and the camera image to perform robust outlier elimination and false positive reduction is also proposed.

In the context of image quality, this work focuses on robust multiple-class object detection using deep neural networks for images with varying quality. The use of Generative Adversarial Networks (GANs) is proposed in a novel generative framework to generate features that provide robustness for object detection on reduced-quality images. The proposed GAN-based Detection of Objects (GAN-DO) framework is not restricted to any particular architecture and can be generalized to several deep neural network (DNN) based architectures. The resulting deep neural network keeps exactly the same architecture as the selected baseline model, without increasing the model parameter complexity or the inference time. Performance results obtained using GAN-DO on object detection datasets establish improved robustness to varying image quality and higher object detection and classification accuracy compared to existing approaches.

Contributors: Prakash, Charan Dudda (Author) / Karam, Lina (Thesis advisor) / Abousleman, Glen (Committee member) / Jayasuriya, Suren (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created: 2020

Cross Platform Training of Neural Networks to Enable Object Identification by Autonomous Vehicles

Description

Autonomous vehicle technology has been evolving for years since the Automated Highway System Project. However, this technology has been under increased scrutiny ever since an autonomous vehicle killed Elaine Herzberg, who was crossing the street in Tempe, Arizona in March 2018. Recent tests of autonomous vehicles on public roads have faced opposition from nearby residents. Before these vehicles are widely deployed, it is imperative that the general public trusts them. For this, the vehicles must be able to identify objects in their surroundings and demonstrate the ability to follow traffic rules while making decisions with human-like moral integrity when confronted with an ethical dilemma, such as an unavoidable crash that will injure either a pedestrian or the passenger.

Testing autonomous vehicles in real-world scenarios would pose a threat to people and property alike. A safe alternative is to simulate these scenarios and test to ensure that the resulting programs can work in real-world scenarios. Moreover, in order to detect a moral dilemma situation quickly, the vehicle should be able to identify objects in real-time while driving. Toward this end, this thesis investigates the use of cross-platform training for neural networks that perform visual identification of common objects in driving scenarios. Here, the object detection algorithm Faster R-CNN is used. The hypothesis is that it is possible to train a neural network model to detect objects from two different domains, simulated or physical, using transfer learning. As a proof of concept, an object detection model is trained on image datasets extracted from CARLA, a virtual driving environment, via transfer learning. After bringing the total loss factor to 0.4, the model is evaluated with an IoU metric. It is determined that the model has a precision of 100% and 75% for vehicles and traffic lights respectively. The recall is found to be 84.62% and 75% for the same. It is also shown that this model can detect the same classes of objects from other virtual environments and real-world images. Further modifications to the algorithm that may be required to improve performance are discussed as future work.
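As an illustration of how precision and recall can be derived from an IoU criterion, the sketch below matches predicted boxes to ground-truth boxes for a single class. The abstract does not give the IoU threshold or the matching procedure, so the `iou_threshold=0.5` default and the greedy one-to-one matching (without confidence ordering) are assumptions, and the boxes are made-up toy values.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    A prediction is a true positive if its best unmatched ground-truth box
    has IoU >= iou_threshold (an assumed cutoff).
    """
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_index = 0.0, None
        for index, truth in enumerate(ground_truth):
            if index in matched:
                continue
            score = iou(pred, truth)
            if score > best_iou:
                best_iou, best_index = score, index
        if best_index is not None and best_iou >= iou_threshold:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data: three ground-truth vehicles, two detections, one vehicle missed.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
predictions = [(1, 1, 10, 10), (21, 20, 30, 30)]

precision, recall = precision_recall(predictions, ground_truth)
```

With these toy boxes every detection is correct but one object is missed, so precision is 1.0 while recall is 2/3, the same qualitative pattern as the vehicle class reported above.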

Contributors: Sankaramangalam Ulhas, Sangeet (Author) / Berman, Spring (Thesis advisor) / Johnson, Kathryn (Committee member) / Yong, Sze Zheng (Committee member) / Arizona State University (Publisher)

Created: 2019

Computationally Efficient Object Detection Strategy from Water Surfaces with Specularity Removal

Description

Floating trash objects are very commonly seen on water bodies such as lakes, canals, and rivers. With the increase of plastic goods and human activities near water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out this trash is a hazardous and time-consuming task. Employing autonomous robots for these tasks is a better approach, since robots are faster and more efficient than humans. However, for a robot to clean up the trash objects, a good detection algorithm is required. Real-time object detection on water surfaces is a challenging problem due to the nature of the environment and the volatility of the water surface. In addition, running an object detection algorithm on the on-board processor of a robot limits the amount of CPU that the algorithm can use. In this thesis, a computationally low-cost approach for robust detection of trash objects, run on the on-board processor of a multirotor, is presented. To account for specular reflections on the water surface, we use a polarization filter and integrate a specularity removal algorithm into our approach as well. The challenges faced during testing and the measures taken to eliminate them are also discussed. The algorithm was compared with two other object detectors using 4 different metrics. The testing was carried out using videos of 5 different objects collected under different illumination conditions over a lake using a multirotor. The results indicate that our algorithm is well suited to real-time use, since it had the highest processing speed of 21 FPS, the lowest CPU consumption of 37.5%, and considerably high precision and recall values in detecting the objects.

Contributors: Syed, Danish Faraaz (Author) / Zhang, Wenlong (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2021

Control of Unmanned Aerial Vehicles for Mission Critical Tasks

Description

Unmanned aerial vehicles (UAVs) have reshaped the world of aviation. With the emergence of different types of UAVs, a multitude of mission-critical applications, e.g., aerial photography, package delivery, grasping and manipulation, and aerial reconnaissance and surveillance, have been accomplished successfully. All of the aforementioned applications require the UAVs to be robust to external disturbances and safe while flying in cluttered environments, and these factors are of paramount importance for task completion.

In the first phase, this dissertation starts by presenting the synthesis and experimental validation of real-time low-level estimation and robust attitude and position controllers for multirotors. For the task of reliable position estimation, a hybrid low-pass de-trending filter is proposed for attenuating noise and drift in the velocity and position estimates, respectively. Subsequently, a disturbance observer (DOB) approach with online Q-filter tuning is proposed for disturbance rejection and precise position control. Finally, a non-linear disturbance observer (NDOB) approach, along with a parameter optimization framework, is proposed for robust attitude control of multirotors. Multiple simulation and experimental flight tests are performed to demonstrate the efficacy of the proposed algorithms.

Aerial grasping and collection is a type of mission-critical task which requires vision-based sensing and robust control algorithms for successful task completion. In the second phase, this dissertation initially explores different object grasping approaches utilizing soft and rigid graspers. Additionally, vision-based control paradigms are developed for object grasping and collection applications, specifically from water surfaces. Autonomous object collection from water surfaces presents a multitude of challenges: i) object drift due to propeller outwash, ii) reflection and glare from water surfaces, which make object detection extremely challenging, and iii) a lack of reliable height sensors above the water surface (for autonomous landing on water). Finally, a first-of-its-kind aerial manipulation system, with an integrated net system and a robust vision-based control structure, is proposed for floating object collection from water surfaces. Objects of different shapes and sizes are collected, through multiple experimental flight tests, with a success rate of 91.6%. To the best of the author's knowledge, this is the first work demonstrating autonomous object collection from water surfaces.

Contributors: Mishra, Shatadal (Author) / Zhang, Wenlong (Thesis advisor) / Berman, Spring M (Committee member) / Sugar, Thomas G (Committee member) / Arizona State University (Publisher)

Created: 2021

Advanced Radar Detection

Description

This paper primarily deals with obstacle detection and the benefits that radar technology provides as the primary interface. The proposed concept involves using a non-industrialized radar to achieve similar results when trying to detect a present object. Achieving a working radar detection system in a more general domain makes it more likely that the technology becomes universally accessible. This, in turn, will hopefully widen the areas in which radar technology can be applied and lead to great benefits universally. From the compiled data and the work that has been done to achieve a responsive radar, it is noted that the radar provides an accurate reading in most conditions to which it is introduced. These conditions vary from range resolution aspects to various weather environments, as well as visibility. However, based on the results that were achieved through various tests, there are still some areas in which radar technology needs to improve before it can be fully considered as the sole interface for obstacle detection and for integration into future technology like self-driving cars. Nevertheless, the capabilities of radar technology at this caliber are noted to be quite impressive and similar to other more expansive options that are available.

Contributors: Martinez, Johan (Author) / Yu, Hongbin (Thesis director) / Houghton, Todd (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

Convoluted Processes: The Use and Misuse of Machine Learning in Data Analysis and Prediction

Description

With the rapid increase of technological capabilities, particularly in processing power and speed, the usage of machine learning is becoming increasingly widespread, especially in fields where real-time assessment of complex data is extremely valuable. This surge in popularity of machine learning gives rise to an abundance of potential research and projects on further broadening applications of artificial intelligence. From these opportunities comes the purpose of this thesis. Our work seeks to meaningfully increase our understanding of current capabilities of machine learning and the problems they can solve. One extremely popular application of machine learning is in data prediction, as machines are capable of finding trends that humans often miss. Our effort to this end was to examine the CVE dataset and attempt to predict future entries with Random Forests. The second area of interest lies within the great promise being demonstrated by neural networks in the field of autonomous driving. We sought to understand the research being put out by the most prominent bodies within this field and to implement a model on one of the largest standing datasets, Berkeley DeepDrive 100k. This thesis describes our efforts to build, train, and optimize a Random Forest model on the CVE dataset and a convolutional neural network on the Berkeley DeepDrive 100k dataset. We document these efforts with the goal of growing our knowledge on (and usage of) machine learning in these topics.

Contributors: Selzer, Cora (Author) / Smith, Zachary (Co-author) / Ingram-Waters, Mary (Thesis director) / Rendell, Dawn (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05
