Search Content

Content Detection in Handwritten Documents

Description

Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, one such aspect could be a region on the document with a…

Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, one such aspect could be a region on the document with a mathematical expression; in this case, the label would be math. This differentiation facilitates the performance of specific recognition tasks depending on the content type. We hypothesize that the recognition accuracy of the subsequent tasks such as textual, math, and shape recognition will increase, further leading to a better analysis of the document.

Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To complete this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods to collect, pre-process and detect content type in the collected handwritten documents. A total of 4049 documents were extracted in the form of image, and json format; and were labelled using an object labelling software with tags being text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to the Tensorflow’s object detection API to learn a neural network model. We show our results from two neural networks models, Faster Region-based Convolutional Neural Network (Faster R-CNN) and Single Shot detection model (SSD).

ContributorsFaizaan, Shaik Mohammed (Author) / VanLehn, Kurt (Thesis advisor) / Cheema, Salman Shaukat (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2018

Performance Evaluation of Object Proposal Generators for Salient Object Detection

Description

The detection and segmentation of objects appearing in a natural scene, often referred to as Object Detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these…

The detection and segmentation of objects appearing in a natural scene, often referred to as Object Detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods are capable of detecting the salient objects in the scene when constraining the number of proposals that can be generated due to constraints on timing or computations during execution. Salient objects are objects that tend to be more fixated by human subjects. The detection of salient objects is important in applications such as image collection browsing, image display on small devices, and perceptual compression.

This thesis proposes a novel evaluation framework that analyses the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals and thus the computational cost can be decreased significantly for a target true positive detection rate (TPR).

As part of the proposed framework, salient ground-truth masks are generated from the given original ground-truth masks for a given dataset. Given an object detection dataset, this work constructs salient object location ground-truth data, referred to here as salient ground-truth data for short, that only denotes the locations of salient objects. This is obtained by first computing a saliency map for the input image and then using it to assign a saliency score to each object in the image. Objects whose saliency scores are sufficiently high are referred to as salient objects. The detection rates are analyzed for existing object proposal generators with respect to the original ground-truth masks and the generated salient ground-truth masks.

As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in the development of new object detection methods and evaluating their performance in terms of successful detection of salient objects.

ContributorsKotamraju, Sai Prajwal (Author) / Karam, Lina J (Thesis advisor) / Yu, Hongbin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created2019

Visual Saliency Application in Object Detection for Search Space Reduction

Description

Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs. Its complexity can be understood from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount…

Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs. Its complexity can be understood from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount of sensory information from the retina – estimated at up to 100 Mbps per optic nerve. Parallel processing of the entire visual field in real time is likely impossible for even the most sophisticated brains due to the high computational complexity of the task [1]. Yet, organisms can efficiently process this information to parse complex scenes in real time. This amazing feat of nature relies on selective attention which allows the brain to filter sensory information to select only a small subset of it for further processing.

Today, Computer Vision has become ubiquitous in our society with several in image understanding, medicine, drones, self-driving cars and many more. With the advent of GPUs and the availability of huge datasets like ImageNet, Convolutional Neural Networks (CNNs) have come to play a very important role in solving computer vision tasks, e.g object detection. However, the size of the networks become

prohibitive when higher accuracies are needed, which in turn demands more hardware. This hinders the application of CNNs to mobile platforms and stops them from hitting the real-time mark. The computational efficiency of a computer vision task, like object detection, can be enhanced by adopting a selective attention mechanism into the algorithm. In this work, this idea is explored by using Visual Proto Object Saliency algorithm [1] to crop out the areas of an image without relevant objects before a computationally intensive network like the Faster R-CNN [2] processes it.

ContributorsGorthy, Sai Rama Srivatsava (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Vrudhula, Sarma (Committee member) / Arizona State University (Publisher)

Created2017

Cross Platform Training of Neural Networks to Enable Object Identification by Autonomous Vehicles

Description

Autonomous vehicle technology has been evolving for years since the Automated Highway System Project. However, this technology has been under increased scrutiny ever since an autonomous vehicle killed Elaine Herzberg, who was crossing the street in Tempe, Arizona in March 2018. Recent tests of autonomous vehicles on public roads…

Autonomous vehicle technology has been evolving for years since the Automated Highway System Project. However, this technology has been under increased scrutiny ever since an autonomous vehicle killed Elaine Herzberg, who was crossing the street in Tempe, Arizona in March 2018. Recent tests of autonomous vehicles on public roads have faced opposition from nearby residents. Before these vehicles are widely deployed, it is imperative that the general public trusts them. For this, the vehicles must be able to identify objects in their surroundings and demonstrate the ability to follow traffic rules while making decisions with human-like moral integrity when confronted with an ethical dilemma, such as an unavoidable crash that will injure either a pedestrian or the passenger.

Testing autonomous vehicles in real-world scenarios would pose a threat to people and property alike. A safe alternative is to simulate these scenarios and test to ensure that the resulting programs can work in real-world scenarios. Moreover, in order to detect a moral dilemma situation quickly, the vehicle should be able to identify objects in real-time while driving. Toward this end, this thesis investigates the use of cross-platform training for neural networks that perform visual identification of common objects in driving scenarios. Here, the object detection algorithm Faster R-CNN is used. The hypothesis is that it is possible to train a neural network model to detect objects from two different domains, simulated or physical, using transfer learning. As a proof of concept, an object detection model is trained on image datasets extracted from CARLA, a virtual driving environment, via transfer learning. After bringing the total loss factor to 0.4, the model is evaluated with an IoU metric. It is determined that the model has a precision of 100% and 75% for vehicles and traffic lights respectively. The recall is found to be 84.62% and 75% for the same. It is also shown that this model can detect the same classes of objects from other virtual environments and real-world images. Further modifications to the algorithm that may be required to improve performance are discussed as future work.

ContributorsSankaramangalam Ulhas, Sangeet (Author) / Berman, Spring (Thesis advisor) / Johnson, Kathryn (Committee member) / Yong, Sze Zheng (Committee member) / Arizona State University (Publisher)

Created2019

Computationally Efficient Object Detection Strategy from Water Surfaces with Specularity Removal

Description

Floating trash objects are very commonly seen on water bodies such as lakes, canals and rivers. With the increase of plastic goods and human activities near the water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out these…

Floating trash objects are very commonly seen on water bodies such as lakes, canals and rivers. With the increase of plastic goods and human activities near the water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out these trash is a hazardous and time-consuming task. Employing autonomous robots for these tasks is a better approach since it is more efficient and faster than humans. However, for a robot to clean the trash objects, a good detection algorithm is required. Real-time object detection on water surfaces is a challenging issue due to nature of the environment and the volatility of the water surface. In addition to this, running an object detection algorithm on an on-board processor of a robot limits the amount of CPU consumption that the algorithm can utilize. In this thesis, a computationally low cost object detection approach for robust detection of trash objects that was run on an on-board processor of a multirotor is presented. To account for specular reflections on the water surface, we use a polarization filter and integrate a specularity removal algorithm on our approach as well. The challenges faced during testing and the means taken to eliminate those challenges are also discussed. The algorithm was compared with two other object detectors using 4 different metrics. The testing was carried out using videos of 5 different objects collected at different illumination conditions over a lake using a multirotor. The results indicate that our algorithm is much suitable to be employed in real-time since it had the highest processing speed of 21 FPS, the lowest CPU consumption of 37.5\% and considerably high precision and recall values in detecting the object.

ContributorsSyed, Danish Faraaz (Author) / Zhang, Wenlong (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created2021

Filtering by

Content Detection in Handwritten Documents

Performance Evaluation of Object Proposal Generators for Salient Object Detection

Visual Saliency Application in Object Detection for Search Space Reduction

Cross Platform Training of Neural Networks to Enable Object Identification by Autonomous Vehicles

Computationally Efficient Object Detection Strategy from Water Surfaces with Specularity Removal