Matching Items (103)
Description
Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, a region of the document containing a mathematical expression would receive the label math. This differentiation allows recognition tasks specific to each content type to be applied. We hypothesize that the accuracy of subsequent recognition tasks, such as text, math, and shape recognition, will increase as a result, leading to a better overall analysis of the document.

Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To support this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods used to collect, pre-process, and detect content types in the collected handwritten documents. A total of 4049 documents were extracted in image and JSON formats and were labelled with object-labelling software using the tags text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to TensorFlow's Object Detection API to train neural network models. We show results from two such models: the Faster Region-based Convolutional Neural Network (Faster R-CNN) and the Single Shot Detector (SSD).
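As an illustration of the detection step described above, here is a minimal sketch that runs an exported TensorFlow Object Detection API model (such as the Faster R-CNN or SSD models the thesis trains) on a single page image. The export directory, image filename, and confidence threshold are assumptions for illustration; the label map follows the nine tags listed in the abstract.

```python
# Minimal sketch: run an exported TF Object Detection API model on one page image.
# The model path and file names are hypothetical, not the author's files.
import tensorflow as tf

CLASSES = {1: "text", 2: "math", 3: "diagram", 4: "cross out", 5: "table",
           6: "graph", 7: "tick mark", 8: "arrow", 9: "doodle"}  # tags from the thesis

detect_fn = tf.saved_model.load("exported_model/saved_model")  # hypothetical export dir

image = tf.io.decode_png(tf.io.read_file("page_0001.png"), channels=3)  # hypothetical file
batch = tf.expand_dims(image, 0)  # the API expects a [1, H, W, 3] uint8 tensor

detections = detect_fn(batch)
boxes = detections["detection_boxes"][0].numpy()    # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0].numpy()
labels = detections["detection_classes"][0].numpy().astype(int)

for box, score, label in zip(boxes, scores, labels):
    if score >= 0.5:  # arbitrary confidence threshold
        print(CLASSES.get(label, "unknown"), round(float(score), 2), box)
```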
Contributors: Faizaan, Shaik Mohammed (Author) / VanLehn, Kurt (Thesis advisor) / Cheema, Salman Shaukat (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Reinforcement learning (RL) is a powerful methodology for teaching autonomous agents complex behaviors and skills. A critical component in most RL algorithms is the reward function -- a mathematical function that provides numerical estimates for desirable and undesirable states. Typically, the reward function must be hand-designed by a human expert and, as a result, the scope of a robot's autonomy and its ability to safely explore and learn in new and unforeseen environments is constrained by the specifics of the designed reward function. In this thesis, I design and implement a stateful collision anticipation model with powerful predictive capability based upon my research on sequential data modeling and modern recurrent neural networks. I also develop deep reinforcement learning methods whose rewards are generated by self-supervised training and intrinsic signals. The main objective is to work towards the development of resilient robots that can learn to anticipate and avoid damaging interactions by combining visual and proprioceptive cues from internal sensors. The introduced solutions are inspired by pain pathways in humans and animals, because such pathways are known to guide decision-making processes and promote self-preservation. A new "robot dodgeball" benchmark is introduced in order to test the validity of the developed algorithms in dynamic environments.
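The following toy sketch illustrates the reward-shaping idea the abstract describes: a learned anticipation model estimates the probability of impending contact, and that intrinsic signal penalizes the task reward so no hand-designed damage term is needed. The stand-in anticipation function and penalty weight are assumptions; the thesis uses a recurrent network over visual and proprioceptive history.

```python
# Toy sketch of intrinsic reward shaping from a collision-anticipation signal.
# All names and the stand-in model are illustrative assumptions.
import numpy as np

def anticipate_collision(obs_history):
    """Stand-in for the anticipation model: returns P(contact soon).
    A real model would be an RNN over visual + proprioceptive history."""
    return float(np.clip(np.mean(obs_history[-5:]), 0.0, 1.0))

def shaped_reward(task_reward, obs_history, penalty_weight=1.0):
    # intrinsic signal: discourage states predicted to lead to damaging contact
    return task_reward - penalty_weight * anticipate_collision(obs_history)

# toy rollout
history = list(np.random.rand(20))
print(shaped_reward(task_reward=1.0, obs_history=history))
```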
Contributors: Richardson, Trevor W (Author) / Ben Amor, Heni (Thesis advisor) / Yang, Yezhou (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
In recent years, deep learning systems have outperformed traditional machine learning systems in most domains. There has been a lot of research recently in the field of hand gesture recognition using wearable sensors, due to the numerous advantages these systems have over vision-based ones. However, the lack of extensive datasets and the nature of Inertial Measurement Unit (IMU) data make it difficult to apply deep learning techniques to them. Although many machine learning models achieve good accuracy, most assume that training data is available for every user, while approaches that do not require per-user data have lower accuracy. MirrorGen is a technique that uses wearable sensor data to generate synthetic videos of hand movements, mitigating the traditional challenges of vision-based recognition such as occlusion, lighting restrictions, lack of viewpoint variation, and environmental noise. In addition, MirrorGen allows user-independent recognition with minimal human effort during data collection. It also leverages advances in vision-based recognition through techniques such as optical flow extraction and 3D convolution. Projecting the orientation (IMU) information onto a video recovers position information for the hands. To validate these claims, we perform entropy analysis on several configurations: raw data, a stick model, a hand model, and real video. The human hand model is found to have an optimal entropy that enables user-independent recognition, and it serves as a more pervasive option than video-based recognition. An average user-independent recognition accuracy of 99.03% was achieved on a sign language dataset with 59 different users and 20 different signs, each repeated 20 times, for a total of roughly 23,000 training instances. Moreover, synthetic videos can be used to augment real videos to improve recognition accuracy.
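A rough sketch of the projection step described above, under the assumption of a single forearm-mounted IMU and a stick-figure rendering: orientation samples are rotated onto a simple arm segment and rasterized into frames, producing a synthetic clip that a vision pipeline (optical flow, 3D convolutions) could consume. The sensor layout and rendering details are illustrative, not the thesis implementation.

```python
# Illustrative sketch: render IMU orientation samples as stick-model video frames.
import numpy as np

def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def render_frame(q_forearm, image_size=64, arm_len=25):
    """Draw the rotated forearm segment into a blank frame (stick model)."""
    frame = np.zeros((image_size, image_size), dtype=np.uint8)
    tip = quat_rotate(q_forearm, np.array([0.0, float(arm_len), 0.0]))
    cx = cy = image_size // 2
    x, y = int(cx + tip[0]), int(cy - tip[1])  # orthographic projection
    # naive line rasterization from elbow (frame center) to wrist tip
    for t in np.linspace(0.0, 1.0, 50):
        px, py = int(cx + t * (x - cx)), int(cy + t * (y - cy))
        if 0 <= px < image_size and 0 <= py < image_size:
            frame[py, px] = 255
    return frame

frames = [render_frame(q) for q in [(1, 0, 0, 0), (0.92, 0, 0, 0.38)]]  # toy IMU stream
video = np.stack(frames)  # (T, H, W) clip for a 3D-conv recognizer
print(video.shape)
```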
Contributors: Ramesh, Arun Srivatsa (Author) / Gupta, Sandeep K S (Thesis advisor) / Banerjee, Ayan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Multimodal Representation Learning is a multi-disciplinary research field which aims to integrate information from multiple communicative modalities in a meaningful manner to help solve a downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. The interpretation of 'meaningful integration of information from different modalities' remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. This thesis investigates both settings: understanding one modality given corresponding information in others, specifically image understanding for visual reasoning, and translating from one modality to another, specifically text-to-image translation.

Visual Reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize and recognize objects, regions and their attributes in the image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system to answer questions about the image that require attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer such questions correctly and provide a valid reasoning for the given answers. This work investigates how such a system can be built by learning a multimodal representation between the given image and the questions. It also demonstrates how background knowledge, specifically scene-graph information, can be incorporated into existing image understanding models when available.

Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also opens the way to learning a shared representation between these varied modalities and to specifying what this shared representation should capture. In this work, using the surrogate task of text-to-image translation, neural-network-based architectures for learning a shared representation between these two modalities were investigated. It is further proposed that such a shared representation can capture parts of different modalities that are, in some sense, equivalent. Specifically, given an image and a semantic description of certain objects present in it, a shared representation between the text and image modalities capable of capturing the parts of the image mentioned in the text was demonstrated. This capability was showcased on a publicly available dataset.
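The following NumPy sketch illustrates the shared-representation idea in its simplest form: two encoders map image and text features into one embedding space, where cosine similarity scores correspondence so that matched pairs outscore mismatched ones. The feature dimensions and linear stand-in encoders are assumptions; the thesis learns neural encoders end to end.

```python
# Minimal sketch of a shared text-image embedding space (stand-in linear encoders).
import numpy as np

rng = np.random.default_rng(0)
W_img = rng.normal(size=(128, 2048))  # image-feature -> shared space (assumed dims)
W_txt = rng.normal(size=(128, 300))   # text-feature  -> shared space (assumed dims)

def embed(W, x):
    z = W @ x
    return z / np.linalg.norm(z)

def match_score(img_feat, txt_feat):
    # cosine similarity in the shared space drives retrieval / translation
    return float(embed(W_img, img_feat) @ embed(W_txt, txt_feat))

print(match_score(rng.normal(size=2048), rng.normal(size=300)))
```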
Contributors: Saha, Rudra (Author) / Yang, Yezhou (Thesis advisor) / Singh, Maneesh Kumar (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Mixed reality mobile platforms co-locate virtual objects with physical spaces, creating immersive user experiences. To create visual harmony between virtual and physical spaces, the virtual scene must be accurately illuminated with realistic physical lighting. To this end, a system was designed that Generates Light Estimation Across Mixed-reality (GLEAM) devices to continually sense realistic lighting of a physical scene in all directions. GLEAM optionally operates across multiple mobile mixed-reality devices to leverage collaborative multi-viewpoint sensing for improved estimation. The system implements policies that prioritize resolution, coverage, or update interval of the illumination estimation depending on the situational needs of the virtual scene and physical environment.
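A minimal sketch of that policy trade-off follows. The 15 ms and 200 ms update intervals come from the evaluation below; the resolutions and viewpoint counts are illustrative assumptions, not figures from the thesis.

```python
# Illustrative sketch of GLEAM-style policy selection: each policy trades
# estimation resolution and coverage against update interval.
from dataclasses import dataclass

@dataclass
class GleamPolicy:
    name: str
    cubemap_face_px: int    # spatial quality of the estimated environment map (assumed)
    viewpoints: int         # collaborating devices contributing samples (assumed)
    update_interval_ms: int

POLICIES = [
    GleamPolicy("prioritize_speed",    cubemap_face_px=16,  viewpoints=1, update_interval_ms=15),
    GleamPolicy("prioritize_coverage", cubemap_face_px=32,  viewpoints=3, update_interval_ms=70),
    GleamPolicy("prioritize_quality",  cubemap_face_px=128, viewpoints=1, update_interval_ms=200),
]

def pick_policy(scene_is_dynamic: bool) -> GleamPolicy:
    # dynamic scenes need fresh estimates; static scenes can afford resolution
    return POLICIES[0] if scene_is_dynamic else POLICIES[-1]

print(pick_policy(scene_is_dynamic=True).name)
```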

To evaluate the runtime performance and perceptual efficacy of the system, GLEAM was implemented on the Unity 3D Game Engine. The implementation was deployed on Android and iOS devices. On these implementations, GLEAM can prioritize dynamic estimation with update intervals as low as 15 ms or prioritize high spatial quality with update intervals of 200 ms. User studies across 99 participants and 26 scene comparisons reported a preference towards GLEAM over other lighting techniques in 66.67% of the presented augmented scenes and indifference in 12.57% of the scenes. A controlled lighting user study on 18 participants revealed a general preference for policies that strike a balance between resolution and update rate.
Contributors: Prakash, Siddhant (Author) / LiKamWa, Robert (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Hansford, Dianne (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Virtual digital assistants are automated software systems which assist humans by understanding natural languages such as English, in either voice or textual form. In recent times, many digital applications have shifted towards providing a user experience through a natural language interface. The change is driven by the ease with which virtual digital assistants such as Google Assistant and Amazon Alexa can be integrated into an application. These assistants make use of a Natural Language Understanding (NLU) system which acts as an interface to translate unstructured natural language data into a structured form. Such an NLU system uses an intent finding algorithm which extracts the high-level meaning of a user query, a step termed intent classification. The intent classification step identifies the action(s) that a user wants the assistant to perform. It is followed by an entity recognition step, which identifies the entities in the utterance on which the intended action is performed. This step can be viewed as a sequence labeling task that maps an input word sequence into a corresponding sequence of slot labels, and is also termed slot filling.
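A toy sketch of those two NLU steps, using keyword rules and a small lexicon as stand-ins for the learned models; the intents, slot labels, and example utterance are illustrative assumptions.

```python
# Toy sketch: intent classification over the utterance, then slot filling
# as per-token sequence labeling (IOB-style tags). Rules are stand-ins.
INTENT_KEYWORDS = {"play": "PlayMusic", "weather": "GetWeather", "book": "BookRestaurant"}
CITY_LEXICON = {"phoenix", "tempe", "seattle"}

def classify_intent(utterance: str) -> str:
    for word, intent in INTENT_KEYWORDS.items():
        if word in utterance.lower():
            return intent
    return "Unknown"

def fill_slots(utterance: str):
    # one label per token, as in sequence labeling
    return [(tok, "B-city" if tok.lower() in CITY_LEXICON else "O")
            for tok in utterance.split()]

utt = "What is the weather in Tempe"
print(classify_intent(utt))  # GetWeather
print(fill_slots(utt))       # [..., ('Tempe', 'B-city')]
```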

In this thesis, we improve intent classification and slot filling in virtual voice agents through automatic data augmentation. Spoken Language Understanding systems face the issue of data sparsity: it is hard for human-created training samples to represent all the patterns in a language. Due to the lack of relevant data, deep learning methods are unable to generalize the Spoken Language Understanding model. This thesis presents a way to overcome the issue of data sparsity in deep learning approaches to Spoken Language Understanding tasks. We describe the limitations of current intent classifiers and show how the proposed algorithm uses existing knowledge bases to overcome those limitations, yielding a more robust intent classifier and slot filling system.
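The sketch below illustrates one simple form that knowledge-base-driven augmentation can take: substituting slot values from a knowledge base into an utterance template to multiply the training patterns per intent. The knowledge base, template syntax, and slot names are illustrative assumptions, not the thesis's algorithm.

```python
# Hedged sketch: expand a sparse training set by substituting knowledge-base
# slot values into templates. KB contents and template syntax are assumed.
KNOWLEDGE_BASE = {"city": ["Phoenix", "Tempe", "Seattle"],
                  "artist": ["Miles Davis", "Nina Simone"]}

def augment(template: str, slot: str):
    """Yield one training utterance per knowledge-base value for the slot."""
    for value in KNOWLEDGE_BASE[slot]:
        yield template.replace("{" + slot + "}", value)

for sample in augment("play something by {artist}", "artist"):
    print(sample)
```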
Contributors: Garg, Prashant (Author) / Baral, Chitta (Thesis advisor) / Kumar, Hemanth (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Until the late 1970s, the primary focus in power system modeling was largely directed towards generation and transmission. Over the years, the importance of load modeling grew, and an accurate representation of load came to play an important role in planning and operation studies. With an emphasis on tackling the topic of load modeling, this thesis presents the following intermediary steps in developing accurate load models:

1. Synthesis of a three-phase standard feeder and load model using voltages and currents measured at the head of the feeder for events such as faults and feeder pickup.

2. Investigation of the impact of the synthesized standard feeder and load model on the sub-transmission system for a feeder pickup case.

In the first phase of this project, a standard feeder and load model was synthesized by capturing the current transients when three-phase voltage measurements (obtained from a local electric utility) are played in as input to the synthesized model. The measured currents are compared with the simulated currents, obtained using electromagnetic transient analysis software (PSCAD), at the head of the designed feeder. The synthesized load model has a load composition which includes impedance loads, single-phase induction motor loads, and three-phase induction motor loads. The parameters of the motor models are adjusted to obtain a good correspondence between the measured three-phase currents and the simulated current responses at the head of the feeder when the model is subjected to events under which measurements were obtained on the feeder. These events include faults which occurred upstream of the feeder at a higher voltage level and a feeder pickup event that occurred downstream from the head of the feeder. Two different load compositions have been obtained for this feeder and load model depending on the types of load present in the surrounding area (residential or industrial/commercial).
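As a simplified illustration of that fitting step, the sketch below estimates load-composition fractions by least squares, matching a measured feeder current against per-class current signatures. In the thesis the simulated response comes from a full PSCAD model and the adjustment targets motor parameters; the linear signatures and target fractions here are toy assumptions.

```python
# Toy sketch: fit load-composition fractions to a measured current response.
import numpy as np
from scipy.optimize import lsq_linear

t = np.linspace(0, 1, 200)
# assumed per-unit current signatures of each load class during a disturbance
basis = np.column_stack([np.ones_like(t),               # constant-impedance load
                         1 + 0.6 * np.exp(-6 * t),      # single-phase motor inrush
                         1 + 0.9 * np.exp(-3 * t)])     # three-phase motor inrush
i_measured = basis @ np.array([0.4, 0.35, 0.25]) + 0.01 * np.random.randn(t.size)

# fractions constrained to [0, 1]; a real study would also enforce sum == 1
res = lsq_linear(basis, i_measured, bounds=(0.0, 1.0))
print("estimated composition:", np.round(res.x, 3))
```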

The second phase of this project examines the impact of the feeder pickup event on the 69 kV sub-transmission system using the obtained standard feeder and load model. Using 69 kV network data obtained from a local utility, a sub-transmission network was built in PSCAD. The main difference between the first and second phases of this project is that no measurements are played in to the model in the latter case. Instead, the feeder pickup event at a particular substation is simulated using the reduced equivalent of the 69 kV sub-transmission circuit together with the synthesized three-phase models of the feeder and the loads obtained in the first phase of the project. This analysis shows a good correspondence between the PSCAD-simulated three-phase voltages and currents and the corresponding measured responses at the substation.
Contributors: Nekkalapu, Sameer (Author) / Vittal, Vijay (Thesis advisor) / Undrill, John M (Committee member) / Ayyanar, Raja (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
The grounding system in a substation is used to protect personnel and equipment. When fault current is injected into the ground, a well-designed grounding system should disperse the fault current into the ground in order to limit the touch potential and the step potential to the acceptable levels defined by IEEE Std 80. On the other hand, from the point of view of economy, it is desirable to design a ground grid that minimizes the cost of labor and material. To design an optimal ground grid that meets the safety metrics and has the minimum cost, an optimal ground grid application was developed in MATLAB: the OptimaL Ground Grid Application (OLGGA).

In the process of ground grid optimization, the touch potential and the step potential are introduced as nonlinear constraints in a two-layer soil model whose parameters are set by the user. To obtain an accurate expression for these nonlinear constraints, the ground grid is discretized using a ground-conductor (and ground-rod) segmentation method that breaks each conductor into reasonably sized segments. The leakage current on each segment and the ground potential rise (GPR) are calculated by solving a matrix equation involving the mutual resistance matrix. Once the leakage current on each segment is obtained, the touch potential and the step potential can be calculated using the superposition principle.
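The matrix step admits a compact sketch: with mutual-resistance matrix R, every segment sits at the common ground potential rise, so R i = GPR · 1 with the leakage currents summing to the injected fault current. The three-segment R and fault current below are made-up numbers for illustration, not field data.

```python
# Compact sketch of the mutual-resistance matrix equation: R i = GPR * 1,
# with sum(i) equal to the injected fault current.
import numpy as np

R = np.array([[10.0, 2.0, 1.0],
              [2.0, 12.0, 2.5],
              [1.0, 2.5, 11.0]])  # ohms, mutual resistances between segments (assumed)
i_fault = 1000.0                  # amps injected into the grid (assumed)

y = np.linalg.solve(R, np.ones(3))  # segment currents per volt of GPR
gpr = i_fault / y.sum()             # scale so leakage currents sum to the fault current
i_leak = gpr * y

print("GPR (V):", round(gpr, 1))
print("leakage currents (A):", np.round(i_leak, 1))
# touch/step potentials then follow by superposing each segment's contribution
```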

A genetic algorithm is used in the optimization of the ground grid, and a pattern search algorithm is used to accelerate the convergence. To verify the accuracy of the application, the touch potential and the step potential calculated by the MATLAB application are compared with those calculated by the commercial grounding system analysis software WinIGS.

The user's manual of the optimal ground grid application is also presented in this work.
Contributors: Li, Songyan (Author) / Tylavsky, Daniel J. (Thesis advisor) / Ayyanar, Raja (Committee member) / Vittal, Vijay (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
This thesis provides a cost-benefit assessment of the proposed next-generation distribution system, the Future Renewable Electric Energy Distribution Management (FREEDM) system. A probabilistic study is conducted to determine the payback period for an investment made in the FREEDM distribution system. The stochastic study supports a detailed analysis that estimates the probability density function and statistics associated with the payback period.

This thesis also identifies several parameters associated with the FREEDM system, which are used in the cost-benefit study to evaluate the investment and several direct and indirect benefits. Different topologies are selected to represent the FREEDM test bed. Given the cost of high-speed fault isolation devices, the topology design is selected to minimize the number of fault isolation devices while maintaining enhanced reliability. A case study is also performed to assess the economic impact of energy storage devices in the solid state transformers, such that the fault isolation devices may be replaced by conventional circuit breakers.

A reliability study is conducted on the FREEDM distribution system to examine the customer-centric reliability index, the System Average Interruption Frequency Index (SAIFI). The SAIFI is found to be close to 0.125 for the FREEDM distribution system. In addition, a comparison based on SAIFI is made between a representative U.S. distribution system and the FREEDM distribution system.
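For reference, SAIFI in its standard form is total customer interruptions divided by total customers served; the short sketch below reproduces the 0.125 figure with made-up counts chosen only to illustrate the arithmetic.

```python
# SAIFI = total customer interruptions / total customers served.
# The counts below are made-up numbers for illustration.
interruptions_per_event = [1200, 800, 500]  # customers interrupted per outage event
customers_served = 20000

saifi = sum(interruptions_per_event) / customers_served
print(round(saifi, 3))  # 0.125, matching the FREEDM figure reported above
```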

The payback period is also determined by adopting a theoretical approach, and the results are compared with the Monte Carlo simulation outcomes to understand the variation in the payback period. The payback period is found to be close to 60 years, but if an annual rebate is considered, it reduces to about 20 years. This shows that the FREEDM system has significant potential that cannot be overlooked. Several direct and indirect benefits arising from the FREEDM system are also discussed in this thesis.
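A hedged sketch of such a Monte Carlo payback study follows: sample uncertain cost and benefit inputs, compute a payback period per draw, and inspect the resulting distribution. The distributions and dollar figures are illustrative assumptions chosen only to land near the 60-year and 20-year figures reported above, not the thesis inputs.

```python
# Hedged sketch of a Monte Carlo payback-period study. All inputs are assumed.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

capital_cost = rng.normal(6.0e6, 0.5e6, N)    # $, investment in FREEDM equipment (assumed)
annual_benefit = rng.normal(1.0e5, 2.0e4, N)  # $/yr, direct + indirect benefits (assumed)
annual_rebate = 2.0e5                         # $/yr, optional rebate scenario (assumed)

payback = capital_cost / annual_benefit
payback_with_rebate = capital_cost / (annual_benefit + annual_rebate)

print("median payback (yr):", round(float(np.median(payback)), 1))
print("median payback with rebate (yr):", round(float(np.median(payback_with_rebate)), 1))
```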
Contributors: Dinakar, Abhishek (Author) / Heydt, Gerald T (Thesis advisor) / Vittal, Vijay (Committee member) / Ayyanar, Raja (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may cause damage to the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches that require knowledge about sources of perturbation to be encoded before deployment, our method is based on experiential learning. Robots learn to associate visual cues with subsequent physical perturbations and contacts. In turn, these extracted visual cues are then used to predict potential future perturbations acting on the robot. To this end, we introduce a novel deep network architecture which combines multiple sub-networks for dealing with robot dynamics and perceptual input from the environment. We present a self-supervised approach for training the system that does not require any labeling of training data. Extensive experiments in a human-robot interaction task show that a robot can learn to predict physical contact by a human interaction partner without any prior information or labeling. Furthermore, the network successfully predicts physical contact from depth input, traditional video input, or both modalities combined.
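A schematic Keras sketch of such a multi-branch architecture follows: one sub-network encodes the visual clip, another encodes proprioception, and a recurrent head fused over both predicts contact probability. Layer sizes, clip length, and joint-state dimensions are illustrative assumptions, not the thesis's exact network.

```python
# Schematic sketch: fuse a visual sub-network and a proprioceptive sub-network
# into a recurrent contact predictor. All dimensions are assumed.
import tensorflow as tf
from tensorflow.keras import layers

frames = tf.keras.Input(shape=(10, 64, 64, 3), name="depth_or_rgb_clip")
proprio = tf.keras.Input(shape=(10, 12), name="joint_state_sequence")

# perception branch: per-frame CNN features over the clip
x = layers.TimeDistributed(layers.Conv2D(16, 5, strides=2, activation="relu"))(frames)
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)

# dynamics branch: embed proprioceptive readings
p = layers.TimeDistributed(layers.Dense(32, activation="relu"))(proprio)

# fuse both modalities and model the sequence with an RNN
h = layers.LSTM(64)(layers.Concatenate()([x, p]))
contact = layers.Dense(1, activation="sigmoid", name="contact_probability")(h)

model = tf.keras.Model([frames, proprio], contact)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```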
Contributors: Sur, Indranil (Author) / Amor, Heni B (Thesis advisor) / Fainekos, Georgios (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2017