Search Content

Displaying 1 - 3 of 3

Filtering by

Creators: Yang, Yezhou

Towards Supporting Visual Question and Answering Applications

Description

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision and natural language processing to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and to generate a textual answer to the input question. There are two key research problems in VQA: image understanding and question answering. My research mainly focuses on developing solutions to these two problems.

In image understanding, one important research area is semantic segmentation, which takes images as input and outputs a label for each pixel. Because much manual work is needed to label a useful training set, training sets for such supervised approaches are typically small. There are also approaches with relaxed labeling requirements, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, more and more user-uploaded images are available online. Such user-generated content often comes with labels such as tags and may be coarsely labelled by various tools. To use this information for computer vision tasks, I propose a new graphical model that considers neighborhood information and its interactions to obtain pixel-level labels for images with only incomplete image-level labels. The method was evaluated on both synthetic and real images.

In question answering, my research centers on best answer prediction and addresses two main research topics: feature design and model construction. In the feature design part, most existing work discusses how to design effective features for answer quality or best answer prediction. However, little work considers how to design features that capture the relationship between the answers to a given question. To fill this research gap, I designed new features to help improve prediction performance. In the modeling part, to exploit the structure of the feature space, I proposed an innovative learning-to-rank model based on the hierarchical lasso. Experimental comparisons with the state of the art in the best answer prediction literature confirm that the proposed methods are effective and suitable for the research task.
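As a rough illustration of features that depend on the relationship between the answers to one question, the following minimal sketch scores each answer relative to its sibling answers in the same thread. The `Answer` type, the function name `relational_features`, and the choice of length and posting order as signals are assumptions made for illustration only; they are not the features designed in the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Answer:
    # Hypothetical record for one answer in a question thread.
    answer_id: str
    text: str


def relational_features(answers: List[Answer]) -> Dict[str, Dict[str, float]]:
    """Compute per-answer features that depend on the sibling answers."""
    if not answers:
        return {}
    lengths = [len(a.text.split()) for a in answers]  # word counts
    mean_len = sum(lengths) / len(lengths)
    max_len = max(lengths)
    last_index = len(answers) - 1
    features: Dict[str, Dict[str, float]] = {}
    for position, (answer, length) in enumerate(zip(answers, lengths)):
        features[answer.answer_id] = {
            # Length relative to the thread average (1.0 means average).
            "rel_length": length / mean_len if mean_len > 0 else 0.0,
            # 1.0 if this answer ties for the longest in the thread.
            "is_longest": 1.0 if length == max_len else 0.0,
            # Posting order scaled to [0, 1]; 0.0 is the first answer.
            "rel_position": position / last_index if last_index > 0 else 0.0,
        }
    return features
```

Feature vectors of this kind would then be fed to a ranking model; the point of the sketch is only that each value changes when the other answers in the thread change.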

Contributors: Tian, Qiongjie (Author) / Li, Baoxin (Thesis advisor) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2017

Comparison of Team Robot Localization by Input Difference for Deep Neural Network Model

Description

In a multi-robot system, locating a team robot is an important issue. If robots can infer the location of team robots from information obtained through passive action recognition, without explicit communication, various advantages (e.g., improved security for military purposes) can be obtained. Specifically, when team robots follow the same motion rule based on information about adjacent robots, associations can be found between robot actions. If such an association can be analyzed, it can serve as a clue to the remote robot. Using these clues, it is possible to infer remote robots that are outside the sensor range.

In this thesis, a multi-robot system is constructed using a combination of Thymio II robotic platforms and Raspberry Pi controllers. Robots moving in chain formation take action using motion rules based on information obtained through passive action recognition. To find associations between robots, a regression model is created using a Deep Neural Network (DNN) and Long Short-Term Memory (LSTM), one of the state-of-the-art technologies.

The input data of the regression model is divided into historical data, which are consecutive positions of the robot, and observed data, which is information about the observed robot. Historical data is sequence data that is analyzed through the LSTM layer. The accuracy of a regression model designed using a DNN can vary depending on the quantity and quality of the input. In this thesis, three different input situations are assumed for comparison: first, the amount of observed data differs; second, the type of observed data differs; and third, the history length differs. Comparative models are constructed for each case, and prediction accuracy is compared to analyze the effect of input data on the regression model. This exploration validates that these methods from deep learning can reduce the communication demands in coordinated motion of multi-robot systems.
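The division of the regression input into historical and observed data can be sketched as follows. This is a minimal illustration, assuming 2D positions and one observation vector per time step; the function name `build_samples`, the data layout, and the choice of the remote robot's position at the last step of each window as the regression target are hypothetical and not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Position = Tuple[float, float]
Sample = Tuple[List[Position], List[float], Position]


def build_samples(
    own_positions: Sequence[Position],
    observations: Sequence[Sequence[float]],
    remote_positions: Sequence[Position],
    history_length: int,
) -> List[Sample]:
    """Pair each history window with the observation and target at its last step."""
    if history_length < 1:
        raise ValueError("history_length must be at least 1")
    if not (len(own_positions) == len(observations) == len(remote_positions)):
        raise ValueError("all input sequences must have the same length")
    samples: List[Sample] = []
    for end in range(history_length, len(own_positions) + 1):
        # Historical data: consecutive positions (sequence input, e.g. for an LSTM layer).
        history = list(own_positions[end - history_length:end])
        # Observed data: information about the observed robot at the current step.
        observed = list(observations[end - 1])
        # Assumed regression target: the remote robot's position at the same step.
        target = remote_positions[end - 1]
        samples.append((history, observed, target))
    return samples
```

Under these assumptions, changing `history_length` corresponds to the history-length comparison, while changing the length or content of each observation vector corresponds to the comparisons on the amount and type of observed data.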

Contributors: Kang, Sehyeok (Author) / Pavlic, Theodore P (Thesis advisor) / Richa, Andréa W. (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020

Automatic Programming Code Explanation Generation with Structured Translation Models

Description

Learning programming involves a variety of complex cognitive activities, from abstract knowledge construction to structural operations, which include program design, modification, debugging, and documentation tasks. In this work, the objective was to investigate the barriers and obstacles that novice programmers encounter and how they overcome them. Several lab and classroom studies were designed and conducted. The results showed that novice students had different behavior patterns compared to experienced learners, which indicates the obstacles encountered. The studies also showed that proper assistance could help novices find helpful materials to read. However, novices still suffered from a lack of background knowledge and the limited cognitive load while learning, which resulted in challenges in understanding programming-related materials, especially code examples. Therefore, I further proposed to use a natural language generator (NLG) to generate code explanations for educational purposes. The natural language generator is designed based on Long Short-Term Memory (LSTM), a deep-learning translation model. To establish the model, a data set was collected from Amazon Mechanical Turk (AMT), recording explanations from human experts for lines of programming code.

To evaluate the model, a pilot study was conducted and showed that the readability of the machine-generated (MG) explanations was comparable to that of human explanations, while their accuracy is still not ideal, especially for complicated code lines. Furthermore, a code-example-based learning platform was developed to utilize the explanation-generating model in programming teaching. To examine the effect of code example explanations on different learners, two lab-class experiments were conducted separately in a programming novices' class and an advanced students' class. The experimental results indicated that, when learning programming concepts, the MG code explanations significantly improved the learning predictability for novices compared to the control group, and the explanations also extended the novices' learning time by generating more material to read, which potentially leads to a better learning gain. In addition, a complete correlation model was constructed from the experimental results to illustrate the connections between different factors and the learning effect.

Contributors: Lu, Yihan (Author) / Hsiao, I-Han (Thesis advisor) / VanLehn, Kurt (Committee member) / Tong, Hanghang (Committee member) / Yang, Yezhou (Committee member) / Price, Thomas (Committee member) / Arizona State University (Publisher)

Created: 2020

Theses and Dissertations
