
Ensemble Learning on Deep Neural Networks for Image Caption Generation

Description

Expressing the information in an image as a natural language sentence is considered a difficult problem for computers. Image captioning involves not just detecting objects in images but also understanding the interactions between those objects so that they can be translated into relevant captions. Expertise in both computer vision and natural language processing is therefore considered crucial for this task. Sequence-to-sequence modelling with deep neural networks is the traditional approach to generating the sequence of words that together describe the image. However, these models suffer from high variance: they fail to generalize well beyond the training data.

The main focus of this thesis is to reduce this variance, which in turn helps generate better captions. To achieve this, ensemble learning techniques, which are known for mitigating the high-variance problem in machine learning algorithms, have been explored. Three ensemble techniques are evaluated: k-fold ensemble, bootstrap aggregation ensemble, and boosting ensemble. For each technique, three output combination approaches are analyzed. Extensive experiments were conducted on the Flickr8k dataset, a collection of 8000 images with 5 different captions for each image. The BLEU score, a widely used metric for evaluating machine-generated text in natural language processing (NLP), is used to evaluate the predictions. Based on this metric, the analysis shows that ensemble learning performs significantly better and generates more meaningful captions than any of the individual models used.
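
As a rough illustration of the bootstrap aggregation idea described above, the following Python sketch draws a bootstrap replicate of a training set and combines the captions predicted by several models through a simple majority vote. The function names (`bootstrap_sample`, `majority_vote`), the toy data, and the voting rule are assumptions made for illustration only; the abstract does not say which three output combination approaches the thesis actually uses.

```python
import random
from collections import Counter


def bootstrap_sample(items, rng):
    """Draw len(items) elements with replacement (one bootstrap replicate)."""
    return [rng.choice(items) for _ in items]


def majority_vote(captions):
    """Return the most frequent caption; ties go to the earliest one seen."""
    counts = Counter(captions)
    best = max(counts.values())
    for caption in captions:
        if counts[caption] == best:
            return caption


# Hypothetical training pairs (image id, caption); not taken from Flickr8k.
training_pairs = [("img1", "a dog runs"), ("img2", "a child smiles"),
                  ("img3", "two people walk")]

# Each ensemble member would be trained on its own bootstrap replicate.
rng = random.Random(0)
replicates = [bootstrap_sample(training_pairs, rng) for _ in range(3)]

# Hypothetical captions produced by three trained members for one image.
member_outputs = ["a dog runs on grass", "a dog runs", "a dog runs on grass"]
print(majority_vote(member_outputs))
```

Because each member sees a slightly different resampling of the data, their individual errors tend to differ, and combining their outputs is what is expected to reduce variance.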

Contributors: Katpally, Harshitha (Author) / Bansal, Ajay (Thesis advisor) / Acuna, Ruben (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created: 2019

Sonata in G major for violin and piano, op. 78. III. Allegro molto moderato

Contributors: Brahms, Johannes, 1833-1897 (Composer)

Lambda Starship: A Video Game for Teaching Functional Programming with Lisp

Description

The functional programming paradigm is able to provide clean and concise solutions to many common programming problems, as well as promote safer, more testable code by encouraging an isolation of state-modifying behavior. Functional programming is finding its way into traditionally object-oriented and imperative languages, most notably with the introduction of Java 8 and in LINQ for C#. However, no functional programming language has achieved widespread adoption, meaning that students without a formal computer science background who learn technology on demand for personal projects or for business may not come across functional programming in a significant way. Programmers need a reason to spend time learning these concepts so that they do not miss out on the subtle but profound benefits they provide. I propose the use of a video game as an environment in which learning functional programming is the player's goal. In this carefully constructed video game, learning functional programming is the key to progression. Players will be motivated to learn and will be given an immediate chance to test and demonstrate their understanding. The game, named Lambda Starship (stylized as (lambda () starship)), is a 3D first-person video game. It takes place in a spaceship that, due to extreme magnetic interference, has lost all on-board software while its hardware remains completely intact. The player is tasked with writing software using functional programming paradigms to replace the old software and bring the spaceship back to a working state. Throughout the process, the player is guided by an in-game manual and other descriptive resources. The game is implemented in Unity and scripted using C#. The game's educational and entertainment value was evaluated with a case study. Twenty-four undergraduate students at Arizona State University (ASU) played the game and were surveyed about their experience. During play, user statistics were recorded automatically, providing a data-driven way to analyze where players struggled with the concepts introduced in the game. Reception was neutral or positive for both the entertainment and educational aspects of the game. A few players expressed concerns about the manual's form factor and engagement value.
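
To make the "isolation of state-modifying behavior" mentioned above more concrete, here is a small sketch in Python. Python is used purely for illustration (the game itself teaches Lisp and is scripted in C#), and the spaceship-themed names are hypothetical rather than taken from the game.

```python
from functools import reduce


def total_power(readings):
    """Pure function: sums the positive readings without mutating its input."""
    positive = filter(lambda r: r > 0, readings)
    return reduce(lambda acc, r: acc + r, positive, 0)


def scaled(readings, factor):
    """Pure function: returns a new list instead of modifying `readings`."""
    return list(map(lambda r: r * factor, readings))


# Hypothetical sensor readings; negative values stand for faulty sensors.
sensor_readings = [3, -1, 4]
doubled = scaled(sensor_readings, 2)
print(total_power(sensor_readings), doubled, sensor_readings)
```

Since neither function changes `sensor_readings`, each can be tested in isolation by comparing its return value with an expected result.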

Contributors: Compton, Tyler Alexander (Author) / Gonzalez-Sanchez, Javier (Thesis director) / Bansal, Srividya (Committee member) / Software Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

The Future of Brain-Computer Interaction: A Potential Brain-Aiding Device of the Future

Description

Brains and computers have been interacting since the invention of the computer. These two entities have worked together to accomplish a monumental set of goals, from landing humans on the moon to helping to understand how the universe works at the most microscopic levels, and everything in between. As the years have gone on, the extent and depth of interaction between brains and computers have consistently grown, to the point where computers help brains with their thinking in countless everyday situations around the world. The first purpose of this research project was to conduct a brief review in order to gain a sound understanding of how both brains and computers operate at fundamental levels, and of what it is about these two entities that allows them to work ever more seamlessly as the years go on. Next, a history of interaction between brains and computers was developed, which expanded upon the first task and helped to contribute to visions of future brain-computer interaction (BCI). The subsequent and primary task of this research project was to develop a theoretical framework for a potential brain-aiding device of the future. This was done by conducting an extensive literature review of the most advanced current BCI technology and expanding upon the findings to argue the feasibility of the future device and its components. Next, social predictions regarding the acceptance and use of the new technology were made by designing and executing a survey based on the Unified Theory of the Acceptance and Use of Technology (UTAUT). Finally, general economic predictions were inferred by examining several relationships between money and computers over time.

Contributors: Thum, Giuseppe Edwardo (Author) / Gaffar, Ashraf (Thesis director) / Gonzalez-Sanchez, Javier (Committee member) / College of Integrative Sciences and Arts (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05

Violin sonata no. 2, op. 100. Allegretto grazioso

Contributors: Brahms, Johannes, 1833-1897 (Composer)

Modeling and Design Analysis of Facial Expressions of Humanoid Social Robots Using Deep Learning Techniques

Description

Much research in the field of social robotics concentrates on various aspects of social robots, including the design of mechanical parts and their movement, and cognitive speech and face recognition capabilities. Several robots have been developed with the intention of being social, like humans, without much emphasis on how human-like they actually look in terms of expressions and behavior. Furthermore, a substantial disparity can be seen in the success of research on "humanizing" the robots' behavior, or making it more human-like, as opposed to research into biped movement, movement of individual body parts such as arms, fingers, and eyeballs, or human-like appearance itself. The research in this paper involves understanding why research on the facial expressions of social humanoid robots fails, in that it is not completely accepted in current society, owing to the uncanny valley theory. This paper identifies the problem with current facial expression research as an information retrieval problem. It identifies the current research method used in the design of facial expressions of social robots, then uses deep learning as a similarity evaluation technique to measure the humanness of the facial expressions developed with the current technique, and finally suggests a novel solution to the facial expression design of humanoids using deep learning.

Contributors: Murthy, Shweta (Author) / Gaffar, Ashraf (Thesis advisor) / Ghazarian, Arbi (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created: 2017
