ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- Creators: Yang, Yezhou
Visual Reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize and recognize objects, regions and their attributes in the image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system to answer questions about the image that requires attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is thought to have a proper grasp of the image if it can answer said questions correctly and provide a valid reasoning for the given answers. In this work how a system can be built by learning a multimodal representation between the stated image and the questions was investigated. Also, how background knowledge, specifically scene-graph information, if available, can be incorporated into existing image understanding models was demonstrated.
Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also gives way to learning a shared representation between these varied modalities and allows to provide meaning to what this shared representation should capture. In this work, using the surrogate task of text to image translation, neural network based architectures to learn a shared representation between these two modalities was investigated. Also, the ability that such a shared representation is capable of capturing parts of different modalities that are equivalent in some sense is proposed. Specifically, given an image and a semantic description of certain objects present in the image, a shared representation between the text and the image modality capable of capturing parts of the image being mentioned in the text was demonstrated. Such a capability was showcased on a publicly available dataset.
This thesis realizes two implementations of LPMLN based on the reductions from LPMLN to ASP and LPMLN to MLN. This thesis first presents an implementation of LPMLN called LPMLN2ASP that uses standard ASP solvers for computing MAP inference using weak constraints, and marginal and conditional probabilities using stable models enumeration. Next, in this thesis, another implementation of LPMLN called LPMLN2MLN is presented that uses MLN solvers which apply completion to compute the tight fragment of LPMLN programs for MAP inference, marginal and conditional probabilities. The computation using ASP solvers yields exact inference as opposed to approximate inference using MLN solvers. Using these implementations, the usefulness of LPMLN for computing other formalisms is demonstrated by reducing them to LPMLN. The thesis also shows how the implementations are better than the native solvers of some of these formalisms on certain domains. The implementations make use of the current state of the art solving technologies in ASP and MLN, and therefore they benefit from any theoretical and practical advances in these technologies, thereby also benefiting the computation of other formalisms that can be reduced to LPMLN. Furthermore, the implementation also allows for certain SRL formalisms to be computed by ASP solvers, and certain KR formalisms to be computed by MLN solvers.
This work proposes to treat automatic image meme generation as a translation process, and further present an end to end neural and probabilistic approach to generate an image-based meme for any given sentence using an encoder-decoder architecture. For a given input sentence, a meme is generated by combining a meme template image and a text caption where the meme template image is selected from a set of popular candidates using a selection module and the meme caption is generated by an encoder-decoder model. An encoder is used to map the selected meme template and the input sentence into a meme embedding space and then a decoder is used to decode the meme caption from the meme embedding space. The generated natural language caption is conditioned on the input sentence and the selected meme template.
The model learns the dependencies between the meme captions and the meme template images and generates new memes using the learned dependencies. The quality of the generated captions and the generated memes is evaluated through both automated metrics and human evaluation. An experiment is designed to score how well the generated memes can represent popular tweets from Twitter conversations. Experiments on Twitter data show the efficacy of the model in generating memes capable of representing a sentence in online social interaction.