Matching Items (21)

153988-Thumbnail Image.png

Automatic text summarization using importance of sentences for email corpus

Description

With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the

With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the user. The search engine user has to go through certain webpages before getting at the webpage he/she wanted. This problem of Information Overload can be solved using Automatic Text Summarization. Summarization is a process of obtaining at abridged version of documents so that user can have a quick view to understand what exactly the document is about. Email threads from W3C are used in this system. Apart from common IR features like Term Frequency, Inverse Document Frequency, Term Rank, a variation of page rank based on graph model, which can cluster the words with respective to word ambiguity, is implemented. Term Rank also considers the possibility of co-occurrence of words with the corpus and evaluates the rank of the word accordingly. Sentences of email threads are ranked as per features and summaries are generated. System implemented the concept of pyramid evaluation in content selection. The system can be considered as a framework for Unsupervised Learning in text summarization.

Contributors

Agent

Created

Date Created
  • 2015

152168-Thumbnail Image.png

An intelligent co-reference resolver for Winograd schema sentences containing resolved semantic entities

Description

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge is suggested as an alternative, to the Turing test. It needs inferencing capabilities, reasoning abilities and background knowledge to get the answer right. It involves a coreference resolution task in which a machine is given a sentence containing a situation which involves two entities, one pronoun and some more information about the situation and the machine has to come up with the right resolution of a pronoun to one of the entities. The complexity of the task is increased with the fact that the Winograd sentences are not constrained by one domain or specific sentence structure and it also contains a lot of human proper names. This modification makes the task of association of entities, to one particular word in the sentence, to derive the answer, difficult. I have developed a pronoun resolver system for the confined domain Winograd sentences. I have developed a classifier or filter which takes input sentences and decides to accept or reject them based on a particular criteria. Once the sentence is accepted. I run parsers on it to obtain the detailed analysis. Furthermore I have developed four answering modules which use world knowledge and inferencing mechanisms to try and resolve the pronoun. The four techniques I use are : ConceptNet knowledgebase, Search engine pattern counts,Narrative event chains and sentiment analysis. I have developed a particular aggregation mechanism for the answers from these modules to arrive at a final answer. I have used caching technique for the association relations that I obtain for different modules, so as to boost the performance. I run my system on the standard ‘nyu dataset’ of Winograd sentences and questions. This dataset is then restricted, by my classifier, to 90 sentences. I evaluate my system on this 90 sentence dataset. When I compare my results against the state of the art system on the same dataset, I get nearly 4.5 % improvement in the restricted domain.

Contributors

Agent

Created

Date Created
  • 2013

151173-Thumbnail Image.png

A high level language for human robot interaction

Description

While developing autonomous intelligent robots has been the goal of many research programs, a more practical application involving intelligent robots is the formation of teams consisting of both humans and

While developing autonomous intelligent robots has been the goal of many research programs, a more practical application involving intelligent robots is the formation of teams consisting of both humans and robots. An example of such an application is search and rescue operations where robots commanded by humans are sent to environments too dangerous for humans. For such human-robot interaction, natural language is considered a good communication medium as it allows humans with less training about the robot's internal language to be able to command and interact with the robot. However, any natural language communication from the human needs to be translated to a formal language that the robot can understand. Similarly, before the robot can communicate (in natural language) with the human, it needs to formulate its communique in some formal language which then gets translated into natural language. In this paper, I develop a high level language for communication between humans and robots and demonstrate various aspects through a robotics simulation. These language constructs borrow some ideas from action execution languages and are grounded with respect to simulated human-robot interaction transcripts.

Contributors

Agent

Created

Date Created
  • 2012

149373-Thumbnail Image.png

An inverse lambda calculus algorithm for natural language processing

Description

Natural Language Processing is a subject that combines computer science and linguistics, aiming to provide computers with the ability to understand natural language and to develop a more intuitive human-computer

Natural Language Processing is a subject that combines computer science and linguistics, aiming to provide computers with the ability to understand natural language and to develop a more intuitive human-computer interaction. The research community has developed ways to translate natural language to mathematical formalisms. It has not yet been shown, however, how to automatically translate different kinds of knowledge in English to distinct formal languages. Most of the recent work presents the problem that the translation method aims to a specific formal language or is hard to generalize. In this research, I take a first step to overcome this difficulty and present two algorithms which take as input two lambda-calculus expressions G and H and compute a lambda-calculus expression F. The expression F returned by the first algorithm satisfies F@G=H and, in the case of the second algorithm, we obtain G@F=H. The lambda expressions represent the meanings of words and sentences. For each formal language that one desires to use with the algorithms, the language must be defined in terms of lambda calculus. Also, some additional concepts must be included. After doing this, given a sentence, its representation and knowing the representation of several words in the sentence, the algorithms can be used to obtain the representation of the other words in that sentence. In this work, I define two languages and show examples of their use with the algorithms. The algorithms are illustrated along with soundness and completeness proofs, the latter with respect to typed lambda-calculus formulas up to the second order. These algorithms are a core part of a natural language semantics system that translates sentences from English to formulas in different formal languages.

Contributors

Agent

Created

Date Created
  • 2010

154047-Thumbnail Image.png

Answering deep queries specified in natural language with respect to a frame based knowledge base and developing related natural language understanding components

Description

Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and digital assistants such as Apple's Siri, Google

Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana through every smart-phone and browser. However, most of the research in Question Answering aims at factual questions rather than deep ones such as ``How'' and ``Why'' questions.

In this dissertation, I suggest a different approach in tackling this problem. We believe that the answers of deep questions need to be formally defined before found.

Because these answers must be defined based on something, it is better to be more structural in natural language text; I define Knowledge Description Graphs (KDGs), a graphical structure containing information about events, entities, and classes. We then propose formulations and algorithms to construct KDGs from a frame-based knowledge base, define the answers of various ``How'' and ``Why'' questions with respect to KDGs, and suggest how to obtain the answers from KDGs using Answer Set Programming. Moreover, I discuss how to derive missing information in constructing KDGs when the knowledge base is under-specified and how to answer many factual question types with respect to the knowledge base.

After having the answers of various questions with respect to a knowledge base, I extend our research to use natural language text in specifying deep questions and knowledge base, generate natural language text from those specification. Toward these goals, I developed NL2KR, a system which helps in translating natural language to formal language. I show NL2KR's use in translating ``How'' and ``Why'' questions, and generating simple natural language sentences from natural language KDG specification. Finally, I discuss applications of the components I developed in Natural Language Understanding.

Contributors

Agent

Created

Date Created
  • 2015

Solving Winograd Schema Challenge: using semantic parsing, automatic knowledge acquisition and logical reasoning

Description

Turing test has been a benchmark scale for measuring the human level intelligence in computers since it was proposed by Alan Turing in 1950. However, for last 60 years, the

Turing test has been a benchmark scale for measuring the human level intelligence in computers since it was proposed by Alan Turing in 1950. However, for last 60 years, the applications such as ELIZA, PARRY, Cleverbot and Eugene Goostman, that claimed to pass the test. These applications are either based on tricks to fool humans on a textual chat based test or there has been a disagreement between AI communities on them passing the test. This has led to the school of thought that it might not be the ideal test for predicting the human level intelligence in machines.

Consequently, the Winograd Schema Challenge has been suggested as an alternative to the Turing test. As opposed to deciding the intelligent behavior with the help of chat servers, like it was done in the Turing test, the Winograd Schema Challenge is a question answering test. It consists of sentence and question pairs such that the answer to the question depends on the resolution of a definite pronoun or adjective in the sentence. The answers are fairly intuitive for humans but they are difficult for machines because it requires some sort of background or commonsense knowledge about the sentence.

In this thesis, I propose a novel technique to solve the Winograd Schema Challenge. The technique has three basic modules at its disposal, namely, a Semantic Parser that parses the English text (both sentences and questions) into a formal representation, an Automatic Background Knowledge Extractor that extracts the Background Knowledge pertaining to the given Winograd sentence, and an Answer Set Programming Reasoning Engine that reasons on the given Winograd sentence and the corresponding Background Knowledge. The applicability of the technique is illustrated by solving a subset of Winograd Schema Challenge pertaining to a certain type of Background Knowledge. The technique is evaluated on the subset and a notable accuracy is achieved.

Contributors

Agent

Created

Date Created
  • 2014

154765-Thumbnail Image.png

Fixed verse generation using neural word embeddings

Description

For the past three decades, the design of an effective strategy for generating poetry that matches that of a human’s creative capabilities and complexities has been an elusive goal in

For the past three decades, the design of an effective strategy for generating poetry that matches that of a human’s creative capabilities and complexities has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and among linguistic creativity researchers in particular. This thesis presents a novel approach to fixed verse poetry generation using neural word embeddings. During the course of generation, a two layered poetry classifier is developed. The first layer uses a lexicon based method to classify poems into types based on form and structure, and the second layer uses a supervised classification method to classify poems into subtypes based on content with an accuracy of 92%. The system then uses a two-layer neural network to generate poetry based on word similarities and word movements in a 50-dimensional vector space.

The verses generated by the system are evaluated using rhyme, rhythm, syllable counts and stress patterns. These computational features of language are considered for generating haikus, limericks and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% computer generated poems were correctly identified by nonexperts while 65% of the computer generated poems were correctly identified by experts. Although the system does not pass the Turing test, the results from the Turing test suggest an improvement of over 17% when compared to previous methods which use Turing tests to evaluate poetry generators.

Contributors

Agent

Created

Date Created
  • 2016

154663-Thumbnail Image.png

Context-aware adaptive hybrid semantic relatedness in biomedical science

Description

Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules for different Natural Language

Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules for different Natural Language Processing (NLP) solutions. Methods for calculating semantic relatedness of two concepts can be very useful in solutions solving different problems such as relationship extraction, ontology creation and question / answering [1–6]. Several techniques exist in calculating semantic relatedness of two concepts. These techniques utilize different knowledge sources and corpora. So far, researchers attempted to find the best hybrid method for each domain by combining semantic relatedness techniques and data sources manually. In this work, attempts were made to eliminate the needs for manually combining semantic relatedness methods targeting any new contexts or resources through proposing an automated method, which attempted to find the best combination of semantic relatedness techniques and resources to achieve the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context considering the available algorithms and resources.

Contributors

Agent

Created

Date Created
  • 2016

154545-Thumbnail Image.png

Automatic tracking of linguistic changes for monitoring cognitive-linguistic health

Description

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis explores the possibility of using natural language processing in order to detect declining linguistic complexity from more natural discourse. We use existing data from public figures suspected (or at risk) of suffering from cognitive-linguistic decline, downloaded from the Internet, to detect changes in linguistic complexity. In particular, we focus on two case studies. The first case study analyzes President Ronald Reagan’s transcribed spontaneous speech samples during his presidency. President Reagan was diagnosed with Alzheimer’s disease in 1994, however my results showed declining linguistic complexity during the span of the 8 years he was in office. President George Herbert Walker Bush, who has no known diagnosis of Alzheimer’s disease, shows no decline in the same measures. In the second case study, we analyze transcribed spontaneous speech samples from the news conferences of 10 current NFL players and 18 non-player personnel since 2007. The non-player personnel have never played professional football. Longitudinal analysis of linguistic complexity showed contrasting patterns in the two groups. The majority (6 of 10) of current players showed decline in at least one measure of linguistic complexity over time. In contrast, the majority (11 out of 18) of non-player personnel showed an increase in at least one linguistic complexity measure.

Contributors

Agent

Created

Date Created
  • 2016

153595-Thumbnail Image.png

Story detection using generalized concepts

Description

A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets,

A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. Generalized concepts are used to overcome this problem. Generalization may result into word sense disambiguation failing to find similarity. This is addressed by taking into account contextual synonyms. Concept discovery based on contextual synonyms reveal information about the semantic roles of the words leading to concepts. Merger engine generalize the concepts so that it can be used as features in learning algorithms.

Contributors

Agent

Created

Date Created
  • 2015