Matching Items (51)

Enhancing Student Learning Through Adaptive Sentence Generation

Description

Education in any skill-based subject, such as mathematics or language, involves a significant amount of repetition and practice. According to the National Survey of Student Engagement, students spend on average 17 hours per week reviewing and practicing material previously learned in a classroom, with higher-performing students tending to spend more time practicing. As such, learning software has emerged over the past several decades that focuses on providing a wide range of examples, practice problems, and situations for users to exercise their skills. Notably, math students have benefited from software that procedurally generates a virtually infinite number of practice problems and their corresponding solutions, which allows for instantaneous feedback and automatic generation of tests and quizzes. This is possible only because software can both generate and verify sample problems across a wide range of topics within mathematics. While English learning software has progressed in a similar manner, it faces a series of hurdles distinctly different from those of mathematics. In particular, English grammar contains a wide range of exception cases. Some words have unique spellings for their plural forms, some have identical spellings for their singular and plural forms, and some are conjugated differently for only one particular tense or person. These issues combined make the problem of generating grammatically correct sentences complicated. To compound this problem, the grammar rules of English are vast and often depend on the context in which they are used. Verb-tense agreement (e.g. "I eat" vs. "he eats") and conjugation of irregular verbs (e.g. swim -> swam) are common examples.

This thesis presents an algorithm designed to randomly generate a virtually infinite number of practice problems for students of English as a second language. This approach differs from other generation approaches by generating based on a context set by educators, so that problems can be generated in the context of what students are currently learning. The algorithm is validated through a study in which over 35,000 sentences generated by the algorithm are verified by multiple grammar-checking algorithms, and a subset of the sentences is validated against 3 education standards by a subject matter expert in the field. The study found that this approach has a significantly reduced grammar error ratio compared to other generation algorithms, and shows potential where context specification is concerned.
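
The exception cases named in the abstract (irregular plurals, identical singular and plural forms, irregular verb forms) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which the abstract does not specify. The lexicon tables, the `context` dictionary, and all function names are hypothetical, and the regular-form rules cover only simple cases.

```python
import random

# Hypothetical mini-lexicon for illustration only; not data from the thesis.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice", "sheep": "sheep"}
IRREGULAR_PAST = {"swim": "swam", "eat": "ate", "go": "went"}
IRREGULAR_THIRD_PERSON = {"go": "goes", "have": "has"}


def pluralize(noun):
    """Return the plural form, checking the exception table first."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def conjugate(verb, tense, third_person_singular):
    """Conjugate a verb for simple past or simple present (simplified rules)."""
    if tense == "past":
        if verb in IRREGULAR_PAST:
            return IRREGULAR_PAST[verb]
        return verb + ("d" if verb.endswith("e") else "ed")
    if third_person_singular:
        return IRREGULAR_THIRD_PERSON.get(verb, verb + "s")
    return verb


def generate_sentence(context, rng):
    """Build a 'The <subject> <verb>.' sentence from an educator-set context."""
    noun = rng.choice(context["nouns"])
    verb = rng.choice(context["verbs"])
    plural = rng.choice([True, False])
    subject = pluralize(noun) if plural else noun
    # A singular noun subject takes the third-person-singular verb form.
    return f"The {subject} {conjugate(verb, context['tense'], not plural)}."


# Assumed shape of an educator-defined context: vocabulary plus target tense.
context = {
    "nouns": ["child", "sheep", "dog"],
    "verbs": ["swim", "eat", "walk"],
    "tense": "present",
}
rng = random.Random(0)
for _ in range(3):
    print(generate_sentence(context, rng))
```

Restricting the vocabulary and tense through `context` is one simple way to keep generated problems aligned with the current lesson. A realistic generator would need a far larger lexicon and rule set.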

Date Created
2016-05

Learning the Initial Lexicon in Translating Natural Language to Formal Language

Description

The objective of this research is to determine an approach for automating the learning of the initial lexicon used in translating natural language sentences to their formal knowledge representations based on lambda-calculus expressions. Using a universal knowledge representation and its associated parser, this research attempts to use word alignment techniques to align natural language sentences to the linearized parses of their associated knowledge representations in order to learn the meanings of individual words. The work includes proposing and analyzing an approach that can be used to learn some of the initial lexicon.

Date Created
2015-05

Answering deep queries specified in natural language with respect to a frame based knowledge base and developing related natural language understanding components

Description

Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and the spread of digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana to every smartphone and browser. However, most research in Question Answering targets factual questions rather than deep ones such as "How" and "Why" questions.

In this dissertation, I suggest a different approach to tackling this problem. I believe that the answers to deep questions need to be formally defined before they can be found.

Because these answers must be defined with respect to some representation, and that representation is better if it is more structured than natural language text, I define Knowledge Description Graphs (KDGs), a graphical structure containing information about events, entities, and classes. I then propose formulations and algorithms to construct KDGs from a frame-based knowledge base, define the answers to various "How" and "Why" questions with respect to KDGs, and suggest how to obtain the answers from KDGs using Answer Set Programming. Moreover, I discuss how to derive missing information when constructing KDGs from an under-specified knowledge base, and how to answer many factual question types with respect to the knowledge base.

After defining the answers to various questions with respect to a knowledge base, I extend the research to use natural language text for specifying deep questions and the knowledge base, and to generate natural language text from those specifications. Toward these goals, I developed NL2KR, a system that helps translate natural language to formal language. I show NL2KR's use in translating "How" and "Why" questions and in generating simple natural language sentences from natural language KDG specifications. Finally, I discuss applications of the components I developed in Natural Language Understanding.

Date Created
2015

Partial satisfaction planning: representation and solving methods

Description

Automated planning problems classically involve finding a sequence of actions that transforms an initial state into some state satisfying a conjunctive set of goals with no temporal constraints. But in many real-world problems, the best plan may involve satisfying only a subset of goals or missing defined goal deadlines. For example, this may be required when goals are logically conflicting, or when time or cost constraints make achieving all goals on time too expensive. In this case, goals and deadlines must be declared as soft. I call these partial satisfaction planning (PSP) problems. In this work, I focus on particular types of PSP problems, where goals are given a quantitative value based on whether (or when) they are achieved. The objective is to find a plan with the best quality. A first challenge is finding adequate goal representations that capture common types of goal achievement rewards and costs. One popular representation is to give a single reward to each goal of a planning problem. I expand on this approach by allowing users to directly introduce utility dependencies, so that the goal achievement reward changes based on the set of goals a plan achieves. Next, I introduce time-dependent goal costs, where a plan incurs a penalty if it achieves a goal past a specified deadline. To solve PSP problems with goal utility dependencies, I look at using state-of-the-art heuristic search methodologies currently employed for classical planning problems. In doing so, one faces the challenge of simultaneously determining the best set of goals and the plan to achieve them. This is complicated by utility dependencies defined by a user and cost dependencies within the plan. To address this, I introduce a set of heuristics that combine relaxed plans and integer programming formulations. Further, I explore an approach to improve search through learning techniques by using automatically generated state features to find new states from which to search. Finally, the investigation into handling time-dependent goal costs leads to an improved search technique derived from observations based on solving discretized approximations of cost functions.
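
To make the notion of plan quality concrete, the following Python sketch scores a plan under the three ingredients the abstract describes: per-goal rewards, utility dependencies between goals, and penalties for missed deadlines. It is an illustrative assumption, not the dissertation's formulation. The function name, the flat per-goal late penalty, and the example numbers are all hypothetical.

```python
def net_benefit(achieved_at, rewards, dependencies, deadlines, late_penalty, plan_cost):
    """Score a plan as goal value minus penalties and action cost.

    achieved_at  : dict mapping each achieved goal to its achievement time
    rewards      : dict mapping each goal to its individual reward
    dependencies : list of (goal_tuple, bonus) pairs; the bonus (possibly
                   negative) applies only when every goal in the tuple is achieved
    deadlines    : dict mapping soft-deadline goals to their deadline time
    late_penalty : dict mapping goals to a flat penalty for missing the deadline
                   (an assumed simplification of a time-dependent cost)
    plan_cost    : total cost of the plan's actions
    """
    achieved = set(achieved_at)
    value = sum(rewards[g] for g in achieved)
    # Utility dependencies: reward changes that depend on sets of achieved goals.
    value += sum(bonus for goals, bonus in dependencies if set(goals) <= achieved)
    # Soft deadlines: a late goal still counts, but incurs a penalty.
    value -= sum(
        late_penalty[g]
        for g, t in achieved_at.items()
        if g in deadlines and t > deadlines[g]
    )
    return value - plan_cost


# Hypothetical example with two goals whose joint achievement is worth less
# than the sum of their individual rewards.
rewards = {"a": 10, "b": 6}
dependencies = [(("a", "b"), -4)]
deadlines = {"b": 5}
late_penalty = {"b": 3}

both = net_benefit({"a": 2, "b": 7}, rewards, dependencies, deadlines, late_penalty, plan_cost=5)
only_a = net_benefit({"a": 2}, rewards, dependencies, deadlines, late_penalty, plan_cost=2)
print(both, only_a)
```

In this example the plan that achieves only goal `a` scores higher than the plan that achieves both goals, which is the situation partial satisfaction planning is meant to capture.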

Date Created
2012

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Date Created
2013

Gene regulatory networks: modeling, intervention and context

Description

Biological systems are complex in many dimensions, as endless transportation and communication networks all function simultaneously. Our ability to intervene within both healthy and diseased systems is tied directly to our ability to understand and model core functionality. Progress in increasingly accurate and thorough high-throughput measurement technologies has provided a deluge of data from which we may attempt to infer a representation of the true genetic regulatory system. A gene regulatory network model, if accurate enough, may allow us to perform hypothesis testing in the form of computational experiments. Of great importance to modeling accuracy is the acknowledgment of biological contexts within the models -- i.e. recognizing the heterogeneous nature of the true biological system and the data it generates. This marriage of engineering, mathematics and computer science with systems biology creates a cycle of progress between computer simulation and lab experimentation, rapidly translating interventions and treatments for patients from the bench to the bedside. This dissertation will first discuss the landscape for modeling the biological system, explore the identification of targets for intervention in Boolean network models of biological interactions, and explore context specificity both in new graphical depictions of models embodying context-specific genomic regulation and in novel analysis approaches designed to reveal embedded contextual information. Overall, the dissertation will explore a spectrum of biological modeling with the goal of therapeutic intervention, using both formal and informal notions of biological context, in a way that will enable future work to have an even greater impact in terms of direct patient benefit on an individualized level.

Date Created
2013

Robust implementation of NL2KR system and its application in iRODS domain

Description

Currently, to interact with computer-based systems, one needs to learn the specific interface language of each system. In most cases, interaction would be much easier if it could be done in natural language. For that, we need a module that understands natural language and automatically translates it to the interface language of the system. The NL2KR (Natural Language to Knowledge Representation) v.1 system is a prototype of such a module. It is a learning-based system that learns new meanings of words in terms of lambda-calculus formulas, given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As a part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, the Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm that uses these components to learn new meanings of words. Similarly, we re-developed the system's built-in parser in Answer Set Programming (ASP) and also integrated an external parser with the system. Apart from this, we added new features, such as various system configurations and a memory cache, to the learning component of the NL2KR system. These enhancements helped in learning more meanings of words, reduced the computation time by a factor of 8, and improved the usability of the system. We evaluated the NL2KR system on the iRODS domain. iRODS is a rule-oriented data system that helps in managing large sets of computer files using policies. This system provides a rule-oriented interface language whose syntactic structure is like that of a procedural programming language (e.g. C). However, direct translation of natural language (NL) to this interface language is difficult. So, for automatic translation of NL to this language, we define a simple intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which can then be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to the IPDL language. This corpus is then used for the evaluation of the NL2KR system. We performed 10-fold cross-validation on the system. Furthermore, using this corpus, we illustrate how different components of our NL2KR system work.

Date Created
2013

Computational methods for knowledge integration in the analysis of large-scale biological networks

Description

As we migrate into an era of personalized medicine, understanding how bio-molecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges such as the dynamic nature of cellular systems, uncertainty due to environmental influences, and the heterogeneity between individual patients render this a difficult task. In the last decade, several algorithms have been proposed to elucidate cellular systems from data, resulting in numerous data-driven hypotheses. However, due to the large number of variables involved in the process, many of which are unknown or not measurable, such computational approaches often lead to a high proportion of false positives. This renders interpretation of the data-driven hypotheses extremely difficult. Consequently, a dismal proportion of these hypotheses are subject to further experimental validation, eventually limiting their potential to augment existing biological knowledge. This dissertation develops a framework of computational methods for the analysis of such data-driven hypotheses leveraging existing biological knowledge. Specifically, I show how biological knowledge can be mapped onto these hypotheses and subsequently augmented through novel hypotheses. Biological hypotheses are learnt in three levels of abstraction -- individual interactions, functional modules and relationships between pathways, corresponding to three complementary aspects of biological systems. The computational methods developed in this dissertation are applied to high throughput cancer data, resulting in novel hypotheses with potentially significant biological impact.

Date Created
2012

A graphical language for LTL motion and mission planning

Description

Linear Temporal Logic (LTL) is gaining popularity as a high-level specification language for robot motion planning due to its expressive power and the scalability of LTL control synthesis algorithms. This formalism, however, requires expert knowledge, which makes it inaccessible to non-expert users. This thesis introduces a graphical specification environment for creating high-level motion plans to control robots in the field by converting a visual representation of the motion/task plan into an LTL specification. The visual interface is built on the Android tablet platform and provides functionality to create task plans through a set of well-defined gestures and on-screen controls. It uses the notion of waypoints to quickly and efficiently describe the motion plan, and it enables the user to describe a variety of complex LTL specifications succinctly and intuitively without needing to know or understand LTL. Thus, it opens avenues for use by personnel in the military, warehouse management, and search and rescue missions. This thesis describes the construction of LTL specifications for various robot navigation scenarios using the visual interface developed, and leverages existing LTL-based motion planners to have a robot carry out the task plan.
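
As a rough illustration of how an ordered list of waypoints might map to an LTL formula, the Python sketch below uses the standard sequencing pattern F(a & F(b & ...)) together with a G(!r) safety clause for regions to avoid. The thesis's actual translation is not given in the abstract, so the function names, the ASCII operator syntax (`F`, `G`, `&`, `!`), and the region names are assumptions.

```python
def sequence_formula(waypoints):
    """Visit the waypoints in order: F(w1 & F(w2 & ... F(wn)))."""
    if not waypoints:
        raise ValueError("at least one waypoint is required")
    # Build from the innermost (last) waypoint outward.
    formula = f"F({waypoints[-1]})"
    for w in reversed(waypoints[:-1]):
        formula = f"F({w} & {formula})"
    return formula


def avoid_formula(regions):
    """Never enter any of the given regions: G(!r1) & G(!r2) & ..."""
    return " & ".join(f"G(!{r})" for r in regions)


def mission_formula(waypoints, avoid=()):
    """Combine the sequencing goal with optional safety constraints."""
    parts = [sequence_formula(waypoints)]
    if avoid:
        parts.append(avoid_formula(avoid))
    return " & ".join(parts)


# Hypothetical mission: visit rooms a, b, c in order while avoiding "obs".
print(mission_formula(["a", "b", "c"], avoid=["obs"]))
```

The resulting string would still need to be adapted to the input syntax of whichever LTL motion planner consumes it.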

Date Created
2013

A computational framework to model and learn context-specific gene regulatory networks from multi-source data

Description

Reverse engineering gene regulatory networks (GRNs) is an important problem in the domain of Systems Biology. Learning GRNs is challenging due to the inherent complexity of real regulatory networks and the heterogeneity of samples in available biomedical data. Real-world biological data are commonly collected from broad surveys (profiling studies) and aggregate highly heterogeneous biological samples. Popular methods for learning GRNs simplistically assume a single universal regulatory network corresponding to the available data. They neglect regulatory network adaptation due to changes in underlying conditions, cellular phenotype, or both. This dissertation presents a novel computational framework to learn common regulatory interactions and networks underlying the different sets of relatively homogeneous samples from real-world biological data. The characteristic set of samples/conditions and corresponding regulatory interactions defines the cellular context (context). Context, in this dissertation, represents the deterministic transcriptional activity within the specific cellular regulatory mechanism. The major contributions of this framework include: modeling and learning context-specific GRNs; associating enriched samples with contexts to interpret contextual interactions using biological knowledge; pruning extraneous edges from the context-specific GRN to improve the precision of the final GRNs; integrating multi-source data to learn inter- and intra-domain interactions and increase confidence in the obtained GRNs; and finally, learning combinatorial conditioning factors from the data to identify regulatory cofactors. The framework, Expattern, was applied to both real-world and synthetic data. Analysis of NCI60 drug activity and gene expression data yielded interesting insights into the mechanisms of action of drugs. Application to refractory cancer data and Glioblastoma multiforme yielded GRNs that were readily annotated with context-specific phenotypic information. Refractory cancer GRNs also displayed associations between distinct cancers that were not observed through clustering alone. Performance comparisons on multi-context synthetic data show that Expattern performs better than other comparable methods.

Date Created
2011