Matching Items (54)

Filtering by

Clear all filters

133339-Thumbnail Image.png

Prescription Information Extraction from Electronic Health Records using BiLSTM-CRF and Word Embeddings

Description

Medical records are increasingly being recorded in the form of electronic health records (EHRs), with a significant amount of patient data recorded as unstructured natural language text. Consequently, being able to extract and utilize clinical data present within these records

Medical records are increasingly being recorded in the form of electronic health records (EHRs), with a significant amount of patient data recorded as unstructured natural language text. Consequently, being able to extract and utilize clinical data present within these records is an important step in furthering clinical care. One important aspect within these records is the presence of prescription information. Existing techniques for extracting prescription information — which includes medication names, dosages, frequencies, reasons for taking, and mode of administration — from unstructured text have focused on the application of rule- and classifier-based methods. While state-of-the-art systems can be effective in extracting many types of information, they require significant effort to develop hand-crafted rules and conduct effective feature engineering. This paper presents the use of a bidirectional LSTM with CRF tagging model initialized with precomputed word embeddings for extracting prescription information from sentences without requiring significant feature engineering. The experimental results, run on the i2b2 2009 dataset, achieve an F1 macro measure of 0.8562, and scores above 0.9449 on four of the six categories, indicating significant potential for this model.

Contributors

Agent

Created

Date Created
2018-05

131274-Thumbnail Image.png

Improving upon the State-of-the-Art in Multimodal Emotional Recognition in Dialogue

Description

Emotion recognition in conversation has applications within numerous domains such as affective computing and medicine. Recent methods for emotion recognition jointly utilize conversational data over several modalities including audio, video, and text. However, state-of-the-art frameworks for this task do not

Emotion recognition in conversation has applications within numerous domains such as affective computing and medicine. Recent methods for emotion recognition jointly utilize conversational data over several modalities including audio, video, and text. However, state-of-the-art frameworks for this task do not focus on the feature extraction and feature fusion steps of this process. This thesis aims to improve the state-of-the-art method by incorporating two components to better accomplish these steps. By doing so, we are able to produce improved representations for the text modality and better model the relationships between all modalities. This paper proposes two methods which focus on these concepts and provide improved accuracy over the state-of-the-art framework for multimodal emotion recognition in dialogue.

Contributors

Agent

Created

Date Created
2020-05

Enhancing Student Learning Through Adaptive Sentence Generation

Description

Education of any skill based subject, such as mathematics or language, involves a significant amount of repetition and pratice. According to the National Survey of Student Engagements, students spend on average 17 hours per week reviewing and practicing material previously

Education of any skill based subject, such as mathematics or language, involves a significant amount of repetition and pratice. According to the National Survey of Student Engagements, students spend on average 17 hours per week reviewing and practicing material previously learned in a classroom, with higher performing students showing a tendency to spend more time practicing. As such, learning software has emerged in the past several decades focusing on providing a wide range of examples, practice problems, and situations for users to exercise their skills. Notably, math students have benefited from software that procedurally generates a virtually infinite number of practice problems and their corresponding solutions. This allows for instantaneous feedback and automatic generation of tests and quizzes. Of course, this is only possible because software is capable of generating and verifying a virtually endless supply of sample problems across a wide range of topics within mathematics. While English learning software has progressed in a similar manner, it faces a series of hurdles distinctly different from those of mathematics. In particular, there is a wide range of exception cases present in English grammar. Some words have unique spellings for their plural forms, some words have identical spelling for plural forms, and some words are conjugated differently for only one particular tense or person-of-speech. These issues combined make the problem of generating grammatically correct sentences complicated. To compound to this problem, the grammar rules in English are vast, and often depend on the context in which they are used. Verb-tense agreement (e.g. "I eat" vs "he eats"), and conjugation of irregular verbs (e.g. swim -> swam) are common examples. This thesis presents an algorithm designed to randomly generate a virtually infinite number of practice problems for students of English as a second language. This approach differs from other generation approaches by generating based on a context set by educators, so that problems can be generated in the context of what students are currently learning. The algorithm is validated through a study in which over 35 000 sentences generated by the algorithm are verified by multiple grammar checking algorithms, and a subset of the sentences are validated against 3 education standards by a subject matter expert in the field. The study found that this approach has a significantly reduced grammar error ratio compared to other generation algorithms, and shows potential where context specification is concerned.

Contributors

Agent

Created

Date Created
2016-05

136202-Thumbnail Image.png

Learning the Initial Lexicon in Translating Natural Language to Formal Language

Description

The objective of this research is to determine an approach for automating the learning of the initial lexicon used in translating natural language sentences to their formal knowledge representations based on lambda-calculus expressions. Using a universal knowledge representation and its

The objective of this research is to determine an approach for automating the learning of the initial lexicon used in translating natural language sentences to their formal knowledge representations based on lambda-calculus expressions. Using a universal knowledge representation and its associated parser, this research attempts to use word alignment techniques to align natural language sentences to the linearized parses of their associated knowledge representations in order to learn the meanings of individual words. The work includes proposing and analyzing an approach that can be used to learn some of the initial lexicon.

Contributors

Agent

Created

Date Created
2015-05

154047-Thumbnail Image.png

Answering deep queries specified in natural language with respect to a frame based knowledge base and developing related natural language understanding components

Description

Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana through every smart-phone and browser. However,

Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana through every smart-phone and browser. However, most of the research in Question Answering aims at factual questions rather than deep ones such as ``How'' and ``Why'' questions.

In this dissertation, I suggest a different approach in tackling this problem. We believe that the answers of deep questions need to be formally defined before found.

Because these answers must be defined based on something, it is better to be more structural in natural language text; I define Knowledge Description Graphs (KDGs), a graphical structure containing information about events, entities, and classes. We then propose formulations and algorithms to construct KDGs from a frame-based knowledge base, define the answers of various ``How'' and ``Why'' questions with respect to KDGs, and suggest how to obtain the answers from KDGs using Answer Set Programming. Moreover, I discuss how to derive missing information in constructing KDGs when the knowledge base is under-specified and how to answer many factual question types with respect to the knowledge base.

After having the answers of various questions with respect to a knowledge base, I extend our research to use natural language text in specifying deep questions and knowledge base, generate natural language text from those specification. Toward these goals, I developed NL2KR, a system which helps in translating natural language to formal language. I show NL2KR's use in translating ``How'' and ``Why'' questions, and generating simple natural language sentences from natural language KDG specification. Finally, I discuss applications of the components I developed in Natural Language Understanding.

Contributors

Agent

Created

Date Created
2015

151144-Thumbnail Image.png

Partial satisfaction planning: representation and solving methods

Description

Automated planning problems classically involve finding a sequence of actions that transform an initial state to some state satisfying a conjunctive set of goals with no temporal constraints. But in many real-world problems, the best plan may involve satisfying only

Automated planning problems classically involve finding a sequence of actions that transform an initial state to some state satisfying a conjunctive set of goals with no temporal constraints. But in many real-world problems, the best plan may involve satisfying only a subset of goals or missing defined goal deadlines. For example, this may be required when goals are logically conflicting, or when there are time or cost constraints such that achieving all goals on time may be too expensive. In this case, goals and deadlines must be declared as soft. I call these partial satisfaction planning (PSP) problems. In this work, I focus on particular types of PSP problems, where goals are given a quantitative value based on whether (or when) they are achieved. The objective is to find a plan with the best quality. A first challenge is in finding adequate goal representations that capture common types of goal achievement rewards and costs. One popular representation is to give a single reward on each goal of a planning problem. I further expand on this approach by allowing users to directly introduce utility dependencies, providing for changes of goal achievement reward directly based on the goals a plan achieves. After, I introduce time-dependent goal costs, where a plan incurs penalty if it will achieve a goal past a specified deadline. To solve PSP problems with goal utility dependencies, I look at using state-of-the-art methodologies currently employed for classical planning problems involving heuristic search. In doing so, one faces the challenge of simultaneously determining the best set of goals and plan to achieve them. This is complicated by utility dependencies defined by a user and cost dependencies within the plan. To address this, I introduce a set of heuristics based on combinations using relaxed plans and integer programming formulations. Further, I explore an approach to improve search through learning techniques by using automatically generated state features to find new states from which to search. Finally, the investigation into handling time-dependent goal costs leads us to an improved search technique derived from observations based on solving discretized approximations of cost functions.

Contributors

Agent

Created

Date Created
2012

151867-Thumbnail Image.png

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Contributors

Agent

Created

Date Created
2013

151940-Thumbnail Image.png

Gene regulatory networks: modeling, intervention and context

Description

Biological systems are complex in many dimensions as endless transportation and communication networks all function simultaneously. Our ability to intervene within both healthy and diseased systems is tied directly to our ability to understand and model core functionality. The progress

Biological systems are complex in many dimensions as endless transportation and communication networks all function simultaneously. Our ability to intervene within both healthy and diseased systems is tied directly to our ability to understand and model core functionality. The progress in increasingly accurate and thorough high-throughput measurement technologies has provided a deluge of data from which we may attempt to infer a representation of the true genetic regulatory system. A gene regulatory network model, if accurate enough, may allow us to perform hypothesis testing in the form of computational experiments. Of great importance to modeling accuracy is the acknowledgment of biological contexts within the models -- i.e. recognizing the heterogeneous nature of the true biological system and the data it generates. This marriage of engineering, mathematics and computer science with systems biology creates a cycle of progress between computer simulation and lab experimentation, rapidly translating interventions and treatments for patients from the bench to the bedside. This dissertation will first discuss the landscape for modeling the biological system, explore the identification of targets for intervention in Boolean network models of biological interactions, and explore context specificity both in new graphical depictions of models embodying context-specific genomic regulation and in novel analysis approaches designed to reveal embedded contextual information. Overall, the dissertation will explore a spectrum of biological modeling with a goal towards therapeutic intervention, with both formal and informal notions of biological context, in such a way that will enable future work to have an even greater impact in terms of direct patient benefit on an individualized level.

Contributors

Agent

Created

Date Created
2013

151963-Thumbnail Image.png

Robust implementation of NL2KR system and it's application in iRODS domain

Description

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to the interface language of the system. NL2KR (Natural language to knowledge representation) v.1 system is a prototype of such a system. It is a learning based system that learns new meanings of words in terms of lambda-calculus formulas given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As a part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm which uses these components to learn new meanings of words. Similarly, we re-developed an inbuilt parser of the system in Answer Set Programming (ASP) and also integrated external parser with the system. Apart from this, we added some new rich features like various system configurations and memory cache in the learning component of the NL2KR system. These enhancements helped in learning more meanings of the words, boosted performance of the system by reducing the computation time by a factor of 8 and improved the usability of the system. We evaluated the NL2KR system on iRODS domain. iRODS is a rule-oriented data system, which helps in managing large set of computer files using policies. This system provides a Rule-Oriented interface langauge whose syntactic structure is like any procedural programming language (eg. C). However, direct translation of natural language (NL) to this interface language is difficult. So, for automatic translation of NL to this language, we define a simple intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which then can be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to IPDL langauge. This corpus is then used for the evaluation of NL2KR system. We performed 10 fold cross validation on the system. Furthermore, using this corpus, we illustrate how different components of our NL2KR system work.

Contributors

Agent

Created

Date Created
2013

151180-Thumbnail Image.png

Computational methods for knowledge integration in the analysis of large-scale biological networks

Description

As we migrate into an era of personalized medicine, understanding how bio-molecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges such as the dynamic nature of cellular systems,

As we migrate into an era of personalized medicine, understanding how bio-molecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges such as the dynamic nature of cellular systems, uncertainty due to environmental influences, and the heterogeneity between individual patients render this a difficult task. In the last decade, several algorithms have been proposed to elucidate cellular systems from data, resulting in numerous data-driven hypotheses. However, due to the large number of variables involved in the process, many of which are unknown or not measurable, such computational approaches often lead to a high proportion of false positives. This renders interpretation of the data-driven hypotheses extremely difficult. Consequently, a dismal proportion of these hypotheses are subject to further experimental validation, eventually limiting their potential to augment existing biological knowledge. This dissertation develops a framework of computational methods for the analysis of such data-driven hypotheses leveraging existing biological knowledge. Specifically, I show how biological knowledge can be mapped onto these hypotheses and subsequently augmented through novel hypotheses. Biological hypotheses are learnt in three levels of abstraction -- individual interactions, functional modules and relationships between pathways, corresponding to three complementary aspects of biological systems. The computational methods developed in this dissertation are applied to high throughput cancer data, resulting in novel hypotheses with potentially significant biological impact.

Contributors

Agent

Created

Date Created
2012