Prescription Information Extraction from Electronic Health Records using BiLSTM-CRF and Word Embeddings

Description

Medical records are increasingly being recorded in the form of electronic health records (EHRs), with a significant amount of patient data recorded as unstructured natural language text. Consequently, being able to extract and utilize clinical data present within these records is an important step in furthering clinical care. One important aspect within these records is the presence of prescription information. Existing techniques for extracting prescription information (medication names, dosages, frequencies, reasons for taking, and modes of administration) from unstructured text have focused on the application of rule- and classifier-based methods. While state-of-the-art systems can be effective in extracting many types of information, they require significant effort to develop hand-crafted rules and conduct effective feature engineering. This paper presents a bidirectional LSTM with a CRF tagging layer, initialized with precomputed word embeddings, for extracting prescription information from sentences without requiring significant feature engineering. Experiments on the i2b2 2009 dataset achieve a macro-averaged F1 score of 0.8562, with scores above 0.9449 on four of the six categories, indicating significant potential for this model.
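For readers unfamiliar with the metric, the macro-averaged F1 reported above is the unweighted mean of the per-category F1 scores, so a weak category pulls the average down regardless of how often it occurs. The following is a minimal sketch of that calculation; the function names, category labels, and counts are hypothetical illustrations, not taken from the thesis or the i2b2 2009 data.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one category from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def per_category_f1(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    """Map each category name to its F1, given (tp, fp, fn) counts."""
    return {name: f1_score(*c) for name, c in counts.items()}


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-category F1 scores."""
    scores = per_category_f1(counts)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical counts for two categories, for illustration only.
example_counts = {
    "medication": (8, 2, 2),  # (tp, fp, fn)
    "dosage": (5, 5, 0),
}
example_macro = macro_f1(example_counts)
```

Because every category carries equal weight, a model can score very highly on most categories and still show a noticeably lower macro average when one or two categories are hard, which is the pattern the abstract reports.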

Contributors: Rawal, Samarth Chetan (Author) / Baral, Chitta (Thesis director) / Anwar, Saadat (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

We Need to Talk About Robustness to Adversarial Attacks While Removing Spurious Dataset Biases

Description

Machine learning models can pick up biases and spurious correlations from training data and projects and amplify these biases during inference, thus posing significant challenges in real-world settings. One approach to mitigating this is a class of methods that identify and filter out bias-inducing samples from the training datasets so that models are not exposed to biases. However, the filtering wastes considerable resources, as most of the dataset created is discarded as biased. This work deals with avoiding the wastage of resources by identifying and quantifying the biases. I further elaborate on the implications of dataset filtering on robustness (to adversarial attacks) and generalization (to out-of-distribution samples). The findings suggest that while dataset filtering does help to improve out-of-distribution (OOD) generalization, it has a significant negative impact on robustness to adversarial attacks. They also show that transforming bias-inducing samples into adversarial samples (instead of eliminating them from the dataset) can significantly boost robustness without sacrificing generalization.

Contributors: Sachdeva, Bhavdeep Singh (Author) / Baral, Chitta (Thesis advisor) / Liu, Huan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2021

Medical Question Answering using Instructional Prompts

Description

Instructional prompts are a novel technique that can significantly improve the performance of natural language processing tasks by specifying the task instruction to the language model. This is the first paper that uses instructional prompts to improve performance on the question answering task in the biomedical domain. This work makes two significant contributions. Firstly, a dataset of 600K question-answer pairs has been developed from the medical textbook ‘Differential Diagnosis Primary Care’, which contains information on how to diagnose a patient by observing their disease symptoms. Secondly, a question answering language model augmented with instructional prompts has been developed by training on the medical information extracted from the same book. Experiments demonstrate that it performs better than a standard question answering model that does not use instructional prompts. Instructional prompts are based on prompt tuning and prefix tuning, novel techniques that help train a language model for specific downstream tasks by keeping the majority of model parameters frozen and optimizing only a small number of continuous task-specific vectors (called the prefixes).

Contributors: Saxena, Sharad (Author) / Baral, Chitta (Thesis advisor) / Blanco, Eduardo (Committee member) / Anwar, Saadat (Committee member) / Arizona State University (Publisher)

Created: 2021

Event Detection as Multi-Task Text Generation

Description

Event detection refers to the task of identifying event occurrences in a given natural language text. Event detection comprises two subtasks: recognizing the event mention (event identification) and the type of event (event classification). Breaking from the sequence labeling and word classification approaches, this work models event detection, and its constituent subtasks of trigger identification and trigger classification, as independent sequence generation tasks. This work proposes a prompted multi-task generative model trained on event identification, classification, and combined event detection. The model is evaluated on general-domain and biomedical-domain event detection datasets, achieving state-of-the-art results on the general-domain Roles Across Multiple Sentences (RAMS) dataset, establishing event detection benchmark performance on WikiEvents, and achieving competitive performance on the general-domain Massive Event Detection (MAVEN) dataset and the biomedical-domain Multi-Level Event Extraction (MLEE) dataset.

Contributors: Anantheswaran, Ujjwala (Author) / Baral, Chitta (Thesis advisor) / Kerner, Hannah (Committee member) / Gopalan, Nakul (Committee member) / Arizona State University (Publisher)

Created: 2022

Implicit Hypothetical Reasoning about Intrinsic Physical Properties

Description

Multimodal reasoning is one of the most interesting research fields because of the ability to interact with systems and the explainability of the models' behavior. Traditional multimodal research problems do not focus on complex commonsense reasoning (such as physical interactions). Although real-world objects have physical properties associated with them, many of these properties (such as mass and coefficient of friction) are not captured directly by the imaging pipeline. Videos often capture objects, their motion, and the interactions between different objects. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. This thesis introduces a new video question-answering task for reasoning about the implicit physical properties of objects in a scene from videos. For this task, I introduce a dataset, CRIPP-VQA (Counterfactual Reasoning about Implicit Physical Properties - Video Question Answering), which contains videos of objects in motion, annotated with hypothetical/counterfactual questions about the effect of actions (such as removing, adding, or replacing objects), questions about planning (choosing actions to perform to reach a particular goal), as well as descriptive questions about the visible properties of objects. Further, I benchmark the performance of existing video question-answering models on two test settings of CRIPP-VQA: an i.i.d. setting and an out-of-distribution setting, which contains objects with values of mass, coefficient of friction, and initial velocity that are not seen in the training distribution. Experiments reveal a surprising and significant performance gap between answering questions about implicit properties (the focus of this thesis) and explicit properties (the focus of prior work) of objects.

Contributors: Patel, Maitreya Jitendra (Author) / Yang, Yezhou (Thesis advisor) / Baral, Chitta (Committee member) / Lee, Kookjin (Committee member) / Arizona State University (Publisher)

Created: 2022

Extracting Semantic Information from Online Conversations to Enhance Cyber Defense

Description

Recent advances in techniques allow the extraction of Cyber Threat Information (CTI) from online content, such as social media, blog articles, and posts in discussion forums. Most research work focuses on social media and blog posts, since their content is often contributed by cybersecurity experts and is usually in cleaner formats. While posts in online forums are noisier and less structured, online forums attract more users than other sources and contain much valuable information that may help predict cyber threats. Therefore, effectively extracting CTI from online forum posts is an important task in today's data-driven cybersecurity defenses. Many Natural Language Processing (NLP) techniques have been applied to the cybersecurity domain to extract useful information; however, there is still room for improvement. In this dissertation, a new Named Entity Recognition framework for cybersecurity domains and thread structure construction methods for unstructured forums are proposed to support the extraction of CTI. These are then extended to filter forum posts and eliminate topics unrelated to cybersecurity using the Cyber Attack Relevance Scale (CARS), to identify cybersecurity-knowledgeable users as a source of additional information, and to extract trending topic phrases related to cyber attacks in hacker forums as clues for predicting potential future attacks.

Contributors: Kashihara, Kazuaki (Author) / Baral, Chitta (Thesis advisor) / Doupe, Adam (Committee member) / Blanco, Eduardo (Committee member) / Wang, Ruoyu (Committee member) / Arizona State University (Publisher)

Created: 2022

Interpreting Answers to Yes-No Questions in Twitter

Description

Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and when answers include them, they should rarely be interpreted as the keywords suggest. This work presents a new corpus of 4,442 yes-no question-answer pairs from Twitter (Twitter-YN). The corpus includes question-answer instances from different temporal settings. These settings allow investigating whether having older tweets helps in understanding more contemporary tweets. Common linguistic features of answers meaning yes, answers meaning no, and answers whose interpretation remains unknown are also discussed. Experimental results show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media (F1: 0.59). In addition to English, this work presents a Hindi corpus of 3,409 yes-no questions and answers from Twitter (Twitter-YN-hi). Cross-lingual experiments are conducted using a distant supervision approach. The performance of multilingual large language models in interpreting indirect answers to yes-no questions in Hindi is observed to improve when Twitter-YN is blended with distantly supervised data.

Contributors: Mathur, Shivam (Author) / Blanco, Eduardo (Thesis advisor) / Baral, Chitta (Thesis advisor) / Choi, YooJung (Committee member) / Arizona State University (Publisher)

Created: 2023

Towards Understanding the Role of Knowledge in Improving Transformer-based Language Models

Description

In natural language processing, language models have achieved remarkable success over the last few years. Transformers are at the core of most of these models. Their success can be mainly attributed to the enormous amount of curated data they are trained on. Even though such language models are trained on massive curated data, they often need specific extracted knowledge to understand and reason better. This is because relevant knowledge is often implicit or missing, which hampers machine reasoning. Apart from that, manual knowledge curation is time-consuming and error-prone. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models. This leads to the search for ideal ways to utilize such knowledge by incorporating it into language models. Successful knowledge extraction and integration lead in turn to the important question of knowledge evaluation: developing tools or introducing challenging test suites to learn about the limitations of such models and improve them further. Understanding the role of knowledge therefore becomes important for improving transformer-based models. In the pursuit of improving language models with knowledge, in this dissertation I study three broad research directions spanning the natural language, biomedical, and cybersecurity domains: (1) Knowledge Extraction (KX) - How can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI) - How can such specific knowledge be used to improve such models? (3) Knowledge Evaluation (KE) - How can language models be evaluated for specific skills, and how can their limitations be understood? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models.
I develop ways to improve the language model’s multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets that assess the numerical reasoning skills of language models in both in-domain and out-of-domain settings.

Contributors: Pal, Kuntal Kumar (Author) / Baral, Chitta (Thesis advisor) / Wang, Ruoyu (Committee member) / Blanco, Eduardo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2023

Multimodal Fake News Detection via Single Tower Transformer

Description

With the rise in social media usage and rapid communication, the proliferation of misinformation and fake news has become a pressing concern. The detection of multimodal fake news requires careful consideration of both image and textual semantics with proper alignment of the embedding space. Automated fake news detection has gained significant attention in recent years. Existing research has focused on either capturing cross-modal inconsistency information or leveraging the complementary information within image-text pairs. However, the potential of powerful cross-modal contrastive learning methods and effective modality mixing remains an open question. The thesis proposes a novel two-leg single-tower architecture equipped with self-attention mechanisms and a custom contrastive loss to efficiently aggregate multimodal features. Furthermore, pretraining and fine-tuning are employed on the custom transformer model to classify fake news on the popular Twitter multimodal fake news dataset. The experimental results demonstrate the efficacy and robustness of the proposed approach, offering promising advancements in multimodal fake news detection research.

Contributors: Lakhanpal, Sanyam (Author) / Lee, Kookjin (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2023

Neuro-Symbolic AI Approaches to Enhance Deep Neural Networks with Logical Reasoning and Knowledge Integration

Description

One of the challenges in Artificial Intelligence (AI) is to integrate fast, automatic, and intuitive System-1 thinking with slow, deliberate, and logical System-2 thinking. While deep learning approaches excel at perception tasks for System-1, their reasoning capabilities for System-2 are limited. Besides, deep learning approaches are usually data-hungry, struggle to make use of explicit knowledge, and are difficult to interpret and justify. This dissertation presents three neuro-symbolic AI approaches that integrate neural networks (NNs) with symbolic AI methods to address these issues. The first approach presented in this dissertation is NeurASP, which combines NNs with Answer Set Programming (ASP), a logic programming formalism. NeurASP provides an effective way to integrate sub-symbolic and symbolic computation by treating NN outputs as probability distributions over atomic facts in ASP. The explicit knowledge encoded in ASP corrects mistakes in NN outputs and allows for better training with less data. To avoid NeurASP's bottleneck in symbolic computation, this dissertation presents Constraint Loss via Straight-Through Estimators (CL-STE). CL-STE provides a systematic way to compile discrete logical constraints into a loss function over discretized NN outputs and scales significantly better than state-of-the-art neuro-symbolic methods. This dissertation also presents a finding from applying CL-STE to Transformers: Transformers can be extended with recurrence to enhance their power for multi-step reasoning. Such a Recurrent Transformer can be applied straightforwardly to visual constraint reasoning problems while successfully addressing the symbol grounding problem.
Lastly, this dissertation addresses the limitation of pre-trained Large Language Models (LLMs) on multi-step logical reasoning problems with a dual-process neuro-symbolic reasoning system called LLM+ASP, where an LLM (e.g., GPT-3) serves as a highly effective few-shot semantic parser that turns natural language sentences into a logical form that can be used as input to ASP. LLM+ASP achieves state-of-the-art performance on several textual reasoning benchmarks and can handle robot planning tasks that an LLM alone fails to solve.

Contributors: Yang, Zhun (Author) / Lee, Joohyung (Thesis advisor) / Baral, Chitta (Committee member) / Li, Baoxin (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2023