Matching Items (5)
Description
Instructional prompts are a novel technique that can significantly improve the performance of natural language processing tasks by specifying the task instruction to the language model. This is the first paper to use instructional prompts to improve the performance of question answering in the biomedical domain. This work makes two significant contributions. First, a dataset of 600K question-answer pairs has been developed from the medical textbook ‘Differential Diagnosis Primary Care’, which describes how to diagnose a patient by observing their disease symptoms. Second, a question answering language model augmented with instructional prompts has been developed by training on the medical information extracted from ‘Differential Diagnosis Primary Care’. Experiments demonstrate that it outperforms an otherwise identical question answering model that does not use instructional prompts. Instructional prompts build on prompt tuning and prefix tuning, novel techniques that train a language model for specific downstream tasks by keeping the majority of model parameters frozen and optimizing only a small number of continuous task-specific vectors (called the prefixes).
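To make the frozen-parameter idea concrete, here is a minimal PyTorch sketch of prefix tuning as the abstract describes it: the base language model stays frozen and only a small matrix of continuous prefix vectors is trained. The class name, shapes, and initialization are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class PrefixTunedModel(nn.Module):
    """Wraps a frozen base model; only the prefix vectors are trainable."""

    def __init__(self, base_model: nn.Module, prefix_len: int, hidden_dim: int):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False  # keep the language model frozen
        # The only trainable parameters: a small set of continuous
        # task-specific vectors (the "prefixes").
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned prefix to every sequence in the batch,
        # then run the frozen model over the combined sequence.
        batch_size = input_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch_size, -1, -1)
        return self.base_model(torch.cat([prefix, input_embeds], dim=1))
```

An optimizer built over `model.prefix` alone then updates only those vectors, which is what makes the approach cheap relative to full fine-tuning.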
Contributors: Saxena, Sharad (Author) / Baral, Chitta (Thesis advisor) / Blanco, Eduardo (Committee member) / Anwar, Saadat (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and when answers do include them, they are rarely to be interpreted as the keywords suggest. This work presents a new corpus of 4,442 yes-no question-answer pairs from Twitter (Twitter-YN). The corpus includes question-answer instances from different temporal settings, which allow investigating whether having older tweets helps in understanding more contemporary tweets. Common linguistic features of answers meaning yes, answers meaning no, and answers whose interpretation remains unknown are also discussed. Experimental results show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media (F1: 0.59). In addition to English, this work presents a Hindi corpus of 3,409 yes-no questions and answers from Twitter (Twitter-YN-hi). Cross-lingual experiments are conducted using a distant supervision approach. It is observed that the performance of multilingual large language models in interpreting indirect answers to yes-no questions in Hindi improves when Twitter-YN is blended with distantly supervised data.
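As a rough illustration of the distant-supervision idea mentioned above, one could weakly label answers that contain unambiguous yes or no cues and blend those instances with the annotated Twitter-YN data. The cue lists and function below are hypothetical stand-ins, not the thesis's actual heuristics.

```python
# Hypothetical keyword-based distant supervision for yes-no answers.
YES_CUES = {"yes", "yeah", "yep", "definitely"}  # illustrative cue words
NO_CUES = {"no", "nope", "never"}

def weak_label(answer: str) -> str | None:
    """Assign a weak yes/no label when exactly one cue set matches."""
    tokens = set(answer.lower().split())
    has_yes, has_no = bool(tokens & YES_CUES), bool(tokens & NO_CUES)
    if has_yes and not has_no:
        return "yes"
    if has_no and not has_yes:
        return "no"
    return None  # interpretation unknown; left to the supervised model
```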
Contributors: Mathur, Shivam (Author) / Blanco, Eduardo (Thesis advisor) / Baral, Chitta (Thesis advisor) / Choi, YooJung (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
In natural language processing, language models have achieved remarkable success over the last few years. Transformers are at the core of most of these models, and their success can mainly be attributed to the enormous amount of curated data they are trained on. Yet even language models trained on massive curated data often need specific extracted knowledge to understand and reason better, because relevant knowledge may be implicit or missing, which hampers machine reasoning. Moreover, manual knowledge curation is time-consuming and error-prone. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models, as is finding ideal ways to utilize that knowledge by incorporating it into language models. Successful knowledge extraction and integration in turn raise the question of evaluating such models, by developing tools or introducing challenging test suites that reveal their limitations and guide further improvement. To improve transformer-based models, then, understanding the role of knowledge becomes important. In pursuit of improving language models with knowledge, in this dissertation I study three broad research directions spanning the natural language, biomedical, and cybersecurity domains: (1) Knowledge Extraction (KX) - how can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI) - how can such specific knowledge be used to improve these models? (3) Knowledge Evaluation (KE) - how can language models be evaluated for specific skills so that their limitations are understood? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models. I develop ways to improve a language model's multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets that assess numerical reasoning skills in both in-domain and out-of-domain settings.
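As a concrete, if simplified, picture of the knowledge-integration (KI) direction, extracted knowledge can be serialized into the model input so the transformer conditions its reasoning on it. The tags and format below are assumptions for illustration only, not the dissertation's actual serialization.

```python
def build_knowledge_augmented_input(question: str, facts: list[str]) -> str:
    """Serialize extracted facts ahead of the question for a language model.

    The [FACT]/[QUESTION] tags are illustrative; the dissertation's
    actual format may differ.
    """
    fact_block = " ".join(f"[FACT] {fact}" for fact in facts)
    return f"{fact_block} [QUESTION] {question}"

# Example: supplying an otherwise-implicit fact to support reasoning.
prompt = build_knowledge_augmented_input(
    "Can a penguin fly?",
    ["Penguins are flightless birds."],
)
```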
Contributors: Pal, Kuntal Kumar (Author) / Baral, Chitta (Thesis advisor) / Wang, Ruoyu (Committee member) / Blanco, Eduardo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Code generation is a task that has seen rapid progress in Natural Language Processing (NLP) research. This thesis focuses on the text-to-Structured Query Language (SQL) task, where the input is a question about a specific database and the output is the SQL query that, when executed, returns the desired answer. Current text-to-SQL datasets are bottlenecked by the data creation process: the technical knowledge required to understand and write SQL makes crowd-sourcing a dataset expensive and time-consuming. As a result, existing datasets do not provide a robust enough training set for state-of-the-art semantic parsing models. This thesis outlines my technique for generating a text-to-SQL dataset using the Generative Pre-trained Transformer 3 model (GPT-3) and prompt engineering techniques. My approach entails providing GPT-3 with particular instructions to build a rigorous text-to-SQL dataset. I show that the created pairs have excellent quality and diversity and, when utilized as training data, can enhance the accuracy of SQL generation models. I expect this method will be of interest to NLP researchers because it can considerably reduce the time, effort, and cost necessary to produce large, high-quality text-to-SQL datasets. Furthermore, the approach can be extended to other tasks and domains to alleviate the burden of curating human-annotated data.
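A hedged sketch of the instruction-prompting recipe: the database schema plus an instruction are sent to an OpenAI model, which returns candidate question/SQL pairs. The prompt wording and model name are stand-ins rather than the thesis's exact setup, and the call assumes an `OPENAI_API_KEY` in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_text_to_sql_pairs(schema: str, n: int = 5) -> str:
    # Instruction prompt: illustrative wording, not the thesis's exact prompt.
    prompt = (
        f"Given the database schema below, write {n} diverse natural-language "
        "questions, each followed by the SQL query that answers it.\n\n"
        f"Schema:\n{schema}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the GPT-3 engine used in the thesis
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Generated pairs would still need validation, for example by executing each SQL query against the database and discarding pairs that fail, before being used as training data.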
Contributors: Kuznia, Kirby Charles (Author) / Baral, Chitta (Thesis advisor) / Blanco, Eduardo (Committee member) / Gopalan, Nakul (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
In the era of information explosion and multi-modal data, information retrieval (IR) and question answering (QA) systems have become essential in daily human activities. IR systems aim to find information relevant to user queries, while QA systems provide concise and accurate answers to user questions. IR and QA are two of the most crucial challenges in Artificial Intelligence (AI), with wide-ranging real-world applications such as search engines and dialogue systems. This dissertation investigates and develops novel models and training objectives to enhance current retrieval systems in textual and multi-modal contexts. Moreover, it examines QA systems, emphasizing generalization and robustness, and creates new benchmarks to promote their progress. Neural retrievers have surfaced as a viable solution, capable of surpassing the constraints of traditional term-matching search algorithms. This dissertation presents Poly-DPR, an innovative multi-vector model architecture for handling test queries, and ReViz, a comprehensive multimodal model for tackling multi-modality queries. By utilizing IR-focused pretraining tasks and producing large-scale training data, the proposed methodology substantially improves the abilities of existing neural retrievers. Concurrently, this dissertation investigates QA systems, referred to as "readers", by performing an exhaustive analysis of current extractive and generative readers, resulting in reliable guidance for selecting readers for downstream applications. Additionally, an original reader (Two-in-One) is designed to effectively choose the pertinent passages and sentences from a pool of candidates for multi-hop reasoning. This dissertation also acknowledges the significance of logical reasoning in real-world applications and develops a comprehensive testbed, LogiGLUE, to further advance the reasoning capabilities of QA systems.
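To illustrate the multi-vector idea behind a model like Poly-DPR in miniature: each passage is represented by several vectors, and a query scores against the best-matching one. The shapes and the max-pooling choice below are assumptions, not the dissertation's exact formulation.

```python
import torch

def multi_vector_score(query_vec: torch.Tensor,
                       passage_vecs: torch.Tensor) -> torch.Tensor:
    """Score one query against a passage's multiple representation vectors.

    query_vec: shape (dim,); passage_vecs: shape (num_vectors, dim).
    """
    sims = passage_vecs @ query_vec  # dot-product similarity per vector
    return sims.max()                # the best-matching vector wins
```

Compared with a single-vector retriever, this lets different vectors of one passage specialize in different aspects of its content, at the cost of a larger index.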
Contributors: Luo, Man (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Committee member) / Blanco, Eduardo (Committee member) / Chen, Danqi (Committee member) / Arizona State University (Publisher)
Created: 2023