Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence more closely resembles entries in a lexicon of entity names or text from an unlabeled corpus. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Investigations of the role of high-level cognitive skills in the text production process

Description

Writing is an intricate cognitive and social process that involves the production of texts for the purpose of conveying meaning to others. The importance of lower level cognitive skills and language knowledge during this text production process has been well documented in the literature. However, the role of higher level skills (e.g., metacognition and strategy use) has been less strongly emphasized. This thesis proposal examines higher level cognitive skills in the context of persuasive essay writing. Specifically, two published manuscripts are presented, both of which examine the role of higher level skills in the context of writing. The first manuscript investigates the role of metacognition in the writing process by examining the accuracy and characteristics of students' self-assessments of their essays. The second manuscript takes an individual differences approach and examines whether the higher level cognitive skills commonly associated with reading comprehension are also related to performance on writing tasks. Taken together, these manuscripts point towards a strong role of higher level skills in the writing process and provide a strong foundation on which to develop future research and educational interventions.

Contributors: Allen, Laura K (Author) / McNamara, Danielle S. (Thesis advisor) / Connor, Carol (Committee member) / Glenberg, Arthur (Committee member) / Arizona State University (Publisher)

Created: 2014

Examining the role of linguistic flexibility in the text production process

Description

A commonly held belief among educators, researchers, and students is that high-quality texts are easier to read than low-quality texts, as they contain more engaging narrative and story-like elements. Interestingly, these assumptions have typically failed to be supported by the writing literature. Although narrative elements may sometimes be associated with high-quality writing, the majority of research suggests that higher quality writing is associated with decreased levels of text narrativity and of readability measures in general. One potential explanation for this conflicting evidence lies in the situational influence of text elements on writing quality. In other words, it is possible that the frequency of specific linguistic or rhetorical text elements alone is not consistently indicative of essay quality. Rather, these effects may be largely driven by individual differences in students' ability to leverage the benefits of these elements in appropriate contexts. This dissertation presents the hypothesis that writing proficiency is associated with an individual's flexible use of text properties, rather than simply the consistent use of a particular set of properties. Across three experiments, this dissertation relies on a combination of natural language processing and dynamic methodologies to examine the role of linguistic flexibility in the text production process. Overall, the studies included in this dissertation provide important insights into the role of flexibility in writing skill and develop a strong foundation on which to conduct future research and educational interventions.

Contributors: Allen, Laura (Author) / McNamara, Danielle S. (Thesis advisor) / Glenberg, Arthur (Committee member) / Connor, Carol (Committee member) / Duran, Nicholas (Committee member) / Arizona State University (Publisher)

Created: 2017
