Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by increasing biocuration throughput and making a wider curation scope feasible. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that further improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least three orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER that model the semantics of token sequences are introduced. The first method focuses on the sequence content, using language models to determine whether a sequence more closely resembles entries in a lexicon of entity names or text from an unlabeled corpus.
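The lexicon-versus-corpus comparison can be illustrated with two small language models and a log-likelihood ratio. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not name the model type, so character bigram models with add-one smoothing and a shared alphabet size are assumed here, the class and function names are hypothetical, and the toy training strings are invented.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character bigram language model with add-one (Laplace) smoothing.

    Assumption: every model shares the same fixed alphabet size so that
    log-probabilities from different models are comparable.
    """

    def __init__(self, texts, alphabet_size=256):
        self.alphabet_size = alphabet_size
        self.bigrams = Counter()
        self.contexts = Counter()  # how often each character starts a bigram
        for text in texts:
            padded = "^" + text.lower() + "$"  # sequence boundary markers
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, text):
        """Smoothed log-probability of `text` under this model."""
        padded = "^" + text.lower() + "$"
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.alphabet_size))
            for a, b in zip(padded, padded[1:])
        )


def lexicon_score(sequence, lexicon_lm, corpus_lm):
    """Log-likelihood ratio: positive means the sequence looks more like a lexicon entry."""
    return lexicon_lm.log_prob(sequence) - corpus_lm.log_prob(sequence)


# Hypothetical toy data: entity names versus ordinary sentence text.
lexicon_lm = CharBigramLM(["interleukin 2", "interleukin 6 receptor",
                           "tumor necrosis factor", "insulin receptor", "p53"])
corpus_lm = CharBigramLM(["the patients were treated with", "in this study we",
                          "results were observed in the",
                          "expression was significantly increased"])

entity_like = lexicon_score("interleukin 4", lexicon_lm, corpus_lm)
text_like = lexicon_score("in the study", lexicon_lm, corpus_lm)
print(entity_like > 0, text_like < 0)
```

In a tagger, a score like this would more plausibly serve as one feature among many than as a decision on its own.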
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
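The distributional comparison can be illustrated by representing each token sequence as counts of the words surrounding its occurrences in an unlabeled corpus, then comparing those vectors by cosine similarity. This is a hedged sketch: the window size, raw co-occurrence counts, cosine measure, and scoring by the best-matching training mention are assumptions for illustration, not necessarily the thesis's choices. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def context_vector(sequence, corpus_sentences, window=2):
    """Count words within `window` tokens left and right of each occurrence
    of `sequence` (a tuple of tokens) in the tokenized corpus."""
    n = len(sequence)
    counts = Counter()
    for tokens in corpus_sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == sequence:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                counts.update(left + right)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def best_training_similarity(candidate, training_mentions, corpus_sentences, window=2):
    """Highest similarity between the candidate's contexts and any training mention's contexts."""
    candidate_vec = context_vector(candidate, corpus_sentences, window)
    return max(
        (cosine(candidate_vec, context_vector(m, corpus_sentences, window))
         for m in training_mentions),
        default=0.0,
    )


# Hypothetical unlabeled corpus, whitespace-tokenized.
corpus = [s.split() for s in [
    "expression of BRCA1 protein was reduced",
    "expression of TP53 protein was increased",
    "mutations in BRCA1 gene cause cancer",
    "mutations in TP53 gene cause tumors",
    "patients in the study were treated",
    "the study was approved",
]]
training_mentions = [("BRCA1",)]

sim_entity = best_training_similarity(("TP53",), training_mentions, corpus)
sim_other = best_training_similarity(("the", "study"), training_mentions, corpus)
print(round(sim_entity, 3))  # 1.0
print(round(sim_other, 3))   # 0.289
```

Because TP53 occurs in exactly the same contexts as the training mention BRCA1, it scores as maximally similar, while "the study" shares only two context words.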

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

A compact disc recording of three commissioned works featuring the clarinet by Portuguese composers, which include Portuguese folk music elements

Description

Despite the wealth of folk music traditions in Portugal and the importance of the clarinet in the music of bandas filarmonicas, works featuring the clarinet that use Portuguese folk music elements are uncommon. To expand this repertoire, three new works were commissioned from three different composers: Seres Imaginarios 3 by Luis Cardoso, Delirio Barroco by Tiago Derrica, and Memória by Pedro Faria Gomes. To promote the inclusion of these works in the mainstream performance literature, the author has recorded them on compact disc. This document includes interview transcripts with each composer, providing first-person discussion of each composition, as well as detailed biographical information on each composer. To provide context, the author also includes a brief discussion of Portuguese folk music and, in particular, the role the clarinet plays in Portuguese folk music culture.

Contributors: Ferreira, Wesley (Contributor) / Spring, Robert S (Thesis advisor) / Bailey, Wayne (Committee member) / Gardner, Joshua (Committee member) / Hill, Gary (Committee member) / Schuring, Martin (Committee member) / Solis, Theodore (Committee member) / Arizona State University (Publisher)

Created: 2013

Shibboleth: an automated foreign accent identification program

Description

The speech of non-native (L2) speakers of a language contains phonological rules that differentiate them from native speakers. These phonological rules characterize or distinguish accents in an L2. The Shibboleth program creates combinatorial rule-sets to describe the phonological patterns of these accents and classifies L2 speakers by their native language. Training and classification in Shibboleth are performed by support vector machines using a Gaussian radial basis function kernel. In one experiment, Shibboleth correctly identified the native language (L1) of a speaker of unknown origin 42% of the time when there were six possible L1s into which to classify the speaker. This rate is significantly better than the 17% chance classification rate (χ²(1, N = 24) = 10.800, p = .0010). In a second experiment, Shibboleth was not able to determine the native language family of a speaker of unknown origin at a rate better than chance (33-44%) when the L1 was not in the transcripts used for training the language family rule-set (χ²(1, N = 18) = 1.000, p = .3173). The 318 participants for both experiments were from the Speech Accent Archive (Weinberger, 2013) and ranged in age from 17 to 80 years. Forty percent of the speakers were female and 60% were male. The factor that most influenced correct classification was a higher age of onset for the L2. A higher number of years spent living in an English-speaking country did not have the expected positive effect on classification.
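The first reported statistic can be checked as a Pearson chi-squared goodness-of-fit test of correct versus incorrect classifications against the chance rate. This sketch only verifies the arithmetic and is not Shibboleth's own code. It assumes that 42% of N = 24 corresponds to 10 correct identifications and that chance with six candidate L1s is exactly 1/6; the helper names are hypothetical. With one degree of freedom, the upper-tail p-value equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def chi_squared_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Assumption: 42% of 24 speakers = 10 correct; chance rate = 1/6 (about 17%).
n = 24
correct = 10
p_chance = 1 / 6
observed = [correct, n - correct]              # [10, 14]
expected = [n * p_chance, n * (1 - p_chance)]  # [4, 20]

stat = chi_squared_statistic(observed, expected)
p_value = chi2_upper_tail_df1(stat)
print(f"chi2(1, N={n}) = {stat:.3f}, p = {p_value:.4f}")
# chi2(1, N=24) = 10.800, p = 0.0010
```

The reconstructed counts reproduce both the reported statistic and the reported p-value. This supports reading the test as one over two categories (correct/incorrect), which is why it has one degree of freedom despite six candidate L1s.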

Contributors: Frost, Wende (Author) / Gelderen, Elly van (Thesis advisor) / Perzanowski, Dennis (Committee member) / Gee, Elisabeth (Committee member) / Arizona State University (Publisher)

Created: 2013

Investigations of the role of high-level cognitive skills in the text production process

Description

Writing is an intricate cognitive and social process that involves producing texts to convey meaning to others. The importance of lower-level cognitive skills and language knowledge during this text production process has been well documented in the literature. However, the role of higher-level skills (e.g., metacognition and strategy use) has received less emphasis. This thesis proposal examines higher-level cognitive skills in the context of persuasive essay writing. Specifically, it presents two published manuscripts, both of which examine the role of higher-level skills in writing. The first manuscript investigates the role of metacognition in the writing process by examining the accuracy and characteristics of students' self-assessments of their essays. The second manuscript takes an individual differences approach and examines whether the higher-level cognitive skills commonly associated with reading comprehension are also related to performance on writing tasks. Taken together, these manuscripts point toward a strong role for higher-level skills in the writing process and provide a solid foundation for future research and educational interventions.

Contributors: Allen, Laura K (Author) / McNamara, Danielle S. (Thesis advisor) / Connor, Carol (Committee member) / Glenberg, Arthur (Committee member) / Arizona State University (Publisher)

Created: 2014

Charlotte Burton, clarinet

Contributors: Burton, Charlotte (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-04-08

Elizabeth Druesedow, clarinet

Contributors: Druesedow, Elizabeth (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-04-07

A recording and performance guide for three new works featuring clarinet and electronics, clarinet and piano, and clarinet, bass clarinet, and piano

Description

This project includes a recording and performance guide for three newly commissioned pieces for the clarinet. The first piece, shimmer, was written by Grant Jahn and is for B-flat clarinet and electronics. The second piece, Paragon, is for B-flat clarinet and piano and was composed by Dr. Theresa Martin. The third and final piece, Duality in the Eye of a Bovine, was written by Kurt Mehlenbacher and is for B-flat clarinet, bass clarinet, and piano. In addition to the performance guide, this document also includes background information and program notes for the compositions, as well as composer biographical information, a list of other works featuring the clarinet by each composer, and transcripts of composer and performer interviews. This document is accompanied by a recording of the three pieces.

Contributors: Poupard, Caitlin Marie (Author) / Spring, Robert (Thesis advisor) / Gardner, Joshua (Thesis advisor) / Hill, Gary (Committee member) / Oldani, Robert (Committee member) / Schuring, Martin (Committee member) / Arizona State University (Publisher)

Created: 2016

A recording project featuring four newly commissioned pieces for clarinet

Description

The primary objective of this research project is to expand the clarinet repertoire with four new pieces. Each of these pieces uses contemporary clarinet techniques, including electronics, prerecorded sounds, multiphonics, circular breathing, multiple articulation, demi-clarinet, and the clari-flute. The commissioned repertoire comprises Grant Jahn’s Duo for Two Clarinets, Reggie Berg’s Funkalicious for Clarinet and Piano, Rusty Banks’ Star Juice for Clarinet and Fixed Media, and Chris Malloy’s A Celestial Breath for Clarinet and Electronics. In addition to the musical commissions, this project includes interviews in which the composers discuss how they wrote these works and what influenced them, along with information pertinent to the performer, professional recordings of each piece, and performance notes and suggestions.

Contributors: Case-Ruchala, Celeste Ann (Contributor) / Gardner, Joshua (Thesis advisor) / Spring, Robert (Thesis advisor) / Hill, Gary (Committee member) / Rogers, Rodney (Committee member) / Schuring, Martin (Committee member) / Arizona State University (Publisher)

Created: 2016

Katrina Clements, clarinet

Contributors: Clements, Katrina (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-03-15

Mining signed social networks using unsupervised learning algorithms

Description

Due to the vast resources brought by social media services, social data mining has received increasing attention in recent years. The sheer amount of available user-generated data presents data scientists with both opportunities and challenges. Opportunities come from additional data sources: the abundant link information in social networks could provide another rich source for deriving implicit information for social data mining. However, the vast majority of existing studies focus overwhelmingly on positive links between users, while negative links, such as distrust relations in Epinions and foe links in Slashdot, are also prevalent in real-world social networks. Though recent studies show that negative links have some added value over positive links, it is difficult to employ them directly because their characteristics are distinct from those of positive interactions. Another challenge is that label information is rather limited in social media, as the labeling process requires human attention and may be very expensive. Hence, alternative criteria are needed to guide the learning process for many tasks such as feature selection and sentiment analysis.

To address the above-mentioned issues, I study two novel problems for signed social network mining: (1) unsupervised feature selection in signed social networks; and (2) unsupervised sentiment analysis with signed social networks. To tackle the first problem, I propose a novel unsupervised feature selection framework, SignedFS. In particular, I model positive and negative links simultaneously for user preference learning, and then embed the user preference learning into feature selection. To study the second problem, I incorporate explicit sentiment signals in textual terms and implicit sentiment signals from signed social networks into a coherent model, SignedSenti. Empirical experiments on real-world datasets corroborate the effectiveness of these two frameworks on the tasks of feature selection and sentiment analysis.
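The abstract does not give SignedSenti's formulation, so the following is not that model. It is a hypothetical, label-free illustration of how explicit sentiment signals from text and implicit signals from signed links can be combined. It assumes that positively linked users tend to share sentiment and negatively linked users tend to oppose it. Each round mixes a user's text-derived score with the sign-weighted mean of the user's neighbors' scores. The function name, the mixing weight `alpha`, and the toy users are all assumptions.

```python
from collections import defaultdict


def propagate_sentiment(text_scores, signed_links, alpha=0.5, iterations=50):
    """Hypothetical sign-aware smoothing of user sentiment (not SignedSenti).

    text_scores: user -> explicit sentiment in [-1, 1] from the user's own text
        (0.0 when the text gives no signal); every linked user must appear here.
    signed_links: (u, v, sign) triples with sign +1 for a positive link
        (e.g. trust) and -1 for a negative link (e.g. distrust), treated as undirected.
    """
    neighbors = defaultdict(list)
    for u, v, sign in signed_links:
        neighbors[u].append((v, sign))
        neighbors[v].append((u, sign))

    scores = dict(text_scores)
    for _ in range(iterations):
        updated = {}
        for user, explicit in text_scores.items():
            nbrs = neighbors.get(user)
            if nbrs:
                # Friends pull toward agreement, foes toward the opposite sentiment.
                implicit = sum(sign * scores[v] for v, sign in nbrs) / len(nbrs)
                updated[user] = alpha * explicit + (1 - alpha) * implicit
            else:
                updated[user] = explicit
        scores = updated
    return scores


# Toy example: only alice's text carries sentiment; bob trusts her, carol distrusts her.
text_scores = {"alice": 1.0, "bob": 0.0, "carol": 0.0}
links = [("alice", "bob", +1), ("alice", "carol", -1)]
result = propagate_sentiment(text_scores, links)
print({u: round(s, 3) for u, s in result.items()})
# {'alice': 0.667, 'bob': 0.333, 'carol': -0.333}
```

Because each update is a convex combination of bounded values with weight `1 - alpha` on the neighbors, the iteration is a contraction and settles to a fixed point. The negative link is what lets carol receive an opposing score without any label of her own.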

Contributors: Cheng, Kewei (Author) / Liu, Huan (Thesis advisor) / Tong, Hanghang (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)

Created: 2017