Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. 
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
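The first of the two semantic methods above compares how closely a token sequence resembles lexicon entries versus ordinary text. A minimal sketch of that idea follows, assuming character-bigram language models with add-one smoothing; the abstract does not state which kind of language model the thesis uses, and the class name, function name, and toy data here are all hypothetical.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character-bigram language model with add-one smoothing (an assumed model type)."""

    def __init__(self, texts):
        self.bigrams = Counter()   # counts of (previous char, next char)
        self.contexts = Counter()  # counts of previous char
        self.vocab = set()
        for text in texts:
            padded = "^" + text.lower() + "$"  # mark start and end of the sequence
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def avg_log_prob(self, text, vocab_size):
        """Average per-transition log-probability of `text` under this model."""
        padded = "^" + text.lower() + "$"
        total = 0.0
        for a, b in zip(padded, padded[1:]):
            numerator = self.bigrams[(a, b)] + 1
            denominator = self.contexts[a] + vocab_size
            total += math.log(numerator / denominator)
        return total / (len(padded) - 1)


def lexicon_score(sequence, lexicon_lm, background_lm):
    """Positive when the sequence looks more like a lexicon entry than background text."""
    # Shared vocabulary size, plus one slot for characters unseen in either model.
    vocab_size = len(lexicon_lm.vocab | background_lm.vocab) + 1
    return (lexicon_lm.avg_log_prob(sequence, vocab_size)
            - background_lm.avg_log_prob(sequence, vocab_size))


# Hypothetical toy data: a tiny gene-name lexicon and a few background sentences.
lexicon_lm = CharBigramLM(["brca1", "tp53", "egfr", "kras", "brca2"])
background_lm = CharBigramLM([
    "the patient was treated",
    "results were reported in the study",
    "we observed the effect",
])
```

In a setup like this, the sign or magnitude of `lexicon_score` could be supplied to the NER model as one additional feature for a candidate token sequence.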

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Robust implementation of NL2KR system and it's application in iRODS domain

Description

Currently, to interact with computer-based systems, one needs to learn the specific interface language of each system. In most cases, interaction would be much easier if it could be done in natural language. For that, we need a module that understands natural language and automatically translates it to the interface language of the system. The NL2KR (Natural Language to Knowledge Representation) v.1 system is a prototype of such a module. It is a learning-based system that learns new meanings of words in terms of lambda-calculus formulas, given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, the Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm that uses these components to learn new meanings of words. Similarly, we redeveloped the system's built-in parser in Answer Set Programming (ASP) and also integrated an external parser with the system. Apart from this, we added new features, such as various system configurations and a memory cache, to the learning component of the NL2KR system. These enhancements helped in learning more meanings of words, boosted the performance of the system by reducing the computation time by a factor of 8, and improved the usability of the system. We evaluated the NL2KR system on the iRODS domain. iRODS is a rule-oriented data system that helps in managing large sets of computer files using policies. This system provides a rule-oriented interface language whose syntactic structure is like that of a procedural programming language (e.g., C). However, direct translation of natural language (NL) to this interface language is difficult. So, for automatic translation of NL to this language, we define a simple Intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which can then be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to the IPDL language. This corpus is then used for the evaluation of the NL2KR system. We performed 10-fold cross-validation on the system. Furthermore, using this corpus, we illustrate how different components of our NL2KR system work.
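The 10-fold cross-validation over the 100-statement corpus can be sketched as follows. This is only an illustration of the general procedure: the abstract does not say how the corpus was partitioned, so the shuffling, the fixed seed, and the function name `k_fold_splits` are assumptions.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: a random partition with a fixed seed
    folds = [indices[i::k] for i in range(k)]  # k folds of (nearly) equal size
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# With 100 policy statements and k=10, each round trains on 90 and tests on 10.
for train, test in k_fold_splits(100, k=10):
    pass  # learn the lexicon on `train`, then translate the statements in `test`
```

Every statement is held out exactly once, so the reported accuracy reflects translations of sentences the system did not see during lexicon learning.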

Contributors: Kumbhare, Kanchan Ravishankar (Author) / Baral, Chitta (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2013

Shi Ge, piano

Contributors: Shi, Ge (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-03-25

An intelligent co-reference resolver for Winograd schema sentences containing resolved semantic entities

Description

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge has been suggested as an alternative to the Turing test. It requires inferencing capabilities, reasoning abilities and background knowledge to get the answer right. It involves a coreference resolution task in which a machine is given a sentence describing a situation that involves two entities, one pronoun and some further information about the situation, and the machine has to resolve the pronoun to the correct entity. The complexity of the task is increased by the fact that Winograd sentences are not constrained to one domain or a specific sentence structure, and they also contain a lot of human proper names. This makes it difficult to associate the entities with one particular word in the sentence in order to derive the answer. I have developed a pronoun resolver system for Winograd sentences in a confined domain. I have developed a classifier, or filter, which takes input sentences and decides to accept or reject them based on particular criteria. Once a sentence is accepted, I run parsers on it to obtain a detailed analysis. Furthermore, I have developed four answering modules which use world knowledge and inferencing mechanisms to try to resolve the pronoun. The four techniques I use are: the ConceptNet knowledge base, search engine pattern counts, narrative event chains and sentiment analysis. I have developed a particular aggregation mechanism for the answers from these modules to arrive at a final answer. I have used a caching technique for the association relations that I obtain for the different modules, so as to boost performance. I run my system on the standard ‘nyu dataset’ of Winograd sentences and questions. This dataset is then restricted, by my classifier, to 90 sentences. I evaluate my system on this 90-sentence dataset. When I compare my results against the state-of-the-art system on the same dataset, I get nearly 4.5% improvement in the restricted domain.
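The abstract does not describe how the answers of the four modules are combined. As a stand-in, the sketch below uses a weighted vote in which a module may abstain; the function name, the module names, the weights, and the example candidates are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def aggregate_answers(module_answers, weights=None):
    """Combine per-module answers by weighted vote (an assumed aggregation scheme).

    module_answers maps a module name to its chosen candidate, or None to abstain.
    weights maps a module name to its vote weight; missing modules default to 1.0.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for module, answer in module_answers.items():
        if answer is None:  # the module could not resolve the pronoun
            continue
        scores[answer] += weights.get(module, 1.0)
    if not scores:
        return None  # every module abstained
    # Iterate candidates in sorted order so that ties are broken deterministically.
    return max(sorted(scores), key=lambda candidate: scores[candidate])


# Hypothetical outputs for a sentence with the candidates "trophy" and "suitcase".
answers = {
    "conceptnet": "trophy",
    "search_counts": "suitcase",
    "event_chains": "trophy",
    "sentiment": None,
}
final_answer = aggregate_answers(answers)
```

Per-module weights could be tuned on held-out sentences if some modules prove more reliable than others.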

Contributors: Budukh, Tejas Ulhas (Author) / Baral, Chitta (Thesis advisor) / VanLehn, Kurt (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2013

Kristina Shatuho, piano

Contributors: Shatuho, Kristina (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-03-27

Solving Winograd Schema Challenge: using semantic parsing, automatic knowledge acquisition and logical reasoning

Description

The Turing test has been a benchmark for measuring human-level intelligence in computers since it was proposed by Alan Turing in 1950. Over the last 60 years, however, applications such as ELIZA, PARRY, Cleverbot and Eugene Goostman have claimed to pass the test. These applications are either based on tricks to fool humans in a textual, chat-based test, or there has been disagreement within the AI community about whether they passed the test. This has led to the school of thought that it might not be the ideal test for predicting human-level intelligence in machines.

Consequently, the Winograd Schema Challenge has been suggested as an alternative to the Turing test. As opposed to judging intelligent behavior with the help of chat servers, as was done in the Turing test, the Winograd Schema Challenge is a question answering test. It consists of sentence and question pairs such that the answer to the question depends on the resolution of a definite pronoun or adjective in the sentence. The answers are fairly intuitive for humans, but they are difficult for machines because they require some sort of background or commonsense knowledge about the sentence.

In this thesis, I propose a novel technique to solve the Winograd Schema Challenge. The technique has three basic modules at its disposal, namely, a Semantic Parser that parses the English text (both sentences and questions) into a formal representation, an Automatic Background Knowledge Extractor that extracts the Background Knowledge pertaining to the given Winograd sentence, and an Answer Set Programming Reasoning Engine that reasons on the given Winograd sentence and the corresponding Background Knowledge. The applicability of the technique is illustrated by solving a subset of the Winograd Schema Challenge pertaining to a certain type of Background Knowledge. The technique is evaluated on the subset and a notable accuracy is achieved.

Contributors: Sharma, Arpita (Author) / Baral, Chitta (Thesis advisor) / Lee, Joohyung (Committee member) / Pon-Barry, Heather (Committee member) / Arizona State University (Publisher)

Created: 2014

Daniel Carlisi, piano

Contributors: Carlisi, Daniel (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-04-07

The incorporation of Greek folk melodies in the piano works of Yannis Constantinidis: with special consideration of the 22 songs and dances from the Dodecanese

Description

Yannis Constantinidis was the last of the handful of composers referred to collectively as the Greek National School. The members of this group strove to create a distinctive national style for Greece, founded upon a synthesis of Western compositional idioms with melodic, rhythmic, and modal features of their local folk traditions. Constantinidis particularly looked to the folk melodies of his native Asia Minor and the nearby Dodecanese Islands. His musical output includes operettas, musical comedies, orchestral works, chamber and vocal music, and much piano music, all of which draws upon folk repertories for thematic material. The present essay examines how he incorporates this thematic material in his piano compositions, written between 1943 and 1971, with a special focus on the 22 Songs and Dances from the Dodecanese. In general, Constantinidis's pianistic style is expressed through miniature pieces in which the folk tunes are presented mostly intact, but embedded in accompaniment based in early twentieth-century modal harmony. Following the dictates of the founding members of the Greek National School, Manolis Kalomiris and Georgios Lambelet, the modal basis of his harmonic vocabulary is firmly rooted in the characteristics of the most common modes of Greek folk music. A close study of his 22 Songs and Dances from the Dodecanese not only offers a valuable insight into his harmonic imagination, but also demonstrates how he subtly adapts his source melodies. This work also reveals his care in creating a musical expression of the words of the original folk songs, even in purely instrumental composition.

Contributors: Savvidou, Dina (Author) / Hamilton, Robert (Thesis advisor) / Little, Bliss (Committee member) / Meir, Baruch (Committee member) / Thompson, Janice M (Committee member) / Arizona State University (Publisher)

Created: 2011

Six Chinese piano pieces of the twentieth century: a recording project

Description

This paper describes six representative works by twentieth-century Chinese composers: Jian-Zhong Wang, Er-Yao Lin, Yi-Qiang Sun, Pei-Xun Chen, Ying-Hai Li, and Yi Chen, which are recorded by the author on the CD. The six pieces selected for the CD all exemplify traits of Nationalism, with or without Western influences. Of the six works on the CD, two are transcriptions of the Han Chinese folk-like songs, one is a composition in the style of the Uyghur folk music, two are transcriptions of traditional Chinese instrumental music dating back to the eighteenth century, and one is an original composition in a contemporary style using folk materials. Two of the composers, who studied in the United States, were strongly influenced by Western compositional style. The other four, who did not study abroad, retained traditional Chinese style in their compositions. The pianistic level of difficulty in these six pieces varies from intermediate to advanced level. This paper includes biographical information for the six composers, background information on the compositions, and a brief analysis of each work. The author was exposed to these six pieces growing up, always believing that they are beautiful and deserve to be appreciated. When the author came to the United States for her studies, she realized that Chinese compositions, including these six pieces, were not sufficiently known to her peers. This recording and paper are offered in the hopes of promoting a wider familiarity with Chinese music and culture.

Contributors: Luo, Yali, D.M.A (Author) / Hamilton, Robert (Thesis advisor) / Campbell, Andrew (Committee member) / Pagano, Caio (Committee member) / Cosand, Walter (Committee member) / Rogers, Rodney (Committee member) / Arizona State University (Publisher)

Created: 2012

Survey of selected contemporary Taiwanese female composers of music for solo piano

Description

The purpose of this project was to examine the lives and solo piano works of four members of the early generation of female composers in Taiwan. These four women were born between 1950 and 1960, began to appear on the Taiwanese musical scene after 1980, and were still active as composers at the time of this study. They include Fan-Ling Su (b. 1955), Hwei-Lee Chang (b. 1956), Shyh-Ji Pan-Chew (b. 1957), and Kwang-I Ying (b. 1960). Detailed biographical information on the four composers is presented and discussed. In addition, the musical form and features of all solo piano works at all levels by the four composers are analyzed, and the musical characteristics of each composer's work are discussed. The biography of a fifth composer, Wei-Ho Dai (b. 1950), is also discussed but is placed in the Appendices because her piano music could not be located. This research paper is presented in six chapters: (1) Prologue; the life and music of (2) Fan-Ling Su, (3) Hwei-Lee Chang, (4) Shyh-Ji Pan-Chew, and (5) Kwang-I Ying; and (6) Conclusion. The Prologue provides an overview of the development of Western classical music in Taiwan, a review of extant literature on the selected composers and their music, and the development of piano music in Taiwan. The Conclusion is comprised of comparisons of the four composers' music, including their personal interests and preferences as exhibited in their music. For example, all of the composers have used atonality in their music. Two of the composers, Fan-Ling Su and Kwang-I Ying, openly apply Chinese elements in their piano works, while Hwei-Lee Chang tries to avoid direct use of the Chinese pentatonic scale. The piano works of Hwei-Lee Chang and Shyh-Ji Pan-Chew are chromatic and atonal, and show an economical usage of material. Biographical information on Wei-Ho Dai and an overview of Taiwanese history are presented in the Appendices.

Contributors: Wang, Jinding (Author) / Pagano, Caio (Thesis advisor) / Campbell, Andrew (Committee member) / Humphreys, Jere T. (Committee member) / Meyer-Thompson, Janice (Committee member) / Norton, Kay (Committee member) / Arizona State University (Publisher)

Created: 2011
