Matching Items (12,054)
- Creators: ASU Library. Music Library
Goal specification is an important aspect of designing autonomous agents. A goal does not only refer to the set of states for the agent to reach. A goal also defines restrictions on the paths the agent should follow. Temporal logics are widely used in goal specification. However, they lack the ability to represent goals in a non-deterministic domain, goals that change non-monotonically, and goals with preferences. This dissertation defines new goal specification languages by extending temporal logics to address these issues. First considered is the goal specification in non-deterministic domains, in which an agent following a policy leads to a set of paths. A logic is proposed to distinguish paths of the agent from all paths in the domain. In addition, to address the need of comparing policies for finding the best ones, a language capable of quantifying over policies is proposed. As policy structures of agents play an important role in goal specification, languages are also defined by considering different policy structures. Besides, after an agent is given an initial goal, the agent may change its expectations or the domain may change, thus goals that are previously specified may need to be further updated, revised, partially retracted, or even completely changed. Non-monotonic goal specification languages that can make these changes in an elaboration tolerant manner are needed. Two languages that rely on labeling sub-formulas and connecting multiple rules are developed to address non-monotonicity in goal specification. Also, agents may have preferential relations among sub-goals, and the preferential relations may change as agents achieve other sub-goals. By nesting a comparison operator with other temporal operators, a language with dynamic preferences is proposed. Various goals that cannot be expressed in other languages are expressed in the proposed languages. Finally, plans are given for some goals specified in the proposed languages.
Answering deep queries specified in natural language with respect to a frame based knowledge base and developing related natural language understanding components
Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana through every smart-phone and browser. However, most of the research in Question Answering aims at factual questions rather than deep ones such as ``How'' and ``Why'' questions.
In this dissertation, I suggest a different approach in tackling this problem. We believe that the answers of deep questions need to be formally defined before found.
Because these answers must be defined based on something, it is better to be more structural in natural language text; I define Knowledge Description Graphs (KDGs), a graphical structure containing information about events, entities, and classes. We then propose formulations and algorithms to construct KDGs from a frame-based knowledge base, define the answers of various ``How'' and ``Why'' questions with respect to KDGs, and suggest how to obtain the answers from KDGs using Answer Set Programming. Moreover, I discuss how to derive missing information in constructing KDGs when the knowledge base is under-specified and how to answer many factual question types with respect to the knowledge base.
After having the answers of various questions with respect to a knowledge base, I extend our research to use natural language text in specifying deep questions and knowledge base, generate natural language text from those specification. Toward these goals, I developed NL2KR, a system which helps in translating natural language to formal language. I show NL2KR's use in translating ``How'' and ``Why'' questions, and generating simple natural language sentences from natural language KDG specification. Finally, I discuss applications of the components I developed in Natural Language Understanding.
Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features
Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.