Semantic feature extraction for narrative analysis

Ceran, Saadet Betul

A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)''. I present novel sets of features to facilitate story detection among text via supervised classification and further reveal different forms within stories via unsupervised clustering. First,…

A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)''. I present novel sets of features to facilitate story detection among text via supervised classification and further reveal different forms within stories via unsupervised clustering. First, I investigate the utility of a new set of semantic features compared to standard keyword features combined with statistical features, such as density of part-of-speech (POS) tags and named entities, to develop a story classifier. The proposed semantic features are based on triplets that can be extracted using a shallow parser. Experimental results show that a model of memory-based semantic linguistic features alongside statistical features achieves better accuracy. Next, I further improve the performance of story detection with a novel algorithm which aggregates the triplets producing generalized concepts and relations. A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. The algorithm clusters triplets into generalized concepts by utilizing syntactic criteria based on common contexts and semantic corpus-based statistical criteria based on "contextual synonyms''. Generalized concepts representation of text (1) overcomes surface level differences (which arise when different keywords are used for related concepts) without drift, (2) leads to a higher-level semantic network representation of related stories, and (3) when used as features, they yield a significant (36%) boost in performance for the story detection task. Finally, I implement co-clustering based on generalized concepts/relations to automatically detect story forms. Overlapping generalized concepts and relationships correspond to archetypes/targets and actions that characterize story forms. I perform co-clustering of stories using standard unigrams/bigrams and generalized concepts. I show that the residual error of factorization with concept-based features is significantly lower than the error with standard keyword-based features. I also present qualitative evaluations by a subject matter expert, which suggest that concept-based features yield more coherent, distinctive and interesting story forms compared to those produced by using standard keyword-based features.

Copyright Statement

Reuse Permissions

Downloads

pdf (1.9 MB)

Details

Title

Semantic feature extraction for narrative analysis

Contributors

Ceran, Saadet Betul (Author)
Davulcu, Hasan (Thesis advisor)
Corman, Steven R. (Committee member)
Shakarian, Paulo (Committee member)
Ye, Jieping (Committee member)
Arizona State University (Publisher)

Date Created

2016

Subjects

Resource Type

Text

Collections this item is in

ASU Electronic Theses and Dissertations

Note

Partial requirement for: Ph.D., Arizona State University, 2016

Note type

thesis
Includes bibliographical references (pages 62-66)

Note type

bibliography
Field of study: Computer science

Semantic feature extraction for narrative analysis

Details

Citation and reuse

Statement of Responsibility

Machine-readable links