Search Content

Matching Items (2)

Filtering by

All Subjects: Linguistics
Genre: Doctoral Dissertation

Shibboleth: an automated foreign accent identification program

Description

The speech of non-native (L2) speakers of a language contains phonological rules that differentiate them from native speakers. These phonological rules characterize or distinguish accents in an L2. The Shibboleth program creates combinatorial rule-sets to describe the phonological pattern of these accents and classifies L2 speakers into their native language. The training and classification is done in Shibboleth by support vector machines using a Gaussian radial basis kernel. In one experiment run using Shibboleth, the program correctly identified the native language (L1) of a speaker of unknown origin 42% of the time when there were six possible L1s in which to classify the speaker. This rate is significantly better than the 17% chance classification rate. Chi-squared test (1, N=24) =10.800, p=.0010 In a second experiment, Shibboleth was not able to determine the native language family of a speaker of unknown origin at a rate better than chance (33-44%) when the L1 was not in the transcripts used for training the language family rule-set. Chi-squared test (1, N=18) =1.000, p=.3173 The 318 participants for both experiments were from the Speech Accent Archive (Weinberger, 2013), and ranged in age from 17 to 80 years old. Forty percent of the speakers were female and 60% were male. The factor that most influenced correct classification was higher age of onset for the L2. A higher number of years spent living in an English-speaking country did not have the expected positive effect on classification.

ContributorsFrost, Wende (Author) / Gelderen, Elly van (Thesis advisor) / Perzanowski, Dennis (Committee member) / Gee, Elisabeth (Committee member) / Arizona State University (Publisher)

Created2013

Who's blogging now?: linguistic features and authorship analysis in sports blogs

Description

The field of authorship determination, previously largely falling under the umbrella of literary analysis but recently becoming a large subfield of forensic linguistics, has grown substantially over the last two decades. As its body of research and its record of successful forensic application continue to grow, this growth is paralleled by the demand for its application. However, methods which have undergone rigorous testing to show their reliability and replicability, allowing them to meet the strict Daubert criteria put forth by the US court system, have not truly been established.

In this study, I set out to investigate how a list of parameters, many commonly used in the methodologies of previous researchers, would perform when used to test documents of bloggers from a sports blog, Winging It in Motown. Three prolific bloggers were chosen from the site, and a corpus of posts was created for each blogger which was then examined for each of the chosen parameters. One test document for each of the three bloggers which was not included in that blogger’s corpus was then chosen from the blog page, and these documents were examined for each of the parameters via the same methodologies as were used to examine the corpora. Once data for the corpora and all three test documents was obtained, the results were compared for similarity, and an author determination was made for each test document along each parameter.

The findings indicated that overall the parameters were quite unsuccessful in determining authorship for these test documents based on the author corpora developed for the study. Only two parameters successfully identified the authors of the test documents at a rate higher than chance, and the possibility exists that other factors may be driving these successful identifications, demanding further research to confirm their validity as parameters for the purpose of authorship work.

ContributorsCox, Taylor (Author) / Gelderen, Elly van (Thesis advisor) / Gillon, Carrie (Committee member) / Gee, Elisabeth (Committee member) / Arizona State University (Publisher)

Created2017