Alternative methods via random forest to identify interactions in a general framework and variable importance in the context of value-added models

Description

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures can be used within, but are not restricted to, the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.

Contributors: Valdivia, Arturo (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Broatch, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2013

The difference in attributions of success and failure, out-of-class engagement, and predictions of future success of middle school band students in open and closed composition tasks

Description

The purpose of this study was to compare perceptions of success and failure, attributions of success and failure, predictions of future success, and reports of out-of-class engagement in composition among middle school band students composing in open task conditions (n = 32) and closed task conditions (n = 31). Two intact band classes at the same middle school were randomly assigned to treatment groups. Both treatment groups composed music once a week for eight weeks during their regular band time. In Treatment A (n = 32), the open task group, students were told to compose music however they wished. In Treatment B (n = 31), the closed task group, students were given specific, structured composition assignments to complete each week. At the end of each session, students were asked to complete a Composing Diary in which they reported what they did each week. Their responses were coded for evidence of perceptions of success and failure as well as out-of-class engagement in composing. At the end of eight weeks, students were given three additional measures: the Music Attributions Survey to measure attributions of success and failure on 11 different subscales; the Future Success survey to measure students' predictions of future success; and the Out-of-Class Engagement Letter to measure students' engagement with composition outside of the classroom. Results indicated that students in the open task group and students in the closed task group behaved similarly. There were no significant differences between treatment groups in terms of perceptions of success or failure as composers, predictions of future success composing music, and reports of out-of-class engagement in composition. Students who felt they failed at composing made similar attributions for their failure in both treatment groups. Students who felt they succeeded also made similar attributions for their success in both treatment groups, with one exception: successful students in the closed task group rated Peer Influence significantly higher than successful students in the open task group. The findings of this study suggest that understanding individual students' attributions and offering a variety of composing tasks as part of music curricula may help educators meet students' needs.

Contributors: Schwartz, Emily, 1985- (Author) / Stauffer, Sandra L (Thesis advisor) / Tobias, Evan (Committee member) / Schmidt, Margaret (Committee member) / Broatch, Jennifer (Committee member) / Sullivan, Jill (Committee member) / Arizona State University (Publisher)

Created: 2014

Three essays on correlated binary outcomes: detection and appropriate models

Description

Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types of association. The first paper introduces an ANOVA-based measure of intraclass correlation for three-level hierarchical data with binary outcomes, and corresponding properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model. This measure is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model utilizes valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. Thus, this approach takes into account both the correlation between the multivariate outcomes and the correlation due to time-dependency in longitudinal studies. The model utilizes an expanded weight matrix and objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to simultaneously study drivers of outcomes including smoking, social alcohol usage, and obesity in children.

Contributors: Irimata, Kyle (Author) / Wilson, Jeffrey R (Thesis advisor) / Broatch, Jennifer (Committee member) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2018

Applications of Deep Neural Networks to Neurocognitive Poetics: A Quantitative Study of the Project Gutenberg English Poetry Corpus

Description

With the advent of sophisticated computer technology, we increasingly see the use of computational techniques in the study of problems from a variety of disciplines, including the humanities. In a field such as poetry, where classic works are subject to frequent re-analysis over the course of years, decades, or even centuries, there is a certain demand for fresh approaches to familiar tasks, and such breaks from convention may even be necessary for the advancement of the field. Existing quantitative studies of poetry have employed computational techniques in their analyses; however, there remains work to be done with regard to the deployment of deep neural networks on large corpora of poetry to classify portions of the works contained therein based on certain features. While applications of neural networks to social media sites, consumer reviews, and other web-originated data are common within computational linguistics and natural language processing, comparatively little work has been done on the computational analysis of poetry using the same techniques. In this work, I begin to lay out the first steps for the study of poetry using neural networks. Using a convolutional neural network to classify author birth date, I was able not only to extract a non-trivial signal from the data, but also to identify the presence of clustering within by-author model accuracy. While definitive conclusions about the cause of this clustering were not reached, investigation of this clustering reveals immense heterogeneity in the traits of accurately classified authors. Further study may unpack this clustering and reveal key insights about how temporal information is encoded in poetry. The study of poetry using neural networks remains very open but exhibits potential to be an interesting and deep area of work.

Contributors: Goodloe, Oscar Laurence (Author) / Nishimura, Joel (Thesis director) / Broatch, Jennifer (Committee member) / School of Mathematical and Natural Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Classification for Conservation: A Random Forest Model to Predict Threatened Marine Species

Description

As threats to Earth's biodiversity continue to evolve, an effective methodology to predict such threats is crucial to ensure the survival of living species. Organizations like the International Union for Conservation of Nature (IUCN) monitor the Earth's environmental networks to preserve the sanctity of terrestrial and marine life. The IUCN Red List of Threatened Species informs the conservation activities of governments as a world standard of species' risks of extinction. However, the IUCN's current methodology is, in some ways, inefficient given the immense volume of Earth's species and the laboriousness of its species' risk classification process. IUCN assessors can take years to classify a species' extinction risk, even as that species continues to decline. Therefore, to supplement the IUCN's classification process and thus bolster conservationist efforts for threatened species, a Random Forest model was constructed, trained on a group of fish species previously classified by the IUCN Red List. This Random Forest model both validates the IUCN Red List's classification method and offers a highly efficient, supplemental classification method for species' extinction risk. In addition, this Random Forest model is applicable to species with deficient data, which the IUCN Red List is otherwise unable to classify, thus engendering conservationist efforts for previously obscure species. Although this Random Forest model is built specifically for the trained fish species (Sparidae), the methodology can and should be extended to additional species.

Contributors: Woodyard, Megan (Author) / Broatch, Jennifer (Thesis director) / Polidoro, Beth (Committee member) / Mancenido, Michelle (Committee member) / School of Humanities, Arts, and Cultural Studies (Contributor) / School of Mathematical and Natural Sciences (Contributor) / College of Integrative Sciences and Arts (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Understanding the Role of the Repair Response during Localized Tissue Damage in D. melanogaster

Description

Proper developmental fidelity ensures uninterrupted progression towards sexual maturity and species longevity. However, early development, the time frame spanning infancy through adolescence, is a fragile state since organisms have limited mobility and responsiveness towards their environment. Previous studies have shown that damage during development leads to a developmental delay that is proportional to the extent of damage accrued by the organism. In contrast, tissue damage sustained in older organisms does not delay development. In the fruit fly, Drosophila melanogaster, damage to wing precursor tissues is associated with developmental retardation if damage is sustained in young larvae. No developmental delay is observed when damage is inflicted closer to pupariation time. Here we use microarray analysis to characterize the genomic response to injury in Drosophila melanogaster in young and old larvae. We also begin to develop tools to examine in more detail the role that the neurotransmitter dopamine might play in mediating injury-induced developmental delays.

Contributors: Contreras Rodriguez, Jesus (Co-author) / Lupone, Teresa (Co-author) / Beckett, Chaz (Co-author) / Almajan, Ashley (Co-author) / Leek, Ty (Co-author) / Hussain, Sabahat (Co-author) / Marsh, Tyler (Co-author) / Broatch, Jennifer (Co-author) / Hackney Price, Jennifer (Thesis director) / Sandrin, Todd (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05
