Reproducibility and Repeatability Experiment with Nested Factors in Fingerprint Age Analysis

Description

Gage reproducibility and repeatability methods do not account for a mix of random and fixed effects, nested factors, and repeated measures. Using a case study in fingerprint analysis, we propose a new method using linear mixed effects models to determine the decomposition of the variation components in a measurement system. The fingerprint analysis tests whether the measuring system for ridge widths is reproducible and repeatable. Using the new model and traditional measurement systems analysis metrics, we found that the current process to measure ridge widths is not adequate. Further, we discovered that it is possible to use a linear mixed model to decompose the variance of a measurement system.
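As a rough illustration of how traditional measurement systems analysis metrics follow from a variance decomposition, the sketch below computes three commonly cited ones from already-estimated variance components. The abstract does not say which metrics the thesis used, so the choice of metrics here is an assumption, and the function name `grr_metrics` and the input values are hypothetical, not results from the thesis.

```python
import math

def grr_metrics(var_repeatability, var_reproducibility, var_part):
    """Summarize a measurement system from estimated variance components.

    The inputs are assumed to come from a fitted model (for example, the
    random-effect and residual variances of a linear mixed model).
    """
    var_grr = var_repeatability + var_reproducibility  # measurement-system variance
    var_total = var_grr + var_part                     # total observed variance
    return {
        # Share of total variance attributable to the measurement system.
        "pct_contribution_grr": 100.0 * var_grr / var_total,
        # Same comparison on the standard-deviation scale.
        "pct_study_var_grr": 100.0 * math.sqrt(var_grr / var_total),
        # Number of distinct categories the system can resolve.
        "ndc": math.floor(1.41 * math.sqrt(var_part / var_grr)),
    }

# Hypothetical variance components, not values from the thesis.
metrics = grr_metrics(var_repeatability=0.04, var_reproducibility=0.02, var_part=0.10)
print(metrics)
```

Under widely used rules of thumb, a measurement system whose percent study variation exceeds roughly 30%, or whose number of distinct categories falls below 5, is usually judged inadequate.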

Contributors: Johanson, Jena (Author) / Mancenido, Michelle (Thesis director) / Broatch, Jennifer (Committee member) / School of International Letters and Cultures (Contributor) / School of Mathematical and Natural Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Applications of Deep Neural Networks to Neurocognitive Poetics: A Quantitative Study of the Project Gutenberg English Poetry Corpus

Description

With the advent of sophisticated computer technology, we increasingly see the use of computational techniques in the study of problems from a variety of disciplines, including the humanities. In a field such as poetry, where classic works are subject to frequent re-analysis over the course of years, decades, or even centuries, there is a certain demand for fresh approaches to familiar tasks, and such breaks from convention may even be necessary for the advancement of the field. Existing quantitative studies of poetry have employed computational techniques in their analyses; however, there remains work to be done with regard to the deployment of deep neural networks on large corpora of poetry to classify portions of the works contained therein based on certain features. While applications of neural networks to social media sites, consumer reviews, and other web-originated data are common within computational linguistics and natural language processing, comparatively little work has been done on the computational analysis of poetry using the same techniques. In this work, I begin to lay out the first steps for the study of poetry using neural networks. Using a convolutional neural network to classify author birth date, I was able not only to extract a non-trivial signal from the data, but also to identify the presence of clustering within by-author model accuracy. While definitive conclusions about the cause of this clustering were not reached, investigation of this clustering reveals immense heterogeneity in the traits of accurately classified authors. Further study may unpack this clustering and reveal key insights about how temporal information is encoded in poetry. The study of poetry using neural networks remains very open but exhibits potential to be an interesting and deep area of work.

Contributors: Goodloe, Oscar Laurence (Author) / Nishimura, Joel (Thesis director) / Broatch, Jennifer (Committee member) / School of Mathematical and Natural Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Classification for Conservation: A Random Forest Model to Predict Threatened Marine Species

Description

As threats to Earth's biodiversity continue to evolve, an effective methodology to predict such threats is crucial to ensure the survival of living species. Organizations like the International Union for Conservation of Nature (IUCN) monitor the Earth's environmental networks to preserve the sanctity of terrestrial and marine life. The IUCN Red List of Threatened Species informs the conservation activities of governments as a world standard of species' risks of extinction. However, the IUCN's current methodology is, in some ways, inefficient given the immense volume of Earth's species and the laboriousness of its species' risk classification process. IUCN assessors can take years to classify a species' extinction risk, even as that species continues to decline. Therefore, to supplement the IUCN's classification process and thus bolster conservationist efforts for threatened species, a Random Forest model was constructed, trained on a group of fish species previously classified by the IUCN Red List. This Random Forest model both validates the IUCN Red List's classification method and offers a highly efficient, supplemental classification method for species' extinction risk. In addition, this Random Forest model is applicable to species with deficient data, which the IUCN Red List is otherwise unable to classify, thus engendering conservationist efforts for previously obscure species. Although this Random Forest model is built specifically for the fish family on which it was trained (Sparidae), the methodology can and should be extended to additional species.

Contributors: Woodyard, Megan (Author) / Broatch, Jennifer (Thesis director) / Polidoro, Beth (Committee member) / Mancenido, Michelle (Committee member) / School of Humanities, Arts, and Cultural Studies (Contributor) / School of Mathematical and Natural Sciences (Contributor) / College of Integrative Sciences and Arts (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Understanding the Role of the Repair Response during Localized Tissue Damage in D. melanogaster

Description

Proper developmental fidelity ensures uninterrupted progression towards sexual maturity and species longevity. However, early development, the time-frame spanning infancy through adolescence, is a fragile state since organisms have limited mobility and responsiveness towards their environment. Previous studies have shown that damage during development leads to an onset of developmental delay which is proportional to the extent of damage accrued by the organism. In contrast, tissue damage sustained in older organisms does not delay development. In the fruit fly, Drosophila melanogaster, damage to wing precursor tissues is associated with developmental retardation if damage is sustained in young larvae. No developmental delay is observed when damage is inflicted closer to pupariation time. Here we use microarray analysis to characterize the genomic response to injury in Drosophila melanogaster in young and old larvae. We also begin to develop tools to examine in more detail the role that the neurotransmitter dopamine might play in mediating injury-induced developmental delays.

Contributors: Contreras Rodriguez, Jesus (Co-author) / Lupone, Teresa (Co-author) / Beckett, Chaz (Co-author) / Almajan, Ashley (Co-author) / Leek, Ty (Co-author) / Hussain, Sabahat (Co-author) / Marsh, Tyler (Co-author) / Broatch, Jennifer (Co-author) / Hackney Price, Jennifer (Thesis director) / Sandrin, Todd (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Alternative methods via random forest to identify interactions in a general framework and variable importance in the context of value-added models

Description

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within but are not restricted to the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear.
Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.

Contributors: Valdivia, Arturo (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Broatch, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2013

ANALYSIS OF FIGHT OUTCOMES IN THE UFC AND THE EFFICACY OF PREDICTING FIGHT OUTCOMES ESPECIALLY IN RELATION TO SPORTS BETTING

Description

This study introduces models, developed from historical UFC data, that aim to predict fight outcomes between mixed martial arts fighters within the UFC. The paper explores multivariate linear probability regression analysis, using variables provided in and derived from a large dataset, to predict the probability of a fighter winning a given fight. It analyzes several multivariate regression models, compares the accuracy of each internally, and accounts for limitations within the models. The model’s efficacy is then tested against recent UFC fights, and the model is adjusted to find a more accurate equation that maximizes profit in sports betting by comparing the implied probabilities from betting odds with the model’s predicted probabilities.
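The comparison between implied and predicted probabilities can be sketched as follows. This is a minimal illustration, assuming American-style moneyline odds; the function names, the example odds, and the model probability are all hypothetical and do not come from the thesis.

```python
def implied_probability(american_odds):
    """Win probability implied by American moneyline odds (bookmaker margin included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def net_payout(american_odds):
    """Profit per unit staked on a winning bet."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100

def expected_profit(model_prob, american_odds):
    """Expected profit per unit staked, treating model_prob as the true win probability."""
    return model_prob * net_payout(american_odds) - (1 - model_prob)

# Hypothetical fight: favorite at -150, underdog at +130.
favorite_implied = implied_probability(-150)   # 0.6
underdog_implied = implied_probability(130)    # about 0.435
# Suppose a model gives the underdog a 50% chance (assumed value).
edge = 0.50 - underdog_implied
value = expected_profit(0.50, 130)
print(favorite_implied, underdog_implied, edge, value)
```

The two implied probabilities sum to more than 1 because of the bookmaker's margin, so a bet only has positive expected profit when the model's probability exceeds the implied one by enough to cover that margin. A linear probability model can also produce predictions outside the interval from 0 to 1, which would need to be handled before a comparison like this one.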

Contributors: Tufte, Nicholas (Author) / Hill, Alexander (Thesis director) / Broatch, Jennifer (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Natural Sciences (Contributor)

Created: 2022-12

Data Science Exploration

Description

I have challenged myself to learn Python. I did this because I wanted to improve myself and my mindset around coding. My view on coding has changed immensely. I was intimidated by the social stigmas around coding, but I have become more comfortable with it. There were times when I thought that I would never understand something, but it became familiar. Through constant exposure, such as completing modules in DataCamp and Kaggle, I better understood the basics and uses of different models. The concepts I had learned before became clearer by completing a project I was genuinely interested in. I could search for a solution or ask my thesis director if I had an error. I enjoyed working with my thesis professor and failing many times. I have learned that I do not have to be a master within the year but must remain consistent with my practice. I will continue to practice and learn more about coding now with more confidence.

Contributors: Morales, Abril (Author) / Nishimura, Joel (Thesis director) / Broatch, Jennifer (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Natural Sciences (Contributor)

Created: 2023-05

Meta-Analysis for Multi-Cancer Early Detection Biomarker Discovery

Description

Cancer poses a significant worldwide burden, and ongoing efforts aim to improve patient outcomes, with cancer screening making a significant contribution. Multi-cancer early detection tests have been introduced that measure a series of biomarkers to detect signals that may indicate carcinogenesis in its earliest stages; they work in tandem with other diagnostic techniques to localize and verify tumor formation across multiple cancer types. Molecular biomarkers such as autoantibodies are promising candidates for early detection across multiple cancers. This study identifies autoantibodies that are aberrantly expressed across multiple cancer types and that may be used to discriminate between healthy individuals and those with cancer from a single serum sample. Multiple datasets from prior studies are integrated to examine 8,200 serum autoantibodies from 5 cancer types: lung adenocarcinoma, basal-like breast cancer, advanced colorectal cancer, ovarian cancer, and HER2+ breast cancer. The diagnostic utility of these autoantibodies is assessed for combined cancer types by meta-receiver operating characteristic (ROC) curve analysis. A meta-analysis data processing pipeline is used to process each biomarker, with statistical analysis performed across ROC metrics for each meta-curve, including partial area under the curve and sensitivity at a 90% specificity threshold. The results identify 26 autoantibody biomarkers that are useful for multi-cancer detection and may be developed for future clinical applications in cancer screening.
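To make the "sensitivity at a 90% specificity threshold" metric concrete, here is a minimal single-dataset sketch. It is not the thesis's meta-analysis pipeline: it assumes that a higher score indicates cancer, and the function name and the score lists are hypothetical.

```python
import math

def sensitivity_at_specificity(control_scores, case_scores, specificity=0.90):
    """Return (sensitivity, cutoff) at the smallest observed control score
    that classifies at least `specificity` of controls as negative.

    Assumes higher scores are more cancer-like; a sample is called positive
    when its score is strictly greater than the cutoff.
    """
    controls = sorted(control_scores)
    n = len(controls)
    # Number of controls that must fall at or below the cutoff.
    # round() guards against floating-point noise in specificity * n.
    k = math.ceil(round(specificity * n, 9))
    cutoff = controls[k - 1]
    true_positives = sum(score > cutoff for score in case_scores)
    return true_positives / len(case_scores), cutoff

# Hypothetical autoantibody signal intensities.
healthy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cancer = [0.5, 0.95, 1.2, 1.5, 0.8]
sensitivity, cutoff = sensitivity_at_specificity(healthy, cancer)
print(sensitivity, cutoff)
```

Fixing specificity at a high value reflects the screening setting, where false positives among healthy individuals are costly; the partial area under the curve mentioned above summarizes performance over a range of such high-specificity operating points rather than a single one.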

Contributors: Goeringer, Cayden Brett (Author) / Chung, Yunro (Thesis advisor) / Broatch, Jennifer (Thesis advisor) / Hart, Steven (Committee member) / Arizona State University (Publisher)

Created: 2024

Unveiling Ancestral Echoes in Cancer Fusion Proteins through Structural Homology and Evolutionary Analysis

Description

Fusion genes, arising from chromosomal translocations through nonallelic homologous recombination (NAHR), are pivotal in oncogenesis, leading to the formation of fusion proteins that contribute to cancer’s aggressive nature. The atavism theory posits that cancer is a throwback to an ancient cellular state, with reactivated ancestral cellular mechanisms driving uncontrolled growth and other cancerous traits. By comparing the evolutionary ages of the structural homologs of fusion proteins with those of their parental gene pairs, this study aims to determine whether these fusion proteins recapitulate ancient protein structures, thereby supporting the atavism theory. Utilizing data from the COSMIC database, fusion genes were constructed according to their corresponding cDNA sequences from parent gene pairs, and the 3D structures of the resultant fusion proteins were predicted using AlphaFold. Subsequent VAST analysis identified structural homologies with ancient proteins. The ages of original and fusion proteins were inferred by mapping homologous groups from the Ensembl Compara database to identify common ancestors. The TimeTree database was then used to assign gene ages based on the divergence of the most distantly related species in these groups. Finally, comparing these ages identified ancestral resemblances. The findings of this project demonstrate homology between the structures of most fusion proteins and those of ancient proteins found in humans, yeast, and bacteria, suggesting the re-emergence of ancient protein structures in cancer cells due to recurrent translocations (permutation test, p=0.0201). Additionally, a large portion (68%) of the examined fusion genes comprises one gene predating the advent of multicellularity and another emerging concurrently with or after this evolutionary milestone (one-sample proportions test, X-squared=13.291, df=1, p=0.00027).
These results support the atavism theory, suggesting that such fusion events might bridge evolutionary gaps between unicellular and multicellular life forms. This could potentially explain the mechanisms behind cancer’s tendency to forsake multicellular characteristics, thereby enhancing malignancy. By illustrating how chromosomal translocations in cancer might be tapping into primordial protein architectures, this study not only provides evidence for the atavism theory but also opens new avenues for understanding cancer’s evolutionary underpinnings. This could lead to novel therapeutic strategies by exploiting the ancient vulnerabilities revealed through chromosomal translocations.
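As a small check on how the reported proportions-test p-value relates to its test statistic, the sketch below converts a chi-squared statistic with one degree of freedom into an upper-tail probability. It uses only the statistic quoted in the abstract; the function name `chi2_sf_df1` is hypothetical, and how the statistic itself was computed (null proportion, sample size, any continuity correction) is not stated in the abstract and is not reconstructed here.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    With one degree of freedom the statistic is a squared standard normal,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# Test statistic as reported in the abstract.
p_value = chi2_sf_df1(13.291)
print(p_value)  # approximately 0.00027, consistent with the reported value
```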

Contributors: Sun, Shuyu (Author) / Bussey, Kimberly J. (Thesis advisor) / Broatch, Jennifer (Thesis advisor) / Marshall, Pamela A. (Committee member) / Arizona State University (Publisher)

Created: 2024

Effects of Lingual Frenectomies on Breastfeeding Dyads

Description

The purpose of this research was to determine the impact of a lingual frenectomy to correct partial ankyloglossia on the breastfeeding function of the mother-infant dyad after completion of the procedure. Changes in breastfeeding were determined using FLIP (Flow, Latch, Injury, Post Feeding Behavior), a validated self-report questionnaire that classifies the severity of breastfeeding dysfunction associated with partial ankyloglossia. Through this, we can diagnose at-risk dyads and determine treatment options. The analysis revealed that 75% of respondents saw significant improvements in the severity and/or frequency of symptoms after completion of the procedure.

Contributors: Prabakaran, Glenny (Author) / Broatch, Jennifer (Thesis director) / Bussey, Kimberly (Committee member) / Barrett, The Honors College (Contributor) / Historical, Philosophical & Religious Studies, Sch (Contributor) / School of Life Sciences (Contributor)

Created: 2023-05