Matching Items (3)
Description
Decision trees are a machine learning technique that searches the predictor space for the variable and observed value that lead to the best prediction when the data are split into two nodes based on that variable and splitting value. Conditional inference trees (CTREEs) are a non-parametric class of decision trees that use statistical theory to select variables for splitting. Missing data can be problematic in decision trees because an observation with a missing value on the chosen splitting variable cannot be placed into a node. Moreover, missing data can alter the variable selection process itself, because observations with missing values cannot be placed. Simple missing data approaches (e.g., deletion, majority rule, and surrogate split) have been implemented in decision tree algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. In addition to these approaches, this dissertation proposed a modified multiple imputation approach to handling missing data in CTREEs. A simulation was conducted to compare this approach with simple missing data approaches as well as single imputation and multiple imputation with prediction averaging. Results revealed that simple approaches (i.e., majority rule, treating missing as its own category, and listwise deletion) were effective in handling missing data in CTREEs. The modified multiple imputation approach did not perform well relative to the simple approaches in most conditions, but it did appear best suited to small sample sizes and extreme missingness.
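The split search described above can be illustrated with a minimal sketch. It is not the dissertation's method and not the CTREE algorithm: it uses a plain squared-error criterion rather than CTREE's statistical tests, and it simply ignores missing values (encoded here as `None`) on the candidate variable. The function names `sse` and `best_split` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: Sequence[Row], y: Sequence[float]) -> Optional[Tuple[int, float, float]]:
    """Search every variable and every observed value for the binary split
    with the lowest total SSE. Returns (variable index, threshold, SSE),
    or None if no variable has at least two distinct observed values.

    Assumption: rows with a missing value (None) on the candidate variable
    are left out when that variable is scored.
    """
    best = None
    for j in range(len(X[0])):
        observed = sorted({row[j] for row in X if row[j] is not None})
        # The largest observed value cannot be a threshold: the right node would be empty.
        for threshold in observed[:-1]:
            left = [yi for row, yi in zip(X, y)
                    if row[j] is not None and row[j] <= threshold]
            right = [yi for row, yi in zip(X, y)
                     if row[j] is not None and row[j] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Toy data (made up for illustration): the second predictor has one missing value.
X = [[1, 5], [2, None], [3, 5], [4, 6]]
y = [1, 1, 10, 10]
print(best_split(X, y))
```

Because each variable is scored only on the rows where it is observed, variables with more missing values are scored on fewer observations, which is one way missingness can distort which variable gets selected.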
Contributors: Manapat, Danielle Marie (Author) / Grimm, Kevin J (Thesis advisor) / Edwards, Michael C (Thesis advisor) / McNeish, Daniel (Committee member) / Anderson, Samantha F (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Network analysis is a key conceptual orientation and analytical tool in the social sciences that emphasizes the embeddedness of individual behavior within a larger web of social relations. The network approach is used to better understand the causes and consequences of social interactions that cannot be treated as independent. The relational nature of network data and models, however, amplifies the methodological concerns associated with inaccurate or missing data. This dissertation addresses such concerns via three projects. As a motivating substantive example, Project 1 examines factors associated with the selection of interaction partners by students at a large urban high school implementing a reform which, like many organizational improvement initiatives, is associated with a theory of change that posits changes to the structuring of social interactions as a central causal pathway to improved outcomes. A distinctive aspect of the data used in Project 1 is that they come from a complete egocentric network census: in addition to being asked about their own relationships, students were asked about the relationships between the alters they nominated in the self-report. This enables two unique examinations of methodological challenges in network survey data collection. Project 2 examines the factors related to how well survey respondents assess the strength of social connections between others, finding that "informant" competence corresponds positively with both social proximity to the target dyad and centrality in the network. Project 3 explores using such third-party reports to augment network imputation methods, and finds that incorporating third-party reports into model-based methods provides a significant boost in imputation accuracy. Together, these findings have important implications for collecting and extrapolating data in research contexts where a complete social network census is highly desirable but infeasible.
Contributors: Bates, Jordan T (Author) / Maroulis, Spiro J (Thesis advisor) / Kang, Yun (Thesis advisor) / Frank, Kenneth A. (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Published in 1992, “The osteological paradox: problems of inferring prehistoric health from skeletal samples” highlighted the limitations of interpreting population health from archaeological skeletal samples. The authors drew the attention of the bioarchaeological community to several unfounded assumptions in the field of paleopathology. They cautioned that bioarchaeologists needed to expand their methodological and theoretical toolkits and examine how variation in frailty influences mortality outcomes. This dissertation undertakes this task by 1) establishing a new approach for handling missing paleopathology data that facilitates the use of new analytical methods for exploring frailty and resiliency in skeletal data, and 2) investigating the role of prior frailty in shaping selective mortality in an underexplored epidemic context. The first section takes the initial step of assessing current techniques for handling missing data in bioarchaeology and testing protocols for imputation of missing paleopathology variables. Major bioarchaeological journals are searched for terms describing the treatment of missing data, and the resulting articles are compiled. The articles are sorted by subject and into categories based on the statistical and theoretical rigor with which missing data are handled. A case study testing eight methods for handling missing data is conducted to determine which methods best produce unbiased parameter estimates. The second section explores how pre-existing frailty influenced mortality during the 1918 influenza pandemic. Skeletal lesion data are collected from a sample of 424 individuals from the Hamann-Todd Documented Collection. Using Kaplan-Meier survival analysis and Cox proportional hazards models, this section tests whether individuals who were healthy (i.e., non-frail) were as likely to die during the flu as frail individuals.
Results of the first section indicate that imputation is underused in bioarchaeology; therefore, procedures for imputing ordinal and continuous paleopathology data are established. The findings of the second section reveal that while a greater proportion of non-frail individuals died during the 1918 pandemic compared to pre-flu times, frail individuals were more likely to die at all times. The outcomes of this dissertation help expand the types of statistical analyses that can be performed using paleopathology data. They also contribute to the field’s knowledge of selective mortality and differential frailty during a major historical pandemic.
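For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal sketch of the product-limit estimator. It is an illustration only, not the dissertation's analysis: the function name `kaplan_meier` is hypothetical, and the toy durations and event flags are made up rather than drawn from the Hamann-Todd sample.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    durations: time until death or censoring for each individual.
    events:    1 if death was observed at that time, 0 if censored.
    Returns (time, estimated survival probability) at each distinct death time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals who share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical toy data: five individuals, two of them censored.
for time, prob in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(time, round(prob, 3))
```

Comparing groups, such as frail and non-frail individuals, would mean estimating one such curve per group; a Cox proportional hazards model additionally needs a regression fit that this sketch does not cover.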
Contributors: Wissler, Amanda (Author) / Buikstra, Jane E (Thesis advisor) / DeWitte, Sharon N (Committee member) / Stojanowski, Christopher M (Committee member) / Mamelund, Svenn-Erik (Committee member) / Arizona State University (Publisher)
Created: 2021