Matching Items (3)

Understanding the Impact of Varied Testing and Infection Rates on Covid-19 Impact Across Age-Based Populations

Description

Covid-19 is unlike any coronavirus we have seen before, characterized mostly by the ease with which it spreads. This analysis uses an SEIR model built to accommodate various populations to understand how different testing and infection rates may affect hospitalization and death. It finds that infection rates significantly affect Covid-19 outcomes regardless of the population, whereas the effect of testing rates in this simulation is less pronounced. Thus, policy-makers should focus on decreasing infection rates through targeted lockdowns and vaccine rollout to contain the virus and decrease its spread.
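For readers unfamiliar with SEIR models, the following is a minimal Python sketch of the basic compartmental structure (Susceptible, Exposed, Infectious, Recovered). It is not the model used in this work: testing, age-based populations, hospitalization, and death are all omitted, and the function names and every parameter value (`beta`, `sigma`, `gamma`, the population size) are illustrative assumptions. It only shows how raising the infection rate raises the epidemic peak.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    beta  : infection (transmission) rate per day (assumed value)
    sigma : rate at which exposed people become infectious (1 / incubation period)
    gamma : recovery rate (1 / infectious period)
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(s, e, i, r)]
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / n * dt   # S -> E
        new_infectious = sigma * e * dt       # E -> I
        new_recovered = gamma * i * dt        # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


def peak_infectious(history):
    """Largest number of simultaneously infectious people in a run."""
    return max(state[2] for state in history)


# Two hypothetical scenarios that differ only in the infection rate.
common = dict(sigma=1 / 5, gamma=1 / 10, s0=9990, e0=0, i0=10, r0=0, days=300)
low = simulate_seir(beta=0.25, **common)
high = simulate_seir(beta=0.50, **common)

print(round(peak_infectious(low)), round(peak_infectious(high)))
```

A fuller model along the lines described in the abstract would add compartments or flows for tested, hospitalized, and deceased individuals and would run one parameter set per age group.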

Date Created
  • 2021-05

NBA Player Clustering: Exploring Player Archetypes in a Changing NBA

Description

The findings of this project show that, through principal component analysis and K-Means clustering, NBA players can be algorithmically classified into distinct clusters, each representing a player archetype. Individual player data for the 2018-2019 regular season was collected for 150 players, including regular per-game statistics, such as rebounds, assists, and field goals, and advanced statistics, such as usage percentage, win shares, and value over replacement player. The analysis was performed in the statistical programming language R using the integrated development environment RStudio. The principal component analysis was computed first to produce a set of five principal components, which explain roughly 82.20% of the total variance within the player data. These five principal components were then used as the features on which the players were clustered by the K-Means clustering algorithm implemented in R. It was determined that eight clusters would best represent the groupings of the players, and eight clusters were created with a unique set of players belonging to each one. Each cluster was analyzed based on the players making up the cluster, and a player archetype was established to define each of the clusters. The reasoning behind the player archetype given to each cluster was explained, providing details as to why the players were clustered together and the main data features that influenced the clustering results. For all but two of the clusters, the archetypes were found to be independent of the player's position. The clustering results can be expanded on in the future to include a larger sample size of players, and they can be used to make inferences regarding NBA roster construction. The clustering can highlight key weaknesses in rosters and show which combinations of player archetypes lead to team success.
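The pipeline described above (standardize, reduce with principal component analysis, then cluster the component scores with K-Means) can be sketched as follows. This is an illustrative Python sketch, not the project's R code: the data are synthetic stand-ins for player statistics, the helper names are hypothetical, PCA is approximated by power iteration with deflation, and the example uses two components and two clusters rather than the five and eight reported in the abstract.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def principal_components(z, k, iters=500, seed=0):
    """Top-k eigenpairs of the correlation matrix (power iteration + deflation)."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
        pairs.append((eigval, v))
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    return pairs


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda pt: min(sqdist(pt, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(pt, centers[c]))
                  for pt in points]
        updated = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


# Synthetic "players": two made-up archetypes with four made-up statistics each.
rng = random.Random(0)
group_means = [[10, 2, 8, 1]] * 10 + [[3, 9, 2, 6]] * 10
stats = [[m + rng.gauss(0, 0.3) for m in means] for means in group_means]

z = standardize(stats)
pairs = principal_components(z, k=2)
# The trace of a correlation matrix equals the number of variables.
explained = sum(val for val, _ in pairs) / len(z[0])
scores = [[sum(a * b for a, b in zip(row, vec)) for _, vec in pairs] for row in z]
labels = kmeans(scores, k=2)
print(f"variance explained: {explained:.2%}", labels)
```

In R, the same steps would typically rely on built-in routines for PCA and K-Means rather than hand-written ones; the sketch spells them out only to make each stage visible.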

Date Created
  • 2019-05

Contributions to Optimal Experimental Design and Strategic Subdata Selection for Big Data

Description

In this dissertation two research questions in the field of applied experimental design were explored. First, methods for augmenting the three-level screening designs called Definitive Screening Designs (DSDs) were investigated. Second, schemes for strategic subdata selection for nonparametric predictive modeling with big data were developed.

Under sparsity, the structure of DSDs can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered, given the assumption that the correct form of the model for the response of interest is quadratic. Series of augmented designs were constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion.
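For reference, the quantities discussed above can be written out in standard textbook notation. The symbols are assumptions introduced here, not taken from the dissertation: m factors, model matrix X, model expansion f(x), and design region R. A design that is saturated for the full quadratic model has exactly p runs.

```latex
% Full quadratic (second-order) model in m factors and its parameter count p
y = \beta_0 + \sum_{i=1}^{m} \beta_i x_i + \sum_{i=1}^{m} \beta_{ii} x_i^2
    + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon,
\qquad
p = 1 + 2m + \binom{m}{2}

% D-optimality: maximize the determinant of the information matrix
\max_{X} \; \det\!\left( X^{\top} X \right)

% I-optimality: minimize the average prediction variance over the region R
\min_{X} \; \frac{1}{\operatorname{vol}(\mathcal{R})}
    \int_{\mathcal{R}} f(x)^{\top} \left( X^{\top} X \right)^{-1} f(x) \, dx
```

D-optimality targets precise parameter estimates, while I-optimality targets precise predictions, which is why the comparison between them in the prediction setting is notable.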

Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction, but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering, the second is based on space-filling designs, and both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples, via real data applications, and via large simulation studies. In all cases, the proposed methods were compared to existing, widely used subdata selection methods. The conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified.
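To give a flavor of model-free, space-filling subdata selection, the following Python sketch uses a simple greedy maximin rule: each new point is the one farthest from everything already selected. This is one common space-filling heuristic and is not claimed to be the scheme developed in the dissertation; the function name `maximin_subdata` and the grid data are illustrative assumptions.

```python
import math


def maximin_subdata(points, k):
    """Greedily pick k row indices, each maximizing its distance to those chosen.

    Starts from the first row; no response values or model form are used.
    """
    chosen = [0]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda j: min_dist[j])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen


# Hypothetical "full data": a 10 x 10 grid of two predictors.
full_data = [(x, y) for x in range(10) for y in range(10)]
subdata_idx = maximin_subdata(full_data, k=4)
print([full_data[j] for j in subdata_idx])
```

On this grid the four selected points are the four corners, so the subdata spreads across the predictor space. A clustering-based alternative would instead group the full data and keep representatives from each group.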

Date Created
  • 2020