Are Professional Baseball Players Who are Promoted into the Major Leagues Better than Players Who Were Demoted into the Minor Leagues: A Logit Analysis

Description

Today, statistical analysis is used for a wide variety of purposes. In sports, and particularly in baseball, there is a growing need for better, up-to-date analysis of players and their performance as they attempt to reach the Major Leagues. Athletes constantly move within and between organizations, so clubs spend considerable time determining whether each move benefits the organization as a whole. The objective of this thesis is to use historical baseball statistics in StataSE to determine the performance levels of players who played at the Major League level. Regression-based performance models built from these statistics will be used to predict whether Major League Baseball organizations effectively and efficiently move players from their farm systems to the big leagues, so that teams can see whether they in fact make the right decisions during the season. Several tasks were accomplished to achieve this outcome:
1. Data was obtained from the Baseball-Reference statistics database and sorted in Google Sheets so that I could perform the analysis from anywhere.
2. All 1,354 players who entered the major leagues in 2016 were classified according to whether they started in a given league and stayed, were promoted from the minor leagues to the majors, or were demoted from the majors to the minor leagues.
3. Based on prior baseball knowledge and offensive performance measures only, players' abilities were evaluated, and only those who were called up or sent down were included in the overall analysis.
4. The statistical software StataSE was used to further analyze whether any of the four major regression assumptions were violated. It was determined that logistic regression models would produce better results than a standard linear OLS model.
After testing multiple models and slightly refining my hypothesis, I arrived at a more accurate analysis of whether organizations were making efficient moves when sending one player down to promote another. After producing the model, I investigated the performance level at which a player was deemed no longer able to perform at the Major League Baseball level.
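For readers unfamiliar with logit models, the following is a minimal sketch of what a logistic regression estimates. It is written in plain Python rather than StataSE and is not the thesis's actual workflow. It assumes a single hypothetical predictor (a standardized offensive metric) and a made-up outcome coded 1 for promoted and 0 for demoted; none of these numbers come from the thesis. Unlike a linear OLS fit, the logistic link keeps every predicted probability between 0 and 1.

```python
import math


def sigmoid(z):
    """Logistic link: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, lr=0.5, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-likelihood with respect to the linear predictor.
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: standardized offensive metric vs. 1 = promoted, 0 = demoted.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logit(xs, ys)
print(f"b1 = {b1:.2f}; P(promoted | x = 1.0) = {sigmoid(b0 + b1 * 1.0):.2f}")
```

A positive `b1` means that higher values of the metric raise the estimated odds of promotion. In Stata, the corresponding estimate comes from its built-in `logit` command, which uses maximum likelihood rather than this hand-rolled loop.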

Date Created
  • 2017-05

Value Analysis of Microsoft's Game Pass

Description

The retail cost of video games has remained fairly consistent over the decades, even as the industry has grown significantly. Emerging alternatives to buying individual games, such as subscription services, attempt to offer a better deal than the current options. Using the various attributes that all video games possess, regression analysis can be performed to identify which factors may affect a game's retail cost. However, the low adjusted R-squared values of the resulting models indicate that they account for only a small percentage of the variability in retail cost. This suggests that the chosen attributes are not reliable predictors of retail cost in a regression analysis.
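Adjusted R-squared penalizes R-squared for the number of predictors p relative to the sample size n: adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below is a minimal illustration of that calculation for a hypothetical single-predictor regression. The attribute and prices are made-up numbers, not the study's data.

```python
def ols_simple(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Share of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for p predictors fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical attribute (e.g. a review score) vs. retail price in dollars.
xs = [60, 65, 70, 75, 80, 85, 90, 95]
ys = [40, 60, 30, 60, 20, 60, 50, 40]
a, b = ols_simple(xs, ys)
r2 = r_squared(xs, ys, a, b)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adjusted_r_squared(r2, len(xs), 1):.3f}")
```

When a predictor explains almost nothing, as with this made-up data, adjusted R-squared can even fall below zero. This is one way "low adjusted R-squared" shows up in practice.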

Date Created
  • 2020-05

A Regression Analysis: The Impact of Socioeconomic Factors on Depression and Mental Health

Description

The goal of our study is to identify socioeconomic risk factors for depressive disorder and poor mental health by statistically analyzing survey data from the CDC. Identifying risk groups within particular demographics could aid in the development of targeted interventions to improve the overall quality of mental health in the United States. In our analysis, we studied the influence of, and correlations among, socioeconomic factors that affect the risk of developing depressive disorders and overall poor mental health. Using the statistical software STATA, we regressed the dependent mental health variables on selected independent socioeconomic variables. The independent variables of the statistical model include Income, Race, State, Age, Marital Status, Sex, Education, BMI, Smoker Status, and Alcohol Consumption. Once the regression coefficients were estimated, we illustrated the data in graphs and heat maps to visualize the prevalence of depression across U.S. demographic groups. Our study indicates that low-income and less-educated populations who smoke every day, are obese, and/or are divorced or separated should be of primary concern. We suggest that mental health organizations support counseling and therapy as secondary care for people in smoking cessation programs, weight management programs, marriage counseling, or divorce support groups. General progress in alleviating poverty and increasing education could also help counteract the prevalence of depressive disorder and improve overall mental health. Identifying these target groups and socioeconomic risk factors is critical to developing future preventive measures.
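Several of the independent variables listed above, such as Race, State, Marital Status, Sex, and Smoker Status, are categorical. They therefore enter a regression as indicator (dummy) variables measured against a reference category. Stata does this automatically with its `i.` factor-variable prefix. The sketch below is a minimal plain-Python illustration of that encoding. The helper name and category labels are hypothetical.

```python
def dummy_encode(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})  # the reference level gets no column
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


marital = ["married", "divorced", "single", "married", "separated"]
levels, rows = dummy_encode(marital, reference="married")
for value, row in zip(marital, rows):
    print(value, dict(zip(levels, row)))
```

Each fitted coefficient on these indicators is read relative to the reference group. For example, the "divorced" coefficient compares divorced respondents with married ones, holding the other variables fixed.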

Date Created
  • 2016-05

The Next Great U.S. Open

Description

The United States Open Championship, often referred to as the U.S. Open, is one of the four major championships in professional golf. Held annually in June, the tournament changes venues each year, and each venue must meet strict criteria to challenge the best players in the world. In an evaluation conducted by the United States Golf Association, a potential course is assessed on its quality and design. The course is also evaluated on its ability to accommodate various obstructions and thousands of spectators, while providing ample parking, easy transportation access, and close proximity to local airports and lodging. Of the thousands of courses in the United States, only a select few have hosted a U.S. Open, and far fewer have hosted it multiple times. We therefore aim to create the next venue capable of hosting many U.S. Open tournaments for years to come.

Date Created
  • 2018-05

Modelling Megacities: An Approach to Modelling Dense Urban Area

Description

In 2010, for the first time in human history, more than half of the world's population lived in cities; this share is expected to rise to 60% or more by 2050. The goal of this research effort is to create a comprehensive model and modelling framework for megacities, middleweight cities, and urban agglomerations, collectively referred to as dense urban areas. The project is motivated by the United States Army's desire for readiness in all operating environments, including dense urban areas. Although this research offers valuable insight to support Army operations, megacities are of interest to nearly every sector of society. Design of Experiments offers a novel way to determine both the main effects of, and the interaction effects between, factors within a dense urban area, providing insight into causal relationships among those factors. Regression modelling can also be used to analyze dense urban areas, providing wide-ranging insight into correlations between factors and their interactions. Past studies of megacities have concerned themselves with the general trends of cities and their operation. This study is unique in its effort to model a single megacity to enable decision support for military operational planning, as well as potential decision support for city planners seeking to increase the sustainability of dense urban areas and megacities.
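In a two-level factorial design, each main effect is the difference between the mean response at a factor's high setting and its low setting. The interaction effect measures how much one factor's effect changes with the other factor's level. The sketch below is a minimal illustration for a hypothetical 2×2 design. Factors A and B and the response values are illustrative only and are not taken from this study.

```python
# Coded levels: -1 = low, +1 = high. Hypothetical factors A and B, one response per run.
runs = {(-1, -1): 10.0, (1, -1): 20.0, (-1, 1): 30.0, (1, 1): 60.0}


def effect(runs, contrast):
    """Estimate an effect as (mean response where contrast = +1) - (mean where contrast = -1).

    Valid for a balanced design, where exactly half the runs fall on each side of the contrast.
    """
    half = len(runs) / 2
    return sum(contrast(a, b) * y for (a, b), y in runs.items()) / half


main_a = effect(runs, lambda a, b: a)        # effect of moving A from low to high
main_b = effect(runs, lambda a, b: b)        # effect of moving B from low to high
inter_ab = effect(runs, lambda a, b: a * b)  # how A's effect depends on B's level
print(main_a, main_b, inter_ab)
```

Here A's effect is 40 when B is high but only 10 when B is low. That difference is exactly what the nonzero interaction captures. A regression on the same coded columns would report half of each of these effects as its coefficients.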

Date Created
  • 2016-05

mHealth Patient Care Improvement Study Through Statistical Analysis

Description

Technological applications are continually being developed in the healthcare industry as technology becomes increasingly available. In recent years, companies have begun creating mobile applications to address various conditions and diseases. This falls under mHealth, the “use of mobile phones and other wireless technology in medical care” (Rouse, 2018). The goal of this study was to determine whether data gathered through mHealth methods can be used to build predictive models. The first part of this thesis contains a literature review presenting relevant definitions and several studies that involved the use of technology in healthcare applications. The second part focuses on data from one study, where regression analysis is used to develop predictive models.

Rouse, M. (2018). mHealth (mobile health). Retrieved from https://searchhealthit.techtarget.com/definition/mHealth

Date Created
  • 2020-05

A study of accelerated Bayesian additive regression trees

Description

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.
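BART and XBART both model the response as a sum of many small regression trees, with each tree refitted to the partial residual left by all the other trees. The sketch below illustrates only that sum-of-trees backfitting structure, using greedy depth-one trees (stumps) on a single feature. It deliberately omits the priors, the posterior sampling, and XBART's grow-from-root procedure. It is therefore neither BART nor XBART, and it is not the Python package described in the thesis. All function names and data are hypothetical.

```python
def fit_stump(xs, rs):
    """Greedy depth-one tree: pick the split on x that minimises squared error of rs."""
    best = None
    for t in sorted(set(xs))[1:]:  # every candidate threshold leaves both sides non-empty
        left = [r for x, r in zip(xs, rs) if x < t]
        right = [r for x, r in zip(xs, rs) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left leaf value, right leaf value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def backfit_sum_of_stumps(xs, ys, n_trees=3, sweeps=20):
    """Fit y ~ sum of n_trees stumps, refitting each one to the residual of the others."""
    preds = [[0.0] * len(xs) for _ in range(n_trees)]
    stumps = [None] * n_trees
    for _ in range(sweeps):
        for j in range(n_trees):
            # Partial residual: the part of y not explained by the other trees.
            resid = [
                y - sum(preds[k][i] for k in range(n_trees) if k != j)
                for i, y in enumerate(ys)
            ]
            stumps[j] = fit_stump(xs, resid)
            preds[j] = [predict_stump(stumps[j], x) for x in xs]
    return stumps


def predict_sum(stumps, x):
    return sum(predict_stump(s, x) for s in stumps)


# Hypothetical two-step signal: a single stump cannot fit it, but a sum of stumps can.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 12.0, 12.0]
stumps = backfit_sum_of_stumps(xs, ys)
print([round(predict_sum(stumps, x), 2) for x in xs])
```

In BART, the deterministic "refit the stump" step is replaced by drawing each tree from its posterior given the partial residual. That is where the Bayesian uncertainty estimates, and much of the computation time XBART aims to reduce, come from.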

Date Created
  • 2019

Applying distributional approaches to understand patterns of urban differentiation

Description

Urban scaling analysis has introduced a new scientific paradigm to the study of cities. With it, the notions of size, heterogeneity and structure have taken a leading role. These notions are assumed to underlie why cities differ from one another, sometimes wildly. However, the mechanisms by which size, heterogeneity and structure shape the general statistical patterns that describe urban economic output are still unclear. Given the rapid rate of urbanization around the globe, we need a precise and formal mathematical understanding of these matters. In this dissertation, I carry out probabilistic, distributional and computational explorations of (i) how the broadness, or narrowness, of the distribution of individual productivities within cities determines what and how we measure urban systemic output, (ii) how urban scaling may be expressed as a statistical statement when urban metrics display strong stochasticity, (iii) how processes of aggregation constrain the variability of total urban output, and (iv) how the structure of urban skills diversification within cities induces a multiplicative process in the production of urban output.
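Urban scaling analysis conventionally summarizes how a city's total output Y grows with its population N through the power law Y = Y0·N^β. The exponent β is estimated as the slope of a least-squares fit of log Y on log N. The sketch below shows only that standard estimate on synthetic, noise-free data, with Y0 = 2 and β = 1.15 as assumed values. The dissertation's distributional treatment of strongly stochastic urban metrics goes beyond this simple regression.

```python
import math


def fit_scaling_exponent(populations, outputs):
    """Least-squares fit of log(Y) = log(Y0) + beta * log(N); returns (beta, Y0)."""
    lx = [math.log(n) for n in populations]
    ly = [math.log(y) for y in outputs]
    k = len(lx)
    mx, my = sum(lx) / k, sum(ly) / k
    beta = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    y0 = math.exp(my - beta * mx)
    return beta, y0


# Synthetic cities generated exactly from Y = 2 * N ** 1.15 (assumed parameters).
populations = [5e4, 2e5, 1e6, 5e6, 2e7]
outputs = [2.0 * n ** 1.15 for n in populations]
beta, y0 = fit_scaling_exponent(populations, outputs)
print(f"beta = {beta:.3f}, Y0 = {y0:.3f}")
```

A β above 1 (superlinear scaling) means that output per capita rises with city size. Point (ii) of the dissertation concerns what such a slope means when individual city metrics scatter widely around the fitted line.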

Date Created
  • 2014

How does built environment affect cycling?: evidence from the whole California 2010-2012

Description

The literature has identified a link between the built environment and non-motorized transport. This study aims to contribute to the existing literature on the effects of the built environment on cycling, examining the case of the entire State of California. Physical built environment features are classified into six groups: 1) local density, 2) land-use diversity, 3) road connectivity, 4) bike route length, 5) green space, and 6) job accessibility. Cycling trips over one week are investigated separately for all children, school children, adults, and employed adults. The regression analysis shows that cycling trips are significantly associated with some features of the built environment even when many socio-demographic factors are taken into account. Street intersections and bike route length tend to increase bicycle use, effects that are well aligned with the literature. Moreover, both local and regional job accessibility variables are statistically significant in the two adult models. However, residential density consistently has a significant negative effect on cycling trips, a finding that needs further research to confirm. There is also a gap in the literature on how green space affects cycling, but the results of this study are too unclear to fill it. Through elasticity analysis, this study concludes that street intersections are the most powerful predictor of cycling trips. In addition, the effects of the built environment on cycling at the workplace (or school) are distinguished from those at home. This study implies that planners have a wide range of measures available to control vehicle travel by increasing cycling levels in California.
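An elasticity expresses a predictor's effect as the percentage change in cycling trips per 1% change in that predictor. This makes variables measured in different units comparable. The abstract does not state the model form, so the sketch below assumes a linear model evaluated at the sample means, where the elasticity is β·x̄/ȳ. The coefficients and means are hypothetical placeholders chosen for illustration, not the study's estimates.

```python
def elasticity_at_means(beta, x_mean, y_mean):
    """Point elasticity of a linear model y = a + beta * x, evaluated at the means."""
    return beta * x_mean / y_mean


# Hypothetical placeholder coefficients and sample means (not the study's values).
y_mean = 2.0  # mean weekly cycling trips
predictors = {
    "street_intersections": (0.01, 80.0),
    "bike_route_length_km": (0.05, 6.0),
    "residential_density": (-0.002, 150.0),
}
elasticities = {
    name: elasticity_at_means(beta, x_mean, y_mean)
    for name, (beta, x_mean) in predictors.items()
}
# Rank predictors by the magnitude of their elasticity.
strongest = max(elasticities, key=lambda name: abs(elasticities[name]))
for name, e in sorted(elasticities.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {e:+.3f}")
print("largest |elasticity|:", strongest)
```

If the study instead used a log-link count model (for example, Poisson or negative binomial), the elasticity at the mean would be β·x̄ rather than β·x̄/ȳ.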

Date Created
  • 2015

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasingly important in solving a variety of industry problems. Our initiative involves estimating resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal, we face difficulties such as relevant consumption data stored in different formats and insufficient data about project attributes to interpret that consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the data preprocessing is complete, data mining techniques such as clustering are applied to find projects that involve resources with similar skill sets and that have similar complexity and size. This yields "resource utilization templates" for groups of related projects from a resource consumption perspective. Next, the project characteristics that generate this diversity in headcounts and skill sets are identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects, which represents an opportunity to improve the usefulness of the data collection system in the future. The ultimate goal is to match product technical features with the resource requirements of past projects, as a model for forecasting resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation on the training data, because past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast the resource demand of some future projects.
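When only a few past projects are available, leave-one-out cross-validation is a common way to estimate forecast error. Each project is held out in turn, the regression is refitted on the rest, and the held-out project is predicted. The abstract does not say which cross-validation scheme was used, so leave-one-out is an assumption here. The single predictor (a hypothetical technical-feature score) and the headcount values are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def loocv_mae(xs, ys):
    """Mean absolute error of leave-one-out predictions."""
    errors = []
    for i in range(len(xs)):
        # Train on every project except project i, then forecast project i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (a + b * xs[i])))
    return sum(errors) / len(errors)


# Hypothetical past projects: technical-feature score vs. engineering headcount.
features = [3, 5, 6, 8, 10, 12]
headcount = [4.0, 6.5, 7.0, 9.5, 11.0, 14.0]
print(f"LOOCV mean absolute error: {loocv_mae(features, headcount):.2f} engineers")
```

Leave-one-out uses every project for training except one, which matters when the history is short. The resulting error can then be compared against human experts' estimates, as the abstract describes.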

Date Created
  • 2013