Matching Items (10)
Description
The Experimental Data Processing (EDP) software is a C++ GUI-based application to streamline the process of creating a model for structural systems based on experimental data. EDP is designed to process raw data, filter the data for noise and outliers, create a fitted model to describe that data, complete a probabilistic analysis to describe the variation between replicates of the experimental process, and analyze reliability of a structural system based on that model. To help design the EDP software to perform the full analysis, the probabilistic and regression modeling aspects of this analysis have been explored. The focus has been on creating and analyzing probabilistic models for the data, adding multivariate and nonparametric fits to raw data, and developing computational techniques that allow for these methods to be properly implemented within EDP. For creating a probabilistic model of replicate data, the normal, lognormal, gamma, Weibull, and generalized exponential distributions have been explored. Goodness-of-fit tests, including the chi-squared, Anderson-Darling, and Kolmogorov-Smirnov tests, have been used to analyze the effectiveness of any of these probabilistic models in describing the variation of parameters between replicates of an experimental test. An example using Young's modulus data for a Kevlar-49 Swath stress-strain test was used to demonstrate how this analysis is performed within EDP. To implement the distributions, numerical solutions for the gamma, beta, and hypergeometric functions were implemented, along with an arbitrary precision library to store numbers that exceed the maximum size of a double-precision floating-point value. To create a multivariate fit, the multilinear solution was created as the simplest solution to the multivariate regression problem. This solution was then extended to solve nonlinear problems that can be linearized into multiple separable terms.
These problems were solved analytically with the closed-form solution for the multilinear regression, and then by using a QR decomposition to solve numerically while avoiding numerical instabilities associated with matrix inversion. For nonparametric regression, or smoothing, the loess method was developed as a robust technique for filtering noise while maintaining the general structure of the data points. The loess solution was created by addressing concerns associated with simpler smoothing methods, including the running mean, running line, and kernel smoothing techniques, and combining the ability of each of these methods to resolve those issues. The loess smoothing method involves weighting each point in a partition of the data set, and then adding either a line or a polynomial fit within that partition. Both linear and quadratic methods were applied to a carbon fiber compression test, showing that the quadratic model was more accurate but the linear model had a shape that was more effective for analyzing the experimental data. Finally, the EDP program itself was explored to consider its current functionalities for processing data, as described by shear tests on carbon fiber data, and the future functionalities to be developed. The probabilistic and raw data processing capabilities were demonstrated within EDP, and the multivariate and loess analysis was demonstrated using R. As the functionality and relevant considerations for these methods have been developed, the immediate goal is to finish implementing and integrating these additional features into a version of EDP that performs a full streamlined structural analysis on experimental data.
Contributors: Markov, Elan Richard (Author) / Rajan, Subramaniam (Thesis director) / Khaled, Bilal (Committee member) / Chemical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Ira A. Fulton School of Engineering (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
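
The loess procedure described in the EDP abstract above (weight each point in a local window, then fit a line within that window) can be illustrated with a minimal pure-Python sketch. This is not EDP's implementation: the tricube weight function, the nearest-neighbour window, and the names `tricube`, `loess_linear`, and `span` are assumptions chosen for illustration only.

```python
def tricube(u):
    """Tricube weight: 1 at the centre, falling to 0 at the window edge."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_linear(xs, ys, span=0.5):
    """Smooth ys at every x in xs with locally weighted linear fits.

    span is the fraction of the data used in each local window
    (an illustrative parameter, not taken from the thesis).
    """
    n = len(xs)
    k = min(n, max(3, round(span * n)))  # points per local window
    smoothed = []
    for x0 in xs:
        # Window half-width: distance to the k-th nearest neighbour.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        if h == 0.0:
            # Degenerate window (repeated x values): average those points.
            same = [y for x, y in zip(xs, ys) if x == x0]
            smoothed.append(sum(same) / len(same))
            continue
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        if sxx == 0.0:
            # Only one point carries weight: fall back to the weighted mean.
            smoothed.append(my)
        else:
            # Weighted least-squares line evaluated at x0.
            smoothed.append(my + (sxy / sxx) * (x0 - mx))
    return smoothed
```

A quadratic variant would fit a weighted second-degree polynomial in each window instead of a line; a smaller `span` follows the data more closely, while a larger one smooths more heavily.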
Description
The NFL is one of the largest and most influential industries in the world. In America there are few companies that have a stronger hold on American culture and create such a phenomenon from year to year. This project aimed to develop a strategy that helps an NFL team be as successful as possible by defining which positions are most important to a team's success. Data from fifteen years of NFL games was collected and information on every player in the league was analyzed. First, a benchmark was needed to describe an average team, and then every player in the NFL had to be compared to that average. Based on properties of linear regression using ordinary least squares, this project aims to define a model that shows each position's importance. Finally, once such a model had been established, the focus turned to the NFL draft, where the goal was to find a strategy for where each position needs to be drafted so that it is most likely to give the best payoff based on the results of the regression in part one.
Contributors: Balzer, Kevin Ryan (Author) / Goegan, Brian (Thesis director) / Dassanayake, Maduranga (Committee member) / Barrett, The Honors College (Contributor) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2015-05
Description
Exchange traded funds (ETFs) in many ways are similar to more traditional closed-end mutual funds, although they differ in a crucial way. ETFs rely on a creation and redemption feature to achieve their functionality, and this mechanism is designed to minimize the deviations that occur between the ETF’s listed price and the net asset value of the ETF’s underlying assets. However, while this does cause ETF deviations to be generally lower than their mutual fund counterparts, as our paper explores, this process does not eliminate these deviations completely. This article builds off an earlier paper by Engle and Sarkar (2006) that investigates these properties of premiums (discounts) of ETFs from their fair market value, and looks to see if these premia have changed in the last 10 years. Our paper then diverges from the original and takes a deeper look into the standard deviations of these premia specifically.
Our findings show that over 70% of an ETF’s standard deviation of premia can be explained through a linear combination consisting of two variables: a categorical variable (Domestic [US], Developed, Emerging) and a discrete variable (time difference from the US). This paper also finds that more traditional metrics such as market cap, ETF price volatility, and even third-party market indicators such as the economic freedom index and investment freedom index are insignificant predictors of an ETF’s standard deviation of premia. These findings differ somewhat from existing literature, which indicates that these factors should have a significant impact on the predictive ability of an ETF’s standard deviation of premia.
Contributors: Henning, Thomas Louis (Co-author) / Zhang, Jingbo (Co-author) / Simonson, Mark (Thesis director) / Wendell, Licon (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
In the last decade, the population of honey bees across the globe has declined sharply, leaving scientists and beekeepers to wonder why. Among all nations, the United States has seen some of the greatest declines in the last 10-plus years. Without a definite explanation, the term Colony Collapse Disorder (CCD) was coined to describe the sudden and sharp decline of the honey bee colonies that beekeepers were experiencing. Colony collapses have been rising higher compared to expected averages over the years, and during the winter season losses are even more severe than what is normally acceptable. There are some possible explanations pointing towards meteorological variables, diseases, and even pesticide usage. Despite the cause of CCD being unknown, thousands of beekeepers have reported their losses, and even numbers of infected colonies and colonies under certain stressors in the most recent years. Using the data that was reported to the United States Department of Agriculture (USDA), as well as weather data collected by the National Oceanic and Atmospheric Administration (NOAA) and the National Centers for Environmental Information (NCEI), regression analysis was used to investigate honey bee colonies to find relationships between stressors in honey bee colonies and meteorological variables, and colony collapses during the winter months. The regression analysis focused on the winter season, or quarter 4 of the year, which includes the months of October, November, and December. In the model, the response variable was the percentage of colonies lost in quarter 4. Through the model, it was concluded that certain weather thresholds and the percentage increase of colonies under certain stressors were related to colony loss.
Contributors: Vasquez, Henry Antony (Author) / Zheng, Yi (Thesis director) / Saffell, Erinanne (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
This paper begins by discussing the potential uses and challenges of efficient and accurate traffic forecasting. The data we used includes traffic volume from seven locations on a busy Athens street in April and May of 2000. This data was used as part of a traffic forecasting competition. Our initial observation was that, due to the volatility and oscillating nature of daily traffic volume, simple linear regression models will not perform well in predicting the time-series data. For this we present the Harmonic Time Series model. Such a model (assuming all predictors are significant) will include a sinusoidal term for each time index within a period of data. Our assumption is that traffic volumes have a period of one week (which is evidenced by the graphs reproduced in our paper). This leads to a model that has 6,720 sine and cosine terms. This is clearly too many coefficients, so to avoid over-fitting and obtain an efficient model, we apply the sub-setting algorithm known as Adaptive Lasso.
Contributors: Mora, Juan (Author) / Kamarianakis, Ioannis (Thesis director) / Yu, Wanchunzi (Committee member) / W. P. Carey School of Business (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-05
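
The harmonic model in the traffic forecasting abstract above represents a periodic series as a mean plus sine and cosine pairs. A minimal sketch follows, under a simplifying assumption that is not stated in the thesis: the observations are evenly spaced and cover a whole number of periods, in which case the least-squares coefficients reduce to simple projection sums. The names `fit_harmonics` and `predict` are hypothetical, and the coefficient-selection step (Adaptive Lasso) is not shown.

```python
import math


def fit_harmonics(y, period, n_harmonics):
    """Least-squares fit of a mean plus sine/cosine pairs.

    Valid as written only when len(y) is a whole multiple of `period`
    and n_harmonics < period / 2, so that the sinusoids are orthogonal.
    """
    n = len(y)
    if n % period != 0 or 2 * n_harmonics >= period:
        raise ValueError("need whole periods and n_harmonics < period / 2")
    mean = sum(y) / n
    coefs = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period  # angular frequency of harmonic k
        a = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
        b = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
        coefs.append((a, b))  # (cosine coefficient, sine coefficient)
    return mean, coefs


def predict(t, mean, coefs, period):
    """Evaluate the fitted harmonic model at time index t."""
    total = mean
    for k, (a, b) in enumerate(coefs, start=1):
        w = 2.0 * math.pi * k / period
        total += a * math.cos(w * t) + b * math.sin(w * t)
    return total
```

With unevenly spaced or incomplete periods the projection shortcut no longer holds, and the same sine and cosine columns would instead be passed to a general least-squares or penalized regression routine.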
Description
This paper attempts to introduce analytics and regression techniques into the National Hockey League. Hockey as a sport has been a slow adopter of analytics, and this can be attributed to poor data collection methods. Using data collected from hockeyreference.com and R statistical software, the number of wins a team experiences will be predicted using Goals For and Goals Against statistics from 2005-2017. The model showed statistical significance and strong normality throughout the data. The number of wins each team was expected to experience in 2016-2017 was predicted using the model and then compared to the actual number of games each team won. To further analyze the validity of the model, the expected playoff outcome for 2016-2017 was compared to the observed playoff outcome. The discussion focused on teams that did not fit the model or traditional analytics and expected forecasts. The possible discrepancies were analyzed using the Las Vegas Golden Knights as a case study. Possible next steps for data analysis are presented and the role of future technology and innovation in hockey analytics is discussed and predicted.
Contributors: Vermeer, Brandon Elliot (Author) / Goegan, Brian (Thesis director) / Eaton, John (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
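
A regression of wins on Goals For and Goals Against, as described in the hockey abstract above, can be sketched with ordinary least squares. The thesis used R; the pure-Python version below is only an illustration that solves the normal equations by Gaussian elimination. The helper names are hypothetical, and the numbers are synthetic values generated from an assumed rule, not real NHL data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares with an intercept via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic (goals_for, goals_against) pairs; NOT real NHL data.
teams = [(250, 200), (220, 230), (210, 260),
         (270, 240), (235, 215), (200, 205)]
# Wins generated from an assumed linear rule so the fit can be checked.
wins = [41 + 0.16 * gf - 0.15 * ga for gf, ga in teams]
intercept, b_gf, b_ga = ols(teams, wins)
```

For larger or poorly conditioned problems, a QR-based solver is generally preferred over forming the normal equations directly.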
Description

In collaboration with Moog Broad Reach and Arizona State University, a team of five undergraduate students designed a hardware solution for protecting flash memory data in a space-based radioactive environment. Team Aegis has been working on the research, design, and implementation of a Verilog- and Python-based error correction code using a Reed-Solomon method to identify bit changes of error code. For an additional senior design project, Python code was implemented that runs statistical analysis to identify whether the error correction code is more effective than a triple-redundancy check, as well as to determine whether the presence of errors can be modeled by a regression model.

Contributors: Salls, Demetra Helen (Author) / Kozicki, Michael (Thesis director) / Hodge, Chris (Committee member) / Electrical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
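
The triple-redundancy check that serves as the comparison baseline in the abstract above is commonly implemented as a bitwise majority vote over three stored copies. The sketch below illustrates that general technique only; it is not the team's Verilog or Python code, the Reed-Solomon code itself is not shown, and the names `majority_vote` and `flip_bit` are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three stored copies of the same word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word, position):
    """Simulate a radiation-induced upset by inverting one bit."""
    return word ^ (1 << position)


original = 0b10110010
# One copy is corrupted; the other two outvote it.
recovered = majority_vote(original, flip_bit(original, 3), original)
# Two copies corrupted at the same bit position defeat the vote.
defeated = majority_vote(flip_bit(original, 3),
                         flip_bit(original, 3),
                         original)
```

The vote corrects any single corrupted copy per bit position at the cost of storing the data three times, which is the overhead a Reed-Solomon code aims to reduce.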
Description

This project uses SAS (Statistical Analysis Software) to create a regression model that provides a prediction for which NFL playoff team will win the Super Bowl in a given year.

Contributors: Oleksyn, Alexander (Author) / Schneider, Laurence (Thesis director) / Hansen, Whitney (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2023-05
Description
Historically, per capita water demand has tended to increase proportionately with population growth. However, the last two decades have exhibited a different trend; per capita water usage is declining despite a growing economy and population. Subsequently, city planners and water suppliers have been struggling to understand this new trend and whether it will continue over the coming years. This leads to inefficient water management practices as well as flawed water storage design, both of which have adverse impacts on the economy and environment. Water usage data, provided by the city of Santa Monica, was analyzed using a combination of hydro-climatic and demographic variables to dissect these trends and variation in usage. The data proved to be tremendously difficult to work with; several values were missing or erroneously reported, and additional variables had to be brought from external sources to help explain the variation. Upon completion of the data processing, several statistical techniques including regression and clustering models were built to identify potential correlations and understand the consumers’ behavior. The regression models highlighted temperature and precipitation as significant stimuli of water usage, while the cluster models emphasized high volume consumers and their respective demographic traits. However, the overall model accuracy and fit were very poor due to the inadequate quality of data collection and management. The imprecise measurement process for recording water usage along with varying levels of granularity across the different variables prevented the models from revealing meaningful associations. Moving forward, smart meter technology needs to be considered as it accurately captures real-time water usage and transmits the information to data hubs which then implement predictive analytics to provide updated trends.
This efficient system will allow cities across the nation to stay abreast of future water usage developments and conserve time, resources, and the environment.
Contributors: Pendyala, Kiran Vinaysai (Author) / Garcia, Margaret (Thesis director) / Stufken, John (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description

This investigation evaluates the most effective time series model to forecast the stock price for companies that started trading during the COVID-19 stock market crash. My research involved the analysis of five companies in the technology industry. I was able to create three different machine-learning models for each company. Each model contained various criteria to determine the efficacy of the model. The AIC and SBC are common metrics among autoregressive, autoregressive moving average, and cross-correlation input models. Lower AIC and SBC values indicated better-fitted models. Additionally, I conducted a white-noise test to determine stationarity. This yielded an autocorrelation graph determining whether the data was non-stationary or stationary. This paper is supplemented by project plan, exploratory data analysis, methodology, data, results, and challenges sections. This has relevance in understanding the overall stock market trend when impacted by a global pandemic.

Contributors: Sriram, Ananth (Author) / Schneider, Laurence (Thesis director) / Tran, Samantha (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2023-05
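
The AIC and SBC criteria mentioned in the abstract above both penalize the maximized log-likelihood by the number of fitted parameters, with lower values indicating a better trade-off between fit and complexity. The sketch below uses the standard textbook definitions; it is not the author's code, the function names are hypothetical, and the Gaussian likelihood helper is an added assumption for turning residuals into a log-likelihood.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion (also called BIC): k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood for a list of model residuals."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # ML variance estimate
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
```

Because the SBC penalty grows with the logarithm of the sample size, it favors smaller models than the AIC once the series has more than about eight observations.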