Probabilistic Modeling and Regression Analysis of Experimental Data for Structural Systems

Description

The Experimental Data Processing (EDP) software is a C++ GUI-based application that streamlines the process of creating a model for structural systems based on experimental data. EDP is designed to process raw data, filter the data for noise and outliers, create a fitted model to describe that data, complete a probabilistic analysis to describe the variation between replicates of the experimental process, and analyze the reliability of a structural system based on that model. To help design the EDP software to perform the full analysis, the probabilistic and regression modeling aspects of this analysis have been explored. The focus has been on creating and analyzing probabilistic models for the data, adding multivariate and nonparametric fits to raw data, and developing computational techniques that allow these methods to be properly implemented within EDP. For creating a probabilistic model of replicate data, the normal, lognormal, gamma, Weibull, and generalized exponential distributions have been explored. Goodness-of-fit tests, including the chi-squared, Anderson-Darling, and Kolmogorov-Smirnov tests, have been used to analyze the effectiveness of each of these probabilistic models in describing the variation of parameters between replicates of an experimental test. An example using Young's modulus data for a Kevlar-49 Swath stress-strain test was used to demonstrate how this analysis is performed within EDP. To implement the distributions, numerical solutions for the gamma, beta, and hypergeometric functions were implemented, along with an arbitrary-precision library to store numbers that exceed the maximum size of a double-precision floating-point number. To create a multivariate fit, the multilinear solution was created as the simplest solution to the multivariate regression problem. This solution was then extended to solve nonlinear problems that can be linearized into multiple separable terms.
These problems were solved analytically with the closed-form solution for the multilinear regression, and then numerically by using a QR decomposition, which avoids the numerical instabilities associated with matrix inversion. For nonparametric regression, or smoothing, the loess method was developed as a robust technique for filtering noise while maintaining the general structure of the data points. The loess solution was created by addressing concerns associated with simpler smoothing methods, including the running mean, running line, and kernel smoothing techniques, and combining the ability of each of these methods to resolve those issues. The loess smoothing method involves weighting each point in a partition of the data set, and then adding either a line or a polynomial fit within that partition. Both linear and quadratic methods were applied to a carbon fiber compression test, showing that the quadratic model was more accurate but the linear model had a shape that was more effective for analyzing the experimental data. Finally, the EDP program itself was explored to consider its current functionalities for processing data, as described by shear tests on carbon fiber data, and the future functionalities to be developed. The probabilistic and raw data processing capabilities were demonstrated within EDP, and the multivariate and loess analyses were demonstrated using R. As the functionality and relevant considerations for these methods have been developed, the immediate goal is to finish implementing and integrating these additional features into a version of EDP that performs a full streamlined structural analysis on experimental data.
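The two numerical ideas in this abstract, solving a least-squares fit through a QR decomposition and loess smoothing with weighted local line or polynomial fits, can be illustrated together. The following is only a minimal Python sketch under stated assumptions, not the EDP implementation (which is C++) or the thesis's R code: the function names, the tricube weight function, the nearest-neighbour window, and the `frac` parameter are all assumptions chosen for illustration, and the x values are assumed to be distinct.

```python
import numpy as np

def qr_least_squares(A, b):
    """Solve min ||A @ coef - b|| with a reduced QR factorization.

    This avoids forming and inverting A.T @ A explicitly.
    """
    Q, R = np.linalg.qr(A)               # A = Q @ R, with R upper triangular
    return np.linalg.solve(R, Q.T @ b)

def loess(x, y, frac=0.5, degree=1):
    """Smooth y(x) with locally weighted polynomial fits (degree 1 or 2).

    Assumption: x values are distinct. `frac` is the share of the data
    used in each local window (a hypothetical parameter name).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Use at least degree + 3 points so the local fit stays determined
    # even when the farthest points in the window receive zero weight.
    k = min(n, max(degree + 3, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]                # k nearest neighbours of x[i]
        h = dist[idx].max()                       # window half-width
        w = (1.0 - (dist[idx] / h) ** 3) ** 3     # tricube weights in [0, 1]
        sw = np.sqrt(w)
        # Local polynomial basis centred at x[i]: 1, (x - x_i), (x - x_i)^2
        A = np.vander(x[idx] - x[i], degree + 1, increasing=True)
        coef = qr_least_squares(A * sw[:, None], y[idx] * sw)
        fitted[i] = coef[0]                       # local fit evaluated at x[i]
    return fitted

# Illustrative synthetic data only (not from the thesis).
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 1.0, 50)
y_demo = np.sin(2.0 * np.pi * x_demo) + 0.1 * rng.standard_normal(50)
smooth_linear = loess(x_demo, y_demo, frac=0.3, degree=1)
smooth_quadratic = loess(x_demo, y_demo, frac=0.3, degree=2)
```

Multiplying each row by the square root of its weight turns the weighted problem into an ordinary least-squares problem, so the same QR solver serves both the multilinear fit and each local loess fit.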

Contributors: Markov, Elan Richard (Author) / Rajan, Subramaniam (Thesis director) / Khaled, Bilal (Committee member) / Chemical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Ira A. Fulton School of Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Player Optimization in the National Football League: Creating a Winning Franchise

Description

The NFL is one of the largest and most influential industries in the world. In America, few companies have a stronger hold on the culture or create such a phenomenon from year to year. This project aimed to develop a strategy that helps an NFL team be as successful as possible by defining which positions are most important to a team's success. Data from fifteen years of NFL games was collected, and information on every player in the league was analyzed. First, a benchmark was needed to describe an average team, so that every player in the NFL could be compared to that average. Based on properties of linear regression using ordinary least squares, this project aims to define a model that shows each position's importance. Finally, once such a model had been established, the focus turned to the NFL draft, where the goal was to find a strategy for where each position should be drafted so that it is most likely to give the best payoff, based on the results of the regression in part one.

Contributors: Balzer, Kevin Ryan (Author) / Goegan, Brian (Thesis director) / Dassanayake, Maduranga (Committee member) / Barrett, The Honors College (Contributor) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Using Natural Diversity of Quorum Sensing to Expand the Synthetic Biology Toolbox

Description

Currently in synthetic biology, only the Las, Lux, and Rhl quorum sensing pathways have been adapted for broad engineering use. Quorum sensing provides a means of cell-to-cell communication in which a designated sender cell produces quorum sensing molecules that modify gene expression of a designated receiver cell. While useful, these three quorum sensing pathways exhibit a nontrivial level of crosstalk, hindering robust engineering and leading to unexpected effects in a given design. To address the lack of orthogonality among these three quorum sensing pathways, previous scientists have attempted to perform directed evolution on components of the quorum sensing pathway. While a powerful tool, directed evolution is limited by the subspace that is defined by the protein. For this reason, we take an evolutionary biology approach to identify new orthogonal quorum sensing networks and test these networks for crosstalk with currently used networks. By charting characteristics of acyl homoserine lactone (AHL) molecules used across quorum sensing pathways in nature, we have identified favorable candidate pathways likely to display orthogonality. These include Aub, Bja, Bra, Cer, Esa, Las, Lux, Rhl, Rpa, and Sin, which we have begun constructing and testing. Our synthetic circuits express GFP in response to a quorum sensing molecule, allowing quantitative measurement of orthogonality between pairs. By determining orthogonal quorum sensing pairs, we hope to identify and adapt novel quorum sensing pathways for robust use in higher-order genetic circuits.

Contributors: Muller, Ryan (Author) / Haynes, Karmella (Thesis director) / Wang, Xiao (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Department of Chemistry and Biochemistry (Contributor) / School of Life Sciences (Contributor)

Created: 2015-05

Exploring the Relation Between NAV and Price of ETFs in Financial Markets

Description

Exchange traded funds (ETFs) are in many ways similar to more traditional closed-end mutual funds, although they differ in a crucial way. ETFs rely on a creation and redemption feature to achieve their functionality, and this mechanism is designed to minimize the deviations that occur between the ETF’s listed price and the net asset value of the ETF’s underlying assets. However, while this does cause ETF deviations to be generally lower than those of their mutual fund counterparts, as our paper explores, this process does not eliminate these deviations completely. This article builds off an earlier paper by Engle and Sarkar (2006) that investigates these properties of premiums (discounts) of ETFs from their fair market value, and looks to see if these premia have changed in the last 10 years. Our paper then diverges from the original and takes a deeper look into the standard deviations of these premia specifically.
Our findings show that over 70% of an ETF’s standard deviation of premia can be explained through a linear combination consisting of two variables: a categorical variable (Domestic [US], Developed, Emerging) and a discrete variable (time difference from US). This paper also finds that more traditional metrics such as market cap, ETF price volatility, and even third-party market indicators such as the economic freedom index and investment freedom index are insignificant predictors of an ETF’s standard deviation of premia. These findings differ somewhat from existing literature, which indicates that these factors should have a significant impact on the predictive ability of an ETF’s standard deviation of premia.
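To make the phrase "explained through a linear combination" concrete, the sketch below shows one way such a regression could be set up: the categorical variable is dummy-coded, the time difference enters as a numeric column, and the share of variance explained is the R-squared of an ordinary least squares fit. This is an illustrative assumption about the setup, not the authors' code; the function name, the baseline choice, and all numbers are made up for the example and are not data from the paper.

```python
import numpy as np

def fit_with_category(region, hours_from_us, y):
    """OLS of y on a dummy-coded category plus one numeric predictor.

    Returns (coefficients, r_squared). The alphabetically first category
    level is used as the baseline (an arbitrary illustrative choice).
    """
    y = np.asarray(y, dtype=float)
    levels = sorted(set(region))
    columns = [np.ones(len(y))]                       # intercept
    for level in levels[1:]:                          # one dummy per non-baseline level
        columns.append(np.array([1.0 if r == level else 0.0 for r in region]))
    columns.append(np.asarray(hours_from_us, dtype=float))
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

# Made-up illustrative values only (NOT data from the paper).
region_demo = ["Domestic", "Domestic", "Developed", "Developed", "Emerging", "Emerging"]
hours_demo = [0, 0, 5, 8, 10, 13]
sd_premia_demo = [0.11, 0.09, 0.34, 0.40, 0.71, 0.70]
beta_demo, r2_demo = fit_with_category(region_demo, hours_demo, sd_premia_demo)
```

An R-squared above 0.70 from a fit of this shape is what a statement like "over 70% explained" would correspond to.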

Contributors: Henning, Thomas Louis (Co-author) / Zhang, Jingbo (Co-author) / Simonson, Mark (Thesis director) / Wendell, Licon (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Regression Analysis on Colony Collapse Disorder in the United States

Description

In the last decade, the population of honey bees across the globe has declined sharply, leaving scientists and beekeepers wondering why. Among all nations, the United States has seen some of the greatest declines in the last 10-plus years. Without a definite explanation, the term Colony Collapse Disorder (CCD) was coined to describe the sudden and sharp decline of the honey bee colonies that beekeepers were experiencing. Colony collapses have risen above expected averages over the years, and during the winter season losses are even more severe than what is normally acceptable. There are some possible explanations pointing towards meteorological variables, diseases, and even pesticide usage. Although the cause of CCD is unknown, thousands of beekeepers have reported their losses, and in the most recent years even the numbers of infected colonies and colonies under certain stressors. Using the data reported to the United States Department of Agriculture (USDA), as well as weather data collected by the National Oceanic and Atmospheric Administration (NOAA) and the National Centers for Environmental Information (NCEI), regression analysis was used to investigate relationships between stressors in honey bee colonies, meteorological variables, and colony collapses during the winter months. The regression analysis focused on the winter season, or quarter 4 of the year, which includes the months of October, November, and December. In the model, the response variable was the percentage of colonies lost in quarter 4. Through the model, it was concluded that certain weather thresholds and the percentage increase of colonies under certain stressors were related to colony loss.

Contributors: Vasquez, Henry Antony (Author) / Zheng, Yi (Thesis director) / Saffell, Erinanne (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

The Adaptive Lasso Procedure for Building a Traffic Forecasting Model

Description

This paper begins by discussing the potential uses and challenges of efficient and accurate traffic forecasting. The data we used includes traffic volume from seven locations on a busy Athens street in April and May of 2000. This data was used as part of a traffic forecasting competition. Our initial observation was that, due to the volatility and oscillating nature of daily traffic volume, simple linear regression models will not perform well in predicting the time-series data. For this reason we present the Harmonic Time Series model. Such a model (assuming all predictors are significant) will include a sinusoidal term for each time index within a period of data. Our assumption is that traffic volumes have a period of one week (which is evidenced by the graphs reproduced in our paper). This leads to a model that has 6,720 sine and cosine terms. This is clearly too many coefficients, so in an effort to avoid over-fitting and to obtain an efficient model, we apply the sub-setting algorithm known as Adaptive Lasso.
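The combination described here, a harmonic regression with many sine and cosine columns followed by a subset-selection step, can be sketched at toy scale. The code below is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the coordinate-descent solver, the OLS-based adaptive weights, and the values of `lam` and `gamma` are all assumptions, and the signal is synthetic rather than the Athens traffic data.

```python
import numpy as np

def harmonic_design(t, period, n_harmonics):
    """Columns sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..n_harmonics."""
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

def adaptive_lasso(X, y, lam, gamma=1.0, n_sweeps=200, eps=1e-8):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_j |b_j| / |b_ols_j|^gamma.

    Solved by cyclic coordinate descent with soft-thresholding. Coefficients
    that OLS already estimates as small receive a large penalty and are
    driven exactly to zero.
    """
    n, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    penalty = lam / (np.abs(beta_ols) + eps) ** gamma   # per-coefficient threshold
    col_scale = (X ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with the contribution of column j added back in.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - penalty[j], 0.0) / col_scale[j]
    return beta

# Synthetic example: a period of 24 samples with only the first harmonic present.
t_demo = np.arange(96)
X_demo = harmonic_design(t_demo, period=24, n_harmonics=3)
rng = np.random.default_rng(0)
y_demo = 3.0 * np.sin(2.0 * np.pi * t_demo / 24) + 0.1 * rng.standard_normal(96)
beta_demo = adaptive_lasso(X_demo, y_demo, lam=0.05)
```

In this toy setup the columns for harmonics that are absent from the signal are shrunk to zero, which is the behaviour one would want when pruning a model with thousands of sinusoidal terms.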

Contributors: Mora, Juan (Author) / Kamarianakis, Ioannis (Thesis director) / Yu, Wanchunzi (Committee member) / W. P. Carey School of Business (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05

Analytics in the National Hockey League (NHL): A Regression of Goals For, Goals Against and Wins from 2005-2017

Description

This paper attempts to introduce analytics and regression techniques into the National Hockey League. Hockey as a sport has been a slow adopter of analytics, and this can be attributed to poor data collection methods. Using data collected from hockeyreference.com and R statistical software, the number of wins a team experiences is predicted using Goals For and Goals Against statistics from 2005-2017. The model showed statistical significance and strong normality throughout the data. The number of wins each team was expected to experience in 2016-2017 was predicted using the model and then compared to the actual number of games each team won. To further analyze the validity of the model, the expected playoff outcome for 2016-2017 was compared to the observed playoff outcome. The discussion focused on teams that did not fit the model or traditional analytics and expected forecasts. The possible discrepancies were analyzed using the Las Vegas Golden Knights as a case study. Possible next steps for data analysis are presented, and the role of future technology and innovation in hockey analytics is discussed and predicted.

Contributors: Vermeer, Brandon Elliot (Author) / Goegan, Brian (Thesis director) / Eaton, John (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Accuracy of Error Correction Code and Regression Analysis within a Python Software

Description

In collaboration with Moog Broad Reach and Arizona State University, a team of five undergraduate students designed a hardware solution for protecting flash memory data in a space-based radioactive environment. Team Aegis has been working on the research, design, and implementation of a Verilog- and Python-based error correction code using a Reed-Solomon method to identify bit changes of error code. For an additional senior design project, Python code was implemented that runs statistical analysis to identify whether the error correction code is more effective than a triple-redundancy check, as well as to determine whether the presence of errors can be modeled by a regression model.
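For readers unfamiliar with the baseline named here, a triple-redundancy check stores three copies of the data and recovers each bit by majority vote. The sketch below illustrates only that baseline in Python; it is an assumption about the general technique, not Team Aegis's code, and it does not reproduce their Reed-Solomon implementation or their statistical comparison. The function name is hypothetical.

```python
def majority_vote(copy_a: bytes, copy_b: bytes, copy_c: bytes) -> bytes:
    """Recover data from three stored copies by bitwise majority vote.

    Each output bit is 1 when at least two of the three copies have a 1
    in that position, so a bit flip confined to one copy is corrected.
    """
    if not (len(copy_a) == len(copy_b) == len(copy_c)):
        raise ValueError("all three copies must have the same length")
    return bytes(
        (a & b) | (a & c) | (b & c)
        for a, b, c in zip(copy_a, copy_b, copy_c)
    )

# One copy suffers a single bit flip; the vote restores the original.
original = bytes([0x5A, 0xFF, 0x00])
flipped = bytes([0x5A ^ 0x08, 0xFF, 0x00])
recovered = majority_vote(original, flipped, original)
```

The scheme fails when the same bit is corrupted in two of the three copies, which is one reason a comparison against a stronger code such as Reed-Solomon is of interest.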

Contributors: Salls, Demetra Helen (Author) / Kozicki, Michael (Thesis director) / Hodge, Chris (Committee member) / Electrical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Using Data to Predict the Winner of the 2023 Super Bowl

Description

This project uses SAS (Statistical Analysis Software) to create a regression model that provides a prediction for which NFL playoff team will win the Super Bowl in a given year.

Contributors: Oleksyn, Alexander (Author) / Schneider, Laurence (Thesis director) / Hansen, Whitney (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2023-05

A Time Series Analysis of Companies that had their Initial Public Offering at the Brink of the Coronavirus Pandemic

Description

This investigation evaluates the most effective time series model to forecast the stock price for companies that started trading during the COVID-19 stock market crash. My research involved the analysis of five companies in the technology industry. I created three different machine-learning models for each company. Each model was assessed with several criteria to determine its efficacy. The AIC and SBC are common metrics among autoregressive, autoregressive moving average, and cross-correlation input models. Lower AIC and SBC values indicated better-fitted models. Additionally, I conducted a white-noise test to determine stationarity. This yielded an autocorrelation graph indicating whether the data was non-stationary or stationary. This paper is supplemented by project plan, exploratory data analysis, methodology, data, results, and challenges sections. This has relevance in understanding the overall stock market trend when impacted by a global pandemic.
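As a reminder of what the two criteria measure, the sketch below computes AIC and SBC (also called BIC) from a model's maximized log-likelihood using the standard textbook definitions. It is an illustration only: the function names are hypothetical, the example numbers are made up rather than taken from the thesis, and statistical packages may report values that differ from these by constants or scaling.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood

def sbc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz Bayesian criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Made-up example: two candidate models fitted to the same 50 observations.
simple_model = (aic(-100.0, 3), sbc(-100.0, 3, 50))
larger_model = (aic(-99.5, 6), sbc(-99.5, 6, 50))
# The larger model fits slightly better but pays a bigger parameter penalty,
# so the simple model has the lower (preferred) value on both criteria.
```

Both criteria reward a higher likelihood and penalize extra parameters; SBC penalizes parameters more heavily than AIC once the sample has more than about seven observations.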

Contributors: Sriram, Ananth (Author) / Schneider, Laurence (Thesis director) / Tran, Samantha (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2023-05
