Matching Items (9)

Analysis of Santa Monica Water Usage Data for Water Conservation

Description

Historically, per capita water demand has tended to increase proportionately with population growth. However, the last two decades have exhibited a different trend: per capita water usage is declining despite a growing economy and population. Consequently, city planners and water suppliers have been struggling to understand this new trend and whether it will continue over the coming years. This uncertainty leads to inefficient water management practices as well as flawed water storage design, both of which have adverse impacts on the economy and environment. Water usage data, provided by the city of Santa Monica, were analyzed using a combination of hydro-climatic and demographic variables to dissect these trends and the variation in usage. The data proved to be tremendously difficult to work with; several values were missing or erroneously reported, and additional variables had to be brought in from external sources to help explain the variation. Upon completion of the data processing, several statistical models, including regression and clustering models, were built to identify potential correlations and understand consumers’ behavior. The regression models highlighted temperature and precipitation as significant drivers of water usage, while the cluster models emphasized high-volume consumers and their respective demographic traits. However, the overall accuracy and fit of the models were very poor due to the inadequate quality of data collection and management. The imprecise measurement process for recording water usage, along with varying levels of granularity across the different variables, prevented the models from revealing meaningful associations. Moving forward, smart meter technology needs to be considered, as it accurately captures real-time water usage and transmits the information to data hubs, which then apply predictive analytics to provide updated trends. This efficient system will allow cities across the nation to stay abreast of future water usage developments and conserve time, resources, and the environment.

Date Created
2019-05

Exploring the Relation Between NAV and Price of ETFs in Financial Markets

Description

Exchange-traded funds (ETFs) are in many ways similar to more traditional closed-end mutual
funds, although they differ in a crucial way. ETFs rely on a creation and redemption feature to
achieve their functionality, and this mechanism is designed to minimize the deviations that occur
between the ETF’s listed price and the net asset value (NAV) of the ETF’s underlying assets. However,
while this does cause ETF deviations to be generally lower than those of their mutual fund counterparts,
as our paper explores, this process does not eliminate these deviations completely. This article
builds on an earlier paper by Engle and Sarkar (2006) that investigates the properties of
premiums (discounts) of ETFs relative to their fair market value, and examines whether these premia
have changed in the last 10 years. Our paper then diverges from the original and takes a deeper
look at the standard deviations of these premia specifically.
Our findings show that over 70% of an ETF’s standard deviation of premia can be
explained by a linear combination of two variables: a categorical variable (Domestic [US],
Developed, Emerging) and a discrete variable (time difference from the US). This paper also finds
that more traditional metrics such as market cap and ETF price volatility, and even third-party market
indicators such as the economic freedom index and investment freedom index, are insignificant
predictors of an ETF’s standard deviation of premia. These findings differ somewhat from the
existing literature, which indicates that these factors should have a significant impact on
an ETF’s standard deviation of premia.
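To make the quantity under study concrete, here is a minimal sketch of how a premium series and its standard deviation might be computed. It assumes the premium is defined as the relative gap between price and NAV, (price - NAV) / NAV; the abstract does not state the exact definition used, and the function name and the price and NAV figures below are hypothetical illustration values, not data from the paper.

```python
from statistics import stdev


def premia(prices, navs):
    """Relative premium (positive) or discount (negative) of price over NAV.

    Assumed definition: (price - NAV) / NAV for each trading day.
    """
    return [(price - nav) / nav for price, nav in zip(prices, navs)]


# Hypothetical daily closing prices and NAVs for a single ETF.
prices = [100.2, 99.9, 100.5, 100.0]
navs = [100.0, 100.0, 100.0, 100.0]

daily_premia = premia(prices, navs)

# Sample standard deviation of the premia: the kind of quantity the
# abstract describes regressing on region and time difference.
premia_sd = stdev(daily_premia)
```

A per-fund value such as `premia_sd` would then serve as the response in a regression on the categorical region and the time difference from the US.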

Date Created
2019-05

Analytics in the National Hockey League (NHL): A Regression of Goals For, Goals Against and Wins from 2005-2017

Description

This paper attempts to introduce analytics and regression techniques into the National Hockey League. Hockey as a sport has been a slow adopter of analytics, and this can be attributed to poor data collection methods. Using data collected from hockeyreference.com and R statistical software, the number of wins a team achieves is predicted using Goals For and Goals Against statistics from 2005-2017. The model showed statistical significance and strong normality throughout the data. The number of wins each team was expected to achieve in 2016-2017 was predicted using the model and then compared to the actual number of games each team won. To further analyze the validity of the model, the expected playoff outcome for 2016-2017 was compared to the observed playoff outcome. The discussion focused on teams that did not fit the model or traditional analytics and expected forecasts. The possible discrepancies were analyzed using the Las Vegas Golden Knights as a case study. Possible next steps for data analysis are presented, and the role of future technology and innovation in hockey analytics is discussed and predicted.

Date Created
2018-05

Regression Analysis on Colony Collapse Disorder in the United States

Description

In the last decade, the population of honey bees across the globe has declined sharply, leaving scientists and beekeepers to wonder why. Among all nations, the United States has seen some of the greatest declines in the last 10-plus years. Without a definite explanation, the term Colony Collapse Disorder (CCD) was coined to describe the sudden and sharp decline of the honey bee colonies that beekeepers were experiencing. Colony collapses have been exceeding expected averages over the years, and during the winter season losses are even more severe than what is normally acceptable. Possible explanations point towards meteorological variables, diseases, and even pesticide usage. Although the cause of CCD is unknown, thousands of beekeepers have reported their losses and, in the most recent years, even the numbers of infected colonies and colonies under certain stressors. Using the data reported to the United States Department of Agriculture (USDA), as well as weather data collected by the National Oceanic and Atmospheric Administration (NOAA) and the National Centers for Environmental Information (NCEI), regression analysis was used to find relationships between stressors in honey bee colonies, meteorological variables, and colony collapses during the winter months. The regression analysis focused on the winter season, or quarter 4 of the year, which includes the months of October, November, and December. In the model, the response variable was the percentage of colonies lost in quarter 4. Through the model, it was concluded that certain weather thresholds and the percentage increase of colonies under certain stressors were related to colony loss.

Date Created
2018-05

The Adaptive Lasso Procedure for Building a Traffic Forecasting Model

Description

This paper begins by discussing the potential uses and challenges of efficient and accurate traffic forecasting. The data we used include traffic volume from seven locations on a busy Athens street in April and May of 2000. These data were used as part of a traffic forecasting competition. Our initial observation was that, due to the volatility and oscillating nature of daily traffic volume, simple linear regression models will not perform well in predicting the time-series data. To address this, we present the Harmonic Time Series model. Such a model (assuming all predictors are significant) will include a sinusoidal term for each time index within a period of data. Our assumption is that traffic volumes have a period of one week (which is evidenced by the graphs reproduced in our paper). This leads to a model that has 6,720 sine and cosine terms. This is clearly too many coefficients, so in an effort to avoid over-fitting and to obtain an efficient model, we apply the subset selection algorithm known as the Adaptive Lasso.
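As a rough illustration of the harmonic design described above, the following sketch builds one row of sine and cosine predictors for a given time index. It is an assumption-laden toy rather than the paper's model: the function name, the hourly sampling, and the choice of three harmonics are hypothetical. The only figure taken from the abstract is the total of 6,720 terms, which corresponds to 3,360 sine/cosine pairs. The Adaptive Lasso selection step itself is not shown.

```python
import math


def harmonic_features(t, period, n_harmonics):
    """Return sin/cos predictor pairs for harmonics 1..n_harmonics at time index t."""
    row = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row.append(math.sin(angle))
        row.append(math.cos(angle))
    return row


# Hypothetical setup: one week of hourly readings, three harmonics.
PERIOD = 7 * 24
row_now = harmonic_features(5, PERIOD, n_harmonics=3)
row_next_week = harmonic_features(5 + PERIOD, PERIOD, n_harmonics=3)

# Each harmonic contributes two columns, so 3,360 harmonics give 6,720 terms.
n_terms = len(harmonic_features(0, 6720, 3360))
```

Because every predictor repeats after one period, `row_now` and `row_next_week` agree up to floating-point error, which is what encodes the weekly cycle. A penalized selection method is then what reduces thousands of such columns to a usable subset.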

Date Created
2017-05

Innovative strategies used to teach mathematics: A look at educators and classrooms across six countries

Description

Mathematics is an increasingly critical subject, and the achievement of students in mathematics has been the focus of many recent reports and studies. However, few studies exist that both observe and discuss the specific teaching and assessment techniques employed in classrooms across multiple countries. The focus of this study is to look at classrooms and educators across six high-achieving countries to identify and compare the teaching strategies being used. In Finland, Hong Kong, Japan, New Zealand, Singapore, and Switzerland, twenty educators were interviewed and fourteen educators were observed teaching. Themes were first identified by comparing individual teacher responses within each country. These themes were then grouped together across countries, and eight emerging patterns were identified. These strategies include students' active involvement in the classroom, students being given written feedback on assessments, students' involvement in thoughtful discussion about mathematical concepts, students solving and explaining mathematics problems at the board, students exploring mathematical concepts either before or after being taught the material, students' engagement in practical applications, students making connections between concepts, and students having confidence in their ability to understand mathematics. The strategies identified across these six high-achieving countries can inform educators in their efforts to increase student understanding of mathematical concepts and lead to an improvement in mathematics performance.

Date Created
2014-12

Probabilistic Modeling and Regression Analysis of Experimental Data for Structural Systems

Description

The Experimental Data Processing (EDP) software is a C++ GUI-based application that streamlines the process of creating a model for structural systems based on experimental data. EDP is designed to process raw data, filter the data for noise and outliers, create a fitted model to describe that data, complete a probabilistic analysis to describe the variation between replicates of the experimental process, and analyze the reliability of a structural system based on that model. To help design the EDP software to perform the full analysis, the probabilistic and regression modeling aspects of this analysis have been explored. The focus has been on creating and analyzing probabilistic models for the data, adding multivariate and nonparametric fits to raw data, and developing computational techniques that allow these methods to be properly implemented within EDP. For creating a probabilistic model of replicate data, the normal, lognormal, gamma, Weibull, and generalized exponential distributions have been explored. Goodness-of-fit tests, including the chi-squared, Anderson-Darling, and Kolmogorov-Smirnov tests, have been used to analyze the effectiveness of each of these probabilistic models in describing the variation of parameters between replicates of an experimental test. An example using Young's modulus data for a Kevlar-49 Swath stress-strain test was used to demonstrate how this analysis is performed within EDP. To implement the distributions, numerical solutions for the gamma, beta, and hypergeometric functions were implemented, along with an arbitrary-precision library to store numbers that exceed the range of double-precision floating-point values. To create a multivariate fit, the multilinear solution was created as the simplest solution to the multivariate regression problem. This solution was then extended to solve nonlinear problems that can be linearized into multiple separable terms. These problems were solved analytically with the closed-form solution for the multilinear regression, and then numerically by using a QR decomposition, which avoids the numerical instabilities associated with matrix inversion. For nonparametric regression, or smoothing, the loess method was developed as a robust technique for filtering noise while maintaining the general structure of the data points. The loess solution was created by addressing concerns associated with simpler smoothing methods, including the running mean, running line, and kernel smoothing techniques, and combining the ability of each of these methods to resolve those issues. The loess smoothing method involves weighting each point in a partition of the data set, and then adding either a line or a polynomial fit within that partition. Both linear and quadratic methods were applied to a carbon fiber compression test, showing that the quadratic model was more accurate but the linear model had a shape that was more effective for analyzing the experimental data. Finally, the EDP program itself was explored to consider its current functionalities for processing data, as illustrated by shear tests on carbon fiber data, and the future functionalities to be developed. The probabilistic and raw data processing capabilities were demonstrated within EDP, and the multivariate and loess analysis was demonstrated using R. As the functionality and relevant considerations for these methods have been developed, the immediate goal is to finish implementing and integrating these additional features into a version of EDP that performs a full streamlined structural analysis on experimental data.
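The QR-based route to the multilinear fit mentioned above can be sketched in a few lines. This is a minimal illustration, not the EDP implementation (which is in C++): it assumes a full-column-rank design matrix, uses modified Gram-Schmidt for the factorization, and the function names and sample data are hypothetical. The point is that the coefficients come from back-substitution on the triangular system R·beta = Qᵀy, so no matrix is ever inverted.

```python
import math


def qr_decompose(A):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Returns Q as a list of n orthonormal columns and R as an n x n
    upper-triangular matrix. Assumes A has full column rank.
    """
    m, n = len(A), len(A[0])
    columns = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = columns[j][:]
        for k in range(j):
            # Remove the component of v along each earlier orthonormal column.
            R[k][j] = sum(Q[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R


def solve_least_squares(A, y):
    """Least-squares coefficients via QR and back-substitution."""
    Q, R = qr_decompose(A)
    n = len(R)
    qty = [sum(q[i] * y[i] for i in range(len(y))) for q in Q]
    beta = [0.0] * n
    for j in reversed(range(n)):
        tail = sum(R[j][k] * beta[k] for k in range(j + 1, n))
        beta[j] = (qty[j] - tail) / R[j][j]
    return beta


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
# Columns of A: intercept, x1, x2.
A = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
beta = solve_least_squares(A, y)
```

Since the sample data lie exactly on the plane, `beta` recovers the coefficients 1, 2, and 3 up to rounding. With noisy data the same routine returns the least-squares estimate.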

Date Created
2016-05

Player Optimization in the National Football League: Creating a Winning Franchise

Description

The NFL is one of the largest and most influential industries in the world. In America, there are few companies that have a stronger hold on American culture and create such a phenomenon from year to year. This project aimed to develop a strategy that helps an NFL team be as successful as possible by defining which positions are most important to a team's success. Data from fifteen years of NFL games were collected, and information on every player in the league was analyzed. First, a benchmark was needed to describe an average team, against which every player in the NFL could then be compared. Based on properties of linear regression using ordinary least squares, this project aims to define a model that shows each position's importance. Finally, once such a model had been established, the focus turned to the NFL draft, where the goal was to find a strategy for where each position should be drafted so that it is most likely to give the best payoff based on the results of the regression in part one.

Date Created
2015-05

Accuracy of Error Correction Code and Regression Analysis within a Python Software

Description

In collaboration with Moog Broad Reach and Arizona State University, a team of five undergraduate students designed a hardware solution for protecting flash memory data in a space-based radioactive environment. Team Aegis has been working on the research, design, and implementation of a Verilog- and Python-based error correction code using a Reed-Solomon method to identify bit changes of error code. For an additional senior design project, Python code was implemented that runs statistical analysis to identify whether the error correction code is more effective than a triple-redundancy check, as well as to determine whether the presence of errors can be modeled by a regression model.
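For readers unfamiliar with the baseline being compared against, a triple-redundancy check can be sketched as a bitwise majority vote over three stored copies. This is a generic illustration under stated assumptions, not Team Aegis's code: the function name and byte values are hypothetical, and the Reed-Solomon code itself is not shown.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: each output bit is the value
    held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical stored byte and three copies, two of which have suffered
# a single bit flip in different positions.
stored = 0b10110010
copy_a = stored
copy_b = stored ^ 0b00000100
copy_c = stored ^ 0b01000000
recovered = majority_vote(copy_a, copy_b, copy_c)

# Failure mode: the same bit flipped in two copies outvotes the good copy.
bad_b = stored ^ 0b00000001
bad_c = stored ^ 0b00000001
miscorrected = majority_vote(stored, bad_b, bad_c)
```

Here `recovered` equals `stored`, while `miscorrected` does not. That contrast is the kind of behavior a statistical comparison between triple redundancy and an error correction code would need to quantify.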

Date Created
2021-05