Matching Items (23)

Predicting Mechanical Failure of Vacuum Pumps Using Accelerometer Data

Description

The objective of this paper is to find and describe trends in fast-Fourier-transformed accelerometer data that can be used to predict the mechanical failure of the large vacuum pumps used in industrial settings such as drinking-water supply. Using three-dimensional plots of the data, this paper suggests how a model can be developed to predict the mechanical failure of vacuum pumps.
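As a rough illustration of the kind of preprocessing the paper describes, the sketch below computes the frequency spectrum of a vibration trace and extracts its dominant peaks, which could then be tracked over time as a failure indicator. The sampling rate, signal, and function names are assumptions for illustration, not the thesis's actual pipeline.

```python
import numpy as np

def dominant_frequencies(signal, sample_rate_hz, n_peaks=3):
    """Return the strongest frequency components of a vibration signal."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    # Pick the n_peaks largest spectral magnitudes and report (frequency, magnitude).
    top = np.argsort(spectrum)[::-1][:n_peaks]
    return sorted(zip(freqs[top], spectrum[top]))

# Hypothetical accelerometer trace: 10 s sampled at 1 kHz with a 60 Hz vibration component.
t = np.linspace(0, 10, 10_000, endpoint=False)
trace = np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(t.size)
print(dominant_frequencies(trace, sample_rate_hz=1000))
```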

Date Created
2019-05

Exploring the Relation Between NAV and Price of ETFs in Financial Markets

Description

Exchange-traded funds (ETFs) are in many ways similar to more traditional closed-end mutual funds, although they differ in a crucial way. ETFs rely on a creation and redemption feature to achieve their functionality, and this mechanism is designed to minimize the deviations that occur between an ETF's listed price and the net asset value of its underlying assets. However, while this does cause ETF deviations to be generally lower than those of their mutual fund counterparts, this process does not eliminate the deviations completely, as our paper explores. This article builds off an earlier paper by Engle and Sarkar (2006) that investigates these properties of premiums (discounts) of ETFs from their fair market value, and looks to see whether these premia have changed in the last 10 years. Our paper then diverges from the original and takes a deeper look into the standard deviations of these premia specifically.

Our findings show that over 70% of an ETF's standard deviation of premia can be explained by a linear combination of two variables: a categorical variable (Domestic [US], Developed, Emerging) and a discrete variable (time difference from the US). This paper also finds that more traditional metrics such as market cap and ETF price volatility, and even third-party market indicators such as the economic freedom index and the investment freedom index, are insignificant predictors of an ETF's standard deviation of premia when combined with the categorical variable. These findings differ somewhat from the existing literature, which indicates that these factors should have significant predictive power for an ETF's standard deviation of premia.
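A minimal sketch of the kind of two-variable model described above, assuming hypothetical column names and made-up values; the actual dataset and estimation details come from the thesis, not this snippet.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per ETF with its premium standard deviation,
# market category, and time difference (in hours) from US trading hours.
etfs = pd.DataFrame({
    "premium_sd":  [0.12, 0.15, 0.45, 0.52, 0.98, 1.10],
    "category":    ["Domestic", "Domestic", "Developed",
                    "Developed", "Emerging", "Emerging"],
    "time_diff_h": [0, 0, 6, 8, 10, 12],
})

# Linear model: premium_sd ~ categorical market type + time difference from the US.
model = smf.ols("premium_sd ~ C(category) + time_diff_h", data=etfs).fit()
print(model.rsquared)   # share of variance explained by the two predictors
print(model.params)
```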

Date Created
2019-05

Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. A new column is created to measure the price difference, allowing a more accurate analysis of the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchain, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions, and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 0.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 0.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, against a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. Out-of-sample modeling is more difficult, but in some cases performs better than is typical for financial data. It should also be noted that cryptocurrency data has similar properties to other related financial datasets, suggesting future potential for modeling cryptocurrency systems within the financial world.
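The sketch below mirrors the random forests step on synthetic stand-in data: fit an ensemble on eight features and report the correlation between model predictions and observed values, in-sample and out-of-sample. The features and targets here are simulated, not the blockchain data used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the eight blockchain features listed above.
X = rng.normal(size=(500, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=500)   # simulated price change

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 400 trees echoes the bagging step described in the abstract.
forest = RandomForestRegressor(n_estimators=400, random_state=0).fit(X_train, y_train)

in_sample = np.corrcoef(forest.predict(X_train), y_train)[0, 1]
out_of_sample = np.corrcoef(forest.predict(X_test), y_test)[0, 1]
print(f"in-sample correlation:     {in_sample:.3f}")
print(f"out-of-sample correlation: {out_of_sample:.3f}")
```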

Date Created
2018-05

Regression Analysis on Colony Collapse Disorder in the United States

Description

In the last decade, the population of honey bees across the globe has declined sharply, leaving scientists and beekeepers to wonder why. Among all nations, the United States has seen some of the greatest declines in the last ten-plus years. Without a definite explanation, the term Colony Collapse Disorder (CCD) was coined to describe the sudden and sharp decline of honey bee colonies that beekeepers were experiencing. Colony collapses have been rising above expected averages over the years, and losses during the winter season are even more severe than what is normally considered acceptable. Some possible explanations point toward meteorological variables, diseases, and even pesticide usage. Despite the cause of CCD being unknown, thousands of beekeepers have reported their losses, and in the most recent years even the numbers of infected colonies and colonies under certain stressors. Using the data reported to the United States Department of Agriculture (USDA), as well as weather data collected by the National Oceanic and Atmospheric Administration (NOAA) and the National Centers for Environmental Information (NCEI), regression analysis was used to find relationships between stressors in honey bee colonies, meteorological variables, and colony collapses during the winter months. The regression analysis focused on the winter season, or quarter 4 of the year, which includes the months of October, November, and December. In the model, the response variable was the percentage of colonies lost in quarter 4. Through the model, it was concluded that certain weather thresholds and the percentage increase of colonies under certain stressors were related to colony loss.
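A minimal sketch of the kind of winter-loss regression described above, with hypothetical column names and made-up values; the real model uses the USDA and NOAA/NCEI data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical state-level quarter-4 records; columns are illustrative only.
colonies = pd.DataFrame({
    "pct_lost_q4":   [8.5, 12.1, 15.3, 9.8, 18.2, 11.4, 14.9, 10.2],
    "pct_varroa":    [20, 35, 48, 25, 55, 30, 44, 27],   # % colonies under mite stress
    "pct_pesticide": [5, 9, 14, 6, 17, 8, 12, 7],        # % colonies under pesticide stress
    "days_below_0c": [10, 25, 40, 15, 50, 20, 35, 18],   # winter cold exposure
})

# Response: percentage of colonies lost in quarter 4 (October through December).
model = smf.ols(
    "pct_lost_q4 ~ pct_varroa + pct_pesticide + days_below_0c",
    data=colonies,
).fit()
print(model.params)
```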

Date Created
2018-05

Assessing the Economic Prosperity of Persons with Disabilities in American Cities

Description

We seek a comprehensive measurement for the economic prosperity of persons with disabilities. We survey the current literature and identify the major economic indicators used to describe the socioeconomic standing of persons with disabilities. We then develop a methodology for constructing a statistically valid composite index of these indicators, and build this index using data from the 2014 American Community Survey. Finally, we provide context for further use and development of the index and describe an example application of the index in practice.
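A minimal sketch of one way such a composite index can be built, standardizing each indicator and averaging; the indicator names, weights, and values here are illustrative assumptions, not the validated methodology or the 2014 ACS fields used in the thesis.

```python
import pandas as pd

# Hypothetical city-level indicators for persons with disabilities.
cities = pd.DataFrame({
    "employment_rate": [0.34, 0.41, 0.29, 0.38],
    "median_earnings": [21000, 26500, 19800, 24200],
    "poverty_rate":    [0.28, 0.21, 0.33, 0.24],   # lower is better
}, index=["City A", "City B", "City C", "City D"])

# Standardize each indicator (z-scores) so they share a common scale,
# flipping the sign of indicators where lower values mean better outcomes.
z = (cities - cities.mean()) / cities.std(ddof=0)
z["poverty_rate"] *= -1

# Equal-weight composite; a statistically validated index would choose and
# test weights more carefully than this sketch does.
index = z.mean(axis=1).rename("prosperity_index")
print(index.sort_values(ascending=False))
```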

Date Created
2017-05

Statistical Properties of Coherent Structures in Two Dimensional Turbulence

Description

Coherent vortices are ubiquitous structures in natural flows that affect the mixing and transport of substances, momentum, and energy. Being able to detect these coherent structures is important for pollutant mitigation, ecological conservation, and many other applications. In recent years, mathematical criteria and algorithms have been developed to extract these coherent structures in turbulent flows. In this study, we will apply these tools to extract important coherent structures and analyze their statistical properties as well as their implications for the kinematics and dynamics of the flow. Such information will aid the representation of small-scale nonlinear processes that large-scale models of natural processes may not be able to resolve.
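As one illustration of a vortex-detection criterion (the thesis does not specify which criteria it applies, so this is only an assumed example), the sketch below computes the Okubo-Weiss parameter of a two-dimensional velocity field; grid points where it is negative are dominated by rotation rather than strain and are commonly flagged as vortex cores.

```python
import numpy as np

def okubo_weiss(u, v, dx, dy):
    """Okubo-Weiss parameter of a 2-D velocity field on a uniform grid.

    Negative values flag rotation-dominated (vortex-core) regions.
    """
    du_dy, du_dx = np.gradient(u, dy, dx)
    dv_dy, dv_dx = np.gradient(v, dy, dx)
    normal_strain = du_dx - dv_dy
    shear_strain = dv_dx + du_dy
    vorticity = dv_dx - du_dy
    return normal_strain**2 + shear_strain**2 - vorticity**2

# Hypothetical flow: a single Gaussian vortex defined by a streamfunction psi.
x = np.linspace(-1.0, 1.0, 128)
X, Y = np.meshgrid(x, x)
psi = np.exp(-(X**2 + Y**2) / 0.1)
h = x[1] - x[0]
dpsi_dy, dpsi_dx = np.gradient(psi, h)
u, v = dpsi_dy, -dpsi_dx                  # velocity field from the streamfunction
W = okubo_weiss(u, v, h, h)
print("fraction of grid flagged as vortex core:", np.mean(W < 0))
```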

Date Created
2018-05

Jump Dynamics

Description

There are multiple mathematical models for the alignment of individuals moving within a group. In a first class of models, individuals tend to relax their velocity toward the average velocity of nearby neighbors. These models are motivated by the flocking behavior exhibited by birds. Another class of models has been introduced to describe rapid changes of individual velocity, referred to as jumps, which better describe the behavior of smaller agents (e.g. locusts, ants). In this second class of models, individuals randomly choose to align with another nearby individual, matching velocities. There are several open questions concerning these two types of behavior: which behavior is the most efficient at creating a flock (i.e. at converging toward the same velocity)? Will flocking still emerge as the number of individuals approaches infinity? Analysis of these models shows that, in the homogeneous case where all individuals are capable of interacting with each other, the variance of the velocities in both the jump model and the relaxation model decays to 0 exponentially for any nonzero number of individuals. This implies that the individuals in the system converge to an absorbing state where all individuals share the same velocity; therefore individuals converge to a flock even as the number of individuals approaches infinity. Further analysis focused on the case where interactions between individuals are determined by an adjacency matrix. The second eigenvalue of the Laplacian of this adjacency matrix (denoted λ2) provides a lower bound on the rate of decay of the variance. When λ2 is nonzero, the system converges to a flock almost surely. Furthermore, when the adjacency matrix is generated by a random graph, such that connections between individuals are formed with probability p (where 0 < p < 1), the system converges to a flock almost surely when p > 1/N. λ2 is a good estimator of the rate of convergence of the system, in comparison to the value of p used to generate the adjacency matrix.
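A minimal numerical check of the λ2 diagnostic described above, assuming an Erdős-Rényi random graph; the simulation of the jump and relaxation dynamics themselves is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue (lambda_2) of the graph Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.sort(np.linalg.eigvalsh(laplacian))[1]

# Erdos-Renyi adjacency matrix: each edge is present independently with probability p.
N, p = 100, 0.05                       # note that p > 1/N here
upper = np.triu(rng.random((N, N)) < p, k=1)
A = (upper | upper.T).astype(float)

lam2 = algebraic_connectivity(A)
print(f"lambda_2 = {lam2:.4f}")        # a positive lambda_2 bounds the variance decay rate
```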

Date Created
2018-05

FastStat: Online Statistics Calculator

Description

FastStat is a responsive website designed to work on any handheld, laptop, or desktop device. It serves as a first step into statistical calculations, educating the user on the basics of statistical analysis and guiding them as they perform analyses of their own using built-in calculators. The calculators available can perform z tests, t tests, chi-square tests, and analysis of variance tests to determine significant characteristics of the user's data. Output includes means, standard deviations, significance levels, the applicable statistics, and worded results indicating the outcome of the performed test. With its clean design, FastStat directs the user in an intuitive manner to fill in the information needed, giving clear indications of what types of values are needed where and displaying descriptive error messages if any inputted values are incorrect. FastStat also halts calculations if any errors are found, which saves time by avoiding impossible calculations. Once complete, FastStat outputs a variety of useful information in a clearly labeled manner. The calculators are designed so that the user knows what information they will get out of the calculator before performing any calculations at all. Aside from the calculators, FastStat includes introductory pages designed to familiarize users with common statistical terms and the associated tests, solidifying its purpose as an introductory tool. All tests are described by their typical uses, necessary inputs, calculated outputs, and extra notes of importance. Many terms are defined for the purposes of statistics, complete with examples to help educate the user on the concepts. With the information available, even the newest statistician can learn and begin performing tests almost immediately.
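For context on what such calculators compute, the sketch below runs a one-sample t test and a chi-square test of independence with scipy; the sample values are made up, and this is not FastStat's own implementation.

```python
from scipy import stats

# One-sample t test: does the sample mean differ from a hypothesized mean of 50?
sample = [48.2, 51.1, 49.7, 47.9, 50.4, 46.8, 52.3, 49.0]
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
verdict = "differs significantly from 50" if p_value < 0.05 else "shows no significant difference from 50"
print(f"Result: the sample mean {verdict}.")

# Chi-square test of independence on a 2x2 contingency table.
table = [[30, 10], [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}, dof = {dof}")
```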

Date Created
2018-12

A Statistical Framework for Detecting Edges from Noisy Fourier Data with Multiple Concentration Factors

Description

The concentration factor edge detection method was developed to compute the locations and values of jump discontinuities in a piecewise-analytic function from its first few Fourier series coefficients. The method approximates the singular support of a piecewise smooth function using an altered Fourier conjugate partial sum. The accuracy and characteristic features of the resulting jump function approximation depend on these filters, known as concentration factors. Recent research showed that these concentration factors could be designed using a flexible iterative framework, improving upon the overall accuracy and robustness of the method, especially in the case where some Fourier data are untrustworthy or altogether missing. Hypothesis testing (HT) methods were used to determine how well the original concentration factor method could locate edges using noisy Fourier data. This thesis combines the iterative design aspect of concentration factors and hypothesis testing by presenting a new algorithm that incorporates multiple concentration factors into one statistical test, which proves more effective at determining jump discontinuities than the previous HT methods. This thesis also examines how the quantity and location of Fourier data affect the accuracy of HT methods. Numerical examples are provided.
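As a minimal illustration of the underlying conjugate-sum machinery (the iterative factor design and the statistical tests developed in the thesis are not reproduced), the sketch below approximates the jump function of a step function from its low-order Fourier coefficients using the first-order polynomial concentration factor sigma(eta) = pi * eta, one standard choice.

```python
import numpy as np

N = 64                                          # Fourier modes retained on each side
x = np.linspace(-np.pi, np.pi, 1024, endpoint=False)
f = np.where(x < 0, -1.0, 1.0)                  # test function with a jump of size 2 at x = 0

# Fourier coefficients f_hat(k) = (1/2pi) * integral of f(x) e^{-ikx} dx.
k = np.arange(-N, N + 1)
dx = x[1] - x[0]
f_hat = np.array([np.sum(f * np.exp(-1j * kk * x)) * dx / (2 * np.pi) for kk in k])

# First-order polynomial concentration factor sigma(|k|/N) = pi * |k| / N.
sigma = np.pi * np.abs(k) / N

# Conjugate partial sum i * sum_k sgn(k) sigma(|k|/N) f_hat(k) e^{ikx},
# which concentrates at the jump locations of f.
jump = np.real(1j * np.sum(
    (np.sign(k) * sigma * f_hat)[:, None] * np.exp(1j * np.outer(k, x)),
    axis=0,
))
print("estimated jump at x = 0:", jump[np.argmin(np.abs(x))])   # should be close to 2
```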

Date Created
2016-05

Player Optimization in the National Football League: Creating a Winning Franchise

Description

The NFL is one of the largest and most influential industries in the world. In America, few companies have a stronger hold on the culture or create such a phenomenon from year to year. This project aims to develop a strategy that helps an NFL team be as successful as possible by defining which positions are most important to a team's success. Data from fifteen years of NFL games were collected, and information on every player in the league was analyzed. First, a benchmark describing an average team was established, and every player in the NFL was compared to that average. Based on the properties of linear regression using ordinary least squares, this project defines a model that shows each position's importance. Finally, once such a model had been established, the focus turned to the NFL draft, where the goal was to find a strategy for where each position should be drafted so that it is most likely to give the best payoff, based on the results of the regression in the first part.
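A minimal sketch of the ordinary least squares step, assuming hypothetical team-season data and illustrative position-group columns; the actual positions, metrics, and fifteen seasons of data come from the project itself.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical team-season rows: wins and each position group's performance
# relative to the league-average benchmark (0 = average); columns are illustrative.
seasons = pd.DataFrame({
    "wins":             [11, 6, 9, 4, 12, 7, 10, 5],
    "qb_vs_avg":        [1.8, -0.5, 0.9, -1.2, 2.1, 0.1, 1.4, -0.9],
    "oline_vs_avg":     [0.6, -0.2, 0.4, -0.8, 0.9, -0.1, 0.5, -0.6],
    "pass_rush_vs_avg": [0.4, 0.1, -0.3, -0.5, 0.7, 0.2, 0.3, -0.4],
})

X = sm.add_constant(seasons[["qb_vs_avg", "oline_vs_avg", "pass_rush_vs_avg"]])
model = sm.OLS(seasons["wins"], X).fit()

# Each coefficient estimates how above-average play at that position group is
# associated with additional wins, which can then inform draft priority.
print(model.params)
```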

Date Created
2015-05