Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. A new column is created to measure price difference, allowing a more accurate analysis of the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchains, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, with a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. Out-of-sample modeling is more difficult, although in some cases it performs better than is typical for financial data.
Cryptocurrency data also has properties similar to those of other related financial datasets, which suggests future potential for system modeling of cryptocurrency within the financial world.
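
The bagging step described above can be illustrated with a minimal sketch. It is not the thesis's code: it assumes depth-1 regression trees (stumps) on a single synthetic variable, uses only the Python standard library, and every function name (`fit_stump`, `fit_bagged`, `pearson`, `demo`) is hypothetical. A random forest would additionally sample a subset of variables at each split, which is not shown here.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one split on x, a mean on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lm, rm)
    if best is None:  # all x identical: always predict the overall mean
        m = statistics.fmean(ys)
        return (float("inf"), m, m)
    return best[1:]  # (threshold, left mean, right mean)


def predict_stump(stump, x):
    threshold, lm, rm = stump
    return lm if x <= threshold else rm


def fit_bagged(xs, ys, n_trees=400, seed=0):
    """Bootstrap aggregation: fit each stump to a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return statistics.fmean([predict_stump(s, x) for s in stumps])


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def demo(n=60, n_trees=400, seed=1):
    """Synthetic data (an assumption, not cryptocurrency data): y = 2x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    split = int(0.75 * n)  # first 75% is in-sample, the rest is held out
    stumps = fit_bagged(xs[:split], ys[:split], n_trees=n_trees, seed=seed)
    fitted_in = [predict_bagged(stumps, x) for x in xs[:split]]
    fitted_out = [predict_bagged(stumps, x) for x in xs[split:]]
    return pearson(fitted_in, ys[:split]), pearson(fitted_out, ys[split:])


if __name__ == "__main__":
    r_in, r_out = demo()
    print(f"in-sample correlation: {r_in:.3f}, held-out correlation: {r_out:.3f}")
```

On this easy synthetic signal both correlations come out high; the abstract's point is that real price data behaves very differently out of sample, which a toy example like this cannot reproduce.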

Contributors: Browning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

NHL Goal Probability: Identifying Trends Across the League Using Predictive Modeling

Description

My project builds a probability model to predict the likelihood that a shot in the NHL becomes a goal. It compares different types of models to find the most accurate one. The study identifies which variables contribute most to whether a shot results in a goal and, of those variables, which ones teams can control to have the most success.

Contributors: Lachapelle, William (Author) / McCulloch, Robert (Thesis director) / Schneider, Laurence (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Department of Information Systems (Contributor)

Created: 2023-05

Lachapelle_Spring_2023.pdf

Abstract.pdf

Machine Learning and Causal Inference: Theory, Examples, and Computational Results

Description

This dissertation covers several topics in machine learning and causal inference. First, the question of “feature selection,” a common byproduct of regularized machine learning methods, is investigated theoretically in the context of treatment effect estimation. This involves a detailed review and extension of frameworks for estimating causal effects and in-depth theoretical study. Next, various computational approaches to estimating causal effects with machine learning methods are compared with these theoretical desiderata in mind. Several improvements to current methods for causal machine learning are identified and compelling angles for further study are pinpointed. Finally, a common method used for “explaining” predictions of machine learning algorithms, SHAP, is evaluated critically through a statistical lens.

Contributors: Herren, Andrew (Author) / Hahn, P Richard (Thesis advisor) / Kao, Ming-Hung (Committee member) / Lopes, Hedibert (Committee member) / McCulloch, Robert (Committee member) / Zhou, Shuang (Committee member) / Arizona State University (Publisher)

Created: 2023

US Forest Fire Size Prediction using Machine Learning

Description

The number of extreme wildfires is on the rise globally, and predicting the size of a fire will help officials make appropriate decisions to mitigate the risk the fire poses to the environment and humans. This study attempts to predict the burned area of fires in the United States from attributes such as the time, weather, and location of the fire using machine learning methods.

Contributors: Prabagaran, Padma (Author, Co-author) / Meuth, Ryan (Thesis director) / McCulloch, Robert (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-12

Advances in Local Multiscale Modeling in a Regression Framework

Description

Embedded within the regression framework, local models can estimate conditioned relationships between observed spatial phenomena and hypothesized explanatory variables and help infer the intangible spatial processes that contribute to the observed spatial patterns. Rather than investigating averaged characteristics corresponding to processes over space as global models do, these models estimate a surface of spatially varying parameters with a value for each location. Additionally, some models, such as variants within the Geographically Weighted Regression (GWR) framework, also estimate a parameter that represents the spatial scale across which the processes vary, reflecting the inherent heterogeneity of the estimated surfaces. Since different processes tend to operate at unique spatial scales, some extensions to local models such as Multiscale GWR (MGWR) estimate unique scales of association for each predictor in a model and generate significantly more information on the nature of geographic processes than their predecessors. However, developments within the realm of local models are fairly nascent, and hence the understanding of their correct application, as well as the recognition of their true potential in exploring fundamental spatial science issues, is under-developed. The techniques within these frameworks are also currently limited, which restricts the kinds of data that can be analyzed using these models. Therefore, the goal of this dissertation is to advance techniques within local multiscale modeling, specifically by coining new diagnostics, by exploring their novel application in understanding long-standing issues concerning spatial scale, and by expanding the tool base to allow their use in wider empirical applications. This goal is realized through three distinct research objectives over four chapters, followed by a discussion on the future of the developments within local multiscale modeling.
A correct understanding of the capability and promise of local multiscale models, together with an expansion of the fields where they can be employed, will not only enhance geographical research by strengthening the intuition of the nature of geographic processes, but will also exemplify the importance of and need for such tools, bringing quantitative spatial science to the fore.

Contributors: Sachdeva, Mehak (Author) / Fotheringham, A. Stewart (Thesis advisor) / Goodchild, Michael Frank (Committee member) / Kedron, Peter (Committee member) / Wolf, Levi John (Committee member) / Arizona State University (Publisher)

Created: 2022

Spatial Regression and Gaussian Process BART

Description

Spatial regression is one of the central topics in spatial statistics. Depending on the goal, interpretation or prediction, spatial regression models can be classified into two categories: linear mixed regression models and nonlinear regression models. This dissertation explored these models and their real-world applications. New methods and models were proposed to overcome the challenges in practice. There are three major parts in the dissertation.

In the first part, nonlinear regression models were embedded into a multistage workflow to predict the spatial abundance of reef fish species in the Gulf of Mexico. There were two challenges: zero-inflated data and out-of-sample prediction. The methods and models in the workflow could effectively handle the zero-inflated sampling data without strong assumptions. Three strategies were proposed to solve the out-of-sample prediction problem. The results and discussions showed that the nonlinear prediction had the advantages of high accuracy, low bias and good performance across multiple resolutions.

In the second part, a two-stage spatial regression model was proposed for analyzing soil carbon stock (SOC) data. In the first stage, a spatial linear mixed model captured the linear and stationary effects. In the second stage, a generalized additive model was used to explain the nonlinear and nonstationary effects. The results illustrated that the two-stage model had good interpretability in understanding the effect of covariates while keeping a high prediction accuracy that is competitive with popular machine learning models such as random forest, xgboost and support vector machine.

A new nonlinear regression model, Gaussian process BART (Bayesian additive regression tree), was proposed in the third part. Combining the advantages of both BART and Gaussian processes, the model could capture the nonlinear effects of both observed and latent covariates. To develop the model, the traditional BART was first generalized to accommodate correlated errors. Then, the failure of likelihood-based Markov chain Monte Carlo (MCMC) in parameter estimation was discussed. Based on the idea of analysis of variation, back comparing and tuning range were proposed to tackle this failure. Finally, the effectiveness of the new model was examined by experiments on both simulated and real data.

Contributors: Lu, Xuetao (Author) / McCulloch, Robert (Thesis advisor) / Hahn, Paul (Committee member) / Lan, Shiwei (Committee member) / Zhou, Shuang (Committee member) / Saul, Steven (Committee member) / Arizona State University (Publisher)

Created: 2020

Multiscale Geographically Weighted Regression: Computation, Inference, and Application

Description

Geographically Weighted Regression (GWR) has been broadly used in various fields to model spatially non-stationary relationships. Classic GWR is considered a single-scale model that is based on one bandwidth parameter, which controls the amount of distance-decay in weighting neighboring data around each location. The single bandwidth in GWR assumes that processes (relationships between the response variable and the predictor variables) all operate at the same scale. However, this poses a limitation in modeling potentially multi-scale processes, which are more often seen in the real world. For example, the measured ambient temperature of a location is affected by the built environment, regional weather and global warming, all of which operate at different scales. A recent advancement to GWR termed Multiscale GWR (MGWR) removes the single bandwidth assumption and allows the bandwidths for each covariate to vary. This results in each parameter surface being allowed to have a different degree of spatial variation, reflecting variation across covariate-specific processes. In this way, MGWR has the capability to differentiate local, regional and global processes by using varying bandwidths for covariates. Additionally, bandwidths in MGWR become explicit indicators of the scale at which various processes operate. The proposed dissertation covers three perspectives centering on MGWR: Computation; Inference; and Application. The first component focuses on addressing computational issues in MGWR to allow MGWR models to be calibrated more efficiently and to be applied on large datasets. The second component aims to statistically differentiate the spatial scales at which different processes operate by quantifying the uncertainty associated with each bandwidth obtained from MGWR. In the third component, an empirical study will be conducted to model the changing relationships between county-level socio-economic factors and voter preferences in the 2008-2016 United States presidential elections using MGWR.
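
The role of the single bandwidth in classic GWR can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: it assumes a Gaussian distance-decay kernel (GWR software commonly offers several kernels), one predictor plus an intercept, and synthetic data; the names `gaussian_weights`, `local_fit` and `make_data` are hypothetical.

```python
import math


def gaussian_weights(coords, site, bandwidth):
    """Distance-decay weights: nearby observations count more (Gaussian kernel assumed)."""
    return [math.exp(-0.5 * (math.dist(c, site) / bandwidth) ** 2) for c in coords]


def local_fit(coords, x, y, site, bandwidth):
    """Weighted least squares of y on x, calibrated at one location.

    Returns (intercept, slope) for that location.
    """
    w = gaussian_weights(coords, site, bandwidth)
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ym - slope * xm, slope


def make_data():
    """Synthetic transect: the true slope drifts from 1.0 in the west to 2.0 in the east."""
    coords, x, y = [], [], []
    for i in range(21):
        u = 0.5 * i                    # east-west position, 0 to 10
        xi = float((2 * i) % 5 + 1)    # predictor values cycling through 1..5
        coords.append((u, 0.0))
        x.append(xi)
        y.append((1.0 + 0.1 * u) * xi)
    return coords, x, y


if __name__ == "__main__":
    coords, x, y = make_data()
    for bandwidth in (1.0, 1000.0):
        _, west = local_fit(coords, x, y, (0.0, 0.0), bandwidth)
        _, east = local_fit(coords, x, y, (10.0, 0.0), bandwidth)
        print(f"bandwidth={bandwidth}: west slope={west:.3f}, east slope={east:.3f}")
```

With a small bandwidth the two locations recover clearly different slopes, while a very large bandwidth makes every location share essentially the same global estimate. MGWR, as described above, would let each covariate have its own bandwidth; this sketch uses a single one throughout.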

Contributors: Li, Ziqi (Author) / Fotheringham, A. Stewart (Thesis advisor) / Goodchild, Michael F. (Committee member) / Li, Wenwen (Committee member) / Arizona State University (Publisher)

Created: 2020

Using Machine Learning Classification Techniques to Predict Recessionary Periods in the U.S. Economy

Description

The goal of this research project is to determine how beneficial machine learning (ML) techniques can be in predicting recessions. Past work has utilized a multitude of classification methods, from Probit models to linear Support Vector Machines (SVMs), and obtained accuracies nearing 60-70%, with some models even predicting the Great Recession based on data from the previous 50 years. This paper builds on past work by starting with less complex classification techniques that are more broadly used in recession forecasting and ending with more complex ML models that produce higher accuracies than their more primitive counterparts. Many models were tested in this analysis, and the findings corroborate past work showing that the SVM methodology produces more accurate results than currently used probit models, while adding that other ML models produced sufficient accuracy as well.

Contributors: Hogan, Carter (Author) / McCulloch, Robert (Thesis director) / Pereira, Claudiney (Committee member) / Barrett, The Honors College (Contributor) / School of International Letters and Cultures (Contributor) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05
