Search Content

Three essays on shrinkage estimation and model selection of linear and nonlinear time series models

Description

The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed…

The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed through regressing current values on previous realizations and proxy innovations. The classic paradigm fails when dynamics are nonlinear; in this case, parametric, regime-switching specifications model changes in level, ARMA dynamics, and volatility, using a finite number of latent states. If the states can be identified using past endogenous or exogenous information, a threshold autoregressive (TAR) or logistic smooth transition autoregressive (LSTAR) model may simplify complex nonlinear associations to conditional weakly stationary processes. For ARMA, TAR, and STAR, order parameters quantify the extent past information is associated with the future. Unfortunately, even if model orders are known a priori, the possibility of over-fitting can lead to sub-optimal forecasting performance. By intentionally overestimating these orders, a linear representation of the full model is exploited and Bayesian regularization can be used to achieve sparsity. Global-local shrinkage priors for AR, MA, and exogenous coefficients are adopted to pull posterior means toward 0 without over-shrinking relevant effects. This dissertation introduces, evaluates, and compares Bayesian techniques that automatically perform model selection and coefficient estimation of ARMA, TAR, and STAR models. Multiple Monte Carlo experiments illustrate the accuracy of these methods in finding the "true" data generating process. Practical applications demonstrate their efficacy in forecasting.

ContributorsGiacomazzo, Mario (Author) / Kamarianakis, Yiannis (Thesis advisor) / Reiser, Mark R. (Committee member) / McCulloch, Robert (Committee member) / Hahn, Richard (Committee member) / Fricks, John (Committee member) / Arizona State University (Publisher)

Created2018

Supervised and ensemble classification of multivariate functional data: applications to lupus diagnosis

Description

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional…

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain that are critical for population discrimination. The undertaken analyses suggest that derivate-based information contributes significantly in improved classification performance relative to recently published studies on SLE plasma thermograms.

ContributorsBuscaglia, Robert, Ph.D (Author) / Kamarianakis, Yiannis (Thesis advisor) / Armbruster, Dieter (Committee member) / Lanchier, Nicholas (Committee member) / McCulloch, Robert (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created2018

Bayesian Inference Frameworks for Fluorescence Microscopy Data Analysis

Description

In this work, I present a Bayesian inference computational framework for the analysis of widefield microscopy data that addresses three challenges: (1) counting and localizing stationary fluorescent molecules; (2) inferring a spatially-dependent effective fluorescence profile that describes the spatially-varying rate at which fluorescent molecules emit subsequently-detected photons (due to different…

In this work, I present a Bayesian inference computational framework for the analysis of widefield microscopy data that addresses three challenges: (1) counting and localizing stationary fluorescent molecules; (2) inferring a spatially-dependent effective fluorescence profile that describes the spatially-varying rate at which fluorescent molecules emit subsequently-detected photons (due to different illumination intensities or different local environments); and (3) inferring the camera gain. My general theoretical framework utilizes the Bayesian nonparametric Gaussian and beta-Bernoulli processes with a Markov chain Monte Carlo sampling scheme, which I further specify and implement for Total Internal Reflection Fluorescence (TIRF) microscopy data, benchmarking the method on synthetic data. These three frameworks are self-contained, and can be used concurrently so that the fluorescence profile and emitter locations are both considered unknown and, under some conditions, learned simultaneously. The framework I present is flexible and may be adapted to accommodate the inference of other parameters, such as emission photophysical kinetics and the trajectories of moving molecules. My TIRF-specific implementation may find use in the study of structures on cell membranes, or in studying local sample properties that affect fluorescent molecule photon emission rates.

ContributorsWallgren, Ross (Author) / Presse, Steve (Thesis advisor) / Armbruster, Hans (Thesis advisor) / McCulloch, Robert (Committee member) / Arizona State University (Publisher)

Created2019

A study of accelerated Bayesian additive regression trees

Description

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model

that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading…

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model

that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.

ContributorsYalov, Saar (Author) / Hahn, P. Richard (Thesis advisor) / McCulloch, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)

Created2019

Balancing the Present and Future: Making Valuable Predictions for Continuous Improvement in Management

Description

In the words of W. Edwards Deming, "the central problem in management and in leadership is failure to understand the information in variation." While many quality management programs propose the institution of technical training in advanced statistical methods, this paper proposes that by understanding the fundamental information behind statistical theory,…

In the words of W. Edwards Deming, "the central problem in management and in leadership is failure to understand the information in variation." While many quality management programs propose the institution of technical training in advanced statistical methods, this paper proposes that by understanding the fundamental information behind statistical theory, and by minimizing bias and variance while fully utilizing the available information about the system at hand, one can make valuable, accurate predictions about the future. Combining this knowledge with the work of quality gurus W. E. Deming, Eliyahu Goldratt, and Dean Kashiwagi, a framework for making valuable predictions for continuous improvement is made. After this information is synthesized, it is concluded that the best way to make accurate, informative predictions about the future is to "balance the present and future," seeing the future through the lens of the present and thus minimizing bias, variance, and risk.

ContributorsSynodis, Nicholas Dahn (Author) / Kashiwagi, Dean (Thesis director, Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05

Player Optimization in the National Football League: Creating a Winning Franchise

Description

The NFL is one of largest and most influential industries in the world. In America there are few companies that have a stronger hold on the American culture and create such a phenomena from year to year. In this project aimed to develop a strategy that helps an NFL team…

The NFL is one of largest and most influential industries in the world. In America there are few companies that have a stronger hold on the American culture and create such a phenomena from year to year. In this project aimed to develop a strategy that helps an NFL team be as successful as possible by defining which positions are most important to a team's success. Data from fifteen years of NFL games was collected and information on every player in the league was analyzed. First there needed to be a benchmark which describes a team as being average and then every player in the NFL must be compared to that average. Based on properties of linear regression using ordinary least squares this project aims to define such a model that shows each position's importance. Finally, once such a model had been established then the focus turned to the NFL draft in which the goal was to find a strategy of where each position needs to be drafted so that it is most likely to give the best payoff based on the results of the regression in part one.

ContributorsBalzer, Kevin Ryan (Author) / Goegan, Brian (Thesis director) / Dassanayake, Maduranga (Committee member) / Barrett, The Honors College (Contributor) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05

A Statistical Framework for Detecting Edges from Noisy Fourier Data with Multiple Concentration Factors

Description

The concentration factor edge detection method was developed to compute the locations and values of jump discontinuities in a piecewise-analytic function from its first few Fourier series coecients. The method approximates the singular support of a piecewise smooth function using an altered Fourier conjugate partial sum. The accuracy and characteristic…

The concentration factor edge detection method was developed to compute the locations and values of jump discontinuities in a piecewise-analytic function from its first few Fourier series coecients. The method approximates the singular support of a piecewise smooth function using an altered Fourier conjugate partial sum. The accuracy and characteristic features of the resulting jump function approximation depends on these lters, known as concentration factors. Recent research showed that that these concentration factors could be designed using aexible iterative framework, improving upon the overall accuracy and robustness of the method, especially in the case where some Fourier data are untrustworthy or altogether missing. Hypothesis testing methods were used to determine how well the original concentration factor method could locate edges using noisy Fourier data. This thesis combines the iterative design aspect of concentration factor design and hypothesis testing by presenting a new algorithm that incorporates multiple concentration factors into one statistical test, which proves more ective at determining jump discontinuities than the previous HT methods. This thesis also examines how the quantity and location of Fourier data act the accuracy of HT methods. Numerical examples are provided.

ContributorsLubold, Shane Michael (Author) / Gelb, Anne (Thesis director) / Cochran, Doug (Committee member) / Viswanathan, Aditya (Committee member) / Economics Program in CLAS (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2016-05

Prioritizing Projects on Time and Cost Savings for a more Efficient Manufacturing Process of a Semiconductor Company

Description

Over the course of six months, we have worked in partnership with Arizona State University and a leading producer of semiconductor chips in the United States market (referred to as the "Company"), lending our skills in finance, statistics, model building, and external insight. We attempt to design models that hel…

Over the course of six months, we have worked in partnership with Arizona State University and a leading producer of semiconductor chips in the United States market (referred to as the "Company"), lending our skills in finance, statistics, model building, and external insight. We attempt to design models that help predict how much time it takes to implement a cost-saving project. These projects had previously been considered only on the merit of cost savings, but with an added dimension of time, we hope to forecast time according to a number of variables. With such a forecast, we can then apply it to an expense project prioritization model which relates time and cost savings together, compares many different projects simultaneously, and returns a series of present value calculations over different ranges of time. The goal is twofold: assist with an accurate prediction of a project's time to implementation, and provide a basis to compare different projects based on their present values, ultimately helping to reduce the Company's manufacturing costs and improve gross margins. We believe this approach, and the research found toward this goal, is most valuable for the Company. Two coaches from the Company have provided assistance and clarified our questions when necessary throughout our research. In this paper, we begin by defining the problem, setting an objective, and establishing a checklist to monitor our progress. Next, our attention shifts to the data: making observations, trimming the dataset, framing and scoping the variables to be used for the analysis portion of the paper. Before creating a hypothesis, we perform a preliminary statistical analysis of certain individual variables to enrich our variable selection process. After the hypothesis, we run multiple linear regressions with project duration as the dependent variable. After regression analysis and a test for robustness, we shift our focus to an intuitive model based on rules of thumb. We relate these models to an expense project prioritization tool developed using Microsoft Excel software. Our deliverables to the Company come in the form of (1) a rules of thumb intuitive model and (2) an expense project prioritization tool.

ContributorsAl-Assi, Hashim (Co-author) / Chiang, Robert (Co-author) / Liu, Andrew (Co-author) / Ludwick, David (Co-author) / Simonson, Mark (Thesis director) / Hertzel, Michael (Committee member) / Barrett, The Honors College (Contributor) / Department of Information Systems (Contributor) / Department of Finance (Contributor) / Department of Economics (Contributor) / Department of Supply Chain Management (Contributor) / School of Accountancy (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / WPC Graduate Programs (Contributor)

Created2015-05

Statistical Properties of Coherent Structures in Two Dimensional Turbulence

Description

Coherent vortices are ubiquitous structures in natural flows that affect mixing and transport of substances and momentum/energy. Being able to detect these coherent structures is important for pollutant mitigation, ecological conservation and many other aspects. In recent years, mathematical criteria and algorithms have been developed to extract these coherent structures…

Coherent vortices are ubiquitous structures in natural flows that affect mixing and transport of substances and momentum/energy. Being able to detect these coherent structures is important for pollutant mitigation, ecological conservation and many other aspects. In recent years, mathematical criteria and algorithms have been developed to extract these coherent structures in turbulent flows. In this study, we will apply these tools to extract important coherent structures and analyze their statistical properties as well as their implications on kinematics and dynamics of the flow. Such information will aide representation of small-scale nonlinear processes that large-scale models of natural processes may not be able to resolve.

ContributorsCass, Brentlee Jerry (Author) / Tang, Wenbo (Thesis director) / Kostelich, Eric (Committee member) / Department of Information Systems (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Exploring the Relation Between NAV and Price of ETFs in Financial Markets

Description

Exchange traded funds (ETFs) in many ways are similar to more traditional closed-end mutual funds, although thee differ in a crucial way. ETFs rely on a creation and redemption feature to achieve their functionality and this mechanism is designed to minimize the deviations that occur between the ETF’s listed price…

Exchange traded funds (ETFs) in many ways are similar to more traditional closed-end mutual funds, although thee differ in a crucial way. ETFs rely on a creation and redemption feature to achieve their functionality and this mechanism is designed to minimize the deviations that occur between the ETF’s listed price and the net asset value of the ETF’s underlying assets. However while this does cause ETF deviations to be generally lower than their mutual fund counterparts, as our paper explores this process does not eliminate these deviations completely. This article builds off an earlier paper by Engle and Sarkar (2006) that investigates these properties of premiums (discounts) of ETFs from their fair market value. And looks to see if these premia have changed in the last 10 years. Our paper then diverges from the original and takes a deeper look into the standard deviations of these premia specifically.

Our findings show that over 70% of an ETFs standard deviation of premia can be explained through a linear combination consisting of two variables: a categorical (Domestic[US], Developed, Emerging) and a discrete variable (time-difference from US). This paper also finds that more traditional metrics such as market cap, ETF price volatility, and even 3rd party market indicators such as the economic freedom index and investment freedom index are insignificant predictors of an ETFs standard deviation of premia when combined with the categorical variable. These findings differ somewhat from existing literature which indicate that these factors should have a significant impact on the predictive ability of an ETFs standard deviation of premia.

ContributorsZhang, Jingbo (Co-author, Co-author) / Henning, Thomas (Co-author) / Simonson, Mark (Thesis director) / Licon, L. Wendell (Committee member) / Department of Finance (Contributor) / Department of Information Systems (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Filtering by