Search Content

Swarming in bounded domains

Description

Swarms of animals, fish, birds, locusts etc. are a common occurrence but their coherence and method of organization poses a major question for mathematics and biology.The Vicsek and the Attraction-Repulsion are two models that have been proposed to explain the emergence of collective motion. A major issue…

Swarms of animals, fish, birds, locusts etc. are a common occurrence but their coherence and method of organization poses a major question for mathematics and biology.The Vicsek and the Attraction-Repulsion are two models that have been proposed to explain the emergence of collective motion. A major issue for the Vicsek Model is that its particles are not attracted to each other, leaving the swarm with alignment in velocity but without spatial coherence. Restricting the particles to a bounded domain generates global spatial coherence of swarms while maintaining velocity alignment. While individual particles are specularly reflected at the boundary, the swarm as a whole is not. As a result, new dynamical swarming solutions are found.

The Attraction-Repulsion Model set with a long-range attraction and short-range repulsion interaction potential typically stabilizes to a well-studied flock steady state solution. The particles for a flock remain spatially coherent but have no spatial bound and explore all space. A bounded domain with specularly reflecting walls traps the particles within a specific region. A fundamental refraction law for a swarm impacting on a planar boundary is derived. The swarm reflection varies from specular for a swarm dominated by

kinetic energy to inelastic for a swarm dominated by potential energy. Inelastic collisions lead to alignment with the wall and to damped pulsating oscillations of the swarm. The fundamental refraction law provides a one-dimensional iterative map that allows for a prediction and analysis of the trajectory of the center of mass of a flock in a channel and a square domain.

The extension of the wall collisions to a scattering experiment is conducted by setting two identical flocks to collide. The two particle dynamics is studied analytically and shows a transition from scattering: diverging flocks to bound states in the form of oscillations or parallel motions. Numerical studies of collisions of flocks show the same transition where the bound states become either a single translating flock or a rotating (mill).

ContributorsThatcher, Andrea (Author) / Armbruster, Hans (Thesis advisor) / Motsch, Sebastien (Committee member) / Ringhofer, Christian (Committee member) / Platte, Rodrigo (Committee member) / Gardner, Carl (Committee member) / Arizona State University (Publisher)

Created2015

Three essays on shrinkage estimation and model selection of linear and nonlinear time series models

Description

The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed…

The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed through regressing current values on previous realizations and proxy innovations. The classic paradigm fails when dynamics are nonlinear; in this case, parametric, regime-switching specifications model changes in level, ARMA dynamics, and volatility, using a finite number of latent states. If the states can be identified using past endogenous or exogenous information, a threshold autoregressive (TAR) or logistic smooth transition autoregressive (LSTAR) model may simplify complex nonlinear associations to conditional weakly stationary processes. For ARMA, TAR, and STAR, order parameters quantify the extent past information is associated with the future. Unfortunately, even if model orders are known a priori, the possibility of over-fitting can lead to sub-optimal forecasting performance. By intentionally overestimating these orders, a linear representation of the full model is exploited and Bayesian regularization can be used to achieve sparsity. Global-local shrinkage priors for AR, MA, and exogenous coefficients are adopted to pull posterior means toward 0 without over-shrinking relevant effects. This dissertation introduces, evaluates, and compares Bayesian techniques that automatically perform model selection and coefficient estimation of ARMA, TAR, and STAR models. Multiple Monte Carlo experiments illustrate the accuracy of these methods in finding the "true" data generating process. Practical applications demonstrate their efficacy in forecasting.

ContributorsGiacomazzo, Mario (Author) / Kamarianakis, Yiannis (Thesis advisor) / Reiser, Mark R. (Committee member) / McCulloch, Robert (Committee member) / Hahn, Richard (Committee member) / Fricks, John (Committee member) / Arizona State University (Publisher)

Created2018

Supervised and ensemble classification of multivariate functional data: applications to lupus diagnosis

Description

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional…

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain that are critical for population discrimination. The undertaken analyses suggest that derivate-based information contributes significantly in improved classification performance relative to recently published studies on SLE plasma thermograms.

ContributorsBuscaglia, Robert, Ph.D (Author) / Kamarianakis, Yiannis (Thesis advisor) / Armbruster, Dieter (Committee member) / Lanchier, Nicholas (Committee member) / McCulloch, Robert (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created2018

Bayesian Inference Frameworks for Fluorescence Microscopy Data Analysis

Description

In this work, I present a Bayesian inference computational framework for the analysis of widefield microscopy data that addresses three challenges: (1) counting and localizing stationary fluorescent molecules; (2) inferring a spatially-dependent effective fluorescence profile that describes the spatially-varying rate at which fluorescent molecules emit subsequently-detected photons (due to different…

In this work, I present a Bayesian inference computational framework for the analysis of widefield microscopy data that addresses three challenges: (1) counting and localizing stationary fluorescent molecules; (2) inferring a spatially-dependent effective fluorescence profile that describes the spatially-varying rate at which fluorescent molecules emit subsequently-detected photons (due to different illumination intensities or different local environments); and (3) inferring the camera gain. My general theoretical framework utilizes the Bayesian nonparametric Gaussian and beta-Bernoulli processes with a Markov chain Monte Carlo sampling scheme, which I further specify and implement for Total Internal Reflection Fluorescence (TIRF) microscopy data, benchmarking the method on synthetic data. These three frameworks are self-contained, and can be used concurrently so that the fluorescence profile and emitter locations are both considered unknown and, under some conditions, learned simultaneously. The framework I present is flexible and may be adapted to accommodate the inference of other parameters, such as emission photophysical kinetics and the trajectories of moving molecules. My TIRF-specific implementation may find use in the study of structures on cell membranes, or in studying local sample properties that affect fluorescent molecule photon emission rates.

ContributorsWallgren, Ross (Author) / Presse, Steve (Thesis advisor) / Armbruster, Hans (Thesis advisor) / McCulloch, Robert (Committee member) / Arizona State University (Publisher)

Created2019

A study of accelerated Bayesian additive regression trees

Description

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model

that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading…

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model

that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.

ContributorsYalov, Saar (Author) / Hahn, P. Richard (Thesis advisor) / McCulloch, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)

Created2019

Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries…

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. The new column is created to measure price difference to create a more accurate analysis on the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchains, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, with a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. However, out-of-sample modeling is more difficult, but in some cases better than typical forms of financial data. It should also be noted that cryptocurrency data has similar properties to other related financial datasets, realizing future potential for system modeling for cryptocurrency within the financial world.

ContributorsBrowning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Forecasting the 85281 Residential Rental Property

Description

The listing price of residential rental real estate is dependent upon property specific attributes. These attributes involve data that can be tabulated as categorical and continuous predictors. The forecasting model presented in this paper is developed using publicly available, property specific information sourced from the Zillow and Trulia online real…

The listing price of residential rental real estate is dependent upon property specific attributes. These attributes involve data that can be tabulated as categorical and continuous predictors. The forecasting model presented in this paper is developed using publicly available, property specific information sourced from the Zillow and Trulia online real estate databases. The following fifteen predictors were tracked for forty-eight rental listings in the 85281 area code: housing type, square footage, number of baths, number of bedrooms, distance to Arizona State University’s Tempe Campus, crime level of the neighborhood, median age range of the neighborhood population, percentage of the neighborhood population that is married, median year of construction of the neighborhood, percentage of the population commuting longer than thirty minutes, percentage of neighborhood homes occupied by renters, percentage of the population commuting by transit, and the number of restaurants, grocery stores, and nightlife within a one mile radius of the property. Through regression analysis, the significant predictors of the listing price of a rental property in the 85281 area code were discerned. These predictors were used to form a forecasting model. This forecasting model explains 75.5% of the variation in listing prices of residential rental real estate in the 85281 area code.

ContributorsSchuchter, Grant (Author) / Clough, Michael (Thesis director) / Escobedo, Adolfo (Committee member) / Industrial, Systems & Operations Engineering Prgm (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Comparison of Regression and Tree Analyses for Predicting Alcohol-Related Problems

Description

Problems related to alcohol consumption cause not only extra economic expenses, but are an expense to the health of both drinkers and non-drinkers due to the harm directly and indirectly caused by alcohol consumption. Investigating predictors and reasons for alcohol-related problems is of importance, as alcohol-related problems could be prevented…

Problems related to alcohol consumption cause not only extra economic expenses, but are an expense to the health of both drinkers and non-drinkers due to the harm directly and indirectly caused by alcohol consumption. Investigating predictors and reasons for alcohol-related problems is of importance, as alcohol-related problems could be prevented by quitting or limiting consumption of alcohol. We were interested in predicting alcohol-related problems using multiple linear regression and regression trees, and then comparing the regressions to the tree. Impaired control, anxiety sensitivity, mother permissiveness, father permissiveness, gender, and age were included as predictors. The data used was comprised of participants (n=835) sampled from students at Arizona State University. A multiple linear regression without interactions, multiple linear regression with two-way interactions and squares, and a regression tree were used and compared. The regression and the tree had similar results. Multiple interactions of variables predicted alcohol-related problems. Overall, the tree was easier to interpret than the regressions, however, the regressions provided specific predicted alcohol-related problems scores, whereas the tree formed large groups and had a predicted alcohol-related problems score for each group. Nevertheless, the tree still predicted alcohol-related problems nearly as well, if not better than the regressions.

ContributorsVoorhies, Kirsten Reed (Author) / McCulloch, Robert (Thesis director) / Zheng, Yi (Committee member) / Patock-Peckham, Julie (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Psychology (Contributor) / Barrett, The Honors College (Contributor)

Created2016-12

An improved mathematical formulation for the carbon capture and storage (CCS) problem

Description

Carbon Capture and Storage (CCS) is a climate stabilization strategy that prevents CO2 emissions from entering the atmosphere. Despite its benefits, impactful CCS projects require large investments in infrastructure, which could deter governments from implementing this strategy. In this sense, the development of innovative tools to support large-scale cost-efficient CCS…

Carbon Capture and Storage (CCS) is a climate stabilization strategy that prevents CO2 emissions from entering the atmosphere. Despite its benefits, impactful CCS projects require large investments in infrastructure, which could deter governments from implementing this strategy. In this sense, the development of innovative tools to support large-scale cost-efficient CCS deployment decisions is critical for climate change mitigation. This thesis proposes an improved mathematical formulation for the scalable infrastructure model for CCS (SimCCS), whose main objective is to design a minimum-cost pipe network to capture, transport, and store a target amount of CO2. Model decisions include source, reservoir, and pipe selection, as well as CO2 amounts to capture, store, and transport. By studying the SimCCS optimal solution and the subjacent network topology, new valid inequalities (VI) are proposed to strengthen the existing mathematical formulation. These constraints seek to improve the quality of the linear relaxation solutions in the branch and bound algorithm used to solve SimCCS. Each VI is explained with its intuitive description, mathematical structure and examples of resulting improvements. Further, all VIs are validated by assessing the impact of their elimination from the new formulation. The validated new formulation solves the 72-nodes Alberta problem up to 7 times faster than the original model. The upgraded model reduces the computation time required to solve SimCCS in 72% of randomly generated test instances, solving SimCCS up to 200 times faster. These formulations can be tested and then applied to enhance variants of the SimCCS and general fixed-charge network flow problems. Finally, an experience from testing a Benders decomposition approach for SimCCS is discussed and future scope of probable efficient solution-methods is outlined.

ContributorsLobo, Loy Joseph (Author) / Sefair, Jorge A (Thesis advisor) / Escobedo, Adolfo (Committee member) / Kuby, Michael (Committee member) / Middleton, Richard (Committee member) / Arizona State University (Publisher)

Created2017

An information based optimal subdata selection algorithm for big data linear regression and a suitable variable selection algorithm

Description

This article proposes a new information-based subdata selection (IBOSS) algorithm, Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains nice asymptotic properties of IBOSS and gives…

This article proposes a new information-based subdata selection (IBOSS) algorithm, Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains nice asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits the advantages of vectorized calculation avoiding for loops and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA is studied from three aspects: nonorthogonality, including interaction terms and variable misspecification. A new accurate variable selection algorithm is proposed to help the implementation of IBOSS algorithms when a large number of variables are present with sparse important variables among them. Aggregating random subsample results, this variable selection algorithm is much more accurate than the LASSO method using full data. Since the time complexity is associated with the number of variables only, it is also very computationally efficient if the number of variables is fixed as n increases and not massively large. More importantly, using subsamples it solves the problem that full data cannot be stored in the memory when a data set is too large.

ContributorsZheng, Yi (Author) / Stufken, John (Thesis advisor) / Reiser, Mark R. (Committee member) / McCulloch, Robert (Committee member) / Arizona State University (Publisher)

Created2017