Search Content

Post-Optimization of Permutation Coverings

Description

Covering subsequences with sets of permutations arises in many applications, including event-sequence testing. Given a set of subsequences to cover, one is often interested in knowing the fewest number of permutations required to cover each subsequence, and in finding an explicit construction of such a set of permutations that has…

Covering subsequences with sets of permutations arises in many applications, including event-sequence testing. Given a set of subsequences to cover, one is often interested in knowing the fewest number of permutations required to cover each subsequence, and in finding an explicit construction of such a set of permutations that has size close to or equal to the minimum possible. The construction of such permutation coverings has proven to be computationally difficult. While many examples for permutations of small length have been found, and strong asymptotic behavior is known, there are few explicit constructions for permutations of intermediate lengths. Most of these are generated from scratch using greedy algorithms. We explore a different approach here. Starting with a set of permutations with the desired coverage properties, we compute local changes to individual permutations that retain the total coverage of the set. By choosing these local changes so as to make one permutation less "essential" in maintaining the coverage of the set, our method attempts to make a permutation completely non-essential, so it can be removed without sacrificing total coverage. We develop a post-optimization method to do this and present results on sequence covering arrays and other types of permutation covering problems demonstrating that it is surprisingly effective.

ContributorsMurray, Patrick Charles (Author) / Colbourn, Charles (Thesis director) / Czygrinow, Andrzej (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Department of Physics (Contributor)

Created2014-12

Categorizing and Discovering Social Bots

Description

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, we detail different motivations for bots, we describe previous work in bot detection and observation, and then we perform bot detection of our own. For our bot…

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, we detail different motivations for bots, we describe previous work in bot detection and observation, and then we perform bot detection of our own. For our bot detection, we are interested in bots on Twitter that tweet Arabic extremist-like phrases. A testing dataset is collected using the honeypot method, and five different heuristics are measured for their effectiveness in detecting bots. The model underperformed, but we have laid the ground-work for a vastly untapped focus on bot detection: extremist ideal diffusion through bots.

ContributorsKarlsrud, Mark C. (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05

Prediction of heat transport in multiple tokamak devices with neural networks

Description

The OMFIT (One Modeling Framework for Integrated Tasks) modeling environment and the BRAINFUSE module have been deployed on the PPPL (Princeton Plasma Physics Laboratory) computing cluster with modifications that have rendered the application of artificial neural networks (NNs) to the TRANSP databases for the JET (Joint European Torus), TFTR (Tokamak…

The OMFIT (One Modeling Framework for Integrated Tasks) modeling environment and the BRAINFUSE module have been deployed on the PPPL (Princeton Plasma Physics Laboratory) computing cluster with modifications that have rendered the application of artificial neural networks (NNs) to the TRANSP databases for the JET (Joint European Torus), TFTR (Tokamak Fusion Test Reactor), and NSTX (National Spherical Torus Experiment) devices possible through their use. This development has facilitated the investigation of NNs for predicting heat transport profiles in JET, TFTR, and NSTX, and has promoted additional investigations to discover how else NNs may be of use to scientists at PPPL. In applying NNs to the aforementioned devices for predicting heat transport, the primary goal of this endeavor is to reproduce the success shown in Meneghini et al. in using NNs for heat transport prediction in DIII-D. Being able to reproduce the results from is important because this in turn would provide scientists at PPPL with a quick and efficient toolset for reliably predicting heat transport profiles much faster than any existing computational methods allow; the progress towards this goal is outlined in this report, and potential additional applications of the NN framework are presented.

ContributorsLuna, Christopher Joseph (Author) / Tang, Wenbo (Thesis director) / Treacy, Michael (Committee member) / Orso, Meneghini (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Department of Physics (Contributor)

Created2015-05

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public.…

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work provides understanding into why par- ticular words trend on Twitter.

ContributorsMarshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05

Optimal Modeling of Knots in Wood

Description

A model has been developed to modify Euler-Bernoulli beam theory for wooden beams, using visible properties of wood knot-defects. Treating knots in a beam as a system of two ellipses that change the local bending stiffness has been shown to improve the fit of a theoretical beam displacement function to…

A model has been developed to modify Euler-Bernoulli beam theory for wooden beams, using visible properties of wood knot-defects. Treating knots in a beam as a system of two ellipses that change the local bending stiffness has been shown to improve the fit of a theoretical beam displacement function to edge-line deflection data extracted from digital imagery of experimentally loaded beams. In addition, an Ellipse Logistic Model (ELM) has been proposed, using L1-regularized logistic regression, to predict the impact of a knot on the displacement of a beam. By classifying a knot as severely positive or negative, vs. mildly positive or negative, ELM can classify knots that lead to large changes to beam deflection, while not over-emphasizing knots that may not be a problem. Using ELM with a regression-fit Young's Modulus on three-point bending of Douglass Fir, it is possible estimate the effects a knot will have on the shape of the resulting displacement curve.

ContributorsSexton, Thurston Bryant (Author) / Takahashi, Timothy (Thesis director) / Jones, Donald (Committee member) / Barrett, The Honors College (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / School of Human Evolution and Social Change (Contributor)

Created2015-05

Phase Recovery and Unimodular Waveform Design

Description

In many systems, it is difficult or impossible to measure the phase of a signal. Direct recovery from magnitude is an ill-posed problem. Nevertheless, with a sufficiently large set of magnitude measurements, it is often possible to reconstruct the original signal using algorithms that implicitly impose regularization conditions on this…

In many systems, it is difficult or impossible to measure the phase of a signal. Direct recovery from magnitude is an ill-posed problem. Nevertheless, with a sufficiently large set of magnitude measurements, it is often possible to reconstruct the original signal using algorithms that implicitly impose regularization conditions on this ill-posed problem. Two such algorithms were examined: alternating projections, utilizing iterative Fourier transforms with manipulations performed in each domain on every iteration, and phase lifting, converting the problem to that of trace minimization, allowing for the use of convex optimization algorithms to perform the signal recovery. These recovery algorithms were compared on a basis of robustness as a function of signal-to-noise ratio. A second problem examined was that of unimodular polyphase radar waveform design. Under a finite signal energy constraint, the maximal energy return of a scene operator is obtained by transmitting the eigenvector of the scene Gramian associated with the largest eigenvalue. It is shown that if instead the problem is considered under a power constraint, a unimodular signal can be constructed starting from such an eigenvector that will have a greater return.

ContributorsJones, Scott Robert (Author) / Cochran, Douglas (Thesis director) / Diaz, Rodolfo (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2014-05

Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries…

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. The new column is created to measure price difference to create a more accurate analysis on the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchains, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, with a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. However, out-of-sample modeling is more difficult, but in some cases better than typical forms of financial data. It should also be noted that cryptocurrency data has similar properties to other related financial datasets, realizing future potential for system modeling for cryptocurrency within the financial world.

ContributorsBrowning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Machine Learning Enabled Analytics for Health-Related Demographics: a Case Study Identifying Important Factors in Cardiac Disease

Description

Machine learning for analytics has exponentially increased in the past few years due to its ability to identify hidden insights in data. It also has a plethora of applications in healthcare ranging from improving image recognition in CT scans to extracting semantic meaning from thousands of medical form PDFs. Currently…

Machine learning for analytics has exponentially increased in the past few years due to its ability to identify hidden insights in data. It also has a plethora of applications in healthcare ranging from improving image recognition in CT scans to extracting semantic meaning from thousands of medical form PDFs. Currently in the BioElectrical Systems and Technology Lab, there is a biosensor in development that retrieves and analyzes data manually. In a proof of concept, this project uses the neural network architecture to automatically parse and classify a cardiac disease data set as well as explore health related factors impacting cardiac disease in patients of all ages.

ContributorsMurella, Akhila Sainagaki (Author) / Blain-Christen, Jennifer (Thesis director) / Meuth, Ryan (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / School of the Arts, Media and Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Reddit Predicts Swings in the Stock Market: r/WorldNews and Using Machine Learning to Predict Changes in Stock Price

Description

In this paper, I will show that news headlines of global events can predict changes in stock price by using Machine Learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to…

In this paper, I will show that news headlines of global events can predict changes in stock price by using Machine Learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to the implicit filtering mechanism in the online community, these 25 posts are representative of the most popular news headlines and influential global events of the day. Hence, these posts shine a light on how large-scale social and political events affect the stock market. Using a Logistic Regression and a Naive Bayes classifier, I am able to predict with approximately 85% accuracy a binary change in stock price using term-feature vectors gathered from the news headlines. The accuracy, precision and recall results closely rival the best models in this field of research. In addition to the results, I will also describe the mathematical underpinnings of the two models; preceded by a general investigation of the intersection between the multiple academic disciplines related to this project. These range from social to computer science and from statistics to philosophy. The goal of this additional discussion is to further illustrate the interdisciplinary nature of the research and hopefully inspire a non-monolithic mindset when further investigations are pursued.

ContributorsPriniski, John Hunter (Author) / Haiyan, Wang (Thesis director) / Hazel, Kwon (Committee member) / School of Historical, Philosophical and Religious Studies (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2016-12

Collaborative Computation in Self-Organizing Particle Systems

Description

Many forms of programmable matter have been proposed for various tasks. We use an abstract model of self-organizing particle systems for programmable matter which could be used for a variety of applications, including smart paint and coating materials for engineering or programmable cells for medical uses. Previous research using this…

Many forms of programmable matter have been proposed for various tasks. We use an abstract model of self-organizing particle systems for programmable matter which could be used for a variety of applications, including smart paint and coating materials for engineering or programmable cells for medical uses. Previous research using this model has focused on shape formation and other spatial configuration problems, including line formation, compression, and coating. In this work we study foundational computational tasks that exceed the capabilities of the individual constant memory particles described by the model. These tasks represent new ways to use these self-organizing systems, which, in conjunction with previous shape and configuration work, make the systems useful for a wider variety of tasks. We present an implementation of a counter using a line of particles, which makes it possible for the line of particles to count to and store values much larger than their individual capacities. We then present an algorithm that takes a matrix and a vector as input and then sets up and uses a rectangular block of particles to compute the matrix-vector multiplication. This setup also utilizes the counter implementation to store the resulting vector from the matrix-vector multiplication. Operations such as counting and matrix multiplication can leverage the distributed and dynamic nature of the self-organizing system to be more efficient and adaptable than on traditional linear computing hardware. Such computational tools also give the systems more power to make complex decisions when adapting to new situations or to analyze the data they collect, reducing reliance on a central controller for setup and output processing. Finally, we demonstrate an application of similar types of computations with self-organizing systems to image processing, with an implementation of an image edge detection algorithm.

ContributorsPorter, Alexandra Marie (Author) / Richa, Andrea (Thesis director) / Xue, Guoliang (Committee member) / School of Music (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2016-12

Filtering by