Matching Items (12)

Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. A new column is created to measure price difference, allowing a more accurate analysis of the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchains, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, with a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. Out-of-sample modeling is more difficult, although in some cases it performs better than is typical for financial data. Cryptocurrency data also has properties similar to those of related financial datasets, which suggests future potential for modeling cryptocurrency systems within the financial world.
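
As a rough illustration of the bagging step described above, the sketch below fits 400 one-split regression trees (stumps) to bootstrap resamples of a single variable and reports the in-sample correlation. It is not the project's code: the toy prices, the choice of hash rate as the single variable, and the helper names fit_stump, bagged_model and pearson are all assumptions made for the example, and a stump stands in for a full decision tree.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) on a single variable."""
    best = None
    # Skipping the smallest value guarantees both sides of the split are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def bagged_model(xs, ys, n_trees=400, seed=0):
    """Bootstrap aggregation: average stumps fitted to resampled data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(stump(x) for stump in stumps)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Toy data (made up for illustration only).
prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 115.0, 120.0, 118.0, 125.0]
hash_rate = [1.0, 1.2, 1.1, 1.5, 2.0, 1.8, 2.4, 2.9, 2.7, 3.3]

# The "price difference" column: change from the previous entry.
price_diff = [later - earlier for earlier, later in zip(prices, prices[1:])]
feature = hash_rate[1:]  # align the variable with the differenced prices

model = bagged_model(feature, price_diff, n_trees=400)
fitted = [model(x) for x in feature]
corr = pearson(fitted, price_diff)
print("in-sample correlation:", corr)
```

The correlation printed here is computed on the same data the model was fitted to, so it says nothing about out-of-sample performance, which is the distinction the description draws.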

Date Created
2018-05

Machine Learning Enabled Analytics for Health-Related Demographics: a Case Study Identifying Important Factors in Cardiac Disease

Description

The use of machine learning for analytics has grown rapidly in the past few years due to its ability to identify hidden insights in data. It also has a wide range of applications in healthcare, from improving image recognition in CT scans to extracting semantic meaning from thousands of medical form PDFs. Currently in the BioElectrical Systems and Technology Lab, there is a biosensor in development whose data is retrieved and analyzed manually. As a proof of concept, this project uses a neural network architecture to automatically parse and classify a cardiac disease data set, and explores health-related factors impacting cardiac disease in patients of all ages.

Date Created
2018-05

Prediction of heat transport in multiple tokamak devices with neural networks

Description

The OMFIT (One Modeling Framework for Integrated Tasks) modeling environment and the BRAINFUSE module have been deployed on the PPPL (Princeton Plasma Physics Laboratory) computing cluster, with modifications that make it possible to apply artificial neural networks (NNs) to the TRANSP databases for the JET (Joint European Torus), TFTR (Tokamak Fusion Test Reactor), and NSTX (National Spherical Torus Experiment) devices. This development has facilitated the investigation of NNs for predicting heat transport profiles in JET, TFTR, and NSTX, and has promoted additional investigations to discover how else NNs may be of use to scientists at PPPL. In applying NNs to the aforementioned devices for predicting heat transport, the primary goal of this endeavor is to reproduce the success shown in Meneghini et al. in using NNs for heat transport prediction in DIII-D. Reproducing those results is important because it would provide scientists at PPPL with a quick and efficient toolset for reliably predicting heat transport profiles much faster than any existing computational methods allow. The progress towards this goal is outlined in this report, and potential additional applications of the NN framework are presented.

Date Created
2015-05

Using Facebook to Examine Smoking Behavior through "Quit Smoking" Support Groups

Description

Background: As social media platforms continue to grow, the constantly increasing amount of freely available, user-generated data they receive becomes of great importance. One apparent use of this content is public health surveillance, such as increasing understanding of substance abuse. In this study, Facebook was used to monitor nicotine addiction through the public support groups users can join to aid their quitting process. Objective: The main objective of this project was to gain a better understanding of the mechanisms of nicotine addiction online and provide content analysis of Facebook posts obtained from "quit smoking" support groups. Methods: Using the Facebook Application Programming Interface (API) for Python, a sample of 9,970 posts was collected in October 2015. Information regarding the user's name and the number of likes and comments they received on their post was also included. The crawled posts were then manually classified by one annotator into one of three categories: positive, negative, and neutral. Positive posts are those that describe current quits, negative posts are those that discuss relapsing, and neutral posts are those that were not used to train the classifiers, including posts where users have yet to attempt a quit, ads, random questions, etc. For this project, the performance of two machine learning algorithms on a corpus of manually labeled Facebook posts was compared. The classification goal was to test the plausibility of creating a natural language processing machine learning classifier which could be used to distinguish between relapse (labeled negative) and quitting success (labeled positive) posts from a set of smoking-related posts. Results: From the corpus of 9,970 posts that were manually labeled, 6,254 (62.7%) were labeled positive, 1,249 (12.5%) were labeled negative, and 2,467 (24.8%) were labeled neutral. Since the posts labeled neutral are those which are irrelevant to the classification task, 7,503 posts were used to train the classifiers: 83.4% positive and 16.6% negative. The SVM classifier was 84.1% accurate and 84.1% precise, had a recall of 1, and an F-score of 0.914. The MNB classifier was 82.8% accurate and 82.8% precise, had a recall of 1, and an F-score of 0.906. Conclusions: The Facebook surveillance results give a small peek into the behavior of those looking to quit smoking. Ultimately, what makes Facebook a great tool for public health surveillance is that it has an extremely large and diverse user base with information that is easily obtainable. This, and the fact that so many people are actually willing to use Facebook support groups to aid their quitting processes, demonstrates that it can be used to learn a lot about quitting and smoking behavior.
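
The reported F-scores follow from the reported precision and recall through the standard F1 formula (the harmonic mean of the two). The short check below assumes the F-score in the description is F1; the function name f_score is chosen here for illustration.

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for each classifier.
svm_f = f_score(0.841, 1.0)
mnb_f = f_score(0.828, 1.0)
print(round(svm_f, 3), round(mnb_f, 3))  # 0.914 0.906

# Class balance of the 7,503 posts used for training.
positive_share = round(100 * 6254 / 7503, 1)
negative_share = round(100 * 1249 / 7503, 1)
print(positive_share, negative_share)  # 83.4 16.6
```

With a recall of 1 the F-score depends only on precision, and both precision values sit close to the 83.4% share of positive posts.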

Date Created
2016-05

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work provides insight into why particular words trend on Twitter.
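
The combination of k-nearest-neighbors and dynamic time warping mentioned above can be sketched as follows. This is a generic illustration rather than the paper's model: the toy count series, the labels "trend" and "no trend", and the function names dtw_distance and knn_predict are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def knn_predict(query, labeled_series, k=1):
    """Label a series by majority vote among its k nearest neighbors under DTW."""
    ranked = sorted(labeled_series, key=lambda item: dtw_distance(query, item[0]))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy hourly mention counts (made up for illustration only).
examples = [
    ([0, 1, 3, 8, 20], "trend"),
    ([1, 2, 5, 12, 25], "trend"),
    ([3, 3, 4, 3, 3], "no trend"),
    ([5, 4, 5, 5, 4], "no trend"),
]

# The query has a different length; DTW aligns it against each example anyway.
query = [0, 0, 1, 4, 9, 22]
print(knn_predict(query, examples, k=1))  # trend
```

Dynamic time warping is useful here because two series with the same shape but different timing or length still receive a small distance.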

Date Created
2015-05

Optimal Modeling of Knots in Wood

Description

A model has been developed to modify Euler-Bernoulli beam theory for wooden beams, using visible properties of wood knot-defects. Treating knots in a beam as a system of two ellipses that change the local bending stiffness has been shown to improve the fit of a theoretical beam displacement function to edge-line deflection data extracted from digital imagery of experimentally loaded beams. In addition, an Ellipse Logistic Model (ELM) has been proposed, using L1-regularized logistic regression, to predict the impact of a knot on the displacement of a beam. By classifying a knot as severely positive or negative, vs. mildly positive or negative, ELM can identify knots that lead to large changes in beam deflection, while not over-emphasizing knots that may not be a problem. Using ELM with a regression-fit Young's modulus on three-point bending of Douglas fir, it is possible to estimate the effects a knot will have on the shape of the resulting displacement curve.
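
For context, the baseline that the knot model modifies can be sketched with the textbook Euler-Bernoulli result for a uniform, simply supported beam under a central point load, where the midspan deflection is P L^3 / (48 E I). The sketch below covers only that knot-free case and shows one simple way a Young's modulus could be fitted by regression from load and deflection pairs. The rectangular cross-section, the numeric values and the function names are assumptions for illustration, not the thesis's procedure.

```python
def second_moment_rect(width_m, height_m):
    """Second moment of area of a rectangular cross-section."""
    return width_m * height_m ** 3 / 12.0


def midspan_deflection(load_n, span_m, youngs_modulus_pa, width_m, height_m):
    """Midspan deflection in three-point bending of a uniform beam."""
    inertia = second_moment_rect(width_m, height_m)
    return load_n * span_m ** 3 / (48.0 * youngs_modulus_pa * inertia)


def fit_youngs_modulus(loads_n, deflections_m, span_m, width_m, height_m):
    """Least-squares fit of E from load/deflection pairs (line through the origin)."""
    slope = sum(p * d for p, d in zip(loads_n, deflections_m))
    slope /= sum(p * p for p in loads_n)  # deflection per unit load
    inertia = second_moment_rect(width_m, height_m)
    return span_m ** 3 / (48.0 * inertia * slope)


# Hypothetical beam: 1 m span, 40 mm x 90 mm section, E = 12 GPa.
span, width, height, true_e = 1.0, 0.04, 0.09, 12e9
loads = [200.0, 400.0, 600.0, 800.0]
deflections = [midspan_deflection(p, span, true_e, width, height) for p in loads]

fitted_e = fit_youngs_modulus(loads, deflections, span, width, height)
print(fitted_e)
```

A knot changes the local bending stiffness, so a measured displacement curve departs from this uniform-beam prediction; that departure is what the description's ellipse-based model is meant to capture.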

Date Created
2015-05

Categorizing and Discovering Social Bots

Description

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, detail different motivations for bots, describe previous work in bot detection and observation, and then perform bot detection of our own. For our bot detection, we are interested in bots on Twitter that tweet Arabic extremist-like phrases. A testing dataset is collected using the honeypot method, and five different heuristics are measured for their effectiveness in detecting bots. The model underperformed, but we have laid the groundwork for a largely untapped focus in bot detection: the diffusion of extremist ideals through bots.

Date Created
2015-05

Learning the Diffusion Coefficient on a Cell Membrane

Description

A statistical method is proposed to learn the diffusion coefficient at any point in space on a cell membrane. The method uses Bayesian nonparametrics to learn this value. Learning the diffusion coefficient might be useful for understanding more about cellular dynamics.

Date Created
2021-05

Improving Peptide Identification in Shotgun Proteomics Using Deep Neural Networks

Description

In shotgun proteomics, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is used to identify and quantify peptides and proteins. LC-MS/MS produces mass spectra, which must be searched by one or more engines, which employ algorithms to match spectra to theoretical spectra derived from a reference database. These engines identify and characterize proteins and their component peptides. By training a convolutional neural network on a dataset of over 6 million MS/MS spectra derived from human proteins, we aim to create a tool that can quickly and effectively identify spectra as peptides prior to database searching. This can significantly reduce search space and thus run time for database searches, thereby accelerating LC-MS/MS-based proteomics data acquisition. Additionally, by training neural networks on labels derived from the search results of three different database search engines, we aim to examine and compare which features are best identified by individual search engines, a neural network, or a combination of these.

Date Created
2020-05

Moving Target Defense: Defending against Adversarial Defense

Description

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting a combination of differently constructed image classification neural networks to observe which combinations applied to this framework were most effective in maximizing classification accuracy. Furthermore, the reasons why particular combinations were more effective than others are explored.
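
The general idea of defense by randomization can be sketched as a wrapper that answers each query with a classifier drawn at random from a pool, so an attacker cannot know in advance which model will respond. The sketch below is an assumption-laden illustration rather than the proposed framework: the two toy "classifiers" are simple threshold rules standing in for neural networks, and the name make_randomized_classifier is hypothetical.

```python
import random
from statistics import mean


def make_randomized_classifier(models, seed=None):
    """Return a classifier that picks one model at random for every query."""
    rng = random.Random(seed)

    def classify(image):
        model = rng.choice(models)  # uniform choice; other strategies are possible
        return model(image)

    return classify


# Toy stand-ins for differently constructed image classifiers.
def model_by_mean(image):
    return "bright" if mean(image) > 0.5 else "dark"


def model_by_peak(image):
    return "bright" if max(image) > 0.8 else "dark"


classifier = make_randomized_classifier([model_by_mean, model_by_peak], seed=0)

# An input the two models agree on always gets the same label.
clear_image = [0.9, 0.9, 0.9]
print(classifier(clear_image))  # bright

# An input the models disagree on gets a label that depends on the draw.
ambiguous_image = [0.9, 0.1, 0.1]
answers = [classifier(ambiguous_image) for _ in range(200)]
print(set(answers))
```

Which combination of models goes into the pool determines both the accuracy on clean inputs and how well an attack crafted against one model transfers to the others, which is the question the experiments address.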

Date Created
2019-05