Search Content

Categorizing and Discovering Social Bots

Description

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, we detail different motivations for bots, we describe previous work in bot detection and observation, and then we perform bot detection of our own. For our bot…

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, we detail different motivations for bots, we describe previous work in bot detection and observation, and then we perform bot detection of our own. For our bot detection, we are interested in bots on Twitter that tweet Arabic extremist-like phrases. A testing dataset is collected using the honeypot method, and five different heuristics are measured for their effectiveness in detecting bots. The model underperformed, but we have laid the ground-work for a vastly untapped focus on bot detection: extremist ideal diffusion through bots.

ContributorsKarlsrud, Mark C. (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05

Prediction of heat transport in multiple tokamak devices with neural networks

Description

The OMFIT (One Modeling Framework for Integrated Tasks) modeling environment and the BRAINFUSE module have been deployed on the PPPL (Princeton Plasma Physics Laboratory) computing cluster with modifications that have rendered the application of artificial neural networks (NNs) to the TRANSP databases for the JET (Joint European Torus), TFTR (Tokamak…

The OMFIT (One Modeling Framework for Integrated Tasks) modeling environment and the BRAINFUSE module have been deployed on the PPPL (Princeton Plasma Physics Laboratory) computing cluster with modifications that have rendered the application of artificial neural networks (NNs) to the TRANSP databases for the JET (Joint European Torus), TFTR (Tokamak Fusion Test Reactor), and NSTX (National Spherical Torus Experiment) devices possible through their use. This development has facilitated the investigation of NNs for predicting heat transport profiles in JET, TFTR, and NSTX, and has promoted additional investigations to discover how else NNs may be of use to scientists at PPPL. In applying NNs to the aforementioned devices for predicting heat transport, the primary goal of this endeavor is to reproduce the success shown in Meneghini et al. in using NNs for heat transport prediction in DIII-D. Being able to reproduce the results from is important because this in turn would provide scientists at PPPL with a quick and efficient toolset for reliably predicting heat transport profiles much faster than any existing computational methods allow; the progress towards this goal is outlined in this report, and potential additional applications of the NN framework are presented.

ContributorsLuna, Christopher Joseph (Author) / Tang, Wenbo (Thesis director) / Treacy, Michael (Committee member) / Orso, Meneghini (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Department of Physics (Contributor)

Created2015-05

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public.…

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work provides understanding into why par- ticular words trend on Twitter.

ContributorsMarshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created2015-05

Optimal Modeling of Knots in Wood

Description

A model has been developed to modify Euler-Bernoulli beam theory for wooden beams, using visible properties of wood knot-defects. Treating knots in a beam as a system of two ellipses that change the local bending stiffness has been shown to improve the fit of a theoretical beam displacement function to…

A model has been developed to modify Euler-Bernoulli beam theory for wooden beams, using visible properties of wood knot-defects. Treating knots in a beam as a system of two ellipses that change the local bending stiffness has been shown to improve the fit of a theoretical beam displacement function to edge-line deflection data extracted from digital imagery of experimentally loaded beams. In addition, an Ellipse Logistic Model (ELM) has been proposed, using L1-regularized logistic regression, to predict the impact of a knot on the displacement of a beam. By classifying a knot as severely positive or negative, vs. mildly positive or negative, ELM can classify knots that lead to large changes to beam deflection, while not over-emphasizing knots that may not be a problem. Using ELM with a regression-fit Young's Modulus on three-point bending of Douglass Fir, it is possible estimate the effects a knot will have on the shape of the resulting displacement curve.

ContributorsSexton, Thurston Bryant (Author) / Takahashi, Timothy (Thesis director) / Jones, Donald (Committee member) / Barrett, The Honors College (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / School of Human Evolution and Social Change (Contributor)

Created2015-05

Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries…

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data. The new column is created to measure price difference to create a more accurate analysis on the change in price. Eight relevant variables are selected using cross validation: the total number of bitcoins, the total size of the blockchains, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, with a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data for Bitcoin was 0.06957639, while for Ethereum the correlation was -0.171125. In conclusion, it is confirmed that cryptocurrencies can have accurate in-sample models by applying the random forests method to a dataset. However, out-of-sample modeling is more difficult, but in some cases better than typical forms of financial data. It should also be noted that cryptocurrency data has similar properties to other related financial datasets, realizing future potential for system modeling for cryptocurrency within the financial world.

ContributorsBrowning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Machine Learning Enabled Analytics for Health-Related Demographics: a Case Study Identifying Important Factors in Cardiac Disease

Description

Machine learning for analytics has exponentially increased in the past few years due to its ability to identify hidden insights in data. It also has a plethora of applications in healthcare ranging from improving image recognition in CT scans to extracting semantic meaning from thousands of medical form PDFs. Currently…

Machine learning for analytics has exponentially increased in the past few years due to its ability to identify hidden insights in data. It also has a plethora of applications in healthcare ranging from improving image recognition in CT scans to extracting semantic meaning from thousands of medical form PDFs. Currently in the BioElectrical Systems and Technology Lab, there is a biosensor in development that retrieves and analyzes data manually. In a proof of concept, this project uses the neural network architecture to automatically parse and classify a cardiac disease data set as well as explore health related factors impacting cardiac disease in patients of all ages.

ContributorsMurella, Akhila Sainagaki (Author) / Blain-Christen, Jennifer (Thesis director) / Meuth, Ryan (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / School of the Arts, Media and Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

Reddit Predicts Swings in the Stock Market: r/WorldNews and Using Machine Learning to Predict Changes in Stock Price

Description

In this paper, I will show that news headlines of global events can predict changes in stock price by using Machine Learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to…

In this paper, I will show that news headlines of global events can predict changes in stock price by using Machine Learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to the implicit filtering mechanism in the online community, these 25 posts are representative of the most popular news headlines and influential global events of the day. Hence, these posts shine a light on how large-scale social and political events affect the stock market. Using a Logistic Regression and a Naive Bayes classifier, I am able to predict with approximately 85% accuracy a binary change in stock price using term-feature vectors gathered from the news headlines. The accuracy, precision and recall results closely rival the best models in this field of research. In addition to the results, I will also describe the mathematical underpinnings of the two models; preceded by a general investigation of the intersection between the multiple academic disciplines related to this project. These range from social to computer science and from statistics to philosophy. The goal of this additional discussion is to further illustrate the interdisciplinary nature of the research and hopefully inspire a non-monolithic mindset when further investigations are pursued.

ContributorsPriniski, John Hunter (Author) / Haiyan, Wang (Thesis director) / Hazel, Kwon (Committee member) / School of Historical, Philosophical and Religious Studies (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2016-12

Using Facebook to Examine Smoking Behavior through ""Quit Smoking"" Support Groups

Description

Background: As the growth of social media platforms continues, the use of the constantly increasing amount of freely available, user-generated data they receive becomes of great importance. One apparent use of this content is public health surveillance; such as for increasing understanding of substance abuse. In this study, Facebook was…

Background: As the growth of social media platforms continues, the use of the constantly increasing amount of freely available, user-generated data they receive becomes of great importance. One apparent use of this content is public health surveillance; such as for increasing understanding of substance abuse. In this study, Facebook was used to monitor nicotine addiction through the public support groups users can join to aid their quitting process. Objective: The main objective of this project was to gain a better understanding of the mechanisms of nicotine addiction online and provide content analysis of Facebook posts obtained from "quit smoking" support groups. Methods: Using the Facebook Application Programming Interface (API) for Python, a sample of 9,970 posts were collected in October 2015. Information regarding the user's name and the number of likes and comments they received on their post were also included. The posts crawled were then manually classified by one annotator into one of three categories: positive, negative, and neutral. Where positive posts are those that describe current quits, negative posts are those that discuss relapsing, and neutral posts are those that were not be used to train the classifiers, which include posts where users have yet to attempt a quit, ads, random questions, etc. For this project, the performance of two machine learning algorithms on a corpus of manually labeled Facebook posts were compared. The classification goal was to test the plausibility of creating a natural language processing machine learning classifier which could be used to distinguish between relapse (labeled negative) and quitting success (labeled positive) posts from a set of smoking related posts. Results: From the corpus of 9,970 posts that were manually labeled: 6,254 (62.7%) were labeled positive, 1,249 (12.5%) were labeled negative, and 2467 (24.8%) were labeled neutral. Since the posts labeled neutral are those which are irrelevant to the classification task, 7,503 posts were used to train the classifiers: 83.4% positive and 16.6% negative. The SVM classifier was 84.1% accurate and 84.1% precise, had a recall of 1, and an F-score of 0.914. The MNB classifier was 82.8% accurate and 82.8% precise, had a recall of 1, and an F-score of 0.906. Conclusions: From the Facebook surveillance results, a small peak is given into the behavior of those looking to quit smoking. Ultimately, what makes Facebook a great tool for public health surveillance is that it has an extremely large and diverse user base with information that is easily obtainable. This, and the fact that so many people are actually willing to use Facebook support groups to aid their quitting processes demonstrates that it can be used to learn a lot about quitting and smoking behavior.

ContributorsMolina, Daniel Antonio (Author) / Li, Baoxin (Thesis director) / Tian, Qiongjie (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2016-05

Development of a Game-Based Intervention to Promote HPV Vaccination Among Adolescents: A Qualitative Analysis

Description

Purpose: This qualitative research aimed to create a developmentally and gender-appropriate game-based intervention to promote Human Papillomavirus (HPV) vaccination in adolescents. Background: Ranking as the most common sexually transmitted infection, about 80 million Americans are currently infected by HPV, and it continues to increase with an estimated 14 million new…

Purpose: This qualitative research aimed to create a developmentally and gender-appropriate game-based intervention to promote Human Papillomavirus (HPV) vaccination in adolescents. Background: Ranking as the most common sexually transmitted infection, about 80 million Americans are currently infected by HPV, and it continues to increase with an estimated 14 million new cases yearly. Certain types of HPV have been significantly associated with cervical, vaginal, and vulvar cancers in women; penile cancers in men; and oropharyngeal and anal cancers in both men and women. Despite HPV vaccination being one of the most effective methods in preventing HPV-associated cancers, vaccination rates remain suboptimal in adolescents. Game-based intervention, a novel medium that is popular with adolescents, has been shown to be effective in promoting health behaviors. Methods: Sample/Sampling. We used purposeful sampling to recruit eight adolescent-parent dyads (N = 16) which represented both sexes (4 boys, 4 girls) and different racial/ethnic groups (White, Black, Latino, Asian American) in the United States. The inclusion criteria for the dyads were: (1) a child aged 11-14 years and his/her parent, and (2) ability to speak, read, write, and understand English. Procedure. After eligible families consented to their participation, semi-structured interviews (each 60-90 minutes long) were conducted with each adolescent-parent dyad in a quiet and private room. Each dyad received $50 to acknowledge their time and effort. Measure. The interview questions consisted of two parts: (a) those related to game design, functioning, and feasibility of implementation; (b) those related to theoretical constructs of the Health Belief Model (HBM) and the Theory of Planned Behavior (TPB). Data analysis. The interviews were audio-recorded with permission and manually transcribed into textual data. Two researchers confirmed the verbatim transcription. We use pre-developed codes to identify each participant’s responses and organize data and develop themes based on the HBM and TPB constructs. After the analysis was completed, three researchers in the team reviewed the results and discussed the discrepancies until a consensus is reached. Results: The findings suggested that the most common motivating factors for adolescents’ HPV vaccination were its effectiveness, benefits, convenience, affordable cost, reminders via text, and recommendation by a health care provider. Regarding the content included in the HPV game, participants suggested including information about who and when should receive the vaccine, what is HPV and the vaccination, what are the consequences if infected, the side effects of the vaccine, and where to receive the vaccine. The preferred game design elements were: 15 minutes long, stories about fighting or action, option to choose characters/avatars, motivating factors (i.e., rewards such as allowing users to advance levels and receive coins when correctly answering questions), use of a portable electronic device (e.g., tablet) to deliver the education. Participants were open to multiplayer function which assists in a facilitated conversation about HPV and the HPV vaccine. Overall, the participants concluded enthusiasm for an interactive yet engaging game-based intervention to learn about the HPV vaccine with the goal to increase HPV vaccination in adolescents. Implications: Tailored educational games have the potential to decrease the stigma of HPV and HPV vaccination, increasing communication between the adolescent, parent, and healthcare provider, as well as increase the overall HPV vaccination rate.

ContributorsBeaman, Abigail Marie (Author) / Chen, Angela Chia-Chen (Thesis director) / Amresh, Ashish (Committee member) / Edson College of Nursing and Health Innovation (Contributor) / Barrett, The Honors College (Contributor)

Created2021-05

Comparison of Machine Learning Algorithms for Predicting Breast Cancer Malignancy

Description

Breast cancer is one of the most common types of cancer worldwide. Early detection and diagnosis are crucial for improving the chances of successful treatment and survival. In this thesis, many different machine learning algorithms were evaluated and compared to predict breast cancer malignancy from diagnostic features extracted from digitized…

Breast cancer is one of the most common types of cancer worldwide. Early detection and diagnosis are crucial for improving the chances of successful treatment and survival. In this thesis, many different machine learning algorithms were evaluated and compared to predict breast cancer malignancy from diagnostic features extracted from digitized images of breast tissue samples, called fine-needle aspirates. Breast cancer diagnosis typically involves a combination of mammography, ultrasound, and biopsy. However, machine learning algorithms can assist in the detection and diagnosis of breast cancer by analyzing large amounts of data and identifying patterns that may not be discernible to the human eye. By using these algorithms, healthcare professionals can potentially detect breast cancer at an earlier stage, leading to more effective treatment and better patient outcomes. The results showed that the gradient boosting classifier performed the best, achieving an accuracy of 96% on the test set. This indicates that this algorithm can be a useful tool for healthcare professionals in the early detection and diagnosis of breast cancer, potentially leading to improved patient outcomes.

ContributorsMallya, Aatmik (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created2023-05

Filtering by