Filtering by
- Creators: Computer Science and Engineering Program
This project aims to incorporate the aspect of sentiment analysis into traditional stock analysis to enhance stock rating predictions by applying a reliance on the opinion of various stocks from the Internet. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were parsed through the Valence Aware Dictionary for Sentiment Reasoning (VADER) natural language processing package to determine numerical polarities which represented positivity or negativity for a given stock ticker. These generated polarities were paired with stock metrics typically observed by stock analysts as the feature set for a Logistic Regression machine learning model. The model was trained on roughly 1500 major stocks to determine a binary classification between a “Buy” or “Not Buy” rating for each stock, and the results of the model were inserted into the back-end of the Agora Web UI which emulates search engine behavior specifically for stocks found in NYSE and NASDAQ. The model reported an accuracy of 82.5% and for most major stocks, the model’s prediction correlated with stock analysts’ ratings. Given the volatility of the stock market and the propensity for hive-mind behavior in online forums, the performance of the Logistic Regression model would benefit from incorporating historical stock data and more sources of opinion to balance any subjectivity in the model.
Since it doesn’t hurt to attempt to utilize feature extracted values to improve a model (if things don’t work out, one can always use their original features), the question may arise: how could the results of feature extraction on values such as sentiment affect a model’s ability to predict the movement of the stock market? This paper attempts to shine some light on to what the answer could be by deriving TextBlob sentiment values from Twitter data, and using Granger Causality Tests and logistic and linear regression to test if there exist a correlation or causation between the stock market and features extracted from public sentiment.