Sentiment-based Stock Prediction: WallStreetBets Case Study 

Introduction:

This data product applies machine learning techniques to Reddit posts from the /WallStreetBets community to predict fluctuations in the GameStop (GME) stock price. Using both supervised and unsupervised learning methods, it examines how the sentiment of these posts relates to stock price changes, with the aim of supporting better-informed trading decisions and improved investment strategies.

Problem Statement:

The goal of this data product is to develop and compare machine learning models that predict stock price fluctuations from the sentiment of Reddit posts in the /WallStreetBets community. Such models would give investors and traders insight into the relationship between social media sentiment and stock market movements.

Data Collection:

Data was collected from Reddit's /WallStreetBets community, focusing on posts related to GME stock. The dataset includes post titles, post bodies, and the stock price change on the following day, which makes it possible to analyze how online discussion relates to subsequent price movements.
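As a minimal sketch of how such a dataset could be turned into labeled training examples, the Python below groups post titles by day (one plausible reading of the "title aggregation" named in the results) and labels each day with the direction of the next trading day's price change. The record layout, the function name `build_daily_examples`, and the up/down labeling rule are assumptions for illustration, not the study's documented pipeline.

```python
from collections import defaultdict
from datetime import date


def build_daily_examples(posts, prices):
    """Aggregate post titles per day and label each day with the direction
    of the price change to the next trading day.

    posts:  iterable of (posting_date, title, body) tuples (hypothetical layout)
    prices: dict mapping trading date -> closing price (hypothetical layout)
    """
    titles_by_day = defaultdict(list)
    for day, title, _body in posts:
        titles_by_day[day].append(title)

    trading_days = sorted(prices)
    examples = []
    # Pair each trading day with the next one; the last day has no label.
    for today, next_day in zip(trading_days, trading_days[1:]):
        if today not in titles_by_day:
            continue  # no posts on this trading day
        change = prices[next_day] - prices[today]
        label = "up" if change > 0 else "down"  # flat days count as "down" here
        # Title aggregation: all of the day's titles become one document.
        examples.append((today, " ".join(titles_by_day[today]), label))
    # Posts made on non-trading days (e.g. weekends) are never visited above.
    return examples


# Toy, made-up inputs purely to show the shapes involved.
posts = [
    (date(2021, 1, 25), "GME to the moon", ""),
    (date(2021, 1, 25), "Holding my shares", ""),
    (date(2021, 1, 26), "Paper hands selling", ""),
]
prices = {date(2021, 1, 25): 10.0, date(2021, 1, 26): 12.0, date(2021, 1, 27): 11.0}

examples = build_daily_examples(posts, prices)
for example in examples:
    print(example)
```

Dropping weekend posts is the simplest choice; carrying them forward to the next trading day would be a reasonable alternative.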

Methodology:

The methodology consists of three main parts, each crucial to building accurate and effective models for stock price prediction:

Results:

The study found that the K-Means Clustering model showed strong predictive ability: the sentiment of Reddit posts often corresponded to stock price changes several days in advance. By contrast, Naive Bayes performed poorly. Among the supervised models, Random Forest with bag-of-words (BOW) features and title aggregation performed best, achieving approximately 83% accuracy.
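The supervised models above are built on bag-of-words (BOW) features. A self-contained sketch of that representation, paired with a multinomial Naive Bayes classifier and the accuracy metric, follows. It uses only the Python standard library and made-up toy documents, and it is not the study's code; the Random Forest variant would more typically be built with a library such as scikit-learn (a `CountVectorizer` feeding a `RandomForestClassifier`), which this sketch does not attempt.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenizer; a real preprocessing pipeline would do more.
    return text.lower().split()


def bag_of_words(text):
    """Bag-of-words (BOW): map each token to its count, discarding word order."""
    return Counter(tokenize(text))


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over BOW counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            bow = bag_of_words(doc)
            self.word_counts[label].update(bow)  # add this document's counts
            self.vocab.update(bow)  # add this document's distinct tokens
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for word, count in bag_of_words(doc).items():
                if word not in self.vocab:
                    continue  # words unseen in training are ignored
                p = (self.word_counts[c][word] + 1) / (self.total_words[c] + v)
                score += count * math.log(p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up toy documents standing in for aggregated daily titles.
train_docs = [
    "to the moon buy calls",
    "diamond hands hold moon",
    "sell puts crash",
    "paper hands sell dump",
]
train_labels = ["up", "up", "down", "down"]
test_docs = ["moon calls", "crash dump"]
test_labels = ["up", "down"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(doc) for doc in test_docs]
print(predictions, accuracy(predictions, test_labels))
```

Accuracy on four training and two test documents says nothing about real performance. With real daily data, the train/test split should also respect time order, so that a model is never trained on days that come after the ones it is evaluated on.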

Conclusion:

Both supervised and unsupervised learning methods show promise for predicting stock price fluctuations from the sentiment of Reddit posts in the /WallStreetBets community. Further refinement of the preprocessing pipeline, the word embedding methods, and the resolution at which training and classification labels are assigned may improve the models' predictive ability and make them more useful for investment decision-making.

Applications:

This data product can be used by traders, investors, and financial institutions to gain insights into the impact of social media sentiment on stock prices. By understanding these relationships, users can make better-informed trading decisions, develop more effective investment strategies, and manage risk more efficiently.

Challenges & Future Work:

