Can Reddit comments give a real-time gauge of market sentiment for particular stocks? How well can this be gauged? Note: This is a personal project to learn more about streaming, containerization, clusters, data modeling, natural language processing, linear regression, and visualization.
- Project Overview
- Table of Contents
- Architecture
- Installation and Setup
- Improvements
- Acknowledgements
All applications in the above architecture are containerized with Docker and orchestrated by Kubernetes, with the infrastructure managed by Terraform. The Docker images for each application are publicly available in the Docker Hub registry. Further details about each layer are provided below:
- **Data Ingestion:** A containerized Python application called `reddit_producer` connects to the Reddit API using credentials provided in the `.config/reddit_producer.cfg` file. It takes the received messages (Reddit comments) and converts selected messages into JSON. These transformed messages are then sent to and stored in a Kafka broker. The PRAW Python library is used to interact with the Reddit API.
- **Message Broker:** The Kafka broker (`kafkaservice` pod) receives messages from `reddit_producer`. The broker is accompanied by the Kafdrop application, which provides a web UI for monitoring Kafka. When Kafka starts, another container named `kafkainit` creates the topic `redditcomments`. The `zookeeper` pod is launched before Kafka to manage Kafka metadata.
- **Stream Processor:** TODO
- **Processed Data Storage:** TODO
- **Data Visualisation:** TODO
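The ingestion step described above (comment in, JSON message out, publish to the `redditcomments` topic) might look roughly like the sketch below. It is only an illustration under assumptions: the function names `comment_to_message` and `publish` and the choice of fields are hypothetical, not taken from `reddit_producer`, and the criteria used to select messages are not shown because they are not documented here. The `producer` argument is assumed to be any object with a kafka-python-style `send(topic, value=...)` method, such as a `KafkaProducer`.

```python
import json
from types import SimpleNamespace


def comment_to_message(comment):
    """Serialize a PRAW-like comment object to UTF-8 JSON bytes.

    The field selection below is an assumption for illustration;
    the project's real message schema may differ.
    """
    payload = {
        "id": comment.id,
        # PRAW sets author to None for deleted accounts.
        "author": str(comment.author) if comment.author else None,
        "body": comment.body,
        "subreddit": str(comment.subreddit),
        "created_utc": comment.created_utc,
    }
    return json.dumps(payload).encode("utf-8")


def publish(producer, comment, topic="redditcomments"):
    """Send one serialized comment to the given Kafka topic."""
    producer.send(topic, value=comment_to_message(comment))


if __name__ == "__main__":
    # Stand-in for a PRAW comment so the sketch runs without Reddit access.
    demo = SimpleNamespace(
        id="abc123",
        author="example_user",
        body="Example comment text",
        subreddit="stocks",
        created_utc=1700000000.0,
    )
    print(comment_to_message(demo).decode("utf-8"))
```

Keeping serialization separate from publishing means the message format can be checked without a running broker, which is convenient when the producer is developed outside the cluster.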
- Finnhub Streaming Data Pipeline Project
- Docker Images -
- Libraries - Kafka-Python, PRAW
Thank you @nama1arpit for the inspo: https://github.com/nama1arpit/reddit-streaming-pipeline/blob/main/README.md?plain=1