
Can Reddit comments give a real-time gauge of sales for popular companies? Note: This is a personal project to learn more about: Streaming, Containerization, Clusters, Data Modeling, Natural Language, Linear Regression and Visualization


floresrosas/sentiment-sale-prediction-pipeline


sentiment-sale-prediction-pipeline

Project Overview

Can Reddit comments give a real-time gauge of market sentiment for particular stocks? How well can this be gauged? Note: This is a personal project to learn more about: Streaming, Containerization, Clusters, Data Modeling, Natural Language, Linear Regression and Visualization


Architecture

[Architecture diagram: reddit_sentiment_analysis_pipeline_architecture]

All applications in the above architecture are containerized with Docker and orchestrated by Kubernetes, and the infrastructure is managed by Terraform. The Docker images for each application are publicly available in the Docker Hub registry. Further details about each layer are provided below:

  1. Data Ingestion : A containerized Python application called reddit_producer connects to the Reddit API using credentials provided in the .config/reddit_producer.cfg file. It takes the received messages (Reddit comments) and converts selected messages into JSON format. These transformed messages are then sent to and stored in a Kafka broker. The PRAW Python library is used to interact with the Reddit API.

  2. Message Broker : The Kafka broker (kafkaservice pod) receives messages from the reddit_producer. The Kafka broker is accompanied by the Kafdrop application, which acts as a Kafka monitoring tool through a web UI. When Kafka starts, another container named kafkainit creates the topic redditcomments. The ZooKeeper pod is launched before Kafka to manage Kafka metadata.

  3. Stream Processor : TODO

  4. Processed Data Storage : TODO

  5. Data Visualisation : TODO
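The Data Ingestion layer described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual reddit_producer code: the `[reddit]` section and key names in the config file, the chosen comment fields, and the helper names `load_reddit_credentials`, `comment_to_json` and `run_producer` are all hypothetical. It reads credentials with `configparser`, streams new comments with PRAW's `subreddit(...).stream.comments()`, and sends each one as a JSON message with kafka-python's `KafkaProducer`.

```python
import configparser
import json


def load_reddit_credentials(path: str) -> dict:
    """Read Reddit API credentials from an INI-style file.

    Assumption: the file has a [reddit] section with client_id and
    client_secret keys; the real .config/reddit_producer.cfg may differ.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it parsed successfully.
    if not parser.read(path):
        raise FileNotFoundError(path)
    section = parser["reddit"]
    return {
        "client_id": section["client_id"],
        "client_secret": section["client_secret"],
        "user_agent": section.get("user_agent", "reddit_producer"),
    }


def comment_to_json(comment) -> bytes:
    """Keep a few PRAW Comment attributes and encode them as UTF-8 JSON."""
    message = {
        "id": comment.id,
        # comment.author is None when the account has been deleted.
        "author": str(comment.author) if comment.author is not None else None,
        # str() of a PRAW Subreddit gives its display name.
        "subreddit": str(comment.subreddit),
        "body": comment.body,
        "created_utc": comment.created_utc,
        "score": comment.score,
    }
    return json.dumps(message).encode("utf-8")


def run_producer(cfg_path: str, subreddit: str, bootstrap: str,
                 topic: str = "redditcomments") -> None:
    """Stream new comments from one subreddit into a Kafka topic.

    Requires the praw and kafka-python packages and network access.
    """
    import praw
    from kafka import KafkaProducer

    reddit = praw.Reddit(**load_reddit_credentials(cfg_path))
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    try:
        # skip_existing=True skips the batch of recent comments PRAW yields first.
        for comment in reddit.subreddit(subreddit).stream.comments(skip_existing=True):
            producer.send(topic, comment_to_json(comment))
    finally:
        producer.flush()
        producer.close()
```

`run_producer` imports `praw` and `kafka` inside the function, so the config and JSON helpers can be reused or tested without those packages installed. Because `comment_to_json` already returns bytes, the producer needs no `value_serializer`.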
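The kafkainit step from the Message Broker layer above can be sketched as follows. This is a hedged sketch, not the project's actual init container: the `kafkaservice:9092` address, the partition and replication counts, the polling behaviour, and the helper names `wait_for_broker` and `create_topic` are assumptions. It polls until the broker port accepts TCP connections, then creates `redditcomments` with kafka-python's `KafkaAdminClient`, treating an existing topic as success so the container can be re-run safely.

```python
import socket
import time


def wait_for_broker(host: str, port: int, timeout_s: float = 60.0,
                    interval_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # An open port only shows the broker is listening, not fully ready.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Covers refused connections and DNS failures (socket.gaierror).
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


def create_topic(bootstrap: str, topic: str = "redditcomments",
                 partitions: int = 1, replication: int = 1) -> None:
    """Create a topic with kafka-python's admin client, ignoring 'already exists'."""
    # Imported lazily so this module loads without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers=bootstrap)
    try:
        admin.create_topics([NewTopic(name=topic, num_partitions=partitions,
                                      replication_factor=replication)])
    except TopicAlreadyExistsError:
        pass  # Re-running the init container should be harmless.
    finally:
        admin.close()
```

In the cluster this would be called as `wait_for_broker("kafkaservice", 9092)` followed by `create_topic("kafkaservice:9092")`. Since an open port does not guarantee the broker can serve admin requests yet, a more robust init script might also retry `create_topic` a few times.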

Acknowledgements

  1. Finnhub Streaming Data Pipeline Project
  2. Docker Images -
  3. Libraries - Kafka-Python, PRAW

Thank you @nama1arpit for the inspo: https://github.com/nama1arpit/reddit-streaming-pipeline/blob/main/README.md?plain=1
