In this project we developed a standalone application that performs real-time sentiment analysis of tweets related to keywords defined by the application user. The system uses two frameworks for large-scale distributed data processing, Kafka and Spark Streaming, to process the stream of tweets delivered by the Twitter APIs.
Run ZooKeeper and the Kafka server first:
zookeeper-server-start.sh /usr/local/kafka_2.11-2.0.0/config/zookeeper.properties
kafka-server-start.sh /usr/local/kafka_2.11-2.0.0/config/server.properties
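Before moving on, it can help to confirm that both services are accepting connections. The following is a minimal sketch, not part of the project: it assumes the default ports of the stock configuration files (2181 for ZooKeeper, 9092 for Kafka) on localhost, and the helper name `port_open?` is hypothetical.

```ruby
require "socket"

# Hypothetical helper: true if a TCP connection to host:port can be opened.
def port_open?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Assumed default ports; adjust them if the .properties files were changed.
SERVICES = { "ZooKeeper" => 2181, "Kafka" => 9092 }.freeze

SERVICES.each do |name, port|
  status = port_open?("localhost", port) ? "up" : "down"
  puts "#{name} (port #{port}): #{status}"
end
```

An open port only shows that something is listening; it does not prove that the broker has finished starting up.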
Run the web server from the folder /src/ruby:
ruby web-server.rb
In the browser, navigate to http://localhost:4567, enter the preferred keywords, and start the stream processing.
Then run the Spark Streaming job from the folder /src/scala:
sbt run
To track new keywords, the stream must first be stopped and then started again.
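A plausible reason for this restriction is that the tracked keywords are fixed at the moment the streaming connection is opened. The sketch below illustrates that stop-then-start behaviour under this assumption; the class `StreamController` and its methods are hypothetical and are not taken from the project's code.

```ruby
# Hypothetical controller: keywords are captured when the stream starts
# and cannot be changed while it is running.
class StreamController
  attr_reader :keywords

  def initialize
    @keywords = [].freeze
    @running = false
  end

  def running?
    @running
  end

  # Start tracking the given keywords; refuses to start twice.
  def start(keywords)
    raise "stream already running; stop it first" if running?

    @keywords = keywords.map(&:strip).reject(&:empty?).freeze
    @running = true
  end

  def stop
    @running = false
  end

  # Convenience wrapper for the stop-then-start sequence.
  def retrack(new_keywords)
    stop if running?
    start(new_keywords)
  end
end

controller = StreamController.new
controller.start(["kafka", " spark "])
controller.retrack(["scala"]) # stops the stream, then restarts it
```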
- Ruby 2.3.4
- Scala 2.11.12
- Sinatra 2.0.4
- Kafka 2.11-2.0.0
- Spark 2.3.1
- Stanford CoreNLP 3.5.2
The system consists of:
- A Kafka Producer that submits the input from the Twitter Streaming APIs to the Kafka brokers. It is written in Ruby (file: /src/ruby/kafka-producer.rb).
- A Kafka Consumer that creates a DirectStream from the Kafka distributed log. It is written in Scala (file: /src/scala/spark-sentiment-analysis.scala).
- A Spark Streaming Job that counts the processed tweets and analyzes their sentiment with respect to a specific topic. It is written in Scala (file: /src/scala/spark-sentiment-analysis.scala).
- A Sentiment Analyzer object that performs the sentiment analysis of tweets. It is written in Scala using the Stanford CoreNLP library (file: /src/scala/sentiment-analyzer.scala).
- A Web Server that provides a dynamic web interface to visualize the analysis and manage the stream of data. It is written in Ruby using the Sinatra framework (file: /src/ruby/web-server.rb).
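The data flow through these components can be illustrated with a small, broker-free sketch. It is not the project's code: the field names (`id`, `text`, `keyword`) and the helpers `build_payload` and `summarize_batch` are assumptions for illustration, and the sentiment analyzer is replaced by a toy scorer that returns a class from 0 to 4, the five-class scale used by the Stanford CoreNLP sentiment model. The producer side serializes a matching tweet to JSON, the form in which it could be sent to a Kafka topic; the consumer side counts the messages of one micro-batch and tallies their sentiment labels.

```ruby
require "json"

# Labels of the five sentiment classes, indexed 0..4.
SENTIMENT_LABELS = ["very negative", "negative", "neutral",
                    "positive", "very positive"].freeze

# Producer side (hypothetical helper): build the JSON payload for a tweet,
# or return nil when the text contains none of the tracked keywords.
def build_payload(tweet, keywords)
  text = tweet.fetch(:text)
  keyword = keywords.find { |k| text.downcase.include?(k.downcase) }
  return nil if keyword.nil?

  JSON.generate({ id: tweet.fetch(:id), text: text, keyword: keyword })
end

# Consumer side (hypothetical helper): summarize one micro-batch.
# `scorer` stands in for the sentiment analyzer and must return 0..4.
def summarize_batch(payloads, scorer)
  counts = Hash.new(0)
  payloads.each do |payload|
    message = JSON.parse(payload)
    counts[SENTIMENT_LABELS.fetch(scorer.call(message["text"]))] += 1
  end
  { total: payloads.size, by_sentiment: counts }
end

tweets = [
  { id: 1, text: "Kafka is great" },
  { id: 2, text: "Nothing relevant here" },
  { id: 3, text: "I dislike slow Spark jobs" }
]
payloads = tweets.map { |t| build_payload(t, ["kafka", "spark"]) }.compact

# Toy scorer used only to exercise the sketch; a real analyzer would go here.
toy_scorer = ->(text) { text.include?("great") ? 3 : 1 }
summary = summarize_batch(payloads, toy_scorer)
```

Keeping serialization and aggregation as plain functions makes both sides easy to test without a running broker; in the real system the payloads would travel through Kafka and the tally would be computed by the Spark Streaming job.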