Run `docker-compose up -d` to build and run the images.
Run the application using its jar file in `out/artifacts/APP_NAME/`.
Run `src/main/java/myapp/RatingsDriverTest.java`, changing the `INPUT_TOPIC` in the file based on the app you're testing.
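A minimal sketch of the setup, assuming the Kafka services are defined in the repository's `docker-compose.yml` and that the jar name matches its artifact directory (both assumptions; adjust to your build):

```bash
# Build and start the containers, then verify they are up
docker-compose up -d
docker-compose ps

# Run the built application; APP_NAME is a placeholder for the artifact you built
java -jar out/artifacts/APP_NAME/APP_NAME.jar
```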
Run `RatingsDriver` with `msstopk-scored-rated-movies-${DATASET}`, `INPUT_THROUGHPUT` and `DATASET` as arguments.
Then:
Run `src/main/java/myapp/materializeScoreSort/PhysicalWindow/CentralizedMSSTopK.java` with `TOPK` and `DATASET` as arguments.
Check when the reading offset is much smaller than the writing offset (reading offset << writing offset).
Run `cleanLatencyFile.py CentralizedMSSTopK/dataset/topKK_latency_5s.txt TOPK` to get the final `MaterializeSort/latency_5ms.csv`.
Run `averageLatency.py CentralizedMSSTopK/dataset/topKK_latency_5s.txt` to get the average latency of the experiment.
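A sketch of a full Centralized MSSTopK run is shown below. The `java -cp` launch of `RatingsDriver` and `CentralizedMSSTopK`, the jar path, the consumer group name, and the concrete values (`TOPK=2000`, `INPUT_THROUGHPUT=1000`, `DATASET=0`) are all assumptions for illustration; adjust them to your build and configuration.

```bash
JAR=out/artifacts/APP_NAME/APP_NAME.jar   # placeholder: path to the built jar

# 1. Load the input topic: topic name, throughput, dataset
java -cp "$JAR" myapp.RatingsDriver msstopk-scored-rated-movies-0 1000 0

# 2. Start the centralized MSSTopK application with TOPK and DATASET
java -cp "$JAR" myapp.materializeScoreSort.PhysicalWindow.CentralizedMSSTopK 2000 0

# 3. Inspect consumer lag (reading vs. writing offsets); run inside the Kafka
#    container or from a local Kafka installation, the group id is hypothetical
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group centralized-msstopk

# 4. Post-process the latency file
python3 cleanLatencyFile.py CentralizedMSSTopK/dataset/topKK_latency_5s.txt 2000
python3 averageLatency.py CentralizedMSSTopK/dataset/topKK_latency_5s.txt
```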
Run `RatingsDriver` with `centralized-mintopk-scored-rated-movies-${DATASET}`, `INPUT_THROUGHPUT` and `DATASET` as arguments.
Then:
Run the experiment using the 5 datasets that you can find in `dataset/`.
Run `java -jar out/artifacts/minTopK_jar/kafka-stream-tutorial.jar "src/main/java/myapp/minTopK/minTopK.env" topK dataset`, where `topK` and `dataset` are integer values.
Run `cleanLatencyFile.py CentralizedMinTopK/topKK_latency_5ms.txt topK` to get the final `CentralizedMinTopK/topKK_latency_5ms.csv`.
Run `averageLatency.py CentralizedMinTopK/topKK_latency_5ms.txt` to get the average latency of the experiment.
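For example, one Centralized MinTopK run with `topK = 2000` on dataset `0` (values chosen only for illustration) would be:

```bash
# Start the MinTopK job with its env file, topK and dataset
java -jar out/artifacts/minTopK_jar/kafka-stream-tutorial.jar \
  "src/main/java/myapp/minTopK/minTopK.env" 2000 0

# Clean the raw latency file and compute the average latency
python3 cleanLatencyFile.py CentralizedMinTopK/topKK_latency_5ms.txt 2000
python3 averageLatency.py CentralizedMinTopK/topKK_latency_5ms.txt
```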
Run the experiment using the 5 datasets that you can find in `dataset/`.
Run `java -jar out/artifacts/CentralizeMinTopKN_jar/CentralizeMinTopKN.jar "src/main/java/myapp/minTopKN/minTopKN.env" topK topN dataset`, where `topK`, `topN` and `dataset` are integer values.
Run `RatingsDriver` with `centralized-mintopkn-scored-rated-movies`, `INPUT_THROUGHPUT` and `DATASET` as arguments.
Run `cleanLatencyFile.py CentralizedMinTopKN/topKK_latency_5ms.txt topK` to get the final `CentralizedMinTopKN/topKK_latency_5ms.csv`.
Run `averageLatency.py CentralizedMinTopKN/topKK_latency_5ms.txt` to get the average latency of the experiment.
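Analogously, a Centralized MinTopKN run might look as follows, assuming illustrative values `topK = 2000`, `topN = 100`, dataset `0`, `INPUT_THROUGHPUT = 1000`, and a `java -cp` launch of `RatingsDriver` from a placeholder jar:

```bash
# Start the MinTopKN job with env file, topK, topN and dataset
java -jar out/artifacts/CentralizeMinTopKN_jar/CentralizeMinTopKN.jar \
  "src/main/java/myapp/minTopKN/minTopKN.env" 2000 100 0

# Load the input topic with RatingsDriver
java -cp out/artifacts/APP_NAME/APP_NAME.jar myapp.RatingsDriver \
  centralized-mintopkn-scored-rated-movies 1000 0

# Post-process the latency measurements
python3 cleanLatencyFile.py CentralizedMinTopKN/topKK_latency_5ms.txt 2000
python3 averageLatency.py CentralizedMinTopKN/topKK_latency_5ms.txt
```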
Load the input data by running `RatingsDriver` with `pdmss-scored-rated-movies-dataset${DATASET}`, `INPUT_THROUGHPUT` and `DATASET` as arguments.
Then:
Run 3 instances of `src/main/java/myapp/distributedMaterializeScoreSort/PhysicalWindow/PhysicalWindowDistributedMSS ENV_FILE TOPK DATASET #INSTANCE`.
Run `src/main/java/myapp/distributedMaterializeScoreSort/PhysicalWindow/PhysicalWindowCentralizedAggregatedSort ENV_FILE TOPK DATASET`.
Run both files with `src/main/java/myapp/distributedMaterializeScoreSort/PhysicalWindow/physicalWindowDisMSS.env` as the `ENV_FILE` argument.
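A sketch of the distributed MSS deployment; the `java -cp` launch, the jar path, the instance numbering 0..2, and the values `TOPK=2000`, `DATASET=0`, `INPUT_THROUGHPUT=1000` are assumptions for illustration:

```bash
JAR=out/artifacts/APP_NAME/APP_NAME.jar   # placeholder for the built jar
ENV=src/main/java/myapp/distributedMaterializeScoreSort/PhysicalWindow/physicalWindowDisMSS.env

# Load the input data
java -cp "$JAR" myapp.RatingsDriver pdmss-scored-rated-movies-dataset0 1000 0

# Start 3 distributed MSS instances in the background (last argument is the instance number)
for i in 0 1 2; do
  java -cp "$JAR" myapp.distributedMaterializeScoreSort.PhysicalWindow.PhysicalWindowDistributedMSS \
    "$ENV" 2000 0 "$i" &
done

# Start the centralized aggregated sort
java -cp "$JAR" myapp.distributedMaterializeScoreSort.PhysicalWindow.PhysicalWindowCentralizedAggregatedSort \
  "$ENV" 2000 0
```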
Run `RatingsDriver` with `dis-mintopk-scored-rated-movies-dataset${DATASET}`, `INPUT_THROUGHPUT` and `DATASET` as arguments.
Then:
Run 3 instances of `src/main/java/myapp/distributedMinTopK/DistributedMinTopK.java ENV_FILE TOPK DATASET #INSTANCE`.
Run `src/main/java/myapp/distributedMinTopK/CentralizedTopK.java ENV_FILE TOPK DATASET DisMinTopK`.
Run both files with `src/main/java/myapp/distributedMinTopK/disMinTopK.env` as the first argument.
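The distributed MinTopK deployment follows the same pattern (illustrative values and launch mechanism as in the previous sketch):

```bash
JAR=out/artifacts/APP_NAME/APP_NAME.jar
ENV=src/main/java/myapp/distributedMinTopK/disMinTopK.env

# Three DistributedMinTopK workers, numbered 0..2 (numbering scheme assumed)
for i in 0 1 2; do
  java -cp "$JAR" myapp.distributedMinTopK.DistributedMinTopK "$ENV" 2000 0 "$i" &
done

# Centralized aggregation of the partial top-K results
java -cp "$JAR" myapp.distributedMinTopK.CentralizedTopK "$ENV" 2000 0 DisMinTopK
```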
Run 3 instances of `src/main/java/myapp/distributedMinTopKN/DistributedMinTopKN.java ENV_FILE TOPK TOPN DATASET #INSTANCE`.
Run `src/main/java/myapp/distributedMinTopK/CentralizedTopK.java ENV_FILE TOPK DATASET DisMinTopKN`.
Run both files with `src/main/java/myapp/distributedMinTopKN/disMinTopKN.env` as the first argument.
Run `RatingsDriver` with `dis-mintopkn-scored-rated-movies`, `INPUT_THROUGHPUT` and `DATASET` as arguments.
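And for distributed MinTopKN, which takes an extra `TOPN` argument (here `TOPN=100`; all values and the launch mechanism are again illustrative assumptions):

```bash
JAR=out/artifacts/APP_NAME/APP_NAME.jar
ENV=src/main/java/myapp/distributedMinTopKN/disMinTopKN.env

# Three DistributedMinTopKN workers with topK, topN, dataset and instance number
for i in 0 1 2; do
  java -cp "$JAR" myapp.distributedMinTopKN.DistributedMinTopKN "$ENV" 2000 100 0 "$i" &
done

# Centralized aggregation, tagged with the DisMinTopKN algorithm name
java -cp "$JAR" myapp.distributedMinTopK.CentralizedTopK "$ENV" 2000 0 DisMinTopKN

# Load the input topic
java -cp "$JAR" myapp.RatingsDriver dis-mintopkn-scored-rated-movies 1000 0
```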
Run `./measurements.sh NUM_INSTANCES DATASET ALGO TOPK` to clean the measurement files and compute the distributed latency and total_time.
The `ALGO` parameter can be either `DisMSSTopK` or `DisMinTopK`.
(For `NUM_INSTANCES` != 6 you need to modify the Python script `distributedLatency.py`.)
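For example, for a 3-instance distributed MinTopK run on dataset 0 with top-K 2000 (illustrative values):

```bash
# Cleans the per-instance measurement files and computes distributed latency and total_time
./measurements.sh 3 0 DisMinTopK 2000
```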
Run `python3 averageLatency.py CentralizedMinTopK/dataset0/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset1/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset2/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset3/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset4/500Krecords_1200_300_50K_latency_5s.csv`
Run `python3 measurementsPerDataset.py CentralizedMinTopK/dataset0/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset1/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset2/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset3/500Krecords_1200_300_50K_latency_5s.csv CentralizedMinTopK/dataset4/500Krecords_1200_300_50K_latency_5s.csv`
Run `python3 measurements.py CentralizedMinTopK/dataset0/500Krecords_1200_300_2K_average.csv CentralizedMinTopK/dataset0/500Krecords_1200_300_10K_average.csv CentralizedMinTopK/dataset0/500Krecords_1200_300_50K_average.csv`
Run `python3 latencyBoxPlotPerDataset.py CentralizedMinTopK/dataset0/500Krecords_1200_300_2K_latency_5s.csv CentralizedMinTopK/dataset1/500Krecords_1200_300_2K_latency_5s.csv CentralizedMinTopK/dataset2/500Krecords_1200_300_2K_latency_5s.csv CentralizedMinTopK/dataset3/500Krecords_1200_300_2K_latency_5s.csv CentralizedMinTopK/dataset4/500Krecords_1200_300_2K_latency_5s.csv TITLE`
Run `python3 plotBoxPlot.py CentralizedMinTopK/dataset0/500Krecords_1200_300_2K_average.csv CentralizedMinTopK/dataset0/500Krecords_1200_300_10K_average.csv CentralizedMinTopK/dataset0/500Krecords_1200_300_50K_average.csv TITLE`
Run `python3 totalTimeBoxPlotPerAlgo` with the `total_times_6instances.csv` of each algorithm.
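Presumably the per-algorithm CSVs are passed as positional arguments, as with the other plotting scripts; the directory layout below is hypothetical:

```bash
python3 totalTimeBoxPlotPerAlgo DisMSSTopK/total_times_6instances.csv \
  DisMinTopK/total_times_6instances.csv DisMinTopKN/total_times_6instances.csv
```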
Run `python3 totalTimeBoxPlotPerInstances total_times_3instances.csv total_times_6instances.csv total_times_10instances.csv`.
Run `python3 totalTimeBoxPlotPerTopK` with the `total_times_6instances.csv` for each topK.
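Analogously, passing one `total_times_6instances.csv` per top-K value; the paths and top-K values below are hypothetical:

```bash
python3 totalTimeBoxPlotPerTopK topK2000/total_times_6instances.csv \
  topK10000/total_times_6instances.csv topK50000/total_times_6instances.csv
```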