Malware visualization in 2D using t-SNE
Contents:
- sally.cfg : configuration file for the sally tool
- stoptokens.txt : list of stop tokens to eliminate (00 and ??)
- preprocess.sh and preprocess_test.sh : for preprocessing .bytes files
- MalwareFeatExtAndViz.ipynb : notebook to experiment with feature extraction, t-SNE and plots
- FeatSelectionTrainTest.ipynb : notebook we used to generate the predictions for the test instances with a t-SNE + SVM classifier
- MalwareFeatureExtraction-spark-databricks.ipynb : notebook we used on Databricks to experiment with a PySpark implementation
- test_instances_predictions.csv : the predictions file of our late submission to Kaggle (log loss = 0.1719; for comparison, a no-clue classifier that assigns equal probability to every class scores 2.1972)
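
The no-clue baseline quoted above can be checked by hand. The sketch below is not taken from the notebooks; it assumes the standard multiclass log loss (mean negative log-probability of the true class) and assumes nine classes, since ln(9) ≈ 2.1972 matches the quoted figure. The function name `multiclass_log_loss` and the toy labels are hypothetical.

```python
import math


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels: list of integer class indices (0-based).
    predicted_probs: list of per-class probability lists, one per instance.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a classifier is overconfident.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_labels)


# Assumption: nine classes, inferred from ln(9) ~= 2.1972.
NUM_CLASSES = 9
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES

# Toy labels; with uniform predictions the true labels do not change the score.
labels = [0, 3, 8, 5]
baseline = multiclass_log_loss(labels, [uniform] * len(labels))
print(round(baseline, 4))
```

Any submission scoring below this value is doing better than assigning equal probability to every class.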