This mini project was motivated by two notebooks showing that one of the simplest classification models, logistic regression on raw image pixels, performs on par with vanilla CNNs. Here we look into why this might be the case.
The data used here comes from Kaggle and was originally sourced from the ISIC competitions. Here is a sample from our training set, 10 benign and 10 malignant images:
TLDR: Out of the box, logistic regression achieves 71% test accuracy on the benign/malignant classification task, using grayscale (b/w) images and balanced data. To investigate how this is possible, we look at various pixel statistics and feature importance measures and conclude that, for this dataset, pixel intensity in the image corners is a strong predictor of the target.
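The kind of analysis summarized in the TLDR can be illustrated with a small sketch. It does not use the Kaggle/ISIC images: it generates synthetic 16x16 grayscale "images" of uniform noise and, as a purely hypothetical assumption, darkens the corners of class 1 (the real direction of the effect is not stated here). It then (1) fits logistic regression on raw flattened pixels, (2) reshapes the absolute coefficients into an image-shaped map as a simple feature-importance measure, and (3) compares this against a one-feature classifier that thresholds the mean corner intensity. The helper names (`corner_mask`, `corner_mean`, `fit_logreg`), the 4-pixel corner size and the plain gradient-descent solver are illustrative assumptions, not the notebook's code; in practice one would more likely use scikit-learn's `LogisticRegression`.

```python
import numpy as np

K = 4  # hypothetical corner patch size in pixels


def corner_mask(h, w, k=K):
    """Boolean (h, w) mask that is True inside the four k-by-k corner patches."""
    mask = np.zeros((h, w), dtype=bool)
    mask[:k, :k] = True
    mask[:k, -k:] = True
    mask[-k:, :k] = True
    mask[-k:, -k:] = True
    return mask


def corner_mean(images, k=K):
    """Mean grayscale intensity of the corner patches, one value per image."""
    mask = corner_mask(images.shape[1], images.shape[2], k)
    return images[:, mask].mean(axis=1)


def fit_logreg(X, y, lr=0.1, epochs=500, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    coef, bias = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ coef + bias, -30, 30)))
        coef -= lr * (X.T @ (p - y) / n + l2 * coef)
        bias -= lr * np.mean(p - y)
    return coef, bias


# Synthetic stand-in for the real data (hypothetical): uniform-noise images,
# where class-1 images get darker corners.
rng = np.random.default_rng(0)
n, H, W = 400, 16, 16
y = rng.integers(0, 2, size=n)
images = rng.uniform(0.3, 0.7, size=(n, H, W))
images[y == 1] -= 0.2 * corner_mask(H, W)

# Raw-pixel logistic regression: flatten, then standardize with train statistics.
X = images.reshape(n, -1)
train, test = slice(0, 300), slice(300, n)
mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
Xs = (X - mu) / sd
coef, bias = fit_logreg(Xs[train], y[train])
pixel_acc = np.mean(((Xs[test] @ coef + bias) > 0) == y[test])

# Feature importance: reshape |coefficients| back into image shape.
coef_map = np.abs(coef).reshape(H, W)
mask = corner_mask(H, W)
corner_weight, center_weight = coef_map[mask].mean(), coef_map[~mask].mean()

# One-feature baseline: threshold the corner mean halfway between class means.
cm = corner_mean(images)
thr = (cm[train][y[train] == 0].mean() + cm[train][y[train] == 1].mean()) / 2
corner_acc = np.mean((cm[test] < thr).astype(int) == y[test])

print(f"raw-pixel accuracy: {pixel_acc:.2f}, corner-only accuracy: {corner_acc:.2f}")
print(f"mean |coef| in corners: {corner_weight:.3f}, elsewhere: {center_weight:.3f}")
```

On real images, the same coefficient map (`np.abs(coef).reshape(H, W)`) is one way to check whether the model's weight concentrates in the corners. The numbers this sketch prints come only from the synthetic setup and say nothing about the actual dataset.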