+ dataReader
---- dataset_reader.py
+ json_helper
---- json_creator.py
---- json_info.py
---- json_parser.py
+ loss
---- loss_func.py
+ models
---- resnet.py
---- siamese.py
---- siamese2.py
---- Siamese_EfficientNet.py
---- config.py
---- evaluator.py
---- helper.py
---- json_creator_main.py
---- main.py

Dataset Directory: /data/mnist/

command : python json_creator_main.py

parameters :
- [--train-json] 'path to train dataset folder'
- [--train-output] 'output filename/directory without .json'
- [--num-train-classes] 'number of training classes to take'
- [--val-json] 'path to val dataset folder'
- [--val-output] 'output filename/directory without .json'
- [--num-val-classes] 'number of validation classes to take'
- [--test-json] 'path to test dataset folder'
- [--test-output] 'output filename/directory without .json'
- [--num-test-classes] 'number of test classes to take'

Example: python json_creator_main.py --train-json 'mnist' --train-output 'training_dataset' --num-train-classes 30
This will create training_dataset.json in the default directory.
Validation and test datasets are created the same way, and all three can be created in the same run. By default, only classes with more than 500 and fewer than 1000 images are picked.
To change this behaviour, edit the if condition at line 50 of json_helper/json_creator.py. This has to be done manually (TODO).
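
The default class filter described above can be pictured roughly as follows. This is a minimal sketch, not the code in json_helper/json_creator.py: the function name `select_classes`, the bounds passed as arguments, and the assumption that each class is a subfolder of image files are all hypothetical.

```python
from pathlib import Path


def select_classes(dataset_dir, num_classes, min_images=500, max_images=1000):
    """Pick up to num_classes class folders whose image count lies strictly between the bounds."""
    selected = []
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        # Assumption: every file inside a class folder is one image.
        n_images = sum(1 for f in class_dir.iterdir() if f.is_file())
        # "More than 500 and fewer than 1000 images" with the default bounds.
        if min_images < n_images < max_images:
            selected.append(class_dir.name)
        if len(selected) == num_classes:
            break
    return selected
```

Exposing the bounds as arguments is one way the manual edit mentioned above could be avoided, for example `select_classes('mnist', 30, min_images=100, max_images=2000)`.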
Here is a list of available models for training:
- Siamese [Uses only convolution layers, no fully connected layers]
- SiameseNetwork [Uses convolution with a fully connected layer]
- SiameseEfficientNet [Uses efficientNet-b0 as feature extractor followed by a fully connected layer]
- ResNet50 [Uses resnet50 as a feature extractor followed by a fully connected layer]
- ResNet101 [Uses resnet101 as a feature extractor followed by a fully connected layer]
- ResNet152 [Uses resnet152 as a feature extractor followed by a fully connected layer]
- Model class definitions are in the models folder. Refer to it for any required changes.
- All models with a fully connected layer have 5 output neurons. This value gave the best results among those tried (1, 2, 5, 8, 16, 32).
- By default, main.py trains all the models. To change this behaviour, refer to main.py, line 146. There is no command line argument for this yet (TODO).

Training parameters defined in config.py:
- image width
- image height
- learning rate
- epochs
- criterion
- train batch size
- validation batch size
- test batch size
- number of workers
- transform function
- optimizer
- learning rate scheduler
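
For orientation, a configuration object covering the parameters listed above might look like the sketch below. Every name and value here is a hypothetical placeholder, apart from the 50 epochs that this README gives as the default; the actual definitions live in config.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # All values are illustrative placeholders, not the project's settings.
    image_width: int = 105
    image_height: int = 105
    learning_rate: float = 1e-3
    epochs: int = 50                  # default stated in this README
    criterion: str = "contrastive"    # hypothetical name of the loss
    train_batch_size: int = 32
    val_batch_size: int = 32
    test_batch_size: int = 32
    num_workers: int = 4
    optimizer: str = "adam"           # hypothetical optimizer name
    lr_scheduler: str = "step"        # hypothetical scheduler name
    # The transform function is left out because it depends on the image library in use.

    @property
    def image_size(self):
        """Return (height, width), the order most image-resize helpers expect."""
        return (self.image_height, self.image_width)


cfg = TrainConfig()
```

Keeping the values in one frozen object makes it harder to change a setting by accident halfway through a run.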

First activate the venv as described in 2.1.2, if it is not already active.

command : python main.py

parameters :
- [--train-json] 'Path to the training JSON file' (Required)
- [--val-json] 'Path to the validation JSON file'

If no validation JSON is given, a validation dataset is created from the training dataset.

Example: python main.py --train-json 'dataset_train.json'
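
The two flags and the validation fallback could be wired together roughly as sketched here. The helper `split_train_val`, the 80/20 ratio, and the fixed seed are assumptions made for illustration; main.py may split the data differently.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the two flags documented above."""
    parser = argparse.ArgumentParser(description="Train the models")
    parser.add_argument("--train-json", required=True, help="training JSON file")
    parser.add_argument("--val-json", default=None, help="validation JSON file (optional)")
    return parser.parse_args(argv)


def split_train_val(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of samples and hold out val_fraction of them for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

When `args.val_json` is `None`, the training samples would be passed through `split_train_val` instead of loading a second file.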
By default, all the models listed above are trained for 50 epochs using the parameters defined in config.py. Models will be saved according to their names and the best validation loss. Loss curves will be saved according to the model names.
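
Keeping only the checkpoint with the best validation loss usually comes down to one comparison per epoch. The sketch below uses a hypothetical `BestCheckpoint` helper with a caller-supplied `save_fn`; the file name pattern is also an assumption and not necessarily the one main.py uses.

```python
import math


class BestCheckpoint:
    """Track the lowest validation loss seen so far and save on improvement."""

    def __init__(self, model_name, save_fn):
        self.model_name = model_name
        self.save_fn = save_fn        # called with the target file name
        self.best_loss = math.inf

    def update(self, val_loss):
        """Save when val_loss improves; return True if a checkpoint was written."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            # Hypothetical naming scheme: model name plus the loss value.
            self.save_fn(f"{self.model_name}_{val_loss:.4f}.pth")
            return True
        return False
```

In a training loop, `update` would be called once per epoch with that epoch's validation loss, and `save_fn` would write the model weights to the given path.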

command : python test.py

parameters :
- [--test-json] 'Path to the testing JSON file' (Required)
- [--model-name] 'Name of the model' (Required)

Output:
- test loss
- Classification report
- ROC curves
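
For readers unfamiliar with the ROC output, the curve can be computed from per-sample scores and binary labels as sketched below. This is a generic pure-Python illustration, not the code in test.py; the function names `roc_points` and `auc`, and the convention that a higher score means the positive class, are assumptions.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) pairs obtained by sweeping a threshold over the scores."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A perfectly separating model gives an area of 1.0, and a model that ranks every negative above every positive gives 0.0.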