A 3D bounding box detection model for medical data. See pre-print: https://arxiv.org/abs/2312.07729
This model's architecture and input pipeline are based on the framework laid out by YOLOv5 at commit ed887b5976d94dc61fa3f7e8e07170623dc7d6ee.
To increase transparency, compatible YOLOv5 code is reused as much as possible (found in folders without 3D in the name), and replaced code is usually written to follow the same structure as the code it replaces.
Some YOLOv5 functionality is currently unimplemented, such as download, logging, and certain augmentation routines.
To simplify future use with non-NIfTI data types, the dataloader code features additional encapsulation to highlight which code needs to be edited and which transformations need to be applied before feeding the data to the model. Similarly, default normalization routines for CT and MR modalities are provided, alongside an explicit indicator showing where custom normalization functions should be added. This should make it more straightforward to use MedYOLO with novel modalities.
Tested with Python 3.8 and 3.11.
Install PyTorch with the appropriate CUDA version for your hardware before installing requirements.txt, e.g. from the MedYOLO directory:
$ conda create --name MedYOLO python=3.11
$ conda activate MedYOLO
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ conda install --file requirements.txt -c pytorch
Because medical imaging annotation tools and public datasets use such a wide variety of label formats, a general-purpose solution for generating MedYOLO labels from existing labels is intractable. Though the formats share similarities, every new dataset tested has required its own script to convert the existing labels into MedYOLO labels. As discussed below, there is a script for going from MedYOLO labels/predictions to NIfTI masks, which is the easiest way to verify that your conversion script was successful. With that in mind, here is a description of the format MedYOLO labels need to have and how the folders containing the data need to be organized.
Label format: Class-number Z-Center X-Center Y-Center Z-Length X-Length Y-Length
Center positions and edge lengths should be given as fractions of the whole, like in YOLOv5. Center coordinates should fall between 0 and 1.
Example label entry: 1 0.142 0.308 0.567 0.239 0.436 0.215
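As a rough illustration of the label format above, the following sketch converts a box given in voxel units into a label line. The function name, the layout of the bounds tuple, and the three-decimal formatting are assumptions made for this example and are not part of MedYOLO; which array axis corresponds to Z, X, and Y depends on how your volumes are loaded.

```python
def box_to_medyolo_label(class_id, bounds, shape):
    """Convert a voxel-space box to a label line (hypothetical helper).

    bounds: (z_min, z_max, x_min, x_max, y_min, y_max) in voxel units
    shape:  (z_size, x_size, y_size) of the volume, in the same axis order
    """
    values = []
    # Centers first (Z, X, Y), each as a fraction of the volume size on its axis.
    for axis in range(3):
        lo, hi = bounds[2 * axis], bounds[2 * axis + 1]
        values.append((lo + hi) / 2 / shape[axis])
    # Then edge lengths (Z, X, Y), also as fractions.
    for axis in range(3):
        lo, hi = bounds[2 * axis], bounds[2 * axis + 1]
        values.append((hi - lo) / shape[axis])
    return " ".join([str(class_id)] + [f"{v:.3f}" for v in values])


print(box_to_medyolo_label(1, (10, 30, 100, 200, 50, 150), (100, 400, 200)))
```

Each line of the resulting .txt file would hold one such entry.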
Image and label files should have the same filename except for their extension (e.g. image CT_001.nii.gz corresponds to label CT_001.txt).
Data yaml files can be modeled after /data/example.yaml and should be saved in the /data/ folder.
Your NIfTI images and text labels should be organized into folders as follows:
|--- parent_directory
| |--- images
| | |--- train
| | | |--- train_img001.nii.gz
| | | |--- ...
| | |--- val
| | | |--- val_img001.nii.gz
| | | |--- ...
| |--- labels
| | |--- train
| | | |--- train_img001.txt
| | | |--- ...
| | |--- val
| | | |--- val_img001.txt
| | | |--- ...
Here /parent_directory/ is the path to your data.
It's recommended to store your data outside the MedYOLO directory.
$ python train.py --data example.yaml --adam --norm CT --epochs 1000 --patience 200 --device 0
This trains a MedYOLO model on the data found in example.yaml using Adam as the optimizer and the default CT normalization.
The model will train on GPU:0 for up to 1000 epochs or until 200 epochs have passed without improvement.
By default, the small version of the model will be trained.
To train a larger model append --cfg /MedYOLO_directory/models3D/yolo3Dm.yaml or the yaml file that corresponds to the model you'd like to train.
Larger models can be generated by modifying the depth_multiple and width_multiple parameters in the model yaml file.
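For intuition about what depth_multiple and width_multiple control, the sketch below follows the scaling rule used by YOLOv5's model parser: layer repeat counts are multiplied by depth_multiple, and channel counts are multiplied by width_multiple and rounded up to a multiple of 8. Whether MedYOLO's 3D parser applies exactly the same rule is an assumption here, the function name is hypothetical, and the multiples shown are YOLOv5's small and medium values, used only for illustration.

```python
import math


def scale_layer(repeats, channels, depth_multiple, width_multiple, divisor=8):
    # Depth: scale the number of repeats, never dropping below one.
    scaled_repeats = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    # Width: scale the channels, then round up to a multiple of `divisor`.
    scaled_channels = math.ceil(channels * width_multiple / divisor) * divisor
    return scaled_repeats, scaled_channels


print(scale_layer(9, 512, 0.33, 0.50))  # a block defined with 9 repeats and 512 channels
print(scale_layer(9, 512, 0.67, 0.75))
```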
By default, MedYOLO will use the hyperparameters for training from scratch.
These are the hyperparameters that have been used for every test so far.
To train using different hyperparameters append --hyp /MedYOLO_directory/data/hyps/hyp.finetune.yaml or the yaml file that corresponds to the hyperparameters you'd like to use.
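The interaction between --epochs and --patience in the training command above can be pictured with a short sketch. This is a simplified model of patience-based early stopping and not MedYOLO's actual training loop; the function name and the per-epoch fitness list are hypothetical.

```python
def epochs_run(fitness_per_epoch, max_epochs, patience):
    """Return how many epochs run before training stops (illustrative only)."""
    best_fitness, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch[:max_epochs]):
        if fitness > best_fitness:
            best_fitness, best_epoch = fitness, epoch
        # Stop once `patience` epochs have passed without a new best.
        if epoch - best_epoch >= patience:
            return epoch + 1
    return min(len(fitness_per_epoch), max_epochs)


# The best epoch is index 1; three epochs later training stops.
print(epochs_run([0.1, 0.5, 0.4, 0.3, 0.2, 0.6], max_epochs=1000, patience=3))
```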
$ python detect.py --source /path_to_images/ --weights /path_to_model_weights/model_weights.pt --device 0 --save-txt
This runs inference on the images in /path_to_images/ with the model saved in model_weights.pt using GPU:0.
The model weights specify the model size so a model configuration yaml is not required.
Model predictions will be saved as txt files in the /runs/detect/exp/ directory.
By default, labels are not saved during inference.
Using the --save-txt argument will save labels in the default directory, which can be changed by specifying the project and name arguments.
By default, confidence levels are printed to the screen but not saved in .txt labels.
Use the --save-conf argument to append the model's confidence level at the end of each saved label entry.
Configuring the max_det, conf-thresh, and iou-thresh arguments alongside --save-conf can be helpful when troubleshooting trained models.
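When troubleshooting, it may be convenient to read the saved prediction files back and rank them by confidence. The sketch below assumes each saved line follows the label format described earlier, with the confidence appended as an eighth field when --save-conf is used; the function names and the dictionary layout are hypothetical.

```python
def parse_prediction(line):
    """Parse one saved label line; confidence is None without --save-conf."""
    fields = line.split()
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 fields, got {len(fields)}")
    numbers = [float(f) for f in fields[1:]]
    return {
        "class": int(float(fields[0])),
        "center_zxy": tuple(numbers[0:3]),
        "length_zxy": tuple(numbers[3:6]),
        "confidence": numbers[6] if len(numbers) == 7 else None,
    }


def keep_confident(lines, min_confidence):
    """Keep predictions at or above min_confidence, most confident first."""
    preds = [parse_prediction(line) for line in lines if line.strip()]
    kept = [p for p in preds
            if p["confidence"] is not None and p["confidence"] >= min_confidence]
    return sorted(kept, key=lambda p: p["confidence"], reverse=True)
```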
/utils3D/nifti_utils.py contains an example script for converting MedYOLO predictions into viewable NIfTI masks.
This can also be useful for verifying that your MedYOLO labels mark the correct positions before you begin training a model.
As with the label creation process, there are several ways you may want to use MedYOLO's predicted bounding boxes, so this can also be used as a schematic for interpreting your model's output.
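To make the interpretation of a label or prediction concrete, the sketch below maps fractional centers and edge lengths back to voxel index ranges for a volume of known shape. It is not the code in nifti_utils.py; the function name is hypothetical, and it assumes the shape is supplied in the same Z, X, Y order as the label.

```python
def label_to_voxel_bounds(center_zxy, length_zxy, shape_zxy):
    """Return [(z_start, z_stop), (x_start, x_stop), (y_start, y_stop)]."""
    bounds = []
    for center, length, size in zip(center_zxy, length_zxy, shape_zxy):
        # Convert fractions to voxel indices and clip to the volume extent.
        start = max(int(round((center - length / 2) * size)), 0)
        stop = min(int(round((center + length / 2) * size)), size)
        bounds.append((start, stop))
    return bounds


print(label_to_voxel_bounds((0.2, 0.375, 0.5), (0.2, 0.25, 0.5), (100, 400, 200)))
```

Setting the voxels inside these ranges to 1 in an all-zero array of the same shape gives a box mask that can be saved alongside the image. Note that the axis order of an array loaded from a NIfTI file may differ from the label's Z, X, Y order, so a transpose may be needed.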
Much as YOLOv5 reshapes input data into a square under the hood, MedYOLO reshapes input data into cubic volumes. Unfortunately, an equivalent to YOLOv5's rectangular training strategy is cumbersome in 3-D, so we currently use naive resampling, which stretches input examples. This means images with dramatically different shapes will be distorted relative to each other when input to the model, and performance may differ for examples with outlier thicknesses.
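A quick way to gauge how much a given scan will be distorted is to compute the per-axis stretch implied by resampling to a cube. The sketch below is only arithmetic on shapes and does not reproduce MedYOLO's resampling code; the function name and the example shape are hypothetical.

```python
def stretch_factors(shape_zxy, imgsz):
    """Per-axis scale factor when a volume is resampled to an imgsz cube."""
    return tuple(imgsz / size for size in shape_zxy)


# A 70-slice scan with 512 x 512 in-plane resolution, resampled to 350 per side.
factors = stretch_factors((70, 512, 512), 350)
print(factors)
# Ratio between the most and least stretched axes: 1.0 would mean no relative distortion.
print(max(factors) / min(factors))
```

In this example the slice axis is upsampled while the in-plane axes are downsampled, which is the kind of relative distortion described above.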
This reshaping into cubes also means you should choose the imgsz parameter carefully to balance GPU resources, input resolution, and batch size.
imgsz 350 and batch-size 8 were used for most tests, but imgsz 512 and batch-size 2 have also been tested.
Note that 350 is near the smallest possible size before PyTorch begins raising errors about mismatched layer sizes.
Depending on the available GPU resources, this reshaping will most likely reduce the X and Y resolution of your data or require you to use very small batches.
Pseudo-batching may resolve this to some extent, but is currently unimplemented.
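To see why imgsz and batch size trade off so sharply, note that the number of voxels grows with the cube of imgsz. The sketch below estimates the size of the input batch alone, assuming single-channel 32-bit floating point volumes (an assumption made for this example). Activations and gradients inside the model take far more memory than the input, so treat this as a lower bound for comparing settings, not a prediction of GPU usage.

```python
def input_batch_bytes(imgsz, batch_size, channels=1, bytes_per_value=4):
    """Bytes needed to hold one batch of cubic input volumes."""
    return batch_size * channels * imgsz ** 3 * bytes_per_value


for imgsz, batch_size in [(350, 8), (512, 2)]:
    gigabytes = input_batch_bytes(imgsz, batch_size) / 1e9
    print(f"imgsz {imgsz}, batch-size {batch_size}: {gigabytes:.2f} GB of input")
```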
Training from scratch has seen the best results with several hundred training examples available. Of note, one task saw remarkable improvement when going from ~400 training examples to ~650. On the other hand, fine-tuning onto related tasks has seen good results with ~60 training examples. In cases where your task has limited available data, pretraining your model on a related task with more data is advisable.
Despite heavily referencing the YOLOv5 detection framework during its creation, MedYOLO does not support 2-D input (such as photographs). Modification to support 2-D input is straightforward but inadvisable.