
slideMoveNewWSI

Sander W. van der Laan edited this page Oct 15, 2024 · 2 revisions

slideMoveNewWSI.py is a Python script that moves Whole Slide Imaging (WSI) files (e.g., .ndpi, .TIF) from an input folder to a destination folder for a given study type (e.g., AE) and stain (e.g., CD34). When several files belong to the same study number, the script ranks them by predefined criteria, keeps the highest-ranked file in the destination folder, and moves the others to a backup subfolder (_duplicates).
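The selection rule can be sketched as a small predicate. This is a minimal illustration, not the script's own code: the function name `is_candidate` is hypothetical, and matching the extensions case-insensitively is an assumption. The prefix and substring checks follow the rules given for `--study-type` and `--stain`.

```python
from pathlib import Path

# Extensions named on this page; case-insensitive matching is an assumption.
WSI_EXTENSIONS = {".ndpi", ".tif"}


def is_candidate(path: Path, study_type: str, stain: str) -> bool:
    """Hypothetical helper: True if a file looks like a WSI for this study type and stain.

    The name must start with the study-type prefix and contain the stain string.
    """
    name = path.name
    return (
        path.suffix.lower() in WSI_EXTENSIONS
        and name.startswith(study_type)
        and stain in name
    )
```

For example, `AE1234.CD34.ndpi` passes for study type AE and stain CD34, while `AE1234.HE.ndpi` does not.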

The script also writes its operations to a log file and file metadata to a CSV file. Dry-run and verbose options give more control over, and visibility into, the process.
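A dry-run option usually means "report the move but touch nothing on disk". The sketch below shows that pattern with `shutil.move`; the helper name `move_file` and its messages are hypothetical and are not taken from the script.

```python
import shutil
from pathlib import Path


def move_file(src: Path, dest_dir: Path, dry_run: bool = False, verbose: bool = False) -> Path:
    """Hypothetical helper: move src into dest_dir, or only report it when dry_run is True."""
    target = dest_dir / src.name
    if dry_run:
        # Dry run: announce the planned action and leave the file system untouched.
        print(f"[dry-run] would move {src} -> {target}")
        return target
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(target))
    if verbose:
        # Verbose: report each completed move.
        print(f"moved {src} -> {target}")
    return target
```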

Options and Arguments:

| Option | Description |
|---|---|
| --input, -i | Required. Folder containing the input files (e.g., .ndpi or .TIF). |
| --study-type, -t | Required. Study type prefix (e.g., AE). Only files whose names start with this prefix are processed. |
| --stain, -s | Required. Stain name (e.g., CD34). Only files whose names contain this string are processed. |
| --destination, -d | Required. Folder the files are moved to. A subdirectory _duplicates is created in it to hold duplicate files. |
| --log, -l | Required. Prefix for the log file names. The log records moved files and prioritization decisions and is saved in _duplicates. |
| --dry-run, -n | Optional. Perform a dry run: no files are moved, and the planned actions are reported to the terminal. |
| --verbose, -v | Optional. Print details about each file operation and the prioritization process. |
| --help, -h | Optional. Show the help message and usage instructions. |
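The options above map naturally onto Python's `argparse`. This is a sketch of an equivalent parser, not the script's actual source. `argparse` adds `--help`/`-h` on its own, and dashes in long option names become underscores in the parsed attributes (`study_type`, `dry_run`).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented options (not the script's own code)."""
    parser = argparse.ArgumentParser(
        description="Move new WSI files for one study type and stain, setting duplicates aside."
    )
    parser.add_argument("--input", "-i", required=True, help="Folder containing .ndpi/.TIF files.")
    parser.add_argument("--study-type", "-t", required=True, help="Study type prefix, e.g. AE.")
    parser.add_argument("--stain", "-s", required=True, help="Stain name, e.g. CD34.")
    parser.add_argument("--destination", "-d", required=True, help="Destination folder.")
    parser.add_argument("--log", "-l", required=True, help="Prefix for the log file names.")
    # Boolean switches: present -> True, absent -> False.
    parser.add_argument("--dry-run", "-n", action="store_true", help="Report actions without moving files.")
    parser.add_argument("--verbose", "-v", action="store_true", help="Print per-file details.")
    return parser
```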

Example Usage:

python slideMoveNewWSI.py --input /path/to/input \
                          --study-type AE \
                          --stain CD34 \
                          --destination /path/to/destination \
                          --log 20241015 \
                          --verbose

In this example, the script will:

  1. Search /path/to/input for .ndpi or .TIF files whose names start with AE and contain CD34.
  2. Move the highest-priority file for each study number to /path/to/destination and set the remaining duplicates aside in /path/to/destination/_duplicates.
  3. Log actions to a file named 20241015.AE.CD34.movenewwsi.log and save metadata to 20241015.AE.CD34.movenewwsi.metadata.csv.
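The output names in step 3 follow the pattern `<log prefix>.<study type>.<stain>.movenewwsi`, judging from this example. A minimal sketch of that naming, using a hypothetical helper `output_names`:

```python
def output_names(log_prefix: str, study_type: str, stain: str) -> tuple[str, str]:
    """Hypothetical helper: build the log and metadata file names from the CLI values."""
    base = f"{log_prefix}.{study_type}.{stain}.movenewwsi"
    return f"{base}.log", f"{base}.metadata.csv"


print(output_names("20241015", "AE", "CD34"))
# ('20241015.AE.CD34.movenewwsi.log', '20241015.AE.CD34.movenewwsi.metadata.csv')
```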

Duplicate Handling and Prioritization Logic:

When several files share a study number, the script ranks them by the following criteria, in order:

  • Preferred file type: .ndpi files are preferred over .TIF.
  • Creation date: The newest file is prioritized.
  • Checksum and size: If multiple files have the same type and creation date, the largest file by size is kept.
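These criteria amount to sorting by a tuple key (preferred type, then creation date, then size) and keeping the top file in each group. The sketch below works under stated assumptions and is not the script's own logic. It groups files by a hypothetical study number, taken as the digits right after the study-type prefix (e.g. AE1234). Creation time and size are passed in rather than read from disk. It leaves out the checksum, because this page does not say how the checksum enters the decision.

```python
import re
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class SlideFile:
    name: str
    created: float  # creation timestamp, seconds since the epoch
    size: int       # file size in bytes


def priority_key(f: SlideFile) -> tuple[bool, float, int]:
    # Larger tuples win: .ndpi over .TIF, then newest, then largest.
    return (PurePath(f.name).suffix.lower() == ".ndpi", f.created, f.size)


def split_duplicates(files, study_type):
    """Return (keep, duplicates): one kept file per study number, the rest set aside."""
    # Hypothetical study-number rule: the digits directly after the prefix.
    pattern = re.compile(rf"^{re.escape(study_type)}(\d+)")
    groups = {}
    for f in files:
        m = pattern.match(f.name)
        if m:  # files without a recognizable study number are skipped here
            groups.setdefault(m.group(1), []).append(f)
    keep, duplicates = [], []
    for members in groups.values():
        ranked = sorted(members, key=priority_key, reverse=True)
        keep.append(ranked[0])
        duplicates.extend(ranked[1:])
    return keep, duplicates
```

With this key, a .ndpi file beats a newer or larger .TIF file for the same study number, which matches the order in which the criteria are listed.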