InBinaryWorld/DuplicateRemover

Duplicate Remover

Store files and photos without duplicates,
take care of disk space and save your time.
Explore the docs »

Report Bug · Request Feature

About The Project

All IT specialists know that people are divided into those who make backups and those who will start making them.

But backing up data is not always an easy process, especially if you care about the quality of your backups and you don't have unlimited space on your backup drives.

How did it start?

Making periodic backups of photos (and other files) from a phone without cleaning its memory (just copying the DCIM directory to the backup disk) leads to storing multiple copies of the same images. The problem grows when each family member wants to store their data in the same place as you. Your family members share images with each other, and the copies again end up on the backup disks.

Over time, the disk turns into a mess: you don't know what you can delete and what you can't, how many copies of the same files you have, or how to find anything useful. So you start merging all the folders and manually browsing their content to free some disk space and organize the files. But after an hour you realize that browsing thousands of files manually to remove duplicates is practically impossible.

So I thought: "Why am I wasting my time doing this manually? Let's automate the process!"

And... here it is! Duplicate Remover, extended with some useful features!

Built With

The project was created in pure Python, using only the modules included with it.

Getting Started

Getting started with this tool is extremely easy, so enjoy!

To use this tool, you first need to get:

  • Python 3.8 or newer.

Installation

  1. Clone the repo.
git clone https://github.com/InBinaryWorld/DuplicateRemover.git
  2. Go for it!
python ./DuplicateRemover/duplicate_remover.py

Usage

This script helps you keep your data clean. The features described below can be combined in different scenarios to give the user the best experience. At the end of the process, the user can check the statistics of the performed operation.

Remove duplicates

This tool removes duplicate copies from your data even if the file names are different. It scans the provided directory recursively, so duplicates in subdirectories are found as well.

NOTE: To improve performance and memory usage, MD5 hashes are used to find candidate duplicates.

NOTE: Matching MD5 hashes don't guarantee that the files are identical, so before each removal a binary comparison of the files with the same hash is performed. That makes the tool safe.
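The two notes above describe a two-stage check: hash first, then confirm byte by byte. Below is a minimal sketch of that idea, not the tool's actual code. The function names `md5_of_file` and `find_duplicates` are hypothetical, and the sketch only reports duplicates instead of deleting them.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return (kept_file, duplicate_file) pairs found under root.

    Hypothetical helper: the first file seen with a given content is
    treated as the one to keep (an assumption of this sketch).
    """
    by_hash = defaultdict(list)  # MD5 digest -> distinct files seen so far
    duplicates = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            candidates = by_hash[md5_of_file(path)]
            # An equal hash only nominates candidates; confirm with a
            # full binary comparison before calling it a duplicate.
            original = next(
                (c for c in candidates if filecmp.cmp(c, path, shallow=False)),
                None,
            )
            if original is None:
                candidates.append(path)
            else:
                duplicates.append((original, path))
    return duplicates
```

Grouping by hash keeps the number of expensive byte-by-byte comparisons small, because only files whose digests already match are ever compared in full.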

Analyse and leave only new

The script asks for two directories and removes files from only one of them. It analyzes the first directory and then removes from the second one every file that duplicates a file in the first.

It's especially useful when you make a new backup and already have backup data. This functionality lets you analyze the current state of the backup and, based on that, clear the new data directory before it is appended to the backup.
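The two-directory mode can be sketched roughly as follows. This is an illustration under assumptions, not the tool's implementation: `remove_already_backed_up`, `iter_files`, `file_md5`, and the `dry_run` flag are all hypothetical names introduced here.

```python
import filecmp
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root):
    """Yield the path of every file under root, including subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def remove_already_backed_up(backup_dir, new_dir, dry_run=True):
    """Find files in new_dir whose content already exists in backup_dir.

    backup_dir is only read. Files are deleted from new_dir only when
    dry_run is False (the dry_run flag is an assumption of this sketch).
    Returns the list of matching paths in new_dir.
    """
    known = {}  # MD5 digest -> files in the existing backup
    for path in iter_files(backup_dir):
        known.setdefault(file_md5(path), []).append(path)

    removed = []
    for path in iter_files(new_dir):
        matches = known.get(file_md5(path), [])
        # Confirm every hash match with a binary comparison.
        if any(filecmp.cmp(m, path, shallow=False) for m in matches):
            removed.append(path)
            if not dry_run:
                os.remove(path)
    return removed
```

Running it once with `dry_run=True` shows what would be deleted before anything is removed from the new data directory.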

Rename files

Support Tool - Useful when you want to standardize file names. The user is prompted for a preferred prefix (e.g. "IMG"), to which a suffix based on the last modified date is added.

Example output file name: IMG_2021-09-25_13_19_32.jpg

NOTE: Files with the same last modification date are supported by adding an additional sequential number.
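As an illustration of the naming scheme, here is a small sketch that only plans the new names and does not rename anything on disk. The helper `plan_renames` is hypothetical, and the `_1`, `_2` form of the sequential number is an assumption, since the exact format is not specified above.

```python
import os
from datetime import datetime


def plan_renames(directory, prefix):
    """Map each file name in directory to a 'PREFIX_date_time.ext' name.

    Only a plan is returned; no file is touched. The date comes from the
    last modified time, converted to local time.
    """
    taken = set()
    plan = {}
    for name in sorted(os.listdir(directory)):
        source = os.path.join(directory, name)
        if not os.path.isfile(source):
            continue
        stamp = datetime.fromtimestamp(os.path.getmtime(source))
        base = f"{prefix}_{stamp.strftime('%Y-%m-%d_%H_%M_%S')}"
        ext = os.path.splitext(name)[1]
        candidate = f"{base}{ext}"
        counter = 1
        # Same timestamp already used: append a sequential number
        # (the "_N" suffix format is an assumption of this sketch).
        while candidate in taken:
            candidate = f"{base}_{counter}{ext}"
            counter += 1
        taken.add(candidate)
        plan[name] = candidate
    return plan
```

Separating the planning step from the actual rename makes it easy to review the result first and avoids overwriting a file whose new name collides with another one.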

Make flat dir

Moves all files into the root directory, naming them with the Rename files functionality.

Roadmap

See the issues panel for a list of proposed features.

Waiting for better times:

  • Live photo cleaner - Copying Apple Live Photos to a computer creates two files for each photo (JPG and MOV). It gets annoying when the MOV files are mixed with real videos, causing a mess. The goal is to remove a .MOV file when an image with the corresponding name and a .JPG extension is found.

  • Easy backup support. Copies and removes files in the destination directory to make it an exact copy of the source, without removing and re-copying all the data, to save disk life and user time (Master-Slave model).

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Find me on:

LinkedIn · GitHub