Giter Site home page Giter Site logo

nav9 / duplicatefilefinder Goto Github PK

View Code? Open in Web Editor NEW
6.0 3.0 2.0 106 KB

This program finds duplicate files in a folder and its subfolders. Duplicates are moved to a separate folder. A few other modes of operation are also planned/available.

License: MIT License

Python 100.00%
duplicates python3

duplicatefilefinder's Introduction

Alt text
Alt text

Duplicate File Finder

  • File duplicates: This program finds duplicate files in a folder and its subfolders. Duplicates are moved to a separate folder. Files are not checked based on filenames. They are checked based on file size and chunks of binary data that the file is composed of. Unlike other programs that calculate the hashes of files (which can get very expensive for large files), this program starts with comparing small binary chunks, which also helps quickly eliminate files that don't match.
  • Image duplicates: There is also a mode to find image duplicates based on similarity of pixels (perceptual hash), even if the images are in different file formats (the same image present as png and jpg) or varied in size (Eg: the same image present as 640x480 and 1024x768). Image formats currently tested are jpg, png and webp. There is a lot more to be programmed to make this fully functional. The TODO's are mentioned in the code.
  • Line duplicates: If the same lines are present in multiple text files or a single text file, this program will take all unique lines and write them to a new file.
  • Deleting residual files: The program can be run in a second mode that searches for files of a specified name, and deletes them in the folder and subfolders of a specified directory. For example, it's useful in deleting multiple Thumbs.db files. The User is given a choice of whether the search should be case sensitive or not.
  • Undo option: After duplicate files are moved into a folder, if the user wants to undo the operation and restore all files to their original folders and filenames, it is possible by using an undo file that gets created during the duplciate finding. The user can also refer the undo file and manually undo.

To run the program

Python 3.9.6 was used. You could use PyEnv or Anaconda/Miniconda to install specific versions of Python. For Python and the pip packages, it isn't strictly necessary to install the specified versions. The versions have been specified to help prevent incompatibility issues that may be present in future versions. So by knowing what versions the working version of duplicateFileFinder used, you'll be able to run the program without hassles.

First install dependencies, using (the version numbers don't have to match exactly):
pip3 install imagehash==4.3.0 PySimpleGUI==4.60.3 Pillow==9.2.0

Then simply run the main file as:
python3 main.py

To run the tests

pip3 install pytest==7.1.2
Use pytest at the commandline to run the test cases. Do this in the root program directory.

TODO

These are mentioned in the code for now.

Known issues

  • Menus could be made more generic and multiple menus can be presented in a single window rather than have to go through multiple menus.
  • File/folder error handling needs to be more robust.
  • GUI aesthetics needs improvement.
  • Code needs to be broken further into discrete, standalone components.

Additional info

  • The GUI library used is PySimpleGUI.
  • Images are compared with what's called an "image hash". It's different from standard hashing algorithms. The imagehash algorithm documentation explains more about it. More info: https://pypi.org/project/ImageHash/

Donate

duplicatefilefinder's People

Contributors

nav9 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

duplicatefilefinder's Issues

UnboundLocalError in duplicate image segregation

Traceback (most recent call last):
File "/home/bob/PycharmProjects/duplicateFileFinder/main/main.py", line 646, in
fileSearcher.search()
File "/home/bob/PycharmProjects/duplicateFileFinder/main/main.py", line 518, in search
self.markAlreadyProcessedFile(folderOrdinal, fileOrdinal)
UnboundLocalError: local variable 'fileOrdinal' referenced before assignment

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.