duff - Duplicate file finder
============================

0. Introduction
===============

Duff is a command-line utility for identifying duplicates in a given set of
files.  It attempts to be usably fast, and uses SHA1 checksums as a part of
the comparisons.

The project website is here:

  http://duff.sourceforge.net/

Duff resides in public CVS on cvs.sourceforge.net.  The CVSROOT for anonymous,
read-only access is:

  :pserver:[email protected]:/cvsroot/duff

The CVS module for duff 0.x is `duff'.

The version numbering scheme for duff is as follows:

 * The first number is the major version.  It is incremented when the author
   considers a round of features complete.  The only feature currently
   missing for the next major release is i18n.

 * The second number is the minor version.  It is incremented for releases
   that add minor new features, or changes that do not alter the
   functionality of the program.

 * The third number, if present, is the bugfix release number.  This indicates
   a release which only fixes bugs present in a previous major or minor release.
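
The scheme above can be illustrated with a small sketch in C.  Note that
`parse_version' and `struct version' are hypothetical names used only for this
example; they are not part of duff.  The sketch assumes a version string is
two or three non-negative decimal numbers separated by dots.

```c
#include <stdio.h>

/* Hypothetical type for illustration only; not taken from duff's sources. */
struct version {
    int major;   /* first number */
    int minor;   /* second number */
    int bugfix;  /* third number, or -1 when it is absent */
};

/* Returns 0 on success, -1 if the text does not follow the scheme. */
int parse_version(const char *text, struct version *out)
{
    int major = 0, minor = 0, bugfix = -1;
    int used = 0;

    /* The first two numbers are mandatory. */
    if (sscanf(text, "%d.%d%n", &major, &minor, &used) != 2)
        return -1;
    if (major < 0 || minor < 0)
        return -1;

    /* The third number is optional, but nothing else may follow. */
    if (text[used] != '\0') {
        int more = 0;

        if (sscanf(text + used, ".%d%n", &bugfix, &more) != 1 || bugfix < 0)
            return -1;
        if (text[used + more] != '\0')
            return -1;
    }

    out->major = major;
    out->minor = minor;
    out->bugfix = bugfix;
    return 0;
}
```

With this sketch, "0.3.1" yields major 0, minor 3 and bugfix 1, while "0.4"
yields a bugfix value of -1 to mark the missing third number.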


1. License and copyright
========================

Duff is copyright (c) 2005 Camilla Berglund <[email protected]>

Duff is licensed under the zlib/libpng license.  See the file `COPYING' for
license details.  The license is also included at the top of each source file.

Duff contains sha1-asaddi.
Copyright (c) 2001-2003 Allan Saddi <[email protected]>
See the files `sha1.c' or `sha1.h' for license details.


2. Project news
===============

See the file `NEWS'.


3. Building Duff
================

If you got this source tree from a CVS repository, you will need to bootstrap
the build environment using `bootstrap.sh'.  Note that this script requires
autoconf and automake to run.

If (or once) you have a `configure' script, go ahead and run it.  No additional
magic should be required.  If it is, then that's a bug and should be reported.

This release of duff has been successfully built on the following systems:

  Arch Linux x86
  Darwin 7.9.0 powerpc
  Debian Etch powerpc
  Debian Sarge alpha
  FreeBSD 4.11 x86
  FreeBSD 5.4 x86
  NetBSD 1.6.1 sparc
  SunOS 5.9 sparc64
  Ubuntu Breezy x86

Earlier releases have been successfully built on the following systems:

  Arch Linux x86
  Darwin 7.9.0 powerpc
  Debian Etch powerpc
  Debian Sarge alpha
  FreeBSD 4.11 x86
  FreeBSD 5.4 x86
  SunOS 5.9 sparc64

The tools used were gcc and GNU or BSD make.  However, it should build on most
Unix systems without modifications.


4. Installing Duff
==================

See the file `INSTALL'.


5. Using Duff
=============

See the accompanying manpage duff(1).

To read the manpage before installation, use the following command:

  groff -mdoc -Tascii duff.1 | less -R

On Linux systems, however, the following command may suffice:

  man -l duff.1


6. Hacking Duff
===============

See the file `HACKING'.


7. Bugs, feedback and patches
=============================

Please send bug reports, feedback, patches and cookies to:
Camilla Berglund <[email protected]>

For more involved discussions, please join the mailing list:
http://lists.sourceforge.net/lists/listinfo/duff-devel


8. Disambiguation
=================

This is duff, the Unix command-line utility, and not DUFF, the Windows program.
If you wish to find duplicate files on Windows, use DUFF.


9. Release history
==================

Version 0.1 was named `duplicate', and was never released anywhere.

Version 0.2 was the first release named duff.  It lacked a real checksumming
algorithm, and was thus only released to a few individuals, during the first
half of 2005.

Version 0.3 was the first official release, on November 22, 2005, after a
prolonged search for a suitably licensed implementation of SHA1.

Version 0.3.1 was a bugfix release, on November 27, 2005, adding a single
feature (-z), which just happened to get included.

Version 0.4 was the second feature release, on January 13, 2006, adding a
number of missing and/or requested features as well as bug fixes.  It was the
first release to be considered stable and safe enough for everyday use.

Version 0.5 improves the algorithm that searches for duplicates by
sorting the list of entries.  The changes in this version were contributed
by James Craig Burley <[email protected]>.

duff's Issues

"Stable" sort not necessarily stable

duffdriver.c's cmpentryp() uses the relationship between the left and right pointers to determine an ordering if nothing else differs. The idea here is to provide a "stable" sort, so the order of output (reporting), within a set of duplicates, is the same as the input ordering.

Whether this is even needed, I don't know. In any case, I'm concerned that qsort() might be moving those pointers around. To be safe, something intrinsic to the respective Entry objects should be used -- perhaps something as simple as a monotonically increasing counter so each Entry has a unique ID.
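
A minimal sketch of the counter idea follows. It is not duff's actual code: `struct entry`, its fields, `entry_init()` and `cmp_entry_ptr()` are hypothetical stand-ins, and sorting on file size is only an assumed primary key. The point is that the tie-break reads a value stored in each entry, so it does not depend on where qsort() has moved the array elements. (The C standard leaves the relative order of elements that compare equal unspecified, so the comparator should avoid returning 0 for distinct entries if a fixed order is wanted.)

```c
#include <stdlib.h>

/* Hypothetical stand-in for duff's Entry; the field names are assumptions. */
struct entry {
    long long size;     /* assumed primary sort key */
    unsigned long seq;  /* unique ID, assigned once in input order */
};

static unsigned long next_seq = 0;

/* Gives every entry a monotonically increasing ID when it is created. */
void entry_init(struct entry *e, long long size)
{
    e->size = size;
    e->seq = next_seq++;
}

/* Comparator for qsort() over an array of pointers to struct entry. */
int cmp_entry_ptr(const void *lhs, const void *rhs)
{
    const struct entry *a = *(const struct entry *const *)lhs;
    const struct entry *b = *(const struct entry *const *)rhs;

    if (a->size != b->size)
        return (a->size < b->size) ? -1 : 1;

    /* Tie-break on the intrinsic ID, never on element addresses. */
    if (a->seq != b->seq)
        return (a->seq < b->seq) ? -1 : 1;

    return 0;
}
```

Because no two entries share an ID, the comparator defines a total order, and entries with equal sizes come out in the order they were created.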

Check all malloc return values for NULL

Possibly tolerate out-of-memory conditions by backing off some aggressive optimizations? But at least don't blindly proceed assuming it always returns a non-NULL value.
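
One possible shape for this, sketched under assumptions: `alloc_with_backoff()` is a hypothetical helper, not an existing duff function. It checks every malloc() result, retries with a smaller request when the preferred size is unavailable, and returns NULL once even the minimum fails so the caller can report the error.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: try to allocate `preferred` bytes, halving the
 * request on failure but never going below `minimum`.  On success the
 * size actually obtained is stored in *actual.  Returns NULL if even
 * the minimum cannot be allocated.
 */
void *alloc_with_backoff(size_t preferred, size_t minimum, size_t *actual)
{
    size_t size = preferred;

    if (size < minimum)
        size = minimum;

    for (;;) {
        void *buf = malloc(size);

        if (buf != NULL) {
            *actual = size;
            return buf;
        }
        if (size <= minimum)
            return NULL;  /* nothing smaller left to try */

        size /= 2;
        if (size < minimum)
            size = minimum;
    }
}
```

A caller would still have to test the result for NULL and release the buffer with free(); the sketch only covers the backing-off step.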

Extend to support "internal" files?

This would mean allowing e.g. an email to be pulled apart and the individual attachments treated as (virtual) entries; ditto a tarball, a compressed file, a compressed tarball, etc.

(enhancement request) Order by pathname (for excess mode)?

I rely on duff a lot and often find that the file that appears in the excess mode is not the copy of the file I wish to delete (one file has been sorted into the right subfolder, the other has not).

I try to manipulate the names of the subdirectories to force the order to be how I want, but I don't think it can be fully controlled.

How can I control which copy appears in excess mode? If there is no such control, how difficult would it be to introduce such a feature?

join-duplicates.sh needs some TLC

Document that the new version passes the -t (thorough) option, since that makes sense for a script that automatically deletes "duplicates". Install it with the executable bit set (chmod +x). Document it on the duff man page.

Do smart stuff to make things go faster

Version 0.5 (RC1) does duplicate-entry detection via qsort for better performance; this should be the largest single improvement reasonably achievable over 0.4.
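
As a rough illustration of why sorting helps, and not a description of what 0.5 actually does: once keys are sorted, equal keys sit next to each other, so a single linear pass finds them instead of comparing every pair. The sketch below assumes file size as the key; `cmp_size()` and `count_size_clusters()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Orders two long long values for qsort() without risking overflow. */
int cmp_size(const void *lhs, const void *rhs)
{
    long long a = *(const long long *)lhs;
    long long b = *(const long long *)rhs;

    return (a > b) - (a < b);
}

/*
 * Sorts `sizes` in place, then counts the runs of two or more equal
 * values.  Each run is a set of candidates that would still need a
 * content comparison before being called duplicates.
 */
size_t count_size_clusters(long long *sizes, size_t count)
{
    size_t clusters = 0;
    size_t i = 0;

    qsort(sizes, count, sizeof sizes[0], cmp_size);

    while (i < count) {
        size_t j = i + 1;

        /* Advance j to the end of the run starting at i. */
        while (j < count && sizes[j] == sizes[i])
            j++;
        if (j - i > 1)
            clusters++;
        i = j;
    }
    return clusters;
}
```

The sort costs O(n log n) comparisons and the scan is linear, whereas comparing every entry against every other entry grows quadratically with the number of files.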
