Duplicate file cleanup

CloneSpy (Windows)


CloneSpy can help you free up hard drive space by detecting and removing duplicate files. Duplicate files have exactly the same contents regardless of their name, date, time, and location. CloneSpy can also find files that are not exactly identical but share the same file name, or whose sizes differ only slightly.

FSlint (Linux)


FSlint is a toolkit for finding and cleaning up filesystem cruft, such as duplicate files.

fdupes (Linux)

fdupes uses MD5 checksums followed by a byte-by-byte comparison to find duplicate files within a set of directories. It has several useful options, including recursion. In the benchmarks below, it is the fastest of the three approaches.
sudo aptitude install fdupes
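fdupes' two-stage comparison (a cheap hash check first, then a byte-by-byte comparison to rule out hash collisions) can be sketched with plain coreutils. The file names and contents below are made up for illustration:

```shell
cd "$(mktemp -d)"               # work in a scratch directory
printf 'same content' > a.txt   # two sample files with identical contents
printf 'same content' > b.txt

# Cheap MD5 check first; only if the hashes match, confirm byte by byte.
if [ "$(md5sum < a.txt)" = "$(md5sum < b.txt)" ] && cmp -s a.txt b.txt; then
    echo "duplicate: a.txt b.txt"
fi
```

In practice the hash check lets fdupes skip the expensive byte-by-byte comparison for almost all non-duplicate pairs.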

You can instruct fdupes to delete duplicate files automatically, but you cannot reliably control which copy of each pair gets deleted. To remove only the duplicates under b/ (keeping everything in a/) and then prune the emptied directories, you could run:
fdupes -r a/ b/ | grep -o "^b/.*" | xargs -d '\n' rm ; find b/ -empty -delete
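The grep stage is what restricts the deletion to b/: fdupes prints each duplicate group as a blank-line-separated block of paths, and matching on the ^b/ prefix selects only the copies to delete. Simulating fdupes' output with made-up paths:

```shell
# Simulated fdupes output: two duplicate groups separated by a blank line.
printf 'a/one.txt\nb/one.txt\n\na/two.txt\nb/two.txt\n' |
    grep -o "^b/.*"   # keeps only the b/ copies, drops the a/ lines and blanks
# prints:
#   b/one.txt
#   b/two.txt
```

The surviving paths are then fed to rm via xargs; the -d '\n' option makes xargs treat each line as one argument, so file names containing spaces are still handled correctly.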

Linux shell commands

In an Ubuntu forum, a pipeline of shell commands was worked out that generates the same output as fdupes. You don't have to install any software, but it is also much slower (see the benchmarks below).
find . ! -empty -type f -printf "%s " -exec ls -dQ {} \; | sort -n | uniq -D -w 1 | \
cut -d" " -f2- | \
xargs md5sum | sort | \
uniq -w32 -d --all-repeated=separate | \
cut -c35-
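To see what the pipeline prints, you can run it in a scratch directory; the file names and contents below are made up for illustration:

```shell
cd "$(mktemp -d)"                   # scratch directory
printf 'hello' > a.txt              # duplicate pair
printf 'hello' > b.txt
printf 'different content' > c.txt  # unique file, filtered out by the pipeline

# size prefilter -> md5sum -> group identical hashes -> strip the hash column
find . ! -empty -type f -printf "%s " -exec ls -dQ {} \; | sort -n | uniq -D -w 1 | \
cut -d" " -f2- | \
xargs md5sum | sort | \
uniq -w32 -d --all-repeated=separate | \
cut -c35-
# prints:
#   ./a.txt
#   ./b.txt
```

Note that the size-based prefilter (uniq -D -w 1) only compares the leading character of the size field, so it is a crude filter; correctness is guaranteed by the md5sum stage, which groups the true duplicates.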


Benchmarks

Test set: 6205 files, of which 994 were duplicates in 722 groups.

FSlint          10 minutes
fdupes          5.5 minutes
Piped commands  13 minutes

Hard links


If you don't want to remove duplicates but still want to save disk space, you can use hardlink (or fdupes) on Linux, which detects multiple copies of the same file and replaces them with hard links.
For example, in my case:
Files:    5935
Linked:   991 files
Compared: 4022 files
Saved:    4.87 GiB
Duration: 12.6 minutes
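What such a tool does under the hood is essentially the following: replace one copy with a hard link to the other, so both names point at the same inode and the data is stored only once. A minimal sketch with made-up file names:

```shell
cd "$(mktemp -d)"                 # scratch directory
printf 'some data' > orig.txt
cp orig.txt copy.txt              # a byte-identical duplicate
ln -f orig.txt copy.txt           # replace the copy with a hard link
stat -c '%h' orig.txt             # link count is now 2
stat -c '%i' orig.txt copy.txt    # both names show the same inode number
```

The caveat of this approach: since both names now refer to the same data, editing the file through either name changes it for both, which is only safe for files you treat as read-only.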