CloneSpy (Windows)
CloneSpy can help you free up hard drive space by detecting and removing duplicate files. Duplicate files have exactly the same contents, regardless of their name, date, time and location. CloneSpy can also find files that are not exactly identical but have the same file name, or whose size differs only slightly.
FSlint (Linux)
FSlint is a toolkit to clean up filesystems: it finds and fixes problems with filesystem data, such as duplicate files.
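Besides the GUI, FSlint also ships command-line scripts. A minimal sketch, assuming the usual Debian/Ubuntu install location (the scripts are not put on the PATH by default, and /path/to/dir is a placeholder):
# find duplicate files under the given directory
/usr/share/fslint/fslint/findup /path/to/dir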
fdupes (Linux)
fdupes uses MD5 sums and then a byte-by-byte comparison to find duplicate files within a set of directories. It has several useful options, including recursion, and it is the fastest of the three approaches benchmarked below.
sudo aptitude install fdupes
You can instruct fdupes to delete duplicate files automatically, but you cannot be sure from which directory it will remove a given copy.
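For illustration, an automatic run on the two directories a/ and b/ used below could look like this (a sketch):
# list all duplicates across both directory trees
fdupes -r a/ b/
# -d deletes duplicates, -N keeps the first file of each set without prompting;
# which copy counts as "first" is decided by fdupes, not by you
fdupes -r -d -N a/ b/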
To make sure that only the copies under b/ are removed, you can execute the following command instead. fdupes lists the duplicates in both directory trees, grep keeps only the paths under b/, xargs rm deletes those files, and the final find cleans out anything left empty under b/:
fdupes -r a/ b/ | grep -o "^b/.*" | xargs -d '\n' rm ; find b/ -empty -delete
Linux shell commands
In an Ubuntu forum, a pipe of shell commands was worked out that generates the same output as fdupes. You don't have to install any software, but it is also much slower (see the benchmarks below).
find . ! -empty -type f -printf "%s " -exec ls -dQ {} \; | sort -n | uniq -D -w 1 | \
cut -d" " -f2- | \
xargs md5sum | sort | \
uniq -w32 -d --all-repeated=separate | \
cut -c35-
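Roughly, the pipe works as follows: find prints each file's size followed by its quoted name, sort -n orders the lines by size, and uniq -D -w 1 keeps lines matching a neighbour on the first character as a cheap pre-filter before hashing. cut then strips the size field, xargs md5sum hashes the remaining candidates, and uniq -w32 --all-repeated=separate groups lines with identical checksums into blank-line-separated blocks, with the final cut -c35- leaving only the file names.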
Benchmarks
Test set: 6205 files, of which 994 were duplicates in 722 groups.
FSlint: 10 minutes
fdupes: 5.5 minutes
Piped commands: 13 minutes
Hard links
If you don't want to remove duplicates but still want to save disk space, you can use hardlink (or fdupes) on Linux, which detects multiple copies of the same file and replaces them with hard links.
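A minimal sketch, assuming the hardlink utility as packaged on Debian or Fedora (flag names can differ slightly between versions, so check your man page; /path/to/dir is a placeholder):
# dry run: only report what would be linked, change nothing
hardlink -n -v /path/to/dir
# replace duplicates with hard links
hardlink -v /path/to/dir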
For example in my case:
Files:    5935
Linked:   991 files
Compared: 4022 files
Saved:    4.87 GiB
Duration: 12.6 minutes