A while ago I was facing a problem with duplicate binary files on my machine. For the past ten years or so I have been collecting music and backing it up on my computer. The thing is, after a while duplicate files appear. Various backups and "cheap disks" for use in the car caused my collection to grow at an exponential rate.

I looked around for a program that could remove these duplicate files from disk but found none. This inspired me to write duplicate. It is a command-line tool that takes a directory as a parameter and searches that directory for duplicate files. It was developed for music but can be used on any binary or text file.

The algorithm is rather simple, except for the VCDIFF part. Basically, it compares two files by size. If the sizes are equal, it does a binary diff using the xdelta API developed by Josh MacDonald. If the files are equal, one is removed.
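The size-then-content check described above can be sketched in a few lines of Python. This is a minimal sketch, not the tool's actual code: it swaps the xdelta/VCDIFF binary diff for a plain chunked byte-by-byte comparison, and the function names `find_duplicates` and `files_equal` are hypothetical. It only reports duplicate pairs and leaves deletion to the caller.

```python
import os
from collections import defaultdict


def files_equal(path_a, path_b, chunk_size=65536):
    """Return True if the two files have identical contents.

    Plain byte-by-byte comparison; this stands in for the xdelta
    binary diff used by the real tool.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files reached EOF at the same point
                return True


def find_duplicates(directory):
    """Return (kept, duplicate) path pairs found under directory."""
    # Step 1: group regular files by size; only same-size files can match.
    by_size = defaultdict(list)
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: within each size group, compare contents.
    pairs = []
    for paths in by_size.values():
        kept = []  # one representative per distinct content of this size
        for path in sorted(paths):
            match = next((k for k in kept if files_equal(k, path)), None)
            if match is None:
                kept.append(path)
            else:
                pairs.append((match, path))
    return pairs
```

A caller could then delete the second path of each pair, for example with `os.remove(dup)`. Grouping by size first means the comparatively expensive content comparison only runs on files that could possibly be identical.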

Duplicate was developed on a Monolithic system and should compile fine on Linux or OS X. I haven't tried it on Windows yet.


Duplicate needs to be run more than once to remove all duplicate files.