So whilst working through my cleaned-up archive from last week I noticed that I still had duplicate image files! I have no idea how this happened, but in my cleaned-up photo directory I have lots of files with similar names and file sizes that are not identical…
I should say that they were there before I started my clean-up, so things are at least getting better.
$ ls -lhrt *2560*
-rw-r--r-- 1 grant grant 838K Apr 9 2009 img_2560.jpg
-rw-r--r-- 1 grant grant 836K Apr 9 2009 IMG_2560.JPG
When you look at the images they are pretty much the same. We can use the identify command from ImageMagick to inspect them:
$ identify *2560*
img_2560.jpg JPEG 3264x2448 3264x2448+0+0 8-bit DirectClass 857KB 0.000u 0:00.000
IMG_2560.JPG JPEG 3264x2448 3264x2448+0+0 8-bit DirectClass 855KB 0.000u 0:00.000
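Before going further it is worth confirming that pairs like this really do differ at the byte level, rather than just looking different in the listing. Here is a minimal sketch using cmp; the helper name same_bytes is my own invention for illustration and is not part of any tool:

```shell
# same_bytes is a hypothetical helper, not part of ImageMagick or coreutils.
# It prints "identical" when the two files match byte for byte,
# and "different" otherwise.
same_bytes() {
    # cmp -s is silent and only sets the exit status (0 means no difference)
    if cmp -s -- "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

Running something like `same_bytes img_2560.jpg IMG_2560.JPG` on the pair above would show whether a plain checksum-based duplicate finder could ever have caught them.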
So what to do? After a bit of time with Google I found a program called findimagedupes, which compares what the images look like rather than their raw bytes before deciding whether they match. It takes some time to run, but the command I used was:
$ findimagedupes --fingerprints=imagedupes.dat -R ./ > listofdupes.txt
The resulting listofdupes.txt file was 1614 lines long. This should save a lot of manual deletion! findimagedupes wasn’t perfect though: it matched some things that were not matches, so I need to think carefully about what to do with the output… hmm.
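One cautious option is to turn the output into a review list instead of deleting anything straight away. The sketch below assumes that each line of listofdupes.txt is one set of matching files separated by spaces, and that none of the filenames contain spaces; both are assumptions worth checking against your own output first. The name list_candidates is made up for this example:

```shell
# list_candidates is a hypothetical helper for reviewing the output.
# Assumptions: one set of matching files per input line, names separated
# by spaces, and no spaces inside any filename.
# For each set it prints "first-file<TAB>other-file", one pair per line,
# so each suspected duplicate can be checked against the file it matched.
list_candidates() {
    awk '{ for (i = 2; i <= NF; i++) printf "%s\t%s\n", $1, $i }'
}
```

For example, `list_candidates < listofdupes.txt > review.tsv` would give a two-column file to work through by eye. Nothing gets deleted here, which seems sensible given the false matches.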