Monthly Archives: May 2015

Clean Up the Photograph Archive

Dear Reader,

Being a photographer of the “pray and spray” school of composition, I have acquired a large number of photographs. I got my first digital camera around 2004 and now have a collection of nearly 28,000 photographs. Some of them are even in focus.

Due to a number of botched imports, many of these were duplicates, and they lived in a directory tree that was truly a mess. I had three semi-nested directory structures:

  • Pictures/Year/Month/Day with most of the files from 2004 to 2015
  • Pictures/Pictures/Year/Month/Day with some of the files from 2006 to 2012
  • Pictures/Pictures/Photos/Year/Month/Day with more of the files from a random selection of years

Note that Shotwell will detect duplicate files on import, but I had managed to get a lot of files into the archive before I turned this on, and I am not sure how well Shotwell deals with the same file under different names, for example img88.jpg and img88-1.jpg, or IMG88.JPG.
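Tools such as fdupes sidestep the naming question by comparing file contents rather than names (its man page describes comparing sizes and MD5 signatures, then a byte-by-byte check). A rough stand-in, assuming GNU find, md5sum, sort and uniq and using a hypothetical function name, shows the idea; it relies on the hash alone and skips the byte-by-byte confirmation:

```shell
#!/usr/bin/env bash
# Sketch: group files under a directory by the MD5 of their contents, so the
# same photo saved as img88.jpg, img88-1.jpg and IMG88.JPG is reported as one
# set whatever it is called. Assumes GNU find, md5sum, sort and uniq; the
# function name is hypothetical.
list_content_dupes() {
  local dir="$1"
  # md5sum prints "<32-char hex digest>  <path>" for each file. Sorting puts
  # identical digests next to each other; uniq -w32 compares only the digest
  # and prints every member of each repeated group, with a blank line
  # between groups.
  find "$dir" -type f -exec md5sum {} + \
    | sort \
    | uniq -w32 --all-repeated=separate
}
```

For example, list_content_dupes Pictures/ prints each set of identical files. It never deletes anything, so it only works as a preview.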

So this weekend I have been sorting this out. After quite a bit of googling, this was my step-by-step process:

  1. Back everything up – in case it all goes wrong.
  2. Set Shotwell to write tags to the files and to watch the library location for new files. This means all the labels are attached to the pictures themselves rather than stored only in Shotwell’s database, avoiding potential confusion later. Watching the library location lets Shotwell pick up changes made outside it automatically. There might be a cleverer way of doing this by fiddling with the database. My Shotwell preferences looked like this: [screenshot of the Shotwell preferences dialog]
  3. For a large archive, you need to wait a while for Shotwell to write the metadata out to every file…
  4. I then closed Shotwell so it didn’t get upset whilst I moved everything around 🙂
  5. Use fdupes to get rid of the duplicate files. The correct command for this is fdupes -rdN Pictures/ (recurse, delete, no prompt). Note that this removes files without prompting, keeping only the first file in each set of duplicates, so you should first check the output of fdupes -r Pictures/ to make sure that it is about to do what you want. Of course, since we carried out step 1 (backup), we can recover from any failure.
  6. Clean up the directory tree using rsync. I made a new directory called cleanup alongside Pictures and moved Pictures/Pictures and Pictures/Pictures/Photos into it, as cleanup/Pictures and cleanup/Photos. Then rsync -avuP cleanup/Pictures/ Pictures/ and rsync -avuP cleanup/Photos/ Pictures/ merged both trees back into Pictures (-a preserves timestamps and permissions, -u skips files that are newer at the destination, -P shows progress). Having created a unified structure under Pictures, I then deleted the cleanup directory.
  7. Restart Shotwell. You need to wait quite a while whilst it works out what you have done to the precious photographs under its guard. In the bottom-left corner you’ll see its progress indicator chugging away… [screenshot of the Shotwell progress indicator]
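Steps 5 and 6 can be sketched as a pair of bash functions. This is a minimal sketch, not a record of the exact commands used above: the function names (dedupe_tree, merge_nested_trees), the confirmation prompt and the refusal to reuse an existing cleanup directory are hypothetical additions, and it assumes fdupes and rsync are installed.

```shell
#!/usr/bin/env bash
# Sketch of steps 5 and 6. Assumes bash, fdupes and rsync are installed.
# Function names, the prompt and the cleanup check are hypothetical additions.

# Step 5: show every set of duplicates, then ask before deleting.
dedupe_tree() {
  local root="$1" answer
  fdupes -r "$root"                      # review this list first
  read -r -p "Delete all but the first file in each set? [y/N] " answer
  if [ "$answer" = "y" ]; then
    fdupes -rdN "$root"                  # -d delete, -N keep first, no prompt
  fi
}

# Step 6: move the nested trees into a sibling "cleanup" directory,
# then merge their contents back into the top-level tree with rsync.
merge_nested_trees() {
  local root="${1%/}" cleanup tree
  cleanup="$(dirname "$root")/cleanup"
  if [ -e "$cleanup" ]; then
    echo "refusing to continue: $cleanup already exists" >&2
    return 1
  fi
  mkdir "$cleanup"
  # Photos lives inside $root/Pictures, so move it out first.
  if [ -d "$root/Pictures/Photos" ]; then
    mv "$root/Pictures/Photos" "$cleanup/Photos"
  fi
  if [ -d "$root/Pictures" ]; then
    mv "$root/Pictures" "$cleanup/Pictures"
  fi
  # -a archive mode, -v verbose, -u skip files newer at the destination,
  # -P keep partial files and show progress. The trailing slash on the
  # source copies its contents (Year/Month/Day) rather than the directory.
  for tree in "$cleanup/Pictures" "$cleanup/Photos"; do
    if [ -d "$tree" ]; then
      rsync -avuP "$tree/" "$root/"
    fi
  done
}
```

Run from the directory that contains Pictures, for example dedupe_tree Pictures/ followed by merge_nested_trees Pictures. The cleanup directory is deliberately left in place, so the old trees are still around until you have checked the merged result and deleted cleanup by hand.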

After all this, I ended up with an archive with zero duplicates and a clean directory structure. The first is objectively useful. The second only matters if you interact with the directory tree by hand, and since I use Shotwell it is much less important, but I feel better for it nonetheless.