
Removing more Image Duplicates

Dear Reader,

So whilst working through my cleaned up archive from last week I noticed that I still had duplicate image files!! I have no idea how this happened, but in my cleaned up photo directory I have lots of files with similar names and file sizes that are not identical…

I should say that they were there before I started my clean up so things are at least getting better.

$ ls -lhrt *2560*
-rw-r--r-- 1 grant grant 838K Apr 9 2009 img_2560.jpg
-rw-r--r-- 1 grant grant 836K Apr 9 2009 IMG_2560.JPG
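
Pairs like this one differ only in the case of the file name, so they can be listed in bulk. A minimal sketch, assuming GNU find and a case-sensitive filesystem; the function name list_case_twins is made up for illustration:

```shell
# list_case_twins DIR (hypothetical helper):
# print each lowercased file name that occurs more than once under DIR,
# e.g. img_2560.jpg when both img_2560.jpg and IMG_2560.JPG exist.
list_case_twins() {
    # %f prints the base name only (GNU find); tr folds the case;
    # uniq -d keeps names that were seen at least twice.
    find "$1" -type f -printf '%f\n' \
        | tr '[:upper:]' '[:lower:]' \
        | sort \
        | uniq -d
}
```

Running list_case_twins . in the photo directory prints one line per clashing name. It also reports identical names that live in different subdirectories, which is probably worth knowing about too.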

When you look at the images they are pretty much the same. We can use the identify command from ImageMagick to look at them:

$ identify *2560*
img_2560.jpg JPEG 3264x2448 3264x2448+0+0 8-bit DirectClass 857KB 0.000u 0:00.000
IMG_2560.JPG[1] JPEG 3264x2448 3264x2448+0+0 8-bit DirectClass 855KB 0.000u 0:00.000

The photograph in question is of Venice, but I actually have quite a lot of these near-duplicates, of various places…

So what to do? After a bit of time with Google I found a program called findimagedupes, which analyses the image content before deciding whether two files are a match. It takes some time to run, but the command I used was:

findimagedupes --fingerprints=imagedupes.dat -R ./ > listofdupes.txt

My listofdupes.txt file was 1614 lines long. This should save a lot of manual deletion! findimagedupes wasn’t perfect though – it matched some things that were not matches, so I need to think carefully about what to do with the output… hmm.
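
One cautious option, given the false matches, is to move the suspected duplicates aside for a look rather than delete them. This is only a sketch: it assumes each line of listofdupes.txt holds one group of matching files separated by spaces, that no file name contains whitespace or glob characters, and that GNU mv is available. The name quarantine_dupes and the review directory are invented for the example.

```shell
# quarantine_dupes LIST REVIEW_DIR (hypothetical helper).
# Assumes one group of matching files per line, separated by spaces,
# and no whitespace or glob characters in any file name.
quarantine_dupes() {
    list=$1
    review=$2
    mkdir -p "$review"
    while read -r keep rest; do
        # $keep, the first file of each group, stays where it is.
        for f in $rest; do
            if [ -f "$f" ]; then
                # Numbered backups stop same-named files overwriting each other.
                mv --backup=numbered -- "$f" "$review"/
            fi
        done
    done < "$list"
}
```

Something like quarantine_dupes listofdupes.txt ~/dupes-to-review would then leave one file from each group in place, and anything that turns out not to be a match can be moved back by hand.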


Cleanup The Photograph Archive

Dear Reader,

Being a photographer of the “pray and spray” school of composition I have acquired a large number of photographs. I got my first digital camera in around 2004 and now have a collection of nearly 28,000 photographs. Some of them are even in focus.

Due to a number of botched imports many of these are duplicates and they resided in a directory tree that was truly a complete mess. I had three semi-nested directory structures:

  • Pictures/Year/Month/Day with most of the files from 2004 to 2015
  • Pictures/Pictures/Year/Month/Day with some of the files from 2006 to 2012
  • Pictures/Pictures/Photos/Year/Month/Day with more of the files from a random selection of years

Note that Shotwell will detect duplicate files on import – but I had managed to get a lot of files into the archive before I turned this on, and I am not sure how well Shotwell deals with the same file under different names (img88.jpg and img88-1.jpg or IMG88.JPG, for example).

So this weekend I have been sorting this out. After quite a bit of googling, this was the step by step process for me:

  1. Back everything up – in case it all goes wrong.
  2. Set Shotwell to write tags to the files and to watch the library location for new files. This means all the labels will be attached to the pictures rather than Shotwell’s database, avoiding potential confusion later. Setting Shotwell to watch the library location allows it to pick up on changes automatically. There might be a cleverer way of doing this by fiddling with the database. My Shotwell preferences looked like this: [Screenshot from 2015-05-31 22:55:20]
  3. For a large archive you need to wait a reasonable amount of time for Shotwell to do its thing…
  4. I then closed Shotwell so it didn’t get upset whilst I moved everything around 🙂
  5. Use fdupes to get rid of the duplicate files. The correct command for this is: fdupes -rdN Pictures/ Note that this removes files without prompting, so you should first check the output of fdupes -r Pictures/ to make sure that it is about to do what you want. Of course since we carried out step 1 (backup) we can recover from any failure.
  6. Clean up the directory tree using rsync. So basically here I made a new directory called cleanup and moved Pictures/Pictures and Pictures/Pictures/Photos into it. Then rsync -avuP cleanup/Pictures/ Pictures/ and rsync -avuP cleanup/Photos/ Pictures/ were used to sync up the directory trees. I then deleted the cleanup directory, having created a unified structure under Pictures.
  7. Restart Shotwell. You need to wait quite a bit whilst it works out what you have done to the precious photographs under its guard. At the bottom left corner you’ll see this chugging away… [Screenshot from 2015-05-31 23:01:15]
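
Since step 5 deletes without prompting, an independent preview of the byte-identical files can be reassuring. The sketch below does not use fdupes at all: it swaps in checksums from GNU coreutils (sha256sum, sort, uniq) to group files with identical content. The function name list_identical is hypothetical.

```shell
# list_identical DIR (hypothetical helper):
# print groups of files under DIR whose contents are byte-identical,
# one "checksum  path" line per file, groups separated by a blank line.
list_identical() {
    # A SHA-256 checksum is 64 hex characters, so uniq compares only
    # the first 64 characters of each line and prints every repeated one.
    find "$1" -type f -exec sha256sum {} + \
        | sort \
        | uniq -w64 --all-repeated=separate
}
```

list_identical Pictures/ should show roughly the same groups as fdupes -r Pictures/, and if the two disagree that is a good reason to pause before running fdupes -rdN.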

After all this I ended up with an archive with zero duplicates and a clean directory structure. The first is objectively useful. The second is only useful if you interact with the directory tree by hand, and since I am using Shotwell it is much less important – but I feel better for it nonetheless.

Finding files by date using the GNU Find Command

So I have now acquired so many photographs (~20k) that Shotwell chokes a bit when I try to import more. This is partly because I tend to let my memory cards fill up, so it has to compare 2,000 photographs with the database of 20,000 to see if it already has any of them. Still, my reluctance to delete anything means I need some GNU magic.

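find . -type f -mtime -7 -exec cp {} ~/Desktop/photoimport/ \;

This will find files under the current directory (subdirectories included) modified in the last seven days and copy them to the photoimport directory on the Desktop. The -type f test stops find handing directories to cp. You can then copy them into the Shotwell directory, which is much quicker.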
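
The same idea can be wrapped as a small reusable function. This is a sketch assuming GNU find and cp; the name copy_recent and its arguments are invented here, and it restricts the search to regular files so that directories are never passed to cp.

```shell
# copy_recent SRC DEST [DAYS] (hypothetical helper):
# copy regular files under SRC modified less than DAYS days ago
# (default 7) into DEST, preserving their timestamps.
copy_recent() {
    src=$1
    dest=$2
    days=${3:-7}
    mkdir -p "$dest"
    # -mtime -N matches files modified less than N*24 hours ago.
    find "$src" -type f -mtime "-$days" -exec cp -p -- {} "$dest"/ \;
}
```

For example, copy_recent /media/card ~/Desktop/photoimport 7, where /media/card stands in for wherever the memory card is mounted. Files with the same name in different subdirectories will overwrite each other in DEST, so it suits a flat camera card better than a nested archive.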


I’ve Missed My Laptop

So after my laptop going technical I was left for three weeks without my computer!

Of course your laptop screen chooses to break on the day that you are giving four open day talks and you’ve lost the keys to your office….

OK, so I took my annual leave right after that and took my trusty Tablet S from Sony with me instead, thinking that this should probably be the future anyway. However, I like the past for a few reasons:

  • I really missed my keyboard and my mouse. I found the virtual keyboard slow and error prone, and I missed the mouse because it gives a clear distinction between position and action: with my fat fingers I often seem to click buttons when I am simply trying to navigate around a page. This can cause the odd outburst of swearing over slow internet connections when trying to enter complex data…
  • The Android Tablet pretends not to be a computer. This means the filesystem is abstracted away from the user, which makes it much harder than it should be to do things like back up photographs from your camera. To be fair, a lot of software is doing this: I often spend as much time looking for my boarding card download on a PC as I did getting the PDF of the boarding card from the airline.
  • The Android Tablet doesn’t play DVDs. When abroad I quite like to catch up with the DVDs that I have bought in shops, but the stupidity that is region encoding means having a DVD player with an HDMI output is quite handy. Messages like the one on the hotel supplied DVD player make me wonder why I don’t just pirate the stuff…
  • You need an app for everything. Like navigating the file system to create a folder to back up your photographs… but also for the stuff you do every day like making notes, storing on-line passwords, creating a budget for the holiday, managing and uploading photographs. Some of this could be solved by planning and preparation, but as time goes on I want my IT to be simpler and simpler and only want to learn one program ever!
  • The Sony Tablet S has a really stupid custom power supply connector, so you have to lug around a power brick and adaptor. Everything should be a standard supply in this day and age.
  • Once you get back to mission control and want to copy files back to your computer your problems continue! You connect the tablet via USB but then nothing happens… This is down to the whole MTP thingy, but still I had to check out OMG Ubuntu to get it done.

On the plus side my tablet is very easy to use on trains, planes and automobiles and comes with a game called Angry Birds where, for no good reason I could determine, a series of small birds (who looked angry) conducted suicidal attacks on pigs using a catapult and their own body weight. Surprisingly fun.

Perhaps the future is actually in these touchscreen laptops which combine the best of both worlds…


Being old fashioned I normally type “All best wishes, Grant” at the end of each e-mail.  This is the sort of thing you ought to be able to include in your signature file so that you can save hours.

However, by default Thunderbird puts two dashes above your signature so that mail clients can recognise this as your signature and format the mail appropriately. This then looks a bit dumb if you have the message sign off in a different colour to the rest of the message. To change this you can set mail.identity.default.suppress_signature_separator to true, as outlined on MozillaZine.

Then you can include the two dashes in your signature file yourself, saving me typing four words per e-mail every day. Since the 15th of June 2009 I have sent around 6403 e-mails (crikey!) so that should save me around 7631 words per year…

Cleaning up EPS Files

Dear Reader,

Having been supplied with an EPS file of a wonderful logo, I then found that putting it into beamer meant it looked like a dog’s dinner.

eps2eps logo.eps

This cleaned up the file considerably but turned the logo blue! Since there was only one colour in the logo I thought that editing the EPS file by hand was the way to go…

EPS appears to use an RGB colour space: 102,51,102 is my institution’s colour. I looked for three space-separated numbers followed by rG and put in three new numbers:

102 51 102 rG
2048.1 2443.73 m

epstopdf then converts the result to PDF for pdflatex and friends.
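
The hand edit can also be scripted with sed. This is a sketch only: it assumes, as in the excerpt above, that each colour setting sits on a line of its own as three space-separated numbers followed by rG, and it rewrites every such line, which is fine for a single-colour logo but not for anything fancier. The name recolour_eps and the output file name are made up.

```shell
# recolour_eps IN OUT "R G B" (hypothetical helper):
# replace every line of the form "<num> <num> <num> rG" in IN
# with the given colour and write the result to OUT.
recolour_eps() {
    in=$1
    out=$2
    rgb=$3
    # -E enables extended regular expressions; \$ anchors the end of line.
    sed -E "s/^[0-9.]+ [0-9.]+ [0-9.]+ rG\$/$rgb rG/" "$in" > "$out"
}
```

For instance, recolour_eps logo.eps logo-purple.eps "102 51 102" writes a recoloured copy and leaves the original untouched, so the two can be compared before converting.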

Having done all this the projector resolution was then really low so my slides looked like rubbish anyway… but I did feel like more of a real man!

Colours in Beamer Presentations

If you are using beamer to do some presentations, you might like the base colour in a theme to be a particular colour, to match your logo for example.

The trick is to alter the “structure” colour as follows:

% Change theme to Durham University Purple...