Observant readers will have spotted that I have been obsessing about ereaders recently and have just bought a Kobo Touch (see here for the details). I have, therefore, started to work on converting a stack of PDF files into EPUB format for reading on said device. (Don’t bother with native PDF support on any of these devices: PDF files always render in a fixed page format, so they cannot be read without scrolling on any portable device smaller than the paper for which they were intended.) I immediately found out why people find it so difficult! I don’t really have a good solution (so far), but I at least understand the problem.

The problem with PDF files

Adobe designed PDF files (or Acrobat files as we should call them) to provide an electronic image of a page as it would appear in a printed copy. They are widely used for sending the equivalent of an electronic fax, and also for sending material to people when you do not want to give them the source files.

Pages can be complex, with multiple columns, images, tables, side notes, footnotes and so forth, so converting a page representation into blocks of HTML (which is roughly what EPUB looks like to me) is not simple.
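To make the difficulty concrete, here is a minimal sketch of what a converter has to do, assuming (my simplification) that a page arrives as a bag of text fragments with coordinates rather than as paragraphs. The `Fragment` type and `fragments_to_lines` function are made up for illustration and are not taken from any real tool.

```python
from typing import List, Tuple

# Hypothetical fragment type: (x, y, text), with y growing down the page.
Fragment = Tuple[float, float, str]


def fragments_to_lines(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Group positioned fragments into lines, top to bottom, left to right."""
    lines: List[List[Fragment]] = []
    # Visit fragments from the top of the page downwards.
    for frag in sorted(fragments, key=lambda f: (f[1], f[0])):
        # Same line if the y position is close to the line's first fragment.
        if lines and abs(lines[-1][0][1] - frag[1]) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments by x and join them with spaces.
    return [" ".join(text for _, _, text in sorted(line)) for line in lines]


single_column = [(50, 100, "world"), (10, 101, "Hello"), (10, 120, "Next")]
two_columns = [
    (10, 100, "Left one"), (300, 100, "Right one"),
    (10, 120, "Left two"), (300, 120, "Right two"),
]
print(fragments_to_lines(single_column))
print(fragments_to_lines(two_columns))
```

The single column comes out in reading order, but the two-column page has its columns interleaved line by line, and that is before images, tables or footnotes get involved.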

I would have hoped that simple books which are one column with maybe page numbers, chapter breaks and headers would be achievable. How wrong I was!

Freeware options for conversion

As far as I can tell, the best (freeware) package out there is Calibre, which has a simple interface, a tool for getting book data from the web to populate metadata, and the ability to convert formats and load books onto your device. I pointed it at some PDF files and they seemed to work, at least until I got them onto my Kobo…

To be fair to Calibre, the help file says ‘PDF is unreliable’ and the software is upgraded regularly (in fact I notice I don’t have the newest version), but it does weird things on conversion: odd page breaks, weird splitting of the word Chapter, and inconsistently losing ‘ll’. Painful in the extreme.
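Some of this damage can be patched up after the event. As a minimal sketch, assuming the split shows up as stray whitespace inside the word (for example ‘Chap ter’), a regular expression can rejoin it. The `rejoin_chapter` function is my own hypothetical cleanup step, not a Calibre feature, and it does nothing for the missing ‘ll’.

```python
import re

# Matches the letters of "Chapter" with optional stray whitespace between
# them, e.g. "Chap ter" or "C hapter". Hypothetical cleanup, not part of Calibre.
SPLIT_CHAPTER = re.compile(r"\bC\s*h\s*a\s*p\s*t\s*e\s*r\b")


def rejoin_chapter(text: str) -> str:
    """Replace any whitespace-split form of 'Chapter' with the whole word."""
    return SPLIT_CHAPTER.sub("Chapter", text)


print(rejoin_chapter("Chap ter 3"))
```

Text that already contains an intact ‘Chapter’ passes through unchanged, so the function is safe to run over a whole book.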

I have had to fall back on something called Mobipocket Creator to convert PDF to PRC format, which seems to do a reasonable job of handling the pure text. I then use Calibre to convert the PRC to EPUB. This of course means that the actual EPUB code produced is a bit inconsistent and un-obvious, and the Table of Contents is no longer generated.
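The second step of that chain does not need the Calibre window at all, because Calibre ships a command-line tool called ebook-convert that takes an input file and an output file and picks the formats from the extensions. Below is a rough sketch of driving it from Python. The file names and the `build_convert_command` and `convert` helpers are my own inventions, and I am assuming ebook-convert is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, target: Path) -> List[str]:
    """Build the ebook-convert call; formats come from the file extensions."""
    return ["ebook-convert", str(source), str(target)]


def convert(source: Path, target: Path) -> None:
    """Run the conversion, failing loudly if Calibre's tool is not installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    subprocess.run(build_convert_command(source, target), check=True)


# Hypothetical file names, for illustration only.
print(build_convert_command(Path("book.prc"), Path("book.epub")))
```

Calling `convert(Path("book.prc"), Path("book.epub"))` would then do the PRC to EPUB step for one book, and a loop over a folder would handle the whole stack.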

I then need to edit the EPUB. I could spend more time in Calibre and tweak the conversion, but I prefer to get hands-on, so I used an editor called Sigil, which lets me get right into the code. Of course, the previous steps have already tagged things for style and structure in their own way!
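For anyone who wants a look before opening an editor: an EPUB is a zip archive with the book text held in (X)HTML files, so a few lines of Python will show what the earlier steps produced. This is only a sketch; the `list_content_documents` helper is a name I made up, and it simply filters on file extension rather than reading the package manifest.

```python
import zipfile
from typing import List


def list_content_documents(epub_path) -> List[str]:
    """Return the (X)HTML files inside an EPUB, sorted by name."""
    with zipfile.ZipFile(epub_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Keep only the markup files; skip CSS, images, fonts and metadata.
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        )
```

Passing it the path to a converted book, such as a hypothetical `book.epub`, gives the list of files that hold the tagged-up text.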

The moral of this tale?

To be fair, I want something for nothing: lots of books in PDF converted for free to EPUB so that I can read in comfort. Maybe a bit of pain in conversion is good for the soul?

I will let you know how this progresses in coming months!

