Converting a PDF to ePUB format isn’t difficult, but getting quality results seems to be nigh on impossible.
DevelopRx assembled a team of crack engineers (actually just me) to tackle this issue and solve it using readily available open source software (a.k.a. FREE!).
The process described in this article is primarily for text-based PDFs with common headers and footers, otherwise known as “books.”
A typical book-type PDF normally has chapter headings and page numbers on every page. When these pages are converted to another format like ePUB, all these artifacts litter the text. Now, it’s possible to go in and edit out all these strings by hand, but it’s very tedious.
Typical PDF with Header and Footer
Converted PDF in ePUB Format
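To make the problem concrete, here is a minimal Python sketch of what cleaning up the converted text by hand amounts to. The sample text, the header string, and the `strip_artifacts` helper are all made up for illustration; they are not part of any tool used in this article.

```python
import re

# Hypothetical text as it might come out of a converted book PDF:
# the running header and the page number land in the middle of the prose.
converted = """It was a dark and stormy
CHAPTER ONE: THE STORM
12
night, and the rain fell in torrents."""


def strip_artifacts(text, header):
    """Drop lines that equal the known header or contain only digits."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == header:
            continue  # running header repeated on every page
        if re.fullmatch(r"\d+", stripped):
            continue  # bare page number
        kept.append(line)
    return "\n".join(kept)


cleaned = strip_artifacts(converted, "CHAPTER ONE: THE STORM")
print(cleaned)
```

A filter like this is brittle: the header changes with every chapter, and any legitimate line that happens to contain only digits gets thrown away too. That is a big part of why removing the headers and footers before conversion is the more attractive route.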
The solution is to crop the PDF and chop off the headers and footers. Save that header-less, page-number-less file and then convert to ePUB. Seems so easy, no? But wait, there’s more!
When a PDF is cropped, the only thing that changes is the visible portion. The cropped-out text is still in the file, which means you’ll still get all the headers and footers in the converted ePUB. I won’t tell you how long it took me to figure this out…
If you have Adobe Acrobat Pro (which you can rent for $15/month), you can do the cropping magic and then actually delete the cropped area. Yay! Or you can…
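The difference between hiding and deleting can be sketched with a toy model in Python. This is not a real PDF library; the `TextItem` and `Page` classes, the coordinates, and the function names are assumptions invented purely to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    y: float  # vertical position on the page, in points from the bottom


@dataclass
class Page:
    items: list
    crop_bottom: float  # lower edge of the visible area
    crop_top: float     # upper edge of the visible area


def inside_crop(page, item):
    """True if the item sits within the visible area of the page."""
    return page.crop_bottom <= item.y <= page.crop_top


def extract_text(page):
    """What a converter sees: every text item, visible or not."""
    return [item.text for item in page.items]


def visible_text(page):
    """What a viewer shows: only the items inside the crop area."""
    return [item.text for item in page.items if inside_crop(page, item)]


def delete_outside_crop(page):
    """Actually remove the hidden items instead of just hiding them."""
    kept = [item for item in page.items if inside_crop(page, item)]
    return Page(kept, page.crop_bottom, page.crop_top)


page = Page(
    items=[TextItem("Chapter One", 760), TextItem("Body text", 400), TextItem("12", 30)],
    crop_bottom=60,
    crop_top=730,
)

print(visible_text(page))                       # what you see after cropping
print(extract_text(page))                       # what the converter still gets
print(extract_text(delete_outside_crop(page)))  # what you actually want
```

In this model, cropping only changes `crop_bottom` and `crop_top`. The header and page number stay in `items`, so anything that reads the full item list still picks them up.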
Convert PDF to ePUB using Open Source Software
Run Briss to crop pages.
Briss is a simple Java program that runs by clicking the executable.
Select your PDF under File > Load File.
When the file is loaded, it will show all the pages overlaid on one another, grouped into even and odd pages. Usually the program detects the boundaries for you. You can also drag the corners of the rectangle to set the boundaries at the maximum page width.
Choose Rectangle > Select all, and then Action > Crop PDF.
This will save the file with a _cropped suffix: Lorum Ipsum_cropped.pdf
Run LibreOffice Writer and open the cropped PDF in Writer. It may take a while if it’s a large file.
Depending on your default font, you may get a file with words outside the boundaries.
This appears to be a font problem. The easiest way to solve it is to change the size of the pages:
Page > Properties
Change the width (not the height!) and set the margins to zero.
This puts the text back inside the boundaries and still leaves the headers and the footers in the Gray Zone.
Export the file to PDF.
CALIBRE E-BOOK MANAGER
Open Calibre and drag over the latest exported PDF file.
Press Convert books > Output format: EPUB (upper right) > OK (bottom right)
When the little chipmunks stop running on the wheel in the bottom right of the main screen, the converted PDF in ePUB format should be in your library.
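If you would rather skip the clicking, Calibre also ships a command-line converter called `ebook-convert` that takes an input file and an output file. Below is a minimal Python sketch that wraps it. The file name `book_export.pdf` and the helper names are hypothetical, and the sketch assumes `ebook-convert` is on your PATH. It runs with Calibre's default conversion settings, which may not match what the GUI produces for your book.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf_path):
    """Return the ebook-convert argument list for a PDF to EPUB conversion."""
    source = Path(pdf_path)
    target = source.with_suffix(".epub")  # same name, .epub extension
    return ["ebook-convert", str(source), str(target)]


def convert(pdf_path):
    """Run the conversion if Calibre's command-line tool can be found."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(pdf_path), check=True)


# Hypothetical file name for the PDF exported from Writer.
command = build_convert_command("book_export.pdf")
print(command)
```

Calling `convert("book_export.pdf")` would write `book_export.epub` next to the PDF. Unlike the drag-and-drop route, this does not add the book to your Calibre library.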
There may be a few line return errors but the rest should be…