Dear Jane: What Is the Best Way to Read PDFs on a 6 Inch Screen
I’ve received a number of questions from readers about what to do with all the PDFs they’ve purchased over the years now that they are ready to migrate to a dedicated ereader. The fact is that there is no perfect conversion of PDFs -> 6″ screen. PDFs were made to be read on a larger screen and the best portable device out there to read PDFs is the iPad, hands down. However, if you can live with some formatting quirks, there are two free programs that will allow you to convert your PDFs to ePub or Mobi to be read on your favorite 6″ screen eink device.
The first program you need is Briss. You need a program like Briss so that you can cut off the top of each page, which contains the title or author, and the bottom, which contains the page number or other artifacts.
Here’s what a PDF, converted, looks like without a destructive crop:
There are plenty of other programs that do a PDF crop, but few of these create what is known as a destructive crop. A destructive crop is one that permanently removes the cropped-out margins from the PDF. Most programs (even Adobe Acrobat) do just a basic crop, which means your original document is preserved. Take, for example, the Preview program on Mac OS. There is a crop box and a media box. The crop box shows the cropped version but the original (media) version lurks behind. When you go and convert, the conversion program reads from the original, not the cropped version.
DESTRUCTIVE CROP
1. Preview (Mac ONLY)
Under Preview (Mac ONLY), you can create a destructive crop by cropping your image, selecting “print” and then “print as PDF” in the bottom left-hand corner. Choose “PS”, which is PostScript. Then you have to open the PS file and print as PDF again.
2. Briss, platform agnostic
Or download Briss. Briss is a small program that does not require installation and runs on any machine that has Java. Simply download the folder and unpack it somewhere. Look for the briss-0.0.10.jar file and double-click it.
A small dialog box will open and you need to select “Load File”. From there, navigate to the PDF you wish to crop. For simple PDFs like our books, Briss will usually create an image cluster for odd pages and another cluster for even pages. Draw a box around the area of the text by clicking the mouse button and holding and dragging.
If you are unhappy with the box you’ve drawn, simply right-click on the purple box. The box will disappear and you can redraw it. If you are happy with your selection, click “Crop PDF” and a dialog box will open allowing you to save the cropped PDF.
The program automatically adds “cropped” to the name so you needn’t worry about overwriting your original PDF. Open your cropped PDF in your favorite PDF viewing program and make sure you have cropped the right area. From here, you can actually just transfer the PDF to your eink reader if your device reads PDFs. Sony and Kindle both do. This is what a cropped PDF looks like on the Kindle without conversion:
As you can see, the font size of the PDF on a 6″ screen is minuscule. It’s very hard to read. This is where the need for conversion comes into play. You should already have Calibre downloaded and installed but if you don’t, grab it here. Simply drag your cropped PDF onto the Calibre screen or use the Add Books button.
Once the book is in the library, select it with your mouse. You can choose to edit the metadata (author, title, publisher). When you are done editing the metadata, press the “Convert books” button. Here you have the option to select ePub (for Sony, nook, iThings) or Mobi (Kindle) as the converted format.
REGULAR EXPRESSIONS
Now, if all you want to do is remove the header and footer text, you can use Calibre’s “Structure Detection” and regular expressions. Sometimes there is hidden text in the PDF (like a footer or a header) and the destructive crop will NOT remove the hidden text.
You will then need to use the Structure Detection option to remove the hidden text. Structure Detection is an option on the conversion page. This was challenging even for me. There is a tutorial here. Basically, for the page numbers, I use this in the footer:
(\d+ <br> <hr>)
For the header, I used this code:
(<A name=\d+>\s*</a>)(<i>Anne Calhoun </i><br>)|(<A name=\d+>\s*</a>)(<i>Liberating Lacey </i><br>)
The parentheses set off each grouping of text you want to remove. The “|” is an “or” instruction. So here I want to remove the <A name=2></a> code, the author’s name and the title. Use the “wand” to examine your PDF. You will want to pattern your regular expression off the code you see there.
You can click “Test” to determine whether your expression is going to strip out the right text. The yellow highlighted text will be removed:
Regular Expression gives me a huge headache so I prefer to use the destructive crop when I can. However, whenever there is a PDF with this hidden text, you will almost always have to use Regular Expressions to remove the header and footer. Here are some Regular Expression shortcuts that might help you:
- (<A name=\d+>\s*</a>) = This will remove any code that starts with <A name=, is followed by a number, and ends with </a>. The \d matches a single digit and the + means “one or more”, so \d+ matches any number whether it is 1 or 301. It doesn’t matter if the code is <A name=1></a> or <A name=301></a>; both are matched and removed.
- \n = end of line, used if you need to remove code that is on the next line. I.e.,
- \d+ = matches one or more digits in a row, so any whole number (1, 42, 301)
- \s* = matches zero or more whitespace characters (the blank spaces between words and letters, usually created with the spacebar, plus tabs and line breaks)
- | = this is called a pipe or vertical bar. I use it to separate sets of regular expressions.
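If you would rather experiment outside of Calibre, the shortcuts above can be tried in a few lines of Python. This is only a rough sketch: the sample string is made up for illustration, and Calibre's own matching may differ in small ways from Python's re module, so treat the results as a guide and still press “Test” in Calibre.

```python
import re

# Hypothetical snippet standing in for the code the Calibre wand shows;
# a real PDF will produce different markup.
sample = "<A name=1></a>one<A name=301> </a>two 42 <br>"

# \d+ = one or more digits; \s* = zero or more whitespace characters.
anchor_pattern = re.compile(r"(<A name=\d+>\s*</a>)")

# The pipe lets a single expression match either alternative.
either_pattern = re.compile(r"(<A name=\d+>\s*</a>)|(\d+\s*<br>)")

# Replacing every match with an empty string is the same as removing it.
anchors_removed = anchor_pattern.sub("", sample)
both_removed = either_pattern.sub("", sample)

print(repr(anchors_removed))
print(repr(both_removed))
```

The first pattern strips both anchors even though one has a single-digit number and the other has three digits and a space. The second pattern also strips the trailing page number because of the pipe.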
Example:
Anne Calhoun’s Liberating Lacey in PDF form from EC contained an alternating header with author name and title. Remember, the number changes every page:
<a name="6"></a><em>Anne Calhoun </em>
<a name="7"></a><em>Liberating Lacey </em>
and a footer with the page numbers:
6 <br>
<hr />
My reg expression is as follows.
1. Remove the author name:
(<a name="\d+"></a>)(<i>Anne Calhoun </i><br>)
I used the A name code from above and simply copied the <i>Anne Calhoun </i><br> directly from the PDF. Press test and it is all highlighted.
2. Remove Title:
|(<A name=\d+>\s*</a>)(<i>Liberating Lacey </i><br>)
I use the | to separate the sets of text I am removing, use the A name code from above and copy the <i>Liberating Lacey </i><br> directly from the PDF. You could add a \s* between “Lacey” and the </i> just to be on the safe side, so the expression matches whether or not there is a space there: <i>Liberating Lacey\s*</i><br>
3. Remove page numbers:
(\d+\s*<br>\n<hr>)
\d+ to remove the page number + \s* to remove any whitespaces + <br> copied from the PDF + \n because we are moving to a new line + <hr> copied from the PDF.
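The three steps above can be checked together with a short Python script. This is a sketch under stated assumptions, not what Calibre does internally: the html text below is a hypothetical fragment modelled on the expressions in this post (unquoted A name, <i> tags, <hr> on its own line), and strip_headers_and_footers is a helper name invented for the illustration.

```python
import re

# Hypothetical converted fragment; real markup will vary from PDF to PDF.
html = (
    "<A name=6></a><i>Anne Calhoun </i><br>\n"
    "Some story text.<br>\n"
    "6 <br>\n"
    "<hr>\n"
    "<A name=7></a><i>Liberating Lacey</i><br>\n"
    "More story text.<br>\n"
    "7 <br>\n"
    "<hr>\n"
)

# Steps 1 and 2: author header OR title header, joined with the pipe.
header = re.compile(
    r"(<A name=\d+>\s*</a>)(<i>Anne Calhoun\s*</i><br>)"
    r"|(<A name=\d+>\s*</a>)(<i>Liberating Lacey\s*</i><br>)"
)

# Step 3: page number, optional whitespace, <br>, new line, <hr>.
footer = re.compile(r"(\d+\s*<br>\n<hr>)")


def strip_headers_and_footers(text):
    """Remove every header match, then every footer match."""
    return footer.sub("", header.sub("", text))


cleaned = strip_headers_and_footers(html)
print(cleaned)

# \s* still matches when the space is missing; \s+ would not.
loose_matches = re.search(r"Lacey\s*</i>", "Lacey</i>") is not None
strict_matches = re.search(r"Lacey\s+</i>", "Lacey</i>") is not None
print(loose_matches, strict_matches)
```

Note that the second header in the sample has no space before </i>, which is exactly the case where \s* helps. Also note that matching is case-sensitive here, so <A name would not match <a name unless you adjust the expression.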
I know. This is hard. It’s hard for me too. Generally, it takes me some trial and error to figure out the right regular expression code. I hope this helps to start you on the road to demystifying that. I’m not at all experienced in this but I thought I would share what little I do understand in hopes of helping others. Obviously the folks at MobileRead are far more experienced than I. The best thing to do is just break it down, line by line, letter/digit by letter/digit.
If you had Adobe Acrobat, you could simply use “Document > Header & Footer > Remove”. Adobe Acrobat, however, is $299. If you have a better suggestion for us PDF owners, I would love to hear it!
This is a great explanation! Thanks, Jane.
However, people living in Canada need to be aware that changing the format of ebooks that are DRM protected is actually now against the new digital copyright laws.
I hate PDFs so very much.
I tried to use Preview first but, as far as I could see, I’d have to do each page individually. The ARC’s 300 pages.
Then I used Briss, which was easy enough. Conversion to ePub via Calibre wasn’t pretty format-wise, though, so I loaded it to the Goodreader app on my iTouch. At least it’s big enough now so I can mostly make out the words. Not sure it’s ideal reading, though.
I think I’m just doomed as far as PDFs go.
Thank you so much for this. That dreaded header-in-file thing has cut in on my enjoyment of many a great book.
Thanks! Briss is just what I needed. Even though the Sony readers have PDF reflow, the results are not always good. So I usually just read PDFs in landscape mode without the reflow. Which is Ok, but the font is still a bit smaller than I like. Cropping the margins makes the font a bit bigger and more readable in landscape mode.
Thank you so much for this – bookmarked for reference!
I’ve never managed to get PDF ebooks to a readable state on my ereader yet, so this is massively useful.
It’s a damned shame that PDF publishers won’t take the extra few minutes to kick out PDFs *customized* for 6″ screens. It’s not that hard to do. And despite the fact Sony has offered a FREE guide for doing this, for YEARS, no one takes the time to do it.
Reference: Optimize PDFs For Sony Reader
Of course, now there’s the new lust-tastic Sony Pocket Touch with a 5″ screen, so a version for that should be done too.
@Shannon Stacey In preview, make sure you have a thumbnail view. Go to one page and make your crop box. Go over to the thumbnail view and select all. Then crop. That will crop all the pages.
It took me a while to get Briss running on my Windows 7 machine, but once I figured it out, it was awesome. Thanks so much, Jane. Needless to say, the cropped pdfs look even better on the Daily Edition’s 7″ screen.
I wasn’t able to get the program to run properly when I clicked on the file in Explorer (even after I associated it with Java), but it ran from the command line without any problems.
Am I the only one who thinks that the product name “Briss” is a hoot? What does Briss do? It trims off what’s not wanted! If you don’t think this is funny, spell it with one s and look it up!
I read on a Kindle (I have had all three 6″ models) and I find it simpler to convert the PDFs than to leave them as PDFs. The down side is the weird hyphenation (hyphen- ated words look funky), but I need to change font sizes, so I go with that. I have tried to read a PDF just by locking in landscape mode, but that was worse. The Amazon email conversion will convert PDFs but the subject line of the email must consist of only the word “convert.”
I use a program called PDFtoEPUB which handles both the cropping and the conversion, so I don’t have to use two programs. It normally costs $39.95 though at the moment it says on their site that authors can get it free of charge until Oct. 10.
I haven’t encountered hidden text yet, though, so I don’t know if this program handles it or not. I’m not sure what hidden text is. Typically I see the headers and footers on the crop screen and crop them out along with the page numbers.
This program is easy to use for the most part but there is one thing I find challenging about it which is that it gives me a list of the letters, numbers and other symbols and how they will look after conversion and I have to go through it and make sure it is converting all of them accurately. Sometimes it gets the fi’s and fl’s wrong and I have to correct it. Occasionally I don’t catch something and then it can be annoying.
BTW, I just googled “Briss program” and saw that someone claims that Briss may come bundled with spyware or adware.
No luck with Briss…I can’t open it on my computer. I always have that problem when dealing with zip files.
I was able to use regular expressions for one of my PDFs (great instructions!), but haven’t had any luck with the second one I tried. I’ll keep playing with it. Luckily I don’t have many PDFs.
Thanks for this. Even on the new Kindle, I don’t like the pdf reader, because if I magnify the text to fill the screen, I have to then scroll down to read all the page.
I use Mobipocket Creator to convert pdf’s, but even then you sometimes get the blasted headers, so Briss is a great find.
I have been able to enlarge the font size in pdf’s with Calibre. The result looked better on my kindle than a pdf-to-mobi conversion.
btw, the new Kindle automatically crops the margins for you when reading PDFs in landscape mode. On the PDFs I tested, each page was split into 3 sections with the right and left margins cropped, making straight text PDFs very readable in landscape mode. Still has the header/footer on each page, though.
I was able to get an activation code via e-mail for PDFtoEPUB, as mentioned by Janine, even though I claimed myself as “Other:Teacher” and not author. If they’re willing to give out 20.000 licenses it might be something other DA readers could try.
@Estara: Estara, that’s great news, since I saw on their site that PDFtoEPUB converts to Kindle format as well as for the Sony.
@Emma Cunningham: Emma, the C-32 bill that will change the DRM laws here in Canada has not been passed in parliament so it is NOT illegal to format shift in Canada. Maybe in the future, it will depend on whether the minority government lasts and if C-32 lasts (C-61 the last one regarding copyright, never made it through voting)
@Jane – you can actually do a destructive crop in adobe acrobat. Simply crop the pages, and then choose “examine document” and choose the options to delete hidden data. All done, headers and footers gone. As always won’t work if PDF is DRM’d.