[personal profile] amalthia
Portable Document Format (PDF) is a format many publishers use as a final layout format. I use it at work when we send publications to a printer. PDF is great for making sure that if I sent the file to five different printing companies, each one would produce the same publication. "PDF is used for representing two-dimensional documents in a manner independent of the application software, hardware, and operating system." (Wikipedia)



The thing that makes PDF so attractive to those sending documents to publishers is the very thing that makes PDFs challenging to use on portable devices with various screen sizes.

PDF is designed to preserve the print layout, which includes margins, font size, font type, page size, and artwork layout. This is also the reason websites are not shared in PDF format. Imagine if every website had to make a separate PDF for every size of monitor screen. HTML, by contrast, re-flows the text to fit the size of the monitor screen or browser window.

To give you a general idea of the screen sizes we're talking about, portable reading devices start at 3 inches and go up to about 8 inches (measured diagonally). The e-ink devices are generally 5, 6, or 8 inches. I read on a Sony PRS-505, which is 6 inches.

Now, when writers write a story in Microsoft Word or OpenOffice (or any other word processing program), the default paper size of the document is 8.5x11 with 1-inch margins on all sides. This is the case because businesses print out a lot of documents. Letterheads and reports are not generally printed on 4x6 paper; they are printed on 8.5x11 paper. Microsoft Word does give users the option to change the page size of their document, which comes in handy for my job because we sometimes have figures that are 11x17. But for fan fiction reading and writing purposes, there has not been a need for anyone to change the page size in their word processing program. When an author hits Print to PDF in Word, the PDF that is created is sized to 8.5x11.

This is important to know because PDFs do not re-flow to fit the screen. When a person transfers a regular PDF to a device with a 6-inch screen, the PDF tries to preserve the original formatting, so everything gets shrunk until all of page 1 of an 8.5x11 document fits on a roughly 4x6 screen. This gives the user super tiny text that's nearly impossible to read, as well as a very wide margin.

The Sony does have limited reflow capacity, but it does not always work right. If I'm correct, the Kindle does not reflow PDFs. So if you hit the magnify button to enlarge the text, you get paragraphs that look something like this:
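
To put rough numbers on that shrinking, here is a minimal Python sketch. The screen dimensions (3.6 x 4.8 inches, which is what a 6-inch diagonal works out to at a 3:4 aspect ratio) and the 12 pt font size are assumptions for illustration, not measurements of any particular device or story, and the function names are made up for this example.

```python
def fit_scale(page_w, page_h, screen_w, screen_h):
    """Largest uniform scale at which the whole page still fits on the screen."""
    return min(screen_w / page_w, screen_h / page_h)


def effective_font_pt(font_pt, scale):
    """Apparent font size once the page has been scaled down."""
    return font_pt * scale


LETTER_PAGE = (8.5, 11.0)   # inches, default word processor page size
SCREEN_6IN = (3.6, 4.8)     # inches, ASSUMED usable area of a 6-inch screen

scale = fit_scale(*LETTER_PAGE, *SCREEN_6IN)
print(f"page is shown at {scale:.0%} of its printed size")
print(f"12 pt text appears at about {effective_font_pt(12, scale):.1f} pt")
print(f"each 1-inch margin still takes {1.0 * scale:.2f} inches of screen")
```

Under these assumptions the page is displayed at well under half size, so ordinary 12 pt body text ends up at around 5 pt, and the margins keep using screen space that a small device cannot spare.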

"sentence one goes here
and then breaks
off
to start a new sentence here.
The paragraphs look ugly as
a result."

I've seen that on PDFs on my own device.

So when I say I do not like PDFs and think it's a godawful format for reading on ebook devices, that is why.

The only way a PDF could possibly come close to looking good on a portable device is if the author follows the directions listed here: Conversions to PDF. Now, keeping in mind that e-ink devices have 3 main screen sizes, this means the author would have to make a 5-inch version, a 6-inch version, and, if they were really nice, an 8-inch version too. This is a lot of work for any one person to do and is not practical at all.
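
As a rough illustration of what "one version per screen size" means, the sketch below works out a page size for each diagonal. It assumes the page should simply match the screen and that the screens have a 3:4 aspect ratio; both are assumptions made for this example rather than anything taken from the linked directions, and real devices lose some of that area to menus and status bars.

```python
import math


def page_size_for_diagonal(diagonal_in, aspect=(3, 4)):
    """Return (width, height) in inches for a screen with the given diagonal.

    The 3:4 aspect ratio is an assumption; pass a different one if needed.
    """
    aspect_w, aspect_h = aspect
    unit = diagonal_in / math.hypot(aspect_w, aspect_h)
    return aspect_w * unit, aspect_h * unit


for diagonal in (5, 6, 8):
    width, height = page_size_for_diagonal(diagonal)
    print(f"{diagonal}-inch screen: page of about {width:.1f} x {height:.1f} inches")
```

Each of those page sizes needs its own document layout and its own PDF export, which is where the extra work comes from.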

This is why mobipocket and epub are the two main formats people use to read on their ebook devices. Mobipocket and epub are like HTML for browsers: they reflow, so the paragraphs look good at any font size. The images fit to the screen, and the formats work on anything from 3-inch screens up to 22-inch computer monitors.

Having said all that, there are times when PDFs do have an advantage over epub and mobipocket. Someone brought up a good point in my last post: some character encodings for words with diacritics do not translate well into epub or mobipocket. There are ways to embed fonts onto ebook devices so the text can be read properly, but let's be serious: the average internet user is not going to have time to mess with it, and it's not as simple as hitting a few buttons. I'm hoping future devices will have more language support for characters outside the English language.

However, for the English-language fan fiction fandom, PDF is not my first choice of format for reading on portable devices.

It's great for the readers who use them that authors are now more willing to share PDFs of their stories. However, as a reader who reads on a Sony PRS-505 and knows the limitations of the PDF format, I doubt I'll ever download a PDF to read on my device or on future e-ink devices.
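
The difference between reflowing and magnifying a fixed layout can be imitated with Python's standard `textwrap` module. This is only a toy model: widths are counted in characters, the sample paragraph and both function names are invented for the example, and no real device works exactly this way.

```python
import textwrap

PARAGRAPH = (
    "Sentence one goes here and then keeps going until it reaches the end "
    "of the line. A second sentence follows it, and in a reflowable format "
    "the paragraph looks fine at any size."
)


def reflow(text, screen_width):
    """Reflowable format: wrap the whole paragraph again for the current width."""
    return textwrap.wrap(text, width=screen_width)


def fixed_layout_zoomed(text, page_width, screen_width):
    """Fixed layout: line breaks were chosen for page_width and are kept.

    When zoomed so that only screen_width characters fit, each original
    line is wrapped on its own, which leaves short leftover lines.
    """
    lines = []
    for original_line in textwrap.wrap(text, width=page_width):
        lines.extend(textwrap.wrap(original_line, width=screen_width))
    return lines


print("Reflowed:")
print("\n".join(reflow(PARAGRAPH, 30)))
print()
print("Fixed layout, magnified:")
print("\n".join(fixed_layout_zoomed(PARAGRAPH, 70, 30)))
```

The second listing contains the same words, but every line break from the original page survives, so short stubs appear in the middle of the paragraph, much like the broken paragraphs on a magnified PDF.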

This post is partly in response to something I'd read a few weeks ago at the spnanonmeme. Someone said that I supported or encouraged authors to share PDF-formatted stories for the SPN BB. I just wanted to make it very clear that I do not support PDF and I will never ask an author for a PDF version of their story. I don't actually think I have that much influence on how authors share their works. Otherwise there would be a lot more single-file HTML versions out on the web.

I don't recommend PDF for anyone reading on portable devices (unless it's been specifically formatted for the screen size of the device you're using). The single-file download options I do prefer are HTML or a Word document. I appreciate all authors who provide either of these with their long stories that are posted to LJ.

Adobe has made some changes in the new Adobe Acrobat software to make creating PDFs more ebook friendly like adding tags to the PDF. I don't know how to do this personally. I'm not sure if anyone else even knows what I'm talking about except for people who work with PDFs professionally.

But the tagging does make it easier for PDFs to reflow to fit a screen. The problem is that these options are not known to the majority of fan fiction writers who make PDFs. I'm willing to bet, based on the metadata I've seen on PDFs, that most authors use Print to PDF to create their PDFs.

You also need the paid version of Adobe Acrobat to edit metadata in PDFs. This is probably why most PDFs show the weirdest authors and titles when they are loaded onto my Sony PRS. Most authors do not know that the file name of their document is generally inserted into the title field and that their MS Word author name is inserted into the author field (this sometimes shows the author's real name). People reading PDFs on their computers would not see this unless they go to Document Properties. But portable devices use the PDF metadata to sort and list the ebooks.
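
For anyone curious what a device might be reading, here is a very rough Python sketch that looks for the Title and Author entries in a PDF's raw bytes. It is a simplification under loud assumptions: it only finds values stored as plain, uncompressed literal strings, it does not decode escape sequences, and it will miss encrypted files, hex-encoded strings, and metadata stored only in other forms. The sample bytes and the function name are made up for the example; a proper PDF library would be the reliable way to do this.

```python
import re


def pdf_info_field(raw, key):
    """Return the literal-string value stored after /key, or None if not found.

    ASSUMPTION: the value is an uncompressed literal string such as
    /Author (Some Name). Anything more complex is not handled here.
    """
    pattern = rb"/" + re.escape(key.encode("ascii")) + rb"\s*\(((?:\\.|[^\\()])*)\)"
    match = re.search(pattern, raw)
    if match is None:
        return None
    return match.group(1).decode("latin-1")


# Hypothetical stand-in for the bytes of a file made with Print to PDF.
sample = (
    b"<< /Title (Microsoft Word - chapter1_final.doc) "
    b"/Author (Jane Q. Realname) /Producer (Some PDF Printer) >>"
)

print("Title: ", pdf_info_field(sample, "Title"))
print("Author:", pdf_info_field(sample, "Author"))
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them in. A title that looks like a file name, or an author field showing a real name, is exactly the kind of thing worth checking before sharing a PDF.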

PDFs are not going away because there is still an audience for them, but I wanted this post to make clear how they work on portable devices, which a growing number of fans are buying.

For those who are interested in reading about ebook file formats, I highly recommend [personal profile] elf's Ebook Formats guide.

For my next ebook-related post I was hoping to cover the topic of quality control by sharing all the mistakes I've made when making ebooks. I'm hoping that sharing my experiences will give others creating ebooks an idea of what to watch out for when they create their own ebook versions of fan fiction.

Tagging

Date: 2010-08-29 04:16 pm (UTC)
elf: Quote: She is too fond of books, and it has turned her brain (Fond of Books)
From: [personal profile] elf
...adding tags to the PDF. I don't know how to do this personally.

Conversion--not printing, but conversion--from Word 2003 or later will automatically tag the PDF. People who have Acrobat Pro (or possibly Acrobat Standard) instead of the free reader program can use the "Advanced--> Accessibility--> Add tags to document" feature, which works on most documents but occasionally fails due to bizarre font encodings. (There are some professional ebooks I can't tag.)

Books converted from InDesign are not automatically tagged; I don't know if this is an available option. Books converted by third-party software (PDFWriter and such) are almost never tagged.

Manual tagging is possible, but nightmarish. I say this as a person who loves line-by-line proofreading. It's like line-by-line proofreading, with an annoying UI and complex program options that aren't described anywhere. Oh, and if you do too many things without saving, Acrobat will crash & lose all your work. (Acrobat's instructions about tagging are "here's the dropdown; click 'yes' to continue.")

Tagging has two purposes:
1) If it works well, it allows much better reflow; it avoids those broken-line problems. (Often does not work that way for double-spaced docs; the auto-tag reads each line as a separate paragraph, and manual fixing is, erm, nightmarish. Would have to be done for every single line in the book.)

2) Allowing read-aloud programs to read the text properly. Again, it helps if the auto-tagging is done right, but the "each line is a paragraph" thing is probably less disruptive to this function than to reflow.

Purpose #2 is fairly irrelevant for novels (I believe the read-aloud programs will work on untagged documents; they just aren't as clear about things like chapter breaks); it can be important for charts & tables that need to be read in the right order. Also, tagging allows you to add alt text to images.

Re: Tagging

Date: 2010-08-29 07:24 pm (UTC)
elf: Quote: She is too fond of books, and it has turned her brain (Fond of Books)
From: [personal profile] elf
I think there are programs that will convert to PDF w/o Acrobat Pro. (OpenOffice, maybe?) I don't know the details 'cos I work with Acrobat. There are other programs that will print to PDF for free--things like CutePDF and other small programs, and I think the new Office monstrosity has all sorts of embedded PDF bits in it. (I'm told Macs have auto-conversion built in to their office programs. I don't speak Mac.)

Feel free to add any parts of my explanation that would help. :)

I used to work for a company that was all gung-ho on PDF tagging because about 8 years ago, accessibility standards for gov't documents changed, and they were all required to be accessible to screen readers; the company thought it'd get in on the ground floor of making accessible PDFs. It didn't work out that way--the tech is too weird & obscure & nonstandardized; most gov't documents just switched to "searchable, auto-tagged PDF" and completely ignored how *mangled* that was for anything based on scans.

Metadata

Date: 2010-08-29 04:20 pm (UTC)
elf: Rainbow sparkly fairy (Default)
From: [personal profile] elf
There are free programs that will edit PDF metadata. Also, any program that'll convert directly to PDF can have its own metadata edited, so the converted file has the correct info. (In Word, this is under "File--> Properties.")

Most people just don't know these options exist.

(Someday, I will write snarky RPF about m/m publishing houses, based entirely on whose name shows up as the "author" of their books.)

Re: Metadata

Date: 2010-08-29 07:20 pm (UTC)
elf: Quote: She is too fond of books, and it has turned her brain (Fond of Books)
From: [personal profile] elf
CutePDF is a print program, not a converter; it doesn't deal with metadata. (Had no idea it dropped existing metadata, but I'm not surprised.)

My current software quest: A portable PDF printer driver, so friends can convert web to PDF at work, where they're not allowed to install anything. (I'm told by geekfriends this may not be possible; drivers are apparently more touchy than that.)

http://www.softpedia.com/get/Office-tools/PDF/BeCyPDFMetaEdit.shtml is the program I wave around for PDF metadata editing; it also will remove metadata from some PDFs that Acrobat won't. So far this has been some Wowio books, and other locked things with weird embedded coding.

Re: Metadata

Date: 2010-08-29 07:37 pm (UTC)
elf: Rainbow sparkly fairy (Default)
From: [personal profile] elf
I keep thinking about putting together either a blog or just a series of "PDF info" posts. I'm mostly not sure what to include.

The main useful PDF function I haven't seen in free (or cheap) software is bookmark editing. There's some that'll remove bookmarks (which is brainless; printing to a new PDF will do that), and there's cheap programs that will split a big PDF by bookmarks; extracting a list of bookmarks and editing current ones both seem to require either Acro Pro or one of the *expensive* alternates. (Foxit Pro, maybe? I haven't worked with that one.)

Date: 2010-08-30 03:45 am (UTC)
yourlibrarian: Angel and Lindsey (SPN-ImInUrSPN-estarmuerta)
From: [personal profile] yourlibrarian
So everything gets shrunk so all of page 1 of a 8x11 document will fit on a 4x6 screen this gives the user super tiny text that's nearly impossible to read as well as a very wide margin.

Ah, so this is what happened when my PDFs failed to convert in Stanza. In my first try I had done a batch conversion and I suspect that a problem with one file led to a problem with almost all of them, and unusable transfers.

Most authors do not know that the file name of their document is generally inserted into the title and that their MS Word Author names are inserted into the author section (this sometimes shows the author's real name).

Another "Aha" there. I'd been wondering.

Date: 2010-09-23 03:43 pm (UTC)
potted_music: (Default)
From: [personal profile] potted_music
I second everything you said. Of course, copying the text from PDF to Word is always an option, but that can be done as easily from any website, and so defeats the purpose of providing a handy one-file version of the fic :-|

Date: 2010-09-26 10:45 am (UTC)
thismaz: (Default)
From: [personal profile] thismaz
yet people seem to want pdf. Personally, I prefer rtf, but for the graphics issue. I make both A4 and A5 versions of my stories, in an effort to make them more small screen friendly, but the A4 gets downloaded more than the A5.

Thank you for the link to elf's post. I will look into those alternatives.