
Create PDFs from DOC, not DOCX Files

I learned a lesson tonight while trying to submit a script to a production company: PDFs generated from DOC files are much, much smaller than PDFs generated from DOCX files.

Microsoft Word migrated from the familiar ".DOC" format of Word 97-2004 to DOCX with the release of Word 2007/2008 (Windows/OS X). I recall the painful transition from Word 95 to Word 97, but nothing has compared to the nightmare that is the DOCX "Office Open XML" file format. I appreciate the idea of XML-based documents. Unfortunately, Microsoft's DOCX seems to cause a fair amount of pain.

The 101-page script, stored as a DOCX, refused to convert to a compressed and optimized PDF with Acrobat Distiller, Acrobat Pro, or Apple's built-in PDF driver. This left me able to create only an uncompressed PDF. The file was 62 megabytes! A 184-kilobyte document exploded to 62MB… and it couldn't be emailed through our server.

When I saved the document as a DOC file, it grew to 214KB, a bit larger than the DOCX. However, the PDF generated from that DOC file was only 800KB. Not that 800KB is great, but it is much better than megabytes of bloat.

I often tell my students to save documents in DOC format, instead of DOCX, if they intend to email a document. I never considered that the DOC/DOCX differences would affect PDF output.

In trying to "help" the layout, Microsoft's DOCX format includes a lot of redundant font and layout information. Although I didn't have any graphics in my script, the DOCX format also links to higher resolution images than the DOC format supports. I examined the PDF output from Word 2011 (OS X) and discovered nearly 100 font "embed" occurrences. The problem is that Word styles are assigned multiple times, for no apparent reason.
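For anyone who wants to repeat that kind of check, here is a rough Python sketch of how the font "embed" occurrences in a PDF might be counted. It is not the method I used on the script. It simply scans the raw bytes for the standard PDF keys /FontFile, /FontFile2, and /FontFile3 (which point at embedded font programs) and tallies the /BaseFont names. The function name font_report is made up for this example, and the approach assumes an uncompressed PDF like the 62MB one above; fonts tucked inside compressed object streams would be missed.

```python
import re
from collections import Counter

# Keys that reference an embedded font program in a PDF font descriptor.
FONT_FILE_KEY = re.compile(rb"/FontFile[23]?(?![A-Za-z0-9])")
# The font name that follows a /BaseFont key, e.g. /BaseFont /ABCDEF+Courier
BASE_FONT_NAME = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def font_report(pdf_bytes):
    """Return (embed_count, Counter of /BaseFont names) for raw PDF bytes.

    This is a plain byte scan, not a real PDF parser, so it only sees
    objects that are stored uncompressed in the file.
    """
    embed_count = len(FONT_FILE_KEY.findall(pdf_bytes))
    names = Counter(
        match.decode("latin-1") for match in BASE_FONT_NAME.findall(pdf_bytes)
    )
    return embed_count, names
```

To try it, read a PDF in binary mode, pass the bytes to font_report, and print the count along with names.most_common(). If the same base font shows up dozens of times, that points to repeated embedding.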

My script template uses six major paragraph styles. In DOC, HTML, or RTF files, the styles would be defined once, at the top of the document. But that's not the DOCX way.

You might imagine "Character Name" would be a single style that is assigned to every paragraph marking when a character speaks. But, no, Microsoft's DOCX included two dozen "Character Name" styles, each assigned to a varying number of paragraphs. It makes no sense at all to me. During the PDF creation, it seems fonts are embedded repeatedly with the styles. I'd have to do some forensic work to discover what is happening in greater detail.
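As a starting point for that forensic work, here is a minimal Python sketch. A DOCX file is a ZIP archive, with style definitions in word/styles.xml and the paragraphs in word/document.xml. The sketch lists the paragraph styles the file defines and counts how many paragraphs reference each one. The function name style_usage and the "(default)" label are my own inventions for this example, and I have not run it against the script itself.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by word/styles.xml and word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def style_usage(docx_file):
    """Return (defined, used) for a DOCX path or file-like object.

    defined maps each paragraph style ID to its display name.
    used counts how many paragraphs reference each style ID;
    paragraphs with no explicit style are counted under "(default)".
    """
    with zipfile.ZipFile(docx_file) as archive:
        styles_root = ET.fromstring(archive.read("word/styles.xml"))
        document_root = ET.fromstring(archive.read("word/document.xml"))

    defined = {}
    for style in styles_root.iter(W + "style"):
        if style.get(W + "type") != "paragraph":
            continue
        name_el = style.find(W + "name")
        display_name = name_el.get(W + "val") if name_el is not None else None
        defined[style.get(W + "styleId")] = display_name

    used = Counter()
    for paragraph in document_root.iter(W + "p"):
        p_style = paragraph.find(W + "pPr/" + W + "pStyle")
        style_id = p_style.get(W + "val") if p_style is not None else "(default)"
        used[style_id] += 1

    return defined, used
```

Calling style_usage("script.docx") (a hypothetical file name) and printing both results should show whether "Character Name" really exists as many separate definitions, or whether the duplication creeps in later, during PDF creation.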

No matter what the cause, the best way to create a PDF from Word appears to be saving a document as a "DOC" file first.

I get that hard drives are cheap and broadband is fast, but that's no defense for lousy file formats. More is not always better, as Microsoft's bloated file formats constantly demonstrate. Unfortunately, Microsoft's bloat adds to Adobe's bloat.
