Xkcd, Kerning, Full-Width Justification, Kerning, and Light Sabers

Between its strips on kerning, dates, and diacritics, the webcomic xkcd has shown itself to be adept at poking fun at bad formatting.

Today the webcomic turned its attention to kerning again, this time focusing on the unsolved problem of full-width justification.

The bane of ebook readers and developers alike, full-width justification is a digital holdover of a print formatting decision. It’s intended to make a body of text look prettier by giving it nice and even sides, but that comes at a price.

Justification has to be handled automatically in ebooks and on the web, and that leads to one of several compromises.

As you can see at right, a rendering engine can either leave gaps at the end of the line, insert space between words and characters, or hyphenate the words which span two lines (a common web solution which the Kindle platform is still working to adopt).

And to make things more complicated, when it comes to ebooks we also have some publishers setting the justification by CSS fiat, choosing to either force full justification or left justification (aka ragged right).

While the comic does mention stretching, filler, and snakes, those aren’t real options for text formatting (or at least not yet). And curiously, even though today is 4 May there is no mention of light sabers as a filler option – just snakes.

None of the real options is ideal, but they all have their advocates in the ebook realm. I, for one, prefer a ragged right when I am reading ebooks because stretching words to fill the space looks like crap, and hyphenation is not available in every app.

How do you like to format text in an ebook, when you get to make a choice?

images by xkcd, hans.gerwitz

Comments


Anne May 4, 2016 at 4:33 pm

I like full justification and hyphenation. I often use Calibre and various plugins to achieve this.

Nate Hoffelder May 4, 2016 at 4:43 pm

When hyphenation is done right, it’s great. But it frequently doesn’t work for me.


Mike Hall May 4, 2016 at 5:47 pm

I’ve read a number of articles by typographical purists who are tearing their hair out about the appearance of e-books (mostly, if I recall correctly, about the Kindle).

However, though I find their technical complaints quite interesting, I also find that I don’t actually care, any more than I care that my choice of fonts is limited. I’m perfectly happy with full width justification and no hyphenation and can live with the very occasional infelicities in the appearance of the text; it simply does not have any impact on my reading.

I dislike ragged-right in print and suspect that I’d dislike it in e-books, though I don’t recall seeing it in anything other than PDFs. Having said that, I think that it would probably be a good idea for the Kindle display to switch to left justified when the selected text size is very large.


Muratcan Simsek May 5, 2016 at 1:25 am

Hyphenation is not a feature, it has been a part of the printed word ever since the first one. The Gutenberg Bible has hyphenation. Older scholastics used it in their hand-written books. It is an inherent part of Latin writing. It was here since the very beginning.

This is why many people (including myself) were so offended by the Kindles. It wasn’t lacking hyphenation as a feature, it was crippled. Because, as I said, hyphenation is as much a feature as punctuation marks are.

So is kerning, actually. There was kerning in Gutenberg.

If you consider these topics as I see them, I am sure you will understand the problem many of us are facing. Kerning, hyphenation… these are not features you can have or not have. They are parts of printing, they always were. No one had even discussed a book without hyphenation or kerning before Kindles and the internet. Adobe based readers also had them back then, Adobe was always good about hyphenation anyway. When hyphenation and kerning became available on the web, it was not so much a new feature as a much needed fix.

Now, and please don’t be offended, I even see people saying they can’t read books with hyphenations. I am completely dumbfounded here. I mean, did you ever read a printed book before? The only books I have without hyphenation are the books my mother’s 5-6 year-old students use to learn how to read. I admit that hyphenation confuses them for a while. But even they have kerning.

Mike Hall May 5, 2016 at 5:37 am

It is of course true that hyphenation and kerning have always been part of the “printed word” and I understand that you are offended by their exclusion from the Kindle (other than, of course, in PDFs). I guess that this exclusion was a pragmatic choice of the software developers faced with the need to display text in a variety of fonts and sizes.

Without in any way intending to be offensive I do find your view that this “crippled” the Kindle rather absurd. As I said in my earlier comment, this is something I simply cannot bring myself to care about and the Kindle’s success suggests to me that my attitude is a very common one. I would have no problem were hyphenation implemented for the Kindle – as long as it did not noticeably slow down page turns – but it would be right at the bottom of my priority list and would have zero impact on any purchase decision I make.

I think this is something about which we will simply have to agree to disagree. You will presumably avoid the Kindle because you cannot stand the typography but I suspect that you are part of a very small minority that actually worries about this.


Budding Typographer May 5, 2016 at 3:19 am

The problem with many Justification Algorithms is that they are rendering with only a partial subset of the typographical toolbag (and implementing many of these poorly).

Most of the ereaders + browsers + word processors only include the very basics (Word Spacing, Hyphenation, Kerning, Ligatures). Typically these are implemented poorly as well (more details below).

Part of the problem is that many of the Justification Algorithms were also implemented for speed + memory/processor constraints—not for proper typography. Ereaders may have very weak processors, very little RAM, and readers may not want tens/hundreds of milliseconds of delay before the page renders (so the algorithms tend towards the real-time implementations).

Also, instead of tackling the paragraph/page as a whole, as more advanced typographical programs do, the devices handle justification on a line-by-line basis. (Some of this can be seen in the examples below.)

———-

These are the basics that are typically implemented in ereaders + browsers + word processors:

—–

Word Spacing

For example, the Justification Algorithm in Microsoft Word (and in many ereaders) only stretches the Word Spacing (the space that is between two words). The narrower the column/width of text, the more this problem stands out. On smaller devices such as cell phones + ereaders, the spaces between words become HUGE.

One of the better examples out there is this comparison between the justification algorithms of Word, InDesign, and LaTeX:

https://tex.stackexchange.com/questions/110133/visual-comparison-between-latex-and-word-output-hyphenation-typesetting-ligat

The huge spaces can be somewhat helped when Hyphenation is turned on, but many older ereaders do not have it, and many people do not enable it in word processors (automatic hyphenation is off by default).
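
To make the word-spacing problem concrete, here is a minimal sketch of justification that only widens the gaps between words. It counts monospaced character cells rather than real glyph widths, and the function name `justify_line` is made up for illustration; it is not how Word or any particular ereader implements it.

```python
def justify_line(words, width):
    """Pad a line to `width` cells by widening only the gaps between words."""
    if len(words) == 1:
        # A single word has no gaps to stretch, so it stays ragged.
        return words[0].ljust(width)
    gaps = len(words) - 1
    total_spaces = width - sum(len(word) for word in words)
    if total_spaces < gaps:
        raise ValueError("words do not fit in the requested width")
    # Spread the spaces evenly; the leftmost gaps absorb any remainder.
    base, extra = divmod(total_spaces, gaps)
    parts = []
    for index, word in enumerate(words[:-1]):
        parts.append(word)
        parts.append(" " * (base + (1 if index < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


print(repr(justify_line(["full", "width", "text"], 20)))
print(repr(justify_line(["typographical", "purists"], 28)))
```

With only two long words on a narrow line, the single gap has to absorb all eight spare cells, which is exactly the "HUGE spaces" effect on small screens.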

—–

Hyphenation Algorithms

Hyphenation Algorithms are another problem in ereaders—many of the algorithms used are very subpar, and will break words at the wrong positions.

Example: "the-rapist".

The devices will typically involve very dumb algorithms that follow rules along the lines of:

You can break after first two characters + before the last two characters + don’t break words that are 5 characters or less.

Proper Hyphenation would break on syllables (in US English) or roots of words (in UK English).
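
As a rough sketch of the dumb rule described above (the function names and default parameters are assumptions for illustration, not any device's actual code), every position that leaves at least two characters on each side counts as a legal break:

```python
def naive_break_points(word, min_left=2, min_right=2, min_length=6):
    """Return every index where the naive rule would allow a hyphen."""
    if len(word) < min_length:
        # Words of 5 characters or less are never broken.
        return []
    return list(range(min_left, len(word) - min_right + 1))


def split_at(word, index):
    """Show what the word looks like when hyphenated at `index`."""
    return word[:index] + "-" + word[index:]


points = naive_break_points("therapist")
print([split_at("therapist", i) for i in points])
```

Because the rule knows nothing about syllables or word roots, "the-rapist" is just as acceptable to it as "thera-pist".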

More can be read here, in an article called "On Hyphenation – Anarchy of Pedantry":

https://web.archive.org/web/20150621232031/http://www.melbpc.org.au/pcupdate/9100/9112article4.htm

and here:

https://en.wikipedia.org/wiki/Hyphenation_algorithm

—–

Kerning

https://en.wikipedia.org/wiki/Kerning

This is the adjustment of the space between specific pairs of letters.

Example: "AV" will be squeezed closer together.

https://en.wikipedia.org/wiki/Kerning#/media/File:Kerning_EN.svg

This is typically implemented in the fonts.

—–

Ligatures

https://en.wikipedia.org/wiki/Typographic_ligature

Example: "ff" + "ffi" + "fi" + "fl" + "Th"

These are typically implemented at the font level.

———-

There are quite a few other microtypography tools that are not implemented in ereaders + browsers + word processors:

—–

Tracking (also known as Letter Spacing)

https://en.wikipedia.org/wiki/Tracking_(typography)

This is the space between letters as a whole (not just pairs like Kerning).

—–

Shrinking/Stretching

This is where you can shrink/stretch the characters themselves.

Example: The width of an 'a' becomes slightly smaller.

Typically characters can be stretched/shrunk by ~2%. This tiny adjustment, added up over an entire page, can lead to much better spacing + fewer hyphens.

———-

There are quite a few more microtypographical changes that could be done (Protrusion, for example).

Many of these can be read about here:

https://en.wikipedia.org/wiki/Microtypography

and some other examples can be read about + shown here:

http://www.khirevich.com/latex/microtype/

Name (Required) May 5, 2016 at 10:28 am

I have to disagree with you.
Current readers do have limited processor and memory resources, but a 486 was able to do proper typography in InDesign or LaTeX on a processor that was an order of magnitude slower and simpler. A 486 PC of its time had something like a 50MHz processor and 32MB RAM; a typical reader nowadays has a 1000MHz processor and 512MB RAM. Not to mention the difference between an ATA hard disk and the flash storage on a reader.

Name (Required) May 5, 2016 at 10:41 am

… one more thing.
You do not have to render a page in milliseconds. While the user reads page 55 you have LOTS of time to render page 56, and you also cache previously rendered pages just in case they want to look at the previous page.
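
A minimal sketch of that idea, assuming a hypothetical `render` callable that turns a page number into a rendered page. The class name `PageCache` and its capacity are made up for illustration, and a real reader would do the look-ahead in the background rather than inline:

```python
from collections import OrderedDict


class PageCache:
    """Keep a few rendered pages around and pre-render the next one."""

    def __init__(self, render, capacity=3):
        self._render = render          # hypothetical page renderer
        self._capacity = capacity      # how many rendered pages to keep
        self._pages = OrderedDict()    # page number -> rendered page

    def _load(self, number):
        if number in self._pages:
            # Cache hit: mark the page as most recently used.
            self._pages.move_to_end(number)
        else:
            self._pages[number] = self._render(number)
            if len(self._pages) > self._capacity:
                # Evict the least recently used page.
                self._pages.popitem(last=False)
        return self._pages[number]

    def show(self, number):
        page = self._load(number)
        self._load(number + 1)  # look ahead while the user is reading
        return page


rendered = []
cache = PageCache(lambda n: rendered.append(n) or f"page {n}")
cache.show(55)
cache.show(56)
cache.show(55)   # going back costs no extra rendering
print(rendered)
```

In this sketch the third call is served entirely from the cache, so the slow renderer only runs once per page.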

Budding Typographer May 5, 2016 at 5:47 pm

"You do not have to render a page in milliseconds. While the user reads page 55 you have LOTS of time to render page 56, and you also cache previously rendered pages just in case they want to look at previous page."

While something like a Fiction book may be read linearly (you read it from front -> back), not all books are made that way—Non-Fiction in particular.

You may click on a link (footnote/endnote) and jump to a completely different position in the book. You will have to wait hundreds of milliseconds for that to load + render. Then the same sluggishness while going backwards.

Typically these devices may load an entire HTML chapter at a time (they don’t load the entire book due to memory constraints). Sure, a person might not mind if they make it to the Endnotes chapter and wait a while as it renders, but what happens when they jump back to the text?

Those delays add up, and get very annoying very fast.

—–

You also don’t want to cache too far ahead, because the cache can easily become garbage (wasting precious CPU cycles). (See html5rocks link below).

Then what happens when you click a link back? The exact paragraph/sentence you were linked to will probably appear at the very top of the page (meaning everything before/after that point has to be rerendered as well).

—-

Semi-related to this discussion is also Reflow:

https://developers.google.com/speed/articles/reflow#guidelines

This is where web pages feel very sluggish because the page is recalculating the layout (text now is dependent on text before, and text after is dependent on the text now).

Do you wait and calculate the entire page FIRST? No one likes staring at a blank/loading screen. Or do you render what you can, and reflow as new calculations come in?

—–

Quite related to rendering ebooks is knowing how a Web Browser works, and all of the steps involved:

http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/

———-

A lot of the problems of typography in ereaders are the ol' "Space-Time Tradeoff" in computing. Take up more memory for lower computing time, or more computing time for lower memory:

https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff

A key thing to remember also is that a lot of these devices run on batteries (so they are trying to minimize the computing time… and proper typography is a HARD + CPU intensive problem).

—–

This isn’t getting into the typographical considerations between different countries/languages either, or multi-language documents.

Take the Hyphenation Algorithms as an example:

English (US) Hyphenation =/= English (UK) Hyphenation =/= German Hyphenation =/= French Hyphenation =/= Greek Hyphenation.

In your reader, you may want to load up Hyphenation dictionaries for each of these languages (a few hundred KBs -> a few MBs each):

https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens#Browser_compatibility

Want to do hyphenation a bit better while cutting down on CPU time? You have to have the Hyphenation Dictionaries sitting in memory. (In reality, a US-made device might only include US/UK English Hyphenation, and any language outside of that… too bad. Or they get the simple algorithm I mentioned in the previous comment, or the crappy Microsoft Word algorithm. :D).

—–

Similar to Hyphenation, you can take Word/Line Wrapping Algorithms as another example:

https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Algorithm

Trying to minimize raggedness is a HARD problem, thus in order to save computing power, the ereaders might take the quick way, which is deemed "good enough", like Word Processors do (Microsoft Word/OpenOffice/LibreOffice). They handle line breaks on a line-by-line basis, and just break as soon as the next word no longer fits.

Taking the easy way out is fast (very little CPU) + simple (easy/quick algorithm), but looks like crap.
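
To illustrate the difference, here is a small sketch comparing the greedy line-by-line approach with a whole-paragraph search. The cost function (squared trailing space on every line except the last) is one common choice and an assumption here, the function names are made up, and this is far simpler than what TeX actually does. It also assumes every word fits within the width.

```python
def greedy_wrap(words, width):
    """Fill each line as far as possible, then move on (line by line)."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def raggedness(lines, width):
    """Sum of squared trailing space on every line except the last."""
    return sum((width - len(" ".join(line))) ** 2 for line in lines[:-1])


def optimal_wrap(words, width):
    """Pick the breaks for the whole paragraph that minimize raggedness."""
    if any(len(word) > width for word in words):
        raise ValueError("every word must fit within the width")
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimum cost to wrap words[i:]
    next_break = [n] * (n + 1)       # where the line starting at word i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                  # cancels the space before the first word
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width:
                break
            # The last line of the paragraph is allowed to be short for free.
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                next_break[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(words[i:next_break[i]])
        i = next_break[i]
    return lines


words = "aaa bb cc ddddd".split()
for wrap in (greedy_wrap, optimal_wrap):
    lines = wrap(words, 6)
    print(wrap.__name__, [" ".join(line) for line in lines], raggedness(lines, 6))
```

The greedy version only ever looks at the current line, while the whole-paragraph version has to consider many combinations of breaks, which is where the extra CPU time goes.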

—–

These ereaders typically aim towards the Real-Time Algorithms, not typographically best ones (which are hard problems. See above: Hyphenation, Line Wrapping, Microtypography, […]).

"Current readers do have limited processor and memory resources, but a 486 was able to do a proper typography in inDesign or LaTeX on a processor that was magnitude slower and simpler. An 486 PC of its time had something like 50MHz processor and 32MB RAM, a typical reader nowadays has 1000MHz processor and 512MB RAM. Not to mention the difference between an ATA hard disk versus FLASH RAM on reader."

Windows 3.1 fit on a few floppies as well!

Today, a web browser is more complicated and larger than the entire OS back then.

—–

On a reading device, you also aren’t just displaying the static result (like a physical book).

The rendering engine has to handle all the fun web stuff:

HTML/CSS + JavaScript + MathML + Unicode + RTL text + links + all the different image types (SVG, PNG, JPG, GIF, […]) + Fonts (OTF, TTF, WOFF, […]) + Tables + changes in fonts/font size/margins/line-spacing + […]

You also need the device to handle all the tangentially related topics such as:

Search/Highlighting/Notes + Text-to-Speech + DRM + […]

Each one of these topics introduces its own problems (and CPU/Memory constraints).


Jim Chapman May 5, 2016 at 4:02 am

The next update of the Freda ebook reader for Windows 10 (now going through pre-release testing) will include snake-justification as an advanced option.

Jim Chapman May 6, 2016 at 4:35 am

The Freda update is live now in the Store at https://www.microsoft.com/store/apps/9wzdncrfj43b

For best results, use the 'settings' screen to switch 'hyphenation' to 'no', 'use snakes' to 'yes', and choose a large font size (33 or so). Then pick a book with long words, and read it in a narrow window.


xkcd über Typographie | schneeschmelze | texte May 5, 2016 at 4:14 am

[…] Via Nate Hoffelder. […]


Name (Required) May 5, 2016 at 10:12 am

On an e-ink reader (*) I personally prefer left justification plus hyphenation.

A typical 6″ e-ink reader has shorter lines than a typical paper book. Plus, it has practically non-existent typography, so there is no effort to make the spaces between words on individual lines more consistent. So even if full justification does look better in a paper book, with careful attention to typography, on an e-ink reader left justification is the lesser of two evils 😉

Please note that the first ever book printed using movable type, the Gutenberg Bible, already had two columns, full justification and even "hanging" hyphens.
Ancient scribes considered it a matter of professional pride to be able to stretch text in such a manner that the handwriting was fully justified. And Gutenberg tried to imitate that.


purple lady May 6, 2016 at 1:31 pm

I need left justified text because otherwise I end up with large spaces between words. I use a fairly large font so the spaces between words can be huge. Why can’t *I* choose how I want my text formatted? This is why I strip DRM and read in an app that lets me format in a way that works for me.

For those who don’t like reading left justified, what about reading this post and comments? It’s all left justified. Most of the web is left justified.

