Technical facts & references

What’s involved in the technology behind caption and subtitle fonts? In this section, we’ll cover:

character sets and encoding
transmission and display technologies
psychology of reading

There’s also a bibliography.

Character sets and encoding

We don’t have a Unicode character encoding for captioning or subtitling. We don’t even have encodings for the existing character sets used by specific platforms, like Line 21 closed captioning or teletext.

In theory, there’s an easy way to fix the problem – just hunt around in the existing encodings for all the characters we’re using. But that puts the cart before the horse. Nobody uses well-encoded characters because there is no easy encoding to use.

For example, here are a few characters found in Line 21 that you wouldn’t necessarily expect:

Character	Issue
♪	Most troublesome of all, the staffnote character (actually an eighth note, `U+266A EIGHTH NOTE`), used to denote music and lyrics. It can be found all the way back to MS-DOS encodings, but continually bedevils captioning software and displays. This character is not the related `♫` `U+266B BEAMED EIGHTH NOTES`, which is often substituted in error.
¼ ¾ ½	Troublesome characters because ¼ ¾ were replaced by ® and ™ in 1992, meaning that old decoders display new captions differently from what was intended, and new decoders do the same to old captions.
°
÷
¿	The inverted question mark is available, but we’ve never had an inverted exclamation point, `¡`. Captioners have used a lower-case `i`, which is inadequate and simply wrong.
" '	All we’ve got are neutral double and single quotation marks. But in typical fonts, the neutral single quotation mark is actually a curled apostrophe `’`; some neutral double quotation marks are actually double closing quotation marks `”`
¢ £
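
The erroneous substitutions described in the table can be undone before a file reaches the encoder. What follows is a minimal sketch, not part of any captioning product: the table name `LINE21_REPAIRS` and the function `repair_for_line21` are hypothetical, and the sketch assumes that a caption file should carry the neutral quotation marks and the single eighth note that Line 21 actually provides.

```python
# Hypothetical repair table: characters often typed or substituted in error
# (keys) and the characters Line 21 actually provides (values).
LINE21_REPAIRS = {
    "\u266B": "\u266A",  # BEAMED EIGHTH NOTES -> EIGHTH NOTE
    "\u2018": "'",       # left single quotation mark -> neutral single
    "\u2019": "'",       # right single quotation mark (curled apostrophe) -> neutral single
    "\u201C": '"',       # left double quotation mark -> neutral double
    "\u201D": '"',       # right double quotation mark -> neutral double
}

# str.maketrans accepts a dict whose keys are single characters.
_REPAIR_TABLE = str.maketrans(LINE21_REPAIRS)


def repair_for_line21(text):
    """Return text with the known erroneous substitutions reversed."""
    return text.translate(_REPAIR_TABLE)
```

For instance, `repair_for_line21("♫ Don’t stop ♫")` yields a string that uses `U+266A` and a neutral apostrophe. How a decoder's font then draws those neutral marks is a separate matter that this sketch cannot address.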

And here is the full range of accented characters found in the EIA-608 specification for Line 21 captions:

á é í ó ú ç ñ Ñ à è â ê î ô û

You can add a few more if you use the optional extended character set, whose real-world support is unknown:

Á É Ó Ú Ü ü À Â Ç È Ê Ë ë Î Ï ï Ô Ù ù Û Ã ã Í Ì ì Ò ò Õ õ Ä ä Ö ö Å å Ø ø

(That extended character set also solves the ¡ problem. See, for example, pp. 65–66 and 312–313 of Gary Robson’s Closed Captioning Handbook, whose elucidations about the needs of specific languages are often incomplete when not incorrect.)

To put all these characters together, we have to comb through three Unicode blocks – Basic Latin, Miscellaneous Symbols, and Latin-1 Supplement.
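
To see how the characters scatter, here is a small sketch that reports the code point, Unicode name, and range for each character quoted in this section. The range boundaries are the published ones for Basic Latin, Latin-1 Supplement, and Miscellaneous Symbols; the names `BLOCKS`, `SAMPLE`, `block_of`, `describe`, and `group_by_block` are made up for illustration, and `SAMPLE` holds only the characters shown above, not the complete EIA-608 repertoire.

```python
import unicodedata

# Code point ranges of the three Unicode ranges named above.
BLOCKS = {
    "Basic Latin": range(0x0000, 0x0080),
    "Latin-1 Supplement": range(0x0080, 0x0100),
    "Miscellaneous Symbols": range(0x2600, 0x2700),
}

# Characters quoted in this section (an illustrative sample only).
SAMPLE = "♪¼¾½°÷¿\"'¢£áéíóúçñÑàèâêîôû"


def block_of(ch):
    """Return the name of the range containing ch, or "other"."""
    cp = ord(ch)
    for name, code_points in BLOCKS.items():
        if cp in code_points:
            return name
    return "other"


def describe(ch):
    """Format one character as 'U+XXXX NAME [range]'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{block_of(ch)}]"


def group_by_block(chars):
    """Map each range name to the list of characters that fall in it."""
    groups = {}
    for ch in chars:
        groups.setdefault(block_of(ch), []).append(ch)
    return groups


if __name__ == "__main__":
    for ch in SAMPLE:
        print(describe(ch))
```

Grouping `SAMPLE` with `group_by_block` puts the eighth note alone in Miscellaneous Symbols, the two neutral quotation marks in Basic Latin, and everything else in Latin-1 Supplement.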

A single Unicode encoding for captions and subtitles would ensure that manufacturers at least get the characters right and would, more importantly, facilitate file transfer. Moving a Line 21 caption file to certain kinds of DVD subpictures, to PAL teletext, to DVB bitmaps, and to Rear Window® cinema captioning is generally troublesome, given that some characters are simply unavailable (e.g., accented letters in Rear Window) or are mapped to different characters (e.g., ♪ in Line 21 maps to # in teletext).
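
The two transfer problems mentioned above, remapped characters and unavailable characters, can be sketched as follows. This is an illustration under stated assumptions rather than a real converter: `TELETEXT_SUBSTITUTIONS`, `to_teletext`, `is_accented`, and `rear_window_problems` are hypothetical names, the teletext table encodes only the single ♪ to # mapping quoted above, and "accented letter" is approximated as any character that Unicode decomposes into a base character plus combining marks.

```python
import unicodedata

# Hypothetical substitution table. Only the one mapping quoted in the text
# is encoded; a real teletext conversion would need many more entries.
TELETEXT_SUBSTITUTIONS = {"\u266A": "#"}


def to_teletext(text):
    """Apply the known character remappings for a teletext target."""
    return "".join(TELETEXT_SUBSTITUTIONS.get(ch, ch) for ch in text)


def is_accented(ch):
    """True if ch decomposes into a base character plus combining marks.

    Assumption: this stands in for "accented letter". Letters such as
    Ø have no canonical decomposition and are not detected.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return len(decomposed) > 1 and any(
        unicodedata.combining(c) for c in decomposed[1:]
    )


def rear_window_problems(text):
    """List (index, character) pairs that the target cannot display."""
    return [(i, ch) for i, ch in enumerate(text) if is_accented(ch)]
```

Reporting the unavailable characters, instead of silently stripping the accents, leaves the decision about what to do with them to a human editor.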

A few years ago, manufacturers were asked to assist in developing a Unicode encoding, but ignored the request. We’ll be working on this one ourselves.

Preferred viewing distance

We managed to locate a reference to a standard measurement of preferred television viewing distance, and we’ve separately quoted its measurements and graph.