Understanding captions & subtitles

The terms captioning and subtitling may be confusing to some readers. Both terms relate to the addition of onscreen text that renders dialogue. But captioning and subtitling are significantly different:

Captions are intended for deaf and hard-of-hearing audiences. The assumed audience for subtitling is hearing people who do not understand the language of dialogue.
Captions move to denote who is speaking; subtitles are almost always set at bottom centre.
Captions can explicitly state the speaker’s name:
1. [MARTIN]
2. >> Announcer:
3. ORIGINAL CAST OF "ANNIE":
Captions notate sound effects and other dramatically significant audio. Subtitles assume you can hear the phone ringing, the footsteps outside the door, or a thunderclap.
Subtitles are usually open (permanent, always visible). Captions are usually closed (selectable; you can turn them on or off). Closed subtitles, however, are now more numerous due to the popularity of DVDs.
Captions are usually in the same language as the audio. Subtitles are usually a translation.
Subtitles also translate onscreen type in another language, e.g., a sign tacked to a door, a computer monitor display, a newspaper headline, or opening credits.
Subtitles never mention the source language. A film with dialogue in multiple languages will feature continuous subtitles that never indicate that the source language has changed. (Or only dialogue in one language will be subtitled.)
Captions tend to actually transcribe and render utterances in a foreign language, or transliterate that dialogue if a different writing system is used, or state the name of the language being spoken.
Captioning aims to render all utterances. Subtitles are selective and do not bother to duplicate some verbal forms, e.g., proper names uttered in isolation (“Jacques!”), words repeated (“Help! Help! Help!”), song lyrics, phrases or utterances in the target language, or phrases the worldly hearing audience is expected to know (“Danke schön”).
Captions render tone and manner of voice where necessary:
1. ( whispering )
2. [BRITISH ACCENT]
3. [ Vincent, Narrating ]
4. (Sings like Elvis)
A subtitled program can be captioned (subtitles first, captions later). Captioned programs aren’t subtitled after captioning.

In U.K. English, subtitling is used to mean both captioning and subtitling. Canadian, American, New Zealand, and Australian English do not make that mistake, and neither will we here.

Current state of affairs

What’s the prevailing attitude?

So far, the responses to our queries about captioning and subtitling fonts can be paraphrased as follows:

“We already have Tiresias.” Tiresias isn’t a tested screenfont family, among its many other limitations.
“We just use Arial.” Essentially, you didn’t make a choice, presumably because you don’t know how (and because you really don’t have any provably better choice). Arial is a sorry excuse for a grotesk, and grotesks are ill-suited to captions and subtitles due to confusable character shapes.
“The print fonts we’re selling are just fine with our customers.” Since this is essentially the same argument as “Comic Sans is just fine with our customers,” it’s difficult to take it seriously.

We now have nearly thirty years of experience with custom-engineered typefaces for specific applications – from Bell Centennial to Verdana to Clearview, with many stops in between. Whenever we hear type experts attempting to make the case that any old font will do for captioning and subtitling, we want to ask “Well, then why did we need Bell Centennial, Verdana, and Clearview?”

Indeed, what we really hear is “I don’t even like this subtitling business. Why should I care what fonts they use?” You don’t have to care, but we do. We can learn more about how screenfonts for captions and subtitles differ from print fonts and from other kinds of screenfonts.

Isn’t Tiresias good enough?

No.

The RNIB/Bitstream typeface Tiresias Screenfont has received attention as a claimed solution to every “subtitling” problem, but the typeface has a host of design flaws. It lacks even an italic, and Tiresias has not been fully tested with representative audiences (deaf/hard-of-hearing, visually-impaired, and hearing).

In the field of online captioning, no custom-made and tested screenfonts exist. (Type designer David Berlow, however, tells us that his Charcoal font for Apple was designed in part for such usage.)

Issues in screenfonts for captions and subtitles

Captions (and subtitles) are typographic forms with a unique set of properties.

They are transcriptions or translations of spoken words. Some viewers – even many viewers with hearing impairments – will read the captions and listen to the audio, making captions a multimodal form.
Captions move (as by scrolling, crawling, or painting on) and appear and disappear. Though viewers can often record a production and watch it again, in broad canonical terms you’ve got one chance to read a caption before it disappears.
Viewing conditions include both reflected and emitted light – reflected on movie screens and projections, emitted on TV and computers.
Technology affects resolution and colour: First-run feature films can do more than an old NTSC television set can do, for example.
Viewing conditions can be poor. People can sit too far from their televisions, or watch TV lying down, or use smudged and scratched reflectors in Rear Window® installations.
Some viewers can be expected to have multiple impairments. Even very slight blurry vision (as caused by an outdated eyeglass prescription) can severely diminish caption reading (Thorn & Thorn 1996).
Captioners and developers pay no especial attention to typography.

Legibility and readability

All the problems of legibility and readability we are familiar with in onscreen typography are actually worse in captioning and subtitling.

Television

Unlike computer screens, analogue television sets have low resolution and poor colour (especially in NTSC, an acronym that Europeans like to say stands for “never the same colour”). Scanlines blur into each other.
People are much more likely to watch an old TV than use an old computer monitor.
Viewing distances for TV are much greater than for computers. Even if a distance of three times the picture height is considered optimal, some tests show that viewers typically sit seven or eight picture heights away. At that distance, fonts appear small.
By definition, captions and subtitles move. If nothing else, one title is replaced by another. If you’re watching broadcast TV, you have one chance to read the title before it disappears forever.
Closed captioning (captioning sent along with the signal; you can turn it on or off) is usually done through fonts built into the TV set, which are generally poor.
You can be expected to follow multiple streams of text at once.
Subtitled programs can also have captions. (Those who errantly make no distinction between subtitles and captions pretend that’s impossible by definition.)
Subtitles and captions either cover up other text (usually, that’s a mistake) or stay out of its way. Some of that text itself is in motion, as with crawls on news networks.
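
The viewing-distance point above can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: it computes the visual angle a caption row subtends at three, seven, and eight picture heights. The row height of one-fifteenth of the picture is our assumption for illustration only, and the function name is hypothetical.

```python
import math


def visual_angle_arcmin(size: float, distance: float) -> float:
    """Angle subtended by an object, in minutes of arc.

    Both arguments use the same unit; here, multiples of picture height.
    """
    radians = 2 * math.atan(size / (2 * distance))
    return math.degrees(radians) * 60


# Assumption for illustration: one caption row is 1/15 of the picture height.
ROW_HEIGHT = 1 / 15

for distance in (3, 7, 8):
    angle = visual_angle_arcmin(ROW_HEIGHT, distance)
    print(f"{distance} picture heights away: {angle:.1f} arcminutes")
```

Whatever row height you assume, the ratio is what matters: at seven picture heights a caption subtends well under half the angle it does at three.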

Digital television

In the U.K. and some other places, digital video broadcasting (DVB) “subtitles” are bitmaps transmitted to your TV. Even though you can put anything you want in the bitmaps, in the 21st century they still don’t have an italic.
Digital television as used in North America, known as ATSC, requires that receivers – not just TV sets, but also set-top boxes and computer tuner cards – include eight typeface families, described in the spec as follows:
1. Proportionally-spaced without serifs
2. Monospaced without serifs
3. Proportionally-spaced with serifs
4. Monospaced with serifs
5. Casual font type
6. Cursive font type
7. Small capitals
8. Default (undefined)
To date, manufacturers have used fonts from print typography that were pulled off the shelf, particularly for the “unusual” font categories like cursive and casual.
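
For developers, the eight categories listed above can be modelled as a simple enumeration. This is only a sketch: the names below follow the wording of the list, not the numeric codes of the specification, and the font names, the fallback behaviour, and both helper functions are hypothetical.

```python
from enum import Enum


class FontCategory(Enum):
    """The eight typeface categories, worded as in the list above."""
    PROPORTIONAL_SANS = "proportionally-spaced without serifs"
    MONOSPACED_SANS = "monospaced without serifs"
    PROPORTIONAL_SERIF = "proportionally-spaced with serifs"
    MONOSPACED_SERIF = "monospaced with serifs"
    CASUAL = "casual font type"
    CURSIVE = "cursive font type"
    SMALL_CAPITALS = "small capitals"
    DEFAULT = "default (undefined)"


def missing_categories(installed: dict) -> list:
    """Return the categories for which a receiver has no font installed."""
    return [category for category in FontCategory if category not in installed]


def resolve_font(requested: FontCategory, installed: dict) -> str:
    """Pick the font for a category, falling back to DEFAULT.

    The fallback is an assumption of this sketch, not documented behaviour.
    """
    return installed.get(requested, installed[FontCategory.DEFAULT])


# Placeholder font names, invented for the example.
receiver = {
    FontCategory.DEFAULT: "ExampleSans",
    FontCategory.CURSIVE: "ExampleScript",
}
print(resolve_font(FontCategory.CASUAL, receiver))
print([category.name for category in missing_categories(receiver)])
```

A check like missing_categories makes it obvious when a receiver covers only the easy categories and leaves the “unusual” ones to whatever was on the shelf.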

Rendering

Rendering means the presentation of a typeface in a form perceivable to a viewer. On today’s computer screens, fonts are rendered as dots or pixels. In some cases (as on liquid-crystal displays), those pixels can be divided into subpixels and individually manipulated to improve the appearance of a font. ClearType is the most prominent technology, but CoolType can be found in Adobe Acrobat, and subpixel rendering is built into Mac OS X. (See articles at Typographi.ca and by Gruber and Riley.)
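
To illustrate why subpixels help, here is a minimal sketch (not how ClearType or CoolType is actually implemented) that computes how much of each cell a vertical stem covers, first with one cell per pixel and then with three. It assumes three equal-width subpixels per pixel and ignores the colour fringing that real renderers have to deal with. The function name and the example stem are hypothetical.

```python
def coverage(start: float, end: float, cells_per_pixel: int, pixels: int) -> list:
    """Fraction of each cell covered by a stem spanning [start, end).

    Positions are in pixel units; a cell is a pixel or a subpixel.
    """
    width = 1 / cells_per_pixel
    cells = []
    for i in range(pixels * cells_per_pixel):
        left = i * width
        right = left + width
        overlap = max(0.0, min(end, right) - max(start, left))
        cells.append(round(overlap / width, 3))
    return cells


# A stem exactly one pixel wide, positioned a third of a pixel off the grid.
stem = (1 + 1 / 3, 2 + 1 / 3)

whole = coverage(*stem, cells_per_pixel=1, pixels=4)
sub = coverage(*stem, cells_per_pixel=3, pixels=4)

print("whole pixels:", whole)  # the stem smears across two partly lit pixels
print("subpixels:   ", sub)    # the stem lands on three fully lit subpixels
```

With whole pixels, the off-grid stem can only be shown as two partially lit pixels. With subpixels, the same stem fits cleanly, which is the extra horizontal precision subpixel rendering offers.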

Rendering issues that impair legibility and readability include:

Caption fonts will look better with subpixel rendering, but such rendering can only improve what’s already there. With small sizes producing stem weights of a single pixel, some characters become hard to distinguish because there aren’t enough pixels to manipulate. Stems may shrink or diminish when seen against some background colours.
Italics may not be available in some fonts, though software may render an oblique of a roman font. Jagged edges are more pronounced in such cases. Italics also make for a poor fit on the same line as roman characters in most captioning examples.
Foreground and background colours are sometimes unwisely chosen.
Leading is difficult or impossible to control, as are linebreaks in some cases, resulting in sometimes-lengthy lines of type with too little space between the lines.
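
Of the issues above, colour choice is the easiest to check numerically. The sketch below borrows the contrast-ratio formula from the Web Content Accessibility Guidelines (WCAG) 2. That is our substitution: the formula was written for web content on computer displays, not for television or projection, so treat the numbers as a rough screen for bad pairings rather than a verdict. Function names are hypothetical.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) colour with 0-255 channels."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple, background: tuple) -> float:
    """WCAG 2 contrast ratio, from 1 (identical) to 21 (black versus white)."""
    lum_fg = relative_luminance(foreground)
    lum_bg = relative_luminance(background)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


WHITE, BLACK, YELLOW = (255, 255, 255), (0, 0, 0), (255, 255, 0)

print(f"white on black:  {contrast_ratio(WHITE, BLACK):.1f}")
print(f"yellow on white: {contrast_ratio(YELLOW, WHITE):.2f}")
```

White on black scores the maximum possible ratio, while yellow on white barely differs from no contrast at all.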