Report for Ascender Corp. and Microsoft on screenfonts for captioning

Note

This report, from 2006, was commissioned by Ascender Corp. on behalf of Microsoft Typography and was never implemented. It is presented here (August 2009) for general reference.

Screenfonts for captioning

This paper explains some basic facts about screenfonts for captioning; how they differ from other screenfonts; and incentives to design and test new font families from the standpoints of legal compliance and psychology of reading.

Captioning is important because it is a unique, ongoing, long-term reading task that has been overlooked by researchers and typographers. Laws require captions on TV shows; other laws require the ability to display those captions. Most of the people for whom captioning is intended have no choice but to watch captions; they cannot understand TV and video any other way. This captive, neglected audience needs and deserves better type.

Summary

The ability to display captions is sometimes required, either as part of a technical specification or by law.
Millions of viewers watch captions, often for dozens of hours per week.
Captioning has typographic characteristics different from print and from other kinds of screenfonts.
Captioning typography has been neglected, resulting in often-unpleasant reading conditions.
The effects of antialiasing and other reading technologies are unknown.
Custom fonts are needed to address the unique conditions found in captioning.

Terminology

This paper discusses captions only, not subtitles. Though the terms are used interchangeably in the U.K. and Ireland, captions are same-language transcriptions of dialogue and important sound effects intended mostly for deaf and hard-of-hearing viewers, while subtitles are a written translation of dialogue and some onscreen type intended exclusively for hearing viewers.
Both captions and subtitles can be open or closed. Open captions are always visible to a sighted viewer and cannot be turned off or avoided. Closed captions are seen only if the viewer so chooses. When used generically, those terms do not imply the use of any technology.
Line 21 is the analogue closed-captioning system used in the U.S., Canada, Japan, and other countries running the NTSC television format. The term refers to the line of the TV picture where caption codes are found. (EIA-608, or just 608, is the technical standard for Line 21 captions.) “Closed captioning” on NTSC television refers to Line 21 most of the time. (A variant of Line 21, Line 22, was developed for PAL-format home videotapes.)
EIA-708 (or just 708) is the standard for captioning for high-definition television (under the ATSC specification) in the U.S. and Canada.
Teletext is the system used in the U.K. and Ireland, Australia, and other PAL-format countries that transmits pages of data along with the television signal. Teletext can also include closed captioning (as on page 888 or 801).
DVB is the U.K. and Ireland standard for digital video broadcasting. It includes translated teletext captions and, more commonly, captions as bitmaps.
DVDs can carry Line 21 captions (on NTSC discs), and all DVDs can carry bitmap or subpicture captions.

Microsoft business areas affected

Windows XP Media Center Edition: In the U.S., analogue and digital TV receivers have to include caption decoding, a requirement that includes software-based systems like Media Center. Teletext receivers may include caption decoding. Line 21 captions on NTSC DVDs may optionally be displayed, while subpicture captions must be displayable by spec.
Windows Media Player: Closed captions are supported in all recent versions.
Xbox 360: DVD subpicture playback as with Media Center. Can optionally, but in practice must, pass through Line 21 captions.
Typography and Advanced Reading Technologies: It is unknown how well ClearType antialiasing works with real-world captions. Features of interest include ever-changing backgrounds, scrolling captions, and character masks and edging. Research into caption typography is an open area.

Note that employees with disabilities must be accommodated in the workplace. Microsoft’s internal videos may require captioning for deaf and hard-of-hearing employees, hence any improvement in caption typography benefits not only viewers at large but Microsoft itself.

Microsoft’s Accessibility group is unlikely to have useful expertise in caption typography.

Worldwide usage

Captioning is used extensively on TV in the U.S., Canada, the U.K. and Ireland, and Australia (the Big 5 countries), where it is also prevalent in home video. (U.S. and Canadian commercial VHS tapes have been captioned for over 20 years.) Captioning is also commonly used on TV in New Zealand, Sweden, the Netherlands, France, Germany, Portugal, and other developed countries.
Captioned DVDs are common in the Big 5 countries and are somewhat haphazardly available elsewhere.
Online media, whether as standalone files or streaming video, can be closed-captioned in all three major players (Windows Media, QuickTime, Real) and can be open-captioned in any player or format. Windows Media uses its own markup language for captions and subtitles, SAMI, while QuickTime and Real use SMIL along with their own languages.
First-run movies can be closed- or open-captioned. Technologies include the Rear Window system of captions reflected from an LED screen onto a viewer’s seat-mounted panel; DTS-CSS and Dolby Screentalk projected bitmaps; and burned-in captions. Captioned movies are hard to find and almost entirely limited to the Big 5 countries.
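As a rough illustration of the encoded approach used by Windows Media, the sketch below builds a minimal SAMI caption file in Python. The function name `build_sami`, the class name `ENUSCC`, and the stylesheet values are illustrative assumptions, not taken from this report; the point is only that the caption text travels as character data while the font is named in a stylesheet and rendered by the player.

```python
from html import escape


def build_sami(cues, class_name="ENUSCC", lang="en-US"):
    """Return a minimal SAMI document as a string.

    cues: list of (start_ms, text) pairs; an empty text clears the caption.
    class_name, lang, and the CSS values are illustrative assumptions.
    """
    lines = [
        "<SAMI>",
        "<HEAD>",
        '<STYLE TYPE="text/css">',
        "<!--",
        # The author, not the viewer, names the font here.
        "P { font-family: sans-serif; color: white; background-color: black; }",
        f".{class_name} {{ Name: Captions; lang: {lang}; SAMIType: CC; }}",
        "-->",
        "</STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start_ms, text in cues:
        # Escape markup characters; a non-breaking space blanks the caption.
        body = escape(text) if text else "&nbsp;"
        lines.append(
            f"<SYNC Start={start_ms}><P Class={class_name}>{body}</P></SYNC>"
        )
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sami([(1000, "[door slams]"), (3500, "")]))
```

Each SYNC element carries a start time in milliseconds, so a caption stays up until the next cue replaces or clears it.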

Legislation

Closed-caption decoders are required in the U.S. under the Television Decoder Circuitry Act of 1990.
- Analogue TV sets with diagonal screen dimensions of 13″ or larger must decode 608 captions.
- Devices “with integrated ‘widescreen’ displays measuring at least 7.8″ vertically, DTV sets with conventional displays measuring at least 13″ vertically, and standalone DTV tuners, whether or not they are marketed with display screens” must decode 708 captions (FCC 2000).
- Importantly, all analogue devices with TV tuners that might produce a display 13″ or larger must decode 608 captions, and all devices that tune high-definition signals must decode 708 captions. As such, this includes nearly any Media Center implementation that displays NTSC or ATSC television signals.
- There are no similar requirements in other Big 5 countries, but Canada tends to receive the same hardware as the U.S. and most midrange to high-end analogue PAL televisions support teletext.
Broadcasting regulators in different countries impose requirements for captioning. With a small list of exemptions, as of 2006, nearly all English-language U.S. programming aired between 0600 and 0200 hours on networks five years of age or older must be captioned. (Some Spanish-language programming must also be captioned.) Large English-language broadcasters in Canada must caption 90% of their programming, save for overnight periods and some exemptions. In North America alone, the result is thousands of hours of captioned programming produced now and every year into the indefinite future.
Some human-rights complaints or inquiries have resulted in increased captioning, as in Australia and Canada. Complainants have also sought the installation of captioning equipment in movie theatres, as in New York and New Jersey. Caption provision in consumer hardware and software, including operating systems, could be the subject of a future complaint.

Font categories

Everywhere in captioning, there are only three categories of fonts:

Encoded (as with Line 21, teletext, and most Webcasts), where a stream of character data is transmitted to a device that renders captions in its own fonts
Bitmapped, as with DVB and DVD subpictures
Burned-in, where type is irreversibly incorporated into the visible picture

Font requirements

EIA-608 specifies a character set but no fonts. (The character set has changed twice, meaning that old, recent, and current captions can and do have different character encoding, although the differences are mostly confined to rarely-seen characters.) Teletext has no font requirement; many devices have used the same bitmapped font for decades.
EIA-708 requires that devices:
- support “standard,” “large,” and “small” caption sizes and allow the caption provider to choose a size and the viewer to override it
- support eight font families –
  1. “undefined”
  2. monospaced serif
  3. proportional serif
  4. monospaced sansserif
  5. proportional sansserif
  6. casual
  7. cursive
  8. small capitals (sic)
  – and allow the viewer to override the base font choice
- support eight background and foreground colours (white, black, red, green, blue, yellow, magenta and cyan), also overridable by the viewer
- support different options for character edging (implicitly up to the viewer, not the captioner)
DVB can use any font, but tends to use Tiresias Screenfont, a typeface whose appropriateness for captioning has been disputed (Clark 2005a).
DVD subpictures can use any font (commonly Arial). The DVD example is complicated by the fact that NTSC and PAL widescreen and fullscreen formats all have different pixel aspect ratios, none of which is square or circular, resulting in misrendered type.
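The pixel-aspect-ratio problem can be made concrete with a short calculation. The sketch below assumes the common 720×480 (NTSC) and 720×576 (PAL) storage sizes and assumes the full stored width maps onto the nominal 4:3 or 16:9 picture; broadcast practice uses slightly different figures, so the numbers are approximations. The helper names are hypothetical.

```python
from fractions import Fraction


def pixel_aspect_ratio(display_aspect, width, height):
    """Pixel aspect ratio = display aspect ratio / storage aspect ratio."""
    return Fraction(display_aspect) / Fraction(width, height)


# Assumed storage sizes; the report itself does not give pixel dimensions.
FORMATS = {
    "NTSC fullscreen": (Fraction(4, 3), 720, 480),
    "NTSC widescreen": (Fraction(16, 9), 720, 480),
    "PAL fullscreen": (Fraction(4, 3), 720, 576),
    "PAL widescreen": (Fraction(16, 9), 720, 576),
}


def corrected_glyph_width(square_pixel_width, par):
    """Stored-pixel width needed so that a glyph drawn for square pixels
    is displayed at its intended proportions."""
    return round(square_pixel_width / par)


if __name__ == "__main__":
    for name, (dar, w, h) in FORMATS.items():
        par = pixel_aspect_ratio(dar, w, h)
        print(f"{name}: pixel aspect ratio {par} (about {float(par):.3f}); "
              f"an 18px-wide glyph needs {corrected_glyph_width(18, par)} stored pixels")
```

Under these assumptions the four formats yield four different ratios, none equal to 1, so a bitmap drawn once for square pixels is stretched or squeezed differently in each.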

Note that small capitals are deemed a separate font family in the 708 spec. While this is a clear error, a compliant device has to use a (presumed) caps-and-small-caps font for that setting. It could also permit small capitals as an attribute in the remaining families. (Cursive small caps have worked before, as with Zapf Chancery, and should not be ruled out.)
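The 708 display requirements listed above amount to a small settings model with a viewer override. The following Python sketch shows one way a decoder might represent it. All names are hypothetical, the enum values are simply the ordinals from the list in this report (not codes from the 708 specification), and character edging is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FontFamily(Enum):
    # Values are ordinals from the report's list, not codes from the 708 spec.
    UNDEFINED = 1
    MONOSPACED_SERIF = 2
    PROPORTIONAL_SERIF = 3
    MONOSPACED_SANSSERIF = 4
    PROPORTIONAL_SANSSERIF = 5
    CASUAL = 6
    CURSIVE = 7
    SMALL_CAPITALS = 8


class Size(Enum):
    SMALL = "small"
    STANDARD = "standard"
    LARGE = "large"


COLOURS = ("white", "black", "red", "green", "blue", "yellow", "magenta", "cyan")


@dataclass(frozen=True)
class CaptionStyle:
    """Any field left as None means no preference was expressed."""
    font: Optional[FontFamily] = None
    size: Optional[Size] = None
    foreground: Optional[str] = None
    background: Optional[str] = None

    def __post_init__(self):
        for colour in (self.foreground, self.background):
            if colour is not None and colour not in COLOURS:
                raise ValueError(f"unsupported colour: {colour}")


def resolve(provider: CaptionStyle, viewer: CaptionStyle) -> CaptionStyle:
    """Combine the two styles; the viewer's choice wins wherever one is set."""
    def pick(viewer_value, provider_value):
        return viewer_value if viewer_value is not None else provider_value

    return CaptionStyle(
        font=pick(viewer.font, provider.font),
        size=pick(viewer.size, provider.size),
        foreground=pick(viewer.foreground, provider.foreground),
        background=pick(viewer.background, provider.background),
    )
```

For example, a provider style of monospaced serif at standard size, combined with a viewer preference for large proportional sansserif, resolves to the viewer's font and size while keeping the provider's colours.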

Audiences

Captioning is intended for deaf and hard-of-hearing viewers. Hearing people also watch captions. The most-often-mentioned group of hearing captioning viewers is second-language learners, but the prevalence of built-in caption decoders suggests that hearing native speakers are the actual majority audience of captioning (Clark 2001).

Research

There is a reasonable body of research on captioning, including caption viewership, but almost no research is available on captioning typography.

Caption usage

Jordan et al. (2003) found that 84% of deaf people and 66% of hard-of-hearing people surveyed use captions “all the time,” as do 20% of ESL learners and 3% of nondisabled English-speakers. Fully 71% of deaf people report being “very interested” in captions for “Internet audio” (62% for hard-of-hearing people, and a surprising 12% for nondisabled people).

Caption reading

In an eyetracking study, Jensema et al. (2000b) showed that deaf subjects watching test videos spent an average of 84% of their time directing their gaze at captions. If people similar to these subjects watch an average of 30 hours of TV a week, they “may spend about 25 hours a week” reading captions off a screen, a lengthy and ongoing reading task second only to office computer usage.
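The "about 25 hours" figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def weekly_caption_hours(tv_hours_per_week, gaze_share=0.84):
    """Hours per week spent looking at captions, given total viewing hours
    and the share of viewing time directed at captions."""
    return tv_hours_per_week * gaze_share


if __name__ == "__main__":
    # 30 hours of TV at an 84% gaze share
    print(round(weekly_caption_hours(30), 1))
```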

Jensema et al. (2000a) conducted another eye-tracking study of deaf and hearing caption viewers and found they spend most of their time reading captions: “The addition of captions apparently turns television viewing into a reading task.”

Thorn and Thorn (1995) report that “caption reading is more difficult than most other reading tasks because the text is divided into segments and displayed for limited periods of time.” Even small amounts of blur were shown to effectively destroy a subject’s ability to read captions (also shown in Thorn, Thorn, and Malloy [1995]).

Thorn and Thorn quote other studies asserting that deaf people are more likely to have a visual impairment than hearing people. Most of those visual impairments are refractive errors; for that group, deaf people are more likely than hearing people to have the wrong eyeglass prescription. On the whole, deaf people have worse vision than hearing people, yet they commonly spend hours each week reading captions.

The needs of captioning viewers with a visual impairment have been discussed since the 1980s (Clark 1989), but have not been the subject of significant research and development.

Caption font choices

Research on caption viewers’ font choices is almost nonexistent, in part because old captioning technologies like Line 21 gave viewers no font choice at all.

WGBH (1997) conducted simulations of high-definition video with captions typeset in Helvetica and Times (“popular” fonts) and Monaco (“similar to current captioning,” i.e., a “sansserif, monospaced font”).

“Helvetica... was the clear choice of participants.... Times was too ‘busy’ or ‘crowded.’ This was due in part to the serifs, although some felt that another serif font may have worked. Most of the participants felt that Monaco was too large[, perhaps] because the letters were all upper case. The poor response to Monaco indicates that it was not an effective approximation of today’s captioning after all.”

The experiment produced equivocal results for two reasons: It used print fonts (though experimenters had few, if any, alternatives) and it simulated high-definition captioning rather than showing true video. Results could additionally indicate that viewers dislike “today’s captioning” fonts, had never had a chance to say so, and availed themselves of the opportunity.

Unique features of caption reading

How caption reading differs from print reading

Captions are always on a screen
Captions can be luminous (TV, Web) or reflective (Rear Window)

How caption reading differs from other onscreen reading

Viewers are usually stuck with the fonts. Few technologies permit the viewer to change the caption font; in most cases, viewers get whatever their device gives them
Fonts are “undesigned” or reused. Most Line 21 and teletext decoders use fonts that were designed by engineers or chip manufacturers, not typographers. Or fonts may be reused from print (a known problem and the genesis of many screenfont projects) or reused from other screenfonts (an ironic magnification of the previous problem)
Captions can be superimposed or offscreen or out-of-frame. Typical TV captions are superimposed right on the picture. Captions on letterboxed video can be wholly or partially outside the frame. Most online captions are shown in a box below the actual frame
Captions tend to be viewed from far away. Computer monitors are, axiomatically, lean-forward devices where viewing distances are usually an arm’s length or shorter. (Online captions are typically read from close distances.) Televisions are lean-back devices where viewing distances are so large they are conventionally measured in multiples of screen height, with distances of 8H to 12H common. A personal computer may need to support both of those caption-viewing distances (up close for online media, far back for TV signals)
Some captions overuse capital letters. Because the first Line 21 decoder fonts had no descenders (an error left unremedied in a subsequent hardware upgrade), captioners considered all-upper-case captions less illegible. To this day, Line 21 captions are commonly set almost entirely in capital letters. Hence captioning viewers in the U.S. and Canada spend dozens of hours a week reading blocks of glowing capital letters set against moving backgrounds
Wide range of sizes. Online streams may use fonts 9px to 18px high. Analogue and digital TV captions can be large compared to the video (particularly in the often-seen case of three-line captions), and 708 captions can be resized by the viewer
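Because television viewing distance is expressed in multiples of screen height, the apparent size of caption type can be estimated without knowing the screen size. The sketch below computes the visual angle of a character whose height is a given fraction of screen height H, seen from a distance of k·H. The 1/20 character-height fraction is an assumed value chosen for illustration only; it does not come from this report or from any captioning standard.

```python
import math


def visual_angle_arcmin(height_fraction, distance_in_screen_heights):
    """Visual angle, in arcminutes, of an object whose height is
    height_fraction * H when viewed from distance_in_screen_heights * H.

    H cancels out: angle = 2 * atan(height_fraction / (2 * k)).
    """
    theta = 2 * math.atan(height_fraction / (2 * distance_in_screen_heights))
    return math.degrees(theta) * 60


if __name__ == "__main__":
    ASSUMED_CHAR_HEIGHT = 1 / 20  # hypothetical fraction of screen height
    for k in (8, 12):
        angle = visual_angle_arcmin(ASSUMED_CHAR_HEIGHT, k)
        print(f"{k}H: about {angle:.1f} arcminutes")
```

Whatever fraction is assumed, moving from 8H to 12H shrinks the apparent size of the type by roughly one third, which is one reason type tuned for arm's-length reading may not carry over to television distances.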

How caption reading differs from print and other onscreen reading

Captions appear and disappear. Nominally, viewers have exactly one chance to read the caption before it vanishes, as compared with computer screens where content tends to remain stationary until the reader chooses otherwise
Captions move. Scrolling captions are common in 608, 708, and teletext and are nearly the only method used for live programming. (Crawling captions are scarcely ever seen and tend to be limited to art projects using open captioning)
Caption backgrounds change. Most captions appear superimposed on the ever-changing background of a television program, with instantaneous, unpredictable variations in colour, brightness, and, most important of all, movement. No other field of reading requires viewers to continuously fixate on words set against a moving background – and the words themselves might also be moving
Caption masks will vary. Line 21 and teletext use a black character box by default (as DVB tends to do), though the viewer can sometimes turn such masks off. Other caption formats, including online streams, almost never use a character mask

Current state of Media Center fonts

In assessing the existing Media Center fonts, the following general observations apply.

Reusing print or screen typefaces for captioning will always involve issues of spacing. In broad terms, proportional fonts have too little spacing and monospaced fonts too much. In the following critique, reuse of existing fonts should always operate under the assumption that spacing must be modified, in effect creating a custom variation of the font.
Oldstyle ranging numerals are also generally inappropriate for caption display. There is no imperative to balance typographic colour or x-height as there is in print or in static screenfonts. Discernible numeral shapes are needed, and they may include small descenders on 5·7·9 and a small ascender on 6.
Slash zero is more appropriate to computer programming than caption reading.
Extended display of capital letters makes heavily-bracketed crossbars on I ill-advised (mostly in monospaced fonts). In caption reading, that letter scarcely ever needs to be seriously discerned from confusable forms (l or 1).
The casual and cursive styles need particular attention because:
- Few people will use them, but the people who do will set all their captions to display in them. Shaikh et al. (2006) found that between 0.4% and 10.9% of respondents (median 1.5%) chose a casual or cursive font as a last choice for tasks that obviously required serif or sansserif fonts, such as Web texts, business documents, “E-magazines,” and spreadsheets. Similar usage of casual or cursive fonts for captioning can be reasonably assumed.
- No matter how unwise or tasteless this may be, designs must compensate for overuse or misuse.
- Every type designer agrees that serif, sansserif, and monospaced fonts have to be legible, but casual and cursive fonts are given a pass and do not receive the same design and engineering attention.
- Casual and cursive fonts are not typically associated with long-term reading.
- These faces may, like many caption fonts, be presented in CAPITAL LETTERS for extended periods.
Essentially, cursive and casual fonts are not the place to “have a little fun” with type. Because having a little fun is the first thing people will want to do, these edge cases require as much attention as base fonts.

Fonts currently used

Monospace serif: Courier, a de facto letterpress font given its IBM Selectric origins, turns into a monoline font in outline form. Poor handling of narrow and wide characters, overexpressive serifs, confusable numerals, brutal and mismanaged accented characters. Courier is the monospaced font people use when they don’t know anything about monospaced fonts. Associated with lowest-end analogue TV captioning.
Proportional serif: Georgia. Possibly viable, but default letterspacing is likely too tight. Some question about rendering of bracketed serifs.
Monospace sansserif: Lucida Console. Also essentially monoline. Too bold. Slash zero. Cap height too low compared to x-height. Acute and circumflex accents poor in some cases (í, î).
Proportional sansserif: Verdana. Sets too wide for caption usage. Its letterforms, created for unambiguous decoding at close distances and small sizes, can look too conspicuous and inelegant at large caption sizes (e.g., J; 1; comma/semicolon). Known to be a poor performer in DVD bitmap captions (Clark 2005b).
Casual: Comic Sans cannot be taken seriously as a captioning typeface and, by design, is appropriate mostly for cartoons or chat applications. By its designer’s own admission, Comic Sans was designed for children (Connare 2003). Variable curvatures of rounded letters will clash, as will the bracketed I.
Cursive: Script MT Bold is too filigreed and heavyweight. A “cursive” font merely has to be suggestive of handwriting and does not have to represent any traditional calligraphic style. “Cursive,” furthermore, does not automatically imply “swash.”
Small capitals: Tahoma Small Caps really are drawn small caps rather than interpolated ones. However, since the 708 spec erroneously defines small caps as a typeface rather than a variation, we have to expect extended reading. There is little tradition of sansserif small capitals; a small-caps font should be proportional serif.

Font bug

Display of high-definition captions in Media Center is affected by a bug in which, irrespective of the viewer’s choice, the only font displayed is Lucida Console.

Development objectives

Repair the Lucida font bug in Media Center.
Comply with specifications for display of bitmap captions, as in DVB and DVD.
Add support for Line 22 caption decoding in PAL countries (a subset of Line 21).
Develop and test font families for analogue and digital captioning.
1. Use cases include:
  - Analogue: Line 21, teletext
  - Digital: 708, online
2. All-new fonts would be desirable, since captioning is its own medium. There are, however, some cases where reusing existing fonts makes sense.
  - A monospaced font designed for 708 could be reused in the analogue cases.
  - The small-capitals font in 708 can use the serif or sansserif proportional font with small caps.
3. The online case may need three complete families (e.g. serif, sansserif, monospaced), probably designed from scratch.
4. As a result, there may be a need for nine different families –
  1. monospaced serif
  2. proportional serif
  3. monospaced sansserif
  4. proportional sansserif
  5. casual
  6. cursive
  7. serif (online)
  8. sansserif (online)
  9. monospaced (online)
  All variations require drawn small capitals.
5. For the specific case of the cursive font family, recommend contracting with Hermann Zapf to design a custom variant of Zapf Chancery, or a replacement font with comparable legibility.
Typefaces must be tested to demonstrate superiority to competing fonts. Test conditions include:

Subjects: Deaf; hard-of-hearing; low-vision; second-language; nondisabled native speaker
Case (for North American scenarios): All upper case; mixed case
Background masks: Present; absent
Colours: Reasonable foreground/background choices; colour deficiency
Character edging: Present; absent
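The test conditions listed above multiply quickly. As a planning aid, the sketch below enumerates every combination of the discrete factors (subjects, case, background mask, and character edging). Colours are left out because the report describes them only as "reasonable" choices and colour deficiency rather than as fixed levels. The names are hypothetical.

```python
from itertools import product

SUBJECTS = (
    "deaf",
    "hard-of-hearing",
    "low-vision",
    "second-language",
    "nondisabled native speaker",
)
CASES = ("all upper case", "mixed case")
BACKGROUND_MASKS = ("present", "absent")
CHARACTER_EDGING = ("present", "absent")


def enumerate_conditions():
    """Return one dict per combination of the discrete test factors."""
    return [
        {"subject": s, "case": c, "mask": m, "edging": e}
        for s, c, m, e in product(
            SUBJECTS, CASES, BACKGROUND_MASKS, CHARACTER_EDGING
        )
    ]


if __name__ == "__main__":
    conditions = enumerate_conditions()
    print(len(conditions), "conditions per typeface, before colour variations")
```

Five subject groups by two cases by two mask states by two edging states gives 40 conditions per typeface, which suggests that any real test plan would need to prioritize a subset.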
Research interests include:
- ClearType antialiasing of captions:
  - with no background mask superimposed on moving, multicoloured backgrounds
  - with background mask in different foreground colours
- Effect of serifs (including slabserifs)
- Edge-case usage (e.g., all captions set in cursive font)
- Effect on fixations, especially with scrolling and/or all-caps rendering
- First-language interference for second-language readers
- Scrolling vs. block display
- Visual impairment, including font zooming in assistive technology (known to pixelate fonts)

References

Clark, Joe, 1989. Typographic requirements for captioning for HDTV
—, 2001. The hearing majority of captioning viewers
—, 2005a. What’s wrong with Tiresias?
—, 2005b. What’s wrong with using Web screenfonts for captions and subtitles?
Connare, Vince, 2003. Comic Sans
FCC (Federal Communications Commission), 2000. FCC Adopts Technical Standards for Display of Closed Captioning on Digital Television Receivers
Jensema, Carl J., Sameh El Sharkawy, Ramalinga Sarma Danturthi, Robert Burch, and David Hsu, 2000a. “Eye-movement patterns of captioned-television viewers.” American Annals of the Deaf, 145(3):275–285
Jensema, Carl J., Ramalinga Sarma Danturthi, and Robert Burch, 2000b. “Time spent viewing captions on television programs.” American Annals of the Deaf, 145(5):464–468
Jordan, Amy B., et al., 2003. The State of Closed Captioning Services in the United States. Philadelphia: Annenberg Public Policy Center (scanned PDF)
Shaikh, A. Dawn, Barbara S. Chaparro, and Doug Fox, 2006. “Perception of fonts: Perceived personality traits and uses.” Usability News 8.1
Thorn, S., F. Thorn, and D. Malloy, 1995. “The elderly read TV captions as well as young adults.” Presentation at the annual meeting of the Association for Research in Vision and Ophthalmology
Thorn, Sondra, and Frank Thorn, 1989. “Television and vision: Reading captions when vision is blurred.” American Annals of the Deaf, 134(1):35–38
Thorn, Frank, and Sondra Thorn, 1996. “Television captions for hearing-impaired people: A study of key factors that affect reading performance.” Human Factors, 38(3):452–463
WGBH (Judith Navoy), 1997. Advanced Television Closed Captioning Features