Screenfont.ca

Debunking the research on Tiresias Screenfont

We are led to believe that the claims of Tiresias Screenfont’s superiority are backed up by research. Well, there’s certainly research, but most claims are not backed up.

The only two significant research papers are:

  1. Silver, Janet, John Gill, Christopher Sharville, James Slater and Michael Martin, “A new font for digital television subtitles” (undated, but other citations list it as May 1998)
  2. Gill, J., J. Silver, C. Sharville, J. Slater and M. Martin, “Design of a typeface for digital television” (undated, but also 1998)

We’ll spend most of our time discussing the Silver paper, unless otherwise noted. And the only version of Tiresias under discussion is Tiresias Screenfont.

Subject bias

The research on Tiresias makes no effort at all to acknowledge the full range of caption and subtitle viewership, let alone accommodate that viewership. The Tiresias Screenfont project was set up to create a typeface for visually-impaired viewers of TV captions – a minority within the captioning audience and a minority within a minority of the television audience.

Researchers did not recognize that the caption- and subtitle-viewing audience consists of:

nondisabled people, notably but not exclusively those who do not understand the source language of the production (Jordan et al. used about one-quarter each deaf, hard-of-hearing, ESL, and nondisabled subjects)
people with a visual impairment
people with a hearing impairment
people with combined visual and hearing impairments, identified as a group with specific needs as early as 1989 (the Silver paper itself notes that “several” of the visually-impaired subjects used hearing aids)

The Silver paper notes that “the choice of typeface is important for many viewers, particularly older people,” suggesting that for some viewers the choice of typeface will never be important. We assume they meant that nondisabled viewers will never care what font is used. If that were true, it would be an argument for standardizing on Zapfino or a blackletter font, since any face would function adequately for those people.

Disability of participants

The paper is upfront about the disability groups it aims to test: “We were primarily interested in two groups of viewers, the visually-impaired and the hearing-impaired.” But what they’re really interested in is visual impairment. The paper states (emphasis added):

Any research must start with a survey of work published by others. Although a great deal has been written about print, most of it seems to be rather old and consequently quite difficult to obtain. Much of the more modern work appears to address the problems of reading disability (or dyslexia) rather than normal reading ability with normal vision or visual disability.

The paper’s experimental subjects are heavily weighted toward the visually impaired. It additionally uses a few hard-of-hearing subjects and occasional nondisabled people as “controls,” a term that implies they would never have cause to watch captions or subtitles themselves.

The study used 35 visually-impaired, 48 hearing-impaired, and 14 nondisabled subjects. The nondisabled people appear to have come entirely from the cohort of friends of and “signers” for the deaf group, since the study does not tell us how many sighted friends of the blind group were used.

Age

This experiment pretty much deals with Caption Fonts for Grannies and little else.

In all, 35 visually-impaired subjects were used, 9 males and 26 females. The age of the subjects averaged 60.4 years (8 years [to] 95 years).... The [deaf] group consisted of 48 hearing-impaired people, divided equally between the sexes, [where] the average age was 62 years (17 to 94, although this average is skewed by 5 people under 21).

You know you’ve got a problem when five people under 21 are deemed to “skew” your results.

Viewers of all ages watch captions and subtitles. Even visually-impaired people, the true target of this study, watch captions and subtitles at all stages of their lives.

Typographic assumptions

By researcher Gill’s own admission, the initial font-development team was not replete with typographic knowledge, which comes through in the Silver paper.

“There are a number of approaches to displaying material in a different format from the one in which it was prepared. One approach is to add black borders; this does not affect the subtitling typeface”
Adding marks to a typeface affects it by definition. A black-bordered typeface that is typeset in white or the colour of the background is an outline font, which nobody uses for extended text.
“Another problem with transferring material prepared for a fixed-spaced font to a proportional-spaced font will be the position of the text on the screen. However, it should be possible to write a simple computer program to ensure that leading spaces prepared for fixed space (i.e. analogue television) typeface are converted to an equivalent position on a digital television screen. This aspect has not been studied as part of this project, but we do not expect it to have a major effect on the legibility of the subtitles”
The paper argues that the centering effects created by typing space characters to arrange lines of text can be easily recalculated for a proportionally-spaced font. A space character in a font (at least the space character used in captioning and subtitling nearly all the time) will have a known width, but the characters on the rest of the line won’t. It’s a nontrivial task to infer what the caption author meant by the use of spaces (hanging punctuation? right justification? centering?), calculate the true width of the visible characters, and position them onscreen to match the monospaced original. Moreover, screens with different aspect ratios will need different positions, though the Silver paper sidesteps the problem by adopting a 14:9 aspect ratio.
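
To make the difficulty concrete, here is a minimal Python sketch of the three positions a converter would have to choose between for one space-padded line. It assumes a 40-column fixed grid (the width of a Teletext row), a hypothetical 720-pixel usable width, and a hypothetical table of proportional advance widths; none of these figures come from the Silver paper, and right justification and hanging punctuation are not modelled at all.

```python
# Hypothetical proportional advance widths in pixels (not real Tiresias metrics).
ADVANCE = {" ": 8, "i": 6, "l": 6, "m": 22, "w": 20}
DEFAULT_ADVANCE = 14  # assumed width for any character not in the table

GRID_COLUMNS = 40     # fixed-width row, as in Teletext
SCREEN_WIDTH = 720    # hypothetical usable width in pixels
CELL = SCREEN_WIDTH / GRID_COLUMNS  # width of one fixed-grid cell


def leading_spaces(line: str) -> int:
    """Count the space characters the author typed before the text."""
    return len(line) - len(line.lstrip(" "))


def text_width(text: str) -> int:
    """Sum the proportional advance widths of the characters in text."""
    return sum(ADVANCE.get(ch, DEFAULT_ADVANCE) for ch in text)


def grid_offset(line: str) -> float:
    """Where the first visible character sat on the original fixed grid."""
    return leading_spaces(line) * CELL


def literal_offset(line: str) -> int:
    """Carry the leading spaces over as proportional space characters."""
    return leading_spaces(line) * ADVANCE[" "]


def centred_offset(line: str) -> float:
    """Guess that the author meant 'centre this line' and re-centre it."""
    return (SCREEN_WIDTH - text_width(line.strip(" "))) / 2


if __name__ == "__main__":
    caption = "Will you swim with me?"
    # Centre the caption on the 40-column grid by padding with spaces.
    line = " " * ((GRID_COLUMNS - len(caption)) // 2) + caption
    print("original grid x: ", grid_offset(line))
    print("literal spaces x:", literal_offset(line))
    print("re-centred x:    ", centred_offset(line))
```

The three functions give three different answers for the same line. Only the re-centring guess actually centres the proportional text, and it is right only if centring is what the author intended, which the spaces alone do not reveal.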

Additionally, while the paper goes to some lengths to explain TV interlacing and other technological limitations, the researchers did not understand a basic fact of screen typography – no part of a letter can ever be narrower than a pixel or a subpixel. We are told that “[t]he main design aim was to arrive at characters that could be distinguished from each other as easily as possible. Sloping the vertical strokes on some characters made a difference, although this was kept to a minimum.” It seems the researchers were unaware that the typical minimum width of a stem is a full pixel and that sloped vertical strokes (à la Eras) cannot be rendered in a column of single pixels.
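
A minimal Python sketch shows what happens to such a stroke. It rasterizes a stroke one pixel wide that leans by one pixel over its height, using simple nearest-pixel sampling with no anti-aliasing (an assumption; real rasterizers and hinting are more elaborate). The slope cannot appear gradually: the stroke stays in one column and then jumps to the next.

```python
import math


def rasterize_stroke(x_bottom: float, lean: float, height: int) -> list[int]:
    """Return the pixel column lit in each row of a 1-pixel-wide stroke.

    The stroke's centre moves from x_bottom (row 0) by `lean` pixels over
    `height` rows. Each row lights only the nearest pixel column; there is
    no anti-aliasing, so a pixel is either fully on or fully off.
    """
    columns = []
    for row in range(height):
        x = x_bottom + lean * row / height
        columns.append(math.floor(x + 0.5))  # round half up to nearest column
    return columns


def draw(columns: list[int], width: int) -> str:
    """Render the stroke as text, top row first ('#' = lit pixel)."""
    rows = []
    for col in reversed(columns):
        rows.append("".join("#" if x == col else "." for x in range(width)))
    return "\n".join(rows)


if __name__ == "__main__":
    cols = rasterize_stroke(x_bottom=2, lean=1, height=10)
    print(draw(cols, width=5))
```

The printed stroke is five rows in one column and five rows in the next, joined by a single jog rather than a slope.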

Errors in experimental design

It would be hard to come up with a worse method of testing caption and subtitle fonts.

A standard sentence was printed in three fonts: Tiresias Screenfont, the AlphaMosaic teletext font, and Times New Roman. The paper explains:

All were actually printed in 14 point. To simulate the subtitling situation, the examples were viewed in the clinic through a closed-circuit-television reading machine which enlarged them if needed and reversed the polarity, providing well-contrasted white text on a black background. Patients attending one of the researchers’ low-vision clinic who mentioned difficulty with television were shown the examples in random order and asked to decide which was the easiest, which the most difficult to read, and the reasons for their choice. Many patients attend with an escort and some of these who expressed an interest were used as controls and to expand the numbers.

Here’s what actually happened, translated into understandable English.

  1. To test extended reading of titles that appear and disappear or scroll and disappear, researchers typed up a single sentence on paper and displayed it, motionless, on a monitor. (What was the sentence, by the way?)
  2. A reading test in which type size is important allowed participants to zoom the type if they wanted.
  3. The comparison typefaces were, first, a dot-matrix font (variously 7×5 or 9×10 dots; we aren’t told the resolution used), whose appearance would be judged inferior to nearly any typeface shown at higher resolution, and, second, a print font. The researchers admit that print fonts are different from screenfonts (“ink-print, public-display systems, computer screens, etc., present environments with different constraints and these must be addressed appropriately”), but used a print font anyway.
  4. Sighted people were not separately recruited, but were driftnetted into the experiment to pad it out.

The experiment actually gets worse:

A later version of the Tiresias Screenfont typeface was produced with improvements to the kerning after the testing commenced; the test sentences were not revised, to maintain coherence.

In other words, the first version of the font was so bad they had to fix it. Apparently it is this second version that was used with hearing-impaired subjects, making comparisons rather difficult. (It also emphasizes the bias toward visually-impaired subjects; their complaints resulted in immediate design changes, which the paper does not describe.)

Next we move on to the hearing-impaired subjects, who were given a completely different experiment:

A short video was prepared using the new font in four alternative sizes.... A program that had actually been transmitted was used with only the font changed. The subtitles appeared in white on a black strap just above the bottom of the screen. Groups of hearing-impaired people from organisations for deaf and hearing-impaired people were invited to view the video under controlled conditions and complete a short questionnaire. A small group of people with normal hearing plus escorts, signers, etc., acted as controls.

In this session, then:

  1. They’re using a different version of the font, apparently the revised one.
  2. Font size is explicitly part of the experiment.
  3. Actual captions on actual video are used, though it is not clear whether the “black strap” extends rectangularly around the maximum area covered by all caption lines or merely sits as a background on each individual line. (We later learn it was something like the former.)
  4. The video was viewed “under controlled conditions,” although home viewers are acknowledged to sit too far away from the screen.

The researchers failed to understand how real people watch captions and to control for distracting factors:

It proved extremely difficult to direct people’s attention only to the font. Many of those who claimed to prefer the present font did so on the basis that speakers are distinguished by colour or position on the screen in some types of subtitling. While some subjects were keen to have colour, others complained of problems with certain colours (though the “problem” colours differed), [while] many people commented that the white on black was easier to read than colour. There was a widely-held opinion that the strap was wider than it might be, too far up the screen, and actually covered the mouths of some of the speakers. This remark was made at all levels of hearing impairment.

In short, the captions used on the video didn’t look like real-world captions, which distracted the subjects so much they scarcely had time to read the words and consider the font. Even the black strap didn’t look the way it does in real captions.

Results are unreliable

For these reasons alone, Silver et al.’s results are unreliable. The researchers give the game away by admitting as much.

It is freely conceded that this “testing” is far from ideal and could even be described as anecdotal: All the subjects were to some extent self-selected, and they were in no way stratified or subjected to any of the research criteria normally adopted by the writers.... There have been criticisms: The subtitles in the video were held to be “too conspicuous” by one professional observer, another felt that it would be “irritating to read in large blocks.” However, the font has been designed only against the criteria for subtitles, and may be improved in the light of experience and further constructive criticism.

In fact, “the criteria for subtitles” were not used, according to the paper’s own evidence. Testing with blind people didn’t even use captions. Testing with deaf people used titles they did not accept as captions. Testing with nondisabled people was an afterthought.

Using print fonts to test screenfonts

Letters are things and not pictures of things. The authors seem aware that letters on screen are different things from letters on paper:

The structure of the television system requires that fonts for display on television screens may need to have several different characteristics from similar versions intended for conventional use in the form of ink-print on paper.

Yet incredibly, the experiments conducted with low-vision viewers did not even use real captions or subtitles. Printouts viewed through a monitor were used instead. We see this error from time to time, as in a survey of typefaces for digital TV in the U.S., which also used print fonts. This error alone casts considerable doubt on the validity of the results.

Viewing distance and acuity

It’s accepted that people sit too far from their television sets. The Silver paper acknowledges as much, though it does not provide a citation for its claim that “[a] survey in the early 1980s showed that the average domestic viewer actually watched from a distance of more than 7H” (seven times the height of the screen). A distance of 4H to 6H is considered optimal, according to a hard-to-find standard reference paper (CCIR Recommendation 500-3 [1986], “Methods for the subjective assessment of the quality of television pictures,” which we have summarized).

The Tiresias Screenfont project’s goal was to create a font for visually-impaired viewers of captioning. They already see badly and probably sit too far away from the screen. Yet the Silver paper’s stated ideal is that “a font designed for modern TV use [must] remain acceptable when used with... larger displays,” up to one metre in diagonal, viewed at a distance of 3H. The paper disregards the reality of blurred and too-distant reading in favour of a hypothetical future in which viewers have huge sets they watch right up close.
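
The arithmetic behind these distance multiples is easy to sketch. The Python below assumes a flat screen, converts its diagonal and aspect ratio to a picture height H, and works out the viewing distance for a multiple of H and the visual angle subtended by a character whose height is a fixed fraction of H. The one-metre diagonal, the 14:9 ratio, 3H, 4H to 6H, and 7H come from the text above; the character height of one-twentieth of H is a hypothetical figure chosen only for illustration.

```python
import math


def picture_height(diagonal: float, aspect_w: float, aspect_h: float) -> float:
    """Height of a flat screen from its diagonal and aspect ratio."""
    return diagonal * aspect_h / math.hypot(aspect_w, aspect_h)


def visual_angle_arcmin(char_fraction: float, distance_in_h: float) -> float:
    """Angle subtended by a character char_fraction * H tall, seen from
    distance_in_h * H away. H cancels out, so screen size does not matter."""
    radians = 2 * math.atan(char_fraction / (2 * distance_in_h))
    return math.degrees(radians) * 60


if __name__ == "__main__":
    h = picture_height(1.0, 14, 9)  # one-metre diagonal at 14:9
    CHAR_FRACTION = 1 / 20          # hypothetical caption character height
    for k in (3, 4, 6, 7):
        print(f"{k}H = {k * h:.2f} m, character subtends "
              f"{visual_angle_arcmin(CHAR_FRACTION, k):.1f} arcmin")
```

Because H cancels out, a viewer at 7H sees each character at roughly three-sevenths of the angle a viewer at 3H does, whatever the size of the set.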

The authors also miss a golden opportunity to use the research cited by Thorn and Thorn documenting that deaf and hard-of-hearing people tend to have incorrect eyeglass prescriptions.

Beauty vs. legibility

John Gill has separately stated that the Tiresias Screenfont project’s goals included designing a legible font, not a beautiful one. It is axiomatic to typographic sophistiqués that the two are independent variables that may nonetheless coexist in the same font. To rule out æsthetic criteria in design and evaluation of a typeface does not quite miss the point of typography but certainly misses half of it.

The experimenters are not even consistent in their ill regard for assessments of æsthetics. In discussing background and contrast, for example, we are told that “[r]ecent research using CRT displays has shown that white on black is preferred by the largest number of people, with white on dark blue being the second choice.... Quite possibly other preferences are aesthetic except in rare pathological cases.” The only time you get to say you like a font is if there’s something wrong with you.

Other factors contributing to legibility must be considered too. Shaw... found that size is the most important factor, but the density or weight of the print is significant too, and it was far more difficult to discriminate letters that are closely crowded together. Such factors are self-evident to optometrists or ophthalmologists, but fonts are normally designed by graphic artists who are primarily interested in aesthetic criteria.

Fonts are in fact “normally designed” by type designers, many of whom could not produce a usable page layout to save their lives, and they are “primarily interested” in more than making something look pretty.

They continue:

It is obvious that the more different each letter is from those around it the fewer errors will occur. Decorated letters will fuse and create noise in the system. Script is considerably more difficult than standard print.

It is not at all obvious that letters must be as different from each other as possible in order to be read. That would imply that letters are individually discerned (most of the time, they aren’t), and the claim ignores the fact that adjacent letters usually do look different from each other. Most two-letter pairs aren’t of the same letter. Also, “decorated letters” are different from each other.

It is indeed true that spacing is important in legibility, and the paper later correctly observes that “sometimes Helvetica is used with spacing so close that the characters almost touch each other, which reduces legibility.”

Yet having built up this case for æsthetics as a twee, unproductive, and scientifically expendable pastime of “graphic artists” and the non-pathological, the researchers go on to make a case for æsthetics:

Æsthetic qualities, while theoretically not terribly important, will actually have a considerable effect on any individual’s desire to read for long periods.

Apparently it really is important after all to design fonts that perform well and that readers like.

Unfamiliarity with caption and subtitle typography

The informed reader can keep a cheat sheet of topics on which Silver et al. are uninformed or underinformed, and caption and subtitle typography as found in the real world can be added to that list. “[B]old and italic are used for emphasis and other colours are used in subtitling,” the authors claim, falsely. Teletext has no bold or italic. Line 21 has no bold but does have italics (and underlining), but the study did not consider Line 21. Bitmap captioning and subtitling, as found in DVD, can use either variation, and italics can surely be found, but these too were not part of the remit.

“[T]ime constraints in the first phase of this project did not permit the consideration of these variations,” the authors note, to which it is interesting to add that Tiresias Screenfont Bold and Italic did not exist at the time of the experiment and still do not exist today. There were no “variations” to “consider,” and there still aren’t.

Separately, John Gill stated that italics are not used in U.K. “subtitling,” hence they were not designed into Tiresias Screenfont. This decision itself indicates that the project had little to do with understanding, let alone improving, caption and subtitle typography.

Of course Tiresias was better than the existing font. Anything would be

It goes without saying that Tiresias Screenfont fared better with subjects than the comparison typefaces, the original dot-matrix teletext font (called AlphaMosaic) and Times New Roman.

For visually-impaired subjects and their sighted friends, who are deemed “controls” in this experiment:

Although the Tiresias Screenfont typeface was the first choice for the majority of the subjects (17/26), eight preferred Times New Roman. The reasons given were interesting: In nearly every case the subjects selecting the Tiresias typeface described it as “clearest” or “easiest to read,” while those selecting Times New Roman commented on its “elegance” and “better spacing.” AlphaMosaic was immediately identified as “worst” by all except one of the subjects, although many hesitated over the choice of “best.” The one subject who preferred AlphaMosaic gave “thickest” as the reason.

There does not appear to be a significant difference between the controls and the patients, although the sample is too small for certainty....
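
The paper’s own caveat about sample size can be made concrete. The sketch below computes an approximately 95% Wilson score interval for the 17-of-26 first-choice figure quoted above. The Wilson interval is our choice of method, not the paper’s, and it treats the 26 choices as independent, which a self-selected clinic sample may not be.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 gives an approximately 95% interval.
    """
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    low, high = wilson_interval(17, 26)
    print(f"Tiresias first choice: 17/26 = {17 / 26:.0%}, "
          f"95% interval {low:.0%} to {high:.0%}")
```

With 26 subjects the interval runs from just under half to about four-fifths, so these figures do not rule out that fewer than half of such viewers would pick Tiresias first.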

Font-preference results for hard-of-hearing subjects and their hearing controls are not reported. The only results given for them are size preferences.

Any font is better than AlphaMosaic

You didn’t need to run a study to tell us that any font would be better than AlphaMosaic, a 7×5- or 9×10-dot pixel font. Of course even Times New Roman works better than AlphaMosaic, though no one with expertise would choose a bracketed-serif print font for TV captions. Pretty much any font at all will work better than a typeface that recalls an Apple II computer monitor. The fonts on your first cellphone were better than AlphaMosaic. Needlepoint has a higher resolution than AlphaMosaic. Of course other fonts work better.

We have, moreover, evidence that the phenomenon of “any font works better than a lousy font” holds true for visually-impaired people in other contexts.

Arditi (2004) devised software that allowed low-vision people to construct their own font, with adjustable letterspacing, stroke width, serif size, x-height, and aspect ratio. Interestingly, the demographics of the study were similar to Silver’s, with an average age of 68.6, though the male/female breakdown is not given.

Arditi tested Times New Roman against each person’s customized font in a test of onscreen reading. (Times New Roman was not designed for onscreen reading.) Arditi found that:

The most striking thing about these [results] is their variability. Participants varied widely in the font that produced the most legible text, and this diversity supports a basic premise of this project: That participants have different needs in terms of font characteristics. Of course, variability alone does not indicate useful diversity of adjustments. However, all participants’ reading acuities improved with the adjusted font. These data suggest that it is unlikely that there is one most legible font that will meet the needs of all. [...]

[F]or a broad range of visual acuities, the adjusted font and Times New Roman are of almost identical legibility. The data do not indicate that the adjustable font yields better legibility than Times New Roman, but it is clear that even this first attempt at adjustable typography produced fonts that rival in legibility one whose design has, through a kind of typographic natural selection, evolved through many design generations, and is one of the most popular fonts in existence today.

In other words, you need a suite of fonts customized to specific viewer needs, and nearly anything beats the incumbent if the incumbent wasn’t designed for the purpose being tested.

Additionally, Arditi found that the parameter with the single greatest effect on “acuity enhancement” was spacing. Silver et al.’s paper joins with Arditi’s and several others in verifying the importance of character spacing in legibility.

Conclusions

The research that claims to support Tiresias Screenfont’s superiority over other fonts is suspect. They didn’t test with the right people. The people they did test with were old. They changed the experiment between the blind and the deaf subjects. They didn’t report all their results. They used the wrong comparison fonts.

Question

Still think this font is worth up to 17 grand?

See also

Speaking notes: “Don’t show printouts to grannies and call that a test”
Presented at ATypI Brighton 2007. All about the research flaws of Tiresias. Gives new principles for research into caption/subtitle fonts
A Tiresias clone called Tioga

References

Version history

2005.07.21
Posted.
2006.02.23
Added link to preferred television viewing distance documentation.
