When making a web site accessible, you inevitably run into the topic of screen readers. Screen readers help people with visual impairments understand what is on the screen.
Few people think about the implications of having visual content translated into an auditory representation. In this article, I want to explain how screen reader users “hear” the web, what strategies they apply when confronted with a new web site, and how you can make their lives a bit easier.
What is a screen reader?
A screen reader is an assistive technology that reads the content of the screen. It is a normal software application that can be controlled by keyboard. No special equipment is needed.
Usually, screen readers generate speech output, so instead of looking at the screen, you are listening to it. They cover the whole screen: they read not only text but also announce interactive elements like buttons or menus.
There are several screen readers out there for the different operating systems. On Windows, JAWS and NVDA are the most popular. macOS comes with an integrated screen reader, VoiceOver, which also runs on iPhones. Android phone owners can use TalkBack.
Some users use a refreshable braille display in addition to the speech output. A braille display is a mechanical tool that consists of rows of pins. Six pins together form a braille character. You can read the display by feeling which pins are raised.
In the shoes of the user
In a previous article, I wrote about the little game we played with our colleagues: they had to navigate our software without a screen just by listening to the screen reader. This was certainly an eye-opener (pun intended) for the sighted users because it made them realize how different it is when you only have your ears to understand the complex content of the screen.
When you go to a new web page, you first need to build up a mental image of its structure and content. Sighted people can take in the whole structure at a quick glance because they have a bird’s eye view and see everything at the same time. This makes it easy to understand the general structure, and then zoom in on the part they are interested in.
Users of screen readers have a different perspective. When you open a web page with a screen reader, you start at the top of the page. You can have the screen reader read the entire page, or tab through the elements, from top to bottom. This way, you slowly build up an understanding of the structure, piece by piece. This is how people unfamiliar with a screen reader use it.
Of course, this is a cumbersome way to read a web page. Usually, screen reader users do not sit patiently and walk through the whole screen. They have a couple of different strategies to get an overview of the page quickly and then zoom in on the part they are interested in, just like sighted users.
It is rather tedious to listen to the screen reader read things to you because speech is slow, usually slower than normal reading speed. To work around this, many screen reader users increase the speed of the speech output.
The need for increasing the speed is also the reason why screen reader voices usually sound rather synthetic. Usually, if you increase the speed of speech, the pitch goes up, making it unintelligible at some point. The synthesis method used for screen reader voices makes it possible to speed up the speech output without increasing the pitch, so you can set the speed quite fast. You cannot do this with the more natural sounding synthetic voices you hear at train stations or in buses.
The normal rate of natural speech ranges from 4 to 8 syllables per second. Frequent screen reader users speed this up a lot, some even up to 22 syllables per second! At this speed, the speech is completely unintelligible for untrained listeners. In fact, at this speed, screen reader users can read documents faster than sighted users reading with their eyes.
Gaining an overview
Even if you read the page very fast, it is inefficient to just walk over it from top to bottom. The screen reader has some convenience functions built in that give the user additional ways to navigate. A screen reader does not just read the text on the page; it presents the page to the user in a structured way. The information about the page structure is taken from the so-called accessibility tree, which is provided by the browser. The browser bases this tree on the HTML DOM tree.
This means that, for example, if you use the <h1> tag, the screen reader will know that this is a heading of level 1. The screen reader also understands other HTML elements, such as paragraphs, tables, lists, links, buttons or other form fields.
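As a small illustration, here is how such elements look in markup. The page content is invented for this sketch, and the comments only note the kind of role a browser typically exposes in the accessibility tree; the exact wording that is announced varies between screen readers.

```html
<!-- Hypothetical page fragment; all text content is made up for illustration. -->
<main>
  <!-- Exposed as a heading of level 1 -->
  <h1>Opening hours</h1>

  <!-- Exposed as a paragraph -->
  <p>Our shop is open on the following days.</p>

  <!-- Exposed as a list with three items -->
  <ul>
    <li>Monday</li>
    <li>Wednesday</li>
    <li>Friday</li>
  </ul>

  <!-- Exposed as a link -->
  <a href="/contact">Contact us</a>

  <!-- Exposed as a button -->
  <button type="button">Show map</button>
</main>
```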
With this information, screen reader users can surf the metadata of the page to get an overview. According to a recent survey on screen reader usage, reading the headings of a web page is the most important way to find information: 68.8% of the users jump through the headings in the page to get to the information they are looking for.
Once you are in the right sub-part of the page, you can use the arrow keys to walk over the content and have the screen reader speak it, so you do not depend on navigating with the Tab key.
Besides headings, the screen reader also provides shortcuts to navigate complex elements like lists or tables. The user can jump from item to item or cell to cell. While doing that, the screen reader also gives some meta information to support orientation. For example, it announces the position of the current list item, or the row and column you are in.
By default, when a user opens a web page, the screen reader behaves as described above: it presents the content of the page as a structured document. This is called browse mode or virtual mode. In this mode, the user can jump to different parts of the page using single character keys. For example, with NVDA, pressing H brings you to the next heading.
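For this orientation support to work, a table has to be marked up as a table. The following is a minimal sketch with invented data: the th elements with a scope attribute tell the browser which cells are headers, so a screen reader can typically announce the matching column or row header together with the cell position.

```html
<!-- Hypothetical data table; the values are made up for illustration. -->
<table>
  <caption>Opening hours</caption>
  <thead>
    <tr>
      <th scope="col">Day</th>
      <th scope="col">Opens</th>
      <th scope="col">Closes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <!-- Row header: read along with the cells in this row -->
      <th scope="row">Monday</th>
      <td>09:00</td>
      <td>17:00</td>
    </tr>
    <tr>
      <th scope="row">Friday</th>
      <td>09:00</td>
      <td>14:00</td>
    </tr>
  </tbody>
</table>
```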
But sometimes you need to interact with the page and enter text, for example in a search bar. In this case, the screen reader automatically switches to the so-called focus mode. In this mode, the shortcuts of the browse mode are disabled, and you can interact with the page without interference. The switching of modes is indicated by click sounds or beeps.
The mode switching can be confusing for inexperienced users because the shortcuts change. Also, in some cases, you have to manually switch the mode before you can type into a text field.
When semantics are misleading
The screen reader takes the information about what is a button, a menu and so on directly from the dedicated tags and attributes of HTML. If these semantic tags or attributes are used incorrectly, the screen reader gets a wrong picture of the page, and its users cannot use the page very well.
In this screenshot, you see the NVDA menu for selecting page elements on a Stack Overflow page. It is clearly wrong that “Hot Network Questions” is a subsection of “python regex parser for gitlab merge requests”. So, in this case, heading levels were used incorrectly in the HTML, confusing the screen reader and the user.
Incorrectly used attributes are not the only problem: often, semantic attributes are not used at all.
This is often the case with custom controls. For example, a div which is styled to look like a button cannot be recognized as a button by the screen reader unless a special attribute is added (in this case: role="button"). A lot of websites use custom controls and are missing the appropriate attributes.
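One way to avoid this kind of confusion is to let heading levels mirror the actual nesting of the content. The outline below is a hypothetical sketch and not Stack Overflow's actual markup: the page title "Questions", the placeholder texts and the id are assumptions made for illustration. The sidebar heading sits on the same level as the question instead of below it.

```html
<!-- Hypothetical outline, not the real markup of the page in question. -->
<h1>Questions</h1>

<main>
  <h2>python regex parser for gitlab merge requests</h2>
  <p>Question text goes here.</p>

  <!-- A true subsection of the question: one level deeper -->
  <h3>Answers</h3>
  <p>Answer text goes here.</p>
</main>

<!-- Unrelated sidebar: same level as the question, not nested below it -->
<aside aria-labelledby="hot-questions">
  <h2 id="hot-questions">Hot Network Questions</h2>
  <ul>
    <li><a href="/q/1">First example question</a></li>
    <li><a href="/q/2">Second example question</a></li>
  </ul>
</aside>
```

The main and aside elements additionally act as landmarks, which gives screen reader users another way to tell the sidebar apart from the main content.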
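The difference can be sketched as follows. The class name and the save() function are made up for this example. Note that role="button" only changes what is announced: to make a custom control usable from the keyboard, it also needs tabindex and key handling, which is why a native button is usually the simpler choice.

```html
<!-- Without semantics: exposed as generic content, not reachable with the Tab key -->
<div class="btn" onclick="save()">Save</div>

<!-- Custom control with the missing pieces added by hand -->
<div class="btn" role="button" tabindex="0"
     onclick="save()"
     onkeydown="if (event.key === 'Enter' || event.key === ' ') { event.preventDefault(); save(); }">
  Save
</div>

<!-- Native element: role, focus and keyboard activation are built in -->
<button type="button" class="btn" onclick="save()">Save</button>

<script>
  // Hypothetical handler, only here so the example is self-contained.
  function save() {
    console.log("saved");
  }
</script>
```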
Filling out forms
Filling out a form is usually not a problem for screen reader users when the form supports keyboard navigation. What is often a problem is finding out what they are supposed to fill in.
For the sighted user, it is obvious that when “Name” is written next to an input field, a name should be entered there. The screen reader user cannot make this connection because they do not see the visual proximity. The only information they get when jumping to the input field is that it is an input field.
On this page from WebAIM, you can experience the difference.
With the NVDA screen reader, you can make the speech output visible. I made a screenshot of that. While there is no obvious visual difference, for the screen reader user the fields in the first form, “labelled form”, have useful labels, while the others just say “input”.
So, the label “First Name” needs to be linked to the input on the semantic level for the screen reader to make the connection. In HTML, the <label> tag marks an element as the label for a form field. With the "for" attribute, you can then link the label to the form field by referencing the field's id:
<label for="textfield2">First Name:</label> <input type="text" name="textfield2" id="textfield2">
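Besides the for/id pairing, there are other common ways to give a field a name. The following sketch uses invented field names: in the first variant the label wraps the field, and in the second a field without a visible label gets its name from aria-label.

```html
<!-- Hypothetical form; field names and texts are made up for illustration. -->
<form action="/search" method="get">
  <!-- Variant 1: the label wraps the field, so no for/id pair is needed -->
  <label>
    Last Name:
    <input type="text" name="lastname">
  </label>

  <!-- Variant 2: no visible label, so the accessible name comes from aria-label -->
  <input type="search" name="q" aria-label="Search this site">
  <button type="submit">Search</button>
</form>
```

A visible label is usually preferable where the design allows it, because it helps sighted users as well.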
Screen reader users are not necessarily blind. In fact, most “blind” users are not completely blind, but have some vision left. Still, a lot of screen reader users cannot see well enough to make out what is in an image, especially if it is small. But often images on a page provide important information. Maybe there is a button which consists only of an image, or there is an image in a text that gives contextual information.
In HTML, the alt attribute is used to add invisible textual information to an image. The screen reader associates the value of the attribute with the image, so the screen reader user gets an idea of what is in the image.
It is not always easy to decide what goes into the alt attribute. Sometimes, the description to give would be very extensive. In other cases, the image is just there to make the page look nicer, so it does not need to be described. To make this a bit easier, the Web Accessibility Initiative even created a decision tree for the alt attribute.
If you have read this far, you should now have an idea of how screen reader users surf the web. If you are a web developer, and you want to make your page more screen reader-friendly, these are the main takeaways:
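The three typical cases can be sketched like this. The file names and texts are hypothetical, and the right alt text always depends on the context in which the image is used.

```html
<!-- Informative image: describe what the image conveys -->
<img src="sales-chart.png" alt="Bar chart: sales doubled between 2020 and 2023">

<!-- Decorative image: an empty alt tells the screen reader to skip it -->
<img src="divider.png" alt="">

<!-- Button consisting only of an image: alt names the action, not the picture -->
<button type="button">
  <img src="magnifier.png" alt="Search">
</button>
```

Leaving the alt attribute out entirely is different from leaving it empty: without it, some screen readers fall back to reading the file name.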
- Make sure the page has correct semantics
- Label your forms
- Give your images alt text