Language and accessibility on the internet.

Posted by Floris Claassens on February 17, 2022

A gathering of research, bad opinions and some extra curiosities

Screen readers are great pieces of software that help people with a visual impairment navigate the digital world. Though screen readers can interpret a lot of web content without any help, from time to time they need cues from developers on how to interpret the content. One of those cues is the language of the page. Many people consume multilingual content at some point in their lives, so screen readers need a way to find out which language to use for pronunciation.

On the web, screen readers use the “lang” attribute (short for language), which can be set on the <html> element of the page. As the value of this attribute developers can use a short code to designate the main language of the page. The code starts with a (usually) two-letter language code, optionally followed by a hyphen and a (usually) two-letter region code that designates the regional variant. For example, fr-fr is the code for metropolitan French while fr-ca is the code for Canadian French. If you are familiar with SEO (search engine optimization) you might also have heard about the “hreflang” attribute, which uses the same language codes. Though the two attributes might seem very similar, they serve very different purposes.
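To make the structure of these language codes concrete, here is a small TypeScript sketch that splits a code into its language part and its optional regional part. The function name splitLanguageTag is made up for this example, and the sketch deliberately ignores the rarer parts (script, variant, extensions) that full language tags can contain.

```typescript
interface LanguageTagParts {
  language: string; // primary language part, e.g. "fr"
  region: string | undefined; // regional part, e.g. "CA", if present
}

// Simplified split of a language tag such as "fr-ca".
// Assumption: the first part is the language and the first two-letter
// part after it is the region. Anything else is ignored.
function splitLanguageTag(tag: string): LanguageTagParts {
  const subtags = tag.trim().split("-");
  const language = subtags[0].toLowerCase();

  let region: string | undefined;
  for (const subtag of subtags.slice(1)) {
    if (/^[A-Za-z]{2}$/.test(subtag)) {
      // Tags are case-insensitive; uppercase regions are the usual spelling.
      region = subtag.toUpperCase();
      break;
    }
  }
  return { language, region };
}

const canadianFrench = splitLanguageTag("fr-ca");
const dutch = splitLanguageTag("nl");
```

Here canadianFrench holds the language "fr" and the region "CA", while dutch has no region at all.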

Lang vs hreflang

The hreflang attribute

The hreflang attribute can be set on linking elements such as the <link> or <a> element in HTML. It tells other software the main language of the page the link refers to.

This is primarily used for search engine optimization. In fact, Google adopted hreflang annotations for exactly this purpose. When using Google you might have noticed that the localized version of a page appears above the default version in your search results. Google relies mostly on the hreflang attribute to determine which version of a page to show in your search results. As John Mueller, a Webmaster Trend Analyst at Google, said about the lang attribute during a video hangout:

“We don’t use that (the lang attribute, ed.) at all. So we use the hreflang links if you have that, if you have different language versions. But the language attribute within the HTML markup is something we don’t use at all. We’ve found that this language markup is something that is almost always wrong. So we tend to ignore that.”
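To make the hreflang links a bit more concrete, the sketch below generates one <link rel="alternate"> tag per language version of a page. The helper name buildAlternateLinks, the example.com address and the assumption that every version lives at the root URL followed by its language code are all made up for illustration.

```typescript
// Hypothetical helper: build one <link rel="alternate"> tag per language.
// Assumption: each language version lives at <baseUrl>/<language code>/ and
// the language codes come from a trusted list, so they are not escaped here.
function buildAlternateLinks(baseUrl: string, languages: string[]): string[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return languages.map(
    (language) =>
      `<link rel="alternate" hreflang="${language}" href="${root}/${language}/">`
  );
}

const links = buildAlternateLinks("https://www.example.com/", ["en", "fr-ca"]);
```

The resulting strings would go in the <head> of every language version, so each version points at all the others.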

The lang attribute

Though there are some search engines, like Bing, that still rely on the lang attribute, these remarks have led some people to claim you do not need to bother with the lang attribute anymore. This, as I will explain below, is terrible advice.

The lang attribute can be set on the <html> element and tells other software the main language of the page. As mentioned, some search engines use the lang attribute, and it is critical for screen readers. If you want to experience what happens when you do not set the lang attribute, start up your favourite screen reader, go to the German version of the TOPdesk website and try to navigate the site.

At the time of writing you will find that the German words are pronounced using English pronunciation. Those proficient with web developer tools can see in the <head> element that the site does make use of the hreflang attribute, but this does little to remedy the experience for people who rely on screen readers. The maintainers of the site have been notified of this problem, so hopefully this will be remedied very soon. Unfortunately you can find plenty of other examples if you pay attention to the lang attribute. As John Mueller mentioned: “We’ve found that this language markup is something that is almost always wrong.”

Besides helping screen readers determine the language of the page, the lang attribute is also used for a variety of other things, such as spelling checkers, hyphenation through the CSS hyphens property, and translation software.

Dynamically setting the lang attribute

So, with the use and importance of the lang attribute established, the question becomes: how do you deal with it? While single-language pages can simply set the lang attribute on the <html> element and be done, international corporations and organisations are not so lucky. TOPdesk itself currently supports 13 languages. Having 13 copies of the same HTML page would make the codebase messy and hard to maintain. There are all kinds of i18n (internationalisation) libraries that will help with presenting your page in the language of the user's choosing, without requiring multiple copies of your HTML code. In a similar way you can use some simple JavaScript, document.documentElement.setAttribute("lang", "fr-fr");, to dynamically set the language of your page.

Though this will change the lang attribute of your HTML, the effectiveness of this code is heavily reliant on when you use it. To understand what can go wrong it is useful to know how screen readers determine the language of the page. When you surf to a website the browser requests the page you want to view from the server. The server then prepares a page and sends it to the browser. From this point the browser will run the code the server has sent. Once a screen reader has read an HTML element, it determines the language of that element based on the lang attribute at the moment of reading, regardless of later changes made by JavaScript code.

Image: Simplification of how a screen reader gets the code from a web page.

Some sample Vue code

Luckily this leaves us a small window in which to set the lang attribute: either before the server sends the page to the browser, or in the time between an HTML element being served and the screen reader actually reading it. You can exploit the first option by using a framework that renders server-side, like Next.js or Nuxt. The other option is to set the lang attribute browser-side before the screen reader can read the HTML code. From personal testing I’ve found a gap of approximately 100 milliseconds between adding an HTML element to the DOM (Document Object Model) and the screen reader reading the element, during which you can still change the lang attribute. The Vue 3 code sample in this section shows how to achieve this. With some modifications I suspect it will translate well to other frameworks.
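For the first option, setting the attribute before the page leaves the server, a framework-independent sketch might look like the one below. The function renderPage and its template are hypothetical and only meant to show the idea; frameworks like the ones mentioned above have their own mechanisms for this.

```typescript
// Hypothetical server-side template function. The lang attribute is part of
// the markup before it is sent, so a screen reader never reads the page
// without it.
// Assumption: title and body are already safe, escaped HTML.
function renderPage(lang: string, title: string, body: string): string {
  // Reject values that do not look like a language code, so the attribute
  // cannot be used to inject other markup.
  if (!/^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$/.test(lang)) {
    throw new Error(`Not a valid language code: ${lang}`);
  }
  return [
    "<!DOCTYPE html>",
    `<html lang="${lang}">`,
    `<head><meta charset="utf-8"><title>${title}</title></head>`,
    `<body>${body}</body>`,
    "</html>",
  ].join("\n");
}

const germanPage = renderPage("de", "Willkommen", "<p>Hallo Welt</p>");
```

Because the attribute is in the markup from the very first byte, there is no race with the screen reader at all.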

Vue 3 is a JavaScript framework for creating dynamic websites. It uses JavaScript code to make changes to the DOM in the browser without having to request a new page from the server for every change. Since these changes all happen browser-side we will have to make use of the small period between the DOM being updated and the screen reader reading the new elements in the DOM. In our case we dynamically set the locale for i18n and then use Vue's watchEffect function in the main JavaScript file to change the lang attribute to the locale, or to the fallback locale i18n will use. watchEffect runs its callback immediately and again whenever the reactive values it reads change.

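import { watchEffect } from "vue";
// i18n is the instance created with createI18n (see "Setting the locale").

watchEffect(() => {
	const { availableLocales, locale, fallbackLocale } = i18n.global;
	// Use the requested locale if messages exist for it, otherwise the fallback
	const usedLocale = availableLocales.includes(locale)
		? locale
		: (fallbackLocale as string);
	// Set the lang attribute on the <html> element
	document.documentElement.setAttribute("lang", usedLocale);
});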
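One case the availableLocales.includes(locale) check does not cover is a locale with a region, such as fr-CA, when only fr messages are available: the check fails, so the lang attribute ends up as the fallback locale. A sketch of a helper that tries the bare language before falling back could look like this. The name matchSupportedLocale is hypothetical and not part of any i18n library.

```typescript
// Hypothetical helper: pick the best supported locale for a requested one.
// It tries an exact (case-insensitive) match first, then the language part
// on its own, and finally returns the fallback.
function matchSupportedLocale(
  requested: string,
  supported: string[],
  fallback: string
): string {
  const wanted = requested.toLowerCase();
  for (const candidate of supported) {
    if (candidate.toLowerCase() === wanted) {
      return candidate;
    }
  }
  const languageOnly = wanted.split("-")[0];
  for (const candidate of supported) {
    if (candidate.toLowerCase() === languageOnly) {
      return candidate;
    }
  }
  return fallback;
}

const forCanadianFrench = matchSupportedLocale("fr-CA", ["en", "fr"], "en");
const forGerman = matchSupportedLocale("de", ["en", "fr"], "en");
```

With messages for en and fr only, forCanadianFrench resolves to "fr" and forGerman to the fallback "en". Whether you want the regional code or the bare language in the lang attribute is a design choice; the bare language at least keeps the pronunciation in the right language.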

Setting the locale

How you dynamically set your locale in i18n is up to you. You can use the browser language by setting locale = navigator.language, add the language code to your URL path, or retrieve it in some other way. The code sample below shows how to set up i18n to use a language code from your URL path. The code assumes you add the language code immediately after the root URL (e.g. https://www.topdesk.com/nl or https://www.mozilla.org/en-US/). It falls back to the browser language if the path does not contain a language code.

import { createI18n } from "vue-i18n";
// en and fr are the message objects for the supported languages.

// Select the first part of the url path after the root url
const pathLanguage = window.location.pathname.split("/")[1];
// Set locale to the browser language
let locale = navigator.language;

// If the string from the url path resembles a language code, overwrite locale.
if (pathLanguage && /^[A-Za-z]{2}(-[A-Za-z]{2})?$/.test(pathLanguage)) {
	locale = pathLanguage;
}

export const i18n = createI18n({
	locale: locale,
	fallbackLocale: "en",
	messages: {
		en,
		fr
	},
	silentTranslationWarn: true
});

Using the URL path is an easy way to let your users select their preferred language for your site. When a user selects a different language, simply redirect them to the same URL with a different language code in the path. Make sure that whatever method you use for redirection actually reloads the page. Otherwise the screen reader will already have settled on the language it originally found.
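The redirect described above comes down to swapping one part of the path. The helper below is a sketch of that step. The name withLanguageInPath is made up, and it uses the same rough pattern for recognising a language code as the sample above, so a two-letter first path segment that is not a language would be replaced as well.

```typescript
// Hypothetical helper: replace (or add) the language code at the start of a
// URL path, e.g. "/nl/pricing" -> "/fr/pricing".
function withLanguageInPath(pathname: string, language: string): string {
  const segments = pathname.split("/"); // "/nl/pricing" -> ["", "nl", "pricing"]
  const first = segments[1] || "";
  if (/^[A-Za-z]{2}(-[A-Za-z]{2})?$/.test(first)) {
    // The path already starts with a language code: swap it.
    segments[1] = language;
  } else {
    // No language code yet: insert one right after the root.
    segments.splice(1, 0, language);
  }
  return segments.join("/");
}

const frenchPricing = withLanguageInPath("/nl/pricing", "fr");
```

In the browser you could then pass the result to window.location.assign, which navigates to the new address and so loads the page again instead of only updating it in place.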

Multilanguage pages and user generated content

This leaves us with one more complication: web pages using multiple languages. Your web page might contain a quote or other content in a different language than the main language of the page. In this case setting the lang attribute on the <html> element will not be good enough. For those situations you can set the lang attribute on other HTML elements, like the <div>, <span> and <p> elements, as well. This way you can indicate to screen readers that all content in a specific element is in a different language. As described above, you can even add new elements to the DOM with a different language than the main page. As long as you set the lang attribute on the new element within approximately 100 milliseconds, screen readers will still correctly read the language you have set.
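The safest way to stay inside that window is to set the lang attribute before the element is added to the page at all. The sketch below does that. The helper createInLanguage and the two small interfaces are hypothetical; they list only the DOM members the sketch uses, so it is not tied to a real browser.

```typescript
// Only the element members this sketch needs.
interface LangCapable {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
}

// Anything that can create such elements.
interface ElementSource<E extends LangCapable> {
  createElement(tagName: string): E;
}

// Hypothetical helper: create an element whose lang attribute is already
// set by the time the caller inserts it into the page.
function createInLanguage<E extends LangCapable>(
  source: ElementSource<E>,
  tagName: string,
  text: string,
  lang: string
): E {
  const element = source.createElement(tagName);
  element.setAttribute("lang", lang); // set before the element reaches the DOM
  element.textContent = text;
  return element;
}
```

In the browser you would presumably pass document as the source and append the returned element yourself, for example to add a French quote to an otherwise English page.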

User generated content

While this provides a good solution for content you control, content you do not control adds one final layer of complexity to designating the correct language on your web page. It can be hard to predict the language of content generated by users. Though you can use techniques such as letting user-generated content inherit the language of the HTML element in which the content was added, or allowing the user to designate the language of the content, you cannot guarantee all content will receive the correct language designation.

The Web Content Accessibility Guidelines acknowledge this problem, providing ways to claim conformance with the guidelines for the parts of the website that do not contain content from uncontrolled sources. When you really are unsure about the language of your content you can also use lang="" to indicate the language is undetermined. Be aware, though, that the end result is up to the screen reader. From my own testing I found that NVDA will use the language of the NVDA voice settings, while JAWS will fall back to the closest HTML element with a determined language. So in general it is better to attempt to determine the language if you can.
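Put together, the choice of a lang value for a piece of user content could be sketched as below. The function langForUserContent and its two hints are hypothetical: userSelected stands for a language the author picked in your interface, detected for the outcome of whatever detection you may run. Only when neither is available does the sketch return the empty string for "undetermined".

```typescript
interface ContentLanguageHints {
  userSelected?: string; // language the author designated, if any
  detected?: string; // result of any language detection, if any
}

// Hypothetical helper: decide the value of the lang attribute for a piece of
// user-generated content. Prefers the author's own choice, then a detected
// language, and only then marks the language as undetermined.
function langForUserContent(hints: ContentLanguageHints): string {
  const candidates = [hints.userSelected, hints.detected];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate.trim();
    }
  }
  return ""; // lang="" marks the language as undetermined
}

const chosen = langForUserContent({ userSelected: "nl", detected: "de" });
const unknown = langForUserContent({});
```

Here chosen is "nl" because the author's choice wins, and unknown is the empty string. If you would rather let the content inherit the language of its parent element, leave the lang attribute off instead of using the empty string.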

The future?

Progress in language recognition software is making it possible to clear this final hurdle as well. Booking.com is a great example of a website that uses language recognition software. It was developed to offer translations for customer reviews, but is also used to set the lang attribute on individual reviews. Though it can currently still be expensive to use this technology, once free versions become available there will be ever more opportunity to make the internet accessible to all.

About the author: Floris Claassens

Floris is a Software Developer and a member of the Accessibility guild at TOPdesk. Interested in accessibility, elegant code, and mathematics.
