As the video streaming market has matured over the past few years, quality of experience has improved considerably: consumers can choose from a wider variety of content offered by a wider variety of services, and almost all devices on the market can handle playback of high-definition content.

However, one part of the audience is at risk of being left behind: for those with a visual or hearing impairment, watching videos is not necessarily a great experience. To allow people with such impairments to consume your video content in a meaningful way, you can add accessibility features to your service. By doing so, you increase the availability of your service in a broad sense, by delivering it not only to all devices, but to all people as well.

What is accessibility?

So, to start: what exactly are accessibility features? They come in different forms, and most are aimed at either a hearing or a visual impairment. When it comes to video, the most familiar ones are:

  • Captions for the deaf and hard of hearing (also known as SDH, or subtitles for the deaf and hard of hearing)
  • Audio description of visual content for those with low or no vision (also known as DVS, or descriptive video service)
[Image: example of captions when there is no dialog]

Still from Stanley Donen's 'Charade' (1963), taken from Wikipedia

Captions

Captions are similar to subtitles, but rather than only transcribing spoken dialog, they also describe the non-spoken elements of the audio track, such as birds whistling or music playing. Compared to audio description, captions are more widely available: they have been part of television services since long before the internet existed, making them part of a well-established workflow.
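
To make this concrete: a captions document can carry both dialog and descriptions of non-spoken audio. Below is a minimal TTML sketch with made-up timings and text (the workflow further below packages a similar file, 'captions-en.ttml'):

<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <!-- Non-spoken audio, described for the deaf and hard of hearing -->
      <p begin="00:00:01.000" end="00:00:04.000">[birds whistling]</p>
      <!-- Regular transcription of spoken dialog -->
      <p begin="00:00:05.000" end="00:00:08.000">What a beautiful morning.</p>
    </div>
  </body>
</tt>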

Audio description

In between moments of spoken dialog, an audio description track adds spoken explanations of the most relevant visual information (e.g., somebody walking into a room). This accessibility feature isn't as widely available as captions, but it is gaining traction.

There are two ways of delivering this feature: using a separate audio track that contains only the spoken explanations, which is then mixed with the normal audio on the client side ('receiver mix'), or, more straightforwardly, by offering an additional track that contains a ready-made mix of the audio description and the normal audio ('broadcast mix').

Delivering video with 'receiver mix' audio descriptions gives the consumer more flexibility. For example, when watching in the company of others, a person with a visual impairment could play the audio track containing the descriptions on headphones, while the normal audio track plays through regular speakers. The downside is that this kind of delivery places additional demands on client-side playback capabilities.
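
If you only have a descriptions-only track, a broadcast mix can be produced ahead of time. A minimal sketch using ffmpeg, with made-up filenames (ffmpeg is not part of the packaging workflow shown further below):

#!/bin/bash

# Blend a descriptions-only track with the normal audio into a single
# ready-mixed 'broadcast mix' track that any client can play back:
ffmpeg -i audio-main.mp4 -i audio-descriptions-only.mp4 \
  -filter_complex "[0:a][1:a]amix=inputs=2:duration=first[mix]" \
  -map "[mix]" -c:a aac -b:a 128k audio-description-128k.mp4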

A broader approach

Improving accessibility is not limited to the features mentioned above. In fact, it covers everything you can take into consideration to make your content easier to consume for people with visual, hearing or other impairments:

  • Avoid high-frequency flashes in the video, as they can trigger epileptic seizures; you can check for them using tools such as PEAT (e.g., in 1997, 600 children were hospitalized after the première of a Pokémon episode)
  • Use more than one color to communicate and work with high-contrast colors for those who are color blind
  • Avoid patterned backgrounds (for those with a visual impairment)
  • Offer clean audio for the hard of hearing (i.e., a track with suppressed background noises and emphasized speech)
  • In addition to or in place of captions, add a picture-in-picture presentation of a sign language translation of the spoken dialog and relevant non-spoken audio elements
  • Create full transcripts
  • Dub the audio or add subtitles for those who are not fluent in the main language

The importance of accessibility

Offering an inclusive video streaming service by adding features like captions or an audio description track is more than a sympathetic gesture towards the (potential) customers that depend on these features. In many regions, it is or will soon be required by law; in the United States, for example, the Americans with Disabilities Act (ADA) has a significant impact on online video.

For those who worry about the financial impact: adding accessibility features might well be a sensible business strategy. As it stands, Netflix is not only the biggest provider of streaming video in the world, but also at the forefront of offering accessibility features to its customers. Amazon, another major streaming service, is making big strides in the accessibility department as well.

Increased flexibility

Moreover, accessibility features should be understood in a broader context, as they not only make content accessible to those with a visual or hearing impairment, but can also increase the flexibility of content for regular users.

Look at it like this: first, on-demand video made it possible to watch content whenever, then smartphones and 4G mobile data plans added the option to watch it wherever, and now, with accessibility features, video content can be enjoyed in whichever form fits a given time and place: with or without sound, with or without images.

In other words: by using accessibility features to increase your content’s flexibility, you might also increase the time that your customers use your service.

Take the latest episode of your favorite series as an example. You want to watch it on public transport during your commute, but forgot to bring headphones? No problem: switch on captions. Or you desperately need to catch up on that interesting documentary everyone is talking about, but also need some time away from screens? Just enable the audio description track, go for a walk and listen to the documentary. Sure, you will not get the full experience, but in some cases that is better than none.

Silence is golden

In case you still consider audio and video to be inseparable for video content, this statistic might help: in 2016, 85% of video on Facebook was watched without sound. This is quite an astonishing number, but it should be noted that sound was disabled by default at the time. 

However, when Facebook enabled sound by default a year later, critics were vocal and many news outlets wrote about how users could disable it themselves. So, while the statistics have probably shifted, it seems fair to assume that a considerable part of the videos on the biggest social network in the world are still watched in silence.

Last but not least, certain accessibility features can help optimize content for search engines, as these are far better at indexing text than video and audio. Adding a captions track to your video, or perhaps even a full transcript, can therefore make a difference in how easily (and how often) your content is found.

Successful delivery

So, what role can Unified Origin play in all of this? As with most features in the world of video streaming, it is not just about producing the content and adding it to your stream. Like other features, accessibility features need to be properly signaled, delivered, and, last but not least, supported by the player. 

Whether or not the latter is the case depends on your player of choice, which, by the way, can itself be more or less accessible. The part that Unified Origin can take care of for you is the signaling and delivery of accessibility features, for which support was added in 2017.

Using version 1.8.0 or above of the Unified Streaming Platform, you can add signaling for accessibility features with the --track_role and --track_kind options. The former can be used when creating the server manifest, the latter only when packaging the content (the step that comes before creating the server manifest).

An example workflow is shown below:

#!/bin/bash

# Packaging regular audio and video tracks to fMP4 (CMAF or ISMV)
mp4split -o video-250k.cmfv video-250k.mp4
mp4split -o video-750k.cmfv video-750k.mp4
mp4split -o video-1500k.cmfv video-1500k.mp4
mp4split -o audio-128k.cmfa audio-128k.mp4

# Packaging captions and audio description tracks to fMP4 (CMAF or ISMV)
# Including signaling for audio description:
mp4split -o audio-description-128k.cmfa \
 audio-description-128k.mp4 \
 --track_kind="about:html-kind@main-desc"
mp4split -o captions-en.cmft \
 captions-en.ttml

# Creating a server manifest, including signaling for captions.
# The '--track_description' option ensures a unique name
# for the audio description track:
mp4split -o presentation.ism \
 --hls.client_manifest_version=4 \
 video-250k.cmfv \
 video-750k.cmfv \
 video-1500k.cmfv \
 audio-128k.cmfa \
 audio-description-128k.cmfa \
 --track_description="English (describes video)" \
 captions-en.cmft --track_role=caption
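
With the server manifest in place on an origin, the DASH and HLS client manifests containing the signaling discussed below are generated on request. A sketch, with a hypothetical hostname and path:

# Unified Origin generates the client manifests dynamically,
# based on the server manifest (.ism):
curl https://example.com/video/presentation.ism/.mpd   # DASH
curl https://example.com/video/presentation.ism/.m3u8  # HLS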

Signaling accessibility

If you package your content and create a server manifest as explained above, the client manifests for DASH and HLS will include the additional signaling for the tracks with accessibility features, as shown below. For HLS, this signaling is completely compliant with Apple's General Authoring Requirements.

For DASH, the guidelines on signaling accessibility are less clear (or even confusing). The audio description signaling as shown below follows the DVB-DASH specification and the captions signaling is based on the DASH Role scheme as specified in the official DASH specification.

All of the signaling has been confirmed to work (using Apple's native video player on iOS, tvOS and macOS for HLS, and using dash.js for DASH).

Audio description in .m3u8 (HLS):

CHARACTERISTICS="public.accessibility.describes-video",AUTOSELECT=YES

Audio description in .mpd (DASH):

<Accessibility
 schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007"
 value="1">
</Accessibility>
<Role
 schemeIdUri="urn:mpeg:dash:role:2011"
 value="alternate">
</Role>

Captions in .m3u8 (HLS):

CHARACTERISTICS="public.accessibility.describes-music-and-sound",AUTOSELECT=YES

Captions in .mpd (DASH):

<Role
 schemeIdUri="urn:mpeg:dash:role:2011"
 value="caption">
</Role>
<Role
 schemeIdUri="urn:mpeg:dash:role:2011"
 value="subtitle">
</Role>

Working examples

On our features page you can find working examples that include the correct signaling for captions and audio description tracks. You can test playback yourself or download the client manifests to inspect the signaling (which is the same as shown above). However, note that these examples only demonstrate the signaling and the related player behavior, not proper captioning or actual audio descriptions, as these accessibility features are not available for Tears of Steel, the demo content that we use.