How to Achieve Low Latency With What You’ve Got

July 2, 2020

Channel zapping, sub-second, glass to glass, parity. What are we talking about? The confusing world of low latency. With multiple definitions, standards and solutions, low latency can seem like yet another technological rabbit hole. But it doesn’t have to be. After reading this blog you’ll know your hand-wave latency from your ‘chunked’ CMAF, you’ll be up to date on the award-winning low latency algorithm we’ve been working on and, most importantly, you’ll know how to get the lowest latency possible with what you already have right now.

What is latency?

Let’s start at the beginning: what is latency? Latency is the time between the same event on either side of a process. For example, I send you a file, you receive my file, and the time difference between the two is the latency. In a modern streaming workflow this is complicated by the sheer number of separate parts, each with its own set of processes, so it is important to define what you are focusing on. On the most general level, two types of latency are relevant:

Glass to glass latency

The whole process of delivering a stream, from the glass lens of the camera to the glass screen of the viewer. Also known as ‘end-to-end’ or ‘hand-wave’ latency: I wave at the camera, how long until you see me wave? That’s your overall latency.

Start-up delay

The delay from pressing play/instigating playback to video actually beginning (which does not necessarily impact glass to glass latency).

Glass to glass latency is great as a metric for comparison or as a goal, but when trying to reduce that number you’ll need to work on the individual contributory processes, the most important of which we’ll highlight in this blog. Also, it’s important to note that an increased start-up delay does not necessarily increase glass to glass latency, as some players may delay starting playback intentionally, in order to get as close as possible to the live edge.

So what is Low latency?

OK, that’s latency, but what is low latency? For large-scale live streaming, latency is generally considered ‘low’ at around 5 seconds end-to-end or lower. That’s slightly less than the latency of broadcast delivery, which makes it perfect for delivery of live sports and lower than necessary for most other use cases, especially considering the costs involved. However, it still isn’t low enough for things like teleconferencing, live auctions and online gambling.

It’s important to consider that this blog focuses on OTT delivery of video (that is, streaming over HTTP). Protocols such as WebRTC, SRT and RTMP offer the potential for (much) lower latency, but each has specific drawbacks. SRT is stymied by lack of configurability and player support, RTMP by lack of compatibility with iOS, and while WebRTC is perfect for the limited audience of a live auction, its inability to scale past a thousand or so clients makes it useless for large-scale live events. It’s also worth mentioning that implementing DRM is far more difficult for all of the above than for HTTP streaming, which can rely on a range of proven technologies such as PlayReady, Widevine and FairPlay.

Bottom-line: while low latency may be more of a challenge using HTTP streaming, its configurability, scalability and reach make it a standout winner amongst the standards currently on offer.

[Figure: Overview of different categories of latency for streaming video]

Let’s take a closer look at current latency figures. For HTTP streaming, latency can be anywhere up to 60 seconds. The good news is that this gives plenty of room for improvement. Realistically, for OTT delivery, end-to-end latency of just a few seconds is possible when your delivery pipeline is fully optimized.

So how does one achieve low latency? A lot has been written about this, both for DASH and HLS. The problem is that the approaches that are discussed most often are the most costly and complicated: chunked CMAF delivery with DASH and Apple’s own Low-Latency HLS.

Although it may be worth exploring these solutions for use cases where every second counts, they are not a hard requirement for drastically reducing latency. That’s why we’ll discuss chunked CMAF and Apple Low-Latency HLS delivery only briefly here, to spend the remainder of this blog on practical improvements that offer a higher return on investment.

Chunked CMAF

Irrespective of low latency, CMAF has its advantages, as it can be referenced in both HLS and DASH manifests. When it comes to low latency, adding extra ‘moof’ and ‘mdat’ boxes allows a CMAF segment to be split into smaller ‘chunks’. This means a CMAF segment containing only one IDR frame at the beginning can be transmitted in smaller chunks while the rest of the GOP is still being encoded. The benefit of chunked CMAF is that it allows you to deliver streams over HTTP with the lowest possible latency (if the rest of your setup is already fully optimized). The downside is that your encoder, origin, CDN and player must all support it in order to make it work.
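To make the chunk structure concrete, here is a minimal Python sketch (our own illustration, not part of any Unified Streaming tool) that walks the top-level ISO BMFF boxes of a segment and counts the ‘moof’ boxes: a regular CMAF segment has one ‘moof’/‘mdat’ pair, a chunked one has several. The helper `box()` and the payloads in the example are synthetic placeholders, not a playable segment.

```python
import struct
from typing import List, Tuple


def top_level_boxes(data: bytes) -> List[Tuple[str, int]]:
    """Return (type, size) for each top-level ISO BMFF box in `data`."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit 'largesize' follows the type field.
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize box at offset {offset}")
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        boxes.append((raw_type.decode("latin-1"), size))
        offset += size
    return boxes


def count_chunks(segment: bytes) -> int:
    """Number of CMAF chunks, i.e. 'moof' boxes (each followed by its 'mdat')."""
    return sum(1 for box_type, _ in top_level_boxes(segment) if box_type == "moof")


def box(box_type: str, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a box around a synthetic payload (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


if __name__ == "__main__":
    # One 'styp' followed by two moof/mdat pairs: a segment split into two chunks.
    segment = (box("styp", b"cmfc") + box("moof") + box("mdat", bytes(16))
               + box("moof") + box("mdat", bytes(16)))
    print(top_level_boxes(segment))
    print(count_chunks(segment))  # 2
```

In a real chunked segment each ‘moof’ carries its own fragment headers and decode time, which is what lets a player start decoding the first chunk before the segment is complete.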

In addition, it complicates bitrate adaptation algorithms that rely on segment download times (amongst other things) to maximize the requested bitrate while guarding against bitrate fluctuations and stalling events. Chunks are delivered as fast as the encoder produces them rather than as fast as the network allows, and given their very small data sizes, it’s very difficult to determine the currently available network throughput, which in turn makes it hard to properly adapt the requested bitrate. Overall, this can have a detrimental effect on user experience.
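The throughput problem can be illustrated with a deliberately simplified model (our own assumption: it ignores round-trip time, TCP slow start and idle gaps between requests). A classic estimator divides the bytes of a segment by its download time. When a complete segment is fetched, the download is limited by the network; when a chunked segment is fetched at the live edge, bytes can only arrive as fast as the encoder produces them, so the download takes roughly as long as the segment duration no matter how fast the link is. The function names below are hypothetical.

```python
def naive_throughput_bps(size_bytes: int, download_seconds: float) -> float:
    """Segment-based estimate: bits received divided by wall-clock download time."""
    return size_bytes * 8 / download_seconds


def download_seconds(size_bytes: int, link_bps: float, segment_seconds: float, chunked: bool) -> float:
    """Toy model of how long one segment takes to arrive."""
    transfer = size_bytes * 8 / link_bps
    if chunked:
        # At the live edge the data does not exist yet: it trickles in at
        # encoding speed, so the download cannot finish before the segment ends.
        return max(transfer, segment_seconds)
    return transfer


if __name__ == "__main__":
    bitrate_bps, segment_s, link_bps = 3_000_000, 2.0, 20_000_000
    size = int(bitrate_bps * segment_s / 8)  # 750,000 bytes
    for chunked in (False, True):
        t = download_seconds(size, link_bps, segment_s, chunked)
        print(f"chunked={chunked}: estimate {naive_throughput_bps(size, t) / 1e6:.1f} Mbit/s")
```

With a 20 Mbit/s link, the non-chunked estimate is 20 Mbit/s, while the chunked estimate collapses to the 3 Mbit/s bitrate of the stream itself, so an adaptation algorithm relying on it never sees the headroom it would need to switch up.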

LL-HLS

Apple’s low latency offering first came in 2019 and made use of HTTP/2 Push, something not supported by most CDNs. After the CDNs largely failed to adopt this approach, Apple announced updates to the specification in early 2020, with support rolling out in its upcoming macOS, iOS, iPadOS and tvOS software releases this autumn. The new specification includes tags such as EXT-X-PRELOAD-HINT, which allow players to request the next part of a segment in advance so that it can be delivered as soon as it becomes available (where each individual part may be a CMAF chunk, see above). The large changes to LL-HLS in 2020 left anyone who had implemented the earlier version needing to completely redesign their solution, delaying widespread adoption. And, like chunked CMAF, it works best when the encoder, origin, CDN and player all support it, making it a significant effort to implement.
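For illustration, here is a minimal Python sketch of the player side: it reads the last EXT-X-PRELOAD-HINT tag from an LL-HLS media playlist and returns its attributes, so the player knows which part to request next. The playlist, file names and helper names are made up for the example, and the attribute parser only handles the simple cases shown; a production player should follow the full attribute-list grammar of the HLS specification.

```python
import re
from typing import Dict, Optional

# NAME=VALUE pairs in an HLS attribute list; quoted values may contain commas.
_ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

# Hypothetical LL-HLS media playlist with one complete part and a hint for the next.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.002
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:269
#EXTINF:4.00008,
fileSequence269.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart270.0.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart270.1.mp4"
"""


def parse_attributes(attribute_list: str) -> Dict[str, str]:
    """Parse 'A=B,C="d,e"' into a dict, stripping quotes from quoted values."""
    return {
        name: value[1:-1] if value.startswith('"') else value
        for name, value in _ATTRIBUTE.findall(attribute_list)
    }


def preload_hint(playlist: str) -> Optional[Dict[str, str]]:
    """Attributes of the last EXT-X-PRELOAD-HINT tag in the playlist, or None."""
    hint = None
    for line in playlist.splitlines():
        if line.startswith("#EXT-X-PRELOAD-HINT:"):
            hint = parse_attributes(line.split(":", 1)[1])
    return hint


if __name__ == "__main__":
    print(preload_hint(SAMPLE_PLAYLIST))  # {'TYPE': 'PART', 'URI': 'filePart270.1.mp4'}
```

The idea is that the player requests the hinted URI straight away, and the server answers that request as soon as the part exists, instead of the player waiting for the next playlist update to learn about it.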

How low is low enough?

Considering the complicated nature of the solutions that we discussed briefly above, the question is how low your latency actually needs to be. Is the lowest possible latency between that ball being thrown on the field and your customer seeing it truly your end goal? Or is it the difference in latency that your customers experience between devices that’s bothering you? Once you accept that a system will always add latency (intentionally even, in the case of ‘broadcast delay’), maybe you just want to make sure that everybody is seeing the same thing at the same time:

Broadcast parity

Where, irrespective of device, the same thing is happening at the same time. When Shane Long scored the quickest ever Premier League goal in 2019, in 7.69 seconds, viewers relying on OTT delivery were still waiting for kick-off as TV viewers finished their celebrations.

For broadcast parity, once you have reached your target latency you need a way to keep it there. To achieve this, a player may, for example, slightly increase or decrease its playback speed. (One method that allows you to define this is DVB-DASH Low Latency mode, more on which later.)

Of course you need to get your HTTP streaming latency in the same ballpark as your broadcast latency before you can start thinking about parity. So now that we know what latency is all about, let’s look at some practical ways to pare it down.

Where is latency added & how can I reduce it?

If we’re being pedantic, every, single, step, of, the, way. As soon as light hits the lens every process and transfer the signal traverses will add latency:

Transmission

Where does your live stream encoder sit? Because if it’s at the end of your broadcast process your OTT delivery will most likely have more delay than your broadcast delivery by design.

Start closer to the source: get as close as possible. That’s easy in theory, but in practice it’s expensive at best and impossible at worst if you take your feeds from outside. That is why this blog doesn’t focus on this, or on other parts of the workflow that most likely aren’t in your control to change.

Encoding

The gains you accomplish here come from your settings: they will make the encoding process quicker and give you plenty of options when it comes to setting up your player.

Smaller GOPs: The smaller your GOP size, the sooner each GOP is complete and can be passed to your publishing point. Bear in mind that larger GOPs compress more efficiently, so there will be a trade-off in ‘quality of experience’ for the end user as GOP size reduces. Currently most setups use GOP sizes of around 2 to 4 seconds (e.g., for 25fps video with 48kHz audio we would suggest 1.92 seconds).

Aligned streams: Your GOP size should still allow for alignment between your video and audio tracks. That is, your GOP duration should ideally be an exact multiple of the audio frame duration as well as of the video frame duration. Our online documentation has an excellent explanation of how to best align your streams.
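The 1.92 second figure follows from simple arithmetic, sketched here in Python with hypothetical helper names. Assuming AAC audio with 1024 samples per frame (our assumption; other codecs use different frame sizes), a 48kHz audio frame lasts 1024/48000 s and a 25fps video frame lasts 1/25 s. The shortest duration that holds a whole number of both is their least common multiple, 0.32 s, so any multiple of 0.32 s aligns, and 6 × 0.32 = 1.92 s is the longest such duration that does not exceed 2 seconds.

```python
from fractions import Fraction
from math import gcd
from typing import Union

Rate = Union[int, Fraction]


def alignment_period(fps: Rate, sample_rate: int, samples_per_audio_frame: int = 1024) -> Fraction:
    """Shortest duration (in seconds) holding a whole number of video AND audio frames."""
    video_frame = 1 / Fraction(fps)
    audio_frame = Fraction(samples_per_audio_frame, sample_rate)
    # For reduced fractions a/b and c/d: lcm(a/b, c/d) = lcm(a, c) / gcd(b, d).
    a, b = video_frame.numerator, video_frame.denominator
    c, d = audio_frame.numerator, audio_frame.denominator
    return Fraction(a * c // gcd(a, c), gcd(b, d))


def aligned_gop(fps: Rate, sample_rate: int, target_seconds: Rate,
                samples_per_audio_frame: int = 1024) -> Fraction:
    """Longest aligned GOP duration not exceeding the target (at least one period)."""
    period = alignment_period(fps, sample_rate, samples_per_audio_frame)
    return max(1, Fraction(target_seconds) // period) * period


if __name__ == "__main__":
    print(float(alignment_period(25, 48000)))                     # 0.32
    print(float(aligned_gop(25, 48000, 2)))                       # 1.92
    print(float(alignment_period(Fraction(30000, 1001), 48000)))  # about 21.35
```

Exact fractions avoid floating-point drift in the comparison. The 29.97fps case (30000/1001) shows why frame rate matters: with 48kHz AAC the shortest exactly aligned duration is 8008/375 s, over 21 seconds, so exact alignment at typical GOP sizes is not possible for that combination.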

Origin

The Origin itself only takes <100ms to process your stream, but how you set up your publishing point can have a dramatic effect on your latency.

Segment size: Segments are made up of a number of fragments sent by your encoder (where each fragment usually contains one GOP). The smaller your segments, the sooner they can be passed on to the player. Segment sizes have already shrunk considerably, with Apple reducing its recommended segment length from 10 to 6 seconds a few years back (which, as you will see shortly, can reduce latency by up to 12 seconds). In fact, according to a 2019 report, segment size reduction is how over 40% of streaming companies target latency reduction. However, be aware that reducing segment size will increase the overhead on your system, as it creates a larger number of segments overall.

DRM signaling behavior: If you’re using DRM, it will add a slight delay to playback because a license needs to be retrieved. By including DRM information in the DASH MPD or HLS Master Playlist, using our --mpd.inline_drm option for DASH or configuring the signaling behavior using CPIX, the player can begin license requests immediately rather than waiting for the first segment of video. Just make sure your player is set up to read and use the DRM information from the manifest.
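On the player side, reading the DRM information from the manifest comes down to finding the ContentProtection elements in the MPD. The Python sketch below lists the DRM systems for which the MPD already carries a ‘cenc:pssh’ element, which is what a player needs to start a license request before any segment arrives. The sample MPD is a hypothetical, heavily trimmed example with a placeholder PSSH value, and the function name is our own; the exact output of our inline DRM option may differ, so check against your own manifests.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011", "cenc": "urn:mpeg:cenc:2013"}

# DRM system IDs as they appear in ContentProtection@schemeIdUri.
KNOWN_SYSTEMS = {
    "urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed": "Widevine",
    "urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95": "PlayReady",
}

# Hypothetical, trimmed MPD; "AAAA" stands in for a real base64 PSSH box.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:cenc="urn:mpeg:cenc:2013" type="dynamic">
  <Period id="1">
    <AdaptationSet contentType="video">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <ContentProtection schemeIdUri="urn:uuid:EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED">
        <cenc:pssh>AAAA</cenc:pssh>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def systems_with_inline_pssh(mpd_xml: str) -> List[str]:
    """DRM systems whose PSSH is in the MPD, so licensing can start immediately."""
    root = ET.fromstring(mpd_xml)
    found = set()
    for cp in root.iter(f"{{{NS['mpd']}}}ContentProtection"):
        # UUIDs are case-insensitive, so normalise before looking them up.
        scheme = cp.get("schemeIdUri", "").lower()
        if scheme in KNOWN_SYSTEMS and cp.find("cenc:pssh", NS) is not None:
            found.add(KNOWN_SYSTEMS[scheme])
    return sorted(found)


if __name__ == "__main__":
    # PlayReady is signalled but without a PSSH, so only Widevine qualifies.
    print(systems_with_inline_pssh(SAMPLE_MPD))  # ['Widevine']
```

If a system is signalled without a PSSH, as PlayReady is here, the player typically has to wait for the initialization segment to find it, which is exactly the delay inline signaling avoids.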

Player

This is where all your hard work comes to fruition and begins to make sense.

Smaller buffers: It’s not unusual for players to buffer 3 segments before playing; in fact, this is the default for Apple players. If your segment size is 6 seconds, that means at the very least you will be 18 seconds from the live edge, and that isn’t taking into account the latency added so far by your workflow. With smaller segments, that distance will be significantly lower even if you retain the 3 segment buffer. You can also reduce the buffer to one or two segments, but remember that the buffer is there to improve quality of experience. A shorter buffer gives the player less time to react to network fluctuations, increasing the chance of stalling, which can be very disruptive for the viewer.
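The arithmetic behind these numbers (and behind the 12 seconds mentioned under ‘Segment size’) is simple enough to sketch in Python. The helper names are our own; the model only counts the distance to the live edge caused by the player buffer, not the latency added earlier in the workflow, and it also shows how the number of segments grows as they shrink.

```python
def buffer_latency(segment_seconds: float, buffered_segments: int = 3) -> float:
    """Minimum distance from the live edge caused by the player buffer alone."""
    return segment_seconds * buffered_segments


def segments_per_hour(segment_seconds: float) -> float:
    """Segments (and so requests) per rendition per hour: the overhead side of the trade-off."""
    return 3600 / segment_seconds


if __name__ == "__main__":
    for seconds in (10, 6, 2):
        print(f"{seconds}s segments: >= {buffer_latency(seconds)}s behind live, "
              f"{segments_per_hour(seconds):.0f} segments/hour")
    # Apple's move from 10 s to 6 s segments with a 3 segment buffer: 30 - 18 = 12 s.
    print(buffer_latency(10) - buffer_latency(6))  # 12
```

Going from 6 to 2 second segments cuts the buffer-induced delay from 18 to 6 seconds, at the cost of three times as many segments to produce, cache and request.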

If you have followed the above advice you should now be seeing much lower latency figures, but testing is key here. Any reduction in latency should not adversely affect the overall experience of the viewer.

To further improve latency, there are two additional strategies to discuss: variable playback rate and latency-aware bitrate adaptation. Unified Streaming has done work on both: support for the first has been added as part of our DVB-DASH Low Latency mode implementation, and in the realm of the latter we have developed the Twitch-award-winning Learn2Adapt-LowLatency algorithm. To conclude this blog, we’ll discuss the advantages of both.

DVB-DASH Low Latency and achieving parity

As discussed earlier, broadcast parity is more important than low latency for traditional broadcasters, who are adding OTT as a supplementary service. DVB-DASH Low Latency mode adds functionality that helps bring the latency of different delivery methods into line.

Once you have worked out what the latency of your broadcast TV signal is, Low Latency mode for DVB-DASH allows you to add a ‘ServiceDescription’ element to the MPD to signal the latency target and how much a player is allowed to vary its playback rate in order to hit that target.

<ServiceDescription  id="1">
  <Scope schemeIdUri="urn:dvb:dash:lowlatency:scope:2019" />
  <Latency
    target="3000"
    max="6000"
    min="1500" />
  <PlaybackRate
    max="1.5"
    min="0.5" />
</ServiceDescription>

You can set an overall target as well as a minimum and maximum, as seen above (the Latency values are expressed in milliseconds). When setting the playback rate boundaries, take into account that setting them too aggressively may impact the user experience and may lead to stalling.
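To show how a player might use these values, here is a minimal Python sketch that reads the ServiceDescription above and picks a playback rate with a simple proportional rule: play faster when latency is above the target, slower when below, clamped to the signalled PlaybackRate bounds. The controller, its `gain` parameter and the helper names are our own illustrative assumptions, not behaviour required by DVB-DASH or implemented by any particular player. The snippet is parsed on its own, without the MPD namespace a full manifest would add.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The ServiceDescription from the example above (no MPD namespace here).
SERVICE_DESCRIPTION = """<ServiceDescription id="1">
  <Scope schemeIdUri="urn:dvb:dash:lowlatency:scope:2019" />
  <Latency target="3000" max="6000" min="1500" />
  <PlaybackRate max="1.5" min="0.5" />
</ServiceDescription>"""


@dataclass
class LatencyTargets:
    target_s: float
    min_s: float
    max_s: float
    min_rate: float
    max_rate: float


def parse_service_description(xml_text: str) -> LatencyTargets:
    root = ET.fromstring(xml_text)
    latency = root.find("Latency")
    rate = root.find("PlaybackRate")
    return LatencyTargets(
        # Latency attributes are expressed in milliseconds.
        target_s=int(latency.get("target")) / 1000,
        min_s=int(latency.get("min")) / 1000,
        max_s=int(latency.get("max")) / 1000,
        min_rate=float(rate.get("min")),
        max_rate=float(rate.get("max")),
    )


def playback_rate(current_latency_s: float, t: LatencyTargets, gain: float = 0.1) -> float:
    """Hypothetical proportional rule: +10% speed per second behind target (gain=0.1),
    clamped to the signalled PlaybackRate range. min_s/max_s are not used here."""
    rate = 1.0 + gain * (current_latency_s - t.target_s)
    return min(t.max_rate, max(t.min_rate, rate))


if __name__ == "__main__":
    targets = parse_service_description(SERVICE_DESCRIPTION)
    for latency in (2.0, 3.0, 4.0, 10.0):
        print(latency, round(playback_rate(latency, targets), 2))
```

A small gain keeps speed changes hard to notice. Speeding up also drains the buffer faster than it fills, which is where the stalling risk of aggressive rate bounds comes from.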

One of the clear benefits of this approach is that it is backwards compatible with (DVB-)DASH clients that do not support the additional functionality, as there is nothing ‘special’ about the media segments and clients can simply ignore the additional elements in the MPD.

If you are interested in adding support for DVB-DASH Low Latency mode to your stream, please contact us so that we can share how to set it up.

Our Award winning Low-Latency algorithm

As previously mentioned, when you split segments into ever smaller chunks, current solutions for throughput estimation become inaccurate. According to Twitch, one of the main obstacles in low latency bitrate adaptation is throughput estimation. One of our research engineers, Theo Karagkioules, has designed an algorithm called Learn2Adapt-LowLatency (L2A-LL), which uses machine learning techniques to provide an adaptation strategy that achieves latency of less than 2 seconds. His algorithm won the Twitch ‘near-second latency’ Grand Challenge at MMSys2020. It has also been implemented in a fork of the dash.js player for those who wish to put it to the test, and work is currently underway to add it to dash.js’s general release.

Conclusion

As we’ve seen, there are many low latency standards that are still a work in progress, but by now you should have a good grasp of what low latency is and what you can do right now to reduce it. You should also know what you can’t do, and whether you really want the lowest possible latency or just broadcast parity; if it’s the latter, you should have a pretty good idea of how to accomplish that too. If you take anything away from this blog, remember these 3 things:

Shorter GOPs from your encoder
Smaller segments from the origin
Smaller buffers in the player

Low latency is one of many challenges in a cutting-edge streaming workflow. Here at Unified Streaming we are always working at that edge to provide you with the best solution. Feel free to contact us about low latency or any other streaming-related issues we can help you with.

