This blog post presents an online (machine) learning solution for bitrate adaptation that focuses on low-latency applications.

What makes streaming adaptive?

To keep pace with the explosion of video traffic over the past decade, significant progress has been made in the design of adaptive video streaming solutions and standards. Most notably, the MPEG Dynamic Adaptive Streaming over HTTP (DASH) ISO standard and IETF's HTTP Live Streaming (HLS) specification have been adopted as the most widespread video delivery methods. They allow Content Delivery Networks (CDNs) to leverage the full capacity of existing Hypertext Transfer Protocol (HTTP) infrastructure instead of relying on networks of dedicated streaming servers. Such adaptive streaming solutions employ bitrate adaptation algorithms that seamlessly adjust (or adapt) the rate of the media stream to compensate for changing network conditions.

Chunked CMAF: the way forward

Adaptive streaming over HTTP introduced design choices, such as partitioning the video sequence into smaller fragments of constant duration at the server and pre-buffering at the client, that directly affect the Quality of Experience (QoE) and the achievable Quality of Service (QoS). In the context of streaming, as discussed in detail in our earlier blog post, the timeliness of video delivery is measured by latency: the time difference between the moment video content is generated and the moment it is rendered on the client's screen. Until recently, latencies above 30 s were the norm, which associated Over-The-Top (OTT) delivery with higher latency than broadcast (~4–5 s).

A first approach to closing this gap, and to reducing end-to-end latency to at least broadcast levels, was to shorten the segment duration. However, this can diminish visual quality, since each segment typically starts with a key frame and more frequent key frames lower coding efficiency. Very short segments also multiply the number of requests, which often destabilizes delivery across CDNs.
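A back-of-the-envelope sketch shows why segment duration drives latency. It assumes, following the HLS specification's (RFC 8216) recommendation that clients not start playback less than three target durations from the end of the playlist, that the player joins about three whole segments behind the live edge. Encoding, packaging and network delays are ignored, so the results are lower bounds, not measurements; the function name and defaults are illustrative.

```python
def segment_latency_floor(segment_duration_s: float, segments_behind_live: int = 3) -> float:
    """Lower bound on live latency imposed by segment duration alone.

    Illustrative assumption: the player starts `segments_behind_live` whole
    segments behind the live edge. Encoder, CDN and network delays are
    ignored, so real-world latency is higher than this value.
    """
    if segment_duration_s <= 0 or segments_behind_live < 1:
        raise ValueError("segment duration must be positive and at least one segment buffered")
    return segment_duration_s * segments_behind_live


for duration in (6.0, 4.0, 2.0):
    print(f"{duration:.0f} s segments -> at least {segment_latency_floor(duration):.0f} s latency")
```

Shortening segments from 6 s to 2 s lowers this floor from 18 s to 6 s, which is why it was the first lever pulled, despite the quality and CDN costs noted above.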

Another approach, now adopted as the main discipline for low-latency streaming, combines a new container format with chunked transfer, supported natively in HTTP/1.1 and later. The Common Media Application Format (CMAF) is simply a standardized container that can hold video, audio, or text data. Its efficiency, whether deployed with HLS or DASH, stems from the fact that CMAF-wrapped media segments can be referenced simultaneously by HLS playlists and DASH manifests. Content owners can therefore package and store a single set of files instead of one per format, halving storage costs.

Under CMAF, media are organized into multiple quality representations: multiple resolutions to meet the diverse display capabilities of different user devices, and multiple encoding rates to facilitate adaptation to changing network characteristics. Each representation is further partitioned into multiple fragments (or chunks), which increases the granularity of the sequential bitrate decisions and reduces latency. Combined with chunked transfer and the newly introduced CMAF ingest protocol, this puts in place all the means necessary to significantly reduce live streaming latency.
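To make chunked transfer concrete: in HTTP/1.1 chunked transfer coding (RFC 9112), a response body is sent as a sequence of chunks, each preceded by its size in hexadecimal, and ends with a zero-size chunk. This is what lets a packager push CMAF chunks to the client as soon as they are encoded, before the whole segment exists. The decoder below is a minimal sketch for illustration only: it strips chunk extensions and ignores trailer fields. In practice the HTTP stack decodes chunked transfer transparently, and a browser-based player simply reads the response body progressively as it arrives. The `moof`/`mdat` payloads in the example are stand-ins, not real CMAF boxes.

```python
def decode_chunked(body: bytes) -> list[bytes]:
    """Split an HTTP/1.1 chunked-encoded body into its individual chunks.

    Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a zero-size
    chunk ends the body. Trailer fields after the last chunk are ignored.
    """
    chunks = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)           # end of the size line
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)                     # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:
            return chunks                              # last-chunk reached
        data = body[pos:pos + size]
        if len(data) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        chunks.append(data)
        pos += size + 2                                # skip data and its CRLF


# Two stand-in "CMAF chunks" delivered progressively in one HTTP response.
print(decode_chunked(b"4\r\nmoof\r\n4\r\nmdat\r\n0\r\n\r\n"))
```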

Why is low latency adaptive streaming difficult?

Nonetheless, although chunked CMAF transfer has set the stage for latency reduction, optimal bitrate adaptation over fluctuating channels remains elusive, and low-latency constraints put additional strain on bitrate adaptation algorithms. The application-level queue at the video client that stores downloaded parts of the video, also known as the buffer, exists to protect the client from abrupt changes in the communication channel (throughput, jitter, etc.). Every second of media held in the buffer is a second of delay between generation and playback, so reducing buffer occupancy directly reduces latency. Therefore, in low-latency applications the buffer length is typically capped by an upper limit (the buffer target) on the order of the target latency. A shorter buffer, though, offers less protection against channel estimation errors; these errors propagate to the bitrate decisions and can in turn severely degrade the streaming experience. Thus, one of the main challenges currently facing the multimedia industry is accurate throughput estimation to support bitrate adaptation, particularly for low-latency streaming.
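A toy simulation illustrates why a short buffer leaves little margin for estimation errors. It uses a simplified model of our own choosing, not one from the publication: each chunk of fixed media duration is downloaded at the chosen bitrate over the throughput actually available, playback drains the buffer while the download runs, and any time the buffer is empty counts as a stall. The bitrate is picked from a throughput estimate that is 30% too optimistic; all numbers (3 Mbps channel, 0.5 s chunks, 1 s vs. 10 s starting buffer) are hypothetical.

```python
def simulate(bitrate_mbps: float, actual_mbps: float, start_buffer_s: float,
             chunk_s: float = 0.5, n_chunks: int = 20) -> float:
    """Return total stall time (s) when streaming n_chunks at a fixed bitrate.

    Downloading one chunk takes chunk_s * bitrate / throughput seconds,
    during which playback drains the buffer; the chunk is then appended.
    """
    buffer_s = start_buffer_s
    stall_s = 0.0
    for _ in range(n_chunks):
        download_s = chunk_s * bitrate_mbps / actual_mbps
        if download_s > buffer_s:
            stall_s += download_s - buffer_s  # buffer ran dry mid-download
            buffer_s = 0.0
        else:
            buffer_s -= download_s
        buffer_s += chunk_s                   # downloaded chunk joins the buffer
    return stall_s


actual = 3.0               # hypothetical true throughput (Mbps)
estimate = actual * 1.3    # estimator overshoots by 30%, bitrate follows it
for start in (10.0, 1.0):
    print(f"start buffer {start:4.1f} s -> stalls {simulate(estimate, actual, start):.2f} s")
```

With these numbers the 10 s buffer absorbs the error without a single stall, whereas the 1 s buffer accumulates about 2.5 s of stalls over 20 chunks: the same estimation error that was harmless at conventional latencies becomes visible at low latency.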

Our solution: Learn2Adapt


A low-latency bitrate adaptation algorithm is, in essence, the solution to an optimization problem: constrain latency while maximizing the achievable video bitrate and ensuring uninterrupted, stable streaming. Before diving into the inner workings of our solution, called Learn2Adapt-LowLatency (L2A-LL), let's first take a brief look at existing solutions and their pitfalls.

Most proposed bitrate adaptation methods are heuristic, and they fall mainly into two categories according to the input signal used for adaptation. First, throughput-based methods estimate the available channel rate to decide on the bitrate of the streamed video; they are only as accurate as their throughput estimation module. Second, buffer-based methods use application-level signals, such as the instantaneous buffer level, to perform the adaptation; they become highly unstable in the very small buffer regime of low-latency streaming.
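For contrast, the two heuristic families can be sketched in a few lines. These are generic rules written for illustration, not the logic of any particular player: the throughput rule picks the highest bitrate below a safety-scaled harmonic mean of recent throughput samples, and the buffer rule maps the buffer level linearly onto the bitrate ladder between a "reservoir" and a "cushion" (loosely in the spirit of buffer-based approaches). The ladder, safety factor, reservoir and cushion values are hypothetical.

```python
from statistics import harmonic_mean

LADDER_MBPS = [1.0, 2.5, 5.0, 8.0]  # hypothetical bitrate ladder


def throughput_rule(samples_mbps: list[float], safety: float = 0.9) -> float:
    """Throughput-based: highest bitrate below safety * harmonic mean of recent samples."""
    estimate = safety * harmonic_mean(samples_mbps)
    feasible = [r for r in LADDER_MBPS if r <= estimate]
    return max(feasible) if feasible else LADDER_MBPS[0]


def buffer_rule(buffer_s: float, reservoir_s: float = 0.5, cushion_s: float = 2.0) -> float:
    """Buffer-based: map the buffer level linearly onto the ladder index."""
    if buffer_s <= reservoir_s:
        return LADDER_MBPS[0]                 # nearly empty: lowest rung
    if buffer_s >= reservoir_s + cushion_s:
        return LADDER_MBPS[-1]                # comfortably full: highest rung
    fraction = (buffer_s - reservoir_s) / cushion_s
    return LADDER_MBPS[int(fraction * (len(LADDER_MBPS) - 1))]


print(throughput_rule([6.0, 6.0, 6.0]), throughput_rule([10.0, 1.0]))
print(buffer_rule(1.0), buffer_rule(1.5))
```

Here `throughput_rule([10.0, 1.0])` returns 1.0 because the harmonic mean (about 1.8 Mbps) is dominated by the slow sample, so the rule is only as good as that estimate. Meanwhile `buffer_rule` jumps from 1.0 to 2.5 Mbps when the buffer grows from 1.0 s to 1.5 s, which hints at why buffer-based rules oscillate when the whole buffer spans only a couple of seconds.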

Recent adaptation algorithms, in an effort to overcome these limitations, resort to learning and optimization techniques such as reinforcement learning or dynamic programming to attain optimal quality adaptation. However, their practical use for low latency may be hindered by the complexity of exploring the complete optimization space or by their need for a channel model.

L2A-LL offers a novel perspective on the problem, from the point of view of online optimization, removing the need for throughput estimation altogether.

How does it work?

Figure 1: Schematic representation of bitrate adaptation via L2A-LL

L2A-LL formulates the bitrate adaptation optimization problem under an online (machine) learning framework.

First, as shown in Figure 1, we assume that the video client consists of the buffer queue that stores video data, the display that renders the media content, and the adaptive streaming logic. In L2A-LL, this logic is modeled by a learning agent whose objective is to minimize the average buffer displacement during the streaming session. The video client is connected to a repository that contains the video sequence in multiple representations (i.e., multiple resolutions and encoding bitrates to facilitate adaptation). Second, the online optimization framework places certain requirements on the decision set (the available bitrates) and on the constraint functions. These are met by (a) letting the agent decide the video quality of each chunk according to a probability distribution over the available bitrates, and (b) deriving a constraint function associated with the upper bound of the buffer queue that adheres to time-averaged constraints. Third, the channel rate evolution is modeled as an adversarial setting: the cost of each decision (i.e., the throughput during the corresponding download) is revealed only after the decision has been taken. Notably, the sequential bitrate decisions rely on historical values received as feedback, without requiring any complex throughput estimation.
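The three ingredients above (a probability distribution over bitrates, a time-averaged buffer constraint, and costs revealed only after each decision) can be illustrated with a generic online-learning loop. This is not L2A-LL's update rule, which is given in our publication. It is a minimal sketch that swaps in a standard technique: projected online gradient descent on the probability simplex, combined with a virtual queue that accumulates constraint violations. The objective (maximize expected bitrate), the constraint (on average, downloading a chunk should take no longer than the chunk's duration, so the buffer does not drain), the ladder, chunk duration and step size are all assumptions. For reproducibility the sketch maps the distribution to a rung deterministically, whereas a randomized agent would sample from it.

```python
LADDER_MBPS = [1.0, 2.5, 5.0]  # hypothetical bitrate ladder
CHUNK_S = 0.5                  # hypothetical chunk duration (s)
STEP = 0.05                    # hypothetical learning rate


def project_to_simplex(v: list[float]) -> list[float]:
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:       # largest such j determines the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def pick_bitrate(weights: list[float]) -> float:
    """Map the distribution to a rung: highest bitrate not above the expected bitrate."""
    expected = sum(w * r for w, r in zip(weights, LADDER_MBPS))
    feasible = [r for r in LADDER_MBPS if r <= expected + 1e-9]
    return max(feasible) if feasible else LADDER_MBPS[0]


def run(throughputs_mbps: list[float]) -> tuple[list[float], list[float]]:
    """Run the online loop; each throughput is revealed only after the decision."""
    n = len(LADDER_MBPS)
    weights = [1.0 / n] * n   # start from the uniform distribution
    queue = 0.0               # virtual queue: accumulated constraint violation
    decisions = []
    for throughput in throughputs_mbps:
        decisions.append(pick_bitrate(weights))  # decide first...
        # ...then observe this round's throughput (feedback, no prediction).
        expected = sum(w * r for w, r in zip(weights, LADDER_MBPS))
        # Constraint g(w) = download time - chunk duration (<= 0 on average).
        violation = CHUNK_S * expected / throughput - CHUNK_S
        # Gradient of [-expected bitrate + queue * g(w)] with respect to w.
        grad = [-r + queue * CHUNK_S * r / throughput for r in LADDER_MBPS]
        weights = project_to_simplex([w - STEP * g for w, g in zip(weights, grad)])
        queue = max(queue + violation, 0.0)
    return decisions, weights


print(run([100.0] * 50)[0][-1], run([0.5] * 50)[0][-1])
```

Feeding the loop a constant 100 Mbps drives the distribution to the top rung, while a constant 0.5 Mbps (below every rung) makes the virtual queue grow until the distribution settles on the bottom rung, all without ever forming a throughput estimate.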

The resulting bitrate adaptation problem is solved by L2A-LL, a novel adaptation algorithm grounded in online optimization theory. A strong property of L2A-LL is that its formulation is modular: further QoE factors (and their relative priorities) can be incorporated to account for different QoE objectives, streaming scenarios, and user classes.
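What "modular" can mean in practice is sketched below as a weighted sum of independent QoE terms, where adding a factor or reprioritizing one only touches the list of terms. The terms (average bitrate, switching magnitude, stall time, latency), the `Session` type and the weights are hypothetical and chosen for illustration; they are neither L2A-LL's formulation nor the evaluation metric of any particular study.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Session:
    bitrates_mbps: Sequence[float]  # bitrate of each played chunk
    stall_s: float                  # total rebuffering time
    latency_s: Sequence[float]      # latency sampled once per chunk


def avg_bitrate(s: Session) -> float:
    return sum(s.bitrates_mbps) / len(s.bitrates_mbps)


def switch_magnitude(s: Session) -> float:
    # Sum of absolute bitrate changes between consecutive chunks.
    return sum(abs(b - a) for a, b in zip(s.bitrates_mbps, s.bitrates_mbps[1:]))


def stall_time(s: Session) -> float:
    return s.stall_s


def avg_latency(s: Session) -> float:
    return sum(s.latency_s) / len(s.latency_s)


Term = tuple[float, Callable[[Session], float]]


def qoe(session: Session, terms: Sequence[Term]) -> float:
    """Weighted sum of QoE terms; negative weights penalize a factor."""
    return sum(weight * term(session) for weight, term in terms)


# Two hypothetical user classes that differ only in how much latency matters.
default_terms = [(1.0, avg_bitrate), (-1.0, switch_magnitude), (-4.0, stall_time), (-0.5, avg_latency)]
latency_sensitive = [(1.0, avg_bitrate), (-1.0, switch_magnitude), (-4.0, stall_time), (-2.0, avg_latency)]

session = Session(bitrates_mbps=[1.0, 2.5, 2.5], stall_s=0.0, latency_s=[1.0, 1.0, 1.0])
print(qoe(session, default_terms), qoe(session, latency_sensitive))
```

The same session scores 0.0 under the default weights and -1.5 for the latency-sensitive class; only the weight list changed.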

What does it offer?

L2A-LL provides performance guarantees, which can be very useful for commercial deployment, and it requires no statistical assumptions about, or estimates of, the unknowns. As shown in real-world experiments, L2A-LL performs well over a wide spectrum of network and application scenarios thanks to its core design principle: its ability to learn. This property makes L2A-LL a robust bitrate adaptation solution and places it among the small set of bitrate adaptation algorithms that overcome the main limitation of existing approaches: dependence on complex statistical models.

Our publication provides a detailed technical description of L2A-LL, along with an experimental evaluation showing that L2A-LL halves latency relative to broadcast levels while providing a high average streaming bitrate, without impairing the overall QoE. This result held across the channel and application scenarios tested. L2A-LL is lightweight, can be deployed even on mobile devices, and allows for modular QoE adjustments and prioritization. This is highly relevant to modern adaptive streaming, where OTT video service providers keep expanding their services to more diverse user classes, network scenarios, and streaming applications.

Don’t take our word for it

L2A-LL was submitted to a joint public call for low-latency bitrate adaptation algorithms issued by Twitch and ACM MMSys'20. In rigorous real-world experiments, L2A-LL reduced latency significantly (to below 2 s) while providing a high average streaming bitrate and a high overall QoE, a result consistent across all tested channel and application scenarios. L2A-LL received a favorable evaluation and was ultimately selected as the winning solution.

See it in action

L2A-LL has been contributed to dash.js (v3.2.0 and later), DASH-IF's open-source video player. A live demo of L2A-LL is also available here.