Simplifying live streaming at scale with DASH-IF

21/1/2021

Along with a number of other partners, Unified Streaming has been working on a DASH-IF project to develop a new specification for a live ingest protocol. It began as an internal project with a few other vendors a couple of years ago, after a European broadcaster highlighted the need to document what should come out of live encoders. The specification has now been published and is ready to use. In this article I'll outline why the new protocol is necessary and what some of the main considerations have been in developing it.

Partners in this project were recruited from across the streaming spectrum to ensure the specification reflects a complete view of streaming workflows. They include Microsoft, Hulu, AWS Elemental, Akamai, Comcast, Bitmovin, Qualcomm, Media Excel, Harmonic, Ateme and CenturyLink.

The project addresses live OTT streaming at scale, which is notoriously difficult to support because there are so many moving parts.

In particular, there were four areas that we felt could be improved:

Timed metadata: We wanted to simplify getting programming information and other metadata into the fMP4 presentation. We also wanted to carry splice signaling information to enable just-in-time packaging and ad insertion.

Reliability: To improve fault tolerance and redundancy, we wanted OTT workflows to be able to use multiple live encoders, origins or CDNs to achieve end-to-end failover support.

Low latency: Latency accumulates across every stage of a workflow. Having the right specifications and avoiding additional conversions will help OTT providers reach their latency goals.

Better than broadcast quality: Support for high-quality video streaming comparable to, or better than, broadcast.

By addressing these areas, we have created a protocol that is fit for the demands of modern streaming and creates a much more efficient workflow, which ultimately saves on cost and resources. Low latency is already a big topic in the industry and will open up valuable new opportunities to engage viewers and maximize advertising revenues. But it is essential that the workflow is as efficient as possible to fulfill such potential.

Rather than just standardizing a player format, we set out to create a more efficient overall workflow. We began by identifying interfaces that would benefit from standardization; in particular, we looked at the output of the adaptive bitrate encoder and of the origin server, or packager, that pushes towards a content delivery network (CDN).

A live encoder would typically push its content out because it needs to concentrate its resources on encoding. In other words, it doesn't have the capacity to deal with a lot of requests from clients. Instead, the live encoder tends to produce a frame or fragment and push it to the origin server, which in turn makes it available to clients. Both MPEG-DASH and HLS are great protocols for that client-facing delivery.

But DASH and HLS specify no behavior for pushing content: they are mostly about clients pulling content out with HTTP GET. So we developed a profile for fMP4 and CMAF that allows packaging to take place later in the workflow.

We have developed two profiles, or interfaces. The first is for workflows with a separate encoder and packager or origin server: the encoder sends CMAF tracks without a manifest, the packager does on-the-fly packaging, and the result is then pushed or linked to a CDN. The second interface lets you post, or push, DASH directly, so in this case you push both segments and manifests. Both interfaces use HTTP POST.

This is, of course, a very simplified overview of CMAF ingest. In the first interface, which has no manifest, each CMAF track is posted over its own TCP connection. A CMAF track is effectively an fMP4 file, but there is no multiplexing: each track is a separate file and is posted on its own. Something I want to make clear is that a long-running POST is not mandatory: you can use a long-running POST with chunked transfer encoding, or you can POST each fragment separately.
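To make the two posting styles concrete, here is a minimal Python sketch using only the standard library's `urllib.request`. The ingest URL and per-track path layout (`INGEST_BASE`, `/{track_name}`) and the `video/mp4` content type are assumptions for illustration, not details taken from the specification. One function builds a separate POST per CMAF object (header first, then each fragment); the other builds a single long-running POST whose body is a generator, which `urllib` sends with chunked transfer encoding because no Content-Length is set.

```python
import urllib.request

# Hypothetical ingest endpoint; the real URL layout depends on the receiver.
INGEST_BASE = "http://origin.example.com/ingest/channel1"


def build_fragment_posts(track_name, cmaf_header, fragments):
    """One POST per object: the CMAF header first, then each fragment.

    Each track gets its own URL, so tracks are never multiplexed.
    """
    url = f"{INGEST_BASE}/{track_name}"
    return [
        urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "video/mp4"},  # assumed media type
            method="POST",
        )
        for body in [cmaf_header, *fragments]
    ]


def build_long_running_post(track_name, cmaf_header, fragments):
    """A single POST whose body is produced incrementally.

    Because the body is an iterable with no Content-Length, urllib sends it
    with chunked transfer encoding, so each fragment can go out as soon as
    the encoder yields it.
    """
    def body():
        yield cmaf_header
        yield from fragments

    return urllib.request.Request(
        f"{INGEST_BASE}/{track_name}",
        data=body(),
        headers={"Content-Type": "video/mp4"},  # assumed media type
        method="POST",
    )
```

Passing each request to `urllib.request.urlopen` would send it; the sketch stops short of that so it runs without a server. In a real encoder, `fragments` for the long-running case would be an iterator that blocks until the next fragment is encoded.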

The important thing is that the receiver detects fragments as it receives them, and its processing is triggered by their arrival. This also works well for low latency, where chunks may be small: the receiver simply detects each chunk and produces the output manifest on the fly. In this setup, separate video and audio tracks are posted to the origin server. We use CMAF tracks for video and audio, but we also use them for program metadata, which is still relatively new.
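Receiver-side detection can be sketched as an incremental parser for top-level ISO BMFF boxes, each of which starts with a 32-bit big-endian size and a four-character type. This is an illustrative sketch, not Unified Streaming's implementation; the `FragmentDetector` class and its event names are hypothetical. It buffers whatever bytes have arrived, reports the CMAF header once the `moov` box is complete, and reports a chunk each time a `moof`/`mdat` pair completes, which is the point at which a packager could update its output manifest.

```python
import struct


class FragmentDetector:
    """Incrementally splits a posted CMAF track into header and chunks."""

    def __init__(self):
        self._buf = bytearray()      # bytes received but not yet parsed
        self._pending = bytearray()  # complete boxes awaiting moov/mdat

    def feed(self, data):
        """Add newly received bytes; return a list of (event, bytes) pairs."""
        self._buf += data
        events = []
        while (box := self._next_box()) is not None:
            box_type, raw = box
            self._pending += raw
            if box_type == b"moov":
                # ftyp + moov form the CMAF header
                events.append(("header", bytes(self._pending)))
                self._pending.clear()
            elif box_type == b"mdat":
                # any styp/emsg/prft boxes plus moof + mdat form one chunk
                events.append(("chunk", bytes(self._pending)))
                self._pending.clear()
        return events

    def _next_box(self):
        """Remove and return one complete top-level box, or None if incomplete."""
        if len(self._buf) < 8:
            return None
        size, box_type = struct.unpack_from(">I4s", self._buf, 0)
        header_len = 8
        if size == 1:  # 64-bit largesize follows the type
            if len(self._buf) < 16:
                return None
            (size,) = struct.unpack_from(">Q", self._buf, 8)
            header_len = 16
        if size < header_len:
            # size 0 ("extends to end of file") has no place in a live stream
            raise ValueError(f"unsupported box size {size} for {box_type!r}")
        if len(self._buf) < size:
            return None
        raw = bytes(self._buf[:size])
        del self._buf[:size]
        return box_type, raw
```

With low-latency chunked CMAF, a fragment is made of several `moof`/`mdat` pairs, so each pair is reported as soon as its last byte arrives; a conventional fragment is simply a single chunk.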

A nice thing about the protocol is that it is easy to disconnect and rejoin later. If, for example, the source goes down and then starts up again, it resumes posting at predefined segment boundaries. A second, redundant encoder can also be used, so that if one encoder cannot send the CMAF header and the last fragment, the receiver can still obtain the complete stream for archiving from the redundant encoder.
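Resuming and redundancy both rely on timestamp alignment. The sketch below assumes, for illustration, that fragment timestamps are integers in the track timescale, that segments have a fixed duration, and that both redundant encoders produce fragments with identical decode times; the function names are hypothetical. `next_segment_boundary` rounds a restart time up to the next boundary, `resume_plan` lists what a reconnecting sender would POST (header first), and `merge_redundant` lets a receiver fill gaps in one encoder's output from the other.

```python
def next_segment_boundary(t, segment_duration):
    """Round t up to the nearest multiple of segment_duration."""
    return -(-t // segment_duration) * segment_duration


def resume_plan(cmaf_header, fragments, segment_duration, restart_time):
    """Objects to POST after reconnecting.

    fragments: list of (decode_time, bytes). The CMAF header is re-sent
    first, then posting resumes at the first segment boundary at or after
    restart_time, so no partial segment is sent.
    """
    start = next_segment_boundary(restart_time, segment_duration)
    return [cmaf_header] + [data for t, data in fragments if t >= start]


def merge_redundant(primary, backup):
    """Combine two {decode_time: fragment} maps from redundant encoders.

    Fragments from the primary win where both exist; gaps (for example a
    last fragment the primary never sent) are filled from the backup.
    """
    merged = {**backup, **primary}
    return [merged[t] for t in sorted(merged)]
```

Keying fragments by decode time is what makes the merge safe: if the two encoders were not aligned, the receiver could not tell which fragments are interchangeable.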

Another thing we wanted to achieve with the new spec was just-in-time packaging for low latency. CMAF allows top-level boxes called DASH event message boxes to be inserted to carry metadata. Before our new protocol, fast packaging would have required scanning the entire media file to find those boxes. Instead, we defined a separate metadata track, so an ISO BMFF-capable device can easily identify it and find the right information.
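For reference, a DASH event message (`emsg`) box has a simple binary layout defined in MPEG-DASH (ISO/IEC 23009-1). The Python sketch below builds and parses one, and `scan_for_emsg` shows the costly approach that the separate metadata track avoids: walking every top-level box of the media to find the events. This article does not say whether the ingest metadata track carries exactly this box format, so treat the layout as background; the helper names and the scheme URI used in testing are hypothetical.

```python
import struct


def build_emsg_v1(scheme_id_uri, value, timescale, presentation_time,
                  event_duration, event_id, message_data):
    """Serialise a version 1 emsg box (absolute presentation time)."""
    payload = struct.pack(">IQII", timescale, presentation_time,
                          event_duration, event_id)
    payload += scheme_id_uri.encode("utf-8") + b"\0"
    payload += value.encode("utf-8") + b"\0"
    payload += message_data
    # size, type, then FullBox version (high byte) and 24-bit flags (0)
    header = struct.pack(">I4sI", 12 + len(payload), b"emsg", 1 << 24)
    return header + payload


def _cstring(buf, pos):
    """Read a null-terminated UTF-8 string; return it and the next offset."""
    end = buf.index(b"\0", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_emsg(box):
    """Parse a version 0 or version 1 emsg box into a dict."""
    size, box_type, version_flags = struct.unpack_from(">I4sI", box, 0)
    if box_type != b"emsg":
        raise ValueError("not an emsg box")
    version, pos = version_flags >> 24, 12
    if version == 1:
        timescale, time, duration, event_id = struct.unpack_from(">IQII", box, pos)
        scheme, pos = _cstring(box, pos + 20)
        value, pos = _cstring(box, pos)
    elif version == 0:
        scheme, pos = _cstring(box, pos)
        value, pos = _cstring(box, pos)
        # in version 0 the time is a delta from the segment's earliest presentation time
        timescale, time, duration, event_id = struct.unpack_from(">IIII", box, pos)
        pos += 16
    else:
        raise ValueError(f"unknown emsg version {version}")
    return {"version": version, "scheme_id_uri": scheme, "value": value,
            "timescale": timescale, "time": time, "duration": duration,
            "id": event_id, "message_data": bytes(box[pos:size])}


def scan_for_emsg(media):
    """Walk every top-level box (32-bit sizes only) and collect emsg boxes."""
    found, pos = [], 0
    while pos + 8 <= len(media):
        size, box_type = struct.unpack_from(">I4s", media, pos)
        if size < 8:
            raise ValueError("largesize/size-0 boxes not handled in this sketch")
        if box_type == b"emsg":
            found.append(bytes(media[pos:pos + size]))
        pos += size
    return found
```

The scan has to touch every box header in the media just to find a few small events, whereas a dedicated metadata track lets a device go straight to the samples it needs.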

Using the CMAF container format makes it possible to use the latest video codecs, such as High Efficiency Video Coding (HEVC), with support for High Dynamic Range (HDR) and resolutions up to 4K. CMAF is a popular container format, and many new codecs have bindings to it. The CMAF ingest protocol can therefore support new and emerging codecs, as well as high-quality video, without any modifications to the protocol. This support for new codecs was something we found lacking in protocols not based on CMAF (e.g. WebRTC and RTMP).

In these ways we have addressed the four main aims of the project: improving timed metadata, reliability, low latency and video streaming quality.

Live streaming is set to become harder to deliver at scale as an increase in concurrent viewers is met by the need to provide better, more engaging streams for viewers. As competition for viewer attention increases, features that rely on timed metadata and low latency will become more important. Underpinning these features will be a need for better reliability and the new ingest specification should go a long way to addressing that.

This article was published on Streaming Media