TL;DR - How the DASH-IF Live Media Ingest Protocol - Interface 1 using the CMAF file format can be utilised when ingesting media to a smart origin (just in time packager) within a live adaptive-bitrate streaming workflow. 

The foundation for something groundbreaking

The CMAF (Common Media Application Format) specification, published in 2018, is a set of rules governing how audio, video and metadata tracks are stored. While CMAF has its own specification, many of its guidelines are a subset of the much larger MP4 specification, formally known as ISOBMFF.

ISOBMFF overview
Image A - Various types of MP4 file exist, each guided by a specific set of rules from within the ISOBMFF specification.

Identifying the gaps

CMAF is predominantly mentioned when talking about delivery of media ‘packages’ in both Live and VOD workflows. This is where an encoder delivers multiple media fragments (often per track) alongside an HLS playlist and DASH manifest, which together form the package the client receives.

Push-based HTTP streaming workflow
Image B - A generic encoder responsible for packaging media that is posted directly to a CDN or cache. The same track is delivered as separate fragments, each with a custom URL/name.

In Live workflows, encoders/packagers often rely on legacy ingest specifications such as RTMP, WebRTC, MPEG-2 TS or Microsoft Smooth Streaming. These specifications are no longer developed and fail to provide the mechanisms to handle technologies such as HEVC, VVC, HDR, SCTE-35/timed metadata or various forms of subtitles.

If a specification does support one of these technologies, it is generally via an amendment, or sometimes via a proprietary implementation by a specific vendor for a specific use case, which fails to gain wide support or benefit the wider industry.

Whilst the above example may seem simple, limitations can arise when wanting to provide redundancy, scale to meet requirements or provide additional capabilities such as time-based restart and per-device track selection.

Built by everyone, for the benefit of everyone

Published in March 2020, the Live Media Ingest Protocol is a collaboration between Unified Streaming, members of the DASH Industry Forum (DASH-IF) and key players operating within the OTT industry (Akamai, AWS, BBC & Hulu to name a few).

The protocol was born from the need for a redundant, fault-tolerant, scalable method of ingesting media that builds on the existing CMAF specification and supports the technologies that existing specifications fail to support. It has been developed with other specifications in mind, such as MPEG-DASH 4th Edition and MPEG-I Network Based Media Processing (NBMP).

As an evolution of the widely adopted Microsoft Smooth Streaming Ingest Protocol (fragmented MP4), the Live Media Ingest Protocol can be used in both Live and VOD streaming workflows. It fits well with existing interoperability guidelines from both Apple and DASH-IF for workflows such as ad insertion and low latency.

Pull-based HTTP streaming workflow
Image C - A generic encoder now posts synchronised CMAF tracks of a fixed length to a generic origin. The origin, now responsible for packaging the media, can deliver tracks in multiple formats and configurations based upon any requirement.

Interface 1 outlines how CMAF tracks can be received by Packagers/Origins for archiving and/or downstream media processing. By using CMAF tracks in combination with an Origin, it’s now possible to implement redundant, scalable, low latency workflows.

An encoder can generate media fragments as small as one sample or video frame (low latency) and deliver these to an origin for processing. The origin then repackages these fragments into the desired format (HLS, DASH, MSS), fragment length (1.92 s for HLS, 3.84 s for DASH/MSS) and track combination (for example, video and audio but no subtitles) based upon the client request. Because each track is now delivered as a separate, synchronised file, there are opportunities for redundancy.
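
To make the repackaging step more concrete, here is a minimal Python sketch of how an origin might group very short ingested fragments into longer output segments. The `Fragment` type, the `group_into_segments` helper, and the 25 fps / 90 kHz figures are assumptions chosen for illustration; they are not taken from the protocol or from any particular origin implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for one ingested CMAF fragment."""
    decode_time: int  # start of the fragment, in timescale ticks
    duration: int     # length of the fragment, in timescale ticks


def group_into_segments(fragments: List[Fragment],
                        segment_ticks: int) -> List[List[Fragment]]:
    """Bucket fragments by the segment-sized window their decode time falls in."""
    buckets: Dict[int, List[Fragment]] = {}
    for frag in sorted(fragments, key=lambda f: f.decode_time):
        buckets.setdefault(frag.decode_time // segment_ticks, []).append(frag)
    return [buckets[index] for index in sorted(buckets)]


# Assumed values: 90 kHz timescale, 25 fps video, one frame per fragment.
TIMESCALE = 90_000
FRAME_TICKS = TIMESCALE // 25

# 96 single-frame fragments, i.e. 3.84 s of video.
frames = [Fragment(i * FRAME_TICKS, FRAME_TICKS) for i in range(96)]

# Same input, two output fragment lengths.
hls_segments = group_into_segments(frames, TIMESCALE * 192 // 100)   # 1.92 s
dash_segments = group_into_segments(frames, TIMESCALE * 384 // 100)  # 3.84 s

print(len(hls_segments), len(dash_segments))
```

The same ingested frames yield two 1.92 s segments or one 3.84 s segment without the encoder sending anything twice. A real packager would also have to make sure each output segment starts on a keyframe, which this sketch ignores.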

Redundant encoders pull-based HTTP streaming workflow
Image D - Redundant encoders within a pull-based HTTP streaming workflow

To achieve this, each encoder, using its internal system clock (UTC) as a reference, should timestamp every fragment with a decode time/offset derived from the same algorithm (UTC + Time Scale x Sample Duration).

Doing so guarantees that identical fragments are sent with the same decode time from each encoder, so the failure of a single encoder is not a problem. If both encoders stop, however, there will be a gap or discontinuity in the timeline.
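
One way to read the timestamping rule is sketched below in Python. It is an interpretation, not the protocol's normative algorithm: each encoder converts UTC time into timescale ticks and rounds down to a whole number of fragment durations since the Unix epoch, so two independent encoders with synchronised clocks arrive at the same value. The function name `fragment_decode_time`, the 90 kHz timescale and the 1.92 s fragment duration are all assumptions for illustration.

```python
def fragment_decode_time(utc_ms: int, timescale: int, fragment_ticks: int) -> int:
    """Return the decode time (in ticks) of the fragment containing utc_ms.

    utc_ms         -- milliseconds since the Unix epoch (UTC)
    timescale      -- ticks per second of the track
    fragment_ticks -- fragment duration in ticks
    """
    ticks = utc_ms * timescale // 1000
    # Snap down to the start of the current fragment window.
    return (ticks // fragment_ticks) * fragment_ticks


TIMESCALE = 90_000        # assumed timescale
FRAGMENT_TICKS = 172_800  # assumed fragment duration: 1.92 s at 90 kHz

# Two encoders read their clocks 800 ms apart, within the same fragment window.
decode_time_a = fragment_decode_time(1_600_000_000_100, TIMESCALE, FRAGMENT_TICKS)
decode_time_b = fragment_decode_time(1_600_000_000_900, TIMESCALE, FRAGMENT_TICKS)

print(decode_time_a == decode_time_b)
```

Because the result depends only on wall-clock time and shared constants, neither encoder needs to know that the other exists.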

The origin uses this decode time to work out which fragments have been received and concatenates them into a single track stored locally on the origin. If encoder B sends a fragment with a decode time that has already been received from encoder A, it is simply ignored and the next fragment/decode time in the sequence is used to build a continuous media timeline.
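
The merge behaviour described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any specific origin is implemented: `build_timeline` and `find_gaps` are hypothetical helpers, and fragments are reduced to an encoder label plus a decode time.

```python
from typing import Dict, Iterable, List, Tuple

Arrival = Tuple[str, int]  # (encoder label, decode time in ticks)


def build_timeline(arrivals: Iterable[Arrival]) -> List[Arrival]:
    """Keep the first fragment seen for each decode time, ordered by decode time."""
    first_seen: Dict[int, str] = {}
    for encoder, decode_time in arrivals:
        if decode_time in first_seen:
            continue  # duplicate from the redundant encoder: ignore it
        first_seen[decode_time] = encoder
    return [(first_seen[t], t) for t in sorted(first_seen)]


def find_gaps(timeline: List[Arrival], fragment_ticks: int) -> List[Tuple[int, int]]:
    """Return pairs of consecutive decode times that are not one fragment apart."""
    times = [t for _, t in timeline]
    return [(a, b) for a, b in zip(times, times[1:]) if b - a != fragment_ticks]


FRAGMENT_TICKS = 172_800  # assumed fragment duration: 1.92 s at 90 kHz

# Encoder A misses the third fragment; encoder B fills in.
arrivals = [
    ("A", 0), ("B", 0),
    ("A", 172_800), ("B", 172_800),
    ("B", 345_600),
    ("B", 518_400), ("A", 518_400),
]

timeline = build_timeline(arrivals)
print(timeline)
print(find_gaps(timeline, FRAGMENT_TICKS))
```

The resulting timeline mixes fragments from both encoders yet contains no gaps. A gap only appears when neither encoder delivers a given decode time.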

Concatenated CMAF track with video frames from different sources.
Image E - Combination of video frames from different encoders used to build a single track with a continuous timeline. A = frames from encoder A and B = frames from encoder B.

Now that it's possible to guarantee synchronization between source and destination (based upon UTC timing) alongside redundancy at the encoders, why not also provide redundancy at the Origin? Most encoders are capable of posting the same streams to multiple locations.

Full redundant pull-based HTTP streaming workflow
Image F - Full redundancy across both paths with time synchronization, providing opportunity for stream selection based upon region, availability, configuration and more.

This isn’t just theory - go ahead and try it yourself for free!

DASH-IF collaborated with FFmpeg to implement support for Interface 1/CMAF.

FFmpeg and Unified Origin

Unified Streaming offers a free Docker-based demo to try this out for yourself using the following GitHub project.

A free 30-day license to run the project is available, no questions asked, by getting started here.

This demonstrates how separate encoders (in this case FFmpeg) can be used to post time-synchronized fragments to a single Origin. By default both encoders will be running, but it's possible to stop and start either one (leaving the other running) to see how redundancy is achieved with no impact on playout from the Origin.

We also offer a Live Demo where you can see encoder fail-over in action. It is scheduled to automatically fail an encoder mid-stream at five seconds past every minute (e.g. 12:01:05, 12:02:05). The Active ContainerID in the picture will change, showing how encoder failure can be handled without impacting playback (no discontinuities or buffering).

If you have any questions or comments, or wish to collaborate with us, please get in touch.

For awareness, the Live Media Ingest Protocol also defines an Interface 2, which outlines existing best practice for delivering DASH/HLS packages similar to those described earlier. As Interface 1 is the focus, we have not gone into further detail. Should you wish to know more, please see the following explainer video on our website or read about it in section 6 of the protocol documentation.