Signaling information about media content, what's so fancy about that? Metadata may seem like the most boring part of a video streaming setup, but it is vital: adaptive bitrate streaming would not be possible without it.
Take the bitrate of a video as an example. If such information is not signaled as metadata, a player cannot determine which bitrate it should request given current network conditions and other circumstances.
Metadata like that is nothing new, though. It applies to the entire stream or track and doesn't change over time. In other words, it's static. But what if you want to add flexibility, and introduce certain kinds of information at certain points in time? That's where timed metadata comes in, the topic of this blog post and a way to further smarten your streams.
What is it?
For those unfamiliar with the concept: timed metadata differs from 'ordinary' static metadata, which can be any information about the entire stream or track, in that it applies only to a specific time or time range. Generally speaking, it signals the start or the end of an event and shares information about that event.
An event can be a variety of things:
- A program that is part of a continuous livestream
- A song, e.g. in a podcast
- A commercial or other kind of break (e.g., using SCTE-35 markers)
- A point of measurement, for metrics (e.g., using Nielsen tags)
Based on the kind of information in the timed metadata, a server or a player can behave in certain ways. For example, a player can:
- Make a request to an ad network and insert and play the ad that is returned
- Display the cover art of a song that is playing (when streaming audio only)
- Overlay the video with specified information
- Make the video clickable (to follow a specific link, e.g. to a product page or another video)
- Prevent a commercial from being skipped or easily silenced
- Return metrics
The problem with the above is that it requires a player that is capable of both correctly interpreting the metadata and of performing the intended behavior. In most cases, this will require a custom setup that uses a custom player, which complicates deployment and impacts overall compatibility.
Handling it server-side
The solution to compatibility issues at the player-side is to shift advanced metadata-based behavior to the server-side, i.e. to the origin from which the stream is served. Dynamic ad insertion is a good example of such advanced behavior, with timed metadata signaling the start and length of an ad break. Based on such metadata and using Unified Remix, ads can be inserted on-the-fly in a way that is compatible with all players and devices, a topic that was discussed in this earlier blog post.
In addition, a server-side workflow based on timed metadata can be used to let Unified Capture make a clip of each program that is part of a livestream, in order to automatically create a Video on Demand (VOD) library of that livestream.
Whereas an Electronic Programming Guide (EPG) often contains slightly outdated information due to shifts in programming, timed metadata is generated when a program actually starts and ends. This allows Capture to create frame-accurate VOD clips for each program.
Handling timed metadata server-side does not mean that all metadata is handled by the server. Some data will be passed through to the player, if required: to report back about certain viewing metrics, to disable the ability to skip while playing an ad, and to present artist information, cover art and link overlays.
Even when certain timed metadata needs to be handled by the player, it is still smarter to handle it server-side first. Using a product like Unified Origin, a single stream of metadata will automatically be signaled correctly in Apple's HTTP Live Streaming (HLS) and MPEG-DASH, as ID3 tags and 'event' messages respectively (more on that below).
Unified Origin can handle timed metadata for Live as well as VOD. In the latter case the content should be prepared correctly using Unified Packager, while for Live, timed metadata should be ingested as a separate stream, which, like the audio, video and subtitles, must be carried in ISO Base Media File Format (ISO BMFF) containers.
How to: Live
How timed metadata should be structured in an ISO BMFF contained media segment is defined in a paragraph of the MPEG-DASH specification that is dedicated to event messages (ISO/IEC 23009-1, §5.10.3 - Inband Event Signalling).
Following this specification, the stream that carries media segments with timed metadata is an 'inband' event stream. This is in contrast to events that are not signaled in media segments, but in a stream's manifest.
Unified Origin only supports inband event streams. Sometimes, such a stream is referred to as a 'sparse track', because the metadata isn't a continuous but an intermittent stream of information.
In a Live streaming environment, the inband event streams are part of the encoder output, just like the regular media streams are. The encoder will post them to the same Unified Origin Live publishing point as the regular streams and, if the inband event streams are structured correctly, Unified Origin will handle them accordingly.
If set up properly, no extra steps are required to get timed metadata to work when streaming Live with Unified Origin. If you are interested and would like more information, please contact us.
Signaling the right schemes
How the metadata carried in inband event streams should be structured is defined in schemes. There are several predefined ones but, if necessary, a custom scheme can be defined to represent anything.
An overview of some common schemes for inband event streams is presented in Table 1 below:
|Scheme identifier|Description|Reference|
|urn:mpeg:dash:event:2012|Signals DASH specific events for DASH clients|ISO/IEC 23009-1 (2014), §5.10.4|
|urn:dvb:iptv:cpm:2014|Basic metadata relating to current program|ETSI TS 103 285|
|urn:scte:scte35:2013:bin|Contains a binary SCTE-35 message|ANSI/SCTE 214-3 (2015), §7.3.2|
|www.nielsen.com:id3:v1|Contains a Nielsen ID3 tag|Nielsen ID3 in MPEG-DASH|
|<application provider specific>|||
Table 1: Overview of MPEG-DASH event messaging schemes
Looking at the purpose of each of the predefined schemes above, the DASH specific events signal to a client when a stream will end, as some players need that information to properly close a stream.
Then there's the DVB scheme that simply standardizes how information about the current program should be formatted so that it can be read and displayed by a wide variety of devices (such as set-top boxes).
Nielsen, the company used by broadcasters worldwide to gather audience metrics, has a scheme identifier as well. It is an interesting example because it shows that DASH event messages can be used to carry other formats as well, such as the ID3 tags that Nielsen requires for their measurements.
Then there's the scheme that signals SCTE-35 markers. These markers are of interest to many in the video streaming industry, because they're based on familiar workflows from the world of broadcasting, where they are widely used to signal when and for how long a stream should be interrupted for advertisements.
Within an online video streaming setup, SCTE-35 markers can be part of the current holy grail of the industry: server-side dynamic ad insertion. Signaling advertisements is not the only purpose of these markers though, as they can also announce events like the start and end of a program.
Similar to the ID3 tags that are carried in event messages for Nielsen, the content of SCTE-35 markers is not translated into the fields of the event message itself. Instead, the event message carries the marker's content unchanged, as binary data.
The format in which SCTE-35 markers should be signaled in event messages so that they are compatible with Unified Origin is described in Table 2 below. This formatting is supported by selected encoders from Media Excel and we are currently testing it with other major vendors, such as Telestream.
|Field|Description|
|value|The value of the SCTE-35 PID|
|presentation_time_delta|Non-negative number expressing splice time relative to track fragment base media decode time (tfdt), expressed in timescale|
|event_duration|Duration of event in media presentation time, 0xFFFFFFFF indicates unknown duration|
|id|Unique identifier for message|
|message_data|Splice info section including CRC|
Table 2: Structure of a SCTE-35 marker in an event message
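To make the layout in Table 2 more concrete, here is a minimal Python sketch that serializes and parses a version 0 event message ('emsg') box. The field order follows the event message box definition in ISO/IEC 23009-1. The function names, the PID value "999" and the three payload bytes are assumptions made for illustration only: the payload is a placeholder, not a valid SCTE-35 splice info section, and the timing helper assumes the tfdt is expressed in the same timescale as the event message.

```python
import struct

# Scheme identifier commonly used for binary SCTE-35 in DASH event messages.
SCTE35_SCHEME = "urn:scte:scte35:2013:bin"


def build_emsg_v0(scheme_id_uri, value, timescale, presentation_time_delta,
                  event_duration, event_id, message_data):
    """Serialize a version 0 'emsg' box."""
    body = (
        scheme_id_uri.encode("utf-8") + b"\x00"  # null-terminated string
        + value.encode("utf-8") + b"\x00"        # null-terminated string
        + struct.pack(">IIII", timescale, presentation_time_delta,
                      event_duration, event_id)
        + message_data                           # opaque payload bytes
    )
    # FullBox header: 32-bit size, 4-character type, 8-bit version + 24-bit flags.
    header = struct.pack(">I4sI", 12 + len(body), b"emsg", 0)
    return header + body


def parse_emsg_v0(box):
    """Parse a version 0 'emsg' box back into a dict of its fields."""
    size, box_type, version_flags = struct.unpack(">I4sI", box[:12])
    if box_type != b"emsg" or version_flags >> 24 != 0:
        raise ValueError("not a version 0 emsg box")
    scheme_end = box.index(b"\x00", 12)
    value_end = box.index(b"\x00", scheme_end + 1)
    fixed = value_end + 1  # start of the four 32-bit integer fields
    timescale, delta, duration, event_id = struct.unpack(
        ">IIII", box[fixed:fixed + 16])
    return {
        "scheme_id_uri": box[12:scheme_end].decode("utf-8"),
        "value": box[scheme_end + 1:value_end].decode("utf-8"),
        "timescale": timescale,
        "presentation_time_delta": delta,
        "event_duration": duration,
        "id": event_id,
        "message_data": box[fixed + 16:size],
    }


def splice_time_seconds(tfdt, presentation_time_delta, timescale):
    """Splice time in seconds; assumes tfdt uses the emsg timescale."""
    return (tfdt + presentation_time_delta) / timescale


# Hypothetical example: PID "999", 90 kHz timescale, unknown duration,
# and placeholder payload bytes instead of a real splice info section.
example_box = build_emsg_v0(SCTE35_SCHEME, "999", 90000, 180000,
                            0xFFFFFFFF, 1, b"\x00\x01\x02")
```

Parsing `example_box` with `parse_emsg_v0` returns the same field values that went in, which is a quick way to check an encoder's output against the expected layout before posting it to a publishing point.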
How to: VOD
In general, signaling timed metadata in a Video on Demand setup works the same as when streaming Live. If you have an MP4 that contains inband event messages, you can use it and add it to your stream in the same way as an MP4 that contains audio, video or subtitles.
First you use Unified Packager to convert the progressive MP4 into a fragmented MP4:
mp4split -o meta.ismv meta.mp4
Then you add the metadata to your server manifest along with the other content that should become part of the stream:
mp4split -o movie.ism video.ismv audio.ismv \
  meta.ismv
In addition, Unified Packager can be used to prepare and add metadata that is stored in ID3 tags. To make separate ID3 tags into 'timed' metadata, you need to create a kind of playlist. This is simply a text file that ends in the '.meta' extension. It contains an entry for each tag and each entry must specify the following:
- A point in time when to insert the data
- The type of the input data
- A file path to the data
The entries should be structured like this: <time><space><type><space><file>. The 'time' should be specified in seconds and needs to increase with every entry. The 'type' of metadata must be 'id3' as this signals that the 'data' is an ID3 tag, while the tags themselves should adhere to version 2.4 of the ID3 specification. Last but not least, the 'file' path should either be relative to the .meta-file location, or absolute.
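If you do not have ID3 files at hand, the sketch below shows one way to produce a minimal ID3 version 2.4 tag in Python. It is an illustration under assumptions, not part of Unified Packager: the function names and the example title and artist are hypothetical, and only plain text frames encoded as UTF-8 are covered.

```python
def synchsafe(n):
    """Encode an integer as a 4-byte synchsafe value (7 bits per byte)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("value does not fit in 28 bits")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def text_frame(frame_id, text):
    """Build an ID3v2.4 text frame such as TIT2 (title) or TPE1 (artist)."""
    payload = b"\x03" + text.encode("utf-8")  # 0x03 marks UTF-8 text
    # Frame header: 4-character ID, synchsafe size, two flag bytes.
    return frame_id.encode("ascii") + synchsafe(len(payload)) + b"\x00\x00" + payload


def build_id3v24_tag(frames):
    """Wrap frames in an ID3v2.4 header: 'ID3', version 2.4.0, no flags, size."""
    body = b"".join(frames)
    return b"ID3" + b"\x04\x00" + b"\x00" + synchsafe(len(body)) + body


# Hypothetical tag content, for illustration only.
example_tag = build_id3v24_tag([
    text_frame("TIT2", "Example Song"),
    text_frame("TPE1", "Example Artist"),
])
```

Writing `example_tag` to a file, for instance with `open("1.id3", "wb").write(example_tag)`, gives a file that can be referenced from a playlist entry.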
All in all, your 'id3.meta' needs to look similar to this:
0.00 id3 1.id3
2.5 id3 2.id3
5 id3 3.id3
7.500 id3 4.id3
10.00 id3 5.id3
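Before handing such a playlist to Unified Packager, it can help to check it against the rules described above. The following Python sketch does that under a few assumptions: the function name is hypothetical, 'increasing' is read as strictly increasing, and the checks mirror this post's description rather than the exact validation that mp4split performs.

```python
from pathlib import Path


def parse_meta_playlist(text, base_dir="."):
    """Parse '<time> <type> <file>' entries and check the rules for .meta files."""
    entries = []
    previous = None
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected '<time> <type> <file>'")
        time_text, kind, file_name = parts
        try:
            seconds = float(time_text)  # a decimal comma such as '10,00' fails here
        except ValueError:
            raise ValueError(f"line {number}: time must be in seconds") from None
        if kind != "id3":
            raise ValueError(f"line {number}: type must be 'id3'")
        if previous is not None and seconds <= previous:
            raise ValueError(f"line {number}: time must increase with every entry")
        previous = seconds
        path = Path(file_name)
        if not path.is_absolute():
            path = Path(base_dir) / path  # relative to the .meta file location
        entries.append((seconds, kind, path))
    return entries


example_entries = parse_meta_playlist(
    "0.00 id3 1.id3\n2.5 id3 2.id3\n5 id3 3.id3\n", base_dir="/media")
```

Each returned entry holds the time in seconds, the type, and the resolved file path, so a follow-up step could also verify that every referenced file exists.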
Using this id3.meta-file, you can prepare an fMP4 that contains the metadata with Unified Packager:
mp4split -o id3-meta.ismv id3.meta
Finally, you can generate a server manifest, using the fMP4 with metadata in combination with the fMP4-files that contain the actual media content:
mp4split -o movie.ism video.ismv audio.ismv \
  id3-meta.ismv
When you have successfully added timed metadata to either your Live or On Demand video stream, it depends on your setup and on the kind of metadata whether it will be put to use on the server- or on the client-side.
To make a successful client-side approach possible, the ingested metadata needs to be passed through to the client correctly. Unified Origin makes this possible by dynamically adjusting the metadata's structure and signaling, depending on the requested playout format.
For DASH, Unified Origin will list all applicable metadata schemes as 'InbandEventStream' elements in each of the manifest's Adaptation Sets. This ensures that a player can register a handler for each schemeIdUri that it can parse. In addition, the event messages will be multiplexed in all audio and video output streams.
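As a rough illustration of the client side, the Python sketch below reads the inband event stream declarations from a DASH manifest and picks the schemes a player could handle. The manifest fragment is a hand-written, hypothetical example and not actual Unified Origin output; the function names and the handler descriptions are assumptions as well.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, heavily trimmed manifest used only for this example.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic">
  <Period id="1">
    <AdaptationSet contentType="video">
      <InbandEventStream schemeIdUri="urn:scte:scte35:2013:bin" value="1"/>
      <InbandEventStream schemeIdUri="www.nielsen.com:id3:v1" value="1"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio">
      <InbandEventStream schemeIdUri="urn:scte:scte35:2013:bin" value="1"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def inband_schemes(mpd_text):
    """Map each Adaptation Set's contentType to its declared schemeIdUri values."""
    root = ET.fromstring(mpd_text)
    result = {}
    for adaptation_set in root.iterfind(".//mpd:AdaptationSet", MPD_NS):
        key = adaptation_set.get("contentType", "unknown")
        result[key] = [
            stream.get("schemeIdUri")
            for stream in adaptation_set.findall("mpd:InbandEventStream", MPD_NS)
        ]
    return result


def select_handlers(schemes, handlers):
    """Keep only the schemes for which the player has a handler registered."""
    return {scheme: handlers[scheme] for scheme in schemes if scheme in handlers}


# A player that only understands SCTE-35 would ignore the Nielsen scheme.
example_handlers = {"urn:scte:scte35:2013:bin": "ad break handler"}
example_schemes = inband_schemes(SAMPLE_MPD)
example_selected = select_handlers(example_schemes["video"], example_handlers)
```

In a real player the handler values would be callbacks rather than strings, invoked whenever an event message with a matching scheme is found in a media segment.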
For HLS playout, the timed metadata will be inserted in the MPEG-TS output streams according to Apple's Timed Metadata for HTTP Live Streaming specification. Considering that fMP4 support has been added to the HLS specification, it will be interesting to see if Apple will also add support for inband event messages.
So, what will the use of timed metadata add to your video streaming setup? There is a wide variety of possibilities, some of which have been described in this blog post. Without a doubt, metadata can be a powerful tool, but only when put to proper use, as functionality depends heavily on implementation and integration. If you would like to discuss the opportunities for your service, please feel free to contact us.