Old habits die hard. This is true even for a fast-moving, technology-driven industry like video streaming. Ever since the introduction of Apple's HLS specification in 2009, there seems to be a general understanding about how a stream's separate media segments should be uniquely identified, so that they can be referenced in a manifest and requested by a player. The formula: simply assign sequential numbers to sequential segments.
This straightforward and simple solution seems useful enough, but the problem is: there has long existed an alternative that offers clear advantages and no real disadvantages. With their Smooth Streaming specification, Microsoft chose to order each segment on a timeline, with each segment's unique identifier corresponding to its place on that timeline. Simply put: instead of sequential numbers, Microsoft decided to add timestamps to segments.
That difference may seem subtle, except that it's not. Choosing to number segments sequentially instead of adding timestamps does have an influence on the experience of the end user and can, in specific scenarios, significantly impact the load on an origin as it increases the chance of 404s.
A less straightforward choice
Whereas the choice of playout format used to determine whether streamed segments would be numbered (HLS) or ordered on a timeline (Smooth), this no longer holds for DASH, as it offers both options. And while we're convinced that the advantages of using a segment timeline are numerous, many of our clients tend to show an initial preference for segment numbering.
We can only speculate about why segment numbering is still preferred by many, but a few likely reasons are:
- Segment numbering is used by the most widely used streaming protocol (HLS).
- Throughout the DASH specification, segment numbering seems to be presupposed.
- It simply sounds easier and more straightforward than using timestamps.
Of specific interest is the way the DASH specification presents both options. Reading through the specification, neither is explicitly preferred, but segment numbering is presupposed in most examples. However, upon closer reading, the specification clearly mentions several benefits of using a segment timeline, albeit with little fanfare. This blog post presents those benefits and, in doing so, explains our preference for using timestamps.
Signaling segment availability
To start off, it's important to understand that every DASH manifest makes use of two different timelines. One is the Media Presentation timeline, to which the presentation times of all of the content in the manifest are mapped. This timeline is used to synchronize the different media components of the stream and enables seamless switching between different coded versions of the same components.
The second timeline is the one we will focus on in this blog post. This is the segment timeline, which is used to signal to clients the availability time of segments at specified HTTP-URLs. Of course, signaling this information for Live scenarios (a 'dynamic Media Presentation', in DASH-speak) is more complicated than it is for on demand content, because the segments of a livestream only become available over time.
Thus, with Live scenarios in mind, it's important that a player can determine the availability of segments easily, because successful play-out at the Live edge requires the player to request segments that have become available mere seconds ago (or close to, depending on the buffering model). If a player cannot reliably determine which exact segments these are, it could request segments that are not yet available, resulting in unnecessary 404s.
No wall clocks, please
Perhaps unsurprisingly, the way that the availability of segments is signaled in the manifest presents the biggest difference between the use of numbered segments and segments with a timestamp. Whereas the availability of segments with a timestamp can be determined based on nothing but the information signaled in the manifest, a player has to use the wall clock time as a reference to determine the availability of numbered segments.
This may seem counterintuitive. Why not just request the segment with the highest number signaled in the manifest if you want to start play-out at the Live edge of a stream that uses numbered segments? The way in which a timeline is constructed within a DASH manifest when one uses segment numbering makes the problem easier to understand.
Such a manifest doesn't contain a list of all segments, but merely a template with the (average) duration of each segment (duration / timescale = segment length in seconds). To be clear, DASH doesn't prohibit listing all of the individual segments (a so-called SegmentList), but that is not desirable, as such a list will increase the size of the manifest considerably for no good reason.
Apart from the template as shown above, the manifest also signals when the first segment became available. A player has to combine this timing information and the template with what it considers to be the wall clock time at the moment of play-out to determine what segments are available:
(current wall clock time according to player - time at which first segment became available according to manifest) / (duration / timescale) + 1 = number of most recent segment
The problem with this formula is that the calculation will only be accurate if both the player and the origin use the exact same timing, which is far from obvious. Even the official DASH specification acknowledges this as it lists a variety of reasons why a player and origin would use conflicting time sources.
A few of these reasons are: a player without access to accurate timing; a player's clock that drifts against the system clock and is not synchronized often enough; and caching that delays the availability of media segments, thus breaking the timing synchronization with the player.
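To make the effect of a skewed clock concrete, here is a minimal Python sketch of the calculation described above. The template, the four-second segment duration and the timing values are hypothetical and chosen purely for illustration, and the sketch assumes that numbering starts at 1.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical template, invented for this example (not from a real manifest).
TEMPLATE = (
    '<SegmentTemplate timescale="1000" duration="4000" '
    'startNumber="1" media="video-$Number$.m4s" />'
)


def latest_segment_number(now, first_segment_available_at, duration, timescale):
    """Apply the formula from the text, rounding down to a whole segment.

    (now - time the first segment became available) / (duration / timescale) + 1
    Assumes segment numbering starts at 1.
    """
    segment_seconds = duration / timescale
    elapsed = now - first_segment_available_at
    return math.floor(elapsed / segment_seconds) + 1


attrs = ET.fromstring(TEMPLATE).attrib
duration = int(attrs["duration"])
timescale = int(attrs["timescale"])

# Hypothetical timing, in seconds: the origin is 40 seconds past the moment
# the first segment became available.
first_available = 1_000.0
origin_now = first_available + 40.0

# A player whose clock matches the origin.
accurate = latest_segment_number(origin_now, first_available, duration, timescale)

# A player whose clock runs 5 seconds ahead of the origin.
fast_clock = latest_segment_number(origin_now + 5.0, first_available, duration, timescale)

print(accurate, fast_clock)
```

In this sketch the player with the fast clock arrives at a segment number that is one higher than the origin has actually produced, which is exactly the kind of request that ends in a 404.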
A more reliable Live edge
Things are quite different when you have a look at a template that uses timestamps instead of numbers, as is shown below. Here, the manifest simply presents a complete list of all the available segments, albeit in a rather compact way. The 't' attribute represents the timestamp of the first segment that has the exact duration specified by the 'd' attribute, whereas the 'r' attribute tells the player how many subsequent segments with the same duration are available:
<S t="0" d="50" r="100" />
<S d="51" />
<S d="50" r="80" />
<S d="49" />
Well then, how to calculate the exact timestamp of the most recent segment? Simply take the most recently specified 't' value and add to it the durations of every segment signaled from that point on, except for the very last segment in the manifest. There you have it: the timestamp of this last segment, without the need to know the current wall clock time.
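The calculation described above can be sketched in a few lines of Python. This is an illustration rather than a complete implementation: the function name is our own, the timeline is wrapped in a bare SegmentTimeline element without the namespaces a real manifest would carry, and a negative 'r' value (open-ended repetition) is not handled.

```python
import xml.etree.ElementTree as ET

# The example timeline from the text, wrapped in a parent element for parsing.
TIMELINE = """
<SegmentTimeline>
  <S t="0" d="50" r="100" />
  <S d="51" />
  <S d="50" r="80" />
  <S d="49" />
</SegmentTimeline>
"""


def segment_start_times(timeline_xml):
    """Expand the compact timeline into the start time of every segment."""
    starts = []
    next_start = 0
    for s in ET.fromstring(timeline_xml.strip()).iter("S"):
        # An explicit 't' resets the running timestamp.
        if "t" in s.attrib:
            next_start = int(s.attrib["t"])
        duration = int(s.attrib["d"])
        # 'r' counts the additional segments with the same duration.
        repeat = int(s.attrib.get("r", "0"))
        for _ in range(repeat + 1):
            starts.append(next_start)
            next_start += duration
    return starts


starts = segment_start_times(TIMELINE)
print(len(starts), starts[-1])
```

For the example timeline this yields 184 segments, the last of which starts at timestamp 9151. At no point does the player's own clock enter the calculation.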
So, one of the clear advantages of using segments with timestamps is the ease with which the Live edge of a livestream can be found. This advantage alone is considerable. Having studied the relevant parts of the DASH specification and taking our own experiences into consideration, other benefits of timestamps are the following:
Variable segment durations
As can be seen in the example above, the duration of each individual segment is specified when using a timeline, whereas only one average duration can be set when numbered segments are used. As a result, varying segment durations can only be dealt with properly when using timestamps.
There are many reasons why the duration of media segments can vary, even if a set duration is specified in the encoder. Working with numbered segments, the manifest doesn't show these variations, which means that calculating the number of the most recent segment becomes unreliable.
(As explained above, the calculation makes use of a single variable that represents the average duration of all segments, which cannot accurately represent the true duration of each segment if they vary in length.)
Using a timeline with timestamps, this problem doesn't apply.
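A small Python sketch shows how quickly a single long segment throws off a calculation that is based on one nominal duration. The durations below are hypothetical, and availability is simplified to "the moment a segment has been fully produced", counted in seconds from the start of the stream.

```python
from itertools import accumulate


def predicted_availability(nominal_duration, count):
    """Availability times a player derives from a single nominal duration."""
    return [nominal_duration * (i + 1) for i in range(count)]


def actual_availability(durations):
    """Availability times that follow from the real segment durations."""
    return list(accumulate(durations))


# Hypothetical stream: nominal 4-second segments, but the second one ran 2 seconds long.
actual_durations = [4, 6, 4, 4]

predicted = predicted_availability(4, len(actual_durations))
actual = actual_availability(actual_durations)

# Segment numbers (starting at 1) that the player would request too early.
too_early = [
    number
    for number, (p, a) in enumerate(zip(predicted, actual), start=1)
    if p < a
]

print(predicted, actual, too_early)
```

From the long segment onward, every request based on the nominal duration comes two seconds too early in this example. With a timeline, the player would read the real durations from the manifest instead of predicting them.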
Another considerable benefit of using a timeline is the ability to signal discontinuities in a stream. A temporary lack of input? In the case of segment numbering, the manifest won't signal the discontinuity. Therefore, a player won't notice anything until it requests a segment from when the input was unavailable. Such a request will lead to an HTTP 404 response, telling the player that the requested segment is unavailable, after which the player can only try its luck on the next segment and so on.
Using a timeline, handling a discontinuity is as straightforward as not signaling any new segments until the input is available again. However, this does make calculating the timestamp of the next segment impossible, because such a calculation relies on the total duration of the segments that preceded it.
Fortunately, the solution to this problem is simple. To signal the exact timestamp of the first new segment after a discontinuity, a 't' attribute containing the timestamp is added to this segment. For an example of such a situation, see below, where the input was cut off after half of the 100th segment was ingested ('d=25') and made available again at 't=7350':
<S t="0" d="50" r="98" />
<S d="25" />
<S t="7350" d="50" r="150" />
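As a rough sketch of how a player could spot such a discontinuity, the Python below walks the timeline and reports every place where an explicit 't' lies beyond the end of the preceding segment. The function name and the wrapping SegmentTimeline element (without namespaces) are assumptions made for the sake of the example, and a negative 'r' value is not handled.

```python
import xml.etree.ElementTree as ET

# The discontinuity example from the text, wrapped in a parent element for parsing.
TIMELINE = """
<SegmentTimeline>
  <S t="0" d="50" r="98" />
  <S d="25" />
  <S t="7350" d="50" r="150" />
</SegmentTimeline>
"""


def find_gaps(timeline_xml):
    """Return (end_of_previous_segment, start_of_next_segment) for each gap."""
    gaps = []
    expected = None  # where the next segment would start without a gap
    for s in ET.fromstring(timeline_xml.strip()).iter("S"):
        if "t" in s.attrib:
            t = int(s.attrib["t"])
            if expected is not None and t > expected:
                gaps.append((expected, t))
            expected = t
        if expected is None:
            expected = 0  # no 't' on the first entry: start counting at 0
        duration = int(s.attrib["d"])
        repeat = int(s.attrib.get("r", "0"))
        expected += duration * (repeat + 1)
    return gaps


gaps = find_gaps(TIMELINE)
print(gaps)
```

For the example above, the sketch reports a single gap that runs from 4975 (the end of the shortened 100th segment) to 7350, so the player knows not to request anything in between.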
Sample accurate timing
An added benefit of each segment's duration being accurately signaled in the manifest is that a segment timeline offers sample-accurate timing. The signaled timing reflects not only the segment as a whole, but also the exact duration of the samples that the segment contains.
Therefore, a segment timeline shouldn't only be your preferred template to signal segments when streaming Live, but when working with VOD as well. This will make requesting the samples within a certain timeframe as easy as requesting the segments whose timestamps cover that timeframe.
Automatically extended timeline
One seemingly unique advantage to using segment numbering is that, in normal situations, a player only needs to request a client manifest once, because the information in that manifest will stay the same, whereas the use of a segment timeline requires regular updates to the manifest to signal new segments. However, the contrast is less stark than it seems at first.
Yes, a client manifest using segment numbering doesn't need to be updated at all, but the use of a segment timeline doesn't necessitate a request for an updated client manifest each time the end of the timeline is reached. A player can easily extend the timeline on its own. This works similarly to calculating the availability of the next segment when numbered segments are used.
To calculate the timestamp of the next segment, the player can simply take the timestamp of the most recent segment and add the duration of the media contained in that segment, using the timing information carried in the fMP4 itself (such as the 'tfdt' box). No wall clock needed, no rocket science involved.
Taking all of the above into consideration, the overarching disadvantage of using sequentially numbered segments is that they increase the chance of 404s. Using wall clock time as a reference, the inability to signal discontinuities, and the inability to signal variations in segment durations are all possible causes of this.
Firstly, 404s and using a wall clock as a reference: as explained earlier, a reliable calculation is impossible without a proper match between the timing used by the player and the origin. If a player's time source runs ahead of that of the origin, requests will be made for segments that are not yet available.
Secondly, regarding discontinuities that aren't signaled in the manifest causing erroneous requests: a player will assume that segments from within the timeframe of the discontinuity are available until a request for such a segment proves otherwise and returns an HTTP 404 response.
Thirdly, regarding the inability to signal variations and the availability of segments: if a certain segment turns out to be two seconds longer than the set duration, the next segment will be available two seconds later than a player's calculation would imply. If the player then requests that next segment immediately, this will once again lead to a 404.
These three scenarios don't apply when using a segment timeline. Thus, doing so will significantly reduce the scenarios in which 404s can occur. Plus, it will offer the added benefit of sample accurate timing as well as a quick and reliable method to let a player find the Live edge.
Do It Yourself
Convinced about the benefits and curious to try it? If you're using our Unified Origin, make sure that your encoder inserts UTC timing. Besides that, don't force the origin to use a minimum fragment length for its DASH manifests (--mpd.minimum_fragment_length) and don't force it to use a specific DASH profile either (--mpd_profile). That's it, because Unified Origin streams DASH with a segment timeline by default. And in case you're not familiar with our Unified Origin yet, feel free to try it and contact us if you have any more questions.