Scaling video streaming: Live versus VOD

More and more video is streamed over the internet every day. To reach a large audience while keeping control over how your videos are delivered, a tailored combination of hardware and software is called for. To support the wide variety of devices in use today, with their differing protocols and digital rights management (DRM) solutions, the Unified Origin server offers a highly capable and efficient solution that is easy to deploy. This blog post looks into the scalability of video streaming based on this solution, and its performance in different scenarios, whether live or on demand.

The Unified Origin allows you to stream video in all of the important formats, such as MPEG-DASH, Apple HLS and Microsoft Smooth Streaming, protected with different DRM schemes such as PlayReady, Widevine, Primetime, FairPlay and Marlin, while storing only a single set of media source files. In addition, the Unified Origin can stream live content ingested from popular encoders such as Elemental Live, Thomson, Harmonic, Allegro, Envivio, Digital Rapids and many others.

Unified Origin’s versatility, along with its reliability and high performance, has made it a core component of large-scale video streaming platforms such as those deployed by the Dutch National Broadcasting Organization (NPO) and the British Broadcasting Corporation (BBC). In addition, online video platforms that stream video all across the world, such as Globo and DailyMotion, have been built using the Unified Origin server.

So, how many Unified Origin servers does one need? This question often arises when designing a video streaming platform. It concerns both the number of hardware or cloud servers needed and the number of licenses required. The answer depends on the specific scenario and the design of the entire platform. This design includes the server hardware, such as the CPU and network interface cards (NICs), and the type of storage, such as solid state drives (SSDs), RAM-disk and/or distributed object storage. To offer insight into the number of Unified Origin servers that different scenarios call for, this blog post considers these and other design choices when scaling a video streaming platform based on Unified Origin.

The guidelines provided in this blog post are based on previous deployments of Unified Origin, such as those by NPO, BBC, Globo and DailyMotion, as well as our experience with other clients. We will distinguish between the use case of live, where a video stream from a live encoder is viewed by many users, and video on demand (VOD), where users can watch any video at any time. The reason for this distinction is that each calls for its own design approach to achieve good performance.

To start off, Figure 1 shows a schematic overview of a video streaming platform based on Unified Origin. The Unified Origin can either take input from the live encoder (live backend) or read data from local or remote dedicated storage (storage backend). The Unified Origin then produces the media presentation for streaming over HTTP, which can be delivered to different clients through a cache layer that will typically be handled by a content delivery network (CDN). Using such a cache layer reduces the request rate to the Unified Origin and increases efficiency, so that, in the case of many users watching the same content, a single Origin could potentially serve many (millions of) users.

Figure 1: Basic Video Streaming platform based on Unified Origin supporting both live and Video on Demand

Live scenarios

In a typical live scenario, most users watch the live edge of the stream, so they all request the same small set of recent segments. Thus, many of the scalability challenges for live are handled by the cache layer, which in the majority of scenarios comes down to CDNs like Amazon CloudFront or Akamai. The number of origins needed is mostly determined by the number of channels that are ingested, the throughput capacity of the Unified Origin server and the specific type of live service that is offered to users. Using one or more CDNs, it is possible to serve millions of viewers from a small number of origins and rather limited storage.

In general, more channels, as well as lower network and disk throughput capacity, will require more origins. In more detail, the sections below consider ‘pure’ live, and options such as rewinding the livestream or requesting older clips, which affect performance. In addition to these options, running Unified Origin on cloud instances is considered.

Pure Live

In the case of ‘pure’ live, the origin keeps only a very small window of live video on disk, sixty seconds for instance. Because so little storage is needed, a RAM-disk can be used, which offers very high throughput and makes it possible to host a large number of channels on one (virtualized) server.

Each channel takes up only a small amount of disk space, and even a large number of channels doesn’t require a lot of storage when only a minimal window of video is kept on disk. A stream with 5 bitrates totaling 5700 kbps (400, 800, 1000, 1500 and 2000 kbps), for example, takes up about 42 MB of disk space per sixty-second window of video (about 2.5 GB per hour). This amount of storage can easily be provided on RAM-disk or SSD, enabling high throughput.
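The storage figures above follow directly from the total ladder bitrate multiplied by the window length. A minimal Python sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes) and ignoring container and manifest overhead; the function name is our own, for illustration only:

```python
def window_storage_bytes(bitrates_kbps: list[int], window_seconds: float) -> float:
    """Bytes needed to keep `window_seconds` of every rendition on disk.

    Assumes decimal units and ignores container/manifest overhead.
    """
    total_bits_per_second = sum(bitrates_kbps) * 1000
    return total_bits_per_second * window_seconds / 8


ladder_kbps = [400, 800, 1000, 1500, 2000]  # 5700 kbps in total

per_minute_mb = window_storage_bytes(ladder_kbps, 60) / 1e6
per_hour_gb = window_storage_bytes(ladder_kbps, 3600) / 1e9

print(f"60 s window: {per_minute_mb:.2f} MB")  # 42.75 MB
print(f"1 h window:  {per_hour_gb:.3f} GB")    # 2.565 GB
```

The exact values (42.75 MB and 2.565 GB) are what the text rounds to ‘about 42 MB’ and ‘about 2.5 GB’.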

Overall with live, the number of possible channels is limited more by throughput than by storage capacity. Because (most of) the contents can be cached, the number of viewers is less likely to affect the number of origins needed. Apart from the number of channels, the required throughput is determined by the range of bitrates, choice of DRM options and the variety of formats that need to be streamed.

Take Figure 2 as an example, with 1 source that’s encoded into 5 bitrates, which are ingested into the Unified Origin that streams them out in 4 formats, two of which are streamed both ‘clear’ and DRM-protected. This gives 6 output variants, making the egress 6 times higher than the ingest, with the total throughput needed for the origin being equal to the ingest plus the egress. Note that only the bitrates and formats that are requested will be streamed. Thus, the maximum egress is reached only if the viewers represent a wide enough variety of devices and connection speeds that all of the available bitrates and formats are requested.

Figure 2: How the choice of bitrates and formats influences the required throughput capacity (ingest plus egress)
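The egress multiplier in this example can be found by counting every output variant the origin may have to produce: each format streamed clear counts once, and each DRM-protected version counts once more. A minimal sketch, assuming the worst case in which every rendition of every variant is requested and reusing the 5700 kbps ladder from the pure-live example as ingest; the format labels and which two carry DRM are placeholders, since the figure itself is not reproduced here:

```python
from dataclasses import dataclass


@dataclass
class OutputFormat:
    name: str           # placeholder label, not a real format assignment
    clear: bool = True  # streamed without DRM
    drm: bool = False   # also streamed DRM-protected

    def variants(self) -> int:
        return int(self.clear) + int(self.drm)


def origin_throughput_kbps(ingest_kbps: int,
                           formats: list[OutputFormat]) -> tuple[int, int, int]:
    """Return (egress multiplier, egress, ingest + egress) for the worst case,
    where every rendition of every output variant is requested."""
    multiplier = sum(f.variants() for f in formats)
    egress_kbps = ingest_kbps * multiplier
    return multiplier, egress_kbps, ingest_kbps + egress_kbps


# 4 formats, two of which are offered both clear and DRM-protected
formats = [
    OutputFormat("A", drm=True),
    OutputFormat("B", drm=True),
    OutputFormat("C"),
    OutputFormat("D"),
]
multiplier, egress, total = origin_throughput_kbps(5700, formats)
print(multiplier, egress, total)  # 6 34200 39900
```

If fewer variants are actually requested, the real egress is lower; the function gives the upper bound to size for.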

Specific considerations

To increase flexibility, channels can be separated and run in containerized origin servers using Docker (or similar container technologies). These container instances can be partitioned over different mount points of the underlying storage. Australia’s Foxtel is a client that uses this approach: it has assigned one containerized origin to each of its broadcast channels.

Furthermore, in live scenarios, we recommend protecting the ingesting origin (the one the encoder posts to) with a ‘shield cache’ such as Nginx in front of Apache, because Apache’s own caching is susceptible to the thundering herd problem, in which many simultaneous requests for the same uncached object are all passed through to the origin.
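A shield cache avoids the thundering herd by collapsing concurrent misses for the same object into a single upstream request (in Nginx, the `proxy_cache_lock` directive provides this behaviour). The Python sketch below only illustrates the idea; it is not Nginx or Unified Origin code, and the class name, segment name and simulated origin are hypothetical:

```python
import threading
import time
from typing import Callable


class CoalescingCache:
    """Request-collapsing cache: concurrent misses for the same key share
    one upstream fetch instead of all hitting the origin at once."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._cache: dict[str, bytes] = {}
        self._inflight: dict[str, threading.Event] = {}

    def get(self, key: str) -> bytes:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    # First requester for this key fetches it upstream.
                    event = threading.Event()
                    self._inflight[key] = event
            if leader:
                try:
                    value = self._fetch(key)
                    with self._lock:
                        self._cache[key] = value
                    return value
                finally:
                    with self._lock:
                        del self._inflight[key]
                    event.set()  # wake followers, on success or failure
            # Followers wait for the leader, then re-check the cache
            # (if the fetch failed, one of them becomes the new leader).
            event.wait()


# Hypothetical slow origin that counts how often it is called.
upstream_calls = 0
count_lock = threading.Lock()


def slow_origin(key: str) -> bytes:
    global upstream_calls
    with count_lock:
        upstream_calls += 1
    time.sleep(0.05)  # simulate origin latency
    return f"segment:{key}".encode()


cache = CoalescingCache(slow_origin)
threads = [threading.Thread(target=cache.get, args=("seg-42.m4s",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(upstream_calls)  # 1
```

Fifty simultaneous requests for the same segment produce a single call to the origin; without the in-flight bookkeeping, each of them could miss the cache and be forwarded.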

Cloud instances

As for running Unified Origin in the cloud, an interesting question in light of this blog post is how many live channels can be hosted on a cloud instance. Looking at the virtual server options offered as part of Amazon EC2, a realistic estimate is 10 channels per m3.2xlarge, c4.2xlarge or c3.2xlarge instance.

As explained earlier, the maximum number of possible channels is mostly limited by throughput (ingest plus egress). Take the calculation below as an example:

  • Ingest: 5 bitrates (400, 800, 1000, 1500 and 2000 kbps) totaling 5.7 Mbps
  • Egress: with three output formats, the egress equals three times the ingest, or 17.1 Mbps
  • Traffic per channel: the ingest plus the egress totals 22.8 Mbps
  • Total: for 10 channels, the total traffic is 228 Mbps

Therefore, the dedicated 1000 Mbps bandwidth of the c4 option, as well as the throughput of the SSD-backed c3 and m3 offerings, are all well within range for 10 channels. Looking specifically at the c4 option, the 10 channels in the example above would use about a quarter (228 of 1000 Mbps) of its specified capacity. Theoretically, that would leave room for about 30 more channels, but this wouldn’t work in practice, as overhead needs to be taken into account, as well as the possible use of CDNs and their configuration (which can increase the traffic demand on the origin, especially when using multiple CDNs).
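The per-channel calculation and the headroom estimate above can be reproduced with a few lines of integer arithmetic (in kbps, to avoid floating-point rounding). This is a minimal sketch; the `usable_fraction` parameter is a hypothetical way of reserving capacity for overhead and CDN-related traffic, and the value used below is not a recommendation:

```python
def channel_traffic_kbps(ladder_kbps: list[int], output_formats: int) -> int:
    """Ingest plus egress for one channel (egress = ingest x number of formats)."""
    ingest = sum(ladder_kbps)
    egress = ingest * output_formats
    return ingest + egress


def max_channels(link_kbps: int, per_channel_kbps: int,
                 usable_fraction: float = 1.0) -> int:
    """Theoretical channel count that fits in the usable part of a link."""
    usable_kbps = int(link_kbps * usable_fraction)
    return usable_kbps // per_channel_kbps


ladder = [400, 800, 1000, 1500, 2000]
per_channel = channel_traffic_kbps(ladder, output_formats=3)  # 22800 kbps = 22.8 Mbps
ten_channels = 10 * per_channel                                # 228000 kbps = 228 Mbps
link = 1_000_000                                               # 1000 Mbps

print(ten_channels / link)                    # 0.228, about a quarter of the link
print(max_channels(link, per_channel))        # 43 in theory, i.e. 33 more than 10
print(max_channels(link, per_channel, 0.5))   # 21 if only half the link is budgeted
```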

Live with archiving

It’s also possible to work with a larger window of live video, so that viewers can skip back and forth within that window, as if they had recorded the program using a DVR. Naturally, the storage space this option requires is defined by the total size of the largest program that is to be streamed. If that amount of storage still fits in the cache, overall performance shouldn’t be much different from ‘pure’ live. As for the length of the archive, we recommend a maximum of one day.
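For a DVR window, storage grows linearly with the window length. A minimal sketch for sizing the archive and checking it against the available fast storage, assuming the 5700 kbps ladder from the pure-live example, decimal units and no packaging overhead; the channel count and the 64 GB cache size are hypothetical values:

```python
def archive_storage_gb(ladder_kbps: list[int], window_hours: float) -> float:
    """Disk space (decimal GB) for a DVR window covering every rendition of one channel."""
    bits = sum(ladder_kbps) * 1000 * window_hours * 3600
    return bits / 8 / 1e9


ladder = [400, 800, 1000, 1500, 2000]  # 5700 kbps
channels = 10                          # hypothetical channel count
cache_gb = 64                          # hypothetical RAM-disk/SSD capacity

# A two-hour program versus the recommended one-day maximum.
for hours in (2, 24):
    total = channels * archive_storage_gb(ladder, hours)
    print(f"{hours:>2} h window: {total:.1f} GB, fits in cache: {total <= cache_gb}")
```

With these assumptions a two-hour window for 10 channels (about 51 GB) still fits, whereas a full day (about 616 GB) does not, at which point disk size and speed start to matter.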

When an archive of older clips is part of the scenario, disk size and disk speed become more important. A possible scenario is a catch-up service that enables users to watch older episodes of a certain program. Essentially, offering viewers the option to watch older clips alongside a livestream is a hybrid of live and VOD. Therefore, when offering such an option, the performance considerations, apart from those for the livestream itself, are similar to those for VOD, which we will look at next.

Video on Demand setup

In the VOD scenario, users can typically watch any available content whenever they like. Overall, the following aspects should be taken into account when designing a video streaming platform for video on demand:

  • The amount of storage needed to store the entire video collection
  • The number of concurrent users viewing at the peak times
  • The distribution of content popularity, from popular to non-popular
  • The variety of devices and connection speeds that need to be served
  • The distribution of the physical location of users

The first thing to keep in mind is that, depending on the size of the content library, video on demand will require a significant amount of storage. For small libraries, local storage on file systems with backup copies might be sufficient. For very large libraries, such a setup isn’t enough and a dedicated storage solution will be needed. In practice, object storage using OpenStack Swift or Amazon S3 is quite popular. These options are relatively reliable for storing large amounts of data, as they typically offer protection against hardware failure. Solutions like OpenStack Swift and Amazon S3 are also flexible, as they are often offered on a pay-as-you-go model, where the customer is billed per amount of storage used.
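Because Unified Origin stores a single set of source files and packages the output formats on the fly, library storage depends on the bitrate ladder and the amount of content rather than on the number of output formats. A minimal sketch, assuming every title uses the 5700 kbps ladder from the live example, decimal units and no packaging overhead; the 10,000-hour library and the replication factor of 3 are hypothetical values:

```python
def library_storage_tb(hours_of_content: int, ladder_kbps: list[int],
                       copies: int = 1) -> float:
    """Decimal TB needed to store a VOD library in which every title uses the same ladder.

    `copies` is a hypothetical replication factor (for example, replicas kept by
    the object store); packaging overhead is ignored.
    """
    total_bits = sum(ladder_kbps) * 1000 * hours_of_content * 3600 * copies
    return total_bits / 8 / 1e12


ladder = [400, 800, 1000, 1500, 2000]  # 5700 kbps
print(f"{library_storage_tb(10_000, ladder):.2f} TB")            # single copy
print(f"{library_storage_tb(10_000, ladder, copies=3):.2f} TB")  # three replicas
```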

A side note about using object storage is that media streaming requires frequent HTTP access to objects, which the storage solution needs to support natively. Amazon S3, for example, offers storage classes for different access patterns, such as S3 Standard for frequently accessed data and S3 Standard-Infrequent Access (Standard-IA). For media streaming, a storage class designed for frequent access is preferred, as non-popular VOD content in particular is unlikely to be cached and will therefore be read from the backend storage relatively often.

The second thing to take into consideration when dimensioning a VOD system is the number of concurrent users that will be watching content at peak hours. As VOD enables users to watch what they want, when they want, caching becomes less effective the larger the content library is. This can result in a high frequency of requests from clients to the Unified Origin server (because the content cannot be served from cache). Experiments show that each server running Unified Origin can typically handle up to 1600 concurrent users on a cloud instance with 4 GB of RAM. In more demanding situations, the throughput of the Unified Origin can become a bottleneck, a problem that can be solved by adding more origin servers and using a load balancer to distribute the requests for video content over all of the available origin servers.
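Dimensioning along these lines amounts to dividing the peak number of concurrent users whose requests reach the origin by the per-server figure and rounding up. A minimal sketch using the 1600-users figure from the text; the spare-server margin, the host names and the round-robin policy are our own illustrative assumptions, and real load balancers offer other policies as well:

```python
import itertools
import math
from collections import Counter

USERS_PER_ORIGIN = 1600  # figure quoted above for a cloud instance with 4 GB RAM


def origins_needed(peak_origin_users: int, users_per_origin: int = USERS_PER_ORIGIN,
                   spare: int = 1) -> int:
    """Origins for the peak load, plus `spare` extra servers (hypothetical N+1 margin)."""
    return math.ceil(peak_origin_users / users_per_origin) + spare


def round_robin(origins: list[str]):
    """Simplest load-balancing policy: hand out origins in turn."""
    return itertools.cycle(origins)


n = origins_needed(10_000)  # ceil(10000 / 1600) = 7, plus 1 spare
pool = [f"origin-{i}" for i in range(1, n + 1)]  # hypothetical host names
balancer = round_robin(pool)
assignments = Counter(next(balancer) for _ in range(80))
print(n, assignments["origin-1"])  # 8 10
```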

A third important aspect is the popularity of content. When most users (say 90%) watch only a small part of the content, a well-designed caching layer can significantly reduce the load on the Unified Origin server. In such cases, the number of Unified Origin servers needed will mostly depend on the number of concurrent users and the percentage of non-popular, so-called long-tail content. With large enough content libraries, such long-tail content takes up so much storage space that it’s improbable that all of it can be cached. Therefore, every time a request for such content is made, there’s a significant chance that it will hit the Origin directly, causing extra load on the server.
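The effect of the long tail can be estimated by weighting each share of viewers by the fraction of its requests that miss the cache. A minimal sketch; the two-bucket popularity model and all hit ratios below are hypothetical assumptions, and only the 1600-users-per-origin figure comes from this post:

```python
import math

USERS_PER_ORIGIN = 1600  # per-origin figure quoted earlier in this post


def origin_facing_users(concurrent_users: int, popular_share: float,
                        popular_hit_ratio: float, longtail_hit_ratio: float) -> float:
    """Expected number of users whose requests miss the cache and hit the origin."""
    longtail_share = 1 - popular_share
    miss_fraction = (popular_share * (1 - popular_hit_ratio)
                     + longtail_share * (1 - longtail_hit_ratio))
    return concurrent_users * miss_fraction


# Hypothetical numbers: 100,000 viewers, 90% of them watching popular titles.
load = origin_facing_users(100_000, popular_share=0.9,
                           popular_hit_ratio=0.98, longtail_hit_ratio=0.2)
print(round(load), math.ceil(load / USERS_PER_ORIGIN))  # 9800 7
```

In this example the 10% of viewers watching long-tail titles account for 8000 of the 9800 origin-facing users, which is why the long tail tends to dominate origin sizing.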

The last two things to take into account are the physical locations of users and the variety of devices and connection speeds that need to be served. Users at remote locations, for example, will experience more latency, so each of their requests occupies the Unified Origin for longer. The variety of devices and connection speeds influences the required throughput of the Unified Origin: offering a wider range of qualities typically means more data flowing into the origin server, while the range of devices that needs to be served determines which formats the Unified Origin needs to stream out (with every additional output option adding to the total egress, and thus to the required throughput).
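A standard way to see why latency costs server capacity is Little’s law from queueing theory: the average number of requests open at the origin equals the request rate multiplied by the time each request takes. The sketch below applies it with hypothetical numbers; it is a general model, not a measured property of Unified Origin:

```python
def in_flight_requests(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law (L = lambda * W): average number of requests open at once."""
    return requests_per_second * seconds_per_request


rate = 2000  # hypothetical request rate reaching the origin (requests/s)
for label, seconds in (("nearby users", 0.05), ("remote users", 0.25)):
    print(label, in_flight_requests(rate, seconds))
```

At the same request rate, the remote users in this example keep five times as many requests open on the origin, each holding a connection and server resources.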

Helping you decide: sizing and the number of origins

The method that we have developed for sizing is based on the technical and practical knowledge that we have accumulated throughout the development of our various software offerings in the realm of large-scale video streaming. Our take on the various scenarios described in this text is a result of this knowledge, and we hope that it offers some insight into the things that should be taken into account when scaling a live or on-demand video service.

Having said that, every use case is different and the scenarios above should therefore only be considered guidelines. If you are interested in tailor-made advice, please write us an email, as we will gladly help you with sizing and calculating the number of origins required in scenarios that are specific to your use case. We can provide you with direct help and calculation tools, as well as with further background information.