How we built a testbed to load test streaming video at sclae - Part 3:

November 11, 2021

In this third blog of our multi-part series on load testing video streaming at scale, we evaluate different Media Processing Functions (MPFs) in different parts of the network. We focus on achieving the best results when streaming linear Live channels.

In the first part of this series, we introduced the testbed used for our evaluation, which is based on MPEG’s Network Based Media Processing (NBMP). Also, in the first part we introduced the term Distributed Media Processing Workflows (DMPWs), which consist of multiple MPFs running at different geographical points in the Network. These distributed MPFs can provide performance gains in a DMPW by being closer to the end client devices. In the second part of the blog series, we evaluated different Video on Demand (VOD) setups that rely on remote object-based cloud storage to obtain its best performance gains.

Through the various tests presented in this blog we will determine:

The best cloud instance type for a specific Live video streaming configuration.
The linear scalability of Unified Origin in Live video streaming use cases.
The use of an Origin shield cache in front of Unified Origin.
The best configuration of edge based processing at nodes with close proximity to the user.

Testbed setup

Most popular setups for Live video streaming of Over-the-top media services (OTT) and TV Broadcaster consists of four elements: (1) Media Source (e.g., Live Encoder); (2) Origin server acting as the Media Processing Function; (3) a Content Delivery Network (CDN); and (4) client devices (Figure 1). This type of setup can use one or multiple CDNs to improve media delivery towards the client devices by caching the most watched (hot) content and reducing latency towards the client device. The reduction in latency happens by being geographically closer to the client device.

Figure 1. Most popular Live video streaming setup

In this blog we focus on the media delivery between the media origination and the CDN. Therefore, for evaluation purposes, we removed the CDN and client devices and replaced them with a Workload generator that generates realistic scenarios of Live video streaming cases shown in Figure 2.

Fig. 2 Distributed workload emulator — Figure 2 Distributed workload emulator

Setting up a publishing point for Live

The following code snippet is an example of how to create a Server Manifest (.isml). It uses mp4split (Unified Packager) to create a Live publishing point.

#!/bin/bash

set -e

mkdir -p /var/www/origin/live/test

mp4split --license-key=/etc/usp-license.key -o test.isml \
    --archiving=1 \
    --archive_length=3600 \
    --archive_segment_length=1800 \
    --dvr_window_length=30 \
    --restart_on_encoder_reconnect \
    --mpd.min_buffer_time=48/25 \
    --mpd.suggested_presentation_delay=48/25 \
    --hls.minimum_fragment_length=48/25 \
    --mpd.minimum_fragment_length=48/25 \
    --mpd.segment_template=time \
    --hls.client_manifest_version=4 \
    --hls.fmp4

mv test.isml /var/www/origin/live/test/

Media source (Live encoder)

For this Live video streaming use case, we used FFmpeg open-source software to emulate ingest CMAF tracks towards Unified Origin. Specifically, we used the DASH-IF Live Media Ingest Protocol - Interface 1.

A demonstration of a Live Origin demo using DASH-IF Live Media Ingest Protocol can be found in our Github repository.

In this case, the Live Media Source contains eight videos and two audio CMAF tracks generated by FFmpeg. The Docker container sends all media tracks to Unified Origin through a HTTP POST method. This process is described in more detail in DASH-IF Live Media Ingest Protocol document. Table 1 presents the media specifications of the emulated CMAF tracks.

Track	Resolution	Bitrate (kbps)	Codec
video‑1.cmfv	256x144	100	libx264
video‑2.cmfv	384x216	150	libx264
video‑3.cmfv	512x288	300	libx264
video‑4.cmfv	640x360	500	libx264
video‑5.cmfv	768x432	1000	libx264
video‑6.cmfv	1280x720	1500	libx264
video‑7.cmfv	1280x720	2500	libx264
video‑8.cmfv	1920x1080	3500	libx264
audio‑1.cmfa	NA	64	aac
audio‑2.cmfa	NA	128	aac

Table 1. CMAF tracks’ bitrate ladder generated by FFmpeg.

Below we present an example of a Docker compose file which uses FFmpeg to encode eight videos and two audio CMAF tracks, and then it sends the media to Unified Origin’s publishing point URI.

# docker-compose.yml
version: "2.1"
services:
 ffmpeg-1:
   image: ffmpeg:latest
   environment:
     - PUB_POINT_URI=http://${YOUR_ORIGIN_HOSTNAME}/test/test.isml
     - 'TRACKS={
       "video": [
       { "width": 256, "height": 144, "bitrate": "100k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 384, "height": 216, "bitrate": "150k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 512, "height": 288, "bitrate": "300k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 640, "height": 360, "bitrate": "500k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 768, "height": 432, "bitrate": "1000k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 1280, "height": 720, "bitrate": "1500k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 1280, "height": 720, "bitrate": "2500k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 },
       { "width": 1920, "height": 1080, "bitrate": "3500k", "codec": "libx264", "framerate": 25, "gop": 48, "timescale": 25 }
       ],
       "audio": [
       { "samplerate": 48000, "bitrate": "64k", "codec": "aac", "language": "eng", "timescale": 48000, "frag_duration_micros": 1920000},
       { "samplerate": 48000, "bitrate": "128k", "codec": "aac", "language": "eng", "timescale": 48000, "frag_duration_micros": 1920000 } ] }'
     - MODE=CMAF
   depends_on:
     live-origin:
       condition: service_healthy

Origin configuration

We modified the IPv4 local port range on the Linux server running Apache web server to minimize the probability of TCP ports exhaustion.

#!/bin/bash

# The configuration file is located in '/proc/sys/net/ipv4/ip_local_port_range'
sudo sysctl -w net.ipv4.ip_local_port_range="2000 65535" # from 32768 60999

We used the following Apache web server configuration to enable Unified Origin module with one single web server virtual host. Also, we set the DocumentRoot to the full path of our Server Manifest file (.isml).

LoadModule smooth_streaming_module /usr/lib/apache2/modules/mod_smooth_streaming.so

UspLicenseKey /etc/usp-license.key

Listen 0.0.0.0:80

<VirtualHost *:80>
    ServerAdmin info@unified-streaming.com
    ServerName origin-live
    DocumentRoot /var/www/origin/live

    <Directory />
      Require all granted
      Satisfy Any
    </Directory>

    AddHandler smooth-streaming.extensions .ism .isml .mp4

    <Location />
      UspHandleIsm on
      UspHandleF4f on
    </Location>

    Options -Indexes

    # If not specified, the global error log is used
    LogFormat "%{%FT%T}t.%{usec_frac}t%{%z}t %v:%p %h:%{remote}p \"%r\"  %>s %I %O %{us}T" precise
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" netdata
    ErrorLog /var/log/apache2/evaluation.unified-streaming.com-error.log
    CustomLog /var/log/apache2/evaluation.unified-streaming.com-access.log precise
    CustomLog /var/log/apache2/netdata_access.log netdata

    LogLevel warn

    HostnameLookups Off
    UseCanonicalName On
    ServerSignature On
    LimitRequestBody 0

    Header always set Access-Control-Allow-Headers "origin, range"
    Header always set Access-Control-Allow-Methods "GET, HEAD, OPTIONS"
    Header always set Access-Control-Allow-Origin "*"

</VirtualHost>

Workload Generator

The Media Sink (or Workload Generator) generates a load with a gradual increase of emulated workers from zero up to 50. Each worker emulates MPEG-DASH media HTTP(s) requests as follows:

Request MPD
Parse the MPD and metadata
URL segment generation
Set the last available media segment with temporary index ‘k’ = 0
Request audio segment ‘k’ with a bitrate of 128kbps
Request video segment ‘k’ with a bitrate of 3500kbps
If ‘k’ == total number of available media segments ‘N’ go to step 1, else ‘k’ + 1 and go to step 5

Type of cloud instance for Live video streaming

In previous Part 2 of our blog series we focused on using the best conditions for VOD streaming use cases with remote storage access. Because we did not have to store high amounts of data such as media, we selected the compute optimized instance family that suited the best for that use case.

However, most common Live video streaming scenarios require storing the media that is ingested by the encoder, which can later be available as a VOD asset in an ISOBMFF format (e.g., CMAF, fragmented MP4).

To identify the best cloud instance for Live video streaming, we classified the publishing point configuration of a Live channel into three use cases:

A: Pure Live,
B: Archiving with no DVR window
C: Archiving with an enabled DVR window

Table 2 provides the main configuration parameters to create a live publishing point. To know more details about each parameter please refer to our documentation or our Live Best Practices.

Use case/configuration	Pure Live (A)	Archiving, no ‘DVR’ (B)	Archiving and ‘DVR’ ( C )
archiving	true	true	true
archive_length	120	86400 (24h)	86400 (24h)
archive_segment_length	60	600 (10m)	600 (10m)
dvr_window_length	30	30	3600 (1h)

Table 2. Configuration of Live publishing point of most popular Live video streaming scenarios.

The following script shows an example on how we created a Live publishing point for each use case. We replaced each parameter’s value according to table 2.

#!/bin/bash

set -e

mkdir -p /var/www/origin/live/test

mp4split --license-key=/etc/usp-license.key -o test.isml \
    --archiving=${YOUR_VALUE} \
    --archive_length=${YOUR_VALUE} \
    --archive_segment_length=${YOUR_VALUE} \
    --dvr_window_length=${YOUR_VALUE} \
    --restart_on_encoder_reconnect \
    --mpd.min_buffer_time=48/25 \
    --mpd.suggested_presentation_delay=48/25 \
    --hls.minimum_fragment_length=48/25 \
    --mpd.minimum_fragment_length=48/25 \
    --mpd.segment_template=time \
    --hls.client_manifest_version=4 \
    --hls.fmp4

mv test.isml /var/www/origin/live/test/

Table 3 presents the specifications of three tested cloud instances. We selected at least one instance from the following instance families: memory, storage, and compute optimized. We also considered the baseline bandwidth of each cloud instance which guarantees a network bandwidth performance after 60 minutes of stress usage. For more information please refer to AWS EC2 network performance documentation.

Instance type	Instance	# vCPU	# Phys. Cores	Memory (GiB)	Storage	# of EC2 IOPS	Baseline Bandwidth (Gbps)	$/hr
Memory optimized	r5n.large	2	1	16	EBS Only	100	2.1	0.178
Storage optimized	i3en.large	2	1	16	1x1250 MB NVMe SSD	32,500	2.1	0.27
Compute optimized	c5n.large	2	1	5.25	EBS Only	100	3	0.123

Table 3. Type of AWS EC2 instances tested. Details provided by AWS EC2 for Frankfurt region datacenter. Checked on the 10^th of November 2021.

Experiment results of Live use cases

The following section shows the results of stress testing Unified Origin acting as de MPF in our testbed. We used three types of Live video streaming configurations (Table 2) and the three chosen cloud instances (Table 3). For each of the tests, the workload generator created a load hitting Unified Origin directly with a duration of one hour. In this section, we present the throughput and memory consumption by the cloud instance running Unified Origin.

Figures 4 and 5 present the average output throughput and average requests per second delivered to the Workload generator, respectively. The experiments using the cloud instance c5n.large obtained the highest throughput and requests per second among all three use cases. This result correlates to the baseline bandwidth provided by AWS EC2’s Server Level Agreements (SLA).

Fig. 4 and 5. Average Output Throughput and average request per second.

To obtain the best performance of Unified Origin, choose cloud instances based on the baseline bandwidth instead of burst bandwidth (see AWS EC2) .

In the previous figures 4 and 5 we noticed that both metrics had a higher variance for use case B in comparison to use cases A and C. This happens for two reasons: 1) there is no DVR window in use case B in comparison to C, therefore, the Workload generator will request the MPEG-DASH MPD more often, and 2) Unified Origin is creating more read/write operations by saving to disk the ingested media. The combination of these two factors and the type of load, makes the MPD requests in use case B more compute expensive.

Figure 6 shows the proportion (percentage) of a total number of requests generated by the Workload generator in each experiment. We can observe that use case A and B had an average of 3.03% requests for the MPD in comparison to video and audio segments requests with 48.5% each.

Population proportion per experiment of total number HTTP requests. — Figure 6. Proportion of the total number of HTTP requests by the Workload generator in each experiment.

Select the compute optimized instances for Pure live scenarios and storage optimized instances when archiving is enabled.

Experiment results of multiple Live channels

For the following experiments, we selected the Archiving-dvr ( C ) use case and the cloud instance i3en.large which offers a total of 16GiB of RAM memory. Each encoder is posting the same media bitrate ladder to a separate publishing point. It is important to notice that in these experiments we did not use an Origin shield cache in front of Unified Origin and configuration ( C ) increases the media archive size on disk over time.

Figure 7 presents the average received (IN) throughput by the encoder(s) and figure 8 shows the average outgoing (OUT) throughput towards the Workload emulator. For each experiment, we increased the number of unique Live channels at Unified Origin.

The results show a constant output throughput by Unified Origin regardless of the number of unique Live channels. In these tests conditions Unified Origin pushed an average of 3.8 Gigabits per second of throughput towards the Workload generator.

Figure 6. Throughput received and sent by Unified Origin server by adding a new channel. — Figures 7 and 8. Average received (IN) and sent (OUT) throughput by Unified Origin server with an increase of live channels.

We highly recommend setting up an Origin Shield cache in front to reduce the load and memory consumption at Unified Origin (docs)

Distributed media processing

In this section we consider edge based processing at nodes at close proximity to the user. Such deployments may run on multiple clouds and edge clouds at different geographical locations. The Live media source in our setup is always a remote source sitting in a central cloud (A) (in this case we use the Amazon EC2 region in Ireland). We deployed the Media Sink (or Workload Generator) in Frankfurt EC2 cloud (B) region. To test the edge based processing we deployed three configurations shown in figure 9.

Distributed edge processing configurations — Figure 9. Tested configurations for distributed edge processing.

Configuration 2 would correspond to typical usage with a content delivery network cache, while configurations 1 and 3 are smart edge configurations where active processing happens by the media function. The cache function was based on Nginx server. We deployed each edge configuration with the following cloud instance types shown in table 4.

Server name	Cloud instance type	Baseline Bandwidth (Gbps)
Media Source (ingest-server)	c5n.2xlarge	10
Media Processing Function (origin-server)	c5n.large	3
Cache Function (cache-server)	c5n.large	3
Workload generator	6 * c5n.4xlarge	6 * 15

Table 4. Type of AWS EC2 instances tested. Details provided by AWS EC2 for Frankfurt region datacenter. Checked on the 10^th of November 2021.

Experiment results for distributed media processing

The following section presents the experiment results of distributed media processing. Figure 10 presents the average output throughput from each server: ingest-server, orign-server, and cache-server.

Average ouput throughput of each cloud instance. — Figure 10. Average output throughput of each server and type of deployment configuration.

Figure 10 shows an average output throughput of 10.2 Mbps for each ingest-server in all three configurations. The origin-server obtained an average output throughput of 4.01 Mbps and 3.91 Mbps for configurations 2 and 3, respectively. After adding a Cache function in configurations 2 and 3, we reduced the load towards Unified Origin by 99.87% (from 3.32Gbps to 4.01 Mbps).

The results show that without a cache (Conf. 1) the performance is not improved, as the media processing function is not shielded by a cache. The CDN-like edge configuration 2 performs better in this case, as the processing is centralized and the caching, as the processing is centralized and the caching happens at the edge which is stable. However, edge configuration 3 with both caching and media processing functions achieves even better performance with an average of 20.2 Gbps of throughput and a lower variance in response times.

Conclusion

In this third part of our series on load testing streaming video at scale we tested various use cases for Live video streaming.

Based on our experiments we provide some key takeaways:

We presented the use of CMAF ingest live demo for the emulation of one or multiple encoders using FFmpeg.
Based on AWS EC2 network bandwidth specifications, always select a cloud instance by its baseline bandwidth instead of the burst bandwidth. In case your cloud instance reaches the indicated burst bandwidth, a bandwidth throttling can occur reducing the performance of Unified Origin.
When selecting a cloud instance for Live video streaming, verify the type of configuration. Compute optimized instances are more suitable for Pure-live cases, and storage optimized instances when you require to archive the media posted by the encoder.
Based on the total sum of your encoder’s bitrate ladder, you can estimate the required RAM memory and the maximum number of Live channels that your cloud instance can handle.
We strongly recommend using an Origin shield cache to reduce the load and memory consumption at Unified Origin.
Selecting cloud instances that fit your media workload and the type of scaling (vertical or horizontal) will reduce your overall costs.

If you want further information about how we built a testbed based on MPEG’s NBMP standard, we encourage you to read our paper presented at MMsys2021.

20 min read

Load testing streaming video at scale — Part 3: Optimizing streaming linear Live channels

Testbed setup

Setting up a publishing point for Live

Media source (Live encoder)

Origin configuration

Workload Generator

Type of cloud instance for Live video streaming

Experiment results of Live use cases

Experiment results of multiple Live channels

Distributed media processing

Experiment results for distributed media processing

Conclusion

20 min read

Load testing streaming video at scale — Part 3: Optimizing streaming linear Live channels

Testbed setup

Setting up a publishing point for Live

Media source (Live encoder)

Origin configuration

Workload Generator

Type of cloud instance for Live video streaming

Experiment results of Live use cases

Experiment results of multiple Live channels

Distributed media processing

Experiment results for distributed media processing

Conclusion

Share