As part of the launch of our Remix VOD2Live solution we created a demo to show how it can schedule a playlist of VOD assets and deliver it as a linear live stream, without relying on an expensive encoder. The demo itself is not the only new thing, though: we’re running it on a new infrastructure setup and using a new style of deployment, with Docker containers running on Kubernetes. This blog post presents a deep dive into this setup and explains why we built it the way we did.

The problem

First, I’d like to describe some of the problems with the existing demo environment that led us to develop this new one. Our older demos all run in a single shared environment, which always runs the latest trunk build of our software. While this has the advantage of always showing the latest and greatest features, it also has a few downsides:

    • Downtime while deploying the new build whenever a developer commits a change
    • Frequent discontinuities in live streams
    • Everything runs on the same system, so if it breaks all demos are broken
    • Configuration to handle requirements for every demo is overly complicated

[Figure: multiple demos on a single origin]

So before building this new demo we took some time out to think about what we really wanted from our demos and then tried to design a system to match. A general concept we wanted to follow was separation of concerns. What this means for us is that each demo should be broken down into the required components (for example, a Unified Origin, a web front end and some content) and also that demos themselves should be separated. So rather than running 20+ demos on a single Unified Origin, each demo would have its own.

We also wanted to be able to reuse content for multiple demos as it doesn’t make sense to store dozens of copies of Tears of Steel for different demos.

[Figure: separate origin per demo]

We also wanted to run multiple versions of each demo, so we can maintain the cycle of constant updates to the latest trunk build, while separately having a stable version which should have minimal downtime and will generally be running our latest General Availability release.

The solution

There are two significant parts to the new environment, namely the infrastructure and the workflows.

The infrastructure

For the infrastructure we decided to run everything on our own hardware rather than in the cloud, primarily for cost reasons and because we don’t really need the flexibility and scaling capabilities of the cloud. For compute we chose Docker containers orchestrated by Kubernetes, as this gives a huge amount of control and flexibility around deployment. In addition, we set up a Ceph storage cluster, which gives us fast and reliable storage that can be presented as block devices, as a shared file system similar to NFS, or as an S3-like object store.

We decided to run storage and compute on the same hardware, rather than having a separate dedicated storage system, as this allows us to scale up easily by adding regular commodity servers to the clusters.

Logically this looks something like the diagram below:

[Figure: demo infrastructure using Ceph and Kubernetes]

The workflow

On the workflow side everything starts from a git repository. Each repository has multiple branches that map to deployment environments, for example trunk, which runs the latest development build, and stable, which runs the latest GA release. Each repository contains three critical parts:

  • resources to build Docker images (a Dockerfile, config files to copy in, etc.)

  • a Helm chart for deploying to Kubernetes

  • a Jenkinsfile defining the workflow to build, test and deploy

While the exact composition of these components might vary, for example a more complex demo might require multiple Docker images for different services, the general structure will always be the same.

We have built a standard library of components, such as Unified Origin, that can be pulled in using Helm’s subchart functionality and configured appropriately at deployment time.

Our Jenkins CI jobs are configured to scan all branches within the git repository and run on any that contain a Jenkinsfile. This makes it extremely flexible for development and testing purposes, as a new feature or change can be made in a separate branch, which will deploy to its own environment and can be tested thoroughly before being merged back in.

We also created triggers so that whenever a new build of the core Unified Streaming software is completed and passes regression tests, it triggers the trunk build and deploy workflows for every demo. This ensures that we can always verify each demo with the latest software without any risk to the stability of the public-facing demo site.

[Figure: demo workflow using git, Jenkins and Kubernetes]

This new system does have the downside of being rather more complex than the old way of working. When creating a new demo, the developer needs to really think about all of the required components and how best to fit them together, rather than just sticking a new webpage and content on the existing server. But this does not outweigh the significant improvements in flexibility and stability granted by the new environment.

A practical example

Looping back to the start, where we talked about the new Remix VOD2Live demo, let’s take a look at how it actually works.

First we start with a git repository that will contain the Jenkins workflow as code, as well as everything required to build the Docker image and deploy it to Kubernetes using Helm.

[Figure: demo git repository]

The Jenkinsfile defines the CI workflow that will build a new Docker image, deploy that image to Kubernetes and then validate the deployment is working as expected.

It starts out by defining a set of environment variables that will be used by the workflow, including some which are calculated based on how the job has been triggered, i.e. whether it was triggered by a change to this git repository or by a change to the upstream dependency, which is the base Unified Origin image.

Then it moves through the various workflow stages.
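As an illustration of a trigger-dependent variable, here is a minimal Python sketch of how the upstream image reference might be chosen. It is not taken from our Jenkinsfile: the function name, the dictionary of job parameters, the fallback to a latest tag and the registry path are all assumptions made for the example. Only the idea that the value depends on how the job was triggered comes from the description above.

```python
from typing import Mapping


def resolve_upstream_image(repo: str, params: Mapping[str, str],
                           default_tag: str = "latest") -> str:
    """Pick the base image to build from (hypothetical helper).

    Assumption: a job triggered by a new upstream build receives a VERSION
    parameter, while a job triggered by a change to the demo repository
    does not and falls back to default_tag.
    """
    version = params.get("VERSION", "")
    tag = version if version else default_tag
    return f"{repo}:{tag}"


# Hypothetical registry path, used only for illustration.
upstream_repo = "registry.example.com/unified-origin"
print(resolve_upstream_image(upstream_repo, {"VERSION": "1.2.3"}))
print(resolve_upstream_image(upstream_repo, {}))
```

Keeping this decision in one place means the later stages only ever see a single resolved value, whichever way the job was started.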

Building the Docker image

Building the Docker image is done using Kaniko, as this can run inside a Docker container and doesn’t need access to the Docker daemon. An argument is passed to the build process to tell it which upstream Docker image to build from, and the built image is pushed to the image repository under four different tags: the software version, the SVN commit, the git commit and latest.

stage('Build Origin Docker image') {
    steps {
        container('kaniko') {
            // Write the registry credentials to the Docker config file read by Kaniko
            sh 'echo "{\\"auths\\":{\\"$REGISTRY_URL\\":{\\"username\\":\\"$REGISTRY_TOKEN_USR\\",\\"password\\":\\"$REGISTRY_TOKEN_PSW\\"}}}" > /kaniko/.docker/config.json'
            // Build the image with layer caching enabled and push it under four tags
            sh """/kaniko/executor \
                    -f `pwd`/docker/origin/Dockerfile \
                    -c `pwd`/docker/origin \
                    --cache=true \
                    --cache-repo=$DOCKER_REPO/cache \
                    --build-arg UPSTREAM_DOCKER_IMAGE=$UPSTREAM_DOCKER_IMAGE \
                    --destination $DOCKER_REPO/$BRANCH_NAME:$params.VERSION \
                    --destination $DOCKER_REPO/$BRANCH_NAME:$params.SVN_COMMIT \
                    --destination $DOCKER_REPO/$BRANCH_NAME:$GIT_COMMIT \
                    --destination $DOCKER_REPO/$BRANCH_NAME:latest
            """
        }
    }
}
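The four --destination arguments in the stage above all follow the same pattern of repository, branch and tag. The Python sketch below only makes that pattern explicit. The function names and the sample values (registry path, version and commit identifiers) are hypothetical; the tag order mirrors the Jenkinsfile.

```python
from typing import List


def image_destinations(repo: str, branch: str, version: str,
                       svn_commit: str, git_commit: str) -> List[str]:
    """Return the four image references the build stage pushes to."""
    tags = [version, svn_commit, git_commit, "latest"]
    return [f"{repo}/{branch}:{tag}" for tag in tags]


def kaniko_destination_args(destinations: List[str]) -> List[str]:
    """Turn image references into repeated --destination arguments."""
    args: List[str] = []
    for destination in destinations:
        args.extend(["--destination", destination])
    return args


# Hypothetical values, used only for illustration.
destinations = image_destinations(
    repo="registry.example.com/demos/vod2live",
    branch="trunk",
    version="1.2.3",
    svn_commit="r12345",
    git_commit="abc1234",
)
for destination in destinations:
    print(destination)
```

Tagging the same image by version, by both commit identifiers and as latest makes it possible to trace a running container back to the exact software build and demo revision it came from.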

The Dockerfile used to build the image is fairly straightforward: it builds on top of the existing Unified Origin image and copies in a new entrypoint script and the website files. The new entrypoint script runs the Unified Remix command line against every SMIL file mounted into the container, and then uses mp4split to create a server manifest from the result, with the options set to serve it as a VOD2Live stream.

ARG UPSTREAM_DOCKER_IMAGE

FROM $UPSTREAM_DOCKER_IMAGE

# Copy in entrypoint
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

# Copy in demo page
COPY index.html /var/www/unified-origin/index.html
COPY clientaccesspolicy.xml /var/www/unified-origin/clientaccesspolicy.xml
COPY crossdomain.xml /var/www/unified-origin/crossdomain.xml
COPY favicon.ico /var/www/unified-origin/favicon.ico
COPY V2L_SMIL_Playlist_diagram_2020.svg /var/www/unified-origin/V2L_SMIL_Playlist_diagram_2020.svg
COPY static /var/www/unified-origin/static

EXPOSE 80

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]

CMD ["-D", "FOREGROUND"]

Deploying to Kubernetes

Once the Docker image has been successfully built it needs to be deployed to our Kubernetes cluster, which is done using Helm. Before running Helm, a few changes are made to the values file and chart definition to set the version number correctly. Then Helm is run to upgrade or install the demo into its own namespace. A lot of configuration is set at this point by passing in environment variables that will be made available to the Docker container. This includes the credentials for storage authentication and the exact parameters to use when packaging the server manifest file with mp4split.

We are aggressively subdividing our Kubernetes environment into namespaces so that every branch of every demo gets its own namespace. In this example, the trunk branch of the VOD2Live demo is deployed in the namespace vod2live-trunk. This makes it straightforward to access and monitor the different versions.

stage('Deploy to Kubernetes') {
    steps {
        container('helm') {
            sh """
                sed -i \
                    -e "s|tag: latest|tag: $params.VERSION|g" \
                    -e "s|repository: .*|repository: $DOCKER_REPO/$BRANCH_NAME|g" \
                    chart/values.yaml
                sed -i \
                    -e "s|version: trunk|version: $params.VERSION|g" \
                    -e "s|appVersion: trunk|appVersion: $params.VERSION|g" \
                    chart/Chart.yaml
                helm --kubeconfig $KUBECONFIG \
                    upgrade \
                    --install \
                    --wait \
                    --timeout 600s \
                    --namespace $NAMESPACE \
                    --create-namespace \
                    --set licenseKey=$UspLicenseKey \
                    --set imagePullSecret.username=$REGISTRY_TOKEN_USR \
                    --set imagePullSecret.password=$REGISTRY_TOKEN_PSW \
                    --set imagePullSecret.secretName=gitlab-reg-secret \
                    --set imagePullSecret.registryURL=$REGISTRY_URL \
                    --set image.repository=$DOCKER_REPO/$BRANCH_NAME \
                    --set image.tag=$params.VERSION \
                    --set environment=$BRANCH_NAME \
                    --set env[0].name=REMOTE_STORAGE_URL \
                    --set env[0].value=$REMOTE_STORAGE_URL \
                    --set env[1].name=S3_ACCESS_KEY \
                    --set env[1].value=$S3_KEY_USR \
                    --set env[2].name=S3_SECRET_KEY \
                    --set env[2].value=$S3_KEY_PSW \
                    --set env[3].name=S3_REGION \
                    --set env[3].value=$S3_REGION \
                    --set env[4].name=MP4SPLIT_OPTS \
                    --set env[4].value="--timed_metadata --splice_media --hls.client_manifest_version 5 --hls.minimum_fragment_length 48/25 --dvr_window_length 1800 --no_inband_parameter_sets --hls.fmp4" \
                    $RELEASE_NAME \
                    ./chart
            """
        }
    }
}
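The namespace-per-branch convention described above can be sketched as a small Python function. Only the vod2live-trunk example comes from our setup. The function name is hypothetical, and the clean-up of characters that are not valid in a Kubernetes namespace name (a lowercase DNS label of at most 63 characters) is an assumption about how branch names such as feature/new_ui might be handled.

```python
import re

MAX_LABEL_LENGTH = 63  # Kubernetes namespace names must be valid DNS labels


def namespace_for(demo: str, branch: str) -> str:
    """Derive a namespace name from a demo name and a git branch."""
    raw = f"{demo}-{branch}".lower()
    # Replace every run of characters outside a-z, 0-9 and '-' with one '-'.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw)
    # Enforce the length limit, then drop any leading or trailing '-'.
    cleaned = cleaned[:MAX_LABEL_LENGTH].strip("-")
    if not cleaned:
        raise ValueError("demo and branch produce an empty namespace name")
    return cleaned


print(namespace_for("VOD2Live", "trunk"))
print(namespace_for("vod2live", "feature/new_ui"))
```

Deriving the name mechanically from the demo and the branch is what lets a new branch get its own environment without any manual cluster configuration.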

The Unified Origin Docker container that gets deployed has its own simple self checks, but they only confirm that the Apache web server is running. The final step in the workflow is therefore to validate that the Origin plugin is working as expected and serving an HLS manifest on the URL used by the demo.

stage('Test Deployment') {
    steps {
        sh 'curl --silent --fail --show-error http://$RELEASE_NAME.$NAMESPACE.svc.k8s.unified-streaming.com/unified-learning.isml/.m3u8'
    }
}

A potential future improvement to the workflow could be to actually validate the output manifest, perhaps using Apple’s mediastreamvalidator tool. For now it only checks that the request succeeds: with --fail, curl treats any HTTP status of 400 or above as an error.
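A lightweight step between a plain HTTP check and a full validator would be to confirm that the response body at least looks like an HLS playlist. The Python sketch below is not part of our workflow and is far less thorough than mediastreamvalidator. It only relies on two facts from the HLS specification: a playlist must start with #EXTM3U, and it lists either variant streams (#EXT-X-STREAM-INF) or media segments (#EXTINF). The function name and the sample playlist are made up for the example.

```python
from typing import List


def check_hls_playlist(text: str) -> List[str]:
    """Return a list of problems found in a playlist body (empty if none)."""
    problems: List[str] = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        problems.append("playlist does not start with #EXTM3U")
    has_variants = any(line.startswith("#EXT-X-STREAM-INF:") for line in lines)
    has_segments = any(line.startswith("#EXTINF:") for line in lines)
    if not (has_variants or has_segments):
        problems.append("playlist lists neither variant streams nor media segments")
    return problems


# Made-up playlist body; a real check would use the text fetched from the demo URL.
sample = """#EXTM3U
#EXT-X-VERSION:5
#EXT-X-STREAM-INF:BANDWIDTH=800000
video-800k.m3u8
"""
print(check_hls_playlist(sample))
print(check_hls_playlist("<html>Not Found</html>"))
```

A check like this would catch the case where the web server answers successfully but returns something other than a manifest, such as an error page served with a 200 status.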

Branching workflow

Once the workflow completes, the demo has been updated and is running the newest trunk build of our software. However, we don’t necessarily want our public demo to always run the latest and greatest version, as it’s always possible that a new bug affecting the demo has slipped through our automated testing process.

To maintain a stable public version we use a git branch-based workflow with staging and stable branches that require manual intervention. When someone wants to update the public demo, they first merge any changes from trunk into the staging branch. At this stage they might also make additional changes, for example to pin to a specific version of the software. Jenkins then deploys this to the staging environment, where everything can be manually verified.

This leads to a git graph looking something like this:

[Figure: git branch graph]

After the changes have been verified, they can be merged into the stable branch to trigger deployment to the public-facing demo.
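The promotion path from trunk through staging to stable can be written down as a simple rule. The Python sketch below is only an illustration of that rule, not something our Jenkins setup enforces in this form. The function name is hypothetical, and treating a direct merge from trunk into stable as not allowed is our reading of the workflow described above.

```python
PROMOTION_ORDER = ["trunk", "staging", "stable"]


def can_promote(source: str, target: str) -> bool:
    """Return True if merging source into target follows the promotion path."""
    if source not in PROMOTION_ORDER or target not in PROMOTION_ORDER:
        return False
    # Changes may only move one step forward: trunk -> staging -> stable.
    return PROMOTION_ORDER.index(target) == PROMOTION_ORDER.index(source) + 1


print(can_promote("trunk", "staging"))
print(can_promote("trunk", "stable"))
```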

The output of this workflow is the stable demo, which is available here.

Getting to this point has taken a lot of iteration and testing, experimenting with exactly how to structure our repositories and Jenkins pipelines. But the end result has made it significantly easier for developers to build and test demos while maintaining control and keeping a stable public version.