Canary deployments with Spinnaker and Kubernetes – Part 2

In part 1 of this tutorial, we briefly looked at the concept of canary deployments, and installed Jenkins and Prometheus on an EKS-based Kubernetes cluster. In this part, we will set up Spinnaker, using AWS S3 as its storage backend. After enabling canary deployment functionality, we’ll set up a canary pipeline to test our basic service.

If you haven’t been through part 1, I would suggest going back and working through the installation process before continuing here.

Set up authentication for Spinnaker

To install and configure Spinnaker we will use Halyard, the command-line tool for administering Spinnaker installations. Before continuing, you should install it.

Spinnaker requires a Kubernetes service account that will allow it to execute commands against our cluster. The following hefty list of commands will set this up:

# Grab the current kubectl context
CONTEXT=$(kubectl config current-context)
# Create the namespace, service account, and cluster role binding
kubectl create namespace spinnaker
kubectl apply -f https://d3079gxvs8ayeg.cloudfront.net/templates/spinnaker-service-account.yaml
kubectl apply -f https://d3079gxvs8ayeg.cloudfront.net/templates/spinnaker-cluster-role-binding.yaml
# Extract the service account's auth token
TOKEN=$(kubectl get secret --context $CONTEXT \
$(kubectl get serviceaccount spinnaker-service-account \
--context $CONTEXT \
-n spinnaker \
-o jsonpath='{.secrets[0].name}') \
-n spinnaker \
-o jsonpath='{.data.token}' | base64 --decode)
# Add a token-based user to kubeconfig and point the current context at it
kubectl config set-credentials ${CONTEXT}-token-user --token $TOKEN
kubectl config set-context $CONTEXT --user ${CONTEXT}-token-user
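
For reference, those two hosted templates most likely contain a ServiceAccount plus a ClusterRoleBinding granting it cluster-admin, roughly along the lines of the sketch below (the hosted files are the authoritative versions):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spinnaker-service-account
  namespace: spinnaker
---
# Assumption: the hosted template binds the account to cluster-admin,
# which is typical for Spinnaker setup guides of this vintage
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spinnaker-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: spinnaker-service-account
    namespace: spinnaker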

One thing to check quickly before we move on: make sure the AWS_PROFILE environment variable is not defined in your ~/.kube/config file. If you find it under the users > user > exec > env section of the file, remove it and save the file. If it is left in place, Spinnaker will attempt to use that profile when executing kubectl commands against your cluster, and will fail because the profile doesn’t exist in the container.
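
The section in question looks something like the sketch below (the user, cluster, and profile names here are placeholders; yours will differ). Delete the env entry:

users:
  - name: my-eks-user                 # placeholder name
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1alpha1
        command: aws-iam-authenticator
        args:
          - token
          - "-i"
          - my-cluster                # placeholder cluster name
        env:                          # remove this env block
          - name: AWS_PROFILE
            value: my-profile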

Enable Kubernetes support in Spinnaker

We want Spinnaker to be able to deploy services to our Kubernetes cluster. First up, to enable Kubernetes and artifact support, run:

hal config provider kubernetes enable && hal config features edit --artifacts true

Next, we’ll add our Kubernetes account.

To add the account using our current kubectl context, run the following:

hal config provider kubernetes account add ekssample --provider-version v2 --context $(kubectl config current-context)

Setting up external storage

Spinnaker requires external storage to be configured. It uses this to store application settings and your configured pipelines. There are a number of options available; for this tutorial we’ll use AWS S3.

To get this up and going we will need a couple of things: an S3 bucket, and an IAM user with credentials that Spinnaker can use to access the bucket. It’s worth noting that it is possible to set up IAM roles on the worker nodes instead, but that’s outside the scope of this tutorial.

Create a bucket in your account. Then create an IAM user, and make a note of the user’s access key and secret key. Attach the managed policy shown in the screenshot below to your user (obviously, in production we would want to limit these permissions to just our bucket):

[Screenshot: the S3FullAccess managed policy attached to the IAM user]

Run the following to point Spinnaker at the bucket:

hal config storage s3 edit \
--access-key-id myaccesskey \
--secret-access-key \
--bucket mybucket \
--endpoint "" \
--region myregion

Replace mybucket and myregion in the above command with the name and region of the bucket you created. Replace myaccesskey with the access key of the IAM user you created previously. You will be prompted to enter your secret key as well.

Finally, to enable S3 as the storage provider, run:

hal config storage edit --type s3

Enabling the Docker registry

Next up we’ll enable the Docker registry provider, and add our Docker Hub registry.

To enable Docker registry support, run:

hal config provider docker-registry enable

followed by this command to add your application repo from Docker Hub (change the repo name to match where you have pushed your application):

hal config provider docker-registry account add simple-web-server --address index.docker.io --repositories densikatshine/simple-web-server

Enable canary support

Almost there with the setup of Spinnaker! Let’s go ahead and finish it off by enabling canary support with Prometheus.

To start, run the following:

hal config canary enable && hal config canary aws enable

Spinnaker needs a place to store canary information. We’ll use the S3 bucket we created earlier. Run the following, replacing mybucket with the bucket you created above and ap-southeast-2 with your bucket’s region:

hal config canary aws account add ekssample-canary --bucket mybucket --region ap-southeast-2

Set this account as the default storage account:

hal config canary edit --default-storage-account ekssample-canary

Finally, enable S3 as a storage destination for canary information:

hal config canary aws edit --s3-enabled true

We need to add Prometheus as our metric store; it will collect metric information from our application, which will be used in canary decision making. To start, run “kubectl get svc” and look for the Prometheus server service. Mine looks like this:

[Screenshot: kubectl get svc output showing the Prometheus server service and its cluster IP]

We are going to use that IP address. In a production setup, you would have a proper ingress address for Prometheus.

Let’s use that address to enable Prometheus canary support (replace myip with the address you got above):

hal config canary prometheus account add ekssample-prom --base-url http://myip

Finally, set Prometheus as your default metrics store and enable it:

hal config canary edit --default-metrics-store prometheus && hal config canary prometheus enable

Before applying these settings, you will need to pick a version of Spinnaker to install. You can run “hal version list” to see what is available. I chose 1.15.2 for this tutorial. You can set the version by running “hal config version edit --version 1.15.2”.

To deploy Spinnaker itself onto our Kubernetes cluster, we run the following command to tell Halyard we are using the “distributed” deployment type:

hal config deploy edit --type distributed --account-name ekssample

Everything is configured, so let’s run “hal deploy apply”. All going well, you’ll see the following once everything has finished running.

[Screenshot: successful hal deploy apply output]

Success!

Wait for about 5 minutes to let everything install, then connect to Spinnaker. We’ll use Kubernetes port forwarding to connect. In separate terminals, run the following to create tunnels:

export DECK_POD=$(kubectl get pods --namespace spinnaker -l "cluster=spin-deck" -o jsonpath="{.items[0].metadata.name}") && kubectl port-forward --namespace spinnaker $DECK_POD 9000

export GATE_POD=$(kubectl get pods --namespace spinnaker -l "cluster=spin-gate" -o jsonpath="{.items[0].metadata.name}") && kubectl port-forward --namespace spinnaker $GATE_POD 8084

Assuming everything connects, you can hit the Spinnaker console at http://localhost:9000.

Configuring our initial application and pipeline

To create a new application in Spinnaker, head to the application page and select “Create Application” from the Actions drop-down in the top right:

[Screenshot: the Actions drop-down with the “Create Application” option]

Create an application as per the screenshot below:

[Screenshot: the New Application dialog for the simplewebserver application]

You’ll be dropped into the clusters page of the application. It will be empty for now.

Click on Pipelines, select Create, and call the pipeline “Initialise”. This pipeline will deploy the initial pods for our service. We will use this later on in our canary pipeline.

This is going to be a simple pipeline with one effective step: deploying a Kubernetes manifest. Select your Account and Application. The manifest to use can be found here; just paste it in.
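
If you just want the shape of it, below is a minimal sketch of such a manifest, pieced together from the names, labels, and ports used later in this post (the image tag is a placeholder; the linked manifest is the authoritative version):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-web-server       # the name the FindArtifact stage looks up later
  namespace: spinnaker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-web-server    # the label our port-forward command selects on
  template:
    metadata:
      labels:
        app: simple-web-server
    spec:
      containers:
        - name: simple-web-server
          image: densikatshine/simple-web-server:v1   # placeholder tag
          ports:
            - containerPort: 8080                     # the app listens on 8080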

[Screenshot: the Initialise pipeline’s Deploy (Manifest) stage configuration]

Save that, and head back to the pipelines page. You’ll see a “Start Manual Execution” option on your newly created pipeline. Run that.

[Screenshot: the “Start Manual Execution” option on the Initialise pipeline]

Assuming the deployment is successful, if you click on the Infrastructure tab you will see your service running!

[Screenshot: the Infrastructure tab showing the running service]

Let’s test our server.

Run the following to create a tunnel to our application:

prodpod=$(kubectl get pods -l=app=simple-web-server --namespace spinnaker -o jsonpath="{.items[0].metadata.name}") && kubectl port-forward $prodpod 8001:8080 --namespace spinnaker

Navigate to http://localhost:8001/hello/world. If your tunnel is running correctly you should see a simple “Hello world” string in your browser.

[Screenshot: the “Hello world” response in the browser]

Configuring our canary pipeline

Finally! We’re ready to configure a basic canary deployment pipeline.

The basic flow of our pipeline is as follows:

  1. A new version of our application is built and pushed to Docker Hub
  2. Our canary pipeline is triggered by this container being pushed
  3. We create a new “baseline” pod, based on our currently running production version. This step is important: we want to collect metrics against a “new” copy of our old container, so we don’t muddy the waters by testing against a pod that might have been running for a long time.
  4. We create a pod that is based on our new container.
  5. We now run a canary analysis. This involves measuring the latency of the /hello endpoint once a minute over a three-minute period. If latency is acceptable after this period, we promote the new container to production. If latency increases, we fail the pipeline.

Let’s enable canary analysis on our simple web server application:

[Screenshot: the config section of the simplewebserver application]

Go to the config section of our app, as per the screenshot above. From here we’ll go to the features section, enable canary analysis, and save those changes.

[Screenshot: the Canary checkbox in the application’s Features section]

Enabling this changes the menus available for our application. Instead of Pipelines, you will see a Delivery menu containing a “Canary Configs” option.

[Screenshot: the Delivery menu with the Canary Configs option]

Head into canary configs and select “Add configuration”.

Leave the name of the config as “new-config”.

Select “Add Metric”.

[Screenshot: the Add Metric dialog]

Add the following settings for your metric:

  • Name: greeting_seconds_sum
  • Fail on: Increase
  • Metric Name: greeting_seconds_sum

Create a new filter template in your metric settings with the following inputs:

  • Name: greeting_latency
  • Template: kubernetes_pod_name=~"${scope}.+"

Select save on the filter template before saving your metric!

[Screenshot: the completed greeting_seconds_sum metric settings]

On the main page of your canary config you will see a “Scoring” section. Under here, change the Group 1 score to 100 and save your config.

[Screenshot: the canary config scoring section with the Group 1 score set to 100]

I’m sure some of you are wondering where these metrics are coming from. If you take a look at the main.go file in our project, you can see we register a Prometheus summary metric on our /hello endpoint, and then register a /metrics endpoint to expose it.

[Screenshot: main.go registering the Prometheus summary metric for /hello and exposing the /metrics endpoint]

We expose this /metrics endpoint via annotations in our Kubernetes manifest. The below tells Prometheus to scrape statistics from this endpoint of our service.

[Screenshot: the Prometheus scrape annotations in the Kubernetes manifest]
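
As a sketch, assuming the stock Prometheus Kubernetes scrape configuration (which keys off these pod annotations), the relevant part of the pod template looks something like:

  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'    # opt this pod in to scraping
        prometheus.io/port: '8080'      # the port serving our metrics
        prometheus.io/path: '/metrics'  # the default path, shown for clarity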

Creating a canary pipeline

We’re ready to create our canary pipeline. Go back to the simplewebserver application in Spinnaker, and create a new pipeline called “Canary”.

This time around we are going to add an automated trigger in the configuration section of the pipeline. Spinnaker allows you to trigger pipelines in a variety of automated ways. In this case, we will trigger the pipeline when a new Docker image is pushed to our Docker Hub repo.

The settings for our trigger will look like the screenshot below (change them to match the Docker registry you set up above when configuring Spinnaker).

[Screenshot: the Docker Registry automated trigger settings]

The next step in our pipeline will be of type “Find artifacts from resource (manifest)” with the following settings:

  • Stage Name: FindArtifact
  • Account: ekssample
  • Namespace: spinnaker
  • Kind: deployment
  • Selector: static target
  • Name: simple-web-server

[Screenshot: the FindArtifact stage settings]

The purpose of this is to get the currently deployed container for this application. We will use this to set up our baseline container.

We will create a deploy manifest stage that uses this value, with the following settings. The name of the stage should be “DeployBaseline”. An example of the manifest to use is here.

[Screenshot: the DeployBaseline stage configuration]

As noted before, we use the currently deployed container as a baseline. We achieve that by referencing the artifact from the FindArtifact step in the image field:

image: '${#stage(''FindArtifact'').context["artifacts"][0]["reference"]}'
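
A rough sketch of what the baseline manifest looks like in context (the linked example is the authoritative version; the release label below matches what the canary analysis and the port-forward commands later in this post select on):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-web-server-baseline
  namespace: spinnaker
spec:
  replicas: 1
  selector:
    matchLabels:
      release: simple-web-server-baseline
  template:
    metadata:
      labels:
        release: simple-web-server-baseline
    spec:
      containers:
        - name: simple-web-server
          # the currently deployed image, resolved from the FindArtifact stage
          image: '${#stage(''FindArtifact'').context["artifacts"][0]["reference"]}'
          ports:
            - containerPort: 8080

The DeployCanary manifest in the next step has the same shape; only the names, the release label, and the image line differ.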

We will make another deploy manifest stage to deploy our new container. An example manifest is here. The name of the stage should be “DeployCanary”.

[Screenshot: the DeployCanary stage configuration]

We pass through the tag value we get from the automated trigger in the configuration step of the pipeline:

image: 'densikatshine/simple-web-server:${trigger[''tag'']}'

Next, we add our canary analysis: create a canary analysis stage called “CanaryAnalysis” with the following values:

  • Config name: new-config (the canary config we previously set up)
  • Lifetime: 3 minutes (the total time to run the analysis)
  • Delay: 1 minute (this is the delay we allow before starting analysis)
  • Interval: 1 minute (how often we take a measurement)
  • Baseline: simple-web-server-baseline
  • Canary: simple-web-server-canary

For the scoring thresholds, set Marginal to 50 and Pass to 80.

Set “If stage fails” to “ignore the failure”. We do this so we can continue with the pipeline and tear down our baseline and canary containers even if the canary analysis fails.

[Screenshot: the CanaryAnalysis stage settings]

An important thing to note here: the baseline and canary values are passed through as the “scope” value in the metric filter template we created earlier. For example, for the canary samples the filter becomes kubernetes_pod_name=~"simple-web-server-canary.+", so we only collect metrics from pods whose names start with that string. This is how we differentiate our baseline pod from our canary pod.

The threshold values aren’t particularly meaningful in our simple example, but essentially “Marginal” means that if any of our samples drop below this value we immediately fail the analysis, while “Pass” is the total score we are required to meet after all samples have been taken.

We’ll add three more stages.

The first is the production deployment: a deploy manifest stage called “DeployProduction”. An example of the manifest is here. The critical point in this stage is the following execution option:

[Screenshot: the conditional execution expression on the DeployProduction stage]

${ #stage('CanaryAnalysis')['status'].toString() == 'SUCCEEDED' }

This expression says: only run the deployment-to-production stage if the preceding canary analysis stage succeeded.

We create two “Delete Manifest” stages. They are essentially the same task, with only the Kubernetes deployment name changed. They remove our baseline and canary deployments whether or not the canary succeeded; to achieve this, make them dependent on the canary analysis stage rather than on the DeployProduction stage.

[Screenshot: the Delete Manifest stage for the baseline deployment]

[Screenshot: the Delete Manifest stage for the canary deployment]

You should have a pipeline that looks like this:

[Screenshot: the completed canary pipeline]

We’re ready to give the pipeline a test run.

A couple of preliminaries: during the canary analysis stage you are going to create two port-forward tunnels (to your baseline and canary pods) and run some curl commands against the endpoints.

The commands to create the tunnels are:

baselinepod=$(kubectl get pods -l=release=simple-web-server-baseline --namespace spinnaker -o jsonpath="{.items[0].metadata.name}") && kubectl port-forward --namespace spinnaker $baselinepod 8001:8080

canarypod=$(kubectl get pods -l=release=simple-web-server-canary --namespace spinnaker -o jsonpath="{.items[0].metadata.name}") && kubectl port-forward $canarypod --namespace spinnaker 8002:8080

Our command is a simple one:

for i in {1..5}
do
    curl http://localhost:8001/hello/world
    curl http://localhost:8002/hello/world
done

This simply hits each pod five times.

To kick off the pipeline, we modify the container tag in our Jenkinsfile, push it to GitHub, and run a build in Jenkins. Normally tags would be set dynamically based on the branch name; as we’re just testing, we’ll make the change manually and push. Edit the Jenkinsfile for your project as per the screenshot below:

[Screenshot: the updated container tag in the Jenkinsfile]

Once you have pushed this, go back to Jenkins and run a build on your master branch. This will build the app and push this new tag to Docker Hub.

After you have pushed the container, take a look at the pipelines section in Spinnaker; you should see a new pipeline execution start.

[Screenshot: the canary pipeline triggered by the Docker push]

Once the DeployBaseline and DeployCanary stages have completed, we will have a baseline and a canary container running. Run the commands mentioned above to create your tunnels, then run the curl loop a couple of times to create some metric data.

After a few minutes, you will see the canary analysis is successful and the new container will be pushed to production.

[Screenshot: a successful canary analysis, with the canary report links highlighted]

Click on one of the canary reports, as highlighted in the above screenshot, to see more detailed results about the canary samples.

[Screenshot: the detailed canary report showing a passing result]

If you jump into the infrastructure tab of our application in Spinnaker, you will see the new container is now the one running in production.

[Screenshot: the Infrastructure tab showing the new container running in production]

That’s the happy path covered. Now let’s run a test that inserts some artificial latency. As we did above, modify the tag in your Jenkinsfile, build the container, and have it pushed to Docker Hub.

This time when we test, we will run a slightly modified set of commands. The extra command hits the /hello/slow endpoint once, which adds a 10-second sleep on the canary container.

for i in {1..5}
do
    curl http://localhost:8001/hello/world
    curl http://localhost:8002/hello/world
done

curl http://localhost:8002/hello/slow

You only need to run the slow request once, after your baseline and canary containers are running.

Once we’ve run this test and the canary analysis has finished, you will see that the analysis has failed. The container has not been promoted, and the baseline and canary containers have been removed.

[Screenshot: a failed canary analysis in the pipeline view]

If we look at the detailed report for the canary analysis, we see that latency on the canary’s /hello endpoint has increased by a whopping 22963546%!

[Screenshot: the detailed canary report showing the latency increase]

If we check our infrastructure tab again, we can see we are still running the previously successfully tested container.

[Screenshot: the Infrastructure tab still showing the previous container in production]

What now?

So there we have it: a simple canary deployment example. To be honest, we’ve barely scratched the surface of what Spinnaker is capable of.

To do real production canary deployments, you are going to want a tool that helps you gradually shift traffic. In the world of Kubernetes, this is going to be a service mesh like Istio. The idea is that you gradually shift traffic and then re-run your canary analysis step at each traffic shift increment.
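
To make that concrete, here is a minimal sketch of how a weighted traffic split looks with Istio (the names here are illustrative, and this assumes Istio is installed with the stable and canary subsets defined in a DestinationRule):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-web-server
spec:
  hosts:
    - simple-web-server
  http:
    - route:
        - destination:
            host: simple-web-server
            subset: stable
          weight: 90
        - destination:
            host: simple-web-server
            subset: canary
          weight: 10   # increase in increments, re-running canary analysis each time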

Spinnaker also offers a wide range of deployment methods, not just canary.

Is it worth it?

At this point, you may be asking whether the effort is worth it. There is no denying it: Spinnaker is a beast. It is extremely powerful, but it can take a lot of work to set up even a simple example like the one above.

My guess is the answer revolves around the size of your development teams and your capacity to manage a complex product like Spinnaker.

The larger you get and the faster you move, the more inherent risk there is in each deployment. I like to think canary deployments attempt to mitigate some of the “unknown unknowns” of deploying software.

Since every deployment essentially becomes a mini-experiment, you need the most relevant metric data on which to base your decisions. This is domain-specific, and will require your developers and domain experts to create a set of signals that you can use to make informed decisions.

One of the initial fears people face with canary deployments is the feeling that they are testing in production. The way I see it, if you are using something like a blue/green deployment strategy, then you are essentially testing 100% of your traffic when you switch over to your new code.

It’s not uncommon, with something like Spinnaker, to chain more than one deployment method together. For example, you run your initial canary analysis; if that passes, you then do a blue/green switch when promoting your code to production. This gives you an extra safety mechanism: you can quickly switch back to your previous version of the software.

Summing up

I am hoping this tutorial gives you the opportunity to kick the tyres of this great tool. It will require some patience to get going; it took me hours of reading and tinkering to get my pipelines set up the first time.

I’ll leave you with some links that I found valuable when researching this topic.
