Deploy and bootstrap your Kubernetes cluster with Azure DevOps and GitOps

A while ago, I published a post about deploying AKS with Azure DevOps with extras like Nginx Ingress, cert-manager and several others. An Azure Resource Manager (ARM) template is used to deploy Azure Kubernetes Service (AKS). The extras are installed with Helm charts and Helm installer tasks. I mainly use it for demo purposes but I often refer to it in my daily work as well.

Although this works, there is another approach that combines an Azure DevOps pipeline with GitOps. From a high level point of view, that works as follows:

  • Deploy AKS with an Azure DevOps pipeline: declarative and idempotent thanks to the ARM template; the deployment is driven from an Azure DevOps pipeline but other solutions such as GitHub Actions will do as well (push)
  • Use a GitOps tool to deploy the GitOps agents on AKS and bootstrap the cluster by pointing the GitOps tool to a git repository (pull)

In this post, I will use Flux v2 as the GitOps tool of choice. Other tools, such as Argo CD, are capable of achieving the same goal. Note that there are ways to deploy Kubernetes using GitOps in combination with the Cluster API (CAPI). CAPI is quite a beast so let’s keep this post a bit more approachable. 😉

Let’s start with the pipeline (YAML):

# AKS deployment pipeline
trigger: none

variables:
  CLUSTERNAME: 'CLUSTERNAME'
  RG: 'CLUSTER_RESOURCE_GROUP'
  GITHUB_REPO: 'k8s-bootstrap'
  GITHUB_USER: 'GITHUB_USER'
  KEY_VAULT: 'KEYVAULT_SHORTNAME'

stages:
- stage: DeployGitOpsCluster
  jobs:
  - job: 'Deployment'
    pool:
      vmImage: 'ubuntu-latest'
    steps: 
    # DEPLOY AKS
    - task: AzureResourceGroupDeployment@2
      inputs:
        azureSubscription: 'SUBSCRIPTION_REF'
        action: 'Create Or Update Resource Group'
        resourceGroupName: '$(RG)'
        location: 'YOUR LOCATION'
        templateLocation: 'Linked artifact'
        csmFile: 'aks/deploy.json'
        csmParametersFile: 'aks/deployparams.gitops.json'
        overrideParameters: '-clusterName $(CLUSTERNAME)'
        deploymentMode: 'Incremental'
        deploymentName: 'aks-gitops-deploy'
       
    # INSTALL KUBECTL
    - task: KubectlInstaller@0
      name: InstallKubectl
      inputs:
        kubectlVersion: '1.18.8'

    # GET CREDS TO K8S CLUSTER WITH ADMIN AND INSTALL FLUX V2
    - task: AzureCLI@1
      name: RunAzCLIScripts
      inputs:
        azureSubscription: 'AzureMPN'
        scriptLocation: 'inlineScript'
        inlineScript: |
          export GITHUB_TOKEN=$(GITHUB_TOKEN)
          az aks get-credentials -g $(RG) -n $(CLUSTERNAME) --admin
          msi="$(az aks show -n CLUSTERNAME -g CLUSTER_RESOURCE_GROUP | jq .identityProfile.kubeletidentity.objectId -r)"
          az keyvault set-policy --name $(KEY_VAULT) --object-id $msi --secret-permissions get
          curl -s https://toolkit.fluxcd.io/install.sh | sudo bash
          flux bootstrap github --owner=$(GITHUB_USER) --repository=$(GITHUB_REPO) --branch=main --path=demo-cluster --personal

A couple of things to note here:

  • The above pipeline contains several strings in UPPERCASE; replace them with your own values
  • GITHUB_TOKEN is a secret defined in the Azure DevOps pipeline and set as an environment variable in the last task; it is required for the flux bootstrap command to configure the GitHub repo (e.g. deploy key)
  • the AzureResourceGroupDeployment task deploys the AKS cluster based on parameters defined in deployparams.gitops.json; that file is in a private Azure DevOps git repo; I have also added them to the gbaeke/k8s-bootstrap repository for reference
  • The AKS deployment uses a managed identity versus a service principal with manually set client id and secret (recommended)
  • The flux bootstrap command deploys an Azure Key Vault to Kubernetes Secrets controller that requires access to Key Vault; the script in the last task retrieves the managed identity object id and uses az keyvault set-policy to grant get key permissions; if you delete and recreate the cluster many times, you will have several UNKNOWN access policies at the Key Vault level

The pipeline is of course short due to the fact that nginx-ingress, cert-manager, dapr, KEDA, etc… are all deployed via the gbaeke/k8s-bootstrap repo. The demo-cluster folder in that repo contains a source and four kustomizations:

  • source: reference to another git repo that contains the actual deployments
  • k8s-akv2k8s-kustomize.yaml: deploys the Azure Key Vault to Kubernetes Secrets controller (akv2k8s)
  • k8s-secrets-kustomize.yaml: deploys secrets via custom resources picked up by the akv2k8s controller; depends on akv2k8s
  • k8s-common-kustomize.yaml: deploys all components in the ./deploy folder of gbaeke/k8s-common (nginx-ingress, external-dns, cert-manager, KEDA, dapr, …)

Overall, the big picture looks like this:

Note that the kustomizations that point to ./akv2k8s and ./deploy actually deploy HelmReleases to the cluster. For instance in ./akv2k8s, you will find the following manifest:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: akv2k8s
  namespace: flux-system
spec:
  chart:
    spec:
      chart: akv2k8s
      sourceRef:
        kind: HelmRepository
        name: akv2k8s-repo
  interval: 5m0s
  releaseName: akv2k8s
  targetNamespace: akv2k8s

This manifest tells Flux to deploy a Helm chart, akv2k8s, from the HelmRepository source akv2k8s-repo that is defined as follows:

---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: akv2k8s-repo
  namespace: flux-system
spec:
  interval: 1m0s
  url: http://charts.spvapi.no/

It is perfectly valid to use a kustomization that deploys manifests that contain resources of kind HelmRelease and HelmRepository. In fact, you can even patch those via a kustomization.yaml file if you wish.

You might wonder why I deploy the akv2k8s controller first, and then deploy a secret with the following manifest (upercase strings to be replaced):

apiVersion: spv.no/v1
kind: AzureKeyVaultSecret
metadata:
  name: secret-sync 
  namespace: flux-system
spec:
  vault:
    name: KEYVAULTNAME # name of key vault
    object:
      name: SECRET # name of the akv object
      type: secret # akv object type
  output: 
    secret: 
      name: SECRET # kubernetes secret name
      dataKey: values.yaml # key to store object value in kubernetes secret

The external-dns chart I deploy in later steps requires configuration to be able to change DNS settings in Cloudflare. Obviously, I do not want to store the Cloudflare secret in the k8s-common git repo. One way to solve that is to store the secrets in Azure Key Vault and then grab those secrets and convert them to Kubernetes secrets. The external-dns HelmRelease can then reference the secret to override values.yaml of the chart. Indeed, that requires storing a file in Key Vault which is easy to do like so (replace uppercase strings):

az keyvault secret set --name SECRETNAME --vault-name VAULTNAME --file ./YOURFILE.YAML

You can call the secret what you want but the Kubernetes secret dataKey should be values.yaml for the HelmRelease to work properly.

There are other ways to work with secrets in GitOps. The Flux v2 documentation mentions SealedSecrets and SOPS and you are of course welcome to use that.

Take a look at the different repos I outlined above to see the actual details. I think it makes the deployment of a cluster and bootstrapping the cluster much easier compared to suing a bunch of Helm install tasks and manifest deployments in the pipeline. What do you think?

An introduction to Flux v2

If you have read my blog and watched my Youtube channel, you know I have worked with Flux in the past. Flux, by weaveworks, is a GitOps Kubernetes Operator that ensures that your cluster state matches the desired state described in a git repository. There are other solutions as well, such as Argo CD.

With Flux v2, GitOps on Kubernetes became a lot more powerful and easier to use. Flux v2 is built on a set of controllers and APIs called the GitOps Toolkit. The toolkit contains the following components:

  • Source controller: allows you to create sources such as a GitRepository or a HelmRepository; the source controller acts on several custom resource definitions (CRDs) as defined in the docs
  • Kustomize controller: runs continuous delivery pipelines defined with Kubernetes manifests (YAML) files; although you can use kustomize and define kustomization.yaml files, you do not have to; internally though, Flux v2 uses kustomize to deploy your manifests; the kustomize controller acts on Kustomization CRDs as defined here
  • Helm controller: deploy your workloads based on Helm charts but do so declaratively; there is no need to run helm commands; see the docs for more information
  • Notification controller: responds to incoming events (e.g. from a git repo) and sends outgoing events (e.g. to Teams or Slack); more info here

If you throw it all together, you get something like this:

GitOps Toolkit components that make up Flux v2 (from https://toolkit.fluxcd.io/)

Getting started

To get started, you should of course look at the documentation over at https://toolkit.fluxcd.io. I also created a series of videos about Flux v2. The first one talks about Flux v2 in general and shows how to bootstrap a cluster.

Part 1 in the series about Flux v2

Although Flux v2 works with other source control systems than GitHub, for instance GitLab, I use GitHub in the above video. I also use kind, to make it easy to try out Flux v2 on your local machine. In subsequent videos, I use Azure Kubernetes Services (AKS).

In Flux v2, it is much easier to deploy Flux on your cluster with the flux bootstrap command. Flux v2 itself is basically installed and managed via GitOps principles by pushing all Flux v2 manifests to a git repository and running reconciliations to keep the components running as intended.

Kustomize

Flux v1 already supported kustomize but v2 takes it to another level. Whenever you want to deploy to Kubernetes with YAML manifests, you will create a kustomization, which is based on the Kustomization CRD. A kustomization is defined as below:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: realtimeapp-dev
  namespace: flux-system
spec:
  healthChecks:
  - kind: Deployment
    name: realtime-dev
    namespace: realtime-dev
  - kind: Deployment
    name: redis-dev
    namespace: realtime-dev
  interval: 1m0s
  path: ./deploy/overlays/dev
  prune: true
  sourceRef:
    kind: GitRepository
    name: realtimeapp-infra
  timeout: 2m0s
  validation: client

A kustomization requires a source. In this case, the source is a git repository called realtimeapp-infra that was already defined in advance. The source just points to a public git repository on Github: https://github.com/gbaeke/realtimeapp-infra.

The source contains a deploy folder, which contains a bases and an overlays folder. The kustomization points to the ./deploy/overlays/dev folder as set in path. That folder contains a kustomization.yaml file that deploys an application in a development namespace and uses the base from ./deploy/bases/realtimeapp as its source. If you are not sure what kustomize exactly does, I made a video that tries 😉 to explain it:

An introduction to kustomize

It is important to know that you do not need to use kustomize in your source files. If you point a Flux v2 kustomization to a path that just contains a bunch of YAML files, it will work equally well. You do not have to create a kustomization.yaml file in that folder that lists the resources (YAML files) that you want to deploy. Internally though, Flux v2 will use kustomize to deploy the manifests and uses the deployment order that kustomize uses: first namespaces, then services, then deployments, etc…

The interval in the kustomization (above set at 1 minute) means that your YAML files are applied at that interval, even if the source has not changed. This ensures that, if you modified resources on your cluster, the kustomization will reset the changes to the state as defined in the source. The source itself has its own interval. If you set a GitRepository source to 1 minute, the source is checked every 1 minute. If the source has changes, the kustomizations that depend on the source will be notified and proceed to deploy the changes.

A GitRepository source can refer to a specific branch, but can also refer to a semantic versioning tag if you use a semver range in the source. See checkout strategies for more information.

Deploying YAML manifests

If the above explanation of sources and kustomizations does not mean much to you, I created a video that illustrates these aspects more clearly:

In the above video, the source that points to https://github.com/gbaeke/realtimeapp-infra gets created first (see it at this mark). Next, I create two kustomizations, one for development and one for production. I use a kustomize base for the application plus two overlays, one for dev and one for production.

What to do when the app container images changes?

Flux v1 has a feature that tracks container images in a container registry and updates your cluster resources with a new image based on a filter you set. This requires read/write access to your git repository because Flux v1 set the images in your source files. Flux v2 does not have this feature yet (November 2020, see https://toolkit.fluxcd.io/roadmap).

In my example, I use a GitHub Action in the application source code repository to build and push the application image to Docker Hub. The GitHub action triggers a build job on two events:

  • push to main branch: build a container image with a short sha as the tag (e.g. gbaeke/flux-rt:sha-94561cb
  • published release: build a container image with the release version as the tag (e.g. gbaeke/flux-rt:1.0.1)

When the build is caused by a push to main, the update-dev-image job runs. It modifies kustomization.yaml in the dev overlay with kustomize edit:

update-dev-image:
    runs-on: ubuntu-latest
    if: contains(github.ref, 'heads')
    needs:
    - build
    steps:
    - uses: imranismail/setup-kustomize@v1
      with:
        kustomize-version: 3.8.6
    - run: git clone https://${REPO_TOKEN}@github.com/gbaeke/realtimeapp-infra.git .
      env:
        REPO_TOKEN: ${{secrets.REPO_TOKEN}}
    - run: kustomize edit set image gbaeke/flux-rt:sha-$(git rev-parse --short $GITHUB_SHA)
      working-directory: ./deploy/overlays/dev
    - run: git add .
    - run: |
        git config user.email "$EMAIL"
        git config user.name "$GITHUB_ACTOR"
      env:
        EMAIL: ${{secrets.EMAIL}}
    - run: git commit -m "Set dev image tag to short sha"
    - run: git push

Similarly, when the build is caused by a release, the image is updated in the production overlay’s kustomization.yaml file.

Conclusion

If you are interested in GitOps as an alternative for continuous delivery to Kubernetes, do check out Flux v2 and see if it meets your needs. I personally like it a lot and believe that they are setting the standard for GitOps on Kubernetes. I have not covered Helm deployments, monitoring and alerting features yet. I will create additional videos and posts about those features in the near future. Stay tuned!

Docker without Docker: a look at Podman

I have been working with Docker for quite some time. More and more however, I see people switching to tools like Podman and Buildah and decided to give that a go.

I installed a virtual machine in Azure with the following Azure CLI command:

az vm create \
  	--resource-group RESOURCEGROUP \
  	--name VMNAME \
  	--image UbuntuLTS \
	--authentication-type password \
  	--admin-username azureuser \
  	--admin-password PASSWORD \
	--size Standard_B2ms

Just replace RESOURCEGROUP, VMNAME and PASSWORD with the values you want to use and you are good to go. Note that the above command results in Ubuntu 18.04 at the time of writing.

SSH into that VM for the following steps.

Installing Podman

Installation of Podman is easy enough. The commands below do the trick:

. /etc/os-release
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y upgrade 
sudo apt-get -y install podman

You can find more information at https://podman.io/getting-started/installation.

Where Docker uses a client/server model, with a privileged Docker daemon and a docker client that communicates with it, Podman uses a fork/exec model. The container process is a child of the Podman process. This also means you do not require root to run a container which is great from a security and auditing perspective.

You can now just use the podman command. It supports the same arguments as the docker command. If you want, you can even create a docker alias for the podman command.

To check if everything is working, run the following command:

podman run hello-world

It will pull down the hello-world image from Docker Hub and display a message.

I wanted to start my gbaeke/nasnet container with podman, using the following command:

podman run  -p 80:9090 -d gbaeke/nasnet

Of course, the above command will fail. I am not running as root, which means I cannot bind a process to a port below 1024. There are ways to fix that but I changed the command to:

podman run  -p 9090:9090 -d gbaeke/nasnet

The gbaeke/nasnet container is large, close to 3 GB. Pulling the container from Docker Hub went fast but Podman took a very long time during the Storing signatures phase. While the command was running, I checked disk space on the VM with df and noticed that the machine’s disk was quickly filling up.

On WSL2 (Windows Subsystem for Linux), I did not have trouble with pulling large images. With the docker info command, I found that it was using overlay2 as the storage driver:

Docker on WSL2 uses overlay2

You can find more information about Docker and overlay2, see https://docs.docker.com/storage/storagedriver/overlayfs-driver/.

With podman, run podman info to check the storage driver podman uses. Look for graphDriverName in the output. In my case, podman used vfs. Although vfs is well supported and runs anywhere, it does full copies of layers (represented by directories on your filesystem) in the image which results in using a lot of diskspace. If the disk is not super fast, this will result in long wait times when pulling an image and waste of disk space.

Without getting bogged down in the specifics of the storage drivers and their pros and cons, I decided to switch Podman from vfs to fuse-overlayfs. Fuse stands for Filesystem in Userspace, so fuse-overlayfs is the implementation of overlayfs in userspace (using FUSE). It supports deduplication of layers which will result in less consumption of disk space. This should be very noticeable when pulling a large image.

IMPORTANT: remove the containers folder in ~/.local/share to clear out container storage before installing overlayfs. Use the command below;

rm -rf ~/.local/share/containers

Installing fuse-overlayfs

The installation instructions are at https://github.com/containers/fuse-overlayfs. I needed to use the static build because I am running Ubuntu 18.04. On newer versions of Ubuntu, you can use apt install libfuse3-dev.

It’s of no use here to repeat the static build steps. Just head over to the GitHub repo and follow the steps. When asked to clone the git repo, use the following command:

git clone https://github.com/containers/fuse-overlayfs.git

The final step in the instructions is to copy fuse-overlayfs (which was just built with buildah) to /usr/bin.

If you now run podman info, the graphDrivername should be overlay. There’s nothing you need to do to make that happen:

overlay storage driver with /usr/bin/fuse-overlayfs as the executable

When you now run the gbaeke/nasnet container, or any sufficiently large container, the process should be much smoother. I can still take a couple of minutes though. Note that at the end, your output will be somewhat like below:

Output from podman run -p 9090:9090 -d gbaeke/nasnet

Now you can run podman ps and you should see the running container:

gbaeke/nasnet container is running

Go to http://localhost:9090 and you should see the UI. Go ahead and classify an image! 😉

Conclusion

Installing and using Podman is easy, especially if you are familiar with Docker somewhat. I did have trouble with performance and disk storage with large images but that can be fixed by swapping out vfs with something like overlayfs. Be aware that there are many other options and that it is quite complex under the hood. But with the above steps, you should be good to go.

Will I use podman from now on? Probably not as Docker provides me all I need for now and a lot of tools I use are dependent on it.

Using the Dapr InfluxDB component

A while ago, I created a component that can write to InfluxDB 2.0 from Dapr. This component is now included in the 0.10 release. In this post, we will briefly look at how you can use it.

If you do not know what Dapr is, take a look at https://dapr.io. I also have some videos on Youtube about Dapr. And be sure to check out the video below as well:

Let’s jump in and use the component.

Installing Dapr

You can install Dapr on Windows, Mac and Linux by following the instructions on https://dapr.io/. Just click the Download link and select your operating system. I installed Dapr on WSL 2 (Windows Subsystem for Linux) on Windows 10 with the following command:

wget -q https://raw.githubusercontent.com/dapr/cli/master/install/install.sh -O - | /bin/bash

The above command just installs the Dapr CLI. To initialize Dapr, you need to run dapr init.

Getting an InfluxDB database

InfluxDB is a time-series database. You can easily run it in a container on your local machine but you can also use InfluxDB Cloud. In this post, we will simply use a free cloud instance. Just head over to https://cloud2.influxdata.com/signup and signup for an account. Just follow the steps and use a free account. It stores data for maximum 30 days and has some other limits as well.

You will need the following information to write data to InfluxDB:

  • Organization: this will be set to the e-mail account you signed up with; it can be renamed if you wish
  • Bucket: your data is stored in a bucket; by default you get a bucket called e-mail-prefix’s Bucket (e.g. geert.baeke’s Bucket)
  • Token: you need a token that provides the necessary access rights such as read and/or write

Let’s rename the bucket to get a feel for the user interface. Click Data, Buckets and then Settings as shown below:

Getting to the bucket settings

Click Rename and follow the steps to rename the bucket:

Renaming the bucket

Now, let’s create a token. In the Load Data screen, click Tokens. Click Generate and then click Read/Write Token. Describe the token and create it like below:

Creating a token

Now click the token you created and copy it to the clipboard. You now have the organization name, a bucket name and a token. You still need a URL to connect to but that just the URL you see in the browser (the yellow part):

URL to send your data

Your URL will depend on the cloud you use.

Python code to write to InfluxDB with Dapr

The code below requires Python 3. I used version 3.6.9 but it will work with more recent versions of course.

import time
import requests
import os

dapr_port = os.getenv("DAPR_HTTP_PORT", 3500)

dapr_url = "http://localhost:{}/v1.0/bindings/influx".format(dapr_port)
n = 0.0
while True:
    n += 1.0
    payload = { 
        "data": {
            "measurement": "temp",
            "tags": "room=dorm,building=building-a",
            "values": "sensor=\"sensor X\",avg={},max={}".format(n, n*2)
            }, 
        "operation": "create" 
    }
    print(payload, flush=True)
    try:
        response = requests.post(dapr_url, json=payload)
        print(response, flush=True)

    except Exception as e:
        print(e, flush=True)

    time.sleep(1)

The code above is just an illustration of using the InfluxDB output binding from Dapr. It is crucial to understand that a Dapr process needs to be running, either locally on your system or as a Kubernetes sidecar, that the above program communicates with. To that end, we get the Dapr port number from an environment variable or use the default port 3500.

The Python program uses the InfluxDB output binding simply by posting data to an HTTP endpoint. The endpoint is constructed as follows:

dapr_url = "http://localhost:{}/v1.0/bindings/influx".format(dapr_port)

The dapr_url above is set to a URI that uses localhost over the Dapr port and then uses the influx binding by appending /v1.0/bindings/influx. All bindings have a specific name like influx, mqtt, etc… and that name is then added to /v1.0/bindings/ to make the call work.

So far so good, but how does the binding know where to connect and what organization, bucket and token to use? That’s where the component .yaml file comes in. In the same folder where you save your Python code, create a folder called components. In the folder, create a file called influx.yaml (you can give it any name you want). The influx.yaml contents is shown below:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: influx
spec:
  type: bindings.influx
  metadata:
  - name: Url
    value: YOUR URL
  - name: Token
    value: "YOUR TOKEN HERE"
  - name: Org
    value: "YOUR ORG"
  - name: Bucket
    value: "YOUR BUCKET"

Of course, replace the uppercase values above with your own. We will later tell Dapr to look for files like this in the components folder. Automatically, because you use the influx binding in your Python code, Dapr will go look for the file above (type: bindings.influx) and retrieve the required metadata. If any of the metadata is not set or if the file is missing or improperly formatted, you will get an error.

To actually use the binding, we need to post some data to the URI we constructed. The data we send is in the payload variable as shown below:

 payload = { 
        "data": {
            "measurement": "temp",
            "tags": "room=dorm,building=building-a",
            "values": "sensor=\"sensor X\",avg={},max={}".format(n, n*2)
            }, 
        "operation": "create" 
    }

It requires a measurement field, a tags and a values field and uses the InfluxDB line protocol to send the data. You can find more information about that here.

The data field in the payload is specific to the Influx component. The operation field is required by this Dapr component as it is written to listen for create operations.

Running the code

On your local machine, you will need to run Dapr together with your code to make it work. You use dapr run for this. To run the Python code (saved to app.py in my case), run the command below from the folder that contains the code and the components folder:

dapr run --app-id influx -d ./components python3 app.py

This starts Dapr and our application with app id influx. With -d, we point to the components file.

When you run the code, Dapr logs and your logs will be printed to the screen. In InfluxDB Cloud, we can check the data from the user interface:

Date Explorer (Note: other organization and bucket than the one used in this post)

Conclusion

Dapr can be used in the cloud and at the edge, in containers or without. In both cases, you often have to write data to databases. With Dapr, you can now easily write data as time series to InfluxDB. Note that Dapr also has an MQTT input and output binding. Using the same simple technique you learned in this post, you can easily read data from an MQTT topic and forward it to InfluxDB. In a later post, we will take a look at that scenario as well. Or check this video instead: https://youtu.be/2vCT79KG24E. Note that the video uses a custom compiled Dapr 0.8 with the InfluxDB component because this video was created during development.

Dapr Service Invocation between an HTTP Python client and a GRPC Go server

Recently, I published several videos about Dapr on my Youtube channel. The videos cover the basics of state management, PubSub and service invocation.

The Getting Started with state management and service invocation:

Let’s take a closer look at service invocation with HTTP, Python and Node.

Service Invocation with HTTP

Service Invocation Diagram
Service invocation (image from Dapr docs)

The services you write (here Service A and B) talk to each other using the Dapr runtime. On Kubernetes, you talk to a Dapr sidecar deployed alongside your service container. On your development machine, you run your services via dapr run.

If you want to expose a method on Service B and you use HTTP, you just need to expose an HTTP handler or route. For example, with Express in Node you would use something like:

const express = require('express');
const app = express();
app.post('/neworder', (req, res) => { your code }

You then run your service and annotate it with the proper Dapr annotations (Kubernetes):

annotations:
        dapr.io/enabled: "true"
        dapr.io/id: "node"
        dapr.io/port: "3000"

On your local machine, you would just run the service via dapr run:

dapr run --app-id node --app-port 3000 node app.js

In the last example, the Dapr id is node and we indicate that the service is listening on port 3000. To invoke the method from service A, it can use the following code (Python example shown):

dapr_port = os.getenv("DAPR_HTTP_PORT", 3500)
dapr_url = "http://localhost:{}/v1.0/invoke/node/method/neworder".format(dapr_port)
message = {"data": {"orderId": 1234}}
response = requests.post(dapr_url, json=message, timeout=5)

As you can see, service A does not contact service B directly. It just talks to its Dapr sidecar on localhost (or Dapr on your dev machine) and asks it to invoke the neworder method via a service that uses Dapr id node. It is also clear that both service A and B use HTTP only. Because you just use HTTP to expose and invoke methods, you can use any language or framework.

You can find a complete example here with Node and Python.

Service Invocation with HTTP and GRPC

Dapr has SDKs available for C#, Go and other languages. You might prefer those over the generic HTTP approach. In the case of Go, the SDK uses GRPC to interface with the Dapr runtime. With Dapr in between, one service can use HTTP while another uses GRPC.

Let’s take a look at a service that exposes a method (HelloFromGo) from a Go application. The full example is here. Instead of creating an HTTP route with the name of your method, you use an OnInvoke handler that looks like this (only the start is shown, see the full code):

func (s *server) OnInvoke(ctx context.Context, in *commonv1pb.InvokeRequest) (*commonv1pb.InvokeResponse, error) {
	var response string

	switch in.Method {
	case "HelloFromGo":

		response = s.HelloFromGo()

Naturally, you also have to implement an HelloFromGo() method as well:

// HelloFromGo is a simple demo method to invoke
func (s *server) HelloFromGo() string {
	return "Hello"

}

Another service can use any language or framework and invoke the above method with a POST to the following URL if the Dapr id of the Go service is goserver:

http://localhost:3500/v1.0/invoke/goserver/method/HelloFromGo

A POST to the above URL tells Dapr to execute the OnInvoke method via GRPC which will run the HelloFromGo function. It is perfectly possible to include a payload in your POST and have the OnInvoke handler to process that payload. The full example is here which also includes sending and processing a JSON payload and sending back a text response. You will need to somewhat understand how GRPC works and also understand protocol buffers. A good book on GRPC is the following one: https://learning.oreilly.com/library/view/grpc-up-and/9781492058328/.

Conclusion

Dapr allows you to choose between HTTP and GRPC interfaces to interact with the runtime. You can choose whatever is most comfortable to you. One team can use HTTP with Python, JavaScript etc… while other teams use GRPC with their language of choice. Whatever you choose, the Dapr runtime will make sure service invocation just works allowing you to focus on the code.

Trying Consul Connect on your local machine

In a previous post, I talked about installing Consul on Kubernetes and using some of its features. In that post, I did not look at the service mesh functionality. Before looking at that, it is beneficial to try out the service mesh features on your local machine.

You can easily install Consul on your local machine with Chocolatey for Windows or Homebrew for Mac. On Windows, a simple choco install consul is enough. Since Consul is just a single executable, you can start it from the command line with all the options you need.

In the video below, I walk through configuring two services running as containers on my local machine: a web app that talks to Redis. We will “mesh” both services and then use an intention to deny service-to-service traffic.

Consul Service Mesh on your local machine… speed it up! ☺

In a later post and video, we will look at Consul Connect on Kubernetes. Stay tuned!

Getting started with Consul on Kubernetes

Although I have heard a lot about Hashicorp’s Consul, I have not had the opportunity to work with it and get acquainted with the basics. In this post, I will share some of the basics I have learned, hopefully giving you a bit of a head start when you embark on this journey yourself.

Want to watch a video about this instead?

What is Consul?

Basically, Consul is a networking tool. It provides service discovery and allows you to store and retrieve configuration values. On top of that, it provides service-mesh capability by controlling and encrypting service-to-service traffic. Although that looks simple enough, in complex and dynamic infrastructure spanning multiple locations such as on-premises and cloud, this can become extremely complicated. Let’s stick to the basics and focus on three things:

  • Installation on Kubernetes
  • Using the key-value store for configuration
  • Using the service catalog to retrieve service information

We will use a small Go program to illustrate the use of the Consul API. Let’s get started… 🚀🚀🚀

Installation of Consul

I will install Consul using the provided Helm chart. Note that the installation I will perform is great for testing but should not be used for production. In production, there are many more things to think about. Look at the configuration values for hints: certificates, storage size and class, options to enable/disable, etc… That being said, the chart does install multiple servers and clients to provide high availability.

I installed Consul with Pulumi and Python. You can check the code here. You can use that code on Azure to deploy both Kubernetes and Consul in one step. The section in the code that installs Consul is shown below:

consul = v3.Chart("consul",
    config=v3.LocalChartOpts(
        path="consul-chart",
        namespace="consul",
        values={ 
            "connectInject": {
                "enabled": "true"
            },
            "client": {
                "enabled": "true",
                "grpc": "enabled"
            },
            "syncCatalog": {
                "enabled": "true"
            } 
        }        
    ),
    opts=pulumi.ResourceOptions(
        depends_on=[ns_consul],
        provider=k8s
    )    
)

The code above would be equivalent to this Helm chart installation (Helm v3):

helm install consul -f consul-helm/values.yaml \
--namespace consul ./consul-helm \
--set connectInject.enabled=true  \
--set client.enabled=true --set client.grpc=true  \
--set syncCatalog.enabled=true

Connecting to the Consul UI

The chart installs Consul in the consul namespace. You can run the following command to get to the UI:

kubectl port-forward services/consul-consul-ui 8888:80 -n consul8:80 -n consul

You will see the screen below. The list of services depends on the Kubernetes services in your system.

Consul UI with list of services

The services above include consul itself. The consul service also has health checks configured. The other services in the screenshot are Kubernetes services that were discovered by Consul. I have installed Redis in the default namespace and exposed Redis via a service called redisapp. This results in a Consul service called redisadd-default. Later, we will query this service from our Go application.

When you click Key/Value, you can see the configured keys. I have created one key called REDISPATTERN which is later used in the Go program to know the Redis channels to subscribe to. It’s just a configuration value that is retrieved at runtime.

A simple key/value pair: REDISPATTERn=*

The Key/Value pair can be created via the consul CLI, the HTTP API or via the UI (Create button in the main Key/Value screen). I created the REDISPATTERN key via the Create button.

Querying the Key/Value store

Let’s turn our attention to writing some code that retrieves a Consul key at runtime. The question of course is: “how does your application find Consul?”. Look at the diagram below:

Simplifgied diagram of Consul installation on Kubernetes via the Helm chart

Above, you see the Consul server agents, implemented as a Kubernetes StatefulSet. Each server pod has a volume (Azure disk in this case) to store data such as key/value pairs.

Your application will not connect to these servers directly. Instead, it will connect via the client agents. The client agents are implemented as a DaemonSet resulting in a client agent per Kubernetes node. The client agent pods expose a static port on the Kubernetes host (yes, you read that right). This means that your app can connect to the IP address of the host it is running on. Your app can discover that IP address via the Downward API.

The container spec contains the following code:

      containers:
      - name: realtimeapp
        image: gbaeke/realtime-go-consul:1.0.0
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: CONSUL_HTTP_ADDR
          value: http://$(HOST_IP):8500

The HOST_IP will be set to the IP of the Kubernetes host via a reference to status.hostIP. Next, the environment variable CONSUL_HTTP_ADDR is set to the full HTTP address including port 8500. In your code, you will need to read that environment variable.

Retrieving a key/value pair

Here is some code to read a Consul key/value pair with Go. Full source code is here.

// return a Consul client based on given address
func getConsul(address string) (*consulapi.Client, error) {
	config := consulapi.DefaultConfig()
	config.Address = address
	consul, err := consulapi.NewClient(config)
	return consul, err
}

// get key/value pair from Consul client and passed key name
func getKvPair(client *consulapi.Client, key string) (*consulapi.KVPair, error) {
	kv := client.KV()
	keyPair, _, err := kv.Get(key, nil)
	return keyPair, err
}

func main() {
        // retrieve address of Consul set via downward API in spec
	consulAddress := getEnv("CONSUL_HTTP_ADDR", "")
	if consulAddress == "" {
		log.Fatalf("CONSUL_HTTP_ADDRESS environment variable not set")
	}

        // get Consul client
	consul, err := getConsul(consulAddress)
	if err != nil {
		log.Fatalf("Error connecting to Consul: %s", err)
	}

        // get key/value pair with Consul client
	redisPattern, err := getKvPair(consul, "REDISPATTERN")
	if err != nil || redisPattern == nil {
		log.Fatalf("Could not get REDISPATTERN: %s", err)
	}
	log.Printf("KV: %v %s\n", redisPattern.Key, redisPattern.Value)

... func main() continued...

The comments in the code should be self-explanatory. When the REDISPATTERN key is not set or another error occurs, the program will exit. If REDISPATTERN is set, we can use the value later:

pubsub := client.PSubscribe(string(redisPattern.Value))

Looking up a service

That’s great but how do you look up an address of a service? That’s easy with the following basic code via the catalog:

cat := consul.Catalog()
svc, _, err := cat.Service("redisapp-default", "", nil)
log.Printf("Service address and port: %s:%d\n", svc[0].ServiceAddress, 
  svc[0].ServicePort)

consul is a *consulapi.client obtained earlier. You use the Catalog() function to obtain access to catalog service functionality. In this case, we simply retrieve the address and port value of the Kubernetes service redisapp in the default namespace. We can use that information to connect to our Redis back-end.

Conclusion

It’s easy to get started with Consul on Kubernetes and to write some code to take advantage of it. Be aware though that we only scratched the surface here and that this is both a sample deployment (without TLS, RBAC, etc…) and some sample code. In addition, you should only use Consul in more complex application landscapes with many services to discover, traffic to secure and more. If you do think you need it, you should also take a look at managed Consul on Azure. It runs in your subscription but fully managed by Hashicorp! It can be integrated with Azure Kubernetes Service as well.

In a later post, I will take a look at the service mesh capabilities with Connect.

Progressive Delivery on Kubernetes: what are your options?

If you have ever deployed an application to Kubernetes, even a simple one, you are probably familiar with deployments. A deployment describes the pods to run, how many of them to run and how they should be upgraded. That last point is especially important because the strategy you select has an impact on the availability of the deployment. A deployment supports the following two strategies:

  • Recreate: all existing pods are killed and new ones are created; this obviously leads to some downtime
  • RollingUpdate: pods are gradually replaced which means there is a period when old and new pods coexist; this can result in issues for stateful pods or if there is no backward compatibility

But what if you want to use other methods such as BlueGreen or Canary? Although you could do that with a custom approach that uses deployments, there are some solution that provide a more automated approach. Below, I discuss two of them briefly. Videos provide a more in depth look.

Argo Rollouts

One of the solutions out there is Argo Rollouts. It is very easy to use. If you want to start slowly, with BlueGreen deployments and manual approval for instance, Argo Rollouts is recommended. It has a nice kubectl plugin and integration with Argo CD, a GitOps solution.

The following video demonstrates BlueGreen deployments:

BlueGreen deployments with Argo Rollouts

This video discusses a canary deployment with Argo Rollouts albeit a simple one without metric analysis:

Canary deployments with Argo Rollouts

This video shows the integration between Argo Rollouts and Argo CD:

Argo CD and Argo Rollouts integration

One thing to note is that, instead of a deployment, you will create a rollout object. It is easy to convert an existing deployment into a rollout. Other tools such as Flagger (see below), provide their functionality on top of an existing deployment.

For traffic splitting and metrics analysis, Argo Rollouts does not support Linkerd. More information about traffic splitting and management can be found here.

Flagger

Flagger, by Weaveworks, is another solution that provides BlueGreen and Canary deployment support to Kubernetes. In the video below, I demonstrate the basic look and feel of doing a canary deployment that includes metric analysis. Linkerd is used for gradual traffic shifting to the canary based on the built-in success rate metric of Linkerd:

Canary release with Flagger and Linkerd

If you want to get started with canary releases and easy traffic splitting and metrics, I suggest using the Flagger and Linkerd combination. This is based simply on the fact that Linkerd is much easier to install and use than Istio. Argo Rollouts in combination with Istio and Prometheus could be used to achieve exactly the same result.

Which one to use?

If you just want BlueGreen deployments with manual approvals, I would suggest using Argo Rollouts. When you integrate it with Argo CD, you can even use the Argo CD UI to promote your deployment. If you are comfortable with Istio and Prometheus, you can go a step further and add metrics analysis to automatically progress your deployment. You can also use a simple Kubernetes job to validate your deployment. Also, note that other metrics providers are supported.

Flagger supports more options for traffic splitting and metrics, due to its support for both Linkerd and Istio. Because Linkerd is so easy to use, Flagger is simpler to get started with canary releases and metrics analysis.

GitOps with Kubernetes: a better way to deploy?

I recently gave a talk at TechTrain, a monthly event in Mechelen (Belgium), hosted by Cronos. The talk is called “GitOps with Kubernetes: a better way to deploy” and is an introduction to GitOps with Weaveworks Flux as an example.

You can find a re-recording of the presentation on Youtube:

Giving Argo CD a spin

If you have followed my blog a little, you have seen a few posts about GitOps with Flux CD. This time, I am taking a look at Argo CD which, like Flux CD, is a GitOps tool to deploy applications from manifests in a git repository.

Don’t want to read this whole thing?

Here’s the video version of this post

There are several differences between the two tools:

  • At first glance, Flux appears to use a single git repo for your cluster where Argo immediately introduces the concept of apps. Each app can be connected to a different git repo. However Flux can also use multiple git repositories in the same cluster. See https://github.com/fluxcd/multi-tenancy for more information
  • Flux has the concept of workloads which can be automated. This means that image repositories are scanned for updates. When an update is available (say from tag v1.0.0 to v1.0.1), Flux will update your application based on filters you specify. As far as I can see, Argo requires you to drive the update from your CI process, which might be preferred.
  • By default, Argo deploys an administrative UI (next to a CLI) with a full view on your deployment and its dependencies
  • Argo supports RBAC and integrates with external identity providers (e.g. Azure Active Directory)

The Argo CD admin interface is shown below:

Argo CD admin interface… not too shabby

Let’s take a look at how to deploy Argo and deploy the app you see above. The app is deployed using a single yaml file. Nothing fancy yet such as kustomize or jsonnet.

Deployment

The getting started guide is pretty clear, so do have a look over there as well. To install, just run (with a deployed Kubernetes cluster and kubectl pointing at the cluster):

kubectl create namespace argocd 

kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Note that I installed Argo CD on Azure (AKS).

Next, install the CLI. On a Mac, that is simple (with Homebrew):

brew tap argoproj/tap

brew install argoproj/tap/argocd

You will need access to the API server, which is not exposed over the Internet by default. For testing, port forwarding is easiest. In a separate shell, run the following command:

kubectl port-forward svc/argocd-server -n argocd 8080:443

You can now connect to https://localhost:8080 to get to the UI. You will need the admin password by running:

kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2

You can now login to the UI with the user admin and the displayed password. You should also login from the CLI and change the password with the following commands:

argocd login localhost:8080

argocd account update-password

Great! You are all set now to deploy an application.

Deploying an application

We will deploy an application that has a couple of dependencies. Normally, you would install those dependencies with Argo CD as well but since I am using a cluster that has these dependencies installed via Azure DevOps, I will just list what you need (Helm commands):

helm upgrade --namespace kube-system --install --set controller.service.loadBalancerIP=<IPADDRESS>,controller.publishService.enabled=true --wait nginx stable/nginx-ingress 

helm upgrade --namespace kube-system --install --values /home/vsts/work/1/s/externaldns/values.yaml --set cloudflare.apiToken=<CF_SECRET> --wait externaldns stable/external-dns

kubectl create ns cert-manager

helm upgrade --namespace cert-manager --install --wait --version v0.12.0 cert-manager jetstack/cert-manager

To know more about these dependencies and use an Azure DevOps YAML pipeline to deploy them, see this post. If you want, you can skip the externaldns installation and create a DNS record yourself that resolves to the public IP address of Nginx Ingress. If you do not want to use an Azure static IP address, you can remove the loadBalancerIP parameter from the first command.

The manifests we will deploy with Argo CD can be found in the following public git repository: https://github.com/gbaeke/argo-demo. The application is in three YAML files:

  • Two YAML files that create a certificate cluster issuer based on custom resource definitions (CRDs) from cert-manager
  • realtime.yaml: Redis deployment, Redis service (ClusterIP), realtime web app deployment (based on this), realtime web app service (ClusterIP), ingress resource for https://real.baeke.info (record automatically created by externaldns)

It’s best that you fork my repo and modify realtime.yaml’s ingress resource with your own DNS name.

Create the Argo app

Now you can create the Argo app based on my forked repo. I used the following command with my original repo:

argocd app create realtime \   
--repo https://github.com/gbaeke/argo-demo.git \
--path manifests \
--dest-server https://kubernetes.default.svc \
--dest-namespace default

The command above creates an app called realtime based on the specified repo. The app should use the manifests folder and apply (kubectl apply) all the manifests in that folder. The manifests are deployed to the cluster that Argo CD runs in. Note that you can run Argo CD in one cluster and deploy to totally different clusters.

The above command does not configure the repository to be synced automatically, although that is an option. To sync manually, use the following command:

argocd app sync realtime

The application should now be synced and viewable in the UI:

Application installed and synced

In my case, this results in the following application at https://real.baeke.info:

Not Secure because we use Let’s Encrypt staging for this app

Set up auto-sync

Let’s set up this app to automatically sync with the repo (default = every 3 minutes). This can be done from both the CLI and the UI. Let’s do it from the UI. Click on the app and then click App Details. You will find a Sync Policy in the app details where you can enable auto-sync

Setting up auto-sync from the UI

You can now make changes to the git repo like changing the image tag for gbaeke/fluxapp (yes, I used this image with the Flux posts as well 😊 ) to 1.0.6 and wait for the sync to happen. Or sync manually from the CLI or the UI.

Conclusion

This was a quick tour of Argo CD. There is much more you can do but the above should get you started quickly. I must say I quite like the solution and am eager to see what the collaboration of Flux CD, Argo CD and Amazon comes up with in the future.