Kubernetes Workload Identity with AKS

When you run a workload, no matter how simple or complex, you often need to access protected resources in both a secure and manageable way. Often, a resource’s security is integrated with an identity store. Azure resources, for instance, can be secured with roles assigned to Azure Active Directory (AAD) users, groups, or service principals.

Although it is tempting to simply store a credential with your code, it makes your code less secure and makes tasks such as credential rotation or updates a burden. In Azure, the solution to these issues is straightforward: just use managed identity if the service that runs your code supports it. Most do! That’s also the case for Azure Kubernetes Service (AKS). It supports a feature called pod-managed identities that associates a pod with such a managed identity. From the containers running in the pod, a developer can easily request a token to access Azure resources securely. I have written about pod-managed identities before so take a look at that post to understand the concepts. The post contains some sample code for illustration purposes.

The pod-managed identity feature has been in preview forever. The current version, v1, actually will not leave the preview phase. It will be replaced by v2, which uses workload identity federation. It is important to realize that AAD workload identity federation is not limited to Kubernetes. It also works with other workloads, like GitHub workflows or even Google Cloud. This also means that workload identity for Kubernetes works on other distributions, both in the cloud and on-premises. It’s not just for AKS.

Although pod-managed identities and workload identity federation achieve the same goals, they work entirely differently. Pod-managed identity is somewhat more complex because it uses Kubernetes custom resource definitions (CRDs) and requires pods that intercept IMDS traffic. Intercepting that traffic can cause issues for other pods, which means you have extra configuration work to exclude those pods.

At the time of this writing, January 2022, workload identity federation is in preview!

How does it work?

As mentioned above, workload identity federation on AKS is very different from pod-managed identity. At a basic level, all it does is token exchange. Your pod will have access to a token that your code will present to AAD. In turn, AAD, which is configured to trust that token, will issue an AAD token to access the resource protected by AAD. These tokens are JWTs (JSON Web Tokens).

A couple of things need to be done for this to work:

  • AKS must be configured with an OIDC issuer URL. That public URL will present information that allows AAD to verify the JWT token it receives from your app. You will need to register the feature on your subscription and add or update the aks-preview extension for Azure CLI.
  • You need to create an app registration in AAD for your service principal. We will use the Azure Portal for this. The portal has been updated to add federated credentials that work with Kubernetes. Currently, workload identity federation does not work with managed identities. Managed identities are basically a wrapper around app registrations so that you do not have to create and maintain these registrations. Managed identity support is on the roadmap.
  • You install the workload-identity-webhook chart on AKS. This is a mutating webhook that makes it easy for the developer to associate a pod with the service principal and automate the token creation.
  • You create a Kubernetes service account and configure your pod(s) to use it. The mutating webhook will spot this and configure the containers in your pod with environment variables and the federation token.

Let’s go through the steps to make this a bit clearer.
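
Before the app registration, the cluster itself needs the OIDC issuer enabled. A rough sketch of the commands, based on the preview documentation at the time of writing (feature and flag names might change, so check the docs):

# register the preview feature (one-time per subscription)
az feature register --namespace Microsoft.ContainerService --name EnableOIDCIssuerPreview

# add or update the aks-preview extension
az extension add --name aks-preview
az extension update --name aks-preview

# enable the OIDC issuer on an existing cluster
az aks update -n CLUSTERNAME -g RESOURCEGROUP --enable-oidc-issuer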

Configuring the app registration

Create an app registration and navigate to Certificates and Secrets. Click Add credential in the Federated credentials section:

Adding a federated credential

At the time of this writing, there were three supported scenarios: GitHub Actions, Kubernetes, and other. Select Kubernetes and specify the three required properties:

  • Cluster issuer URL: in the form of https://oidc.prod-aks.azure.com/SOMEGUID. Use az aks show -n CLUSTERNAME -g RESOURCEGROUP and look for issuerURL in the output
  • Namespace: the namespace that contains the service account; we will create it below
  • Service account name: the name of the Kubernetes service account

The namespace and service account name are used to create the subject identifier. The token your code presents to AAD will need that in the sub field.

In the example below, I use the default namespace and a service account called fed-sa:

The federated credential’s properties

Azure Active Directory, in particular this application, is now configured to trust tokens coming from our Kubernetes app. The token will need to contain the subject identifier in the sub field. The token will be signed and AAD can verify the signature from the information presented by the AKS OIDC issuer URL.
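
Under the hood, this is a standard OAuth 2.0 client credentials grant in which the Kubernetes token is passed as a client assertion. Purely as an illustration of what a client library such as MSAL does for you, the raw HTTP call looks roughly like this:

curl -X POST "https://login.microsoftonline.com/YOURTENANTID/oauth2/v2.0/token" \
  -d "client_id=APPID" \
  -d "grant_type=client_credentials" \
  -d "scope=https://management.azure.com/.default" \
  -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
  -d "client_assertion=CONTENTS_OF_THE_FEDERATED_TOKEN"

If the issuer and sub in the presented token match the federated credential, AAD returns an access token for the requested scope.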

When you configure the app registration, a service principal is created with the same name. You can use it with Azure role-based access control. I gave this service principal (or app) Contributor access on my subscription (temporarily 😉):

Service principal with access to the subscription

App, service principal, …? It’s confusing, I know. Never mind though and read on! 😉

Installing the webhook

On your AKS cluster with the configured issuer URL, install the workload identity mutating webhook with Helm:

AZURE_TENANT_ID=YOURTENANTID 

helm repo add azure-workload-identity https://azure.github.io/azure-workload-identity/charts

helm repo update

helm install workload-identity-webhook azure-workload-identity/workload-identity-webhook \
   --namespace azure-workload-identity-system \
   --create-namespace \
   --set azureTenantID="${AZURE_TENANT_ID}"

Above, replace YOURTENANTID with the id of your Azure Active Directory tenant:

Azure AD Tenant ID in the portal

Creating a service account

In a later step, to test the setup, we will run the Azure CLI in a Kubernetes pod. To associate that pod with the AAD application and service principal, we need to create a service account and provide specific labels and annotations:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fed-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: APPID
    azure.workload.identity/tenant-id: YOURTENANTID
  labels:
    azure.workload.identity/use: "true"

Above, replace APPID with the ID of the application registration you created earlier:

Application ID of the app registration in which you configured the federated token trust

The labels and annotations for the service account and for pods are discussed here. The label on the service account is required for the webhook to know that this is a service account used with federated tokens. The annotations are optional. The tenant-id annotation defaults to the tenant id passed to the webhook Helm chart. I left it in to be explicit and to have all the environment variables I need for the Azure CLI login test.

If your pod has multiple containers, and you do not want to configure all containers with federated tokens, use the annotation azure.workload.identity/skip-containers at the pod level.
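
To make that concrete, below is a hedged sketch of a pod that excludes one container from injection; the container names are made up and you should check the documentation for the exact value syntax:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    # only the app container gets the token and environment variables
    azure.workload.identity/skip-containers: "sidecar"
spec:
  serviceAccountName: fed-sa
  containers:
    - name: app
      image: myregistry.azurecr.io/app:1.0.0
    - name: sidecar
      image: myregistry.azurecr.io/sidecar:1.0.0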

Configure a container in a pod with a federated token

We can now run a container to verify if the configuration works. The deployment below deploys an Azure CLI container. I use the latest tag which, at the time of this writing, resulted in Azure CLI version 2.32.0. Make sure you use 2.30.0 or higher. That version integrates the Microsoft Authentication Library (MSAL) as the underlying authentication library and supports logging in with a federated token.

Here is the deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: azcli-deployment
  labels:
    app: azcli
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azcli
  template:
    metadata:
      labels:
        app: azcli
    spec:
      serviceAccount: fed-sa
      containers:
        - name: azcli
          image: mcr.microsoft.com/azure-cli:latest
          command:
            - "/bin/bash"
            - "-c"
            - "sleep infinity"

There is nothing special about this deployment. Instead of using the service account default, this pod is configured with the fed-sa service account. This is a normal Kubernetes service account. Because the service account has the label azure.workload.identity/use: "true", the containers in the pod are modified by the webhook for token federation. The webhook adds several environment variables and mounts a volume based on a secret that contains the federation token. This works alongside, and is similar to, the token that is already mounted to access the Kubernetes API from the pod.

Here are the environment variables:

  • AZURE_AUTHORITY_HOST=https://login.microsoftonline.com/
  • AZURE_CLIENT_ID=client-id from service account annotation
  • AZURE_TENANT_ID=tenant-id from service account annotation or default from webhook
  • AZURE_FEDERATED_TOKEN_FILE=/var/run/secrets/tokens/azure-identity-token

The AZURE_FEDERATED_TOKEN_FILE contains the path to the file that contains the token (JWT) that will be presented to AAD by your application. In our case, we will configure the Azure CLI to use this token. You can get a shell to the container and cat the token:
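
For example, with the deployment from this section running:

# get a shell in the container (or use the exact pod name from kubectl get pods)
kubectl exec -it deploy/azcli-deployment -- bash

# inside the container, print the federated token
cat $AZURE_FEDERATED_TOKEN_FILE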

The token (a JWT) in the token file

You can paste this token into the https://jwt.io debugger and see its content:

Token in jwt.io debugger

The token contains the issuer URL and the sub field contains a reference to the namespace and service account that we configured in the AAD app registration. Make sure there is a match!

Now we can use the Azure CLI (version >= 2.30.0) to log in using this token. Get a shell to the container and use the following command (--debug will give a lot of output):

az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" --debug \
--service-principal -u $AZURE_CLIENT_ID -t $AZURE_TENANT_ID

We do not need to specify a password or certificate because the federated token will be used. Near the end of the output, you will see something like:

{
    "cloudName": "AzureCloud",
    "homeTenantId": "YOURTENANTID",
    "id": "...",
    "isDefault": true,
    "managedByTenants": [],
    "name": "subscription id",
    "state": "Enabled",
    "tenantId": "...",
    "user": {
      "name": "AADAPPID",
      "type": "servicePrincipal"
    }
  }

The above output shows that the user you are logged on with is the service principal associated with the app id. Let’s see if I can list AKS clusters:
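
Listing clusters is just the standard command:

az aks list -o table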

Yep, I can list AKS clusters (and even create new ones 😉)

If you are interested in developer-oriented examples, check out the Azure AD Workload Identity documentation.

Conclusion

Workload Identity Overview

Azure AD workload identity for Kubernetes is relatively easy to configure. The diagram above summarizes all the bits and pieces you need: AKS OIDC config, the webhook (to configure containers in pods), and the AAD app.

An operator can easily use the Azure CLI to verify the configuration is correct. At the time of this writing, you have to create and manage an application registration. That will change once managed identities are supported.

Compared to pod-managed identities for AKS, the architecture is cleaner. On top of that, this feature works with other Kubernetes distributions as well, giving you the same technique to access AAD-protected resources. I am looking forward to seeing this evolve and becoming GA so customers can deploy this with confidence.

Taking Azure Container Apps for a spin

At Ignite November 2021, Microsoft released Azure Container Apps as a public preview. It allows you to run containerized applications on a serverless platform, in the sense that you do not have to worry about the underlying infrastructure.

The underlying infrastructure is Kubernetes (AKS) as the control plane with additional software such as:

  • Dapr: distributed application runtime to easily work with state, pub/sub and other Dapr building blocks
  • KEDA: Kubernetes event-driven autoscaler so you can use any KEDA supported scaler, in addition to scaling based on HTTP traffic, CPU and memory
  • Envoy: used to provide ingress functionality and traffic splitting for blue-green deployment, A/B testing, etc…

Your apps actually run on Azure Container Instances (ACI). ACI was always meant to be used as raw compute to build platforms with and this is a great use case.

Note: there is some discussion in the community whether ACI (via AKS virtual nodes) is used or not; I will leave it in for now but in the end, it does not matter too much as the service is meant to hide this complexity anyway

Azure Container Apps does not care about the runtime or programming model you use. Just use whatever feels most comfortable and package it as a container image.

In this post, we will deploy an application that uses Dapr to save state to Cosmos DB. Along the way, we will explain most of the concepts you need to understand to use Azure Container Apps in your own scenarios. The code I am using is on GitHub and written in Go.

Configure the Azure CLI

In this post, we will use the Azure CLI exclusively to perform all the steps. Instead of the Azure CLI, you can also use ARM templates or Bicep. If you want to play with a sample that deploys multiple container apps and uses Bicep, be sure to check out this great Azure sample.

You will need to have the Azure CLI installed and also add the Container Apps extension:

az extension add \
  --source https://workerappscliextension.blob.core.windows.net/azure-cli-extension/containerapp-0.2.0-py2.py3-none-any.whl

The extension allows you to use commands like az containerapp create and az containerapp update.

Create an environment

An environment runs one or more container apps. A container app can run multiple containers and can have revisions. If you know how Kubernetes works, each revision of a container app is actually a scaled collection of Kubernetes pods, using the scalers discussed above. Each revision can be thought of as a separate Kubernetes Deployment/ReplicaSet that runs a specific version of your app. Whenever you modify your app, depending on the type of modification, you get a new revision. You can have multiple active revisions and set traffic weights to distribute traffic as you wish.

Container apps, revisions, pods, and containers

Note that above, although you see multiple containers in a pod in a revision, that is not the most common use case. Most of the time, a pod will have only one application container. That is entirely up to you and the rationale behind using one or more containers is similar to multi-container pods in Kubernetes.

To create an environment, be sure to register or re-register the Microsoft.Web provider. That provider has the kubeEnvironments resource type, which represents a Container App environment.

az provider register --namespace Microsoft.Web

Next, create a resource group:

az group create --name rg-dapr --location northeurope

I have chosen North Europe here, but the location of the resource group does not really matter. What does matter is that you create the environment in either North Europe or Canada Central at this point in time (November 2021).

Every environment needs to be associated with a Log Analytics workspace. You can use that workspace later to view the logs of your container apps. Let’s create such a workspace in the resource group we just created:

az monitor log-analytics workspace create \
  --resource-group rg-dapr \
  --workspace-name dapr-logs

Next, we want to retrieve the workspace client id and secret. We will need that when we create the Container Apps environment. Commands below expect the use of bash:

LOG_ANALYTICS_WORKSPACE_CLIENT_ID=`az monitor log-analytics workspace show --query customerId -g rg-dapr -n dapr-logs --out tsv`
LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET=`az monitor log-analytics workspace get-shared-keys --query primarySharedKey -g rg-dapr -n dapr-logs --out tsv`

Now we can create the environment in North Europe:

az containerapp env create \
  --name dapr-ca \
  --resource-group rg-dapr \
  --logs-workspace-id $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --logs-workspace-key $LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET \
  --location northeurope

The Container App environment shows up in the portal like so:

Container App Environment in the portal

There is not a lot you can do in the portal, besides listing the apps in the environment. Provisioning an environment is extremely quick, in my case a matter of seconds.

Deploying Cosmos DB

We will deploy a container app that uses Dapr to write key/value pairs to Cosmos DB. Let’s deploy Cosmos DB:

uniqueId=$RANDOM
az cosmosdb create \
  --name dapr-cosmosdb-$uniqueId \
  --resource-group rg-dapr \
  --locations regionName='northeurope'

az cosmosdb sql database create \
    -a dapr-cosmosdb-$uniqueId \
    -g rg-dapr \
    -n dapr-db

az cosmosdb sql container create \
    -a dapr-cosmosdb-$uniqueId \
    -g rg-dapr \
    -d dapr-db \
    -n statestore \
    -p '/partitionKey' \
    --throughput 400

The above commands create the following resources:

  • A Cosmos DB account in North Europe: note that this uses session-level consistency (remember that for later in this post 😉)
  • A Cosmos DB database that uses the SQL API
  • A Cosmos DB container in that database, called statestore (can be anything you want)

In Cosmos DB Data Explorer, you should see:

statestore collection will be used as a State Store in Dapr

Deploying the Container App

We can use the following command to deploy the container app and enable Dapr on it:

az containerapp create \
  --name daprstate \
  --resource-group rg-dapr \
  --environment dapr-ca \
  --image gbaeke/dapr-state:1.0.0 \
  --min-replicas 1 \
  --max-replicas 1 \
  --enable-dapr \
  --dapr-app-id daprstate \
  --dapr-components ./components-cosmosdb.yaml \
  --target-port 8080 \
  --ingress external

Let’s unpack what happens when you run the above command:

  • A container app daprstate is created in environment dapr-ca
  • The container app will have an initial revision (revision 1) that runs one container in its pod; the container uses image gbaeke/dapr-state:1.0.0
  • We turn off scaling by setting min and max replicas to 1
  • We enable ingress with the type set to external. That configures a public IP address and DNS name to reach our container app on the Internet; Envoy proxy is used under the hood to achieve this; TLS is automatically configured but we do need to tell the proxy the port our app listens on (--target-port 8080)
  • Dapr is enabled and requires that our app gets a Dapr id (--enable-dapr and --dapr-app-id daprstate)

Because this app uses the Dapr SDK to write key/value pairs to a state store, we need to configure this. That is where the --dapr-components parameter comes in. The component is actually defined in a file components-cosmosdb.yaml:

- name: statestore
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: YOURURL
    - name: masterkey
      value: YOURMASTERKEY
    - name: database
      value: YOURDB
    - name: collection
      value: YOURCOLLECTION

In the file, the name of our state store is statestore but you can choose any name. The type has to be state.azure.cosmosdb which requires the use of several metadata fields to specify the URL to your Cosmos DB account, the key to authenticate, the database, and collection.
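
You can retrieve the URL and key with the Azure CLI. A quick sketch, using the account created earlier:

COSMOS_URL=$(az cosmosdb show -n dapr-cosmosdb-$uniqueId -g rg-dapr --query documentEndpoint -o tsv)
COSMOS_KEY=$(az cosmosdb keys list -n dapr-cosmosdb-$uniqueId -g rg-dapr --query primaryMasterKey -o tsv)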

In the Go code, the name of the state store is configurable via environment variables or arguments and, by total coincidence, defaults to statestore 😉.

func main() {
	fmt.Printf("Welcome to super api\n\n")

	// flags
	... code omitted for brevity
	// State store name
	f.String("statestore", "statestore", "State store name")

The flag is used in the code that writes to Cosmos DB with the Dapr SDK (s.config.Statestore in the call to daprClient.SaveState below):

// write data to Dapr statestore
	ctx := r.Context()
	if err := s.daprClient.SaveState(ctx, s.config.Statestore, state.Key, []byte(state.Data)); err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		fmt.Fprintf(w, "Error writing to statestore: %v\n", err)
		return
	} else {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintf(w, "Successfully wrote to statestore\n")
	}

After running the az containerapp create command, you should see the following output (redacted):

{
  "configuration": {
    "activeRevisionsMode": "Multiple",
    "ingress": {
      "allowInsecure": false,
      "external": true,
      "fqdn": "daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io",
      "targetPort": 8080,
      "traffic": [
        {
          "latestRevision": true,
          "revisionName": null,
          "weight": 100
        }
      ],
      "transport": "Auto"
    },
    "registries": null,
    "secrets": null
  },
  "id": "/subscriptions/SUBID/resourceGroups/rg-dapr/providers/Microsoft.Web/containerApps/daprstate",
  "kind": null,
  "kubeEnvironmentId": "/subscriptions/SUBID/resourceGroups/rg-dapr/providers/Microsoft.Web/kubeEnvironments/dapr-ca",
  "latestRevisionFqdn": "daprstate--6sbsmip.politegrass-37c1a51f.northeurope.azurecontainerapps.io",
  "latestRevisionName": "daprstate--6sbsmip",
  "location": "North Europe",
  "name": "daprstate",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-dapr",
  "tags": null,
  "template": {
    "containers": [
      {
        "args": null,
        "command": null,
        "env": null,
        "image": "gbaeke/dapr-state:1.0.0",
        "name": "daprstate",
        "resources": {
          "cpu": 0.5,
          "memory": "1Gi"
        }
      }
    ],
    "dapr": {
      "appId": "daprstate",
      "appPort": null,
      "components": [
        {
          "metadata": [
            {
              "name": "url",
              "secretRef": "",
              "value": "https://ACCOUNTNAME.documents.azure.com:443/"
            },
            {
              "name": "masterkey",
              "secretRef": "",
              "value": "MASTERKEY"
            },
            {
              "name": "database",
              "secretRef": "",
              "value": "dapr-db"
            },
            {
              "name": "collection",
              "secretRef": "",
              "value": "statestore"
            }
          ],
          "name": "statestore",
          "type": "state.azure.cosmosdb",
          "version": "v1"
        }
      ],
      "enabled": true
    },
    "revisionSuffix": "",
    "scale": {
      "maxReplicas": 1,
      "minReplicas": 1,
      "rules": null
    }
  },
  "type": "Microsoft.Web/containerApps"
}

The output above gives you a hint on how to define the Container App in an ARM template. Note the template section. It defines the containers that are part of this app. We have only one container with default resource allocations. It is possible to set environment variables for your containers but there are none in this case. We will set one later.

Also note the dapr section. It defines the app’s Dapr id and the components it can use.

Note: it is not a good practice to enter secrets in configuration files as we did above. To fix that:

  • add a secret to the Container App in the az containerapp create command via the --secrets flag. E.g. --secrets cosmosdb='YOURCOSMOSDBKEY'
  • in components-cosmosdb.yaml, replace value: YOURMASTERKEY with secretRef: cosmosdb (see the sketch below)
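
Sketched out, the component entry would then look like this (assuming the secret was named cosmosdb as in the flag above):

- name: statestore
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: YOURURL
    - name: masterkey
      secretRef: cosmosdb
    - name: database
      value: YOURDB
    - name: collection
      value: YOURCOLLECTION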

The URL for the app is https://daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io. When I browse to it, I just get a welcome message: Hello from Super API on Container Apps.

Every revision also gets a URL. The revision URL is https://daprstate--6sbsmip.politegrass-37c1a51f.northeurope.azurecontainerapps.io. Of course, this revision URL gives the same result. Our app has only one revision.

Save state

The application has a /state endpoint you can post a JSON payload to in the form of:

{
  "key": "keyname",
  "data": "datatostoreinkey"
}

We can use curl to try this:

curl -v -H "Content-type: application/json" -d '{ "key": "cool","data": "somedata"}' 'https://daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io/state'

Trying the curl command will result in an error because Dapr wants to use strong consistency with Cosmos DB and we configured it for session-level consistency. That is not very relevant for now as that is related to Dapr and not Container Apps. Switching the Cosmos DB account to strong consistency will fix the error.
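
Switching the account to strong consistency is a single CLI command:

az cosmosdb update \
  --name dapr-cosmosdb-$uniqueId \
  --resource-group rg-dapr \
  --default-consistency-level Strong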

Update the container app

Let’s see what happens when we update the container app. We will add an environment variable WELCOME to change the welcome message that the app displays. Run the following command:

az containerapp update \
  --name daprstate \
  --resource-group rg-dapr \
  --environment-variables WELCOME='Hello from new revision'

The template section in the JSON output is now:

"template": {
    "containers": [
      {
        "args": null,
        "command": null,
        "env": [
          {
            "name": "WELCOME",
            "secretRef": null,
            "value": "Hello from new revision"
          }
        ],
        "image": "gbaeke/dapr-state:1.0.0",
        "name": "daprstate",
        "resources": {
          "cpu": 0.5,
          "memory": "1Gi"
        }
      }
    ]

It is important to realize that, when the template changes, a new revision will be created. We now have two revisions, reflected in the portal as below:

Container App with two revisions

The new revision is active and receives 100% of the traffic. When we hit the / endpoint, we get Hello from new revision.

The idea here is that you deploy a new revision and test it before you make it active. Another option is to send a small part of the traffic to the new revision and see how that goes. It’s not entirely clear to me how you can automate this, including automated tests, similar to how progressive delivery controllers like Argo Rollouts and Flagger work. Tip to the team to include this! 😉

The az containerapp create and update commands can take a lot of parameters. Use az containerapp update --help to check what is supported. You will also see several examples.

Check the logs

Let’s check the container app logs that are sent to the Log Analytics workspace attached to the Container App environment. Make sure you still have the log analytics id in $LOG_ANALYTICS_WORKSPACE_CLIENT_ID:

az monitor log-analytics query \
  --workspace $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'daprstate' | project ContainerAppName_s, Log_s, TimeGenerated | take 50" \
  --out table

This will display both logs from the application container and the Dapr logs. One of the log entries shows that the statestore was successfully initialized:

... msg="component loaded. name: statestore, type: state.azure.cosmosdb/v1"

Conclusion

We have only scratched the surface here but I hope this post gave you some insights into concepts such as environments, container apps, revisions, ingress, the use of Dapr and logging. There is much more to look at such as virtual network integration, setting up scale rules (e.g. KEDA), automated deployments, and much more… Stay tuned!

Kubernetes Blue-Green deployments with Argo Rollouts

In this post, we will take a look at 🟦/🟩 blue-green deployments in Kubernetes. With blue-green deployments, you deploy a new version of an application or service next to the live and stable version. After manual or automatic checks, you promote the new version to become the live version. Switching between versions is simply a networking change. This could be a change in a router configuration or, in the case of Kubernetes, a change in a Kubernetes service.

Note: there often is confusion about what is the 🟦 blue and what is the 🟩 green service; usually the green service is the live and stable one; the blue service is the newly deployed preview service you intend to promote; some documents switch it around; I sometimes do that as well, for instance on my YouTube channel 😉

A Kubernetes deployment resource does not have a StrategyType for blue-green deployments. It only supports RollingUpdate or Recreate. You can easily work around that with multiple deployments and services, as discussed by Nills Franssens here: Simple Kubernetes blue-green deployments.
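
The gist of that workaround: run two deployments (say a blue and a green one, each with a version label) and point a single service at one of them via its selector. Switching versions is then just a selector change. A minimal sketch, with made-up names and labels:

# after deploying and testing the new (blue) pods, point the service at them
kubectl patch service myapp \
  -p '{"spec":{"selector":{"app":"myapp","version":"blue"}}}'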

When I need to do blue-green, I prefer using a progressive delivery controller such as Argo Rollouts or Flagger. They are both excellent pieces of software that make it easy to do blue-green deployments, in addition to canary deployments and automated tests. In this post, we will look at Argo Rollouts.

Want to see a video instead?

Installing Argo Rollouts

Installing Argo Rollouts is documented here. For a quick install, just do:

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Argo Rollouts comes with a kubectl plugin for its CLI. Install it with brew install argoproj/tap/kubectl-argo-rollouts. That allows you to run the CLI with kubectl argo rollouts. If you do not use brew, install the plugin manually.

Deploy your application with a Rollout

Argo Rollouts uses a replacement for a Deployment resource: a Rollout. The YAML for a Rollout is almost identical to a Deployment except that the apiVersion and Kind are different. In the spec you can add a strategy section to specify whether you want a blueGreen or a canary rollout. Below is an example of a rollout for a simple API:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: superapi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: superapi
  template:
    metadata:
      labels:
        app: superapi
    spec:
      containers:
      - name: superapi
        image: ghcr.io/gbaeke/super:1.0.2
        resources:
          requests:
            memory: "128Mi"
            cpu: "50m"
          limits:
            memory: "128Mi"
            cpu: "50m"
        env:
          - name: WELCOME
            valueFrom:
              configMapKeyRef:
                name: superapi-config
                key: WELCOME
        ports:
        - containerPort: 8080
  strategy:
    blueGreen:
      activeService: superapi-svc-active
      previewService: superapi-svc-preview
      autoPromotionEnabled: false

You will notice that the blueGreen strategy requires two services: an activeService and a previewService. Both settings refer to a Kubernetes service resource. Below is the activeService (previewService is similar and uses the same selector):

kind: Service
apiVersion: v1
metadata:
  name:  superapi-svc-active
spec:
  selector:
    app:  superapi
  type:  ClusterIP
  ports:
  - name:  http
    port:  80
    targetPort:  8080
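
For completeness, the previewService is the same thing with a different name:

kind: Service
apiVersion: v1
metadata:
  name:  superapi-svc-preview
spec:
  selector:
    app:  superapi
  type:  ClusterIP
  ports:
  - name:  http
    port:  80
    targetPort:  8080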

The only thing we have to do, in this example, is to deploy the rollout and the two services with kubectl apply. In this post, however, we will use Kustomize to deploy everything.

Deploying a rollout with Kustomize

To deploy the rollout and its services with Kustomize, we can use the kustomization.yaml below:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: blue-green

nameSuffix: -geba
namePrefix: dev-

commonLabels:
  app: superapi
  version: v1
  env: dev


configurations:
  - https://argoproj.github.io/argo-rollouts/features/kustomize/rollout-transform.yaml

resources:
  - namespace.yaml
  - rollout.yaml
  - service-active.yaml
  - service-preview.yaml

configMapGenerator:
- name: superapi-config
  literals:
    - WELCOME=Hello from v1!
    - PORT=8080   

With Kustomize, we can ensure we deploy our resources to a specific namespace. Above, that is the blue-green namespace. We also add a prefix and suffix to the names of Kubernetes resources we create and we add labels as well (commonLabels). For this to work properly with a rollout, you have to add the configurations section. Without it, Kustomize will not know what to do with the rollout resource (kind=rollout).

Note that we also use a configMapGenerator that creates a ConfigMap that sets a welcome message. If you look at the rollout spec, you will see that the pod template uses it to set the WELCOME environment variable. The API that we deploy will respond with that message when you hit the root, for instance with curl.

To deploy with Kustomize, we can run kubectl apply -k . from the folder holding kustomization.yaml and the manifests in the resources list.

Checking the initial rollout with the UI

When we initially deploy our application, there is only one version of our app. The rollout uses a ReplicaSet to deploy two pods, similarly to a Deployment. Both the activeService and the previewService point to these two pods.

Argo Rollouts has a UI you can start with kubectl argo rollouts dashboard -n blue-green. The rollout is visualized as below:

Initial rollout of the application

In a tool like Octant, the resource viewer shows the relationships between the actual Kubernetes resources:

Resource viewer in Octant

Above, you can clearly see the Rollout creates a ReplicaSet which, in turn, creates the Pods (click image to enlarge). Both services point to the same pods.

Upgrading to a new version

We will now upgrade to a new version of the application: v2. To simulate this, we can simply modify the WELCOME message in the ConfigMapGenerator in kustomization.yaml. When we run kubectl apply -k . again, Kustomize will create a new ConfigMap with a different name (containing a hash) and will update that name in the pod template of the rollout. When you update the pod template of the rollout, the rollout knows it needs to upgrade with the blue-green strategy. This, again, is identical to how a deployment behaves. In the UI, we now see:

Rollout after introducing v2 changes

There are now two revisions, both backed by a ReplicaSet. Each ReplicaSet controls two pods. One set of pods is for the active service, the other set for the preview. We can click on the rollout to see those details:

Details of the rollout

Above, we can clearly see that revision one is the stable and active service. That is our initial v1 deployment. Revision 2 is the preview service, the v2 deployment. We can port forward to that service and view the welcome message:

Port forward to the preview service

In Octant, this is what we see in Resource Viewer:

Rollout after introducing v2 changes

Above, we can clearly see the rollout now uses two ReplicaSets to run the active and preview pods. The rollout also modified the service selectors and the labels on the pods by adding a label like rollouts-pod-template-hash:758d6b4845. Each revision has its own hash.

Promotion

Currently, the rollout is in a paused state. The Argo Rollouts UI shows this but you can also view this with the CLI by running kubectl argo rollouts get rollout dev-superapi-geba:

Getting the status of the rollout with the CLI

Above the status is paused with a message of BlueGreenPause. You can clearly see the green service is the stable and active one (v1) and the blue service is the preview service (v2). We can now promote the preview service to become stable and active.

To promote the service, in the web UI, click Promote and then Sure?. With the CLI, just run kubectl argo rollouts promote dev-superapi-geba. When you run the get command again, you will see:

Rollout after promotion of v2

Above, you can see the status as ✔️ Healthy. Revision 2 is now stable and active. Revision 1 will be scaled down by setting the number of pods in the ReplicaSet to 0. In the web UI, you now see:

Rollout after promotion of Revision 2

Note that it is still possible to rollback to revision one by clicking the Rollback button or using the CLI. That will keep Revision 2 active and create a Revision 3 for you to preview. After clicking Promote and Sure? again, you will then make Revision 3 active which is the initial v1 service.
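
With the CLI, the equivalent commands are shown below (the rollout name comes from the Kustomize prefix and suffix configured earlier):

# roll back: keeps the current revision active and creates a new preview revision
kubectl argo rollouts undo dev-superapi-geba -n blue-green

# or abort an in-progress update and keep the stable revision
kubectl argo rollouts abort dev-superapi-geba -n blue-green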

Conclusion

If you have the need for blue-green deployments, it is highly recommended to use a progressive delivery controller like Argo Rollouts. It makes the whole process more intuitive and gives you fine control over upgrade, abort, promote and rollback operations. Above, we looked at blue-green with a manual pause, check, and promote. There are other options, such as analysis based on metrics with an automatic promotion that we will look at in later posts.

Trying out WebAssembly on Azure Kubernetes Service

Introduction

In October 2021, Microsoft announced the public preview of AKS support for deploying WebAssembly System Interface (WASI) workloads in Kubernetes. You can read the announcement here. In short, that means we can run another type of workload on Kubernetes, besides containers!

WebAssembly is maybe best known for the ability to write code with languages such as C#, Go and Rust that can run in the browser, alongside JavaScript code. One example of this is Blazor, which allows you to build client web apps with C#.

Besides the browser, there are ways to run WebAssembly modules directly on the operating system. Because WebAssembly modules do not contain machine code suitable for a specific operating system and CPU architecture, you will need a runtime that can interpret the WebAssembly byte code. At the same time, WebAssembly modules should be able to interface with the operating system, for instance to access files. In other words, WebAssembly code should be able to access specific parts of the operating system outside the sandbox it is running in by default.

The WebAssembly System Interface (or WASI) allows WebAssembly modules to interact with the outside world. It allows you to declare what the module is allowed to see and access.

One example of a standalone runtime that can run WebAssembly modules is wasmtime. It supports interacting with the host environment via WASI as discussed above. For example, you can specify access to files on the host via the --dir flag and be very specific about what files and folders are allowed.
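
For example, to give a module read access to the current directory only, you would start it like this:

wasmtime --dir=. sample.wasm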

An example with Rust

In what follows, we will create a Hello World-style application with Rust. You do not have to know anything about Rust to follow along. As a matter of fact, I do not know that much about Rust either. I just want a simple app to run on Azure Kubernetes Service later. Here’s the source code:

use std::env;

fn main() {
  println!("Content-Type: text/plain\n");
  println!("Hello, world!");

  printenv();
  
}

fn printenv() {
  for (key, value) in env::vars() {
    println!("{}: {}", key, value);
  }
}

Note: Because I am a bit more comfortable with Go, I first created a demo app with Go and used TinyGo to build the WebAssembly module. That worked great with wasmtime but did not work well on AKS. There is probably a good explanation for that. I will update this post when I learn more.

To continue with the Rust application, it is pretty clear what it does: it prints the Content-Type for a HTTP response, a Hello, World! message, and all environment variables. Why we set the Content-Type will become clearer later on!

To build this app, we need to target wasm32-wasi to build a WebAssembly module that supports WASI as well. You can run the following commands to do so (requires that Rust is installed on your system):

rustup target add wasm32-wasi
cargo build --release --target wasm32-wasi

The rustup command should only be run once. It adds wasm32-wasi as a supported target. The cargo build command then builds the WebAssembly module. On my system, that results in a file in the target/wasm32-wasi/release folder called sample.wasm (the name comes from a setting in Cargo.toml). With WebAssembly support in VS Code, I can right click the file and use Show WebAssembly:

Showing the WebAssembly Module in VS Code (WebAssembly Toolkit for VS Code extension)

We can run this module with cargo run but that runs the app directly on the operating system. In my case that’s Ubuntu in Windows 11’s WSL2. To run the WebAssembly module, you can use wasmtime:

wasmtime sample.wasm

The module will not read the environment variables from the host. Instead, you pass environment variables from the wasmtime cli like so (command and result shown below):

wasmtime --env test=hello sample.wasm

Content-Type: text/plain

Hello, world!
test: hello

Publishing to Azure Container Registry

A WebAssembly module can be published to Azure Container Registry with wasm-to-oci (see GitHub repo). The command below publishes our module:

wasm-to-oci push sample.wasm <ACRNAME>.azurecr.io/sample:1.0.0

Make sure you are logged in to ACR with az acr login -n <ACRNAME>. I also enabled anonymous pull on ACR to not run into issues with pulls from WASI-enabled AKS pools later. Indeed, AKS will be able to pull these artefacts to run them on a WASI node.
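
Anonymous pull can be toggled with the Azure CLI; a sketch (it requires the Standard or Premium SKU if I remember correctly):

az acr update -n <ACRNAME> --anonymous-pull-enabled true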

Here is the artefact as shown in ACR:

WASM module in ACR with mediaType = application/vnd.wasm.content.layer.v1+wasm

Running the module on AKS

To run WebAssembly modules on AKS nodes, you need to enable the preview as described here. After enabling the preview, I deployed a basic Kubernetes cluster with one node. It uses kubenet by default. That’s good because Azure CNI is not supported by WASI node pools.

az aks create -n wademo -g rg-aks --node-count 1

After finishing the deployment, I added a WASI nodepool:

az aks nodepool add \
    --resource-group rg-aks \
    --cluster-name wademo \
    --name wasipool \
    --node-count 1 \
    --workload-runtime wasmwasi

The aks-preview extension (install or update it!!!) for the Azure CLI supports the --workload-runtime flag. It can be set to wasmwasi to deploy nodes that can execute WebAssembly modules. The piece of technology that enables this is the krustlet project as described here: https://krustlet.dev. Krustlet is basically a WebAssembly kubelet. It stands for Kubernetes Rust Kubelet.

After running the above commands, the command kubectl get nodes -o wide will look like below:

NAME                                STATUS   ROLES   AGE    VERSION         INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
aks-nodepool1-23291395-vmss000000   Ready    agent   3h6m   v1.20.9         10.240.0.4    <none>        Ubuntu 18.04.6 LTS   5.4.0-1059-azure   containerd://1.4.9+azure
aks-wasipool-23291395-vmss000000    Ready    agent   3h2m   1.0.0-alpha.1   10.240.0.5    <none>        <unknown>            <unknown>          mvp

As you can see it’s early days here! 😉 But we do have a node that can run WebAssembly! Let’s try to run our module by deploying a pod via the manifest below:

apiVersion: v1
kind: Pod
metadata:
  name: sample
  annotations:
    alpha.wagi.krustlet.dev/default-host: "0.0.0.0:3001"
    alpha.wagi.krustlet.dev/modules: |
      {
        "sample": {"route": "/"}
      }
spec:
  hostNetwork: true
  containers:
    - name: sample
      image: <ACRNAME>.azurecr.io/sample:1.0.0
      imagePullPolicy: Always
  nodeSelector:
    kubernetes.io/arch: wasm32-wagi
  tolerations:
    - key: "node.kubernetes.io/network-unavailable"
      operator: "Exists"
      effect: "NoSchedule"
    - key: "kubernetes.io/arch"
      operator: "Equal"
      value: "wasm32-wagi"
      effect: "NoExecute"
    - key: "kubernetes.io/arch"
      operator: "Equal"
      value: "wasm32-wagi"
      effect: "NoSchedule"

Wait a moment! There is a new acronym here: WAGI! WASI has no network primitives such as sockets so you should not expect to build a full webserver with it. WAGI, which stands for WebAssembly Gateway Interface, allows you to run WASI modules as HTTP handlers. It is heavily based on CGI, the Common Gateway Interface that allows mapping HTTP requests to executables (e.g. a Windows or Linux executable) via something like IIS or Apache.

We will need a way to map a route such as / to a module, and the response to a request should be an HTTP response. That is why we set the Content-Type in the example by simply printing it to stdout. WAGI will also set several environment variables with information about the incoming request. That is the reason we print all the environment variables. This feels a bit like the early 90’s to me when CGI was the hottest web tech in town! 😂

The mapping of routes to modules is done via annotations, as shown in the YAML. This is similar to the modules.toml file used to start a Wagi server manually. Because the WASI nodes are tainted, tolerations are used to allow the pod to be scheduled on such nodes. With the nodeSelector, the pod needs to be scheduled on such a node.
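
For reference, a standalone Wagi server would express the same mapping in modules.toml, roughly like below (from memory of the Wagi docs at the time, so check the project README for the exact format):

[[module]]
route = "/"
module = "/path/to/sample.wasm"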

To run the WebAssembly module, apply the manifest above to the cluster as usual (assuming the manifest is in pod.yaml):

kubectl apply -f pod.yaml

Now run kubectl get pods. The status will be Registered instead of Running, which is expected. The pod will not be ready either:

NAME    READY   STATUS       RESTARTS   AGE
sample  0/1     Registered   0          108m

In order to reach the workload from the Internet, you need to install nginx with a values.yaml file that contains the internal IP address of the WASI node, as documented here.

After doing that, I can curl the public IP address of the nginx service of type LoadBalancer:

~ curl IP

Hello, world!
HTTP_ACCEPT: */*
QUERY_STRING: 
SERVER_PROTOCOL: HTTP/1.0
GATEWAY_INTERFACE: CGI/1.1
REQUEST_METHOD: GET
SERVER_PORT: 3001
REMOTE_ADDR: 10.240.0.4
X_FULL_URL: http://10.240.0.5:3001/
X_RAW_PATH_INFO: 
CONTENT_TYPE: 
SERVER_NAME: 10.240.0.5
SCRIPT_NAME: /
AUTH_TYPE: 
PATH_TRANSLATED: 
PATH_INFO: 
CONTENT_LENGTH: 0
X_MATCHED_ROUTE: /
REMOTE_HOST: 10.240.0.4
REMOTE_USER: 
SERVER_SOFTWARE: WAGI/1
HTTP_HOST: 10.240.0.5:3001
HTTP_USER_AGENT: curl/7.58.0

As you can see, WAGI has set environment variables that allow your handler to know more about the incoming request, such as the HTTP User Agent.

Conclusion

Although WebAssembly is gaining in popularity to build browser-based applications, it is still early days for running these workloads on Kubernetes. WebAssembly will not replace containers anytime soon. In fact, that is not the actual goal. It just provides an additional choice that might make sense for some applications in the future. And as always, the future will arrive sooner than expected!

DNS Options for Private Azure Kubernetes Service

When you deploy Azure Kubernetes Service (AKS), by default the API server is publicly made available. That means it has a public IP address and an Azure-assigned name that’s resolvable by public DNS servers. To secure access, you can use authorized IP ranges.

As an alternative, you can deploy a private AKS cluster. That means the AKS API server gets an IP address in a private Azure virtual network. Most customers I work with use this option to comply with security policies. When you deploy a private AKS cluster, you still need a fully qualified domain name (FQDN) that resolves to the private IP address. There are several options you can use:

  • System (the default option): AKS creates a Private DNS Zone in the Node Resource Group; any virtual network that is linked to that Private DNS Zone can resolve the name; the virtual network used by AKS is automatically linked to the Private DNS Zone
  • None: default to public DNS; AKS creates a name for your cluster in a public DNS zone that resolves to the private IP address
  • Custom Private DNS Zone: AKS uses a Private DNS Zone that you or another team has created beforehand; this is mostly used in enterprise scenarios when the Private DNS Zones are integrated with custom DNS servers (e.g., on AD domain controllers, Infoblox, …)

The first two options, System and None, are discussed in the video below:

Overview of the 3 DNS options with a discussion of the first two: System and None

The third option, custom Private DNS Zone, is discussed in a separate video:

Private AKS with a custom Private DNS Zone

With the custom DNS option, you cannot use any name you like. The Private DNS Zone has to be like: privatelink.<region>.azmk8s.io. For instance, if you deploy your AKS cluster in West Europe, the Private DNS Zone’s name should be privatelink.westeurope.azmk8s.io. There is an option to use a subdomain as well.
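
Creating such a zone up front is a one-liner (rg-dns is just an example resource group):

az network private-dns zone create \
  --resource-group rg-dns \
  --name privatelink.westeurope.azmk8s.io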

When you use the custom DNS option, you also need to use a user-assigned Managed Identity for the AKS control plane. To make the registration of the A record in the Private DNS Zone work, in addition to linking the Private DNS Zone to the virtual network, the managed identity needs at least the following roles (a CLI sketch follows the list):

  • Private DNS Zone Contributor role on the Private DNS Zone
  • Network Contributor role on the virtual network used by AKS
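
These assignments, and the virtual network link, can be scripted. A sketch with placeholder resource IDs:

# let the control plane identity write A records in the zone
az role assignment create \
  --assignee "clientId of user-assigned managed identity" \
  --role "Private DNS Zone Contributor" \
  --scope "resourceId of Private DNS Zone"

# let the control plane identity join the virtual network
az role assignment create \
  --assignee "clientId of user-assigned managed identity" \
  --role "Network Contributor" \
  --scope "resourceId of virtual network"

# link the zone to the virtual network used by AKS
az network private-dns link vnet create \
  --resource-group rg-dns \
  --zone-name privatelink.westeurope.azmk8s.io \
  --name aks-link \
  --virtual-network "resourceId of virtual network" \
  --registration-enabled false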

To deploy a private AKS cluster with a custom Private DNS Zone, you can use the following Azure CLI command which also sets the network plugin to azure (as an example). Private cluster also works with kubenet if you prefer that model. For other examples, see Create a private Azure Kubernetes Service cluster – Azure Kubernetes Service | Microsoft Docs.

az aks create \
    --resource-group RGNAME \
    --name aks-private \
    --network-plugin azure \
    --vnet-subnet-id "resourceId of AKS subnet" \
    --docker-bridge-address 172.17.0.1/16 \
    --dns-service-ip 10.3.0.10 \
    --service-cidr 10.3.0.0/24 \
    --enable-managed-identity \
    --assign-identity "resourceId of user-assigned managed identity" \
    --enable-private-cluster \
    --load-balancer-sku standard \
    --private-dns-zone "resourceId of Private DNS Zone"

The option that is easiest to use is the None option. You do not have to worry about Private DNS Zones and it just works. That option, at the time of this writing (June 2021), is still in preview and needs to be enabled on your subscription. In most cases though, I see enterprises go for the third option where the Private DNS Zones are created beforehand and integrated with custom DNS.

Approving a private endpoint connection with Azure CLI

In my previous post, I wrote about App Services with Private Link and used Azure Front Door to publish the web app. Azure Front Door Premium (in preview) can create a Private Endpoint and link it to your web app via Azure Private Link. When that happens, you need to approve the pending connection in Private Link Center.

The pending connection would be shown here, ready for approval

Although this is easy to do, you might want to automate this approval. Automation is possible via a REST API but it is easier via Azure CLI.

To do so, first list the private endpoint connections of your resource, in my case that is a web app:

az network private-endpoint-connection list --id /subscriptions/SUBID/resourceGroups/RGNAME/providers/Microsoft.Web/sites/APPSERVICENAME

The above command will return all private endpoint connections of the resource. For each connection, you get the following information:

 {
    "id": "PE CONNECTION ID",
    "location": "East US",
    "name": "NAME",
    "properties": {
      "ipAddresses": [],
      "privateEndpoint": {
        "id": "PE ID",
        "resourceGroup": "RESOURCE GROUP NAME OF PE"
      },
      "privateLinkServiceConnectionState": {
        "actionsRequired": "None",
        "description": "Please approve this connection.",
        "status": "Pending"
      },
      "provisioningState": "Pending"
    },
    "resourceGroup": "RESOURCE GROUP NAME OF YOUR RESOURCE",
    "type": "YOUR RESOURCE TYPE"
  }

To approve the above connection, use the following command:

az network private-endpoint-connection approve --id PE CONNECTION ID --description "Approved"

The --id in the approve command refers to the private endpoint connection ID, which looks like below for a web app:

/subscriptions/YOUR SUB ID/resourceGroups/YOUR RESOURCE GROUP/providers/Microsoft.Web/sites/YOUR APP SERVICE NAME/privateEndpointConnections/YOUR PRIVATE ENDPOINT CONNECTION NAME

After running the above command, the connection should show as approved:

Approved private endpoint connection

When you automate this in a pipeline, you can first list the private endpoint connections of your resource and filter on provisioningState=”Pending” to find the ones you need to approve.
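
A sketch of that filter with a JMESPath query, based on the output shown above:

az network private-endpoint-connection list \
  --id /subscriptions/SUBID/resourceGroups/RGNAME/providers/Microsoft.Web/sites/APPSERVICENAME \
  --query "[?properties.provisioningState=='Pending'].id" \
  --out tsv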

Hope it helps!

Azure App Services with Private Link

In one of my videos on my YouTube channel, I discuss Azure App Services with Private Link. The video describes how it works and provides an example of deploying the infrastructure with Bicep. The Bicep templates are on GitHub.

If you want to jump straight to the video, here it is:

In the rest of this blog post, I provide some more background information on the different pieces of the solution.

Azure App Service

Azure App Service is a great way to host web applications and APIs on Azure. It’s PaaS (platform as a service), so you do not have to deal with the underlying Windows or Linux servers as they are managed by the platform. I often see AKS (Azure Kubernetes Service) implementations to host just a couple of web APIs and web apps. In most cases, that is overkill and you still have to deal with Kubernetes upgrades, node patching or image replacements, draining and rebooting the nodes, etc… And then I did not even discuss controlling ingress and egress traffic. Even if you standardize on packaging your app in a container, Azure App Service will gladly accept the container and serve it for you.

By default, Azure App Service gives you a public IP address and FQDN (Fully Qualified Domain Name) to reach your app securely over the Internet. The default name ends with azurewebsites.net but you can easily add custom domains and certificates.

Things get a bit more complicated when you want a private IP address for your app, reachable from Azure virtual networks and on-premises networks. One solution is to use an App Service Environment. It provides a fully isolated and dedicated environment to run App Service apps such as web apps and APIs, Docker containers and Functions. You can create an internal ASE which results in an Internal Load Balancer in front of your apps that is configured in a subnet of your choice. There is no need to configure Private Endpoints to make use of Private Link. This is often called native virtual network integration.

At the network level, an App Service Environment v2 works as follows:

External ASE
ASE networking (from Microsoft website)

Looking at the above diagram, an ILB ASE (but also an External ASE) also makes it easy to connect to back-end systems such as on-premises databases. The outbound connection to internal resources originates from an IP in the chosen integration subnet.

The downside to ASE is that its isolated instances (I1, I2, I3) are rather expensive. It also takes a long time to provision an ASE but that is less of an issue. In reality though, I would like to see App Service Environments go away and be replaced by “regular” App Services with toggles that give you the options you require: you would just deploy App Services and flip the switches you need. In any case, native virtual network integration should not depend on dedicated or shared compute. One can only dream right? 😉

Note: App Service Environment v3, in preview at the time of this writing, provides a simplified deployment experience and also costs less. See App Service Environment v3 public preview – Azure App Service

As an alternative to an ASE for a private app, consider a non-ASE App Service that, in production, uses Premium V2 or V3 instances. The question then becomes: “How do you get a private IP address?” That’s where Private Link comes in…

Azure Private Link with App Service

Azure Private Link provides connectivity to Azure services (such as App Service) via a Private Endpoint. The Private Endpoint creates a virtual network interface card (NIC) on a subnet of your choice. Connections to the NIC’s IP address end up at the Private Link service the Private Endpoint is connected to. Below is an example with Azure SQL Database where one Private Endpoint is mapped, via Azure Private Link, to one database. The other databases are not reachable via the endpoint.

Private Endpoint connected to Azure SQL Database (PaaS) via Private Link (source: Microsoft website)

To create a regular App Service that is accessible via a private IP, we can do the same thing:

  • create a private endpoint in the subnet of your choice
  • connect the private endpoint to your App Service using Private Link

Both actions can be performed at the same time from the portal. In the Networking section of your App Service, click Configure your private endpoint connections. You will see the following screen:

Private Endpoint connection of App Service

Now click Add to create the Private Endpoint:

Creating the private endpoint

The above creates the private endpoint in the default subnet of the selected VNET. When the creation is finished, the private endpoint will be connected to App Service and automatically approved. There are scenarios, such as connecting private endpoints from other tenants, that require you to approve the connection first:

Automatically approved connection

When you click on the private endpoint, you will see the subnet and NIC that was created:

Private Endpoint

From the above, you can click the link to the network interface (NIC):

Network interface created by the private endpoint

Note that when you delete the Private Endpoint, the network interface gets deleted as well.
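If you prefer scripting this instead of clicking through the portal, the private endpoint can also be created and connected to the App Service with the Azure CLI. The sketch below is an illustration with hypothetical resource names (rg-web, vnet-web, pe-web-geba); it assumes the web app and virtual network already exist:

# create a private endpoint in a subnet and connect it to the web app via Private Link
az network private-endpoint create \
  --name pe-web-geba \
  --resource-group rg-web \
  --vnet-name vnet-web --subnet default \
  --private-connection-resource-id $(az webapp show -n web-geba -g rg-web --query id -o tsv) \
  --group-id sites \
  --connection-name pe-conn-web-geba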

Great! Now we have an IP address that we can use to reach the App Service. If you use the default name of the web app, in my case https://web-geba.azurewebsites.net, you will get:

Oops, no access on the public name (resolves to public IP)

Indeed, when you enable Private Link on App Service, you cannot access the website using its public IP. To solve this, you will need to do something at the DNS level. For the default domain, azurewebsites.net, it is recommended to use Azure Private DNS. During the creation of my Private Endpoint, I turned on that feature which resulted in:

Private DNS Zone for privatelink.azurewebsites.net

You might wonder why this is a private DNS zone for privatelink.azurewebsites.net. From the moment you enable Private Link on your web app, Microsoft modifies the response to the DNS query for the public name of your app. For example, if the app is web-geba.azurewebsites.net and you query DNS for that name, the response will be a CNAME to web-geba.privatelink.azurewebsites.net. If that name cannot be resolved, you still get the public IP, but connecting to it results in a 403.
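If you did not let the portal create and link the Private DNS zone during Private Endpoint creation, you can wire it up yourself. Below is a hedged Azure CLI sketch with hypothetical names; exact parameters can differ between CLI versions:

# create the private DNS zone and link it to the virtual network
az network private-dns zone create -g rg-web -n privatelink.azurewebsites.net
az network private-dns link vnet create -g rg-web --zone-name privatelink.azurewebsites.net \
  --name dns-link-web --virtual-network vnet-web --registration-enabled false

# let the private endpoint register its A record in the zone
az network private-endpoint dns-zone-group create -g rg-web --endpoint-name pe-web-geba \
  --name default --private-dns-zone privatelink.azurewebsites.net --zone-name webapp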

In my case, as long as the DNS servers I use can resolve web-geba.privatelink.azurewebsites.net and I can connect to 10.240.0.4, I am good to go. Note however that the DNS story, including Private DNS and your own DNS servers, is a bit more complex than just checking a box! However, that is not the focus of this blogpost so moving on… 😉

Note: you still need to connect to the website using https://web-geba.azurewebsites.net in your browser

Outbound connections to internal resources

One of the features of App Service Environments is the ability to connect to back-end systems in Azure VNets or on-premises. That is the result of native VNet integration.

When you enable Private Link on a regular App Service, you do not get that. Private Link only enables private inbound connectivity but does nothing for outbound. You will need to configure something else to make outbound connections from the Web App to resources such as internal SQL Servers work.

In the network configuration of your App Service, there is another option for outbound connectivity to internal resources: VNet integration.

VNET Integration

In the Networking section of App Service, find the VNet integration section and click Click here to configure. From there, you can add a VNet to integrate with. You will need to select a subnet in that VNet for this integration to work:

Outbound connectivity for App Service to Azure VNets

There are quite a few things to know when it comes to VNet integration for App Service, so be sure to check the docs.
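If you prefer the CLI over the portal, regional VNet integration can be added with a single command. A short sketch with hypothetical names:

# configure outbound VNet integration for the web app
az webapp vnet-integration add -g rg-web -n web-geba --vnet vnet-web --subnet snet-appsvc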

Private Link with Azure Front Door

Often, a web app is made private because you want to put a Web Application Firewall (WAF) in front of the app. Typically, that goal is achieved by putting Azure Application Gateway (AG) with WAF in front of an internal App Service Environment. As an alternative to AG, you can also use virtual appliances such as Barracuda WAF for Azure. This works because the App Service Environment is a first-class citizen of your Azure virtual network.

There are multiple ways to put a WAF in front of a (non-ASE) App Service. You can use Front Door with the App Service as the origin, as long as you restrict direct access to the origin. To that end, App Services support access restrictions.
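For example, locking the origin down to Front Door traffic could look roughly like the command below. Treat it as an illustration rather than a copy/paste recipe; the available parameters depend on your Azure CLI version:

# only allow traffic from Azure Front Door to reach the App Service
az webapp config access-restriction add -g rg-web -n web-geba \
  --rule-name FrontDoorOnly --action Allow --priority 100 \
  --service-tag AzureFrontDoor.Backend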

With Azure Front Door Premium, in preview at the time of this writing (June 2021), you can use Private Link as well. In that case, Azure Front Door creates a private endpoint. You cannot control or see that private endpoint because it is managed by Front Door. Because the private endpoint is not in your tenant, you will need to approve the connection from the private endpoint to your App Service. You can do that in multiple ways. One way is Private Link Center Pending Connections:

Pending Connections

This is also shown in the video at the top of this page.
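Approving the pending connection can be scripted as well. A hedged sketch; parameter names may differ slightly between CLI versions and the connection ID is just a placeholder:

# list private endpoint connections on the web app, then approve the pending one by ID
az network private-endpoint-connection list -g rg-web --resource-name web-geba --type Microsoft.Web/sites
az network private-endpoint-connection approve --id <connection-resource-id> --description "Approved"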

Conclusion

The combination of Azure networking with App Service Environments (ASE) and “regular” App Services (non-ASE) can be pretty confusing. You have native network integration for ASE, private access with Private Link and Private Endpoints for non-ASE, private DNS for Private Link domains, virtual network service endpoints, VNet outbound configuration for non-ASE, etc… Most of the time, when I am asked for the easiest and most cost-effective option for a private web app in PaaS, I go for a regular non-ASE App Service and use Private Link to make the app accessible from the internal network.

A quick look at azure/kubelogin

I have talked about and demonstrated the use of kubelogin in previous posts and videos. Because I often get questions about logging on to Azure Kubernetes Services (AKS) integrated with Azure AD (AAD) in a non-interactive fashion, I decided to write this separate post about it.

What is kubelogin?

Kubelogin is a client-go credential plugin that implements Azure AD authentication. Kubernetes and its CLI, kubectl, are written in Go and client-go is a package or library that allows you to talk to Kubernetes from the Go language. Client-go supports credentials plugins to integrate with authentication protocols that are not supported by default by kubectl. Do not confuse azure/kubelogin with int128/kubelogin. The latter is a generic credential plugin that supports OpenID Connect in general, while the former was specifically created for Azure.

Why use it?

When you integrate an AKS cluster with Azure AD, you can grant users and groups in Azure AD access rights to your cluster. You do that via Kubernetes RBAC or Azure RBAC for Kubernetes. Once you have assigned the necessary access rights to the user or group, a user can log in by first obtaining credentials with the Azure CLI:

az aks get-credentials -n CLUSTERNAME -g RESOURCEGROUP

After running the above command, the user will not be asked to authenticate yet. However, when a command such as kubectl get nodes is run, the user will need to authenticate to Azure AD by opening a browser and entering a code:

Prompted to enter a code

When the code is entered, and the user has the necessary role to run the command, the output will appear.
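For reference, when you use Azure RBAC for Kubernetes, granting those access rights is an Azure role assignment on the cluster. A sketch with a hypothetical object ID:

# assign a built-in AKS RBAC role to an AAD user or group at cluster scope
AKS_ID=$(az aks show -n CLUSTERNAME -g RESOURCEGROUP --query id -o tsv)
az role assignment create --assignee <user-or-group-object-id> \
  --role "Azure Kubernetes Service RBAC Cluster Admin" --scope $AKS_ID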

This is great when you are working interactively on the command line but not so great in a pipeline. Often, engineers circumvent this by using:

az aks get-credentials -n CLUSTERNAME -g RESOURCEGROUP --admin

The use of --admin switches to client certificate authentication and gives you full control of the cluster. In general, this is not recommended. It is worth noting that, at the time of this writing, there is also a preview feature that can disable the use of local accounts.
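Once that preview feature is registered on your subscription, disabling local accounts on an existing cluster is a single command (sketch):

# disable local (client certificate) accounts so only AAD identities can be used
az aks update -g RESOURCEGROUP -n CLUSTERNAME --disable-local-accounts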

What to do in a pipeline?

In a pipeline, the easiest way to login with an Azure AD account is as follows:

  • Use the Azure CLI and logon with an account that has the required role on the Kubernetes cluster
  • Use az aks get-credentials to obtain cluster credentials and DO NOT use --admin; this creates a kube config file on the CI/CD agent (e.g. GitHub runner, Azure DevOps agent, etc…)
  • Download kubelogin if required (mostly, that will be needed)
  • Use kubelogin to update the kube config file with the token of the Azure CLI user; this is one of the options and has been added in March of 2021

Check out the following sample Azure DevOps pipeline below:

trigger: none

pool:
  vmImage: ubuntu-latest

steps:
- task: KubectlInstaller@0
  inputs:
    kubectlVersion: 'latest'

- task: AzureCLI@2
  inputs:
    azureSubscription: 'NAME OF AZURE DEVOPS SERVICE CONNECTION'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      az aks get-credentials -n CLUSTERNAME -g CLUSTERRESOURCEGROUP

      # get kubelogin
      wget https://github.com/Azure/kubelogin/releases/download/v0.0.9/kubelogin-linux-amd64.zip
      unzip kubelogin-linux-amd64.zip
      sudo mv bin/linux_amd64/kubelogin /usr/bin
      kubelogin convert-kubeconfig -l azurecli

      kubectl get nodes

In Azure DevOps, you can specify the name of a service connection in the azureSubscription parameter of the AzureCLI@2 task. The account used by the service connection needs access rights to the Kubernetes cluster.

The command kubelogin convert-kubeconfig -l azurecli modifies the kube config obtained with az aks get-credentials with a token for the account used by the Azure CLI. To use the Azure CLI credential, you have to use managed AAD integration.

Although the above is for Azure DevOps, the process is similar for other CI/CD systems such as GitHub workflows. In GitHub, you can use the azure/CLI action, which requires an azure/login action first. The azure/login action uses a service principal to connect. That service principal needs access rights to the Kubernetes cluster.
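A minimal GitHub workflow snippet could look like the sketch below. It assumes a GitHub-hosted Ubuntu runner with the Azure CLI and kubectl preinstalled, and a repository secret called AZURE_CREDENTIALS that holds the service principal; the kubelogin download is the same as in the Azure DevOps example:

- uses: azure/login@v1
  with:
    creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: Get nodes with an AAD identity
  run: |
    az aks get-credentials -n CLUSTERNAME -g CLUSTERRESOURCEGROUP
    wget https://github.com/Azure/kubelogin/releases/download/v0.0.9/kubelogin-linux-amd64.zip
    unzip kubelogin-linux-amd64.zip
    sudo mv bin/linux_amd64/kubelogin /usr/bin
    kubelogin convert-kubeconfig -l azurecli
    kubectl get nodes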

Note that there are many other ways to obtain the token. You are not restricted to using the Azure CLI credentials. You can also use your own service principal or a managed service identity (MSI). Check the README of azure/kubelogin for more info.
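For example, the service principal option uses the spn login mode with environment variables for the client ID and secret. A short sketch based on the kubelogin README:

# use a service principal instead of the Azure CLI credentials
export AAD_SERVICE_PRINCIPAL_CLIENT_ID=<appId>
export AAD_SERVICE_PRINCIPAL_CLIENT_SECRET=<password>
kubelogin convert-kubeconfig -l spn
kubectl get nodes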

Building a GitHub Action with Docker

While I was investigating Kyverno, I wanted to check my Kubernetes deployments for compliance with Kyverno policies. The Kyverno CLI can be used to do that with the following command:

kyverno apply ./policies --resource=./deploy/deployment.yaml

To do this easily from a GitHub workflow, I created an action called gbaeke/kyverno-cli. In this post, we will build it under the example name kyverno-action. The action uses a Docker container. It can be used in a workflow as follows:

# run kyverno cli and use v1 instead of v1.0.0
- name: Validate policies
  uses: gbaeke/kyverno-action@v1
  with:
    command: |
      kyverno apply ./policies --resource=./deploy/deployment.yaml

You can find the full workflow here. In the next section, we will take a look at how you build such an action.

If you want a video instead, here it is:

GitHub Actions

A GitHub Action is used inside a GitHub workflow. An action can be built with JavaScript or with Docker. To use an action in a workflow, you use uses: followed by a reference to the action, which is just a GitHub repository. In the above action, we used uses: gbaeke/kyverno-action@v1. The repository is gbaeke/kyverno-action and the version is v1. The version can refer to a release but also to a branch. In this case, v1 refers to a branch. In a later section, we will take a look at versioning with releases and branches.

Create a repository

An action consists of several files that live in a git repository. Go ahead and create such a repository on GitHub. I presume you know how to do that. We will add several files to it:

  • Dockerfile and all the files that are needed to build the Docker image
  • action.yml: to set the name of our action, its description, inputs and outputs and how it should run

Docker image

Remember that we want a Docker image that can run the Kyverno CLI. That means we have to include the CLI in the image that we build. In this case, we will build the CLI with Go as instructed on https://kyverno.io. Here is the Dockerfile (should be in the root of your git repo):

# golang image provides the Go toolchain needed to build the Kyverno CLI
FROM golang:1.15
# copy the contents of the src folder (contains entrypoint.sh) to the image root
COPY src/ /
# clone the Kyverno repo and build the CLI with make
RUN git clone https://github.com/kyverno/kyverno.git
WORKDIR kyverno
RUN make cli
RUN mv ./cmd/cli/kubectl-kyverno/kyverno /usr/bin/kyverno
# run the shell script copied from src when the container starts
ENTRYPOINT ["/entrypoint.sh"]

We start from a golang image because we need the go tools to build the executable. The result of the build is the kyverno executable in /usr/bin. The Docker image uses a shell script as its entrypoint, entrypoint.sh. We copy that shell script from the src folder in our repository.

So go ahead and create the src folder and add a file called entrypoint.sh. Here is the script:

#!/usr/bin/env bash
set -e
set -o pipefail
echo ">>> Running command"
echo ""
bash -c "set -e;  set -o pipefail; $1"

This is just a bash script. We use the set commands in the main script to ensure that, when an error occurs, the script exits with the exit code from the command or pipeline that failed. Because we want to run a command like kyverno apply, we need a way to execute that. That’s why we run bash again at the end with the same options and use $1 to represent the argument we will pass to our container. Our GitHub Action will need a way to require an input and pass that input as the argument to the Docker container.

Note: make sure the script is executable; use chmod +x entrypoint.sh

The action.yml

Action.yml defines our action and should be in the root of the git repo. Here is the action.yml for our Docker action:

name: 'kyverno-action'
description: 'Runs kyverno cli'
branding:
  icon: 'command'
  color: 'red'
inputs:
  command:
    description: 'kyverno command to run'
    required: true
runs:
  using: 'docker'
  image: 'Dockerfile'
  args:
    - ${{ inputs.command }}

Above, we give the action a name and description. We also set an icon and color. The icon and color are used on the GitHub Marketplace:

command icon and color as defined in action.yml (note that this is the REAL action; in this post we call the action kyverno-action as an example)

As stated earlier, we need to pass arguments to the container when it starts. To achieve that, we define a required input to the action. The input is called command but you can use any name.

In the runs: section, we specify that this action uses Docker. When you use image: 'Dockerfile', the workflow will build the Docker image for you with a random name and then run it for you. When it runs the container, it passes the command input as an argument via args:. Multiple arguments can be passed, but we only pass one.

Note: the use of a Dockerfile makes running the action quite slow because the image needs to be built every time the action runs. In a moment, we will see how to fix that.

Verify that the image works

On your machine that has Docker installed, build and run the container to verify that you can run the CLI. Run the commands below from the folder containing the Dockerfile:

docker build -t DOCKER_HUB_USER/kyverno-action:v1.0.0 .

docker run DOCKER_HUB_USER/kyverno-action:v1.0.0 "kyverno version"

Above, I presume you have an account on Docker Hub so that you can later push the image to it. Substitute DOCKER_HUB_USER with your Docker Hub username. You can of course use any registry you want.

The result of docker run should be similar to the result below:

>>> Running command

Version: v1.3.5-rc2-1-g3ab75095
Time: 2021-04-04_01:16:49AM
Git commit ID: main/3ab75095b70496bde674a71df08423beb7ba5fff

Note: if you want to build a specific version of the Kyverno CLI, you will need to modify the Dockerfile; the instructions I used build the latest version, which includes release candidates

If docker run was successful, push the image to Docker Hub (or your registry):

docker push DOCKER_HUB_USER/kyverno-action:v1.0.0

Note: later, it will become clear why we push this container to a public registry

Publish to the marketplace

You are now ready to publish your action to the marketplace. One thing to be aware of is that the name of your action must be unique. Above, we used kyverno-action. When you run through the publishing steps, GitHub will check if the name is unique.

To see how to publish the action, check the following video:

video starts at the marketplace publishing step

Note that publishing to the marketplace is optional. Our action can still be used without it being published. Publishing just makes our action easier to discover.

Using the action

At this point, you can already use the action when you specify the exact release version. In the video, we created a release called v1.0.0 and optionally published it. The snippet below illustrates its use:

- name: Validate policies
  uses: gbaeke/kyverno-action@v1.0.0
  with:
    command: |
      kyverno apply ./policies --resource=./deploy/deployment.yaml

Running this action results in a docker build, followed by a docker run in the workflow:

The build step takes quite some time, which is somewhat annoying. Let’s fix that! In addition, we will let users use v1 instead of having to specify v1.0.0 or v1.0.1 etc…

Creating a v1 branch

By creating a branch called v1 and modifying action.yml to use a Docker image from a registry, we can make the action quicker and easier to use. Just create a branch in GitHub and call it v1. We’ll use the UI:

create the branch here; if it does not exist there will be a create option (here it exists already)

Make the v1 branch active and modify action.yml:

In action.yml, instead of image: 'Dockerfile', use the following:

image: 'docker://DOCKER_HUB_USER/kyverno-action:v1.0.0'

When you use the above statement, the image will be pulled instead of built from scratch. You can now use the action with @v1 at the end:

# run kyverno cli and use v1 instead of v1.0.0
- name: Validate policies
  uses: gbaeke/kyverno-action@v1
  with:
    command: |
      kyverno apply ./policies --resource=./deploy/deployment.yaml

In the workflow logs, you will see:

The action now pulls the image from Docker Hub and later runs it

Conclusion

We can conclude that building GitHub Actions with Docker is quick and fun. You can build your action any way you want, using the tools you like. Want to create a tool with Go, or Python or just Bash… just do it! If you do want to build a GitHub Action with JavaScript, then be sure to check out this article on devblogs.microsoft.com.

Using Kyverno for Kubernetes Policies

In an earlier blogpost, I wrote about Kubernetes Policies on Azure Kubernetes Service with the Azure Policy add-on. The add-on installs Gatekeeper v3 on AKS, which relies on Open Policy Agent (OPA) to define your policies. Open Policy Agent is a general cloud-native solution for policy-based control that goes beyond Kubernetes. Defining custom policies for OPA (and thus Gatekeeper) requires knowledge of Rego, its policy language. Rego is very powerful and flexible but can be a bit daunting. As always, there is a learning curve, but the feedback I get is that it can be quite steep.

When you are using Azure Policy with the AKS add-on, you can only use the built-in Azure policies. If you want custom policies, you should install Gatekeeper v3 on AKS yourself and write your own ConstraintTemplates that contain the policy logic written in Rego.

If you only need policies for Kubernetes and you want to express the policies in YAML, Kyverno is a good alternative. It makes it relatively easy to write validation policies. In addition to validation policies, Kyverno supports mutation and generation policies. More about that later.

Installation

Installation is very easy via a raw YAML manifest or a Helm chart. Because the Kyverno policy engine runs as an admission webhook, it requires secure communication from the Kubernetes API server. By default, the installation uses self-signed certificates.

The simplest way to install it is via the command below:

kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/main/definitions/release/install.yaml

Always check the raw YAML before submitting it to your cluster! By default, the admission webhook is installed in the kyverno namespace, via a deployment that deploys 1 replica of ghcr.io/kyverno/kyverno:v1.3.5-rc2 (or whatever is in the install.yaml at the time of installation). This install.yaml always refers to the latest release, which includes release candidates. You should change the version of the image to the latest stable release in production scenarios. At the time of writing, the latest stable release was 1.3.4.
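If you prefer Helm, the installation could look roughly like the sketch below (chart repository per the Kyverno documentation; you can pin a chart version with --version):

helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace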

Creating policies

As discussed above, you can write three types of policies:

  • validation: write rules that deny the creation of non-compliant resources and either enforce them in real time or merely audit them (a minimal example follows this list)
  • mutation: patch incoming JSON requests to modify them before validation and submission to etcd
  • generation: create additional objects; e.g., when you create a namespace, add roles to the namespace or add a default-deny network policy
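To make the validation type concrete, here is a minimal ClusterPolicy that requires an app label on pods. It is a sketch based on the Kyverno 1.3.x syntax, not one of the policies used later in this post:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-app-label
spec:
  validationFailureAction: enforce   # use audit to only report violations
  rules:
  - name: check-app-label
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "The label app is required."
      pattern:
        metadata:
          labels:
            app: "?*"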

To illustrate the creation of these types of policies, I created a video on my YouTube channel:

CI/CD Policy Check

Before you deploy workloads to Kubernetes, it is a good idea to check if your manifests pass your policy rules before you deploy. For OPA, you can do that with conftest. On GitHub Marketplace, you will find several actions that can run conftest in a workflow.

To check your manifests with Kyverno, there is the Kyverno CLI. You simply put the same policies you submit to your cluster in a folder (e.g., policies) and then run the CLI as shown below (in the folder containing the policies and deploy folders):

kyverno apply ./policies --resource=./deploy/deployment.yaml

Above, the policies are applied to just one manifest (deployment.yaml). It works with multiple manifests as well. When there is an issue, you will see it in the output:

policy require-run-as-non-root -> resource default/Deployment/go-template-deployment failed: 
1. autogen-check-containers: validation error: Running as root is not allowed. The fields spec.securityContext.runAsNonRoot, spec.containers[*].securityContext.runAsNonRoot, and spec.initContainers[*].securityContext.runAsNonRoot must be `true`. Rule autogen-check-containers[0] failed at path /spec/template/spec/containers/0/securityContext/runAsNonRoot/. Rule autogen-check-containers[1] failed at path /spec/template/spec/containers/0/securityContext/runAsNonRoot/. 

pass: 14, fail: 1, warn: 0, error: 0, skip: 0

Above, kyverno apply found that my deployment has securityContext.runAsNonRoot: false set, which is not allowed.

To run this check in a GitHub workflow, I created a GitHub action that does exactly that. Apparently, such an action did not exist. Drop me a comment if there is another way. You can find the GitHub Action on the marketplace: https://github.com/marketplace/actions/kyverno-cli.

To use the action in a workflow, drop in a snippet similar to the one below:

    - name: Validate policy
      uses: gbaeke/kyverno-cli@v1
      with:
        command: |
          kyverno apply ./policies --resource=./deploy/deployment.yaml

Here’s a link to a workflow that uses it: https://github.com/gbaeke/go-template/blob/main/.github/workflows/test.yml.

There’s more you can do with the CLI so be sure to check out the documentation.

Conclusion

Although we only scratched the surface in this post and the above video, in my opinion Kyverno is somewhat easier to get started with than OPA Gatekeeper. Having the ability to create mutation and generation policies opens up all kinds of interesting scenarios as well. The documentation is clear and the examples are a good way to get you started. If you only need policies on Kubernetes and not the wide capabilities of OPA, give it a try and tell me what you think!
