A look at Windows containers on AKS

Now that the public preview of Windows containers on AKS is available, let’s look at the basics. You need a couple of things to get started, including a couple of subscription-wide settings. I recommend using a subscription that is not used to roll out production AKS clusters. Make sure the Azure CLI (az) is homed to the subscription. Use Azure Cloud Shell to make your life easier:

  • Install the aks-preview extension
  • Register the Windows preview feature
  • Check that the feature is active; this will take a few minutes
  • Register the Microsoft.ContainerService resource provider again (only if the Windows preview feature is active)

The following commands make the above happen:

az extension add --name aks-preview

az feature register --name WindowsPreview --namespace Microsoft.ContainerService

az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/WindowsPreview')].{Name:name,State:properties.state}"

az provider register --namespace Microsoft.ContainerService

With that out of the way, deploy a new AKS cluster:

az aks create \
     --resource-group RESOURCEGROUP \
     --name winclu \
     --node-count 1 \
     --kubernetes-version 1.13.5 \
     --generate-ssh-keys \
     --windows-admin-password APASSWORDHERE \
     --windows-admin-username azureuser \
     --enable-vmss \
     --enable-addons monitoring \
     --network-plugin azure

Replace RESOURCEGROUP with an ARM resource group and replace APASSWORDHERE with a complex password. If you have ever deployed clusters that support multiple node pools with virtual machine scale sets, the above command will be very familiar. The only real difference here is –windows-admin-password and –windows-admin-username which are required to deploy the Windows hosts that will run your containers.

You can use the Windows user name and password to RDP into the Kubernetes nodes. You will need to deploy a jump host that has a route to the Kubernetes virtual network to make this happen as the Kubernetes hosts are not exposed with a public IP address. As they shouldn’t… 😉

Note that you need to deploy a node pool with Linux first (as in the above command). That is why the number of nodes has been set to the minimum. You cannot delete this node pool after adding a Windows node pool.

After deployment, you will see the cluster in the portal with the Linux node pool with one node:

node pool with one node

When you click Add node pool, you will be able to select the OS type of a new pool:

Both Linux and Windows as OS type for the node pool

We will add a Windows node pool via the CLI. The node pool will use the Standard_D2s_v3 virtual machine size by default, which is also the recommended minimum.

az aks nodepool add \
     --resource-group RESOURCEGROUP \
     --cluster-name winclu \
     --os-type Windows \
     --name winpl \
     --node-count 1 \
     --kubernetes-version 1.13.5

Note: the name of the Windows node pool cannot be longer than 6 characters

The node pool is now being added and will soon be ready:

windows node pool being added

When ready, you will see an additional scale set in the resource group that backs this AKS deployment:

additional scale set for the Windows node pool

We can now schedule pods on the Windows node pool. You can schedule a pod on a Windows node by adding a nodeSelector to the pod spec:

nodeSelector:         
  "beta.kubernetes.io/os": windows 

To try this, let’s deploy a Windows version of my realtime-go app with the following command. The gist contains the YAML required to deploy the app and a service. It uses the gbaeke/realtime-go-win image on Docker Hub. The base image is mcr.microsoft.com/windows/nanoserver:1809. You need to use the 1809 version because the hosts use 1809 as well. With Hyper-V isolation, the kernel match would not be required.

kubectl apply -f https://gist.githubusercontent.com/gbaeke/ed029e8ccbf345661ed7f07298a36c21/raw/02cedf88defa7a0a3dedff5e06f7e2fc5bbeccbe/realtime-go-win.yaml 

This should deploy the app but sadly, it will error out. It needs a running redis server. Let’s deploy that the quick and dirty way (command on one line below):

kubectl run redis --image=redis --replicas=1 --overrides='{ "spec": { "template": { "spec": { "nodeSelector": { "beta.kubernetes.io/os": "linux" } } } } }' --expose --port 6379

I realize it’s ugly with the override but it does the trick. The above command creates a deployment called redis that sets the nodeSelector to target Linux nodes. It also creates a service of type ClusterIP that exposes port 6379. The ClusterIP allows the realtime-go-win container to connect to redis over the Kubernetes network. Now delete the realtime-go container and recreate it:

kubectl delete -f https://gist.githubusercontent.com/gbaeke/ed029e8ccbf345661ed7f07298a36c21/raw/02cedf88defa7a0a3dedff5e06f7e2fc5bbeccbe/realtime-go-win.yaml

kubectl apply -f https://gist.githubusercontent.com/gbaeke/ed029e8ccbf345661ed7f07298a36c21/raw/02cedf88defa7a0a3dedff5e06f7e2fc5bbeccbe/realtime-go-win.yaml 

Note that I could not get DNS resolution to work in the Windows container. Normally, the realtime-go container should be able to find the redis service via the name redis or the complete FQDN of redis.default.svc.cluster.local. Because that did not work, the code in the realtime-go-win container was modified to use environment variables injected by Kubernetes:

redisHost := getEnv("REDISHOST", "")
if redisHost == "" {
    redisIP := getEnv("REDIS_SERVICE_HOST", "localhost")
    redisPort := getEnv("REDIS_SERVICE_PORT", "6379")
    redisHost = redisIP + ":" + redisPort
} 

Conclusion

Deploying an AKS cluster with both Linux and Windows node pools is a simple matter. Because you can now deploy both Windows and Linux containers, you have some additional work to make sure Windows containers go to Windows hosts and Linux containers to Linux hosts. Using a nodeSelector is an easy way to do that. There are other methods as well such as node taints. Sadly, I had an issue with Kubernetes DNS in the Windows container so I switched to injected environment variables.

Deploying Azure resources using webhookd

In the previous blog post, I discussed adding SSL to webhookd. In this post, I will briefly show how to use this solution to deploy Azure resources.

To run webhookd, I deployed a small Standard_B1s machine (1GB RAM, 1 vCPU) with a system assigned managed identity. After deployment, information about the managed identity is available via the Identity link.

Code running on a machine with a managed identity needs to do something specific to obtain information about the identity like a token. With curl, you would issue the following command:

curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -H Metadata:true -s

The response would be JSON that contains a field called access_token. You could parse out the access_token and then use the token in a call to the Azure Resource Manager APIs. You would use the token in the autorization header. Full details about acquiring these tokens can be found here. On that page, you will find details about acquiring the token with Go, JavaScript and several other languages.

Because we are using webhookd and shell scripts, the Azure CLI is the ideal way to create Azure resources. The Azure CLI can easily authenticate with the managed identity using a simple command: az login –identity. Here’s a shell script that uses it to create a virtual machine:

#!/bin/bash echo "Authenticating...`az login --identity`" 

echo "Creating the resource group...`az group create -n $rg -l westeurope`"

echo "Creating the vm...`az vm create --no-wait --size Standard_B1s --resource-group $rg --name $vmname --image win2016datacenter --admin-username azureuser --admin-password $pw`"

The script expects three parameters: rg, vmname and pw. We can pass these parameters as HTTP query parameters. If the above script would be in the ./scripts/vm folder as create.sh, I could do the following call to webhookd:

curl --user api -XPOST "https://<public_server_dns>/vm/create?vmname=myvm&rg=myrg&pw=Abcdefg$$$$!!!!" 

The response to the above call would contain the output from the three az commands. The az login command would output the following:

 data:   {
data: "environmentName": "AzureCloud",
data: "id": "<id>",
data: "isDefault": true,
data: "name": "<subscription name>",
data: "state": "Enabled",
data: "tenantId": "<tenant_id>",
data: "user": {
data: "assignedIdentityInfo": "MSI",
data: "name": "systemAssignedIdentity",
data: "type": "servicePrincipal"
data: }

Notice the user object, which clearly indicates we are using a system-assigned managed identity. In my case, the managed identity has the contributor role on an Azure subscription used for testing. With that role, the shell script has the required access rights to deploy the virtual machine.

As you can see, it is very easy to use webhookd to deploy Azure resources if the Azure virtual machine that runs webhookd has a managed identity with the required access rights.

Using certmagic to add SSL to webhookd

A while ago, I stumbled upon https://github.com/ncarlier/webhookd. It is a simple webhook server, written in Go, that can execute shell scripts. To use it, simply install it on a Linux box and execute it. By default, the executable looks at the ./scripts folder and maps each shell script to a URL you can call. It is well documented so do take a look at the GitHub page for full details on its configuration.

Out of the box, webhookd supports basic authentication if you supply a .htpasswd file. It does not, however, support SSL. That can be fixed in several ways though. In my case, I wanted one executable that supports SSL with Let’s Encrypt certificates. As it turns out, there is a great solution for that: https://github.com/mholt/certmagic.

To simplify using webhookd together with certmagic, I forked the webhookd repo and added certmagic support. The fork is here: https://github.com/gbaeke/webhookd. To use it, use go get github.com/gbaeke/webhookd and work from there. The fork could be improved by adding extra parameters for e-mail address and DNS name. For now, change the code by following the steps below:

  • In main.go, search for mail@mail.com and replace it with a valid e-mail address; although not required it is a good practice to supply an e-mail address to the folks at Let’s Encrypt
  • In main.go, search for www.example.com and replace it with a valid DNS name
  • The DNS name you use needs to resolve to the IP address of the machine that runs webhookd; it should be a public IP address because the code uses the HTTP challenger
  • The machine that runs webhookd should expose port 80 and port 443
  • If you want to use the Let’s Encrypt staging servers during testing (recommended) change certmagic.LetsEncryptProductionCA to certmagic.LetsEncryptStagingCA

In my case, the machine that runs webhookd is a small Linux machine running on Microsoft Azure. The DNS name is actually a CNAME record that is an alias for the DNS name of the virtual machine (e.g. vmname.westeurope.cloudapp.azure.com). You are now ready to build webhookd with go build. When it’s ready, just execute webhookd. When you run this the first time, certmagic will notice there is no certificate and will start to talk to Let’s Encrypt using the ACME protocol. By default, HTTP verification is used which just means Let’s Encrypt will tell certmagic to host a file over plain HTTP. When Let’s Encrypt can retrieve that file, it concludes you must be the owner of the DNS name used in the certificate. The certificate will be issued and stored on the file system under $home/.local/share/certmagic/acme.

You will get some messages regarding the certificate request process as shown below. When the cached certificate is found and it is valid, you will just get the Serving HTTP->HTTPS message:

image

Note that you will not be able to bind to low ports like 80 and 443 as a non-root user. So either run webhookd as root or use setcap. For instance sudo setcap cap_net_bind_service=+ep /path/to/webhookd. After running the setcap command, you can run webhookd as a non-root user and it will be able to bind to port 80 and 443.

I also have basic authentication enabled for a user called api. To test the configuration, I can use curl like so:

image

Due to the use of the Let’s Encrypt production CA, there is no need to use the –insecure flag with curl. The certificate is fully trusted on my (Windows) machine. If you pulled down the complete repository, the scripts folder contains a shell script called echo.sh. That script is automatically made available as /echo. Everything the script echoes to stdout is used as output of the HTTP call. Simple but effective!

In a follow-up post, we will take a look at using webhookd to deploy Azure resources using a managed identity and the Azure CLI. Stay tuned!