Azure Policy for Kubernetes: Constraints and ConstraintTemplates

In one of my videos on my YouTube channel, I talked about Kubernetes authentication and used the image below:

Securing access to the Kubernetes API Server

To secure access to the Kubernetes API server, you need to be authenticated and properly authorized to do what you need to do. The third mechanism to secure access is admission control. Simply put, admission control allows you to inspect requests to the API server and accept or deny the request based on rules you set. You will need an admission controller, which is just code that intercepts the request after authentication and authorization.

There is a list of admission controllers that are compiled in, including two special ones (check the docs):

  • MutatingAdmissionWebhook
  • ValidatingAdmissionWebhook

With the two admission controllers above, you can develop admission plugins as extensions and configure them at runtime. In this post, we will look at a ValidatingAdmissionWebhook that is used together with Azure Policy to inspect requests to the AKS API Server and either deny or audit these requests.
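To give you an idea of what such an extension looks like, below is a minimal sketch of a ValidatingWebhookConfiguration. All names, the service reference and the rules are illustrative; Gatekeeper (discussed later) registers something similar for you:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-validating-webhook
webhooks:
- name: validation.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: example-webhook-svc   # service in front of your webhook pods
      namespace: default
      path: /validate
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]

The API server calls the endpoint above for matching requests and allows or denies the request based on the webhook’s response.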

Note that I already have a post about Azure Policy and pod security policies here. There is some overlap between that post and this one. In this post, we will look more closely at what happens on the cluster.

Want a video instead?

Azure Policy

Azure has its own policy engine to control the Azure Resource Manager (ARM) requests you can make. A common rule in many organizations, for instance, prohibits the creation of expensive resources or the creation of resources in unapproved regions. For example, a European company might want to create resources only in West Europe or North Europe. Azure Policy is the engine that can enforce such a rule. For more information, see Overview of Azure Policy. In short, you select from an ever-growing list of policies or you create your own. Policies can be grouped in policy initiatives. A single policy or an initiative gets assigned to a scope, which can be a management group, a subscription or a resource group. In the portal, you then check for compliance:

Compliancy? What do I care? It’s just my personal subscription 😁

Besides checking for compliance, you can deny the requests in real time. There are also policies that can create resources when they are missing.

Azure Policy for Kubernetes

Although Azure Policy works great with Azure Resource Manager (ARM), which is basically the API that allows you to interact with Azure resources, it does not work with Kubernetes out of the box. We will need an admission controller (see above) that understands how to interpret Kubernetes API requests in addition to another component that can sync policies in Azure Policy to Kubernetes for the admission controller to pick up. There is a built-in list of supported Kubernetes policies.

For the admission controller, Microsoft uses Gatekeeper v3. There is a lot, and I do mean a LOT, to say about Gatekeeper and its history. We will not go down that path here. Check out this post for more information if you are truly curious. For us it’s enough to know that Gatekeeper v3 needs to be installed on AKS. In order to do that, we can use an AKS add-on. In fact, you should use the add-on if you want to work with Azure Policy. Installing Gatekeeper v3 on its own will not work.

Note: there are ways to configure Azure Policy to work with Azure Arc for Kubernetes and AKS Engine. In this post, we only focus on the managed Azure Kubernetes Service (AKS).

So how do we install the add-on? It is very easy to do with the portal or the Azure CLI. For all details, check out the docs. With the Azure CLI, it is as simple as:

az aks enable-addons --addons azure-policy --name CLUSTERNAME --resource-group RESOURCEGROUP

If you want to do it from an ARM template, just add the add-on to the template as shown here.
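In essence, the AKS resource in the template gets an addonProfiles fragment similar to this sketch:

"addonProfiles": {
    "azurepolicy": {
        "enabled": true
    }
}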

What happens after installing the add-on?

I installed the add-on without active policies. In kube-system, you will find the two pods below:

azure-policy and azure-policy-webhook

The above pods are part of the add-on. I am not entirely sure what the azure-policy-webhook does, but the azure-policy pod is responsible for checking Azure Policy for new assignments and translating them to resources that Gatekeeper v3 understands (hint: constraints). It also checks policies on the cluster and reports the results back to Azure Policy. In the logs, you will see entries like:

  • No audit results found
  • Schedule running
  • Creating constraint

The last line creates a constraint, but what exactly is that? Constraints tell Gatekeeper v3 what to check for when a request comes to the API server. An example of a constraint is that a container should not run privileged. Constraints are backed by constraint templates that contain the schema and logic of the constraint. Good to know, but where are the Gatekeeper v3 pods?

Gatekeeper pods in the gatekeeper-system namespace

Gatekeeper was automatically installed by the Azure Policy add-on and will work with the constraints created by the add-on, synced from Azure Policy. When you remove these pods, the add-on will install them again.
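To make this more concrete, a constraint created by the add-on looks roughly like the sketch below. The kind matches the constraint template; the name and the excluded namespaces are illustrative:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAzureContainerNoPrivilege
metadata:
  name: azurepolicy-container-no-privilege-example
spec:
  enforcementAction: deny
  match:
    excludedNamespaces: ["kube-system", "gatekeeper-system", "azure-arc"]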

Creating a policy

Although you normally create policy initiatives, we will create a single policy and see what happens on the cluster. In Azure Policy, choose Assign Policy and scope the policy to the resource group of your cluster. In Policy definition, select Kubernetes cluster should not allow privileged containers. As discussed, that is one of the built-in policies:

Creating a policy that does not allow privileged containers

In the next step, set the effect to deny. This will deny requests in real time. Note that the three namespaces in Namespace exclusions are automatically added. You can add extra namespaces there. You can also specifically target a policy to one or more namespaces or even use a label selector.

Policy parameters

You can now select Review and create and then select Create to create the policy assignment. This is the result:

Policy assigned

Now we have to wait a while for the change to be picked up by the add-on on the cluster. This can take several minutes. After a while, you will see the following log entry in the azure-policy pod:

Creating constraint: azurepolicy-container-no-privilege-blablabla

You can see the constraint when you run k get constraints. The constraint is based on a constraint template. You can list the templates with k get constrainttemplates. This is the result:

constraint templates

With k get constrainttemplates k8sazurecontainernoprivilege -o yaml, you will find that the template contains some logic:

the template’s logic

The block of rego contains the logic of this template. rego is the policy language of Open Policy Agent (OPA), which Gatekeeper v3 builds on. Even without knowing rego, you can guess that the privileged field inside securityContext is checked. If that field is true, that is a violation of the policy. Although it is useful to understand more about OPA and rego, Azure Policy hides that complexity from you.
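Paraphrased and shortened, the heart of the rego in this template looks something like this:

package k8sazurecontainernoprivilege

violation[{"msg": msg, "details": {}}] {
    c := input_containers[_]
    c.securityContext.privileged
    msg := sprintf("Privileged container is not allowed: %v", [c.name])
}

input_containers[c] {
    c := input.review.object.spec.containers[_]
}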

Does it work?

Let’s try to deploy the following deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        securityContext:
          privileged: true

After running kubectl apply -f deployment.yaml, everything seems fine. But when we run kubectl get deploy:

Pods are not coming up

Let’s run kubectl get events:


Notice that the admission webhook denied the request because privileged was set to true.
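The event message will look similar to this (shortened; the generated constraint name will match the one from the azure-policy logs):

Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [denied by azurepolicy-container-no-privilege-blablabla] Privileged container is not allowed: nginx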

Adding more policies

Azure Security Center comes with a large initiative, Azure Security Benchmark, that also includes many Kubernetes policies. All of these policies are set to audit for compliance. On my system, the initiative is assigned at the subscription level:

Azure Security Benchmark assigned at subscription level with name Security Center

The Azure Policy add-on on our cluster will pick up the Kubernetes policies and create the templates and constraints:

Several new templates created

Now we have two constraints for k8sazurecontainernoprivilege:

Two constraints: one deny and the other audit

The new constraint comes from the larger initiative. In the spec, the enforcementAction is set to dryrun (audit). Although I do not have pods that violate k8sazurecontainernoprivilege, I do have pods that violate another policy that checks for host path mapping. That is reported back by the add-on in the compliance report:

Yes, akv2k8s maps to /etc/kubernetes on the host


In this post, you have seen what happens when you install the AKS policy add-on and enable a Kubernetes policy in Azure Policy. The add-on creates constraints and constraint templates that Gatekeeper v3 understands. The rego in a constraint template contains logic used to define the policy. When the policy is set to deny, Gatekeeper v3, which is an admission controller denies the request in real-time. When the policy is set to audit (or dry run at the constraint level), audit results are reported by the add-on to Azure Policy.

How to delete a stubborn Azure Virtual Hub

A while ago, I created an Azure Virtual WAN (Standard) and added a virtual hub. For some reason, the virtual hub ended up in the state below:

Hub and routing status: Failed (Ouch!)

I tried to reset the router and virtual hub but to no avail. Next, I tried to delete the hub. In the portal, this resulted in a validating state that did not end. In the Azure CLI, an error was thrown.

Because this is a Microsoft Partner Network (MPN) subscription, I also did not have technical support or an easy way to enable it. I ended up buying Developer Support for a month just to open a service request.

The (helpful) support engineer asked me to do the following:

  • Open Azure Resource Explorer
  • Navigate to subscriptions and the resource group that contains the hub
  • Under providers, navigate to Microsoft.Network
  • Locate the virtual hub and do GET, EDIT, PUT (set Read/Write mode first)
After clicking GET and EDIT, PUT can be clicked

At first it did not seem to work but in my case, the PUT operation just took a very long time. After the PUT operation finished, I could delete the virtual hub from the portal.
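For reference, the same GET/PUT sequence can be scripted with az rest. This is just a sketch; the resource ID and api-version below are assumptions you should adapt to your environment:

# Placeholder resource ID of the stuck virtual hub
ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/virtualHubs/<hub-name>"

# GET the resource and save the body
az rest --method get --url "https://management.azure.com${ID}?api-version=2020-05-01" > hub.json

# PUT the body back (the GET/PUT round trip is what unblocked the resource for me)
az rest --method put --url "https://management.azure.com${ID}?api-version=2020-05-01" --body @hub.json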

Long story short: if you ever have a resource you cannot delete, give Azure Resource Explorer and the above procedure a try. Your mileage may vary though!

Integrate an Azure Storage Account with Active Directory

A customer with a Windows Virtual Desktop deployment needed access to several file shares for one of their applications. The integration of Azure Storage Accounts with Active Directory allows us to provide this functionality without having to deploy and maintain file services on a virtual machine.

A sketch of the environment looks something like this:

Azure File Share integration with Active Directory

Based on the sketch above, you should think about the requirements to make this work:

  • Clients that access the file share need to be joined to a domain. This can be an Azure Active Directory Domain Services (AADDS) managed domain or just plain old Active Directory Domain Services (ADDS). The steps to integrate the storage account with AADDS or ADDS are different, as we will see later. I will only look at ADDS integration via a PowerShell script. In this case, the WVD virtual machines are joined to ADDS and domain controllers are available on Azure.
  • Users that access the file share need to have an account in ADDS that is synced to Azure AD (AAD). This is required because users or groups are given share-level permissions at the storage account level to their AAD identity. The NTFS-level permissions are given to the ADDS identity. Since this is a Windows Virtual Desktop deployment, that is already the case.
  • You should think about how the clients (WVD here) connect to the file share. If you only need access from Azure subnets, then VNET Service Endpoints are a good choice. This will configure direct routing to the storage account in the subnet’s route table and also provides the necessary security as public access to the storage account is blocked. You could also use Private Link or just access the storage account via public access. I do not recommend the latter so either use service endpoints or private link.

Configuring the integration

In the configuration of the storage account, you will see the following options:

Storage account AD integration options

Integration with AADDS is just a click on Enabled. For ADDS integration however, you need to follow another procedure from a virtual machine that is joined to the ADDS domain.

On that virtual machine, log on with an account that can create a computer object in ADDS in an OU that you set in the script. For best results, the account you use should be synced to AAD and should have at least the Contributor role on the storage account.

Next, download the Microsoft provided scripts from here and unzip them in a folder like C:\scripts. You should have the following scripts and modules in there:

Scripts and PowerShell module for Azure Storage Account integration

Next, add a script to the folder with the following contents and replace the <PLACEHOLDERS>:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CurrentUser

.\CopyToPSPath.ps1

Import-Module -Name AzFilesHybrid

$SubscriptionId = "<YOUR SUB ID>"
$ResourceGroupName = "<YOUR RESOURCE GROUP>"
$StorageAccountName = "<YOUR STORAGE ACCOUNT NAME>"

Select-AzSubscription -SubscriptionId $SubscriptionId

Join-AzStorageAccountForAuth `
    -ResourceGroupName $ResourceGroupName `
    -Name $StorageAccountName `
    -DomainAccountType "ComputerAccount" `
    -OrganizationalUnitName "<OU DISTINGUISHED NAME>"

Debug-AzStorageAccountAuth -StorageAccountName $StorageAccountName -ResourceGroupName $ResourceGroupName -Verbose

Run the script from the C:\scripts folder so it can execute CopyToPSPath.ps1 and import the AzFilesHybrid module. The Join-AzStorageAccountForAuth cmdlet does the actual work. When you are asked to rerun the script, do so!

The result of the above script should be a computer account in the OU that you chose. The computer account has the name of the storage account.

In the storage account configuration, you should see the following:

The blurred section will show the domain name

Now we can proceed to granting “share-level” access rights, similar to share-level rights on a Windows file server.

Granting share-level access

Navigate to the file share and click IAM. You will see the following:

IAM on the file share level

Use + Add to add AAD users or groups. You can use the following roles:

  • Storage File Data SMB Share Reader: read access
  • Storage File Data SMB Share Contributor: read, write and delete
  • Storage File Data SMB Share Elevated Contributor: read, write and delete + modify ACLs at the NTFS level

For example, if I needed to grant read rights to the group APP_READERS in ADDS, I would grant the Storage File Data SMB Share Reader role to the APP_READERS group in Azure AD (synced from ADDS).
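If you prefer to script the role assignment, here is a sketch with the Azure CLI; all IDs and names are placeholders, and the share-level scope follows the documented fileServices/default/fileshares format:

az role assignment create \
  --role "Storage File Data SMB Share Reader" \
  --assignee <group-object-id> \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/fileServices/default/fileshares/<share>"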

Like on a Windows File Server, share permissions are not sufficient. Let’s add some good old NTFS rights…

Granting NTFS Access Rights

For a smooth experience, log on to a domain-joined machine with an ADDS account that is synced to an AAD account that has at least the Contributor role on the storage account.

To grant the NTFS right, map a drive with the storage account key. Use the following command:

net use <desired-drive-letter>: \\<storage-account-name>.file.core.windows.net\<share-name> /user:Azure\<storage-account-name> <storage-account-key>

Get the storage account key from here:

Storage account access keys

Now you can open the mapped drive in Explorer and set NTFS rights. Alternatively, you can use icacls.exe or other tools.
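For example, granting read and execute rights to a group with icacls could look like this (the domain and group names are placeholders):

icacls Z: /grant "<DOMAIN>\APP_READERS:(OI)(CI)(RX)"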

Mapping the drive for users

Now that the storage account is integrated with ADDS, a user can log on to a domain-joined machine and mount the share without having to provide credentials. As long as the user has the necessary share and NTFS rights, she can access the data.

Mapping the drive can be done in many ways, but a simple net use Z: \\<storage-account-name>.file.core.windows.net\<share-name> will suffice.

Securing the connection

You should configure the storage account in such a way that it only allows access from selected clients. In this case, because the clients are Windows Virtual Desktops in a specific Azure subnet, we can use Virtual Network Service Endpoints. They can be easily configured from Firewalls and Virtual Networks:

Access from selected networks only: 3 subnets in this case

Granting access to specific subnets results in the configuration of virtual network service endpoints and a modification of the subnet route table with a direct route to the storage account on the Microsoft network. Note that you are still connecting to the public IP of the storage account.

If you decide to use Private Link instead, you would get a private IP in your subnet that is mapped to the storage account. In that case, even on-premises clients could connect to the storage account over the VPN or ExpressRoute private peerings. Of course, using private link would require extra DNS configuration as well.

Some extra notes

  • when you cannot configure the integration with the PowerShell script, follow the steps to enable the integration manually; do not forget to set the Kerberos password properly
  • it is recommended to put the AD computer accounts that represent the storage accounts in a separate OU; enable a Group Policy on that OU that prevents password resets on the computer accounts


Although there is some work to be done upfront and there are requirements such as Azure AD and Azure AD Connect, using an Azure Storage Account to host Active Directory integrated file shares is recommended. Remember that it works with both AADDS and ADDS. In this post, we looked at ADDS only and integration via the Microsoft-provided PowerShell scripts.

Front Door with WordPress on Azure App Service

Here’s a quick overview of the steps you need to take to put Front Door in front of an Azure Web App. In this case, the web app runs a WordPress site.

Step 1: DNS

Suppose you deployed the Web App as <app>.azurewebsites.net and you want to reach the site via a custom domain such as wp.<your-domain>. Traffic will flow like this:

user types wp.<your-domain> --- CNAME ---> Front Door --- connects to <app>.azurewebsites.net using host header wp.<your-domain>

It’s clear that later, in Front Door, you will have to specify the host header (wp.<your-domain> in this case). More on that later…

If you have worked with Azure Web App before, you probably know you need to configure the host header sent by the browser as a custom domain on the web app. Something like this:

Custom domain in Azure Web App (no https configured – hence the red warning)

In this case, we do not want wp.<your-domain> to resolve to the web app but to Front Door. To make the custom domain assignment work (because the web app will verify the custom name), add the following TXT record to DNS:

TXT awverify.wp <app>.azurewebsites.net

For example in CloudFlare:

awverify txt record in CloudFlare DNS

With the above TXT record, I could easily add wp.<your-domain> as a custom domain to the web app.

Note: wp.<your-domain> is a CNAME to your Front Door domain (see below)

Step 2: Front Door

My Front Door designer looks like this:

Front Door designer

When you create a Front Door, you need to give it a name ending in azurefd.net. In my case that is <name>.azurefd.net. With wp.<your-domain> configured as a CNAME for <name>.azurefd.net, you can easily add wp.<your-domain> as an additional Frontend host.

The backend pool is the Azure Web App. It’s configured as follows:

Front Door backend host (only one in the pool); could also have used the Azure App Service backend type

You should connect to the web app using its original <app>.azurewebsites.net name but send wp.<your-domain> as the host header. This allows Front Door to connect to the web app correctly.

The last part of the Front Door config is a simple rule that connects the frontend to the backend pool using HTTP only.

Step 3: WordPress config

With the default Azure WordPress templates, you do not need to modify anything because wp-config.php contains the following settings:

define('WP_SITEURL', 'http://' . $_SERVER['HTTP_HOST'] . '/');
define('WP_HOME', 'http://' . $_SERVER['HTTP_HOST'] . '/');

If you want, you can change this to:

define('WP_SITEURL', 'https://wp.<your-domain>');
define('WP_HOME', 'https://wp.<your-domain>');

Step 4: Blocking access from other locations

In general, you want users to only connect to the site via Front Door. To achieve this, add the following access restrictions to the Web App:

Access restrictions to only allow traffic from Front Door and Azure basic infrastructure services
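If you want to script this, the restriction can also be added with the Azure CLI. A sketch with placeholder names; 147.243.0.0/16 was the documented Front Door backend range at the time of writing:

az webapp config access-restriction add \
  --resource-group <rg> \
  --name <webapp-name> \
  --rule-name 'Front Door' \
  --action Allow \
  --ip-address 147.243.0.0/16 \
  --priority 100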

AKS Azure Monitor metrics and alerts

In today’s post, we will take a quick look at Azure Kubernetes Service (AKS) metrics and alerts for Azure Monitor. Out of the box, Microsoft offers two ways to obtain metrics:

  • Metrics that can easily be used with Azure Monitor to generate alerts; these metrics are written to the Azure Monitor metrics store
  • Metrics forwarded to Log Analytics; with Log Analytics queries (KQL), you can generate alerts as well

In this post, we will briefly look at the metrics in the Azure Monitor metrics store. In the past, the AKS metrics in the metrics store were pretty basic:

Basic Azure Monitor metrics for AKS

Some time ago however, support for additional metrics was introduced:

insights.container/nodes metrics
insights.container/pods metrics

Although you can find the above data in Log Analytics as well, it is just a bit easier to work with these metrics when they are in the metrics store. Depending on the age of your cluster, these metrics might not be enabled. Check the docs to learn how to enable them.

When the metrics are enabled, it is easy to visualize them from the Metrics pane. Note that metrics can be split. The screenshot below shows the nodes count, split in Ready and NotReady:

Pretty uneventful… 2 nodes in ready state
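If you prefer the command line, you can retrieve the same metric with the Azure CLI; a sketch with placeholder IDs:

az monitor metrics list \
  --resource /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ContainerService/managedClusters/<cluster> \
  --namespace "insights.container/nodes" \
  --metric nodesCount \
  --aggregation Average \
  --interval PT5M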

To generate an alert based on the above metrics, a new alert rule can be generated. Although the New alert rule link is greyed out, you can create the alert from Azure Monitor:

Creating an alert on node count from Azure Monitor

And of course, when this fires you will see this in Azure Monitor:

Heeeeelp… node down
Details about the alert

Azure SQL Database High Availability

Creating a SQL Database in Azure is a simple matter: create the database and server, get your connection string and off you go! Before starting though, spend some time thinking about the level of high availability (HA) that you want:

  • What is the required level of HA within the deployment region (e.g. West Europe)?
  • Do you require failover to another region (e.g. from West Europe to North Europe)?

HA in a single region

To achieve the highest level of availability in a region, do the following:

  • Use the Premium (DTU) or Business Critical tier (vCore): Azure will use the premium availability model for your database
  • Enable Availability Zone support if the region supports it: copies of your database will be spread over the zones

The diagram below illustrates the premium availability model (from the Microsoft docs):

Premium Availability Model

The region will contain one primary read/write replica and several secondary replicas. Each replica uses local SSD storage. The database is replicated synchronously and failover is swift and without data loss. If required, you can enable a read replica and specify you want to connect to the read replica by adding ApplicationIntent=ReadOnly to the connection string. Great for reporting and BI!
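For example, appending ApplicationIntent to an otherwise standard connection string routes the connection to a read replica (server, database and credentials below are placeholders):

Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Password=<password>;Encrypt=true;ApplicationIntent=ReadOnly;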

Spreading the databases over multiple zones is as simple as checking a box. Availability zone support comes at no extra cost but can increase the write commit latency because the nodes are a few kilometers apart. The option to enable zone support is in the Configure section as shown below:

Enabling zone redundancy

To read more about high availability, including the standard availability model for other tiers, check out the docs.

For critical applications, we typically select the Premium/Business Critical model as it provides good performance coupled to the highest possible availability in a region.


Geo-replication

The geo-replication feature replicates a database asynchronously to another region. Suppose you have a database and server in West Europe that you want to replicate to France Central. In the portal, navigate to the database (not the server) and select Geo-Replication. Then check the region, in this case France Central. The following questions show up:


A database needs a logical server object that contains the database. To replicate the database to France Central, you need such a server object in that region. The UI above allows you to create that server.

Note that the databases need to use the same tier, although the secondary can be configured with fewer DTUs or vCores. Doing so is generally not recommended.

After configuration, the UI will show the active replication. In this case, I am showing replication from North Europe to West Europe (and not France Central):

Geo-replication is easy to configure but in practice, we recommend to use Failover Groups. A Failover Group uses geo-replication under the hood but gives you some additional features such as:

  • Automated failover (vs manual with just geo-replication)
  • Two servers to use in your connection string that use CNAMEs that are updated in case of failover; one CNAME always points to the read/write replica, the other to the read replica

Failover groups are created at the server level instead of the database level:

Failover group

Below, there is a failover group aks-fo with the primary server in North Europe and the secondary in West Europe:

Failover group details

You can manually fail over the database if needed:

Failover and forced failover

Failover allows you to fail over the database without data loss if both databases are still active. Forced failover performs a failover even if the primary is down, which might lead to data loss.
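If you prefer the CLI, both operations can be scripted; a sketch with placeholder names:

# Planned failover without data loss: run against the current secondary server
az sql failover-group set-primary --name <failover-group> --resource-group <rg> --server <secondary-server>

# Forced failover, allowed even when the primary is down (potential data loss)
az sql failover-group set-primary --name <failover-group> --resource-group <rg> --server <secondary-server> --allow-data-loss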

Note: when you configure a Failover Group, geo-replication will automatically be configured for you.

Connecting from an application

To connect to a database configured in a failover group, first get the failover group server names:

Read/write and read-only listener endpoints

Next, in your application, use the appropriate connection string. For instance, in Go:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    _ "github.com/denisenkom/go-mssqldb" // SQL Server driver, registers "sqlserver"
)

var sqldb *sql.DB
var server = "<failover-group>.database.windows.net" // read/write listener of the failover group (placeholder)
var port = 1433
var user = "USERNAME"
var password = "PASSWORD"
var database = "DBNAME"

func init() {
    // Build connection string
    connString := fmt.Sprintf("server=%s;user id=%s;password=%s;port=%d;database=%s;",
        server, user, password, port, database)

    var err error

    // Create connection pool
    sqldb, err = sql.Open("sqlserver", connString)
    if err != nil {
        log.Fatal("Error creating connection pool: ", err.Error())
    }

    ctx := context.Background()

    // The calls above do not actually connect to SQL but the ping below does
    err = sqldb.PingContext(ctx)
    if err != nil {
        log.Fatal("Error pinging database: ", err.Error())
    }
}
During a failover, there will be an amount of time that the database is not available. When that happens, the connection will fail. The error is shown below:

[db customers]: invalid response code 500, body: {"name":"fault","id":"7JPhcikZ","message":"Login error: mssql: Database 'aksdb' on server 'akssrv-ne' is not currently available.  Please retry the connection later.  If the problem persists, contact customer support, and provide them the session tracing ID of '6D7D70C3-D550-4A74-A69C-D689E6F5CDA6'.","temporary":false,"timeout":false,"fault":true}

Note: the Failover Group uses a CNAME record which resolves to the backend servers in either West or North Europe. Make sure you allow connections to these servers in the firewall or you will get the following error:

[db customers]: invalid response code 500, body: {"name":"fault","id":"-p9TwZkm","message":"Login error: mssql: Cannot open server 'akssrv-ne' requested by the login. Client with IP address 'IP ADDRESS' is not allowed to access the server.  To enable access, use the Windows Azure Management Portal or run sp_set_firewall_rule on the master database to create a firewall rule for this IP address or address range.  It may take up to five minutes for this change to take effect.","temporary":false,"timeout":false,"fault":true}


For the highest level of availability, use the regional premium availability model with Availability Zone support. In addition, use a Failover Group to enable replication of the database to another region. A Failover Group automatically connects your application to the primary (read/write) or secondary replica (read) via a CNAME and can fail over automatically after a grace period.

Deploy AKS and Traefik with an Azure DevOps YAML pipeline

This post is a companion to a GitHub repository that contains ARM templates to deploy an AD integrated Kubernetes cluster and an IP address, plus a Helm chart to deploy Traefik. Traefik is configured to use the deployed IP address. In addition to those files, the repository also contains the YAML pipeline, ready to be imported in Azure DevOps.

Let’s take a look at the different building blocks!

AKS ARM Template

The aks folder contains the template and a parameters file. You will need to modify the parameters file because it requires settings to integrate the AKS cluster with Azure AD. You will need to specify:

  • clientAppID: the ID of the client app registration
  • serverAppID: the ID of the server app registration
  • tenantID: the ID of your AD tenant

Also specify clientId, which is the ID of the service principal for your cluster. Both the serverAppID and the clientId require a password. The passwords are set via pipeline secret variables.

The template configures a fairly standard AKS cluster that uses Azure networking (versus kubenet). It also configures Log Analytics for the cluster (container insights).

Deploying the template from the YAML file is done with the task below. You will need to replace YOUR SUBSCRIPTION with an authorized service connection:

- task: AzureResourceGroupDeployment@2
  inputs:
    azureSubscription: 'YOUR SUBSCRIPTION'
    action: 'Create Or Update Resource Group'
    resourceGroupName: '$(aksTestRG)'
    location: 'West Europe'
    templateLocation: 'Linked artifact'
    csmFile: 'aks/deploy.json'
    csmParametersFile: 'aks/deployparams.t.json'
    overrideParameters: '-serverAppSecret $(serverAppSecret) -clientIdsecret $(clientIdsecret) -clusterName $(aksTest)'
    deploymentMode: 'Incremental'
    deploymentName: 'CluTest'

The task uses several variables like $(aksTestRG) etc… If you check azure-pipelines.yaml, you will notice that most are configured at the top of the file in the variables section:

variables:
  aksTest: 'clu-test'
  aksTestRG: 'rg-clu-test'
  aksTestIP: 'clu-test-ip'

The two secrets are the secret 🔐 variables. Naturally, they are configured in the Azure DevOps UI. Note that there are other means to store and obtain secrets, such as Key Vault. In Azure DevOps, the secret variables can be found here:

Azure DevOps secret variables

IP Address Template

The ip folder contains the ARM template to deploy the IP address. We need to deploy the IP address resource to the resource group that holds the AKS agents. With the names we have chosen, that name is MC_rg-clu-test_clu-test_westeurope. It is possible to specify a custom name for the resource group.

Because we want to obtain the IP address after deployment, the ARM template contains an output:

 "outputs": {
        "ipaddress": {
            "type": "string",
            "value": "[reference(concat('Microsoft.Network/publicIPAddresses/', parameters('ipName')), '2017-10-01').ipAddress]"

The output ipaddress is of type string. Via the reference template function we can extract the IP address.

The ARM template is deployed like the AKS template but we need to capture the ARM outputs. The last line of the AzureResourceGroupDeployment@2 that deploys the IP address contains:

deploymentOutputs: 'armoutputs'

Now we need to extract the IP address and set it as a variable in the pipeline. One way of doing this is via a bash script:

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: |
      echo "##vso[task.setvariable variable=test-ip;]$(echo '$(armoutputs)' | jq .ipaddress.value -r)"

You can set a variable in Azure DevOps with echo ##vso[task.setvariable variable=variable_name;]value. In our case, the “value” should be the raw string of the IP address output. The $(armoutputs) variable contains the output of the IP address ARM template as follows:

{"ipaddress":{"type":"String","value":"IP ADDRESS"}}

To extract IP ADDRESS, we pipe the output of “echo $(armoutputs)” to jq .ipaddress.value -r, which extracts the IP ADDRESS from the JSON. The -r parameter removes the double quotes from the IP ADDRESS to give us the raw string. For more info about jq, check its documentation.

We now have the IP address in the test-ip variable, to be used in other tasks via $(test-ip).

Taking care of the prerequisites

In a later phase, we install Traefik via Helm. So we need kubectl and helm on the build agent. In addition, we need to install tiller on the cluster. Because the cluster is RBAC-enabled, we need a cluster account and a role binding as well. The following tasks take care of all that:

- task: KubectlInstaller@0
  inputs:
    kubectlVersion: '1.13.5'

- task: HelmInstaller@1
  inputs:
    helmVersionToInstall: '2.14.1'

- task: AzureCLI@1
  inputs:
    azureSubscription: 'YOUR SUB'
    scriptLocation: 'inlineScript'
    inlineScript: 'az aks get-credentials -g $(aksTestRG) -n $(aksTest) --admin'

- task: Bash@3
  inputs:
    filePath: 'tiller/'
    workingDirectory: 'tiller/'

Note that we use the AzureCLI built-in task to easily obtain the cluster credentials for kubectl on the build agent. We use the --admin flag to gain full access. Note that this downloads sensitive information to the build agent temporarily.

The last task just runs a shell script to configure the service account and role binding and install tiller. Check the repository to see the contents of this simple script. Note that this is the quick and easy way to install tiller, not the most secure way! 🙇‍♂️

Install Traefik and use the IP address

The repository contains the downloaded chart (helm fetch stable/traefik --untar). The values.yaml file was modified to set the ingressClass to traefik-ext. We could have used the chart from the Helm repository but I prefer having the chart in source control. Here’s the pipeline task:

- task: HelmDeploy@0
  inputs:
    connectionType: 'None'
    namespace: 'kube-system'
    command: 'upgrade'
    chartType: 'FilePath'
    chartPath: 'traefik-ext/.'
    releaseName: 'traefik-ext'
    overrideValues: 'loadBalancerIP=$(test-ip)'
    valueFile: 'traefik-ext/values.yaml'

kubectl is configured to use the cluster so connectionType can be set to ‘None’. We simply specify the IP address we created earlier by setting loadBalancerIP to $(test-ip) with the overrides for values.yaml. This sets the loadBalancerIP setting in Traefik’s service definition (in the templates folder). Service.yaml in the templates folder contains the following section:

  type: {{ .Values.serviceType }}
  {{- if .Values.loadBalancerIP }}
  loadBalancerIP: {{ .Values.loadBalancerIP }}
  {{- end }} 


Deploying AKS together with one or more public IP addresses is a common scenario. Hopefully, this post together with the GitHub repo gave you some ideas about automating these deployments with Azure DevOps. All you need to do is create a pipeline from the repo. Azure DevOps will read the azure-pipelines.yml file automatically.

Azure Front Door and multi-region deployments

In the previous post, we looked at publishing and securing an API with Azure Front Door and Azure Web Application Firewall. The API ran on Kubernetes, exposed by Kong and Kong Ingress Controller. Kong was configured to require an API key to call the /users API, allowing us to identify the consumer of the API. The traffic flow was as follows:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer) -- HTTP --> API Kubernetes service --> API pods 

Although Kubernetes makes the API(s) highly available, you might want to take extra precautions such as deploying the API in multiple regions. In this post, we will take a look at doing so. That means we will deploy the API in both West and North Europe, in two distinct Kubernetes clusters:

  • we-clu: Kubernetes cluster in West Europe
  • ne-clu: Kubernetes cluster in North Europe

The flow is very similar of course:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer; region to connect to depends on Front Door configuration and health probes) -- HTTP --> API Kubernetes service --> API pods  

Let’s take a look at the configuration! By the way, the supporting files to deploy the Kubernetes objects are in the companion GitHub repository. To deploy Kong, check out this post.


We deploy a Kubernetes cluster in each region, install Helm, deploy Kong, deploy our API and configure ingresses and related Kong custom resource definitions (CRDs). The result is an external IP address in each region that leads to the Kong proxy. Search for “kong” on this blog to find posts with more details about this deployment.

Note that the API deployment specifies an environment variable that will contain the string WE or NE. This environment variable will be displayed in the output when we call the API. Here is the API deployment for West Europe:

apiVersion: v1
kind: Service
metadata:
  name: func
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: func
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: func
spec:
  replicas: 2
  selector:
    matchLabels:
      app: func
  template:
    metadata:
      labels:
        app: func
    spec:
      containers:
      - name: func
        image: gbaeke/ingfunc
        env:
        - name: REGION
          value: "WE"
        ports:
        - containerPort: 80

When we call the API and Azure Front Door uses the backend in West Europe, the result will be:

WE included in the response from the West Europe cluster

Origin APIs

The origin APIs need to be exposed on the public Internet using a DNS name. Azure Front Door requires a public backend to connect to. Naturally, the backend can be configured to only accept incoming requests from Front Door. In our case, the APIs are available on the public IP of the Kong proxy in each region, via the following DNS names:

  • a name that points to the Kong proxy in West Europe
  • a name that points to the Kong proxy in North Europe

Both endpoints are configured to accept TLS connections only, and use a Let’s Encrypt wildcard certificate for the domain.

Front Door Configuration

The Front Door designer looks the same as in the previous post:

Front Door Designer

However, the backend pool api-o now has two backends:

Two backend hosts, both enabled with same priority and weight

To determine the health of the backend, Front Door needs to be configured with a health probe that returns status code 200. If we were to specify the probe below, the health probe would fail:

Errrrrrr, this won’t work

The health probe would hit Kong’s proxy and return a 404 (Not Found). We did not create a route for /, only for /users. With Azure Front Door, when all health probes fail, all backends are considered healthy. Yes, you read that right.

Although we could create a route called /health that returns a 200, we will use the following probe just to make it work:

Fixing the health probe (quick and dirty fix); just call the /users API

If you are exposing multiple APIs on each cluster, the health probe above would not make sense. Also note that the purpose of the health probe is to determine if the cluster is up or not. It will not fix one API behaving badly or being removed accidentally!

You can check the health probes from the portal:

Yep! Health probes in West and North Europe are at 100%

Connection test

When I connect from my home laptop in Belgium, I get the following response:

Connection to West Europe cluster

When I connect from my second home in Dublin 🤷‍♀️ I get:

Connection from a VM in North Europe (I was kidding about the second home)

If you enable logging to Log Analytics, you can check this in the FrontDoorAccessLog:

Connection from home via Brussels (West Europe would show DB in Tenant_x)

When I remove the /users API in West Europe (kubectl delete deploy func), my home laptop will connect to North Europe as expected:

I didn’t fake this! It’s 100% real, my laptop now connects to the North Europe cluster via Front Door as expected

Note that the calls will not fail from the moment you delete the /users API (the health probe here). That depends on the following setting (backends in Front Door designer):

When should the backend be determined healthy or unhealthy; decrease the sample size and/or required samples to make it go faster

The backend health percentage graph indicates the probe failure as well:

We’ve lost West Europe folks!


When you are going for a multi-region deployment of services, Azure Front Door is one of the options. Of course, there is much more to a multi-region deployment than the “front-end stuff” described in this post. What do you do with databases for instance? Can you use active-active write regions (e.g. Cosmos DB) or does your database only support active/passive with read replicas?

As in other load balancing and fail over solutions, proper health probes are crucial in the design. Think about what a good health probe can be and what it means when it is not available. One option is to just write a health probe exposed via an endpoint such as /health that merely returns a 200 status code. But your health probe could also be designed to connect to backend systems such as databases or queues to determine the health of the system.
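As a minimal sketch of that first option in Go (the port and route are arbitrary choices, not from the original deployment):

package main

import "net/http"

func main() {
	// Health endpoint that simply returns 200 OK for load balancer probes
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":8080", nil)
}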

Hopefully, this post gives you some ideas to start! Follow me on Twitter for updates.

Publishing and securing your API with Kong and Azure Front Door

In the post, Securing your API with Kong and CloudFlare, I exposed a dummy API on Kubernetes with Kong and published it securely with CloudFlare. The breadth of features and its ease of use made CloudFlare a joy to work with. It didn’t take long before I got the question: “can’t you do that with Azure only?”. The answer is obvious: “Of course you can!”

In this post, the traffic flow is as follows:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer) -- HTTP --> API Kubernetes service --> API pods

Similarly to CloudFlare, Azure Front Door provides a fully trusted certificate for consumers of the API. In contrast to CloudFlare, Azure Front Door does not provide origin certificates which are trusted by Front Door. That’s easy to solve though by using a fully trusted Let’s Encrypt certificate which is stored as a Kubernetes secret and used in the Kubernetes Ingress definition. For this post, I requested a Let’s Encrypt wildcard certificate for my domain.

Let’s take it step-by-step, starting at the API and Kong level.

APIs and Kong

Just like in the previous posts, we have a Kubernetes service called func and back-end pods that host the API implemented via Azure Functions in a container. Below you see the API pods in the default namespace. For convenience, Kong is also deployed in that namespace (not recommended in production):

A view on the API pods and Kong via k9s

The ingress definition is shown below:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  namespace: default
  annotations:
    kubernetes.io/ingress.class: kong
    plugins.konghq.com: http-auth
spec:
  tls:
  - hosts:
    - api-o.<your-domain>
    secretName: <wildcard-secret>
  rules:
  - host: api-o.<your-domain>
    http:
      paths:
      - path: /users
        backend:
          serviceName: func
          servicePort: 80

Kong will pick up the above definition and configure itself accordingly.

The API is exposed publicly via api-o.<your-domain>, where the o stands for origin. The secretName in the ingress refers to a secret that contains the wildcard certificate:

apiVersion: v1
kind: Secret
metadata:
  name: <wildcard-secret>
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: certificate
  tls.key: key

Naturally, certificate and key should be replaced with the base64-encoded strings of the certificate and key you have obtained (in this case from Let’s Encrypt).
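Instead of crafting the secret by hand, you can also let kubectl do the base64 encoding for you; the secret and file names below are placeholders:

kubectl create secret tls <wildcard-secret> --cert=fullchain.pem --key=privkey.pem --namespace default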

At the DNS level, api-o.<your-domain> should refer to the external IP address of the exposed Kong Ingress Controller (proxy):

The service kong-kong-proxy is exposed via a public IP address (service of type LoadBalancer)

For the rest, the Kong configuration is not very different from the configuration in Securing your API with Kong and CloudFlare. I did remove the whitelisting configuration, which needs to be updated for Azure Front Door.

Great, we now have our API listening on api-o.<your-domain>, but it is not exposed via Azure Front Door and it does not have a WAF policy. Let’s change that.

Web Application Firewall (WAF) Policy

You can create a WAF policy from the portal:

WAF Policy

The above policy is set to detection only. No custom rules have been defined, but a managed rule set is activated:

Managed rule set for OWASP

The WAF policy was saved as baekeapiwaf. It will be attached to an Azure Front Door frontend. When a policy is attached to a frontend, it will be shown in the policy:

Associated frontends (Front Door front-ends)

Azure Front Door

We will now add Azure Front Door to obtain the following flow:

Consumer ---> api.<your-domain> (Front Door + WAF) ---> api-o.<your-domain>

The final configuration in Front Door Designer looks like this:

Front Door Designer

When a request comes in for api.<your-domain>, the response from api-o.<your-domain> is served. Caching was not enabled. The frontend and backend are tied together via the routing rule.

The first thing you need to do is add the frontend, which is the <name>.azurefd.net domain in the above config. There’s not much to say about that. Just click the blue plus next to Frontend hosts and follow the prompts. I did not attach a WAF policy to that frontend because it will not forward requests to the backend. We will use a custom domain for that.

Next, click the blue plus again to add the custom domain (here api.<your-domain>). In your DNS zone, create a CNAME record that maps api.<your-domain> to the <name>.azurefd.net name:

Mapping of custom domain to domain in CloudFlare DNS

I attached the WAF policy baekeapiwaf to the front-end domain:

WAF policy with OWASP rules to protect the API

Next, I added a certificate. When you select Front Door managed, you will get a Digicert managed image. If the CNAME mapping is not complete, you will get an e-mail from Digicert to approve certificate issuance. Make sure you check your e-mails if it takes long to issue the certificate. It will take a long time either way so be patient! 💤💤💤

Now that we have the frontend, specify the backend that Front Door needs to connect to:

Backend pool

The backend pool uses the API exposed at api-o.<your-domain>, as defined earlier. With only one backend, priority and weight are of no importance. It should be clear that you can add multiple backends, potentially in different regions, and load balance between them.

You will also need a health probe to check for healthy and unhealthy backends:

Health probes of the backend

Note that the above health check does NOT return a 200 OK status code. That is the only status code that would result in a healthy endpoint. With the above config, Kong will respond with a “no Route matched” 404 Not Found error instead. That does not mean that Front Door will not route to this endpoint though! When all endpoints are in a failed state, Front Door considers them healthy anyway 😲😲😲 and routes traffic using round-robin. See the documentation for more info.

Now that we have the frontend and the backend, let’s tie the two together with a rule:

First part of routing rule

In the first part of the rule, we specify that we listen for requests to api.<your-domain> (and not the azurefd.net domain) and that we only accept https. The pattern /* basically forwards everything to the backend.

In the route details, we specify the backend to route to:

Backend to route to

Clearly, we want to route to the api-o backend we defined earlier. We only connect to the backend via HTTPS. It only accepts HTTPS anyway, as defined at the Kong level via a KongIngress resource.

Note that it is possible to create a HTTP to HTTPS redirect rule. See the post Azure Front Door Revisited for more information. Without the rule, you will get the following warning:

Please disregard this warning 😎

Test, test, test

Let’s call the API via the http tool:

Clearly, Azure Front Door has served this request as indicated by the X-Azure-Ref header. Let’s try http:

Azure Front Door throws the above error because the routing rule only accepts https on api.<your-domain>!

White listing Azure Front Door

To restrict calls to the backend to Azure Front Door, I used the following KongPlugin definition:

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: whitelist-fd
  namespace: default
config:
  whitelist:
  - 147.243.0.0/16   # Azure Front Door backend IP range at the time of writing
plugin: ip-restriction

The IP range is documented here. Note that the IP range can and probably will change in the future.

In the ingress definition, I added the plugin via the annotations:

annotations:
  kubernetes.io/ingress.class: kong
  plugins.konghq.com: http-auth, whitelist-fd

Calling the backend API directly will now fail:

That’s a no no! Please use the Front Door!


Publishing APIs (or any web app), whether they are running on Kubernetes or other systems, is easy to do with the combination of Azure Front Door and Web Application Firewall policies. Do take pricing into account though. It’s a mixture of relatively low fixed prices with variable pricing per GB and requests processed. In general, CloudFlare has the upper hand here, from both a pricing and features perspective. On the other hand, Front Door has advantages when it comes to automating its deployment together with other Azure resources. As always: plan, plan, plan and choose wisely! 🦉

Azure API Management with public APIs on Kubernetes

In my previous blog post, I looked at Azure API Management in combination with private APIs hosted on Kubernetes. The APIs were exposed via Traefik and an internal load balancer. To make that scenario work, the Azure API Management premium SKU is required, which is quite costly.

This post describes another approach where the APIs are exposed on the public Internet via an Ingress Controller that requires HTTPS in addition to restricting the API caller to the IP address of the Azure API Management instance. Something like this:

Internet client -> Azure API Management --> Ingress Controller (with IP whitelisting per ingress) --> API service (Kubernetes) --> API pods (Kubernetes, part of a Deployment)

Let’s see how this works, shall we?

API Management

Deploy Azure API management from the portal. In this case, you can use the other SKUs such as Basic and Standard. Note the IP address of the Azure API Management instance on the Overview page:

IP address of API Management

Ingress Controller

As usual, let’s use Traefik. When you have Helm installed, use the following command:

helm install stable/traefik --name traefik --set serviceType=LoadBalancer,rbac.enabled=true,ssl.enabled=true,ssl.enforced=true,acme.enabled=true,acme.email=<your-email>,onHostRule=true,acme.challengeType=tls-alpn-01,acme.staging=false,dashboard.enabled=true,externalTrafficPolicy=Local --namespace kube-system

Note the use of externalTrafficPolicy=Local. This lets Traefik know the IP address of the actual caller, which is required because we want to restrict access to the IP address of API Management.

Ingress object

When your API is deployed via a deployment and a service of type ClusterIP, use the following ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/whitelist-source-range: "YOURIP/32"
spec:
  tls:
  - hosts:
    - <your-host>
  rules:
  - host: <your-host>
    http:
      paths:
      - path: /
        backend:
          serviceName: func
          servicePort: 80

The above ingress object exposes the internal service func via Traefik. The whitelist-source-range annotation limits access to this resource to the IP address of Azure API Management. Replace YOURIP with that IP address. Obviously, replace the host with a name that resolves to the external IP of the load balancer that provides access to Traefik. The Let’s Encrypt configuration automatically provisions a valid certificate for the service.

When I navigate to the API on my local computer, the following happens:

No access to the API if the request does not come from API management

When I test the API from API Management (after setting the back-end correctly):

API management can call the back-end API


What do you do when you do not want to spend money on the premium SKU? The answer is clear: use the lower SKUs if possible and restrict access to the back-end APIs with other means, such as IP whitelisting. Other possibilities include using some form of authentication, such as basic authentication.