Azure SQL Database High Availability

Creating a SQL Database in Azure is a simple matter: create the database and server, get your connection string and off you go! Before starting though, spend some time thinking about the level of high availability (HA) that you want:

  • What is the required level of HA within the deployment region (e.g. West Europe)?
  • Do you require failover to another region (e.g. from West Europe to North Europe)?

HA in a single region

To achieve the highest level of availability in a region, do the following:

  • Use the Premium (DTU model) or Business Critical (vCore model) tier: Azure will use the premium availability model for your database
  • Enable Availability Zone support if the region supports it: copies of your database will be spread over the zones

The diagram below illustrates the premium availability model (from the Microsoft docs):

Premium Availability Model

The region will contain one primary read/write replica and several secondary replicas, each using local SSD storage. The database is replicated synchronously, and failover is swift and without data loss. If required, you can enable read scale-out and connect to a read-only replica by adding ApplicationIntent=ReadOnly to the connection string. Great for reporting and BI!
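
For example, with the go-mssqldb driver used further below, a read-only connection string could look like this (a minimal sketch; server name and credentials are placeholders):

package main

import "fmt"

func main() {
    // ApplicationIntent=ReadOnly routes the connection to a read-only replica.
    connString := fmt.Sprintf(
        "server=%s;user id=%s;password=%s;port=%d;database=%s;ApplicationIntent=ReadOnly;",
        "myserver.database.windows.net", "USERNAME", "PASSWORD", 1433, "DBNAME")
    fmt.Println(connString)
}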

Spreading the databases over multiple zones is as simple as checking a box. Availability zone support comes at no extra cost but can increase the write commit latency because the nodes are a few kilometers apart. The option to enable zone support is in the Configure section as shown below:

Enabling zone redundancy

To read more about high availability, including the standard availability model for other tiers, check out the docs.

For critical applications, we typically select the Premium/Business Critical model as it provides good performance coupled with the highest possible availability in a region.

Geo-replication

The geo-replication feature replicates a database asynchronously to another region. Suppose you have a database and server in West Europe that you want to replicate to France Central. In the portal, navigate to the database (not the server) and select Geo-Replication. Then select the target region, in this case France Central. The following options show up:

Geo-Replication

Every database needs a logical server object that contains it. To replicate the database to France Central, you need such a server in that region; the UI above allows you to create it.

Note that both databases need to use the same tier, although the secondary can be configured with fewer DTUs or vCores. Doing so is generally not recommended because the secondary should be able to keep up with the primary's write load.

After configuration, the UI will show the active replication. In this case, I am showing replication from North Europe to West Europe (and not France Central):

Geo-replication is easy to configure, but in practice we recommend using Failover Groups. A Failover Group uses geo-replication under the hood but adds features such as:

  • Automated failover (versus manual failover with plain geo-replication)
  • Two listener endpoints (CNAMEs) to use in your connection strings, updated automatically on failover; one always points to the read/write replica, the other to the read-only replica (see the snippet below)
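
With a failover group named aks-fo (as used below), the two listener endpoints look like this (a sketch; the read-only listener follows the name.secondary.database.windows.net convention):

package main

import "fmt"

func main() {
    // Listener endpoints for a failover group named aks-fo.
    readWrite := "aks-fo.database.windows.net"          // always resolves to the primary
    readOnly := "aks-fo.secondary.database.windows.net" // always resolves to the secondary
    fmt.Println(readWrite, readOnly)
}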

Failover groups are created at the server level instead of the database level:

Failover group

Below, there is a failover group aks-fo with the primary server in North Europe and the secondary in West Europe:

Failover group details

You can manually fail over the database if needed:

Failover and forced failover

Failover switches the roles without data loss if both databases are still available. Forced failover performs the failover even if the primary is down, which might lead to data loss.

Note: when you configure a Failover Group, geo-replication will automatically be configured for you.

Connecting from an application

To connect to a database configured in a failover group, first get the failover group server names:

Read/write and read-only listener endpoints

Next, in your application, use the appropriate connection string. For instance, in Go:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    _ "github.com/denisenkom/go-mssqldb" // registers the "sqlserver" driver
)

var sqldb *sql.DB
var server = "aks-fo.database.windows.net"
var port = 1433
var user = "USERNAME"
var password = "PASSWORD"
var database = "DBNAME"

func init() {
    // Build connection string
    connString := fmt.Sprintf("server=%s;user id=%s;password=%s;port=%d;database=%s;",
        server, user, password, port, database)

    var err error

    // Create connection pool
    sqldb, err = sql.Open("sqlserver", connString)
    if err != nil {
        log.Fatal("Error creating connection pool: ", err.Error())
    }
    ctx := context.Background()

    // sql.Open does not actually connect to SQL; the ping below does
    err = sqldb.PingContext(ctx)
    if err != nil {
        log.Fatal(err.Error())
    }
    log.Printf("Connected!\n")
}

During a failover, there is a period during which the database is unavailable. When that happens, connections fail with the error shown below:

[db customers]: invalid response code 500, body: {"name":"fault","id":"7JPhcikZ","message":"Login error: mssql: Database 'aksdb' on server 'akssrv-ne' is not currently available.  Please retry the connection later.  If the problem persists, contact customer support, and provide them the session tracing ID of '6D7D70C3-D550-4A74-A69C-D689E6F5CDA6'.","temporary":false,"timeout":false,"fault":true}
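
Such errors are transient: once the failover completes, the CNAME points to the new primary and connections succeed again. Instead of giving up, the application should retry with a delay. A minimal sketch (retry count and delay are arbitrary; assumes the imports from the snippet above plus time):

// queryWithRetry retries a query a few times to ride out a failover.
func queryWithRetry(ctx context.Context, db *sql.DB, query string) (*sql.Rows, error) {
    var rows *sql.Rows
    var err error
    for attempt := 1; attempt <= 5; attempt++ {
        rows, err = db.QueryContext(ctx, query)
        if err == nil {
            return rows, nil
        }
        log.Printf("attempt %d failed: %v", attempt, err)
        time.Sleep(time.Duration(attempt) * 2 * time.Second)
    }
    return nil, err
}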

Note: the Failover Group uses a CNAME record aks-fo.database.windows.net which resolves to the backend servers in either West or North Europe. Make sure you allow connections to these servers in the firewall or you will get the following error:

[db customers]: invalid response code 500, body: {"name":"fault","id":"-p9TwZkm","message":"Login error: mssql: Cannot open server 'akssrv-ne' requested by the login. Client with IP address 'IP ADDRESS' is not allowed to access the server.  To enable access, use the Windows Azure Management Portal or run sp_set_firewall_rule on the master database to create a firewall rule for this IP address or address range.  It may take up to five minutes for this change to take effect.","temporary":false,"timeout":false,"fault":true}

Conclusion

For the highest level of availability, use the regional premium availability model with Availability Zone support. In addition, use a Failover Group to enable replication of the database to another region. A Failover Group automatically connects your application to the primary (read/write) or secondary (read-only) replica via a CNAME and can fail over automatically after a grace period.

Deploy AKS and Traefik with an Azure DevOps YAML pipeline

This post is a companion to the following GitHub repository: https://github.com/gbaeke/aks-traefik-azure-deploy. The repository contains ARM templates to deploy an Azure AD-integrated Kubernetes cluster and an IP address, plus a Helm chart to deploy Traefik. Traefik is configured to use the deployed IP address. In addition to those files, the repository also contains the YAML pipeline, ready to be imported in Azure DevOps.

Let’s take a look at the different building blocks!

AKS ARM Template

The aks folder contains the template and a parameters file. You will need to modify the parameters file because it requires settings to integrate the AKS cluster with Azure AD. You will need to specify:

  • clientAppID: the ID of the client app registration
  • serverAppID: the ID of the server app registration
  • tenantID: the ID of your AD tenant

Also specify clientId, which is the ID of the service principal for your cluster. Both the server app and the service principal require a secret; those secrets are supplied via pipeline secret variables.

The template configures a fairly standard AKS cluster that uses Azure networking (versus kubenet). It also configures Log Analytics for the cluster (container insights).

Deploying the template from the YAML file is done with the task below. You will need to replace YOUR SUBSCRIPTION with an authorized service connection:

 # DEPLOY AKS IN TEST   
 - task: AzureResourceGroupDeployment@2
   inputs:
     azureSubscription: 'YOUR SUBSCRIPTION'
     action: 'Create Or Update Resource Group'
     resourceGroupName: '$(aksTestRG)'
     location: 'West Europe'
     templateLocation: 'Linked artifact'
     csmFile: 'aks/deploy.json'
     csmParametersFile: 'aks/deployparams.t.json'
     overrideParameters: '-serverAppSecret $(serverAppSecret) -clientIdsecret $(clientIdsecret) -clusterName $(aksTest)'
     deploymentMode: 'Incremental'
     deploymentName: 'CluTest' 

The task uses several variables like $(aksTestRG) etc… If you check azure-pipelines.yaml, you will notice that most are configured at the top of the file in the variables section:

variables:
  aksTest: 'clu-test'
  aksTestRG: 'rg-clu-test'
  aksTestIP: 'clu-test-ip' 

The two secrets are secret 🔐 variables. Naturally, they are configured in the Azure DevOps UI. Note that there are other means to store and obtain secrets, such as Key Vault. In Azure DevOps, the secret variables can be found here:

Azure DevOps secret variables

IP Address Template

The ip folder contains the ARM template to deploy the IP address. We need to deploy the IP address resource to the resource group that holds the AKS agents. With the names we have chosen, that name is MC_rg-clu-test_clu-test_westeurope. It is possible to specify a custom name for the resource group.

Because we want to obtain the IP address after deployment, the ARM template contains an output:

 "outputs": {
        "ipaddress": {
            "type": "string",
            "value": "[reference(concat('Microsoft.Network/publicIPAddresses/', parameters('ipName')), '2017-10-01').ipAddress]"
        }
     } 

The output ipaddress is of type string. Via the reference template function we can extract the IP address.

The ARM template is deployed like the AKS template, but we need to capture the ARM outputs. The last line of the AzureResourceGroupDeployment@2 task that deploys the IP address contains:

deploymentOutputs: 'armoutputs'

Now we need to extract the IP address and set it as a variable in the pipeline. One way of doing this is via a bash script:

 - task: Bash@3
   inputs:
     targetType: 'inline'
     script: |
       echo "##vso[task.setvariable variable=test-ip;]$(echo '$(armoutputs)' | jq .ipaddress.value -r)" 

You can set a variable in Azure DevOps with echo ##vso[task.setvariable variable=variable_name;]value. In our case, the “value” should be the raw string of the IP address output. The $(armoutputs) variable contains the output of the IP address ARM template as follows:

{"ipaddress":{"type":"String","value":"IP ADDRESS"}}

To extract IP ADDRESS, we pipe the output of "echo $(armoutputs)" to jq .ipaddress.value -r, which extracts the IP ADDRESS from the JSON. The -r parameter strips the double quotes from the IP ADDRESS to give us the raw string. For more info about jq, check https://stedolan.github.io/jq/ .

We now have the IP address in the test-ip variable, to be used in other tasks via $(test-ip).

Taking care of the prerequisites

In a later phase, we install Traefik via Helm. So we need kubectl and helm on the build agent. In addition, we need to install tiller on the cluster. Because the cluster is RBAC-enabled, we need a service account and a role binding as well. The following tasks take care of all that:

 - task: KubectlInstaller@0
   inputs:
     kubectlVersion: '1.13.5'

 - task: HelmInstaller@1
   inputs:
     helmVersionToInstall: '2.14.1'

 - task: AzureCLI@1
   inputs:
     azureSubscription: 'YOUR SUB'
     scriptLocation: 'inlineScript'
     inlineScript: 'az aks get-credentials -g $(aksTestRG) -n $(aksTest) --admin'

 - task: Bash@3
   inputs:
     filePath: 'tiller/tillerconfig.sh'
     workingDirectory: 'tiller/' 

Note that we use the AzureCLI built-in task to easily obtain the cluster credentials for kubectl on the build agent. We use the --admin flag to gain full access. Note that this downloads sensitive information to the build agent temporarily.

The last task just runs a shell script to configure the service account and role binding and install tiller. Check the repository to see the contents of this simple script. Note that this is the quick and easy way to install tiller, not the most secure way! 🙇‍♂️

Install Traefik and use the IP address

The repository contains the downloaded chart (helm fetch stable/traefik --untar). The values.yaml file was modified to set the ingressClass to traefik-ext. We could have used the chart from the Helm repository but I prefer having the chart in source control. Here’s the pipeline task:

 - task: HelmDeploy@0
   inputs:
     connectionType: 'None'
     namespace: 'kube-system'
     command: 'upgrade'
     chartType: 'FilePath'
     chartPath: 'traefik-ext/.'
     releaseName: 'traefik-ext'
     overrideValues: 'loadBalancerIP=$(test-ip)'
     valueFile: 'traefik-ext/values.yaml' 

kubectl is configured to use the cluster, so connectionType can be set to ‘None’. We simply pass the IP address we created earlier by setting loadBalancerIP to $(test-ip) via overrideValues. This sets the loadBalancerIP value in Traefik’s service definition (in the templates folder). service.yaml in the templates folder contains the following section:

 spec:
  type: {{ .Values.serviceType }}
  {{- if .Values.loadBalancerIP }}
  loadBalancerIP: {{ .Values.loadBalancerIP }}
  {{- end }} 

Conclusion

Deploying AKS together with one or more public IP addresses is a common scenario. Hopefully, this post together with the GitHub repo gave you some ideas about automating these deployments with Azure DevOps. All you need to do is create a pipeline from the repo. Azure DevOps will read the azure-pipelines.yml file automatically.

Azure Front Door and multi-region deployments

In the previous post, we looked at publishing and securing an API with Azure Front Door and Azure Web Application Firewall. The API ran on Kubernetes, exposed by Kong and Kong Ingress Controller. Kong was configured to require an API key to call the /users API, allowing us to identify the consumer of the API. The traffic flow was as follows:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer) -- HTTP --> API Kubernetes service --> API pods 

Although Kubernetes makes the API(s) highly available, you might want to take extra precautions such as deploying the API in multiple regions. In this post, we will take a look at doing so. That means we will deploy the API in both West and North Europe, in two distinct Kubernetes clusters:

  • we-clu: Kubernetes cluster in West Europe
  • ne-clu: Kubernetes cluster in North Europe

The flow is very similar of course:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer; region to connect to depends on Front Door configuration and health probes) -- HTTP --> API Kubernetes service --> API pods  

Let’s take a look at the configuration! By the way, the supporting files to deploy the Kubernetes objects are here: https://github.com/gbaeke/api-kong/tree/master. To deploy Kong, check out this post.

Kubernetes

We deploy a Kubernetes cluster in each region, install Helm, deploy Kong, deploy our API and configure ingresses and related Kong custom resource definitions (CRDs). The result is an external IP address in each region that leads to the Kong proxy. Search for “kong” on this blog to find posts with more details about this deployment.

Note that the API deployment specifies an environment variable that will contain the string WE or NE. This environment variable will be displayed in the output when we call the API. Here is the API deployment for West Europe:

apiVersion: v1
kind: Service
metadata:
  name: func
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: func
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: func
spec:
  replicas: 2
  selector:
    matchLabels:
      app: func
  template:
    metadata:
      labels:
        app: func
    spec:
      containers:
      - name: func
        image: gbaeke/ingfunc
        env:
        - name: REGION
          value: "WE"
        ports:
        - containerPort: 80 

When we call the API and Azure Front Door uses the backend in West Europe, the result will be:

WE included in the response from the West Europe cluster
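
Conceptually, the API does little more than echo that environment variable. A hypothetical Go reconstruction (the actual gbaeke/ingfunc image is built with Azure Functions and may differ):

package main

import (
    "fmt"
    "net/http"
    "os"
)

func main() {
    http.HandleFunc("/users", func(w http.ResponseWriter, r *http.Request) {
        // REGION is injected via the deployment manifest (WE or NE).
        fmt.Fprintf(w, "Hello from region %s\n", os.Getenv("REGION"))
    })
    http.ListenAndServe(":80", nil)
}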

Origin APIs

The origin APIs need to be exposed on the public Internet using a DNS name. Azure Front Door requires a public backend to connect to. Naturally, the backend can be configured to only accept incoming requests from Front Door. In our case, the APIs are available on the public IP of the Kong proxy. The following names were used:

  • api-o-we.baeke.info: Kong proxy in West Europe
  • api-o-ne.baeke.info: Kong proxy in North Europe

Both endpoints are configured to accept TLS connections only, and use a Let’s Encrypt wildcard certificate for *.baeke.info.

Front Door Configuration

The Front Door designer looks the same as in the previous post:

Front Door Designer

However, the backend pool api-o now has two backends:

Two backend hosts, both enabled with same priority and weight

To determine the health of the backend, Front Door needs to be configured with a health probe that returns status code 200. If we were to specify the probe below, the health probe would fail:

Errrrrrr, this won’t work

The health probe would hit Kong’s proxy and return a 404 (Not Found). We did not create a route for /, only for /users. With Azure Front Door, when all health probes fail, all backends are considered healthy. Yes, you read that right.

Although we could create a route called /health that returns a 200, we will use the following probe just to make it work:

Fixing the health probe (quick and dirty fix); just call the /users API

If you are exposing multiple APIs on each cluster, the health probe above would not make sense. Also note that the purpose of the health probe is to determine whether the cluster is up or not. It will not catch one API behaving badly or being removed accidentally!
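
If you control the API code, a dedicated health route is trivial to add. A minimal sketch in Go:

package main

import "net/http"

func main() {
    // Returns 200 as long as the process is up; Front Door then marks the backend healthy.
    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("OK"))
    })
    http.ListenAndServe(":80", nil)
}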

You can check the health probes from the portal:

Yep! Health probes in West and North Europe are at 100%

Connection test

When I connect from my home laptop in Belgium, I get the following response:

Connection to West Europe cluster

When I connect from my second home in Dublin 🤷‍♀️ I get:

Connection from a VM in North Europe (I was kidding about the second home)

If you enable logging to Log Analytics, you can check this in the FrontDoorAccessLog:

Connection from home via Brussels (West Europe would show DB in Tenant_x)

When I remove the /users API in West Europe (kubectl delete deploy func), my home laptop will connect to North Europe as expected:

I didn’t fake this! It’s 100% real, my laptop now connects to the North Europe cluster via Front Door as expected

Note that calls do not fail over the moment you delete the /users API (the health probe target here). How quickly that happens depends on the following setting (backends in Front Door designer):

When should the backend be considered healthy or unhealthy; decrease the sample size and/or required successful samples to make it react faster

The backend health percentage graph indicates the probe failure as well:

We’ve lost West Europe folks!

Conclusion

When you are going for a multi-region deployment of services, Azure Front Door is one of the options. Of course, there is much more to a multi-region deployment than the “front-end stuff” described in this post. What do you do with databases for instance? Can you use active-active write regions (e.g. Cosmos DB) or does your database only support active/passive with read replicas?

As in other load balancing and fail over solutions, proper health probes are crucial in the design. Think about what a good health probe can be and what it means when it is not available. One option is to just write a health probe exposed via an endpoint such as /health that merely returns a 200 status code. But your health probe could also be designed to connect to backend systems such as databases or queues to determine the health of the system.
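
For instance, a probe endpoint that reports unhealthy when the database cannot be reached could look like this in Go (a sketch; db is assumed to be an initialized *sql.DB and the usual imports are in place):

// healthHandler reports 503 when the database dependency is unreachable.
func healthHandler(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
        defer cancel()
        if err := db.PingContext(ctx); err != nil {
            http.Error(w, "database unreachable", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}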

Hopefully, this post gives you some ideas to start! Follow me on Twitter for updates.

Publishing and securing your API with Kong and Azure Front Door

In the post, Securing your API with Kong and CloudFlare, I exposed a dummy API on Kubernetes with Kong and published it securely with CloudFlare. The breadth of features and its ease of use made CloudFlare a joy to work with. It didn’t take long before I got the question: “can’t you do that with Azure only?”. The answer is obvious: “Of course you can!”

In this post, the traffic flow is as follows:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer) -- HTTP --> API Kubernetes service --> API pods

Similarly to CloudFlare, Azure Front Door provides a fully trusted certificate for consumers of the API. In contrast to CloudFlare, Azure Front Door does not provide origin certificates which are trusted by Front Door. That’s easy to solve though by using a fully trusted Let’s Encrypt certificate which is stored as a Kubernetes secret and used in the Kubernetes Ingress definition. For this post, I requested a wildcard certificate for *.baeke.info via https://www.sslforfree.com/

Let’s take it step-by-step, starting at the API and Kong level.

APIs and Kong

Just like in the previous posts, we have a Kubernetes service called func and back-end pods that host the API implemented via Azure Functions in a container. Below you see the API pods in the default namespace. For convenience, Kong is also deployed in that namespace (not recommended in production):

A view on the API pods and Kong via k9s

The ingress definition is shown below:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  namespace: default
  annotations:
    kubernetes.io/ingress.class: kong
    plugins.konghq.com: http-auth
spec:
  tls:
  - hosts:
    - api-o.baeke.info
    secretName: wildcard-baeke.info.tls
  rules:
    - host: api-o.baeke.info
      http:
        paths:
        - path: /users
          backend:
            serviceName: func
            servicePort: 80 

Kong will pick up the above definition and configure itself accordingly.

The API is exposed publicly via https://api-o.baeke.info where the o stands for origin. The secret wildcard-baeke.info.tls refers to a secret which contains the wildcard certificate for *.baeke.info:

apiVersion: v1
kind: Secret
metadata:
  name: wildcard-baeke.info.tls
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: certificate
  tls.key: key

Naturally, certificate and key should be replaced with the base64-encoded strings of the certificate and key you have obtained (in this case from https://www.sslforfree.com).

At the DNS level, api-o.baeke.info should refer to the external IP address of the exposed Kong Ingress Controller (proxy):

The service kong-kong-proxy is exposed via a public IP address (service of type LoadBalancer)

For the rest, the Kong configuration is not very different from the configuration in Securing your API with Kong and CloudFlare. I did remove the whitelisting configuration, which needs to be updated for Azure Front Door.

Great, we now have our API listening on https://api-o.baeke.info but it is not exposed via Azure Front Door and it does not have a WAF policy. Let’s change that.

Web Application Firewall (WAF) Policy

You can create a WAF policy from the portal:

WAF Policy

The above policy is set to detection only. No custom rules have been defined, but a managed rule set is activated:

Managed rule set for OWASP

The WAF policy was saved as baekeapiwaf. It will be attached to an Azure Front Door frontend. When a policy is attached to a frontend, it will be shown in the policy:

Associated frontends (Front Door front-ends)

Azure Front Door

We will now add Azure Front Door to obtain the following flow:

Consumer ---> https://api.baeke.info (Front Door + WAF) --> https://api-o.baeke.info

The final configuration in Front Door Designer looks like this:

Front Door Designer

When a request comes in for api.baeke.info, the response from api-o.baeke.info is served. Caching was not enabled. The frontend and backend are tied together via the routing rule.

The first thing you need to do is to add the azurefd.net frontend which is baeke-api.azurefd.net in the above config. There’s not much to say about that. Just click the blue plus next to Frontend hosts and follow the prompts. I did not attach a WAF policy to that frontend because it will not forward requests to the backend. We will use a custom domain for that.

Next, click the blue plus again to add the custom domain (here api.baeke.info). In your DNS zone, create a CNAME record that maps api.yourdomain.com to the azurefd.net name:

Mapping of custom domain to azurefd.net domain in CloudFlare DNS

I attached the WAF policy baekeapiwaf to the front-end domain:

WAF policy with OWASP rules to protect the API

Next, I added a certificate. When you select Front Door managed, you will get a DigiCert-managed certificate. If the CNAME mapping is not complete, you will get an e-mail from DigiCert to approve certificate issuance. Make sure you check your e-mails if issuance takes long. It will take a long time either way so be patient! 💤💤💤

Now that we have the frontend, specify the backend that Front Door needs to connect to:

Backend pool

The backend pool uses the API exposed at api-o.baeke.info as defined earlier. With only one backend, priority and weight are of no importance. It should be clear that you can add multiple backends, potentially in different regions, and load balance between them.

You will also need a health probe to check for healthy and unhealthy backends:

Health probes of the backend

Note that the above health check does NOT return a 200 OK status code. That is the only status code that would result in a healthy endpoint. With the above config, Kong will respond with a “no Route matched” 404 Not Found error instead. That does not mean that Front Door will not route to this endpoint though! When all endpoints are in a failed state, Front Door considers them healthy anyway 😲😲😲 and routes traffic using round-robin. See the documentation for more info.

Now that we have the frontend and the backend, let’s tie the two together with a rule:

First part of routing rule

In the first part of the rule, we specify that we listen for requests to api.baeke.info (and not the azurefd.net domain) and that we only accept https. The pattern /* basically forwards everything to the backend.

In the route details, we specify the backend to route to:

Backend to route to

Clearly, we want to route to the api-o backend we defined earlier. We only connect to the backend via HTTPS. It only accepts HTTPS anyway, as defined at the Kong level via a KongIngress resource.

Note that it is possible to create an HTTP to HTTPS redirect rule. See the post Azure Front Door Revisited for more information. Without the rule, you will get the following warning:

Please disregard this warning 😎

Test, test, test

Let’s call the API via the http tool:

Clearly, Azure Front Door has served this request as indicated by the X-Azure-Ref header. Let’s try plain HTTP:

Azure Front Door throws the above error because the routing rule only accepts https on api.baeke.info!

White listing Azure Front Door

To restrict calls to the backend to Azure Front Door, I used the following KongPlugin definition:

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: whitelist-fd
  namespace: default
config:
  whitelist: 
  - 147.243.0.0/16
plugin: ip-restriction 

The IP range is documented here. Note that the IP range can and probably will change in the future.
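
Under the hood, the plugin performs a CIDR match on the client IP, comparable to this Go sketch (the sample IPs are made up):

package main

import (
    "fmt"
    "net"
)

func main() {
    // The documented Azure Front Door backend range (subject to change).
    _, fdRange, _ := net.ParseCIDR("147.243.0.0/16")

    for _, ip := range []string{"147.243.12.34", "203.0.113.7"} {
        fmt.Printf("%s allowed: %v\n", ip, fdRange.Contains(net.ParseIP(ip)))
    }
}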

In the ingress definition, I added the plugin via the annotations:

annotations:
  kubernetes.io/ingress.class: kong
  plugins.konghq.com: http-auth, whitelist-fd 

Calling the backend API directly will now fail:

That’s a no no! Please use the Front Door!

Conclusion

Publishing APIs (or any web app), whether they are running on Kubernetes or other systems, is easy to do with the combination of Azure Front Door and Web Application Firewall policies. Do take pricing into account though. It’s a mixture of relatively low fixed prices with variable pricing per GB and requests processed. In general, CloudFlare has the upper hand here, from both a pricing and features perspective. On the other hand, Front Door has advantages when it comes to automating its deployment together with other Azure resources. As always: plan, plan, plan and choose wisely! 🦉

Azure API Management with public APIs on Kubernetes

In my previous blog post, I looked at Azure API Management in combination with private APIs hosted on Kubernetes. The APIs were exposed via Traefik and an internal load balancer. To make that scenario work, the Azure API Management premium SKU is required, which is quite costly.

This post describes another approach where the APIs are exposed on the public Internet via an Ingress Controller that requires HTTPS in addition to restricting the API caller to the IP address of the Azure API Management instance. Something like this:

Internet client -> Azure API Management --> Ingress Controller (with IP whitelisting per ingress) --> API service (Kubernetes) --> API pods (Kubernetes, part of a Deployment)

Let’s see how this works, shall we?

API Management

Deploy Azure API management from the portal. In this case, you can use the other SKUs such as Basic and Standard. Note the IP address of the Azure API Management instance on the Overview page:

IP address of API Management

Ingress Controller

As usual, let’s use Traefik. When you have Helm installed, use the following command:

helm install stable/traefik --name traefik --set serviceType=LoadBalancer,rbac.enabled=true,ssl.enabled=true,ssl.enforced=true,acme.enabled=true,acme.email=name@domain.com,onHostRule=true,acme.challengeType=tls-alpn-01,acme.staging=false,dashboard.enabled=true,externalTrafficPolicy=Local --namespace kube-system

Note the use of externalTrafficPolicy=Local. This lets Traefik know the IP address of the actual caller, which is required because we want to restrict access to the IP address of API Management.

Ingress object

When your API is deployed via a deployment and a service of type ClusterIP, use the following ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/whitelist-source-range: "YOURIP/32"
spec:
  tls:
  - hosts:
    - api.domain.com
  rules:
    - host: api.domain.com
      http:
        paths:
        - path: /
          backend:
            serviceName: func
            servicePort: 80

The above ingress object exposes the internal service func via Traefik. The whitelist-source-range annotation is used to limit access to this resource to the IP address of Azure API Management. Replace YOURIP with that IP address. Obviously, replace the host api.domain.com with a host that resolves to the external IP of the load balancer that provides access to Traefik. The Let’s Encrypt configuration automatically provisions a valid certificate for the host.

When I navigate to the API on my local computer, the following happens:

No access to the API if the request does not come from API management

When I test the API from API Management (after setting the back-end correctly):

API management can call the back-end API

Conclusion

What do you do when you do not want to spend money on the premium SKU? The answer is clear: use the lower SKUs if possible and restrict access to the back-end APIs with other means such as IP whitelisting. Other possibilities include using some form of authentication such as basic authentication etc…

Azure API Management and Azure Kubernetes Service

You have decided to host your APIs in Kubernetes in combination with an API management solution? You are surely not the only one! In an Azure context, one way of doing this is combining Azure API Management and Azure Kubernetes Service (AKS). This post describes one of the ways to get this done. We will use the following services:

  • Virtual Network: AKS will use advanced networking and Azure CNI
  • Private DNS: to host a private DNS zone (private.baeke.info); note that private DNS is in public preview
  • AKS: deployed in a subnet of the virtual network
  • Traefik: Ingress Controller deployed on AKS, configured to use an internal load balancer in a dedicated subnet of the virtual network
  • Azure API Management: with virtual network integration which requires Developer or Premium; note that Premium comes at a hefty price though

Let’s take it step by step but note that this post does not contain all the detailed steps. I might do a video later with more details. Check the YouTube channel for more information.

We will setup something like this:

Consumer --> Azure API Management public IP --> ILB (in private VNET) --> Traefik (in Kubernetes) --> API (in Kubernetes - ClusterIP service in front of a deployment) 

Virtual Network

Create a virtual network in a resource group. We will add a private DNS zone to this network. You should not add resources such as virtual machines to this virtual network before you add the private DNS zone.

I will call my network privdns and add a few subnets (besides default):

  • aks: used by AKS
  • traefik: for the internal load balancer (ILB) and the front-end IP addresses
  • apim: to give API management access to the virtual network

Private DNS

Add a private DNS zone to the virtual network with Azure CLI:

az network dns zone create -g rg-ingress -n private.baeke.info --zone-type Private --resolution-vnets privdns 

You can now add records to this private DNS zone:

az network dns record-set a add-record \
   -g rg-ingress \
   -z private.baeke.info \
   -n test \
   -a 1.1.1.1

To test name resolution, deploy a small Linux virtual machine and ping test.private.baeke.info:

Testing the private DNS zone

Update for June 27th, 2019: the above commands use the old API; please see https://docs.microsoft.com/en-us/azure/dns/private-dns-getstarted-cli for the new syntax to create a zone and to link it to an existing VNET; these zones should be viewable in the portal via Private DNS Zones:

Private DNS zones in the portal

Azure Kubernetes Service

Deploy AKS and use advanced networking. Use the aks subnet when asked. Each node you deploy will get 30 IP addresses in the subnet:

First IP addresses of one of the nodes

Traefik

To expose the APIs over an internal IP we will use ingress objects, which require an Ingress Controller. Traefik is just one of the choices available. Any Ingress Controller will work.

Instead of using ingresses, you could also expose your APIs via services of type LoadBalancer and use an internal load balancer. The latter approach would require one IP per API, whereas the ingress approach only requires one IP in total. That IP resolves to Traefik, which uses the host header to route to the APIs.

We will install Traefik with Helm. Check my previous post for more info about Traefik and Helm. In this case, I will download and untar the Helm chart and modify values.yaml. To download and untar the Helm chart use the following command:

helm fetch stable/traefik --untar

You will now have a traefik folder, which contains values.yaml. Modify values.yaml as follows:

Changes to values.yaml

This will instruct Helm to add the above annotations to the Traefik service object. It instructs the Azure cloud integration components to use an internal load balancer. In addition, the load balancer should be created in the traefik subnet. Make sure that your AKS service principal has the RBAC role on the virtual network to perform this operation.

Now you can install Traefik on AKS. Make sure you are in the traefik folder where the Helm chart was untarred:

helm install . --name traefik --set serviceType=LoadBalancer,rbac.enabled=true,dashboard.enabled=true --namespace kube-system

When the installation is finished, there should be an internal load balancer in the resource group that is behind your AKS cluster:

ILB deployed

The result of kubectl get svc -n kube-system should result in something like:

EXTERNAL-IP is the front-end IP on the ILB for the traefik service

We can now reach Traefik on the virtual network and create an A record that resolves to this IP. The name func.private.baeke.info, which I will use later, resolves to the above IP.

Azure API Management

Deploy API Management from the portal. API Management will need access to the virtual network which means we need a version (SKU) that has virtual network support. This is needed simply because the APIs are not exposed on the public Internet.

For testing, use the Developer SKU. In production, you should use the Premium SKU although it is very expensive. Microsoft should really make the virtual network integration part of every SKU since it is such a common scenario! Come on Microsoft, you know it’s the right thing to do! 😉

API Management virtual network integration

Above, API Management is configured to use the apim subnet of the virtual network. It will also be able to resolve private DNS names via this integration. Note that configuring the network integration takes quite some time.

Deploy a service and ingress

I deployed the following sample API with a simple deployment and service. Save this as func.yaml and run kubectl apply -f func.yaml. You will end up with two pods running a super simple and stupid API plus a service object of type ClusterIP, which is only reachable inside Kubernetes:

apiVersion: v1
kind: Service
metadata:
  name: func
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: func
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: func
spec:
  replicas: 2
  selector:
    matchLabels:
      app: func
  template:
    metadata:
      labels:
        app: func
    spec:
      containers:
      - name: func
        image: gbaeke/ingfunc
        ports:
        - containerPort: 80

Next, deploy an ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: func.private.baeke.info
      http:
        paths:
        - path: /
          backend:
            serviceName: func
            servicePort: 80

Notice I used func.private.baeke.info! Naturally, that name should resolve to the IP address on the ILB that routes to Traefik.

Testing the API from API Management

In API Management, I created an API that uses func.private.baeke.info as the backend. Yes, I know, the API name is bad. It’s just a sample ok? 😎

API with backend func.private.baeke.info

Let’s test the GET operation I created:

Great success! API management can reach the Kubernetes-hosted API via Traefik

Conclusion

In this post, we looked at one way to expose Kubernetes-hosted APIs to the outside world via Azure API Management. The traffic flow is as follows:

Consumer --> Azure API Management public IP --> ILB (in private VNET) --> Traefik (in Kubernetes) --> API (in Kubernetes - ClusterIP service in front of a deployment)

Because we have to use host names in ingress definitions, we added a private DNS zone to the virtual network. We can create multiple A records, one for each API, and provide access to these APIs with ingress objects.

As stated above, you can also expose each API via an internal load balancer. In that case, you do not need an Ingress Controller such as Traefik. Alternatively, you could also replace Azure API Management with a solution such as Kong. I have used Kong in the past and it is quite good! The choice for one or the other will depend on several factors such as cost, features, ease of use, support, etc…

Azure DevOps multi-stage YAML pipelines

A while ago, the Azure DevOps blog posted an update about multi-stage YAML pipelines. The concept is straightforward: define both your build (CI) and release (CD) pipelines in a YAML file and stick that file in your source code repository.

In this post, we will look at a simple build and release pipeline that builds a container, pushes it to ACR and deploys it to Kubernetes, linked to an environment. Something like this:

Two stages in the pipeline – build and deploy (as simple as it can get, almost)

Note: I used a simple go app, a Dockerfile and a Kubernetes manifest as source files, check them out here.

Note: there is also a video version 😉

Note: if you start from a repository without manifests and azure-pipelines.yaml, the pipeline build wizard will propose Deploy to Azure Kubernetes Service. The wizard that follows will ask you some questions but in the end you will end up with a configured environment, the necessary service connections to AKS and ACR and even a service.yaml and deployment.yaml with the bare minimum to deploy your container!

“Show me the YAML!!!”

The file, azure-pipelines.yaml contains the two stages. Check out the first stage (plus trigger and variables) below:

trigger:
- master

variables:
  imageName: 'gosample'
  registry: 'REGNAME.azurecr.io'

stages:
- stage: build
  jobs:
  - job: 'BuildAndPush'
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: Docker@2
      inputs:
        containerRegistry: 'ACR'
        repository: '$(imageName)'
        command: 'buildAndPush'
        Dockerfile: '**/Dockerfile'
    - task: PublishPipelineArtifact@0
      inputs:
        artifactName: 'manifests'
        targetPath: 'manifests' 

The pipeline runs on a commit to the master branch. The variables imageName and registry are referenced later using $(imageName) and $(registry). Replace REGNAME with the name of your Azure Container Registry.

It’s a multi-stage pipeline, so we start with stages: and then define the first stage build. That stage has one job which consists of two steps:

  • Docker task (v2): build a Docker image based on the Dockerfile in the source code repository and push it to the container registry called ACR; ACR is a reference to a service connection defined in the project settings
  • PublishPipelineArtifact: the source code repository contains Kubernetes deployment manifests in YAML format in the manifests folder; the contents of that folder is published as a pipeline artifact, to be picked up in a later stage

Now let’s look at the deployment stage:

- stage: deploy
  jobs:
  - deployment: 'DeployToK8S'
    pool:
      vmImage: 'ubuntu-latest'
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
            - task: DownloadPipelineArtifact@1
              inputs:
                buildType: 'current'
                artifactName: 'manifests'
                targetPath: '$(System.ArtifactsDirectory)/manifests'
            - task: KubernetesManifest@0
              inputs:
                action: 'deploy'
                kubernetesServiceConnection: 'dev-kub-gosample-1558821689026'
                namespace: 'gosample'
                manifests: '$(System.ArtifactsDirectory)/manifests/deploy.yaml'
                containers: '$(registry)/$(imageName):$(Build.BuildId)' 

The second stage uses a deployment job (quite new; see this). In a deployment job, you can specify an environment to link to. In the above job, the environment is called dev. In Azure DevOps, the environment is shown as below:

dev environment

The environment functionality has Kubernetes integration which is pretty neat. You can drill down to the deployed objects such as deployments and services:

Kubernetes deployment in an Azure DevOps environment

The deployment has two tasks:

  • DownloadPipelineArtifact: download the artifact published in the first stage to $(System.ArtifactsDirectory)/manifests
  • KubernetesManifest: this task can deploy Kubernetes manifests; it uses an AKS service connection that was created during creation of the environment; a service account was created in a specific namespace and with access rights to that namespace only; the manifests property will look for an image name in the Kubernetes YAML files and append the tag which is the build id here

Note that the release stage will actually download the pipeline artifact automatically. The explicit DownloadPipelineArtifact task gives additional control over the download location.

The KubernetesManifest task is relatively new at the time of this writing (end of May 2019). Its image substitution functionality could be enough in many cases, without having to revert to Helm or manual text substitution tasks. There is more to this task than what I have described here. Check out the docs for more info.

Conclusion

If you are just starting out building CI/CD pipelines in YAML, you will probably have a hard time getting used to the schema. I know I had! 😡 In the end though, doing it this way with the pipeline stored in source control will pay off in the long run. After some time, you will have built up a useful library of these pipelines to quickly get up and running in new projects. Recommended!!! 😉🚀🚀🚀

Creating and deploying a model with Azure Machine Learning Service

In this post, we will take a look at creating a simple machine learning model for text classification and deploying it as a container with Azure Machine Learning service. This post is not intended to discuss the finer details of creating a text classification model. In fact, we will use the Keras library and its Reuters newswire dataset to create a simple dense neural network. You can find many online examples based on this dataset. For further information, be sure to check out and buy 👍 Deep Learning with Python by François Chollet, the creator of Keras and now at Google. It contains a section that explains using this dataset in much more detail!

Machine Learning service workspace

To get started, you need an Azure subscription. Once you have the subscription, create a Machine Learning service workspace. Below, you see such a workspace:

My Machine Learning service workspace (gebaml)

Together with the workspace, you also get a storage account, a key vault, application insights and a container registry. In later steps, we will create a container and store it in this registry. That all happens behind the scenes though. You will just write a few simple lines of code to make that happen!

Note the Authoring (Preview) section! These were added just before Build 2019 started. For now, we will not use them.

Azure Notebooks

To create the model and interact with the workspace, we will use a free Jupyter notebook in Azure Notebooks. At this point in time (8 May 2019), Azure Notebooks is still in preview. To get started, find the link below in the Overview section of the Machine Learning service workspace:

Getting Started with Notebooks

To quickly get the notebook, you can clone my public project: ⏩⏩⏩ https://notebooks.azure.com/geba/projects/textclassificationblog.

Creating the model

When you open the notebook, you will see the following first four cells:

Getting the dataset

It’s always simple if a prepared dataset is handed to you like in the above example. Above, you simply use the reuters class of keras.datasets and use the load_data method to get the data and directly assign it to variables to hold the train and test data plus labels.

In this case, the data consists of newswires with a corresponding label that indicates the category of the newswire (e.g. an earnings call newswire). There are 46 categories in this dataset. In the real world, you would have the newswire in text format. In this case, the newswire has already been converted (preprocessed) for you in an array of integers, with each integer corresponding to a word in a dictionary.

A bit further in the notebook, you will find a Vectorization section:

Vectorization

In this section, the train and test data is vectorized using a one-hot encoding method. Because we specified, in the very first cell of the notebook, to only use the 10000 most important words, each article can be converted to a vector with 10000 values. Each value is either 1 or 0, indicating the word is in the text or not.

This bag-of-words approach is one of the ways to represent text in a data structure that can be used in a machine learning model. Besides vectorizing the training and test samples, the categories are also one-hot encoded.
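
Conceptually, the one-hot encoding boils down to the following (a Go sketch of the idea; the notebook does this with NumPy):

package main

import "fmt"

// oneHot turns a sequence of word indices into a fixed-size 0/1 vector.
func oneHot(sequence []int, dims int) []float32 {
    vec := make([]float32, dims)
    for _, idx := range sequence {
        if idx < dims {
            vec[idx] = 1
        }
    }
    return vec
}

func main() {
    // Tiny example with a 10-word vocabulary instead of 10000.
    fmt.Println(oneHot([]int{1, 3, 3, 7}, 10))
}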

Now the dense neural network model can be created:

Dense neural net with Keras

The above code defines a very simple dense neural network. A dense neural network is not necessarily the best type but that’s ok for this post. The specifics are not that important. Just note that the nn variable is our model. We will use this variable later when we convert the model to the ONNX format.

The last cell (16 above) does the actual training in 9 epochs. Training will be fast because the dataset is relatively small and the neural network is simple. Using the Azure Notebooks compute is sufficient. After 9 epochs, this is the result:

Training result

Not exactly earth-shattering: 78% accuracy on the test set!

Saving the model in ONNX format

ONNX is an open format to store deep learning models. When your model is in that format, you can use the ONNX runtime for inference.

Converting the Keras model to ONNX is easy with the onnxmltools:

Converting the Keras model to ONNX

The result of the above code is a file called reuters.onnx in your notebook project.

Predict with the ONNX model

Let’s try to predict the category of the first newswire in the test set. Its real label is 3, which means it’s a newswire about an earnings call (earn class):

Inferencing with the ONNX model

We will use similar code later in score.py, a file that will be used in a container we will create to expose the model as an API. The code is pretty simple: start an inference session based on the reuters.onnx file, grab the input and output and use run to predict. The resulting array is the output of the softmax layer and we use argmax to extract the category with the highest probability.

Saving the model to the workspace

With the model in reuters.onnx, we can add it to the workspace:

Saving the model in the workspace

You will need a file in your Azure Notebook project called config.json with the following contents:

{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}

With that file in place, when you run cell 27 (see above), you will need to authenticate to Azure to be able to interact with the workspace. The code is pretty self-explanatory: the reuters.onnx model will be added to the workspace:

Models added to the workspace

As you can see, you can save multiple versions of the model. This happens automatically when you save a model with the same name.

Creating the scoring container image

The scoring (or inference) container image is used to expose an API to predict categories of newswires. Obviously, you will need to give some instructions how scoring needs to be done. This is done via score.py:

score.py

The code is similar to the code we wrote earlier to test the ONNX model. score.py needs an init() and run() function. The other functions are helper functions. In init(), we need to grab a reference to the ONNX model. The ONNX model file will be placed in the container during the build process. Next, we start an InferenceSession via the ONNX runtime. In run(), the code is similar to our earlier example. It predicts via session.run and returns the result as JSON. We do not have to worry about the rest of the code that runs the API. That is handled by Machine Learning service.

Note: using ONNX is not a requirement; we could have persisted and used the native Keras model for instance

In this post, we only need score.py since we do not train our model via Azure Machine learning service. If you train a model with the service, you would create a train.py file to instruct how training should be done based on data in a storage account for instance. You would also provision compute resources for training. In our case, that is not required so we train, save and export the model directly from the notebook.

Training and scoring with Machine Learning service

Now we need to create an environment file to indicate the required Python packages and start the image build process:

Create an environment yml file via the API and build the container

The build process is handled by the service and makes sure the model file is in the container, in addition to score.py and myenv.yml. The result is a fully functional container that exposes an API that takes an input (a newswire) and outputs an array of probabilities. Of course, it is up to you to define what the input and output should be. In this case, you are expected to provide a one-hot encoded article as input.

The container image will be listed in the workspace, potentially multiple versions of it:

Container images for the reuters ONNX model

Deploy to Azure Container Instances

When the image is ready, you can deploy it via the Machine Learning service to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS). To deploy to ACI:

Deploying to ACI

When the deployment is finished, the deployment will be listed:

Deployment (ACI)

When you click on the deployment, the scoring URI will be shown (e.g. http://IPADDRESS:80/score). You can now use Postman or any other method to score an article. To quickly test the service from the notebook:

Testing the service

The helper method run of aci_service will post the JSON in test_sample to the service. It knows the scoring URI from the deployment earlier.

Conclusion

Containerizing a machine learning model and exposing it as an API is made surprisingly simple with Azure Machine learning service. It saves time so you can focus on the hard work of creating a model that performs well in the field. In this post, we used a sample dataset and a simple dense neural network to illustrate how you can build such a model, convert it to ONNX format and use the ONNX runtime for scoring.

Azure Kubernetes Service and Azure Firewall

Deploying Azure Kubernetes Service (AKS) is, like most other Kubernetes-as-a-service offerings such as those from DigitalOcean and Google, very straightforward. It’s either a few clicks in the portal or one or two command lines and you are finished.

Using these services properly and in a secure fashion is another matter though. I am often asked how to secure access to the cluster and its applications. In addition, customers also want visibility and control of incoming and outgoing traffic. Combining Azure Firewall with AKS is one way of achieving those objectives.

This post will take a look at the combination of Azure Firewall and AKS. It is inspired by this post by Dennis Zielke. In that post, Dennis provides all the necessary Azure CLI commands to get to the following setup:

AKS and Azure Firewall (from https://medium.com/@denniszielke/setting-up-azure-firewall-for-analysing-outgoing-traffic-in-aks-55759d188039 by Dennis Zielke)

In what follows, I will keep referring to the subnet names and IP addresses as in the above diagram.

Azure Firewall

Azure Firewall is a stateful firewall, provided as a service with built-in high availability. You deploy it in a subnet of a virtual network. The subnet should have the name AzureFirewallSubnet. The firewall will get two IP addresses:

  • Internal IP: the first IP address in the subnet (here 10.0.3.4)
  • Public IP: a public IP address; in the above setup we will use it to provide access to a Kubernetes Ingress controller via a DNAT rule

As in the physical world, you will need to instruct systems to route traffic through the firewall. In Azure, this is done via a route table. The following route table was created:

Route table

In (1), a route to 0.0.0.0/0 is defined that routes to the private IP of the firewall. This route is used when no other, more specific route applies! The route table is associated with just the aks-5-subnet (2), which is the subnet where AKS (with advanced networking) is deployed. It’s important to note that all external traffic originating from the Kubernetes cluster now passes through the firewall.

When you compare Azure Firewall to the Network Virtual Appliances (NVAs) from vendors such as CheckPoint, you will notice that the capabilities are somewhat limited. On the flip side though, Azure Firewall is super simple to deploy when compared with a highly available NVA setup.

Before we look at the firewall rules, let’s take a look at the Kubernetes Ingress Controller.

Kubernetes Ingress Controller

In this example, I will deploy nginx-ingress as an Ingress Controller. It will provide access to HTTP-based workloads running in the cluster and it can route to various workloads based on the URL. I will deploy the nginx-ingress with Helm.

Think of an nginx-ingress as a reverse proxy. It receives http requests, looks at the hostname and path (e.g. mydomain.com/api/user) and routes the request to the appropriate Kubernetes service (e.g. the user service).

Ingress in Kubernetes (from Microsoft: https://docs.microsoft.com/en-us/azure/aks/operator-best-practices-network#distribute-ingress-traffic)

Normally, the nginx-ingress service is accessed via an Azure external load balancer. Behind the scenes, this is the result of the service object having spec.type set to the value LoadBalancer. If we want external traffic to nginx-ingress to pass through the firewall, we will need to tell Kubernetes to create an internal load balancer via an annotation. Let’s do that with Helm. First, you will need to install tiller, the server-side component of Helm. Use the following procedure from the Microsoft documentation:

  • Create a service account for tiller: link
  • Configure tiller: link

With tiller installed, issue the following two commands:

kubectl create ns ingress 

helm install stable/nginx-ingress --namespace ingress --set controller.replicaCount=2 --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-internal"=true --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-internal-subnet"=ing-4-subnet

The second command installs nginx-ingress in the ingress namespace. The two --set parameters add the following annotations to the service object (yes I know, the Helm annotation parameters are ugly 🤢):

service.beta.kubernetes.io/azure-load-balancer-internal: "true"
service.beta.kubernetes.io/azure-load-balancer-internal-subnet: ing-4-subnet

This ensures that an internal load balancer is created. You will find it in the mc-* resource group that backs your AKS deployment:

Internal load balancer created by the Kubernetes cloud integration components

Note that Kubernetes creates the load balancer, including the rules and probes for port 80 and 443 as defined in the service object that comes with the Helm chart. The load balancer is created in the ing-4-subnet as instructed by the service annotation. Its private IP address is 10.0.4.4 as in the diagram at the top of this post.

DNAT Rule to Load Balancer

To provide access to internal resources, Azure Firewall uses DNAT (destination network address translation) rules. The concept is simple: traffic to the firewall's public IP on some port can be forwarded to an internal IP on the same or another port. In our case, traffic to the firewall's public IP on ports 80 and 443 is forwarded to the internal load balancer's private IP on ports 80 and 443. The load balancer will forward the request to nginx-ingress:

DNAT rule forwarding port 80 and 443 traffic to the internal load balancer
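As a sketch, the CLI equivalent of the port 80 rule could look like this. The names are hypothetical, FWPUBLICIP stands for the firewall's public IP, and a similar rule is needed for port 443:

az network firewall nat-rule create --resource-group rg --firewall-name fw \
  --collection-name nat-to-ingress --priority 200 --action Dnat \
  --name ingress-http --protocols TCP --source-addresses '*' \
  --destination-addresses FWPUBLICIP --destination-ports 80 \
  --translated-address 10.0.4.4 --translated-port 80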

If the installation of nginx-ingress was successful, you should end up at the default back-end when you go to http://firewallPublicIP.

nginx-ingress default backend when browsing to public IP of firewall

If you configured Log Analytics and installed the Azure Firewall solution, you can look at the firewall logs. DNAT actions are logged and can be inspected:

Firewall logs via Log Analytics

Application and Network Rules

Azure Firewall application rules allow or deny outbound HTTP/HTTPS traffic based on the target FQDN. The following rules were defined:

Application rules

The above rules allow HTTP and HTTPS traffic to destinations such as docker.io, cloudflare and more.
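An application rule like the ones above could be created with a command along these lines (hypothetical names; the source address range is the AKS subnet from the diagram):

az network firewall application-rule create --resource-group rg --firewall-name fw \
  --collection-name aks-egress --priority 200 --action Allow \
  --name allow-docker --source-addresses 10.0.5.0/24 \
  --protocols Http=80 Https=443 --target-fqdns '*.docker.io'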

Note that network rules, another Azure Firewall rule type, are evaluated first. If a match is found, rule evaluation stops. Suppose you have these network rules:

Network rules

The above network rule allows ports 22 and 443 for all sources and destinations. This means that Kubernetes can actually connect to any HTTPS-enabled site on the default port, regardless of the defined application rules. See rule processing for more information.
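For reference, a hedged sketch of such a network rule (hypothetical names):

az network firewall network-rule create --resource-group rg --firewall-name fw \
  --collection-name allow-ssh-https --priority 100 --action Allow \
  --name ssh-https --protocols TCP --source-addresses '*' \
  --destination-addresses '*' --destination-ports 22 443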

Threat Intelligence

This feature alerts on and/or denies network traffic coming from known bad IP addresses or domains. You can track this via Log Analytics:

Threat Intelligence Alerts and Denies on Azure Firewall

Above, you see denied port scans, botnet traffic and brute-force credential attacks, all blocked by Azure Firewall. This feature is currently in preview.

Best Practices

The AKS documentation has a best practices section that discusses networking. It contains useful information about the networking model (Kubenet vs Azure CNI), ingresses and WAF. It does not, at this point in time (May 2019), describe how to use Azure Firewall with AKS. It would be great if that were added in the near future.

Here are a couple of key points to think about:

  • WAF (Web Application Firewall): Azure Firewall threat intelligence is not WAF; to enable WAF, there are several options:
    • you can enable ModSecurity in nginx-ingress (see the sketch after this list)
    • you can use Azure WAF or a 3rd party WAF
    • you can use a cloud-native WAF such as Twistlock (WAF is one of the features of this product; it also provides firewall and vulnerability assessment)
  • remote access to Kubernetes API: today, the API server is exposed via a public IP address; having the API server on a local IP will be available soon
  • remote access to Kubernetes hosts using SSH: only allow SSH on the private IP addresses; use a bastion host to enable connectivity
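For the first WAF option above, the nginx-ingress configuration exposes ModSecurity switches. A hedged Helm sketch, assuming an existing release called myingress and that the chart maps controller.config entries to the nginx-ingress ConfigMap:

helm upgrade myingress stable/nginx-ingress --namespace ingress --reuse-values \
  --set controller.config.enable-modsecurity="true" \
  --set controller.config.enable-owasp-modsecurity-crs="true"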

Conclusion

Azure Kubernetes Service (AKS) can be combined with Azure Firewall to control network traffic to and from your Kubernetes cluster. Log Analytics provides the dashboards and logs to report and alert on traffic patterns. Features such as threat intelligence provide an extra layer of defense. For HTTP/HTTPS workloads (so most workloads), you should complement the deployment with a WAF such as Azure Application Gateway or a 3rd party solution.

Further improvements to the IoT Hub to TimescaleDB Azure Function

In the post Improving an Azure Function that writes IoT Hub data to TimescaleDB, we added some improvements to an Azure Function that uses the Event Hub trigger to write messages from IoT Hub to TimescaleDB:

  • use of the Event Hub enqueuedTime timestamp instead of NOW() in the INSERT statement (yes, I know, using NOW() did not make sense 😉)
  • make the code idempotent to handle duplicates (basically do nothing when a unique constraint is violated)

In general, I prefer to use application time (time at the event publisher) versus the time the message was enqueued. If you don’t have that timestamp, enqueuedTime is the next best thing.

How can we optimize the function even further? Read on about the cardinality setting!

Event Hub trigger cardinality setting

Our JavaScript Azure Function has its settings in function.json. For reference, here is its content:

{
  "bindings": [
    {
      "type": "eventHubTrigger",
      "name": "IoTHubMessages",
      "direction": "in",
      "eventHubName": "hub-pg",
      "connection": "EH",
      "cardinality": "one",
      "consumerGroup": "pg"
    }
  ]
}

Clearly, the function uses the eventHubTrigger for an Event Hub called hub-pg. In connection, EH refers to an Application Setting that contains the connection string to the Event Hub. Yes, I excel at naming stuff! The Event Hub has a consumer group called pg that we use in this function.

The cardinality setting is currently set to “one”, which means that the function can only process one message at a time. As a best practice, you should use a cardinality of “many” in order to process batches of messages. A setting of “many” is the default.

To make the required change, modify function.json and set cardinality to “many”. You will also have to modify the Azure Function to process a batch of messages versus only one:

Processing batches of messages

With cardinality set to many, the IoTHubMessages parameter of the function is now an array. To retrieve the enqueuedTime from the messages, grab it from the enqueuedTimeUtcArray array using the index of the current message. Notice I also switched to JavaScript template literals to make the query a bit more readable.
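The screenshot above shows the actual function. Below is a minimal sketch of the same idea; the table and column names are hypothetical and the pg connection setup is simplified:

const { Pool } = require('pg'); // node-postgres
const pool = new Pool(); // connection details via PG* environment variables

module.exports = async function (context, IoTHubMessages) {
    // With cardinality "many", IoTHubMessages is an array of messages
    for (let i = 0; i < IoTHubMessages.length; i++) {
        const msg = IoTHubMessages[i];
        // Trigger metadata comes in arrays; the index lines up with the message
        const enqueuedTime = context.bindingData.enqueuedTimeUtcArray[i];
        // ON CONFLICT DO NOTHING keeps the insert idempotent for duplicates
        await pool.query(
            `INSERT INTO conditions (time, device, temperature, humidity)
             VALUES ($1, $2, $3, $4)
             ON CONFLICT DO NOTHING`,
            [enqueuedTime, msg.device, msg.temperature, msg.humidity]
        );
    }
};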

The number of messages in a batch is controlled by maxBatchSize in host.json. By default, it is set to 64. Another setting, prefetchCount, determines how many messages are retrieved and cached before being sent to your function. When you change maxBatchSize, it is recommended to set prefetchCount to twice the maxBatchSize setting. For instance:

{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 1,
      "eventProcessorOptions": {
        "maxBatchSize": 128,
        "prefetchCount": 256
      }
    }
  }
}

It’s great to have these options but how should you set them? As always, the answer is in this book:

"It depends" joke

A great resource to get a feel for what these settings do is this article. It also comes with a Power BI report that allows you to set the parameters to see the results of load tests.

Conclusion

In this post, we used the function.json cardinality setting of “many” to process a batch of messages per function call. By default, Azure Functions will use batches of 64 messages without prefetching. With the host.json settings of maxBatchSize and prefetchCount, that can be changed to better handle your scenario.
