Using the OAuth Client Credentials Flow

I often get questions about protecting applications like APIs using OAuth. I guess you know the drill:

  • you have to obtain a token (typically a JWT or JSON Web Token)
  • the client submits the token to your backend (via an Authorization HTTP header)
  • the token needs to be verified (do you trust it?)
  • you need to grab some fields (claims) from the token to use in your application.

When the client is a daemon or some server side process, you can use the client credentials grant flow to obtain the token from Azure AD. The flow works as follows:

OAuth Client Credentials Flow (image from Microsoft docs)

The client contacts the Azure AD token endpoint to obtain a token. The client request contains a client ID and client secret to properly authenticate to Azure AD as a known application. The token endpoint returns the token. In this post, I only focus on the access token which is used to access the resource web API. The client uses the access token in the Authorization header of requests to the API.

Let’s see how this works. Oh, and by the way, this flow requires Azure AD. Azure AD B2C does not support this type of flow (yet).

Create a client application in Azure AD

In Azure AD, create a new App Registration. This can be a standard app registration for Web APIs. You do not need a redirect URI, nor do you need to configure public clients or implicit grants.

Standard run of the mill app registration

In Certificates & secrets, create a client secret and write it down. It will not be shown again when you come back to this page later:

Yes, I set it to Never Expire!

From the Overview page, note the application ID (also called the client ID). You will need it later to request a token.

Why do we even create this application? It represents the client application that will call your APIs. With this application, you control the secret that the client application uses, as well as its access rights to the APIs, as we will see later. The client application will request a token, specifying the client ID and the client secret. Let’s now create another application that represents the backend API.

Create an API application in Azure AD

This is another App Registration, just like the app registration for the client. In this case, it represents the API. Its settings are a bit different though. There is no need to specify redirect URIs or other settings in the Authentication section. There is also no need for a client secret. We do want to use the Expose an API page though:

Expose API page

Make sure you note the Application ID URI. In the example above, it is api://06b2a484-141c-42d3-9d73-32bec5910b06 but you can change that to something more descriptive.

When you use the client credentials grant, you do not use user scopes. As such, the Scopes defined by this API list is empty. Instead, you want to use application roles which are defined in the manifest:

Application role in the manifest

There is one role here called invokeRole. You need to generate a GUID manually and use that as the id. Make sure allowedMemberTypes contains Application.

Great! But now we need to grant the client the right to obtain a token for one or more of the roles. You do that in the client application, in API Permissions:

Client application is granted access to the invokeRole application role of the API application

To grant the permission, just click Add a permission, select My APIs, click your API and select the role:

Selecting the role

Delegated permissions is greyed out because there are no user scopes. Application permissions is active because we defined an application role on the API application.

Obtaining a token

The server-side application only needs to make one call to the token endpoint to obtain the access token. Here is an example call with curl:

curl -d "grant_type=client_credentials&client_id=f1f695cb-2d00-4c0f-84a5-437282f3f3fd&client_secret=SECRET&audience=api%3A%2F%2F06b2a484-141c-42d3-9d73-32bec5910b06&scope=api%3A%2F%2F06b2a484-141c-42d3-9d73-32bec5910b06%2F.default" -X POST "https://login.microsoftonline.com/019486dd-8ffb-45a9-9232-4132babb1324/oauth2/v2.0/token"

Ouch, lots of gibberish here. Let’s break it down:

  • the POST needs to send URL encoded data in the body; curl’s -d takes care of that but you need to perform the URL encoding yourself
  • grant_type: client_credentials to indicate you want to use this flow
  • client_id: the application ID of the client app registration in Azure AD
  • client_secret: URL encoded secret that you generated when you created the client app registration
  • audience: the resource you want an access token for; it is the URL encoding of api://06b2a484-141c-42d3-9d73-32bec5910b06 as set in Expose an API
  • scope: this one is a bit special; for the v2 endpoint that we use here it needs to be api://06b2a484-141c-42d3-9d73-32bec5910b06/.default (but URL encoded); the scope (or roles) that the client application has access to will be included in the token

The POST goes to the Azure AD v2.0 token endpoint. There is also a v1 endpoint which would require other fields. See the Microsoft docs for more info. Note that I also updated the application manifests to issue v2 tokens via the accessTokenAcceptedVersion field (set to 2).

The call returns only an access token (there is no refresh token in the client credentials flow). Something like below with the token shortened:

{"token_type":"Bearer","expires_in":3600,"ext_expires_in":3600,"access_token":"eyJ0e..."}

The access_token can be decoded on https://jwt.ms:

Decoded token

Note that the invokeRole is present because the client application was granted access to that role. We also know the application ID that represents the API, which is in the aud field. The azp field contains the application ID of the client application.

Great, we can now use this token to call our API. The raw HTTP request looks like this:

GET https://somehost/calc/v1/add/1/1 HTTP/1.1 
Host: somehost 
Authorization: Bearer eyJ0e...
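In code, this is just a matter of adding the Authorization header to the request. A minimal Go sketch, with somehost and the shortened token as placeholders:

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    accessToken := "eyJ0e..." // the (shortened) access token obtained earlier

    // somehost is a placeholder for the host that fronts the API
    req, err := http.NewRequest("GET", "https://somehost/calc/v1/add/1/1", nil)
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer "+accessToken)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body))
}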

Of course, your application needs to verify the token somehow. This can be done in your application or in an intermediate layer such as API Management. We will take a look at how to do this with API Management in a later post.
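To give an idea of what such a check involves, the sketch below decodes the token payload and inspects the aud, azp and roles claims. Note that it deliberately skips signature validation to keep the example short: a real API must verify the token signature against the signing keys Azure AD publishes (the JWKS endpoint), typically with a JWT library, and check the expiry as well.

package main

import (
    "encoding/base64"
    "encoding/json"
    "fmt"
    "log"
    "strings"
)

func main() {
    accessToken := "eyJ0e..." // shortened; use a real token

    // A JWT consists of three base64url-encoded parts: header.payload.signature
    parts := strings.Split(accessToken, ".")
    if len(parts) != 3 {
        log.Fatal("this is not a JWT")
    }

    // WARNING: this only decodes the payload; it does NOT verify the signature.
    // A real API must validate the signature against Azure AD's published
    // signing keys (JWKS) and check the exp claim before trusting anything below.
    payload, err := base64.RawURLEncoding.DecodeString(parts[1])
    if err != nil {
        log.Fatal(err)
    }

    var claims struct {
        Aud   string   `json:"aud"`
        Azp   string   `json:"azp"`
        Roles []string `json:"roles"`
    }
    if err := json.Unmarshal(payload, &claims); err != nil {
        log.Fatal(err)
    }

    fmt.Println("audience (API app):", claims.Aud)
    fmt.Println("authorized party (client app):", claims.Azp)
    fmt.Println("roles (should contain invokeRole):", claims.Roles)
}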

Conclusion

Authentication, authorization and, on a broader scale, identity can be very challenging. Technically though, a flow such as the client credentials flow is fairly simple to implement once you have done it a few times. Hopefully, if you are/were struggling with this type of flow, this post has given you some pointers!

Giving linkerd a spin

A while ago, I gave linkerd a spin. Due to vacations and a busy schedule, I was not able to write about my experience. I will briefly discuss how to set up linkerd and then deploy a sample service to illustrate what it can do out of the box. Let’s go!

Wait! What is linkerd?

linkerd is essentially a network proxy for your Kubernetes pods, designed to be deployed as a service mesh. When the pods you care about have been injected with linkerd, you automatically get metrics such as latency and requests per second, a web portal to check those metrics, live inspection of traffic and much more. Below is an example of a Kubernetes namespace that has been meshed:

A meshed namespace; all deployments in this particular namespace are meshed which means all pods get the linkerd network proxy that provides the metrics and features such as encryption

Installation

I can be very brief about this: installation is about as simple as it gets. Simply navigate to https://linkerd.io/2/getting-started to get started. Here are the simplified steps:

  • Download the linkerd executable as described in the Getting Started guide; I used WSL for this
  • Create a Kubernetes cluster with AKS (or another provider); for AKS, use the Azure CLI to get your credentials (az aks get-credentials); make sure the Azure CLI is installed in WSL and that you have connected to your Azure subscription with az login
  • Make sure you can connect to your cluster with kubectl
  • Run linkerd check --pre to verify that the prerequisites are fulfilled
  • Install linkerd with linkerd install | kubectl apply -f -
  • Check the installation with linkerd check

The last step will nicely show its progress and end when the installation is complete:

linkerd check output

Exploring linkerd with the dashboard

linkerd automatically installs a dashboard. The dashboard is exposed as a Kubernetes service called linkerd-web. The service is of type ClusterIP. Although you could expose the service using an ingress, you can easily tunnel to the service with the following linkerd command (first line is the command; other lines are the output):

linkerd dashboard

Linkerd dashboard available at:
http://127.0.0.1:50750
Grafana dashboard available at:
http://127.0.0.1:50750/grafana
Opening Linkerd dashboard in the default browser
Failed to open Linkerd dashboard automatically
Visit http://127.0.0.1:50750 in your browser to view the dashboard

From WSL, the dashboard cannot be opened automatically, but you can browse to it manually. Note that linkerd also installs Prometheus and Grafana.

Out of the box, the linkerd deployment itself is meshed.

Adding linkerd to your own service

In this section, we will deploy a simple service that can add numbers and add linkerd to it. Although there are many ways to do this, I chose to create a separate namespace and enable auto-injection via an annotation. Here’s the yaml to create the namespace (add-ns.yaml):

apiVersion: v1
kind: Namespace
metadata:
  name: add
  annotations:
    linkerd.io/inject: enabled

Just run kubectl create -f add-ns.yaml to create the namespace. The annotation ensures that all pods added to the namespace get the linkerd proxy in the pod. All traffic to and from the pod will then pass through the proxy.

Now, let’s install the add service and deployment:

apiVersion: v1
kind: Service
metadata:
  name: add-svc
spec:
  ports:
  - port: 80
    name: http
    protocol: TCP
    targetPort: 8000
  - port: 8080
    name: grpc
    protocol: TCP
    targetPort: 8080
  selector:
    app: add
    version: v1
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: add
spec:
  replicas: 2
  selector:
    matchLabels:
      app: add
  template:
    metadata:
      labels:
        app: add
        version: v1
    spec:
      containers:
      - name: add
        image: gbaeke/adder

The deployment creates two pods with the gbaeke/adder image. To deploy the above, save it to a file (add.yaml) and run the following command:

kubectl create -f add.yaml -n add

Because the deployment uses the add namespace, the linkerd proxy will be added to each pod automatically. When you list the pods in the deployment, you see:

Each add pod has two containers: the actual add container based on gbaeke/adder and the proxy

To see more details about one of these pods, I can use the following command:

k get po add-5b48fcc894-2dc97 -o yaml -n add

You will clearly see the two containers in the output:

Two containers in the pod: actual service (gbaeke/adder) and the linkerd proxy

Generating some traffic

Let’s deploy a client that continuously uses the calculator service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: add-cli
spec:
  replicas: 1
  selector:
    matchLabels:
      app: add-cli
  template:
    metadata:
      labels:
        app: add-cli
    spec:
      containers:
      - name: add-cli
        image: gbaeke/adder-cli
        env:
        - name: SERVER
          value: "add-svc"

Save the above to add-cli.yaml and deploy with the below command:

kubectl create -f add-cli.yaml -n add

The deployment uses another image called gbaeke/adder-cli that continuously makes requests to the server specified in the SERVER environment variable.

Checking the deployment in the linkerd portal

When you now open the add namespace in the linkerd portal, you should see something similar to the screenshot below (note: I deployed 5 servers and 5 clients):

A view on the add namespace; linkerd has learned how the deployments talk to each other

The linkerd proxy in all pods sees all traffic. From the traffic, it can infer that the add-cli deployment talks to the add deployment. The add deployment receives about 150 requests per second. The 99th percentile latency is relatively high because the cluster nodes are very small, because I deployed more instances, and because the client is relatively inefficient.

When I click the deployment called add, the following screen is shown:

A view on the deployment

The deployment clearly shows where traffic is coming from plus relevant metrics such as RPS and P99 latency. You also get a view on the live calls. Note that the client uses gRPC, which results in HTTP POST requests. When you scroll down on this page, you get more information about the caller and a view on the individual pods:

A view on the inbound calls to the deployment plus a view on the pods

To see live calls in more detail, you can click the Tap icon:

A live view on the calls with Tap

For each call, details can be requested:

Request details

Conclusion

This was just a brief look at linkerd. It is trivially easy to install and, with auto-injection, very simple to add to your own services. I highly recommend giving it a spin to see where it can add value to your projects!

Azure SQL Database High Availability

Creating a SQL Database in Azure is a simple matter: create the database and server, get your connection string and off you go! Before starting though, spend some time thinking about the level of high availability (HA) that you want:

  • What is the required level of HA within the deployment region (e.g. West Europe)?
  • Do you require failover to another region (e.g. from West Europe to North Europe)?

HA in a single region

To achieve the highest level of availability in a region, do the following:

  • Use the Premium tier (DTU model) or the Business Critical tier (vCore model): Azure will use the premium availability model for your database
  • Enable Availability Zone support if the region supports it: copies of your database will be spread over the zones

The diagram below illustrates the premium availability model (from the Microsoft docs):

Premium Availability Model

The region will contain one primary read/write replica and several secondary replicas. Each replica uses local SSD storage. The database is replicated synchronously and failover is swift and without data loss. If required, you can enable a read replica and specify that you want to connect to it by adding ApplicationIntent=ReadOnly to the connection string. Great for reporting and BI!
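As an illustration, and assuming the go-mssqldb driver (the one implied by the Go connection example later in this post), a read-only connection could look like the sketch below; the server name and credentials are placeholders:

package main

import (
    "database/sql"
    "log"

    _ "github.com/denisenkom/go-mssqldb" // registers the "sqlserver" driver name
)

func main() {
    // Placeholder server and credentials; the relevant part is ApplicationIntent=ReadOnly,
    // which routes the connection to a readable secondary replica.
    connString := "server=myserver.database.windows.net;user id=USERNAME;password=PASSWORD;" +
        "port=1433;database=DBNAME;ApplicationIntent=ReadOnly;"

    readDB, err := sql.Open("sqlserver", connString)
    if err != nil {
        log.Fatal("Error creating connection pool: ", err.Error())
    }
    defer readDB.Close()
}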

Spreading the databases over multiple zones is as simple as checking a box. Availability zone support comes at no extra cost but can increase the write commit latency because the nodes are a few kilometers apart. The option to enable zone support is in the Configure section as shown below:

Enabling zone redundancy

To read more about high availability, including the standard availability model for other tiers, check out the docs.

For critical applications, we typically select the Premium/Business Critical model as it provides good performance coupled with the highest possible availability in a region.

Geo-replication

The geo-replication feature replicates a database asynchronously to another region. Suppose you have a database and server in West Europe that you want to replicate to France Central. In the portal, navigate to the database (not the server) and select Geo-Replication. Then check the region, in this case France Central. The following questions show up:

Geo-Replication

Every database lives in a logical server object. To replicate the database to France Central, you need such a server object in that region as well. The UI above allows you to create that server.

Note that the databases need to use the same tier, although the secondary can be configured with fewer DTUs or vCores. Doing so is generally not recommended.

After configuration, the UI will show the active replication. In this case, I am showing replication from North Europe to West Europe (and not France Central).

Geo-replication is easy to configure, but in practice we recommend using Failover Groups. A Failover Group uses geo-replication under the hood but gives you some additional features such as:

  • Automated failover (vs manual with just geo-replication)
  • Two server names (CNAMEs) to use in your connection string; these are updated automatically in case of failover: one always points to the read/write replica, the other to the read replica

Failover groups are created at the server level instead of the database level:

Failover group

Below, there is a failover group aks-fo with the primary server in North Europe and the secondary in West Europe:

Failover group details

You can manually fail over the database if needed:

Failover and forced failover

Failover allows you to fail over the database without data loss if both databases are still active. Forced failover performs a failover even if the primary is down, which might lead to data loss.

Note: when you configure a Failover Group, geo-replication will automatically be configured for you.

Connecting from an application

To connect to a database configured in a failover group, first get the failover group server names:

Read/write and read-only listener endpoints

Next, in your application, use the appropriate connection string. For instance, in Go:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    _ "github.com/denisenkom/go-mssqldb" // registers the "sqlserver" driver used below
)

var sqldb *sql.DB
var server = "aks-fo.database.windows.net"
var port = 1433
var user = "USERNAME"
var password = "PASSWORD"
var database = "DBNAME"

func init() {
    // Build connection string
    connString := fmt.Sprintf("server=%s;user id=%s;password=%s;port=%d;database=%s;",
        server, user, password, port, database)

    var err error

    // Create connection pool
    sqldb, err = sql.Open("sqlserver", connString)
    if err != nil {
        log.Fatal("Error creating connection pool: ", err.Error())
    }
    ctx := context.Background()

    //above commands actually do not connect to SQL but the ping below does
    err = sqldb.PingContext(ctx)
    if err != nil {
        log.Fatal(err.Error())
    }
    log.Printf("Connected!\n")
}

During a failover, there is a period of time during which the database is not available. When that happens, connections will fail with an error like the one below:

[db customers]: invalid response code 500, body: {"name":"fault","id":"7JPhcikZ","message":"Login error: mssql: Database 'aksdb' on server 'akssrv-ne' is not currently available.  Please retry the connection later.  If the problem persists, contact customer support, and provide them the session tracing ID of '6D7D70C3-D550-4A74-A69C-D689E6F5CDA6'.","temporary":false,"timeout":false,"fault":true}
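Because the listener CNAME simply starts pointing to the new primary, the usual remedy is to retry with some backoff instead of giving up on the first error. Below is a minimal sketch that continues the example above (it assumes the sqldb pool created in the init function and additionally imports time); production code would typically use a proper retry library and only retry transient errors:

// pingWithRetry retries the connectivity check a few times with a simple backoff.
// During a failover the database is briefly unavailable, so a single ping fails;
// retrying usually succeeds once the secondary has been promoted to primary.
func pingWithRetry(ctx context.Context, db *sql.DB, attempts int) error {
    var err error
    for i := 0; i < attempts; i++ {
        if err = db.PingContext(ctx); err == nil {
            return nil
        }
        log.Printf("ping failed (attempt %d/%d): %v", i+1, attempts, err)
        time.Sleep(time.Duration(i+1) * 2 * time.Second) // linear backoff: 2s, 4s, 6s, ...
    }
    return err
}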

Note: the Failover Group uses a CNAME record aks-fo.database.windows.net which resolves to the backend servers in either West or North Europe. Make sure you allow connections to these servers in the firewall or you will get the following error:

[db customers]: invalid response code 500, body: {"name":"fault","id":"-p9TwZkm","message":"Login error: mssql: Cannot open server 'akssrv-ne' requested by the login. Client with IP address 'IP ADDRESS' is not allowed to access the server.  To enable access, use the Windows Azure Management Portal or run sp_set_firewall_rule on the master database to create a firewall rule for this IP address or address range.  It may take up to five minutes for this change to take effect.","temporary":false,"timeout":false,"fault":true}

Conclusion

For the highest level of availability, use the regional premium availability model with Availability Zone support. In addition, use a Failover Group to enable replication of the database to another region. A Failover Group automatically connects your application to the primary (read/write) or secondary replica (read) via a CNAME and can failover automatically after some grace period.

Quick Tip: deploying multiple Traefik ingresses

For a customer that is developing a microservices application, the proposed architecture contains two Kubernetes ingresses:

  • internal ingress: exposed via an Azure internal load balancer, deployed in a separate subnet in the customer’s VNET; no need for SSL
  • external ingress: exposed via an external load balancer; SSL via Let’s Encrypt

The internal ingress exposes API endpoints via Azure API Management and its ability to connect to internal subnets. The external ingress exposes web applications via Azure Front Door.

The Ingress Controller of choice is Traefik. We use the Helm chart to deploy Traefik in the cluster. The example below uses Azure Kubernetes Service so I will refer to Azure objects such as VNETs, subnets, etc… Let’s get started!

Internal Ingress

In values.yaml, use ingressClass to set a custom class. For example:

kubernetes:
  ingressClass: traefik-int

When you do not set this value, the default ingressClass is traefik. When you define the ingress object, you refer to this class in your manifest via the annotation below:

 annotations:
    kubernetes.io/ingress.class: traefik-int

When we deploy the internal ingress, we need to tell Traefik to create an internal load balancer. Optionally, you can specify a subnet to deploy to. You can add these options under the service section in values.yaml:

service:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "traefik" 

The above setting makes sure that the annotations are set on the service that the Helm chart creates to expose Traefik to the “outside” world. The settings are not Traefik specific.

Above, we want Kubernetes to deploy the Azure internal load balancer to a subnet called traefik. That subnet needs to exist in the VNET that contains the Kubernetes subnet. Make sure that the AKS service principal has the necessary access rights to deploy the load balancer in the subnet. If it takes a long time to deploy the load balancer, use kubectl get events in the namespace where you deploy Traefik (typically kube-system).

If you want to provide a static IP address to the internal load balancer, you can do so via the loadBalancerIP setting near the top of values.yaml. You can use any free address in the subnet where you deploy the load balancer.

loadBalancerIP: 172.20.3.10 

All done! You can now deploy the internal ingress with:

helm install . --name traefik-int --namespace kube-system

Note that we install the Helm chart from our local file system and that we are in the folder that contains the chart and values.yaml. Hence the dot (.) in the command.

TIP: if you want to use a private DNS zone to resolve the internal services, see the private DNS section in Azure API Management and Azure Kubernetes Service. Private DNS zones are still in preview.

External ingress

The external ingress is simple now. Just set the ingressClass to traefik-ext (or leave it at the default of traefik, although that is less explicit) and remove the other settings. If you want a static public IP address, create such an address first and specify it in values.yaml. In an Azure context, you would create a public IP object in the resource group that contains your Kubernetes nodes.

Conclusion

If you need multiple ingresses of the same type or brand, use distinct values for ingressClass and reference the class in your ingress manifest file. Naturally, when you use two different solutions, say Kong for APIs and Traefik for web sites, you do not need to do that since they use different ingressClass values by default (kong and traefik). Hope this quick tip was useful!

Inspecting Web Application Firewall logs

In some of my previous posts, I talked about Azure Front Door and Web Application Firewall policies to protect a workload like one or more APIs running on Kubernetes or App Service. Although I enabled the Web Application Firewall policies, I did not show what happens when the rules are triggered. Let’s take a look at that! 🕶

Before we get started though, take the following diagram into account:

Azure web application firewall
From: https://docs.microsoft.com/en-us/azure/frontdoor/waf-overview

WAF for Front Door is a global solution. You create a WAF policy in the portal or via other means and attach it to a Front Door frontend. Rules are evaluated and acted upon at the edge versus on your application server.

Azure WAF supports custom rules and Azure-managed rule sets (based on OWASP). The custom rules are interesting because they allow you to restrict IP addresses, configure geographic based access control and more.

There’s an additional rule type called bot protection rule as well. At the time of this writing (beginning June 2019) this feature is in public preview. It uses the Microsoft Intelligent Security Graph to do its magic, similarly to Azure Firewall when you enable Threat Intelligence.

WAF Logs

Let’s first use a tool that can scan an endpoint for vulnerabilities to trigger the WAF rules. One such tool is OWASP ZAP, which you need to install on your workstation.

OWASP ZAP tool

Before we check the logs, note we have set the policy to Detection:

WAF policy set to Detection; start with detection to learn what the rules might block in your app

Now let’s take a look at the logs. Use the following query in Log Analytics and modify it for your own host (host_s field):

AzureDiagnostics
| where ResourceType == "FRONTDOORS" and Category == "FrontdoorWebApplicationFirewallLog"
| where action_s == "Block" 
| where host_s == "api.baeke.info"  

The result:

Blocked requests (if the policy were set at Prevention at the global level)

Let’s look at a SQL Injection block:

Typical SQL Injection

The decoded requestUri_s is https://api.baeke.info:443/users?apikey=theapikey' AND '1'='1' --. Typical! It was blocked at the edge. This request went via the BRU location.

As with any Log Analytics query, you can set alerts on log occurrences. To do so, you need to be in the Log Analytics workspace, not in the Logs section of Azure Front Door.

Conclusion

Azure Web Application Firewall policies for Azure Front Door integrate with Azure Monitor and Log Analytics, like most other Azure services. With some KQL, the query language for Log Analytics, it is straightforward to request the logs and set alerts on them.

Azure Front Door and multi-region deployments

In the previous post, we looked at publishing and securing an API with Azure Front Door and Azure Web Application Firewall. The API ran on Kubernetes, exposed by Kong and Kong Ingress Controller. Kong was configured to require an API key to call the /users API, allowing us to identify the consumer of the API. The traffic flow was as follows:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer) -- HTTP --> API Kubernetes service --> API pods 

Although Kubernetes makes the API(s) highly available, you might want to take extra precautions such as deploying the API in multiple regions. In this post, we will take a look at doing so. That means we will deploy the API in both West and North Europe, in two distinct Kubernetes clusters:

  • we-clu: Kubernetes cluster in West Europe
  • ne-clu: Kubernetes cluster in North Europe

The flow is very similar of course:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer; region to connect to depends on Front Door configuration and health probes) -- HTTP --> API Kubernetes service --> API pods  

Let’s take a look at the configuration! By the way, the supporting files to deploy the Kubernetes objects are here: https://github.com/gbaeke/api-kong/tree/master. To deploy Kong, check out this post.

Kubernetes

We deploy a Kubernetes cluster in each region, install Helm, deploy Kong, deploy our API and configure ingresses and related Kong custom resource definitions (CRDs). The result is an external IP address in each region that leads to the Kong proxy. Search for “kong” on this blog to find posts with more details about this deployment.

Note that the API deployment specifies an environment variable that will contain the string WE or NE. This environment variable will be displayed in the output when we call the API. Here is the API deployment for West Europe:

apiVersion: v1
kind: Service
metadata:
  name: func
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: func
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: func
spec:
  replicas: 2
  selector:
    matchLabels:
      app: func
  template:
    metadata:
      labels:
        app: func
    spec:
      containers:
      - name: func
        image: gbaeke/ingfunc
        env:
        - name: REGION
          value: "WE"
        ports:
        - containerPort: 80 

When we call the API and Azure Front Door uses the backend in West Europe, the result will be:

WE included in the response from the West Europe cluster
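The gbaeke/ingfunc image itself is not shown in this post, but conceptually all the API has to do is read the REGION environment variable and echo it in its response. The Go sketch below is purely illustrative (it is not the actual gbaeke/ingfunc code, and the route and port are assumptions):

package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    // REGION is injected via the deployment manifest ("WE" or "NE")
    region := os.Getenv("REGION")

    http.HandleFunc("/users", func(w http.ResponseWriter, r *http.Request) {
        // Echo the region so we can see which cluster served the request
        fmt.Fprintf(w, "hello from region %s\n", region)
    })

    log.Fatal(http.ListenAndServe(":80", nil))
}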

Origin APIs

The origin APIs need to be exposed on the public Internet using a DNS name. Azure Front Door requires a public backend to connect to. Naturally, the backend can be configured to only accept incoming requests from Front Door. In our case, the APIs are available on the public IP of the Kong proxy. The following names were used:

  • api-o-we.baeke.info: Kong proxy in West Europe
  • api-o-ne.baeke.info: Kong proxy in North Europe

Both endpoints are configured to accept TLS connections only, and use a Let’s Encrypt wildcard certificate for *.baeke.info.

Front Door Configuration

The Front Door designer looks the same as in the previous post:

Front Door Designer

However, the backend pool api-o now has two backends:

Two backend hosts, both enabled with same priority and weight

To determine the health of the backend, Front Door needs to be configured with a health probe that returns status code 200. If we were to specify the probe below, the health probe would fail:

Errrrrrr, this won’t work

The health probe would hit Kong’s proxy and return a 404 (Not Found). We did not create a route for /, only for /users. With Azure Front Door, when all health probes fail, all backends are considered healthy. Yes, you read that right.

Although we could create a route called /health that returns a 200, we will use the following probe just to make it work:

Fixing the health probe (quick and dirty fix); just call the /users API

If you are exposing multiple APIs on each cluster, the health probe above would not make sense. Also note that the purpose of the health probe is to determine if the cluster is up or not. It will not fix one API behaving badly or being removed accidentally!

You can check the health probes from the portal:

Yep! Health probes in West and North Europe are at 100%

Connection test

When I connect from my home laptop in Belgium, I get the following response:

Connection to West Europe cluster

When I connect from my second home in Dublin 🤷‍♀️ I get:

Connection from a VM in North Europe (I was kidding about the second home)

If you enable logging to Log Analytics, you can check this in the FrontDoorAccessLog:

Connection from home via Brussels (West Europe would show DB in Tenant_x)

When I remove the /users API in West Europe (kubectl delete deploy func), my home laptop will connect to North Europe as expected:

I didn’t fake this! It’s 100% real, my laptop now connects to the North Europe cluster via Front Door as expected

Note that calls will not start failing over the moment you delete the /users API (which doubles as the health probe here). How quickly that happens depends on the following setting (backends in the Front Door designer):

When should the backend be considered healthy or unhealthy; decrease the sample size and/or required successful samples to make it go faster

The backend health percentage graph indicates the probe failure as well:

We’ve lost West Europe folks!

Conclusion

When you are going for a multi-region deployment of services, Azure Front Door is one of the options. Of course, there is much more to a multi-region deployment than the “front-end stuff” described in this post. What do you do with databases for instance? Can you use active-active write regions (e.g. Cosmos DB) or does your database only support active/passive with read replicas?

As in other load balancing and fail over solutions, proper health probes are crucial in the design. Think about what a good health probe can be and what it means when it is not available. One option is to just write a health probe exposed via an endpoint such as /health that merely returns a 200 status code. But your health probe could also be designed to connect to backend systems such as databases or queues to determine the health of the system.
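As a quick illustration, a bare-bones /health endpoint in Go could look like the sketch below; a deeper probe would replace the static 200 response with checks against dependencies such as a database or queue:

package main

import (
    "log"
    "net/http"
)

func main() {
    // Bare-bones health endpoint: always returns 200 OK.
    // A smarter probe would check dependencies (database, queue, ...) here and
    // return a 5xx status code when they are unreachable.
    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("OK"))
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}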

Hopefully, this post gives you some ideas to start! Follow me on Twitter for updates.

Publishing and securing your API with Kong and Azure Front Door

In the post, Securing your API with Kong and CloudFlare, I exposed a dummy API on Kubernetes with Kong and published it securely with CloudFlare. The breadth of features and its ease of use made CloudFlare a joy to work with. It didn’t take long before I got the question: “can’t you do that with Azure only?”. The answer is obvious: “Of course you can!”

In this post, the traffic flow is as follows:

Consumer -- HTTPS --> Azure Front Door with WAF policy -- HTTPS --> Kong (exposed with Azure Load Balancer) -- HTTP --> API Kubernetes service --> API pods

Similarly to CloudFlare, Azure Front Door provides a fully trusted certificate for consumers of the API. In contrast to CloudFlare, Azure Front Door does not provide origin certificates which are trusted by Front Door. That’s easy to solve though by using a fully trusted Let’s Encrypt certificate which is stored as a Kubernetes secret and used in the Kubernetes Ingress definition. For this post, I requested a wildcard certificate for *.baeke.info via https://www.sslforfree.com/

Let’s take it step-by-step, starting at the API and Kong level.

APIs and Kong

Just like in the previous posts, we have a Kubernetes service called func and back-end pods that host the API implemented via Azure Functions in a container. Below you see the API pods in the default namespace. For convenience, Kong is also deployed in that namespace (not recommended in production):

A view on the API pods and Kong via k9s

The ingress definition is shown below:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  namespace: default
  annotations:
    kubernetes.io/ingress.class: kong
    plugins.konghq.com: http-auth
spec:
  tls:
  - hosts:
    - api-o.baeke.info
    secretName: wildcard-baeke.info.tls
  rules:
    - host: api-o.baeke.info
      http:
        paths:
        - path: /users
          backend:
            serviceName: func
            servicePort: 80 

Kong will pick up the above definition and configure itself accordingly.

The API is exposed publicly via https://api-o.baeke.info where the o stands for origin. The secret wildcard-baeke.info.tls refers to a secret which contains the wildcard certificate for *.baeke.info:

apiVersion: v1
kind: Secret
metadata:
  name: wildcard-baeke.info.tls
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: certificate
  tls.key: key

Naturally, certificate and key should be replaced with the base64-encoded strings of the certificate and key you have obtained (in this case from https://www.sslforfree.com).

At the DNS level, api-o.baeke.info should refer to the external IP address of the exposed Kong Ingress Controller (proxy):

The service kong-kong-proxy is exposed via a public IP address (service of type LoadBalancer)

For the rest, the Kong configuration is not very different from the configuration in Securing your API with Kong and CloudFlare. I did remove the whitelisting configuration, which needs to be updated for Azure Front Door.

Great, we now have our API listening on https://api-o.baeke.info but it is not exposed via Azure Front Door and it does not have a WAF policy. Let’s change that.

Web Application Firewall (WAF) Policy

You can create a WAF policy from the portal:

WAF Policy

The above policy is set to detection only. No custom rules have been defined, but a managed rule set is activated:

Managed rule set for OWASP

The WAF policy was saved as baekeapiwaf. It will be attached to an Azure Front Door frontend. When a policy is attached to a frontend, it will be shown in the policy:

Associated frontends (Front Door front-ends)

Azure Front Door

We will now add Azure Front Door to obtain the following flow:

Consumer ---> https://api.baeke.info (Front Door + WAF) --> https://api-o.baeke.info

The final configuration in Front Door Designer looks like this:

Front Door Designer

When a request comes in for api.baeke.info, the response from api-o.baeke.info is served. Caching was not enabled. The frontend and backend are tied together via the routing rule.

The first thing you need to do is to add the azurefd.net frontend which is baeke-api.azurefd.net in the above config. There’s not much to say about that. Just click the blue plus next to Frontend hosts and follow the prompts. I did not attach a WAF policy to that frontend because it will not forward requests to the backend. We will use a custom domain for that.

Next, click the blue plus again to add the custom domain (here api.baeke.info). In your DNS zone, create a CNAME record that maps api.yourdomain.com to the azurefd.net name:

Mapping of custom domain to azurefd.net domain in CloudFlare DNS

I attached the WAF policy baekeapiwaf to the front-end domain:

WAF policy with OWASP rules to protect the API

Next, I added a certificate. When you select Front Door managed, you will get a DigiCert-managed certificate. If the CNAME mapping is not complete, you will get an e-mail from DigiCert to approve certificate issuance. Make sure you check your e-mails if it takes long to issue the certificate. It will take a long time either way, so be patient! 💤💤💤

Now that we have the frontend, specify the backend that Front Door needs to connect to:

Backend pool

The backend pool uses the API exposed at api-o.baeke.info as defined earlier. With only one backend, priority and weight are of no importance. It should be clear that you can add multiple backends, potentially in different regions, and load balance between them.

You will also need a health probe to check for healthy and unhealthy backends:

Health probes of the backend

Note that the above health check does NOT return a 200 OK status code. That is the only status code that would result in a healthy endpoint. With the above config, Kong will respond with a “no Route matched” 404 Not Found error instead. That does not mean that Front Door will not route to this endpoint though! When all endpoints are in a failed state, Front Door considers them healthy anyway 😲😲😲 and routes traffic using round-robin. See the documentation for more info.

Now that we have the frontend and the backend, let’s tie the two together with a rule:

First part of routing rule

In the first part of the rule, we specify that we listen for requests to api.baeke.info (and not the azurefd.net domain) and that we only accept https. The pattern /* basically forwards everything to the backend.

In the route details, we specify the backend to route to:

Backend to route to

Clearly, we want to route to the api-o backend we defined earlier. We only connect to the backend via HTTPS. It only accepts HTTPS anyway, as defined at the Kong level via a KongIngress resource.

Note that it is possible to create an HTTP to HTTPS redirect rule. See the post Azure Front Door Revisited for more information. Without the rule, you will get the following warning:

Please disregard this warning 😎

Test, test, test

Let’s call the API via the http tool:

Clearly, Azure Front Door has served this request as indicated by the X-Azure-Ref header. Let’s try http:

Azure Front Door throws the above error because the routing rule only accepts https on api.baeke.info!

Whitelisting Azure Front Door

To restrict calls to the backend to Azure Front Door, I used the following KongPlugin definition:

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: whitelist-fd
  namespace: default
config:
  whitelist: 
  - 147.243.0.0/16
plugin: ip-restriction 

The IP range is documented here. Note that the IP range can and probably will change in the future.

In the ingress definition, I added the plugin via the annotations:

annotations:
  kubernetes.io/ingress.class: kong
  plugins.konghq.com: http-auth, whitelist-fd 

Calling the backend API directly will now fail:

That’s a no no! Please use the Front Door!

Conclusion

Publishing APIs (or any web app), whether they are running on Kubernetes or other systems, is easy to do with the combination of Azure Front Door and Web Application Firewall policies. Do take pricing into account though. It’s a mixture of relatively low fixed prices with variable pricing per GB and requests processed. In general, CloudFlare has the upper hand here, from both a pricing and features perspective. On the other hand, Front Door has advantages when it comes to automating its deployment together with other Azure resources. As always: plan, plan, plan and choose wisely! 🦉