Taking Azure Container Apps for a spin

At Ignite November 2021, Microsoft released Azure Container Apps as a public preview. It allows you to run containerized applications on a serverless platform, in the sense that you do not have to worry about the underlying infrastructure.

The underlying infrastructure is Kubernetes (AKS) as the control plane with additional software such as:

  • Dapr: distributed application runtime to easily work with state, pub/sub and other Dapr building blocks
  • KEDA: Kubernetes event-driven autoscaler so you can use any KEDA supported scaler, in addition to scaling based on HTTP traffic, CPU and memory
  • Envoy: used to provide ingress functionality and traffic splitting for blue-green deployment, A/B testing, etc…

Your apps actually run on Azure Container Instances (ACI). ACI was always meant to be used as raw compute to build platforms with and this is a great use case.

Note: there is some discussion in the community whether ACI (via AKS virtual nodes) is used or not; I will leave it in for now but in the end, it does not matter too much as the service is meant to hide this complexity anyway

Azure Container Apps does not care about the runtime or programming model you use. Just use whatever feels most comfortable and package it as a container image.

In this post, we will deploy an application that uses Dapr to save state to Cosmos DB. Along the way, we will explain most of the concepts you need to understand to use Azure Container Apps in your own scenarios. The code I am using is on GitHub and written in Go.

Configure the Azure CLI

In this post, we will use the Azure CLI exclusively to perform all the steps. Instead of the Azure CLI, you can also use ARM templates or Bicep. If you want to play with a sample that deploys multiple container apps and uses Bicep, be sure to check out this great Azure sample.

You will need to have the Azure CLI installed and also add the Container Apps extension:

az extension add \
  --source https://workerappscliextension.blob.core.windows.net/azure-cli-extension/containerapp-0.2.0-py2.py3-none-any.whl

The extension allows you to use commands like az containerapp create and az containerapp update.

Create an environment

An environment runs one or more container apps. A container app can run multiple containers and can have revisions. If you know how Kubernetes works, each revision of a container app is actually a scaled collection of Kubernetes pods, using the scalers discussed above. Each revision can be thought of as a separate Kubernetes Deployment/ReplicaSet that runs a specific version of your app. Whenever you modify your app, depending on the type of modification, you get a new revision. You can have multiple active revisions and set traffic weights to distribute traffic as you wish.

Container apps, revisions, pods, and containers

Note that above, although you see multiple containers in a pod in a revision, that is not the most common use case. Most of the time, a pod will have only one application container. That is entirely up to you and the rationale behind using one or more containers is similar to multi-container pods in Kubernetes.

To create an environment, be sure to register or re-register the Microsoft.Web provider. That provider has the kubeEnvironments resource type, which represents a Container App environment.

az provider register --namespace Microsoft.Web

Next, create a resource group:

az group create --name rg-dapr --location northeurope

I have chosen North Europe here, but the location of the resource group does not really matter. What does matter is that you create the environment in either North Europe or Canada Central at this point in time (November 2021).

Every environment needs to be associated with a Log Analytics workspace. You can use that workspace later to view the logs of your container apps. Let’s create such a workspace in the resource group we just created:

az monitor log-analytics workspace create \
  --resource-group rg-dapr \
  --workspace-name dapr-logs

Next, we want to retrieve the workspace client id and secret. We will need that when we create the Container Apps environment. Commands below expect the use of bash:

LOG_ANALYTICS_WORKSPACE_CLIENT_ID=`az monitor log-analytics workspace show --query customerId -g rg-dapr -n dapr-logs --out tsv`
LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET=`az monitor log-analytics workspace get-shared-keys --query primarySharedKey -g rg-dapr -n dapr-logs --out tsv`

Now we can create the environment in North Europe:

az containerapp env create \
  --name dapr-ca \
  --resource-group rg-dapr \
  --logs-workspace-id $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --logs-workspace-key $LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET \
  --location northeurope

The Container App environment shows up in the portal like so:

Container App Environment in the portal

There is not a lot you can do in the portal, besides listing the apps in the environment. Provisioning an environment is extremely quick, in my case a matter of seconds.

Deploying Cosmos DB

We will deploy a container app that uses Dapr to write key/value pairs to Cosmos DB. Let’s deploy Cosmos DB:

uniqueId=$RANDOM
az cosmosdb create \
  --name dapr-cosmosdb-$uniqueId \
  --resource-group rg-dapr \
  --locations regionName='northeurope'

az cosmosdb sql database create \
    -a dapr-cosmosdb-$uniqueId \
    -g rg-dapr \
    -n dapr-db

az cosmosdb sql container create \
    -a dapr-cosmosdb-$uniqueId \
    -g rg-dapr \
    -d dapr-db \
    -n statestore \
    -p '/partitionKey' \
    --throughput 400

The above commands create the following resources:

  • A Cosmos DB account in North Europe: note that this uses session-level consistency (remember that for later in this post 😉)
  • A Cosmos DB database that uses the SQL API
  • A Cosmos DB container in that database, called statestore (can be anything you want)

In Cosmos DB Data Explorer, you should see:

statestore collection will be used as a State Store in Dapr

Deploying the Container App

We can use the following command to deploy the container app and enable Dapr on it:

az containerapp create \
  --name daprstate \
  --resource-group rg-dapr \
  --environment dapr-ca \
  --image gbaeke/dapr-state:1.0.0 \
  --min-replicas 1 \
  --max-replicas 1 \
  --enable-dapr \
  --dapr-app-id daprstate \
  --dapr-components ./components-cosmosdb.yaml \
  --target-port 8080 \
  --ingress external

Let’s unpack what happens when you run the above command:

  • A container app daprstate is created in environment dapr-ca
  • The container app will have an initial revision (revision 1) that runs one container in its pod; the container uses image gbaeke/dapr-state:1.0.0
  • We turn off scaling by setting min and max replicas to 1
  • We enable ingress with the type set to external. That configures a public IP address and DNS name to reach our container app on the Internet; Envoy proxy is used under the hood to achieve this; TLS is automatically configured but we do need to tell the proxy the port our app listens on (–target-port 8080)
  • Dapr is enabled and requires that our app gets a Dapr id (–enable-dapr and –dapr-app-id daprstate)

Because this app uses the Dapr SDK to write key/value pairs to a state store, we need to configure this. That is were the –dapr-components parameter comes in. The component is actually defined in a file components-cosmosdb.yaml:

- name: statestore
  type: state.azure.cosmosdb
  version: v1
  metadata:
    - name: url
      value: YOURURL
    - name: masterkey
      value: YOURMASTERKEY
    - name: database
      value: YOURDB
    - name: collection
      value: YOURCOLLECTION

In the file, the name of our state store is statestore but you can choose any name. The type has to be state.azure.cosmosdb which requires the use of several metadata fields to specify the URL to your Cosmos DB account, the key to authenticate, the database, and collection.

In the Go code, the name of the state store is configurable via environment variables or arguments and, by total coincidence, defaults to statestore 😉.

func main() {
	fmt.Printf("Welcome to super api\n\n")

	// flags
	... code omitted for brevity
	// State store name
	f.String("statestore", "statestore", "State store name")

The flag is used in the code that writes to Cosmos DB with the Dapr SDK (s.config.Statestore in the call to daprClient.SaveState below):

// write data to Dapr statestore
	ctx := r.Context()
	if err := s.daprClient.SaveState(ctx, s.config.Statestore, state.Key, []byte(state.Data)); err != nil {
		w.WriteHeader(http.StatusInternalServerError)
		fmt.Fprintf(w, "Error writing to statestore: %v\n", err)
		return
	} else {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintf(w, "Successfully wrote to statestore\n")
	}

After running the az containerapp create command, you should see the following output (redacted):

{
  "configuration": {
    "activeRevisionsMode": "Multiple",
    "ingress": {
      "allowInsecure": false,
      "external": true,
      "fqdn": "daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io",
      "targetPort": 8080,
      "traffic": [
        {
          "latestRevision": true,
          "revisionName": null,
          "weight": 100
        }
      ],
      "transport": "Auto"
    },
    "registries": null,
    "secrets": null
  },
  "id": "/subscriptions/SUBID/resourceGroups/rg-dapr/providers/Microsoft.Web/containerApps/daprstate",
  "kind": null,
  "kubeEnvironmentId": "/subscriptions/SUBID/resourceGroups/rg-dapr/providers/Microsoft.Web/kubeEnvironments/dapr-ca",
  "latestRevisionFqdn": "daprstate--6sbsmip.politegrass-37c1a51f.northeurope.azurecontainerapps.io",
  "latestRevisionName": "daprstate--6sbsmip",
  "location": "North Europe",
  "name": "daprstate",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-dapr",
  "tags": null,
  "template": {
    "containers": [
      {
        "args": null,
        "command": null,
        "env": null,
        "image": "gbaeke/dapr-state:1.0.0",
        "name": "daprstate",
        "resources": {
          "cpu": 0.5,
          "memory": "1Gi"
        }
      }
    ],
    "dapr": {
      "appId": "daprstate",
      "appPort": null,
      "components": [
        {
          "metadata": [
            {
              "name": "url",
              "secretRef": "",
              "value": "https://ACCOUNTNAME.documents.azure.com:443/"
            },
            {
              "name": "masterkey",
              "secretRef": "",
              "value": "MASTERKEY"
            },
            {
              "name": "database",
              "secretRef": "",
              "value": "dapr-db"
            },
            {
              "name": "collection",
              "secretRef": "",
              "value": "statestore"
            }
          ],
          "name": "statestore",
          "type": "state.azure.cosmosdb",
          "version": "v1"
        }
      ],
      "enabled": true
    },
    "revisionSuffix": "",
    "scale": {
      "maxReplicas": 1,
      "minReplicas": 1,
      "rules": null
    }
  },
  "type": "Microsoft.Web/containerApps"
}

The output above gives you a hint on how to define the Container App in an ARM template. Note the template section. It defines the containers that are part of this app. We have only one container with default resource allocations. It is possible to set environment variables for your containers but there are none in this case. We will set one later.

Also note the dapr section. It defines the app’s Dapr id and the components it can use.

Note: it is not a good practice to enter secrets in configuration files as we did above. To fix that:

  • add a secret to the Container App in the az containerapp create command via the --secrets flag. E.g. --secrets cosmosdb='YOURCOSMOSDBKEY'
  • in components-cosmosdb.yaml, replace value: YOURMASTERKEY with secretRef: cosmosdb

The URL for the app is https://daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io. When I browse to it, I just get a welcome message: Hello from Super API on Container Apps.

Every revision also gets a URL. The revision URL is https://daprstate–6sbsmip.politegrass-37c1a51f.northeurope.azurecontainerapps.io. Of course, this revision URL gives the same result. Our app has only one revision.

Save state

The application has a /state endpoint you can post a JSON payload to in the form of:

{
  "key": "keyname",
  "data": "datatostoreinkey"
}

We can use curl to try this:

curl -v -H "Content-type: application/json" -d '{ "key": "cool","data": "somedata"}' 'https://daprstate.politegrass-37c1a51f.northeurope.azurecontainerapps.io/state'

Trying the curl command will result in an error because Dapr wants to use strong consistency with Cosmos DB and we configured it for session-level consistency. That is not very relevant for now as that is related to Dapr and not Container Apps. Switching the Cosmos DB account to strong consistency will fix the error.

Update the container app

Let’s see what happens when we update the container app. We will add an environment variable WELCOME to change the welcome message that the app displays. Run the following command:

az containerapp update \
  --name daprstate \
  --resource-group rg-dapr \
  --environment-variables WELCOME='Hello from new revision'

The template section in the JSON output is now:

"template": {
    "containers": [
      {
        "args": null,
        "command": null,
        "env": [
          {
            "name": "WELCOME",
            "secretRef": null,
            "value": "Hello from new revision"
          }
        ],
        "image": "gbaeke/dapr-state:1.0.0",
        "name": "daprstate",
        "resources": {
          "cpu": 0.5,
          "memory": "1Gi"
        }
      }
    ]

It is important to realize that, when the template changes, a new revision will be created. We now have two revisions, reflected in the portal as below:

Container App with two revisions

The new revision is active and receives 100% of the traffic. When we hit the / endpoint, we get Hello from new revision.

The idea here is that you deploy a new revision and test it before you make it active. Another option is to send a small part of the traffic to the new revision and see how that goes. It’s not entirely clear to me how you can automate this, including automated tests, similar to how progressive delivery controllers like Argo Rollouts and Flagger work. Tip to the team to include this! 😉

The az container app create and update commands can take a lot of parameters. Use az container app update –help to check what is supported. You will also see several examples.

Check the logs

Let’s check the container app logs that are sent to the Log Analytics workspace attached to the Container App environment. Make sure you still have the log analytics id in $LOG_ANALYTICS_WORKSPACE_CLIENT_ID:

az monitor log-analytics query   --workspace $LOG_ANALYTICS_WORKSPACE_CLIENT_ID   --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'daprstate' | project ContainerAppName_s, Log_s, TimeGenerated | take 50"   --out table

This will display both logs from the application container and the Dapr logs. One of the log entries shows that the statestore was successfully initialized:

... msg="component loaded. name: statestore, type: state.azure.cosmosdb/v1"

Conclusion

We have only scratched the surface here but I hope this post gave you some insights into concepts such as environments, container apps, revisions, ingress, the use of Dapr and logging. There is much more to look at such as virtual network integration, setting up scale rules (e.g. KEDA), automated deployments, and much more… Stay tuned!

DNS Options for Private Azure Kubernetes Service

When you deploy Azure Kubernetes Service (AKS), by default the API server is publicly made available. That means it has a public IP address and an Azure-assigned name that’s resolvable by public DNS servers. To secure access, you can use authorized IP ranges.

As an alternative, you can deploy a private AKS cluster. That means the AKS API server gets an IP address in a private Azure virtual network. Most customers I work with use this option to comply with security policies. When you deploy a private AKS cluster, you still need a fully qualified domain name (FQDN) that resolves to the private IP address. There are several options you can use:

  • System (the default option): AKS creates a Private DNS Zone in the Node Resource Group; any virtual network that is linked to that Private DNS Zone can resolve the name; the virtual network used by AKS is automatically linked to the Private DNS Zone
  • None: default to public DNS; AKS creates a name for your cluster in a public DNS zone that resolves to the private IP address
  • Custom Private DNS Zone: AKS uses a Private DNS Zone that you or another team has created beforehand; this is mostly used in enterprise scenarios when the Private DNS Zones are integrated with custom DNS servers (e.g., on AD domain controllers, Infoblox, …)

The first two options, System and None, are discussed in the video below:

Overview of the 3 DNS options with a discussion of the first two: System and None

The third option, custom Private DNS Zone, is discussed in a separate video:

Private AKS with a custom Private DNS Zone

With the custom DNS option, you cannot use any name you like. The Private DNS Zone has to be like: privatelink.<region>.azmk8s.io. For instance, if you deploy your AKS cluster in West Europe, the Private DNS Zone’s name should be privatelink.westeurope.azmk8s.io. There is an option to use a subdomain as well.

When you use the custom DNS option, you also need to use a user-assigned Managed Identity for the AKS control plane. To make the registration of the A record in the Private DNS Zone work, in addition to linking the Private DNS Zone to the virtual network, the managed identity needs the following roles (at least):

  • Private DNS Zone Contributor role on the Private DNS Zone
  • Network Contributor role on the virtual network used by AKS

To deploy a private AKS cluster with a custom Private DNS Zone, you can use the following Azure CLI command which also sets the network plugin to azure (as an example). Private cluster also works with kubenet if you prefer that model. For other examples, see Create a private Azure Kubernetes Service cluster – Azure Kubernetes Service | Microsoft Docs.

az aks create \
    --resource-group RGNAME \
    --name aks-private \
    --network-plugin azure \
    --vnet-subnet-id "resourceId of AKS subnet" \
    --docker-bridge-address 172.17.0.1/16 \
    --dns-service-ip 10.3.0.10 \
    --service-cidr 10.3.0.0/24 \
    --enable-managed-identity \
    --assign-identity "resourceId of user-assigned managed identity" \
    --enable-private-cluster \
    --load-balancer-sku standard \
    --private-dns-zone "resourceId of Private DNS Zone"

The option that is easiest to use is the None option. You do not have to worry about Private DNS Zones and it just works. That option, at the time of this writing (June 2021) is still in preview and needs to be enabled on your subscription. In most cases though, I see enterprises go for the third option where the Private DNS Zones are created beforehand and integrated with custom DNS.

Approving a private endpoint connection with Azure CLI

In my previous post, I wrote about App Services with Private Link and used Azure Front Door to publish the web app. Azure Front Door Premium (in preview), can create a Private Endpoint and link it to your web app via Azure Private Link. When that happens, you need to approve the pending connection in Private Link Center.

The pending connection would be shown here, ready for approval

Although this is easy to do, you might want to automate this approval. Automation is possible via a REST API but it is easier via Azure CLI.

To do so, first list the private endpoint connections of your resource, in my case that is a web app:

az network private-endpoint-connection list --id /subscriptions/SUBID/resourceGroups/RGNAME/providers/Microsoft.Web/sites/APPSERVICENAME

The above command will return all private endpoint connections of the resource. For each connection, you get the following information:

 {
    "id": "PE CONNECTION ID",
    "location": "East US",
    "name": "NAME",
    "properties": {
      "ipAddresses": [],
      "privateEndpoint": {
        "id": "PE ID",
        "resourceGroup": "RESOURCE GROUP NAME OF PE"
      },
      "privateLinkServiceConnectionState": {
        "actionsRequired": "None",
        "description": "Please approve this connection.",
        "status": "Pending"
      },
      "provisioningState": "Pending"
    },
    "resourceGroup": "RESOURCE GROUP NAME OF YOUR RESOURCE",
    "type": "YOUR RESOURCE TYPE"
  }

To approve the above connection, use the following command:

az network private-endpoint-connection approve --id  PE CONNECTION ID --description "Approved"

The –id in the approve command refers to the private endpoint connection ID, which looks like below for a web app:

/subscriptions/YOUR SUB ID/resourceGroups/YOUR RESOURCE GROUP/providers/Microsoft.Web/sites/YOUR APP SERVICE NAME/privateEndpointConnections/YOUR PRIVATE ENDPOINT CONNECTION NAME

After running the above command, the connection should show as approved:

Approved private endpoint connection

When you automate this in a pipeline, you can first list the private endpoint connections of your resource and filter on provisioningState=”Pending” to find the ones you need to approve.

Hope it helps!

Azure Policy for Kubernetes: Contraints and ConstraintTemplates

In one on my videos on my YouTube channel, I talked about Kubernetes authentication and used the image below:

Securing access to the Kubernetes API Server

To secure access to the Kubernetes API server, you need to be authenticated and properly authorized to do what you need to do. The third mechanism to secure access is admission control. Simply put, admission control allows you to inspect requests to the API server and accept or deny the request based on rules you set. You will need an admission controller, which is just code that intercepts the request after authentication and authorization.

There is a list of admission controllers that are compiled-in with two special ones (check the docs):

  • MutatingAdmissionWebhook
  • ValidatingAdmissionWebhook

With the two admission controllers above, you can develop admission plugins as extensions and configure them at runtime. In this post, we will look at a ValidatingAdmissionWebhook that is used together with Azure Policy to inspect requests to the AKS API Server and either deny or audit these requests.

Note that I already have a post about Azure Policy and pod security policies here. There is some overlap between that post and this one. In this post, we will look more closely at what happens on the cluster.

Want a video instead?

Azure Policy

Azure has its own policy engine to control the Azure Resource Manager (ARM) requests you can make. A common rule in many organizations for instance is the prohibition of creation of expensive resources or even creating resources in unapproved regions. For example, a European company might want to only create resources in West Europe or North Europe. Azure Policy is the engine that can enforce such a rule. For more information, see Overview of Azure Policy. In short, you select from an ever growing list of policies or you create your own. Policies can be grouped in policy initiatives. A single policy or an initiative gets assigned to a scope, which can be a management group, a subscription or a resource group. In the portal, you then check for compliance:

Compliancy? What do I care? It’s just my personal subscription 😁

Besides checking for compliance, you can deny the requests in real time. There are also policies that can create resources when they are missing.

Azure Policy for Kubernetes

Although Azure Policy works great with Azure Resource Manager (ARM), which is basically the API that allows you to interact with Azure resources, it does not work with Kubernetes out of the box. We will need an admission controller (see above) that understands how to interpret Kubernetes API requests in addition to another component that can sync policies in Azure Policy to Kubernetes for the admission controller to pick up. There is a built-in list of supported Kubernetes policies.

For the admission controller, Microsoft uses Gatekeeper v3. There is a lot, and I do mean a LOT, to say about Gatekeeper and its history. We will not go down that path here. Check out this post for more information if you are truly curious. For us it’s enough to know that Gatekeeper v3 needs to be installed on AKS. In order to do that, we can use an AKS add-on. In fact, you should use the add-on if you want to work with Azure Policy. Installing Gatekeeper v3 on its own will not work.

Note: there are ways to configure Azure Policy to work with Azure Arc for Kubernetes and AKS Engine. In this post, we only focus on the managed Azure Kubernetes Service (AKS)

So how do we install the add-on? It is very easy to do with the portal or the Azure CLI. For all details, check out the docs. With the Azure CLI, it is as simple as:

az aks enable-addons --addons azure-policy --name CLUSTERNAME --resource-group RESOURCEGROUP

If you want to do it from an ARM template, just add the add-on to the template as shown here.

What happens after installing the add-on?

I installed the add-on without active policies. In kube-system, you will find the two pods below:

azure-policy and azure-policy-webhook

The above pods are part of the add-on. I am not entirely sure what the azure-policy-webhook does, but the azure-policy pod is responsible for checking Azure Policy for new assignments and translating that to resources that Gatekeeper v3 understands (hint: constraints). It also checks policies on the cluster and reports results back to Azure Policy. In the logs, you will see things like:

  • No audit results found
  • Schedule running
  • Creating constraint

The last line creates a constraint but what exactly is that? Constraints tell GateKeeper v3 what to check for when a request comes to the API server. An example of a constraint is that a container should not run privileged. Constraints are backed by constraint templates that contain the schema and logic of the constraint. Good to know, but where are the Gatekeeper v3 pods?

Gatekeeper pods in the gatekeeper-system namespace

Gatekeeper was automatically installed by the Azure Policy add-on and will work with the constraints created by the add-on, synced from Azure Policy. When you remove these pods, the add-on will install them again.

Creating a policy

Although you normally create policy initiatives, we will create a single policy and see what happens on the cluster. In Azure Policy, choose Assign Policy and scope the policy to the resource group of your cluster. In Policy definition, select Kubernetes cluster should not allow privileged containers. As discussed, that is one of the built-in policies:

Creating a policy that does not allow privileged containers

In the next step, set the effect to deny. This will deny requests in real time. Note that the three namespaces in Namespace exclusions are automatically added. You can add extra namespaces there. You can also specifically target a policy to one or more namespaces or even use a label selector.

Policy parameters

You can now select Review and create and then select Create to create the policy assignment. This is the result:

Policy assigned

Now we have to wait a while for the change to be picked up by the add-on on the cluster. This can take several minutes. After a while, you will see the following log entry in the azure-policy pod:

Creating constraint: azurepolicy-container-no-privilege-blablabla

You can see the constraint when you run k get constraints. The constraint is based on a constraint template. You can list the templates with k get constrainttemplates. This is the result:

constraint templates

With k get constrainttemplates k8sazurecontainernoprivilege -o yaml, you will find that the template contains some logic:

the template’s logic

The block of rego contains the logic of this template. Without knowing rego, which is the policy language used by Open Policy Agent (OPA) which is used by Gatekeeper v3 on our cluster, you can actually guess that the privileged field inside securityContext is checked. If that field is true, that’s a violation of policy. Although it is useful to understand more details about OPA and rego, Azure Policy hides the complexity for you.

Does it work?

Let’s try to deploy the following deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          securityContext:
            privileged: true

After running kubectl apply -f deployment.yaml, everything seems fine. But when we run kubectl get deploy:

Pods are not coming up

Let’s run kubectl get events:

Oops…

Notice that validation.gatekeeper.sh denied the request because privileged was set to true.

Adding more policies

Azure Security Center comes with a large initiative, Azure Security Benchmark, that also includes many Kubernetes policies. All of these policies are set to audit for compliance. On my system, the initiative is assigned at the subscription level:

Azure Security Benchmark assigned at subscription level with name Security Center

The Azure Policy add-on on our cluster will pick up the Kubernetes policies and create the templates and constraints:

Several new templates created

Now we have two constraints for k8sazurecontainernoprivilege:

Two constraints: one deny and the other audit

The new constraint comes from the larger initiative. In the spec, the enforcementAction is set to dryrun (audit). Although I do not have pods that violate k8sazurecontainernoprivilege, I do have pods that violate another policy that checks for host path mapping. That is reported back by the add-on in the compliance report:

Yes, akv2k8s maps to /etc/kubernetes on the host

Conclusion

In this post, you have seen what happens when you install the AKS policy add-on and enable a Kubernetes policy in Azure Policy. The add-on creates constraints and constraint templates that Gatekeeper v3 understands. The rego in a constraint template contains logic used to define the policy. When the policy is set to deny, Gatekeeper v3, which is an admission controller denies the request in real-time. When the policy is set to audit (or dry run at the constraint level), audit results are reported by the add-on to Azure Policy.

How to delete an stubborn Azure Virtual Hub

A while ago, I created an Azure Virtual WAN (Standard) and added a virtual hub. For some reason, the virtual hub ended up in the state below:

Image
Hub and routing status: Failed (Ouch!)

I tried to reset the router and virtual hub but to no avail. Next, I tried to delete the hub. In the portal, this resulted in a validating state that did not end. In the Azure CLI, an error was thrown.

Because this is a Microsoft Partner Network (MPN) subscription, I also did not have technical support or an easy way to enable it. I ended up buying Developer Support for a month just to open a service request.

The (helpful) support engineer asked me to do the following:

  • Open Azure Resource Explorer
  • Navigate to subscriptions and the resource group that contains the hub
  • Under providers, navigate to Microsoft.Network
  • Locate the virtual hub and do GET, EDIT, PUT (set Read/Write mode first)
After clicking GET and EDIT, PUT can be clicked

At first it did not seem to work but in my case, the PUT operation just took a very long time. After the PUT operation finished, I could delete the virtual hub from the portal.

Long story short: if you ever have a resource you cannot delete, give Azure Resource Explorer and the above procedure a try. Your mileage may vary though!

Integrate an Azure Storage Account with Active Directory

A customer with a Windows Virtual Desktop deployment needed access to several file shares for one of their applications. The integration of Azure Storage Accounts with Active Directory allows us to provide this functionality without having to deploy and maintain file services on a virtual machine.

A sketch of the environment looks something like this:

Azure File Share integration with Active Directory

Based on the sketch above, you should think about the requirements to make this work:

  • Clients that access the file share need to be joined to a domain. This can be an Azure Active Directory Domain Services (AADDS) managed domain or just plain old Active Directory Domain Services (ADDS). The steps to integrate the storage account with AADDS or AADS are different as we will see later. I will only look at ADDS integration via a PowerShell script. In this case, the WVD virtual machines are joined to ADDS and domain controllers are available on Azure.
  • Users that access the file share need to have an account in ADDS that is synced to Azure AD (AAD). This is required because users or groups are given share-level permissions at the storage account level to their AAD identity. The NTFS-level permissions are given to the ADDS identity. Since this is a Windows Virtual Desktop deployment, that is already the case.
  • You should think about how the clients (WVD here) connect to the file share. If you only need access from Azure subnets, then VNET Service Endpoints are a good choice. This will configure direct routing to the storage account in the subnet’s route table and also provides the necessary security as public access to the storage account is blocked. You could also use Private Link or just access the storage account via public access. I do not recommend the latter so either use service endpoints or private link.

Configuring the integration

In the configuration of the storage account, you will see the following options:

Storage account AD integration options

Integration with AADDS is just a click on Enabled. For ADDS integration however, you need to follow another procedure from a virtual machine that is joined to the ADDS domain.

On that virtual machine, log on with an account that can create a computer object in ADDS in an OU that you set in the script. For best results, the account you use should be synced to AAD and should have at least the Contributor role on the storage account.

Next, download the Microsoft provided scripts from here and unzip them in a folder like C:\scripts. You should have the following scripts and modules in there:

Scripts and PowerShell module for Azure Storage Account integration

Next, add a script to the folder with the following contents and replace the <PLACEHOLDERS>:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CurrentUser

.\CopyToPSPath.ps1 

Import-Module -Name AzFilesHybrid

Connect-AzAccount

$SubscriptionId = "<YOUR SUB ID>"
$ResourceGroupName = "<YOUR RESOURCE GROUP>"
$StorageAccountName = "<YOUR STORAGE ACCOUNT NAME>"

Select-AzSubscription -SubscriptionId $SubscriptionId 

Join-AzStorageAccountForAuth `
        -ResourceGroupName $ResourceGroupName `
        -Name $StorageAccountName `
        -DomainAccountType "ComputerAccount" ` 
        -OrganizationalUnitName "<OU DISTINGUISHED NAME>

Debug-AzStorageAccountAuth -StorageAccountName $StorageAccountName -ResourceGroupName $ResourceGroupName -Verbose

Run the script from the C:\scripts folder so it can execute CopyToPSPath.ps1 and import the AzFilesHybrid module. The Join-AzStorageAccountForAuth cmdlet does the actual work. When you are asked to rerun the script, do so!

The result of the above script should be a computer account in the OU that you chose. The computer account has the name of the storage account.

In the storage account configuration, you should see the following:

The blurred section will show the domain name

Now we can proceed to granting “share-level” access rights, similar to share-level rights on a Windows file server.

Granting share-level access

Navigate to the file share and click IAM. You will see the following:

IAM on the file share level

Use + Add to add AAD users or groups. You can use the following roles:

  • Storage File Data SMB Share Reader: read access
  • Storage File Data SMB Share Contributor: read, write and delete
  • Storage File Data SMB Share Elevated Contributor: read, write and delete + modify ACLs at the NTFS level

For example, if I needed to grant read rights to the group APP_READERS in ADDS, I would grant the Storage File Data SMB Share Reader role to the APP_READERS group in Azure AD (synced from ADDS).

Like on a Windows File Server, share permissions are not sufficient. Let’s add some good old NTFS rights…

Granting NTFS Access Rights

For a smooth experience, log on to a domain-joined machine with an ADDS account that is synced to an AAD account that has at least the Contributor role on the storage account.

To grant the NTFS right, map a drive with the storage account key. Use the following command:

net use <desired-drive-letter>: \\<storage-account-name>.file.core.windows.net\<share-name> /user:Azure\<storage-account-name> <storage-account-key>

Get the storage account key from here:

Storage account access keys

Now you can open the mapped drive in Explorer and set NTFS rights. Alternatively, you can use icacls.exe or other tools.

Mapping the drive for users

Now that the storage account is integrated with ADDS, a user can log on to a domain-joined machine and mount the share without having to provide credentials. As long as the user has the necessary share and NTFS rights, she can access the data.

Mapping the drive can be done in many ways but a simple net use Z: \\storageaccountname.file.core.windows.net\share will suffice.

Securing the connection

You should configure the storage account in such a way that it only allows access from selected clients. In this case, because the clients are Windows Virtual Desktops in a specific Azure subnet, we can use Virtual Network Service Endpoints. They can be easily configured from Firewalls and Virtual Networks:

Access from selected networks only: 3 subnets in this case

Granting access to specific subnets results in the configuration of virtual network service endpoints and a modification of the subnet route table with a direct route to the storage account on the Microsoft network. Note that you are still connecting to the public IP of the storage account.

If you decide to use Private Link instead, you would get a private IP in your subnet that is mapped to the storage account. In that case, even on-premises clients could connect to the storage account over the VPN or ExpressRoute private peerings. Of course, using private link would require extra DNS configuration as well.

Some extra notes

  • when you cannot configure the integration with the PowerShell script, follow the steps to enable the integration manually; do not forget the set the Kerberos password properly
  • it is recommended to put the AD computer accounts that represent the storage accounts in a separate OU; enable a Group Policy on that OU that prevents password resets on the computer accounts

Conclusion

Although there is some work to be done upfront and there are requirements such as Azure AD and Azure AD Connect, using an Azure Storage Account to host Active Directory integrated file shares is recommended. Remember that it works with both AADDS and ADDS. In this post, we looked at ADDS only and integration via the Microsoft-provided PowerShell scripts.

Front Door with WordPress on Azure App Service

Here’s a quick overview of the steps you need to take to put Front Door in front of an Azure Web App. In this case, the web app runs a WordPress site.

Step 1: DNS

Suppose you deployed the Web App and its name is gebawptest.azurewebsites.net and you want to reach the site via wp.baeke.info. Traffic will flow like this:

user types wp.baeke.info ---CNAME to xyz.azurefd.net--> Front Door --- connects to gebawptest.azurewebsites.net using wp.baeke.info host header

It’s clear that later, in Front Door, you will have to specify the host header (wp.baeke.info in this case). More on that later…

If you have worked with Azure Web App before, you probably know you need to configure the host header sent by the browser as a custom domain on the web app. Something like this:

Custom domain in Azure Web App (no https configured – hence the red warning)

In this case, we do not want to resolve wp.baeke.info to the web app but to Front Door. To make the custom domain assignment work (because the web app will verify the custom name), add the following TXT record to DNS:

TXT awverify.wp gebawptest.azurewebsites.net 

For example in CloudFlare:

awverify txt record in CloudFlare DNS

With the above TXT record, I could easily add wp.baeke.info as a custom domain to the gebawptest.azurewebsites.net web app.

Note: wp.baeke.info is a CNAME to your Front Door domain (see below)

Step 2: Front Door

My Front Door designer looks like this:

Front Door designer

When you create a Front Door, you need to give it a name. In my case that is gebafd.azurefd.net. With wp.baeke.info as a CNAME for gebafd.azurefd.net, you can easily add wp.baeke.info as an additional Frontend host.

The backend pool is the Azure Web App. It’s configured as follows:

Front Door backend host (only one in the pool); could also have used the Azure App Service backend type

You should connect to the web app using its original name but send wp.baeke.info as the host header. This allows Front Door to connect to the web app correctly.

The last part of the Front Door config is a simple rule that connects the frontend wp.baeke.info to the backend pool using HTTP only.

Step 3: WordPress config

With the default Azure WordPress templates, you do not need to modify anything because wp-config.php contains the following settings:

define('WP_SITEURL', 'http://' . $_SERVER['HTTP_HOST'] . '/');                                                        define('WP_HOME', 'http://' . $_SERVER['HTTP_HOST'] . '/');

If you want, you can change this to:

define('WP_SITEURL', 'http://wp.baeke.info/' );                                                                         define('WP_HOME', 'http://wp.baeke.info/');   

Step 4: Blocking access from other locations

In general, you want users to only connect to the site via Front Door. To achieve this, add the following access restrictions to the Web App:

Access restrictions to only allow traffic from Front Door and Azure basic infrastructure services

AKS Azure Monitor metrics and alerts

In today’s post, we will take a quick look at Azure Kubernetes Service (AKS) metrics and alerts for Azure Monitor. Out of the box, Microsoft offers two ways to obtain metrics:

  • Metrics that can easily be used with Azure Monitor to generate alerts; these metrics are written to the Azure Monitor metrics store
  • Metrics forwarded to Log Analytics; with Log Analytics queries (KQL), you can generate alerts as well

In this post, we will briefly look at the metrics in the Azure Monitor metrics store. In the past, the AKS metrics in the metrics store were pretty basic:

Basic Azure Monitor metrics for AKS

Some time ago however, support for additional metrics was introduced:

insights.container/nodes metrics
insights.containers/pods metrics

Although you can find the above data in Log Analytics as well, it is just a bit easier to work with these metrics when they are in the metrics store. Depending on the age of your cluster, these metrics might not be enabled. Check this page to learn how to enable them: https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-update-metrics

When the metrics are enabled, it is easy to visualize them from the Metrics pane. Note that metrics can be split. The screenshot below shows the nodes count, split in Ready and NotReady:

Pretty uneventful… 2 nodes in ready state

To generate an alert based on the above metrics, a new alert rule can be generated. Although the New alert rule link is greyed out, you can create the alert from Azure Monitor:

Creating a alert on node count from Azure Monitor

And of course, when this fires you will see this in Azure Monitor:

Heeeeelp… node down
Details about the alert

Azure SQL Database High Availability

Creating a SQL Database in Azure is a simple matter: create the database and server, get your connection string and off you go! Before starting though, spend some time thinking about the level of high availability (HA) that you want:

  • What is the required level of HA within the deployment region (e.g. West Europe)?
  • Do you require failover to another region (e.g. from West Europe to North Europe)?

HA in a single region

To achieve the highest level of availability in a region, do the following:

  • Use the Premium (DTU) or Business Critical tier (vCore): Azure will use the premium availability model for your database
  • Enable Availability Zone support if the region supports it: copies of your database will be spread over the zones

The diagram below illustrates the premium availability model (from the Microsoft docs):

Premium Availability Model

The region will contain one primary read/write replica and several secondary replicas. Each replica uses local SSD storage. The database is replicated synchronously and failover is swift and without data loss. If required, you can enable a read replica and specify you want to connect to the read replica by adding ApplicationIntent=ReadOnly to the connection string. Great for reporting and BI!

Spreading the databases over multiple zones is as simple as checking a box. Availability zone support comes at no extra cost but can increase the write commit latency because the nodes are a few kilometers apart. The option to enable zone support is in the Configure section as shown below:

Enabling zone redundancy

To read more about high availability, including the standard availability model for other tiers, check out the docs.

For critical applications, we typically select the Premium/Business Critical model as it provides good performance coupled to the highest possible availability in a region.

Geo-replication

The geo-replication feature replicates a database asynchronously to another region. Suppose you have a database and server in West Europe that you want to replicate to France Central. In the portal, navigate to the database (not the server) and select Geo-Replication. Then check the region, in this case France Central. The following questions show up:

Geo-Replication

A database needs a logical server object that contains the database. To replicate the database to France Central, you need such a server object in that region. The UI above allows you to create that server.

Note that the databases need to use the same tier although the secondary can be configured with less DTUs or vCores. Doing so is generally not recommended.

After configuration, the UI will show the active replication. In this case, I am showing replication from North Europe to West Europe (and not France Central):

Geo-replication is easy to configure but in practice, we recommend to use Failover Groups. A Failover Group uses geo-replication under the hood but gives you some additional features such as:

  • Automated failover (vs manual with just geo-replication)
  • Two servers to use in your connection string that use CNAMEs that are updated in case of failover; one CNAME always points to the read/write replica, the other to the read replica

Failover groups are created at the server level instead of the database level:

Failover group

Below, there is a failover group aks-fo with the primary server in North Europe and the secondary in West Europe:

Failover group details

You can manually fail over the database if needed:

Failover and forced failover

Failover allows you to failover the database without data loss if both databases are still active. Forced failover performs a failover even if the primary is down, which might lead to data loss.

Note: when you configure a Failover Group, geo-replication will automatically be configured for you.

Connecting from an application

To connect to a database configured in a failover group, first get the failover group server names:

Read/write and read-only listener endpoints

Next, in your application, use the appropriate connection string. For instance, in Go:

var sqldb *sql.DB
var server = "aks-fo.database.windows.net"
var port = 1433
var user = "USERNAME"
var password = "PASSWORD"
var database = "DBNAME"

func init() {
    // Build connection string
    connString := fmt.Sprintf("server=%s;user id=%s;password=%s;port=%d;database=%s;",
        server, user, password, port, database)

    var err error

    // Create connection pool
    sqldb, err = sql.Open("sqlserver", connString)
    if err != nil {
        log.Fatal("Error creating connection pool: ", err.Error())
    }
    ctx := context.Background()

    //above commands actually do not connect to SQL but the ping below does
    err = sqldb.PingContext(ctx)
    if err != nil {
        log.Fatal(err.Error())
    }
    log.Printf("Connected!\n")
}

During a failover, there will be an amount of time that the database is not available. When that happens, the connection will fail. The error is shown below:

[db customers]: invalid response code 500, body: {"name":"fault","id":"7JPhcikZ","message":"Login error: mssql: Database 'aksdb' on server 'akssrv-ne' is not currently available.  Please retry the connection later.  If the problem persists, contact customer support, and provide them the session tracing ID of '6D7D70C3-D550-4A74-A69C-D689E6F5CDA6'.","temporary":false,"timeout":false,"fault":true}

Note: the Failover Group uses a CNAME record aks-fo.database.windows.net which resolves to the backend servers in either West or North Europe. Make sure you allow connections to these servers in the firewall or you will get the following error:

db customers]: invalid response code 500, body: {"name":"fault","id":"-p9TwZkm","message":"Login error: mssql: Cannot open server 'akssrv-ne' requested by the login. Client with IP address 'IP ADDRESS' is not allowed to access the server.  To enable access, use the Windows Azure Management Portal or run sp_set_firewall_rule on the master database to create a firewall rule for this IP address or address range.  It may take up to five minutes for this change to take effect.","temporary":false,"timeout":false,"fault":true} 

Conclusion

For the highest level of availability, use the regional premium availability model with Availability Zone support. In addition, use a Failover Group to enable replication of the database to another region. A Failover Group automatically connects your application to the primary (read/write) or secondary replica (read) via a CNAME and can failover automatically after some grace period.

Deploy AKS and Traefik with an Azure DevOps YAML pipeline

This post is a companion to the following GitHub repository: https://github.com/gbaeke/aks-traefik-azure-deploy. The repository contains ARM templates to deploy an AD integrated Kubernetes cluster and an IP address plus a Helm chart to deploy Traefik. Traefik is configured to use the deployed IP address. In addition to those files, the repository also contains the YAML pipeline, ready to be imported in Azure DevOps.

Let’s take a look at the different building blocks!

AKS ARM Template

The aks folder contains the template and a parameters file. You will need to modify the parameters file because it requires settings to integrate the AKS cluster with Azure AD. You will need to specify:

  • clientAppID: the ID of the client app registration
  • serverAppID: the ID of the server app registration
  • tenantID: the ID of your AD tenant

Also specify clientId, which is the ID of the service principal for your cluster. Both the serverAppID and the clientID require a password. The passwords have been set via a pipeline secret variable.

The template configures a fairly standard AKS cluster that uses Azure networking (versus kubenet). It also configures Log Analytics for the cluster (container insights).

Deploying the template from the YAML file is done with the task below. You will need to replace YOUR SUBSCRIPTION with an authorized service connection:

 # DEPLOY AKS IN TEST   
 - task: AzureResourceGroupDeployment@2
   inputs:
     azureSubscription: 'YOUR SUBSCRIPTION'
     action: 'Create Or Update Resource Group'
     resourceGroupName: '$(aksTestRG)'
     location: 'West Europe'
     templateLocation: 'Linked artifact'
     csmFile: 'aks/deploy.json'
     csmParametersFile: 'aks/deployparams.t.json'
     overrideParameters: '-serverAppSecret $(serverAppSecret) -clientIdsecret $(clientIdsecret) -clusterName $(aksTest)'
       deploymentMode: 'Incremental'
       deploymentName: 'CluTest' 

The task uses several variables like $(aksTestRG) etc… If you check azure-pipelines.yaml, you will notice that most are configured at the top of the file in the variables section:

variables:
  aksTest: 'clu-test'
  aksTestRG: 'rg-clu-test'
  aksTestIP: 'clu-test-ip' 

The two secrets are the secret 🔐 vaiables. Naturally, they are configured in the Azure DevOps UI. Note that there are other means to store and obtain secrets, such as Key Vault. In Azure DevOps, the secret variables can be found here:

Azure DevOps secret variables

IP Address Template

The ip folder contains the ARM template to deploy the IP address. We need to deploy the IP address resource to the resource group that holds the AKS agents. With the names we have chosen, that name is MC_rg-clu-test_clu-test_westeurope. It is possible to specify a custom name for the resource group.

Because we want to obtain the IP address after deployment, the ARM template contains an output:

 "outputs": {
        "ipaddress": {
            "type": "string",
            "value": "[reference(concat('Microsoft.Network/publicIPAddresses/', parameters('ipName')), '2017-10-01').ipAddress]"
        }
     } 

The output ipaddress is of type string. Via the reference template function we can extract the IP address.

The ARM template is deployed like the AKS template but we need to capture the ARM outputs. The last line of the AzureResourceGroupDeployment@2 that deploys the IP address contains:

deploymentOutputs: 'armoutputs'

Now we need to extract the IP address and set it as a variable in the pipeline. One way of doing this is via a bash script:

 - task: Bash@3
      inputs:
        targetType: 'inline'
        script: |
          echo "##vso[task.setvariable variable=test-ip;]$(echo '$(armoutputs)' | jq .ipaddress.value -r)" 

You can set a variable in Azure DevOps with echo ##vso[task.setvariable variable=variable_name;]value. In our case, the “value” should be the raw string of the IP address output. The $(armoutputs) variable contains the output of the IP address ARM template as follows:

{"ipaddress":{"type":"String","value":"IP ADDRESS"}}

To extract IP ADDRESS, we pipe the output of “echo $(armoutputs)” to js .ipaddress.value -r which extracts the IP ADDRESS from the JSON. The -r parameter removes double quotes from the IP ADDRESS to give us the raw string. For more info about jq, check https://stedolan.github.io/jq/ .

We now have the IP address in the test-ip variable, to be used in other tasks via $(test-ip).

Taking care of the prerequisites

In a later phase, we install Traefik via Helm. So we need kubectl and helm on the build agent. In addition, we need to install tiller on the cluster. Because the cluster is RBAC-enabled, we need a cluster account and a role binding as well. The following tasks take care of all that:

- task: KubectlInstaller@0
   inputs:
     kubectlVersion: '1.13.5'


- task: HelmInstaller@1
   inputs:
     helmVersionToInstall: '2.14.1'

- task: AzureCLI@1
  inputs:
    azureSubscription: 'YOUR SUB'
    scriptLocation: 'inlineScript'
    inlineScript: 'az aks get-credentials -g $(aksTestRG) -n $(aksTest) --admin'

 - task: Bash@3
   inputs:
     filePath: 'tiller/tillerconfig.sh'
     workingDirectory: 'tiller/' 

Note that we use the AzureCLI built-in task to easily obtain the cluster credentials for kubectl on the build agent. We use the –admin flag to gain full access. Note that this downloads sensitive information to the build agent temporarily.

The last task just runs a shell script to configure the service account and role binding and install tiller. Check the repository to see the contents of this simple script. Note that this is the quick and easy way to install tiller, not the most secure way! 🙇‍♂️

Install Traefik and use the IP address

The repository contains the downloaded chart (helm fetch stable/traefik –untar). The values.yaml file was modified to set the ingressClass to traefik-ext. We could have used the chart from the Helm repository but I prefer having the chart in source control. Here’s the pipeline task:

 - task: HelmDeploy@0
   inputs:
     connectionType: 'None'
     namespace: 'kube-system'
     command: 'upgrade'
     chartType: 'FilePath'
     chartPath: 'traefik-ext/.'
     releaseName: 'traefik-ext'
     overrideValues: 'loadBalancerIP=$(test-ip)'
     valueFile: 'traefik-ext/values.yaml' 

kubectl is configured to use the cluster so connectionType can be set to ‘None’. We simply specify the IP address we created earlier by setting loadBalancerIP to $(test-ip) with the overrides for values.yaml. This sets the loadBalancerIP setting in Traefik’s service definition (in the templates folder). Service.yaml in the templates folder contains the following section:

 spec:
  type: {{ .Values.serviceType }}
  {{- if .Values.loadBalancerIP }}
  loadBalancerIP: {{ .Values.loadBalancerIP }}
  {{- end }} 

Conclusion

Deploying AKS together with one or more public IP addresses is a common scenario. Hopefully, this post together with the GitHub repo gave you some ideas about automating these deployments with Azure DevOps. All you need to do is create a pipeline from the repo. Azure DevOps will read the azure-pipelines.yml file automatically.

%d bloggers like this: