Azure Policy: Kubernetes pod security baseline explained

When you deploy Azure Kubernetes Service (AKS) in an enterprise context, you will probably be asked about policies that can be applied to AKS for compliance and security. In this post, we will discuss Azure Policy for Kubernetes briefly and then proceed to explaining a group of policies that implement baseline security settings.

Azure Policy for Kubernetes

To apply policies to Kubernetes, Microsoft decided to integrate their existing Azure Policy solution with Gatekeeper v3. Gatekeeper is an admission controller webhook for Open Policy Agent (OPA). An admission controller webhook is a piece of software, running in Kubernetes, that can inspect incoming requests to the Kubernetes API server and decide to either allow or deny it. Open Policy Agent is a general solution for policy based control that goes way beyond just Kubernetes. It uses a language, called rego, that allows you to write policies that allow or deny requests. You can check the gatekeeper library for examples.

Although you can install Gatekeeper v3 on Kubernetes yourself, Microsoft provides an add-on to AKS that installs Gatekeeper for you. Be aware that you either install it yourself or let the add-on do it, but not both. The AKS add-on can be installed via the Azure CLI or an ARM template. It can also be enabled via the Azure Portal:

The policy add-on can easily be enabled and disabled via the Azure Portal; above it is enabled

When you enable the add-on, there will be an extra namespace on your cluster called gatekeeper-system. It contains the following workloads:

Gatekeeper v3 workloads

If, for some reason, you were to remove the above deployments, the add-on would add them back.

Enabling policies

Once the add-on is installed, you can enable Kubernetes policies via Azure Policy. Before we get started, keep in mind the following:

  • Policies can be applied at scale to multiple clusters: Azure Policy can be attached to resource groups, a subscription or management groups. When there are multiple AKS clusters at those levels, policy can be applied to all of those clusters
  • Linux nodes only
  • You can only use built-in policies provided by Azure

That last point is an important one. Microsoft provides several policies out of the box that are written with rego as discussed earlier. However, writing your own policies with rego is not supported.

Let’s add a policy initiative, which is just a fancy name for a group of policies. We will apply the policy to the resource group that contains my AKS cluster. From Azure Policy, click assignments:

Click Assign Initiative. The following screen is shown:

Above, the imitative will be linked to the rg-gitops-demo resource group. You can change the scope to the subscription or a management group as well. Click the three dots (…) next to Basics – Initiative definition. In the Search box, type kubernetes. You should see two initiatives:

We will apply the baseline standards. The restricted standards include extra policies. Click the baseline standards and click Select. A bit lower in the screen, make sure Policy Enforcement is enabled:

Now click Next. Because we want to deny the policy in real-time, select the deny effect:

Note that several namespaces are excluded by default. You can add namespaces here that you trust but run pods that will throw policy violations. On my cluster, there is a piece of software that will definitely cause a violation. You can now follow the wizard till the end and create the assignment. The assignment should be listed on the main Azure Policy screen:

You should now give Azure Policy some time to evaluate the policies. After a while, in the Overview screen, you can check the compliance state:

Above, you can see that the Kubernetes policies report non-compliance. In the next section, we will describe some of the policies in more detail.

Note that although these policies are set to deny, they will not kill existing workloads that violate the policy. If you were to kill a running pod that violates the policies, it will not come back up!

Important: in this article, we apply the default initiative. As a best practice however, you should duplicate the initiative. You can then change the policy parameters specific to your organization. For instance, you might want to allow host paths, allow capabilities and more. Host paths and capabilities are explained a bit more below.

Policy details

Let’s look at the non-compliant policy first, by clicking on the policy. This is what I see:

The first policy, Kubernetes cluster pod hostPath volumes should only use allowed host paths, results in non-compliance. This policy requires you to set the paths on the host that can be mapped to the pod. Because we did not specify any host paths, any pod that mounts a host path in any of the namespaces that policy applies too will generate a violation. In my case, I deployed Azure Key Vault to Kubernetes, which mounts the /etc/kubernetes/azure.json file. That file contains the AKS cluster service principal credentials! Indeed, the policy prohibits this.

To learn more about a policy, you can click it and then select View Definition. The definition in JSON will be shown. Close to the end of the JSON, you will find a link to a contraintTemplate:

When you click the link, you will find the rego behind this policy. Here is a snippet:

targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sazurehostfilesystem

        violation[{"msg": msg, "details": {}}] {
            volume := input_hostpath_volumes[_]
            allowedPaths := get_allowed_paths(input)
            input_hostpath_violation(allowedPaths, volume)
            msg := sprintf("HostPath volume %v is not allowed, pod: %v. Allowed path: %v", [volume, input.review.object.metadata.name, allowedPaths])
        }

Even if you have never worked with rego, it’s pretty clear that it checks an array of allowed paths and then checks for host paths that are not in the list. There are other helper functions in the template.

Let’s look at another policy, Do not allow privileged containers in Kubernetes cluster. This one is pretty clear. It prevents you from creating a pod that has privileged: true in its securityContext. Suppose you have the following YAML:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: busybox
    command: [ "sh", "-c", "sleep 1h" ]
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:
      privileged: true

If you try to apply the above YAML, the following error will be thrown:

Oops, privileged: true is not allowed (don’t look at the capabilities yet 😀)

As you can see, because we set the initiative to deny, the requests are denied in real-time by the Gatekeeper admission controller!

Let’s look at one more policy: Kubernetes cluster containers should only use allowed capabilities. With this policy, you can limit the Linux capabilities that can be added to your pod. An example of a capability is NET_BIND_SERVICE, which allows you to bind to a port below 1024, something a non-root user cannot do. By default, there is an array of allowedCapabilities which is empty. In addition, there is an array of requiredDropCapabilities which is empty as well. Note that this policy does not impact the default capabilities you pods will get. It does impact the additional ones you want to add. For example, if you use the securityContext below, you are adding additional capabilities NET_ADMIN and SYS_TIME:

securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]

This is not allowed by the policy. You will get:

By checking the contraint policy of the other templates, it will be quite straightforward to see what the policy checks for.

Note: when I export the policy initiative to GitHub (preview feature) I do see the default capabilities; see the snippet below (capabilities match the list that Gatekeeper reports above)

"allowedCapabilities": {
      "value": [
       "CHOWN",
       "DAC_OVERRIDE",
       "FSETID",
       "FOWNER",
       "MKNOD",
       "NET_RAW",
       "SETGID",
       "SETUID",
       "SETFCAP",
       "SETPCAP",
       "NET_BIND_SERVICE",
       "SYS_CHROOT",
       "KILL",
       "AUDIT_WRITE"
      ]

Conclusion

In most cases, you will want to enable Azure Policy for Kubernetes to control what workloads can do. We have only scratched the surface here. Next to the two initiatives, there are several other policies to control things such as GitOps configurations, the creation of external load balancers, require pod requests and limits and much much more!

One thought on “Azure Policy: Kubernetes pod security baseline explained”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: