DNS Options for Private Azure Kubernetes Service

When you deploy Azure Kubernetes Service (AKS), by default the API server is publicly made available. That means it has a public IP address and an Azure-assigned name that’s resolvable by public DNS servers. To secure access, you can use authorized IP ranges.

As an alternative, you can deploy a private AKS cluster. That means the AKS API server gets an IP address in a private Azure virtual network. Most customers I work with use this option to comply with security policies. When you deploy a private AKS cluster, you still need a fully qualified domain name (FQDN) that resolves to the private IP address. There are several options you can use:

  • System (the default option): AKS creates a Private DNS Zone in the Node Resource Group; any virtual network that is linked to that Private DNS Zone can resolve the name; the virtual network used by AKS is automatically linked to the Private DNS Zone
  • None: default to public DNS; AKS creates a name for your cluster in a public DNS zone that resolves to the private IP address
  • Custom Private DNS Zone: AKS uses a Private DNS Zone that you or another team has created beforehand; this is mostly used in enterprise scenarios when the Private DNS Zones are integrated with custom DNS servers (e.g., on AD domain controllers, Infoblox, …)

The first two options, System and None, are discussed in the video below:

Overview of the 3 DNS options with a discussion of the first two: System and None

The third option, custom Private DNS Zone, is discussed in a separate video:

Private AKS with a custom Private DNS Zone

With the custom DNS option, you cannot use any name you like. The Private DNS Zone has to be like: privatelink.<region>.azmk8s.io. For instance, if you deploy your AKS cluster in West Europe, the Private DNS Zone’s name should be privatelink.westeurope.azmk8s.io. There is an option to use a subdomain as well.

When you use the custom DNS option, you also need to use a user-assigned Managed Identity for the AKS control plane. To make the registration of the A record in the Private DNS Zone work, in addition to linking the Private DNS Zone to the virtual network, the managed identity needs the following roles (at least):

  • Private DNS Zone Contributor role on the Private DNS Zone
  • Network Contributor role on the virtual network used by AKS

To deploy a private AKS cluster with a custom Private DNS Zone, you can use the following Azure CLI command which also sets the network plugin to azure (as an example). Private cluster also works with kubenet if you prefer that model. For other examples, see Create a private Azure Kubernetes Service cluster – Azure Kubernetes Service | Microsoft Docs.

az aks create \
    --resource-group RGNAME \
    --name aks-private \
    --network-plugin azure \
    --vnet-subnet-id "resourceId of AKS subnet" \
    --docker-bridge-address 172.17.0.1/16 \
    --dns-service-ip 10.3.0.10 \
    --service-cidr 10.3.0.0/24 \
    --enable-managed-identity \
    --assign-identity "resourceId of user-assigned managed identity" \
    --enable-private-cluster \
    --load-balancer-sku standard \
    --private-dns-zone "resourceId of Private DNS Zone"

The option that is easiest to use is the None option. You do not have to worry about Private DNS Zones and it just works. That option, at the time of this writing (June 2021) is still in preview and needs to be enabled on your subscription. In most cases though, I see enterprises go for the third option where the Private DNS Zones are created beforehand and integrated with custom DNS.

Approving a private endpoint connection with Azure CLI

In my previous post, I wrote about App Services with Private Link and used Azure Front Door to publish the web app. Azure Front Door Premium (in preview), can create a Private Endpoint and link it to your web app via Azure Private Link. When that happens, you need to approve the pending connection in Private Link Center.

The pending connection would be shown here, ready for approval

Although this is easy to do, you might want to automate this approval. Automation is possible via a REST API but it is easier via Azure CLI.

To do so, first list the private endpoint connections of your resource, in my case that is a web app:

az network private-endpoint-connection list --id /subscriptions/SUBID/resourceGroups/RGNAME/providers/Microsoft.Web/sites/APPSERVICENAME

The above command will return all private endpoint connections of the resource. For each connection, you get the following information:

 {
    "id": "PE CONNECTION ID",
    "location": "East US",
    "name": "NAME",
    "properties": {
      "ipAddresses": [],
      "privateEndpoint": {
        "id": "PE ID",
        "resourceGroup": "RESOURCE GROUP NAME OF PE"
      },
      "privateLinkServiceConnectionState": {
        "actionsRequired": "None",
        "description": "Please approve this connection.",
        "status": "Pending"
      },
      "provisioningState": "Pending"
    },
    "resourceGroup": "RESOURCE GROUP NAME OF YOUR RESOURCE",
    "type": "YOUR RESOURCE TYPE"
  }

To approve the above connection, use the following command:

az network private-endpoint-connection approve --id  PE CONNECTION ID --description "Approved"

The –id in the approve command refers to the private endpoint connection ID, which looks like below for a web app:

/subscriptions/YOUR SUB ID/resourceGroups/YOUR RESOURCE GROUP/providers/Microsoft.Web/sites/YOUR APP SERVICE NAME/privateEndpointConnections/YOUR PRIVATE ENDPOINT CONNECTION NAME

After running the above command, the connection should show as approved:

Approved private endpoint connection

When you automate this in a pipeline, you can first list the private endpoint connections of your resource and filter on provisioningState=”Pending” to find the ones you need to approve.

Hope it helps!

Azure App Services with Private Link

In one of my videos on my YouTube channel, I discuss Azure App Services with Private Link. The video describes how it works and provides an example of deploying the infrastructure with Bicep. The Bicep templates are on GitHub.

If you want to jump straight to the video, here it is:

In the rest of this blog post, I provide some more background information on the different pieces of the solution.

Azure App Service

Azure App Service is a great way to host web application and APIs on Azure. It’s PaaS (platform as a service), so you do not have to deal with the underlying Windows or Linux servers as they are managed by the platform. I often see AKS (Azure Kubernetes Service) implementations to host just a couple of web APIs and web apps. In most cases, that is overkill and you still have to deal with Kubernetes upgrades, node patching or image replacements, draining and rebooting the nodes, etc… And then I did not even discuss controlling ingress and egress traffic. Even if you standardize on packaging your app in a container, Azure App Service will gladly accept the container and serve it for you.

By default, Azure App Service gives you a public IP address and FQDN (Fully Qualified Domain Name) to reach your app securely over the Internet. The default name ends with azurewebsites.net but you can easily add custom domains and certificates.

Things get a bit more complicated when you want a private IP address for your app, reachable from Azure virtual networks and on-premises networks. One solution is to use an App Service Environment. It provides a fully isolated and dedicated environment to run App Service apps such as web apps and APIs, Docker containers and Functions. You can create an internal ASE which results in an Internal Load Balancer in front of your apps that is configured in a subnet of your choice. There is no need to configure Private Endpoints to make use of Private Link. This is often called native virtual network integration.

At the network level, an App Service Environment v2, works as follows:

External ASE
ASE networking (from Microsoft website)

Looking at the above diagram, an ILB ASE (but also an External ASE) also makes it easy to connect to back-end systems such as on-premises databases. The outbound connection to internal resources originates from an IP in the chosen integration subnet.

The downside to ASE is that its isolated instances (I1, I2, I3) are rather expensive. It also takes a long time to provision an ASE but that is less of an issue. In reality though , I would like to see App Service Environments go away and replaced by “regular” App Services with toggles that give you the options you require. You would just deploy App Services and set the options you require. In any case, native virtual network integration should not depend on dedicated or shared compute. One can only dream right? 😉

Note: App Service Environment v3, in preview at the time of this writing, provides a simplified deployment experience and also costs less. See App Service Environment v3 public preview – Azure App Service

As an alternative to an ASE for a private app, consider a non-ASE App Service that, in production, uses Premium V2 or V3 instances. The question then becomes: “How do you get a private IP address?” That’s where Private Link comes in…

Azure Private Link with App Service

Azure Private Link provides connectivity to Azure services (such as App Service) via a Private Endpoint. The Private Endpoint creates a virtual network interface card (NIC) on a subnet of your choice. Connections to the NICs IP address end up at the Private Link service the Private Endpoint is connected to. Below is an example with Azure SQL Database where one Private Endpoint is mapped, via Azure Private Link, to one database. The other databases are not reachable via the endpoint.

Private Endpoint connected to Azure SQL Database (PaaS) via Private Link (source: Microsoft website)

To create a regular App Service that is accessible via a private IP, we can do the same thing:

  • create a private endpoint in the subnet of your choice
  • connect the private endpoint to your App Service using Private Link

Both actions can be performed at the same time from the portal. In the Networking section of your App Service, click Configure your private endpoint connections. You will see the following screen:

Private Endpoint connection of App Service

Now click Add to create the Private Endpoint:

Creating the private endpoint

The above creates the private endpoint in the default subnet of the selected VNET. When the creation is finished, the private endpoint will be connected to App Service and automatically approved. There are scenarios, such as connecting private endpoints from other tenants, that require you to approve the connection first:

Automatically approved connection

When you click on the private endpoint, you will see the subnet and NIC that was created:

Private Endpoint

From the above, you can click the link to the network interface (NIC):

Network interface created by the private endpoint

Note that when your delete the Private Endpoint, the interface gets deleted as well.

Great! Now we have an IP address that we can use to reach the App Service. If you use the default name of the web app, in my case https://web-geba.azurewebsites.net, you will get:

Oops, no access on the public name (resolves to public IP)

Indeed, when you enable Private Link on App Service, you cannot access the website using its public IP. To solve this, you will need to do something at the DNS level. For the default domain, azurewebsites.net, it is recommended to use Azure Private DNS. During the creation of my Private Endpoint, I turned on that feature which resulted in:

Private DNS Zone for privatelink.azurewebsites.net

You might wonder why this is a private DNS zone for privatelink.azurewebsites.net? From the moment you enable private link on your web app, Microsoft modifies the response to the DNS query for the public name of your app. For example, if the app is web-geba.azurewebsites.net and you query DNS for that name, it will respond with a CNAME of web-geba.privatelink.azurewebsites.net. If that cannot be resolved, you will still get the public IP but that will result in a 403.

In my case, as long as the DNS servers I use can resolve web-geba.privatelink.azurewebsites.net and I can connect to 10.240.0.4, I am good to go. Note however that the DNS story, including Private DNS and your own DNS servers, is a bit more complex that just checking a box! However, that is not the focus of this blogpost so moving on… 😉

Note: you still need to connect to the website using https://web-geba.azurewebsites.net in your browser

Outbound connections to internal resources

One of the features of App Service Environments, is the ability to connect to back-end systems in Azure VNETs or on-premises. That is the result of native VNET integration.

When you enable Private Link on a regular App Service, you do not get that. Private Link only enables private inbound connectivity but does nothing for outbound. You will need to configure something else to make outbound connections from the Web App to resources such as internal SQL Servers work.

In the network configuration of you App Service, there is another option for outbound connectivity to internal resources – VNet integration.

VNET Integration

In the Networking section of App Service, find the VNet integration section and click Click here to configure. From there, you can add a VNet to integrate with. You will need to select a subnet in that VNet for this integration to work:

Outbound connectivity for App Service to Azure VNets

There are quite some things to know when it comes to VNet integration for App Service so be sure to check the docs.

Private Link with Azure Front Door

Often, a web app is made private because you want to put a Web Application Firewall (WAF) in front of the app. Typically, that goal is achieved by putting Azure Application Gateway (AG) with WAF in front of an internal App Services Environment. As as alternative to AG, you can also use virtual appliances such as Barracuda WAF for Azure. This works because the App Services Environment is a first-class citizen of your Azure virtual network.

There are multiple ways to put a WAF in front of a (non-ASE) App Service. You can use Front Door with the App Service as the origin, as long as you restrict direct access to the origin. To that end, App Services support access restrictions.

With Azure Front Door Premium, in preview at the time of this writing (June 2021), you can use Private Link as well. In that case, Azure Front Door creates a private endpoint. You cannot control or see that private endpoint because it is managed by Front Door. Because the private endpoint is not in your tenant, you will need to approve the connection from the private endpoint to your App Service. You can do that in multiple ways. One way is Private Link Center Pending Connections:

Pending Connections

If you check the video at the top of this page, this is shown here.

Conclusion

The combination of Azure networking with App Services Environments (ASE) and “regular” App Services (non-ASE) can be pretty confusing. You have native network integration for ASE, private access with private link and private endpoints for non-ASE, private DNS for private link domains, virtual network service endpoints, VNet outbound configuration for non-ASE etc… Most of the time, when I am asked for the easiest and most cost-effective option for a private web app in PaaS, I go for a regular non-ASE App Service and use Private Link to make the app accessible from the internal network.

Azure Policy for Kubernetes: Contraints and ConstraintTemplates

In one on my videos on my YouTube channel, I talked about Kubernetes authentication and used the image below:

Securing access to the Kubernetes API Server

To secure access to the Kubernetes API server, you need to be authenticated and properly authorized to do what you need to do. The third mechanism to secure access is admission control. Simply put, admission control allows you to inspect requests to the API server and accept or deny the request based on rules you set. You will need an admission controller, which is just code that intercepts the request after authentication and authorization.

There is a list of admission controllers that are compiled-in with two special ones (check the docs):

  • MutatingAdmissionWebhook
  • ValidatingAdmissionWebhook

With the two admission controllers above, you can develop admission plugins as extensions and configure them at runtime. In this post, we will look at a ValidatingAdmissionWebhook that is used together with Azure Policy to inspect requests to the AKS API Server and either deny or audit these requests.

Note that I already have a post about Azure Policy and pod security policies here. There is some overlap between that post and this one. In this post, we will look more closely at what happens on the cluster.

Want a video instead?

Azure Policy

Azure has its own policy engine to control the Azure Resource Manager (ARM) requests you can make. A common rule in many organizations for instance is the prohibition of creation of expensive resources or even creating resources in unapproved regions. For example, a European company might want to only create resources in West Europe or North Europe. Azure Policy is the engine that can enforce such a rule. For more information, see Overview of Azure Policy. In short, you select from an ever growing list of policies or you create your own. Policies can be grouped in policy initiatives. A single policy or an initiative gets assigned to a scope, which can be a management group, a subscription or a resource group. In the portal, you then check for compliance:

Compliancy? What do I care? It’s just my personal subscription 😁

Besides checking for compliance, you can deny the requests in real time. There are also policies that can create resources when they are missing.

Azure Policy for Kubernetes

Although Azure Policy works great with Azure Resource Manager (ARM), which is basically the API that allows you to interact with Azure resources, it does not work with Kubernetes out of the box. We will need an admission controller (see above) that understands how to interpret Kubernetes API requests in addition to another component that can sync policies in Azure Policy to Kubernetes for the admission controller to pick up. There is a built-in list of supported Kubernetes policies.

For the admission controller, Microsoft uses Gatekeeper v3. There is a lot, and I do mean a LOT, to say about Gatekeeper and its history. We will not go down that path here. Check out this post for more information if you are truly curious. For us it’s enough to know that Gatekeeper v3 needs to be installed on AKS. In order to do that, we can use an AKS add-on. In fact, you should use the add-on if you want to work with Azure Policy. Installing Gatekeeper v3 on its own will not work.

Note: there are ways to configure Azure Policy to work with Azure Arc for Kubernetes and AKS Engine. In this post, we only focus on the managed Azure Kubernetes Service (AKS)

So how do we install the add-on? It is very easy to do with the portal or the Azure CLI. For all details, check out the docs. With the Azure CLI, it is as simple as:

az aks enable-addons --addons azure-policy --name CLUSTERNAME --resource-group RESOURCEGROUP

If you want to do it from an ARM template, just add the add-on to the template as shown here.

What happens after installing the add-on?

I installed the add-on without active policies. In kube-system, you will find the two pods below:

azure-policy and azure-policy-webhook

The above pods are part of the add-on. I am not entirely sure what the azure-policy-webhook does, but the azure-policy pod is responsible for checking Azure Policy for new assignments and translating that to resources that Gatekeeper v3 understands (hint: constraints). It also checks policies on the cluster and reports results back to Azure Policy. In the logs, you will see things like:

  • No audit results found
  • Schedule running
  • Creating constraint

The last line creates a constraint but what exactly is that? Constraints tell GateKeeper v3 what to check for when a request comes to the API server. An example of a constraint is that a container should not run privileged. Constraints are backed by constraint templates that contain the schema and logic of the constraint. Good to know, but where are the Gatekeeper v3 pods?

Gatekeeper pods in the gatekeeper-system namespace

Gatekeeper was automatically installed by the Azure Policy add-on and will work with the constraints created by the add-on, synced from Azure Policy. When you remove these pods, the add-on will install them again.

Creating a policy

Although you normally create policy initiatives, we will create a single policy and see what happens on the cluster. In Azure Policy, choose Assign Policy and scope the policy to the resource group of your cluster. In Policy definition, select Kubernetes cluster should not allow privileged containers. As discussed, that is one of the built-in policies:

Creating a policy that does not allow privileged containers

In the next step, set the effect to deny. This will deny requests in real time. Note that the three namespaces in Namespace exclusions are automatically added. You can add extra namespaces there. You can also specifically target a policy to one or more namespaces or even use a label selector.

Policy parameters

You can now select Review and create and then select Create to create the policy assignment. This is the result:

Policy assigned

Now we have to wait a while for the change to be picked up by the add-on on the cluster. This can take several minutes. After a while, you will see the following log entry in the azure-policy pod:

Creating constraint: azurepolicy-container-no-privilege-blablabla

You can see the constraint when you run k get constraints. The constraint is based on a constraint template. You can list the templates with k get constrainttemplates. This is the result:

constraint templates

With k get constrainttemplates k8sazurecontainernoprivilege -o yaml, you will find that the template contains some logic:

the template’s logic

The block of rego contains the logic of this template. Without knowing rego, which is the policy language used by Open Policy Agent (OPA) which is used by Gatekeeper v3 on our cluster, you can actually guess that the privileged field inside securityContext is checked. If that field is true, that’s a violation of policy. Although it is useful to understand more details about OPA and rego, Azure Policy hides the complexity for you.

Does it work?

Let’s try to deploy the following deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          securityContext:
            privileged: true

After running kubectl apply -f deployment.yaml, everything seems fine. But when we run kubectl get deploy:

Pods are not coming up

Let’s run kubectl get events:

Oops…

Notice that validation.gatekeeper.sh denied the request because privileged was set to true.

Adding more policies

Azure Security Center comes with a large initiative, Azure Security Benchmark, that also includes many Kubernetes policies. All of these policies are set to audit for compliance. On my system, the initiative is assigned at the subscription level:

Azure Security Benchmark assigned at subscription level with name Security Center

The Azure Policy add-on on our cluster will pick up the Kubernetes policies and create the templates and constraints:

Several new templates created

Now we have two constraints for k8sazurecontainernoprivilege:

Two constraints: one deny and the other audit

The new constraint comes from the larger initiative. In the spec, the enforcementAction is set to dryrun (audit). Although I do not have pods that violate k8sazurecontainernoprivilege, I do have pods that violate another policy that checks for host path mapping. That is reported back by the add-on in the compliance report:

Yes, akv2k8s maps to /etc/kubernetes on the host

Conclusion

In this post, you have seen what happens when you install the AKS policy add-on and enable a Kubernetes policy in Azure Policy. The add-on creates constraints and constraint templates that Gatekeeper v3 understands. The rego in a constraint template contains logic used to define the policy. When the policy is set to deny, Gatekeeper v3, which is an admission controller denies the request in real-time. When the policy is set to audit (or dry run at the constraint level), audit results are reported by the add-on to Azure Policy.

AKS Pod Identity with the Azure SDK for Go

File:Go Logo Blue.svg - Wikimedia Commons

In an earlier post, I wrote about the use of AKS Pod Identity (Preview) in combination with the Azure SDK for Python. Although that works fine, there are some issues with that solution:

Vulnerabilities as detected by SNYK

In order to reduce the size of the image and reduce/remove the vulnerabilities, I decided to rewrite the solution in Go. Just like the Python app (with FastAPI), we will expose an HTTP endpoint that displays all resource groups in a subscription. We will use a specific pod identity that has the Contributor role at the subscription level.

If you are more into videos, here’s the video version:

The code

The code is on GitHub @ https://github.com/gbaeke/go-msi in main.go. The code is kept as simple as possible. It uses the following packages:

github.com/Azure/azure-sdk-for-go/profiles/latest/resources/mgmt/resources
github.com/Azure/go-autorest/autorest/azure/auth

The resources package is used to create a GroupsClient to work with resource groups (check the samples):

groupsClient := resources.NewGroupsClient(subID)

subID contains the subscription ID, which is retrieved via the SUBSCRIPTION_ID environment variable. The container requires that environment variable to be set.

To authenticate to Azure and obtain proper authorization, the auth package is used with the NewAuthorizerFromEnvironment() method. That method supports several authentication mechanisms, one of which is managed identities. When we run this code on AKS, the pods can use a pod identity as explained in my previous post, if the pod identity addon is installed and configured. To obtain the authorization:

authorizer, err := auth.NewAuthorizerFromEnvironment()

authorizer is then passed to groupsClient via:

groupsClient.Authorizer = authorizer

Now we can use groupsClient to iterate through the resource groups:

ctx := context.Background()
log.Println("Getting groups list...")
groups, err := groupsClient.ListComplete(ctx, "", nil)
if err != nil {
	log.Println("Error getting groups", err)
}

log.Println("Enumerating groups...")
for groups.NotDone() {
	groupList = append(groupList, *groups.Value().Name)
	log.Println(*groups.Value().Name)
	err := groups.NextWithContext(ctx)
	if err != nil {
		log.Println("error getting next group")
	}
}

Note that the groups are printed and added to the groups slice. We can now serve the groupz endpoint that lists the groups (yes, the groups are only read at startup 😀):

log.Println("Serving on 8080...")
http.HandleFunc("/groupz", groupz)
http.ListenAndServe(":8080", nil)

The result of the call to /groupz is shown below:

My resource groups mess in my test subscription 😀

Running the code in a container

We can now build a single statically linked executable with go build and package it in a scratch container. If you want to know if your executable is statically linked, run file on it (e.g. file myapp). The result should be like:

myapp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Here is the multi-stage Dockerfile:

# argument for Go version
ARG GO_VERSION=1.14.5

# STAGE 1: building the executable
FROM golang:${GO_VERSION}-alpine AS build

# git required for go mod
RUN apk add --no-cache git

# certs
RUN apk --no-cache add ca-certificates

# Working directory will be created if it does not exist
WORKDIR /src

# We use go modules; copy go.mod and go.sum
COPY ./go.mod ./go.sum ./
RUN go mod download

# Import code
COPY ./ ./


# Build the statically linked executable
RUN CGO_ENABLED=0 go build \
	-installsuffix 'static' \
	-o /app .

# STAGE 2: build the container to run
FROM scratch AS final

# copy compiled app
COPY --from=build /app /app

# copy ca certs
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# run binary
ENTRYPOINT ["/app"]

In the above Dockerfile, it is important to add the ca certificates to the build container and later copy them to the scratch container. The code will need to connect to https://management.azure.com and requires valid root CA certificates to do so.

When you build the container with the Dockerfile, it will result in a docker image of about 8.7MB. SNYK will not report any known vulnerabilities. Great success!

Note: container will run as root though; bad! 😀 Nico Meisenzahl has a great post on containerizing .NET Core apps which also shows how to configure the image to not run as root.

Let’s add some YAML

The GitHub repo contains a workflow that builds and pushes a container to GitHub container registry. The most recent version at the time of this writing is 0.1.1. The YAML file to deploy this container as part of a deployment is below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mymsi-deployment
  namespace: mymsi
  labels:
    app: mymsi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mymsi
  template:
    metadata:
      labels:
        app: mymsi
        aadpodidbinding: mymsi
    spec:
      containers:
        - name: mymsi
          image: ghcr.io/gbaeke/go-msi:0.1.1
          env:
            - name: SUBSCRIPTION_ID
              value: SUBSCRIPTION ID
            - name: AZURE_CLIENT_ID
              value: APP ID OF YOUR MANAGED IDENTITY
            - name: AZURE_AD_RESOURCE
              value: "https://management.azure.com"
          ports:
            - containerPort: 8080

It’s possible to retrieve the subscription ID at runtime (as in the Python code) but I chose to just supply it via an environment variable.

For the above manifest to work, you need to have done the following (see earlier post):

  • install AKS with the pod identity add-on
  • create a managed identity that has the necessary Azure roles (in this case, enumerate resource groups)
  • create a pod identity that references the managed identity

In this case, the created pod identity is mymsi. The aadpodidbinding label does the trick to match the identity with the pods in this deployment.

Note that, although you can specify the AZURE_CLIENT_ID as shown above, this is not really required. The managed identity linked to the mymsi pod identity will be automatically matched. In any case, the logs of the nmi pod will reflect this.

In the YAML, AZURE_AD_RESOURCE is also specified. In this case, this is not required either because the default is https://management.azure.com. We need that resource to enumerate resource groups.

Conclusion

In this post, we looked at using the Azure SDK for Go together with managed identity on AKS, via the AAD pod identity addon. Similar to the Azure SDK for Python, the Azure SDK for Go supports managed identities natively. The difference with the Python solution is the size of the image and better security. Of course, that is an advantage stemming from the use of a language like Go in combination with the scratch image.

Managed Identity on Azure Arc Servers

Azure Arc – Azure Management | Microsoft Azure

When you install the Azure Arc agent on any physical or virtual server, either Windows or Linux, the machine suddenly starts living in a cloud world:

  • it appears in the Azure Portal
  • you can apply resource tags
  • you can check for security and regulatory compliance with Azure Policy
  • you can enable Update management
  • and much, much more…

Check Microsoft’s documentation for more information about Azure Arc for Servers to find out more. Below is a screenshot of such an Azure Arc-enabled Windows Server 2019 machine running on-premises with Insights enabled (on my laptop 😀):

Azure Arc-enabled Windows Server 2019

A somewhat lesser-known feature of Azure Arc is that these servers also have Managed Server Identity (MSI). After you have installed the Azure Arc agent, which normally installs to Program Files\AzureConnectedMachineAgent, two environment variables are set:

  • IMDS_ENDPOINT=http://localhost:40342
  • IDENTITY_ENDPOINT=http://localhost:40342/metadata/identity/oauth2/token

IMDS stands for Instance Metadata Service. On a regular Azure virtual machine, this service listens on the non-routable IP address of 169.254.169.254. On the virtual machine, you can make HTTP requests to that IP address without any issue. The traffic never leaves the virtual machine.

On an Azure Arc-enabled server, which can run anywhere, using the non-routable IP address is not feasible. Instead, the IMDS listens on a port on localhost as indicated by the environment variables.

The service can be used for all sorts of things. For example, I can make the following request (PowerShell):

Invoke-RestMethod -Headers @{"Metadata"="true"} -Method GET -Uri http://localhost:40342/metadata/instance?api-version=2020-06-01 | ConvertTo-Json

The result will be a JSON structure with most of the fields empty. That is not surprising since this is not an Azure VM and most fields are Azure-related (vmSize, fault domain, update domain, …). But it does show that the IMDS works, just like on a regular Azure VM.

Although there are many other things you can do, one of its most useful features is providing you with an access token to access Azure Resource Manager, Key Vault, or other services.

There are many ways to obtain an access token. The documentation contains an example in PowerShell that uses the environment variables and Invoke-WebRequest to get a token for https://management.azure.com.

A common requirement is code that needs to retrieve secrets from Azure Key Vault. Now we know that we can acquire a token via the IMDS, let’s see how we can do this with the Azure SDK for Python, which has full support for the IMDS on Azure Arc-enabled machines. The code below does the trick:

from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient

credentials = ManagedIdentityCredential()

secret_client = SecretClient(vault_url="https://gebakv.vault.azure.net", credential=credentials)
secret = secret_client.get_secret("notsecret")
print(secret.value)

Of course, you need Python installed with the following packages (use pip install):

  • azure-identity
  • azure-keyvault

Yes, the above code is all you need to use the managed identity of the Azure Arc-enabled server to authenticate to Key Vault and obtain the secret called notsecret. The functionality that makes the Python SDK work with Azure Arc can be seen here.

Of course, you need to make sure that the managed identity has the necessary access rights to Key Vault:

Managed Identity has Get permissions on Secrets

I have not looked at MSI Azure Arc support in the other SDKs but the Python SDK sure makes it easy!

Azure AD pod-managed identities in AKS revisited

A long time ago, I wrote a blog post about assigning managed identities to pods in Azure Kubernetes Services (AKS) to authenticate to Azure Storage. The implementation was based on the aad-pod-identity project on GitHub. You can look at the walkthrough to see how it worked.

Microsoft recently released a preview that enables you to turn on pod identity during cluster creation. It uses the same building blocks as before but makes it fully supported and part of AKS (although preview now). To create a basic cluster with pod identity enabled, you can use the following commands:

az group create -n RESOURCEGROUP -l LOCATION
az aks create -g RESOURCEGROUP -n CLUSTERNAME --enable-managed-identity --enable-pod-identity --network-plugin azure

Note: you need to use Azure CNI networking here; kubenet will not work

Before you deploy the cluster, make sure you follow the prerequisites in the documentation (Before you begin). At the time of writing (December 2020), the section in the documentation that tells you how to create the AKS cluster does not use the Azure CNI plugin. Make sure you add that!

What does –enable-pod-identity do?

When you use –enable-pod-identity, you should see nmi pods on your cluster in the kube-system namespace:

NMI pods

These pods are created from a DaemonSet so you will have one pod per cluster node (Linux nodes only ). When your application wants to use a managed identity, it does a request to the Instance Metadata Service (IMDS) endpoint which is 169.254.169.254. Requests to that IP address are intercepted by the NMI pods via iptables rules. The NMI pod that intercepts the request then makes an Azure AD Authentication Library (ADAL) request to Azure AD to obtain a token for the managed identity and returns it to your application.

Next to the NMI pods, other things are added as well, such as custom resource definitions. Some of those are discussed below.

How to request the token?

It’s great to know that the NMI pods intercept requests to the IMDS endpoint but how do you make such a request? I put together a small example in Python in the following git repository: https://github.com/gbaeke/python-msi. The code is in the rg-api folder in server.py:

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient, SubscriptionClient
from fastapi import FastAPI

app = FastAPI()

try:
    credentials = DefaultAzureCredential()
    subscription_client = SubscriptionClient(credentials)
    subscription = next(subscription_client.subscriptions.list())
    subscription_id = subscription.subscription_id
    resource_client = ResourceManagementClient(credentials, subscription_id)
except:
    print("error obtaining credentials")

@app.get("/")
def read_root():
    groups=[]
    try:
        for resource_group in resource_client.resource_groups.list():
            groups.append(resource_group.name)
    except:
        print("error obtaining groups")
    
    return groups

The code does the following:

  • use the azure-identity Python library to obtain credentials via DefaultAzureCredential() function. Note that that function tries multiple authentication options. If you run the code on your local computer and you are logged on to Azure with the Azure CLI, it will also work
  • use the azure-mgmt-resource Python library to enumerate resource groups in the current subscription
  • create a very simple API with FastAPI to ask for the list of resource groups; we can use a kubectl port forward later to obtain the JSON response; if authentication fails, the call will return an empty list instead of HTTP errors as you normally would

On my system, this is the result of the call when pod identity is working:

A bunch of resource groups in my test subscription… messy as usual

The repo also contains a Dockerfile to build a container with the app. I built and pushed that container to Docker Hub as gbaeke/rgapi.

Creating and using the identity

If we want the pod that runs the above code to use a specific identity, we have to create the identity and then tell the pod to use it. To create the managed identity, use the following command:

 az identity create --resource-group  rg-clu-msi --name rgapi 

The output of this command contains an id field that we need in another command later. The result of the above command is a User Assigned Managed Identity called rgapi. I already granted the Contributor role at the subscription level.

User Assigned Managed Identity rgapi

Note that this has nothing to do with AKS. To create a pod identity to use in AKS, you will need to run another command:

az aks pod-identity add --resource-group rg-clu-msi --cluster-name clu-msi --namespace  rgapi  --name rgapi --identity-resource-id "id field from previous command" 

The above command creates a pod identity called rgapi in the namespace rgapi. This namespace will be created if it does not exist. You can see the pod identity by running the below command:

 kubectl get azureidentities.aadpodidentity.k8s.io

If you look inside such an object, you would find the reference to the managed identity by its resource id (the id field from earlier). There are other custom resource definitions used by pod identity that we will not bother with now.

Now we need to create a pod and associate it with the pod identity. You can do so with the following YAML:

apiVersion: v1
kind: Pod
metadata:
  name: rgapi
  namespace: rgapi
  labels:
    aadpodidbinding: rgapi
spec:
  containers:
  - name: rgapi
    image: gbaeke/rgapi
  nodeSelector:
    kubernetes.io/os: linux

The important bit above is the aadpodidbinding label which refers to the pod identity we created earlier. When the above pod gets scheduled, it will call out to the IMDS endpoint. You should see that in the logs of the NMI pod on the same node as your application pod. For example:

no clientID or resourceID in request. rgapi/rgapi has been matched with azure identity rgapi/rgapi
status (200) took 12677813 ns for req.method=GET reg.path=/metadata/identity/oauth2/token req.remote=10.240.0.36

The first line indicates that I did not specifically set a clientID in my request but that the request is matched to the rgapi identity. The second line shows the NMI pod requesting a token for the identity from the Azure AD token endpoint.

Great! We now have a pod running that can retrieve resource groups with our custom managed identity. We did not have to add credentials manually or grab them from Key Vault. Our pod automatically picks up the pod identity. 🎉

Conclusion

Although it is still not super simple (is identity ever simple really?), the new method to enable pod identities is a definite improvement. It is currently in preview so it should not be used in production. Once it goes GA however, you will have a fully supported method of using user assigned managed identity with your pods and use specific identities per pod following least privilege methods.

Azure Key Vault Provider for Secrets Store CSI Driver

In the previous post, I talked about akv2k8s. akv2k8s is a Kubernetes controller that synchronizes secrets and certificates from Key Vault. Besides synchronizing to a regular secret, it can also inject secrets into pods.

Instead of akv2k8s, you can also use the secrets store CSI driver with the Azure Key Vault provider. As a CSI driver, its main purpose is to mount secrets and certificates as storage volumes. Next to that, it can also create regular Kubernetes secrets that can be used with an ingress controller or mounted as environment variables. That might be required if the application was not designed to read the secret from the file system.

In the previous post, I used akv2k8s to grab a certificate from Key Vault, create a Kubernetes secret and use that secret with nginx ingress controller:

certificate in Key Vault ------akv2aks periodic sync -----> Kubernetes secret ------> nginx ingress controller

Let’s briefly look at how to do this with the secrets store CSI driver.

Installation

Follow the guide to install the Helm chart with Helm v3:

helm repo add csi-secrets-store-provider-azure https://raw.githubusercontent.com/Azure/secrets-store-csi-driver-provider-azure/master/charts
helm install csi-secrets-store-provider-azure/csi-secrets-store-provider-azure --generate-name

This will install the components in the current Kubernetes namespace.

Easy no?

Syncing the certificate

Following the same example as with akv2aks, we need to point at the certificate in Key Vault, set the right permissions, and bring the certificate down to Kubernetes.

You will first need to decide how to access Key Vault. You can use the managed identity of your AKS cluster or be more granular and use pod identity. If you have setup AKS with a managed identity, that is the simplest solution. You just need to grab the clientId of the managed identity like so:

az aks show -g <resource group> -n <aks cluster name> --query identityProfile.kubeletidentity.clientId -o tsv

Next, create a file with the content below and apply it to your cluster in a namespace of your choosing.

apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: azure-gebakv
  namespace: YOUR NAMESPACE
spec:
  provider: azure
  secretObjects:
  - secretName: nginx-cert
    type: kubernetes.io/tls
    data:
    - objectName: nginx
      key: tls.key
    - objectName: nginx
      key: tls.crt
  parameters:
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "CLIENTID YOU OBTAINED ABOVE" 
    keyvaultName: "gebakv"         
    objects:  |
      array:
        - |
          objectName: nginx
          objectType: secret        
    tenantId: "ID OF YOUR AZURE AD TENANT"

Compared to the akv2k8s controller, the above configuration is a bit more complex. In the parameters section, in the objects array, you specify the name of the certificate in Key Vault and its object type. Yes, you saw that correctly, the objectType actually has to be secret for this to work.

The other settings are self-explanatory: we use the managed identity, set its clientId and in keyvaultName we set the short name of our Key Vault.

The settings in the parameters section are actually sufficient to mount the secret/certificate in a pod. With the secretObjects section though, we can also ask for the creation of regular Kubernetes secrets. Here, we ask for a secret of type kubernetes.io/tls with name nginx-cert to be created. You need to explicitly set both the tls.key and the tls.crt value and correctly reference the objectName in the array.

The akv2k8s controller is simpler to use as you only need to point it to your certificate in Key Vault (and specify it’s a certificate, not a secret) and set a secret name. There is no need to set the different values in the secret.

Using the secret

The advantage of the secrets store CSI driver is that the secret is only mounted/created when an application requires it. That also means we have to instruct our application to mount the secret explicitly. You do that via a volume as the example below illustrates (part of a deployment):

spec:
      containers:
      - name: realtimeapp
        image: gbaeke/fluxapp:1.0.2
        volumeMounts:
          - mountPath: "/mnt/secrets-store"
            name: secrets-store-inline
            readOnly: true
        env:
        - name: REDISHOST
          value: "redis:6379"
        resources:
          requests:
            cpu: 25m
            memory: 50Mi
          limits:
            cpu: 150m
            memory: 150Mi
        ports:
        - containerPort: 8080
      volumes:
      - name: secrets-store-inline
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: "azure-gebakv"

In the above YAML, the following happens:

  • in volumes: we create a volume called secrets-store-inline and use the csi driver to mount the secrets we specified in the SecretProviderClass we created earlier (azure-gebakv)
  • in volumeMounts: we mount the volume on /mnt/secrets-store

Because we used secretObjects in our SecretProviderClass, this mount is accompanied by the creation of a regular Kubernetes secret as well.

When you remove the deployment, the Kubernetes secret will be removed instead of lingering behind for all to see.

Of course, the pods in my deployment do not need the mounted volume. It was not immediately clear to me how to avoid the mount but still create the Kubernetes secret (not exactly the point of a CSI driver 😀). On the other hand, there is a way to have the secret created as part of ingress controller creation. That approach is more useful in this case because we want our ingress controller to use the certificate. More information can be found here. In short, it roughly works as follows:

  • instead of creating and mounting a volume in your application pod, a volume should be created and mounted on the ingress controller
  • to do so, you modify the deployment of your ingress controller (e.g. ingress-nginx) with extraVolumes: and extraVolumeMounts: sections; depending on the ingress controller you use, other settings might be required

Be aware that you need to enable auto rotation of secrets manually and that it is an alpha feature at this point (December 2020). The akv2k8s controller does that for you out of the box.

Conclusion

Both the akv2k8s controller and the Secrets Store CSI driver (for Azure) can be used to achieve the same objective: syncing secrets, keys and certificates from Key Vault to AKS. In my experience, the akv2k8s controller is easier to use. The big advantage of the Secrets Store CSI driver is that it is a broader solution (not just for AKS) and supports multiple secret stores. Next to Azure Key Vault, it also supports Hashicorp’s Vault for example. My recommendation: for Azure Key Vault and AKS, keep it simple and try akv2k8s first!

Deploy and bootstrap your Kubernetes cluster with Azure DevOps and GitOps

A while ago, I published a post about deploying AKS with Azure DevOps with extras like Nginx Ingress, cert-manager and several others. An Azure Resource Manager (ARM) template is used to deploy Azure Kubernetes Service (AKS). The extras are installed with Helm charts and Helm installer tasks. I mainly use it for demo purposes but I often refer to it in my daily work as well.

Although this works, there is another approach that combines an Azure DevOps pipeline with GitOps. From a high level point of view, that works as follows:

  • Deploy AKS with an Azure DevOps pipeline: declarative and idempotent thanks to the ARM template; the deployment is driven from an Azure DevOps pipeline but other solutions such as GitHub Actions will do as well (push)
  • Use a GitOps tool to deploy the GitOps agents on AKS and bootstrap the cluster by pointing the GitOps tool to a git repository (pull)

In this post, I will use Flux v2 as the GitOps tool of choice. Other tools, such as Argo CD, are capable of achieving the same goal. Note that there are ways to deploy Kubernetes using GitOps in combination with the Cluster API (CAPI). CAPI is quite a beast so let’s keep this post a bit more approachable. 😉

Let’s start with the pipeline (YAML):

# AKS deployment pipeline
trigger: none

variables:
  CLUSTERNAME: 'CLUSTERNAME'
  RG: 'CLUSTER_RESOURCE_GROUP'
  GITHUB_REPO: 'k8s-bootstrap'
  GITHUB_USER: 'GITHUB_USER'
  KEY_VAULT: 'KEYVAULT_SHORTNAME'

stages:
- stage: DeployGitOpsCluster
  jobs:
  - job: 'Deployment'
    pool:
      vmImage: 'ubuntu-latest'
    steps: 
    # DEPLOY AKS
    - task: AzureResourceGroupDeployment@2
      inputs:
        azureSubscription: 'SUBSCRIPTION_REF'
        action: 'Create Or Update Resource Group'
        resourceGroupName: '$(RG)'
        location: 'YOUR LOCATION'
        templateLocation: 'Linked artifact'
        csmFile: 'aks/deploy.json'
        csmParametersFile: 'aks/deployparams.gitops.json'
        overrideParameters: '-clusterName $(CLUSTERNAME)'
        deploymentMode: 'Incremental'
        deploymentName: 'aks-gitops-deploy'
       
    # INSTALL KUBECTL
    - task: KubectlInstaller@0
      name: InstallKubectl
      inputs:
        kubectlVersion: '1.18.8'

    # GET CREDS TO K8S CLUSTER WITH ADMIN AND INSTALL FLUX V2
    - task: AzureCLI@1
      name: RunAzCLIScripts
      inputs:
        azureSubscription: 'AzureMPN'
        scriptLocation: 'inlineScript'
        inlineScript: |
          export GITHUB_TOKEN=$(GITHUB_TOKEN)
          az aks get-credentials -g $(RG) -n $(CLUSTERNAME) --admin
          msi="$(az aks show -n CLUSTERNAME -g CLUSTER_RESOURCE_GROUP | jq .identityProfile.kubeletidentity.objectId -r)"
          az keyvault set-policy --name $(KEY_VAULT) --object-id $msi --secret-permissions get
          curl -s https://toolkit.fluxcd.io/install.sh | sudo bash
          flux bootstrap github --owner=$(GITHUB_USER) --repository=$(GITHUB_REPO) --branch=main --path=demo-cluster --personal

A couple of things to note here:

  • The above pipeline contains several strings in UPPERCASE; replace them with your own values
  • GITHUB_TOKEN is a secret defined in the Azure DevOps pipeline and set as an environment variable in the last task; it is required for the flux bootstrap command to configure the GitHub repo (e.g. deploy key)
  • the AzureResourceGroupDeployment task deploys the AKS cluster based on parameters defined in deployparams.gitops.json; that file is in a private Azure DevOps git repo; I have also added them to the gbaeke/k8s-bootstrap repository for reference
  • The AKS deployment uses a managed identity versus a service principal with manually set client id and secret (recommended)
  • The flux bootstrap command deploys an Azure Key Vault to Kubernetes Secrets controller that requires access to Key Vault; the script in the last task retrieves the managed identity object id and uses az keyvault set-policy to grant get key permissions; if you delete and recreate the cluster many times, you will have several UNKNOWN access policies at the Key Vault level

The pipeline is of course short due to the fact that nginx-ingress, cert-manager, dapr, KEDA, etc… are all deployed via the gbaeke/k8s-bootstrap repo. The demo-cluster folder in that repo contains a source and four kustomizations:

  • source: reference to another git repo that contains the actual deployments
  • k8s-akv2k8s-kustomize.yaml: deploys the Azure Key Vault to Kubernetes Secrets controller (akv2k8s)
  • k8s-secrets-kustomize.yaml: deploys secrets via custom resources picked up by the akv2k8s controller; depends on akv2k8s
  • k8s-common-kustomize.yaml: deploys all components in the ./deploy folder of gbaeke/k8s-common (nginx-ingress, external-dns, cert-manager, KEDA, dapr, …)

Overall, the big picture looks like this:

Note that the kustomizations that point to ./akv2k8s and ./deploy actually deploy HelmReleases to the cluster. For instance in ./akv2k8s, you will find the following manifest:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: akv2k8s
  namespace: flux-system
spec:
  chart:
    spec:
      chart: akv2k8s
      sourceRef:
        kind: HelmRepository
        name: akv2k8s-repo
  interval: 5m0s
  releaseName: akv2k8s
  targetNamespace: akv2k8s

This manifest tells Flux to deploy a Helm chart, akv2k8s, from the HelmRepository source akv2k8s-repo that is defined as follows:

---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: akv2k8s-repo
  namespace: flux-system
spec:
  interval: 1m0s
  url: http://charts.spvapi.no/

It is perfectly valid to use a kustomization that deploys manifests that contain resources of kind HelmRelease and HelmRepository. In fact, you can even patch those via a kustomization.yaml file if you wish.

You might wonder why I deploy the akv2k8s controller first, and then deploy a secret with the following manifest (upercase strings to be replaced):

apiVersion: spv.no/v1
kind: AzureKeyVaultSecret
metadata:
  name: secret-sync 
  namespace: flux-system
spec:
  vault:
    name: KEYVAULTNAME # name of key vault
    object:
      name: SECRET # name of the akv object
      type: secret # akv object type
  output: 
    secret: 
      name: SECRET # kubernetes secret name
      dataKey: values.yaml # key to store object value in kubernetes secret

The external-dns chart I deploy in later steps requires configuration to be able to change DNS settings in Cloudflare. Obviously, I do not want to store the Cloudflare secret in the k8s-common git repo. One way to solve that is to store the secrets in Azure Key Vault and then grab those secrets and convert them to Kubernetes secrets. The external-dns HelmRelease can then reference the secret to override values.yaml of the chart. Indeed, that requires storing a file in Key Vault which is easy to do like so (replace uppercase strings):

az keyvault secret set --name SECRETNAME --vault-name VAULTNAME --file ./YOURFILE.YAML

You can call the secret what you want but the Kubernetes secret dataKey should be values.yaml for the HelmRelease to work properly.

There are other ways to work with secrets in GitOps. The Flux v2 documentation mentions SealedSecrets and SOPS and you are of course welcome to use that.

Take a look at the different repos I outlined above to see the actual details. I think it makes the deployment of a cluster and bootstrapping the cluster much easier compared to suing a bunch of Helm install tasks and manifest deployments in the pipeline. What do you think?

Docker without Docker: a look at Podman

I have been working with Docker for quite some time. More and more however, I see people switching to tools like Podman and Buildah and decided to give that a go.

I installed a virtual machine in Azure with the following Azure CLI command:

az vm create \
  	--resource-group RESOURCEGROUP \
  	--name VMNAME \
  	--image UbuntuLTS \
	--authentication-type password \
  	--admin-username azureuser \
  	--admin-password PASSWORD \
	--size Standard_B2ms

Just replace RESOURCEGROUP, VMNAME and PASSWORD with the values you want to use and you are good to go. Note that the above command results in Ubuntu 18.04 at the time of writing.

SSH into that VM for the following steps.

Installing Podman

Installation of Podman is easy enough. The commands below do the trick:

. /etc/os-release
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y upgrade 
sudo apt-get -y install podman

You can find more information at https://podman.io/getting-started/installation.

Where Docker uses a client/server model, with a privileged Docker daemon and a docker client that communicates with it, Podman uses a fork/exec model. The container process is a child of the Podman process. This also means you do not require root to run a container which is great from a security and auditing perspective.

You can now just use the podman command. It supports the same arguments as the docker command. If you want, you can even create a docker alias for the podman command.

To check if everything is working, run the following command:

podman run hello-world

It will pull down the hello-world image from Docker Hub and display a message.

I wanted to start my gbaeke/nasnet container with podman, using the following command:

podman run  -p 80:9090 -d gbaeke/nasnet

Of course, the above command will fail. I am not running as root, which means I cannot bind a process to a port below 1024. There are ways to fix that but I changed the command to:

podman run  -p 9090:9090 -d gbaeke/nasnet

The gbaeke/nasnet container is large, close to 3 GB. Pulling the container from Docker Hub went fast but Podman took a very long time during the Storing signatures phase. While the command was running, I checked disk space on the VM with df and noticed that the machine’s disk was quickly filling up.

On WSL2 (Windows Subsystem for Linux), I did not have trouble with pulling large images. With the docker info command, I found that it was using overlay2 as the storage driver:

Docker on WSL2 uses overlay2

You can find more information about Docker and overlay2, see https://docs.docker.com/storage/storagedriver/overlayfs-driver/.

With podman, run podman info to check the storage driver podman uses. Look for graphDriverName in the output. In my case, podman used vfs. Although vfs is well supported and runs anywhere, it does full copies of layers (represented by directories on your filesystem) in the image which results in using a lot of diskspace. If the disk is not super fast, this will result in long wait times when pulling an image and waste of disk space.

Without getting bogged down in the specifics of the storage drivers and their pros and cons, I decided to switch Podman from vfs to fuse-overlayfs. Fuse stands for Filesystem in Userspace, so fuse-overlayfs is the implementation of overlayfs in userspace (using FUSE). It supports deduplication of layers which will result in less consumption of disk space. This should be very noticeable when pulling a large image.

IMPORTANT: remove the containers folder in ~/.local/share to clear out container storage before installing overlayfs. Use the command below;

rm -rf ~/.local/share/containers

Installing fuse-overlayfs

The installation instructions are at https://github.com/containers/fuse-overlayfs. I needed to use the static build because I am running Ubuntu 18.04. On newer versions of Ubuntu, you can use apt install libfuse3-dev.

It’s of no use here to repeat the static build steps. Just head over to the GitHub repo and follow the steps. When asked to clone the git repo, use the following command:

git clone https://github.com/containers/fuse-overlayfs.git

The final step in the instructions is to copy fuse-overlayfs (which was just built with buildah) to /usr/bin.

If you now run podman info, the graphDrivername should be overlay. There’s nothing you need to do to make that happen:

overlay storage driver with /usr/bin/fuse-overlayfs as the executable

When you now run the gbaeke/nasnet container, or any sufficiently large container, the process should be much smoother. I can still take a couple of minutes though. Note that at the end, your output will be somewhat like below:

Output from podman run -p 9090:9090 -d gbaeke/nasnet

Now you can run podman ps and you should see the running container:

gbaeke/nasnet container is running

Go to http://localhost:9090 and you should see the UI. Go ahead and classify an image! 😉

Conclusion

Installing and using Podman is easy, especially if you are familiar with Docker somewhat. I did have trouble with performance and disk storage with large images but that can be fixed by swapping out vfs with something like overlayfs. Be aware that there are many other options and that it is quite complex under the hood. But with the above steps, you should be good to go.

Will I use podman from now on? Probably not as Docker provides me all I need for now and a lot of tools I use are dependent on it.