Azure Kubernetes Service and Azure Firewall

Deploying Azure Kubernetes Service (AKS) is, like deploying most other Kubernetes-as-a-service offerings such as those from DigitalOcean and Google, very straightforward: a few clicks in the portal or one or two command lines and you are done.

Using these services properly and in a secure fashion is another matter though. I am often asked how to secure access to the cluster and its applications. In addition, customers also want visibility and control of incoming and outgoing traffic. Combining Azure Firewall with AKS is one way of achieving those objectives.

This post will take a look at the combination of Azure Firewall and AKS. It is inspired by this post by Dennis Zielke. In that post, Dennis provides all the necessary Azure CLI commands to get to the following setup:

AKS and Azure Firewall (from
https://medium.com/@denniszielke/setting-up-azure-firewall-for-analysing-outgoing-traffic-in-aks-55759d188039 by Dennis Zielke)

In what follows, I will keep referring to the subnet names and IP addresses as in the above diagram.

Azure Firewall

Azure Firewall is a stateful firewall, provided as a service with built-in high availability. You deploy it in a subnet of a virtual network. The subnet must be named AzureFirewallSubnet. The firewall will get two IP addresses:

  • Internal IP: the first IP address in the subnet (here 10.0.3.4)
  • Public IP: a public IP address; in the above setup we will use it to provide access to a Kubernetes Ingress controller via a DNAT rule

As in the physical world, you will need to instruct systems to route traffic through the firewall. In Azure, this is done via a route table. The following route table was created:

Route table

In (1) a route to 0.0.0.0/0 is defined that routes to the private IP of the firewall. The route will be used when no other route applies! The route table is associated with just the aks-5-subnet (2), which is the subnet where AKS (with advanced networking) is deployed. It’s important to note that now, all external traffic originating from the Kubernetes cluster passes through the firewall.
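For reference, a route table like the one above can also be created with the Azure CLI. This is just a sketch: the resource group and vnet names (rg-aks-fw, aks-vnet) are placeholders, while the firewall private IP and the subnet name come from the diagram.

# placeholder names: rg-aks-fw (resource group), aks-vnet (virtual network)
az network route-table create -g rg-aks-fw -n aks-fw-routes

# default route (0.0.0.0/0) via the firewall's private IP 10.0.3.4
az network route-table route create -g rg-aks-fw --route-table-name aks-fw-routes \
  -n default-via-firewall --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.3.4

# associate the route table with the AKS subnet only
az network vnet subnet update -g rg-aks-fw --vnet-name aks-vnet \
  -n aks-5-subnet --route-table aks-fw-routes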

When you compare Azure Firewall to the Network Virtual Appliances (NVAs) from vendors such as Check Point, you will notice that the capabilities are somewhat limited. On the flip side though, Azure Firewall is super simple to deploy compared with a highly available NVA setup.

Before we look at the firewall rules, let’s take a look at the Kubernetes Ingress Controller.

Kubernetes Ingress Controller

In this example, I will deploy nginx-ingress as an Ingress Controller. It will provide access to HTTP-based workloads running in the cluster and it can route to various workloads based on the URL. I will deploy the nginx-ingress with Helm.

Think of an nginx-ingress as a reverse proxy. It receives http requests, looks at the hostname and path (e.g. mydomain.com/api/user) and routes the request to the appropriate Kubernetes service (e.g. the user service).

Diagram showing Ingress traffic flow in an AKS cluster
Ingress in Kubernetes (from Microsoft:
https://docs.microsoft.com/en-us/azure/aks/operator-best-practices-network#distribute-ingress-traffic )
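Purely as an illustration, a hypothetical Ingress resource for the mydomain.com/api/user example above could look like the sketch below. The host, path and user service are made-up names, and the API version matches the Kubernetes releases that were current at the time of writing.

cat <<'EOF' | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: user-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: mydomain.com
    http:
      paths:
      - path: /api/user
        backend:
          serviceName: user
          servicePort: 80
EOF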

Normally, the nginx-ingress service is accessed via an Azure external load balancer. Behind the scenes, this is the result of the service object having spec.type set to the value LoadBalancer. If we want external traffic to nginx-ingress to pass through the firewall, we will need to tell Kubernetes to create an internal load balancer via an annotation. Let’s do that with Helm. First, you will need to install tiller, the server-side component of Helm. Use the following procedure from the Microsoft documentation:

  • Create a service account for tiller: link
  • Configure tiller: link
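The linked procedure boils down to roughly the following (a sketch for an RBAC-enabled cluster, Helm 2 era):

# create the tiller service account and bind it to cluster-admin (RBAC-enabled cluster)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
EOF

# deploy tiller into the cluster using that service account
helm init --service-account tiller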

With tiller installed, issue the following two commands:

kubectl create ns ingress 

helm install stable/nginx-ingress --namespace ingress --set controller.replicaCount=2 --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-internal"=true --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-internal-subnet"=ing-4-subnet

The second command installs nginx-ingress in the ingress namespace. The two --set parameters add the following annotations to the service object (yes I know, the Helm annotation parameters are ugly 🤢):

service.beta.kubernetes.io/azure-load-balancer-internal: "true"
service.beta.kubernetes.io/azure-load-balancer-internal-subnet: ing-4-subnet

This ensures an internal load balancer gets created. It gets created in the mc-* resource group that backs your AKS deployment:

Internal load balancer created by the Kubernetes cloud integration components

Note that Kubernetes creates the load balancer, including the rules and probes for port 80 and 443 as defined in the service object that comes with the Helm chart. The load balancer is created in the ing-4-subnet as instructed by the service annotation. Its private IP address is 10.0.4.4, as in the diagram at the top of this post.

DNAT Rule to Load Balancer

To provide access to internal resources, Azure Firewall uses DNAT (destination network address translation) rules. The concept is simple: traffic to the firewall’s public IP on some port can be forwarded to an internal IP on the same or another port. In our case, traffic to the firewall’s public IP on ports 80 and 443 is forwarded to the internal load balancer’s private IP on ports 80 and 443. The load balancer will forward the request to nginx-ingress:

DNAT rule forwarding port 80 and 443 traffic to the internal load balancer
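The rule in the screenshot was created in the portal, but the CLI works too. Here is a sketch for the port 80 part; it needs the azure-firewall CLI extension, and the firewall name, resource group and the FW_PUBLIC_IP variable are placeholders.

# az extension add -n azure-firewall   (one-time, if not installed yet)
az network firewall nat-rule create -g rg-aks-fw --firewall-name aks-fw \
  --collection-name ingress-dnat --priority 100 --action Dnat \
  --name http-to-ingress --protocols TCP \
  --source-addresses '*' \
  --destination-addresses $FW_PUBLIC_IP --destination-ports 80 \
  --translated-address 10.0.4.4 --translated-port 80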

If the installation of nginx-ingress was successful, you should end up at the default back-end when you go to http://firewallPublicIP.

nginx-ingress default backend when browsing to public IP of firewall

If you configured Log Analytics and installed the Azure Firewall solution, you can look at the firewall logs. DNAT actions are logged and can be inspected:

Firewall logs via Log Analytics

Application and Network Rules

Azure Firewall application rules are rules that allow or deny outgoing HTTP/HTTPS traffic based on the URL. The following rules were defined:

Application rules

The above rules allow http and https traffic to destinations such as docker.io, cloudflare and more.
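As a sketch, an application rule like the docker.io one could also be created with the CLI as follows. The names are again placeholders and 10.0.5.0/24 is assumed to be the address range of aks-5-subnet.

az network firewall application-rule create -g rg-aks-fw --firewall-name aks-fw \
  --collection-name aks-egress --priority 200 --action Allow \
  --name allow-docker --protocols Http=80 Https=443 \
  --source-addresses 10.0.5.0/24 \
  --target-fqdns docker.io '*.docker.io'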

Note that network rules, another Azure Firewall rule type, are evaluated first. If a match is found, rule evaluation stops. Suppose you have the following network rule:

Network rules

The above network rule allows ports 22 and 443 for all sources and destinations. This means that Kubernetes can actually connect to any HTTPS-enabled site on the default port, regardless of the defined application rules. See rule processing for more information.
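For completeness, the CLI equivalent of such a (too permissive) network rule would look something like this sketch:

az network firewall network-rule create -g rg-aks-fw --firewall-name aks-fw \
  --collection-name allow-ssh-https --priority 100 --action Allow \
  --name any-to-any-22-443 --protocols TCP \
  --source-addresses '*' --destination-addresses '*' \
  --destination-ports 22 443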

Threat Intelligence

This feature alerts on and/or denies network traffic coming from known bad IP addresses or domains. You can track this via Log Analytics:

Threat Intelligence Alerts and Denies on Azure Firewall

Above, you see port scans, traffic from botnets and brute-force credential attacks all being blocked by Azure Firewall. This feature is currently in preview.

Best Practices

The AKS documentation has a best practices section that discusses networking. It contains useful information about the networking model (Kubenet vs Azure CNI), ingresses and WAF. It does not, at this point in time (May 2019), describe how to use Azure Firewall with AKS. It would be great if that were added in the near future.

Here are a couple of key points to think about:

  • WAF (Web Application Firewall): Azure Firewall threat intelligence is not WAF; to enable WAF, there are several options:
    • you can enable mod_security in nginx-ingress
    • you can use Azure WAF or a 3rd party WAF
    • you can use cloud-native WAFs such as TwistLock (WAF is one of the features of this product; it also provides firewall and vulnerability assessment)
  • remote access to the Kubernetes API: today, the API server is exposed via a public IP address; having the API server on a private IP address will be available soon
  • remote access to Kubernetes hosts using SSH: only allow SSH on the private IP addresses; use a bastion host to enable connectivity

Conclusion

Azure Kubernetes Service (AKS) can be combined with Azure Firewall to control network traffic to and from your Kubernetes cluster. Log Analytics provides the dashboard and logs to report and alert on traffic patterns. Features such as threat intelligence provide an extra layer of defense. For HTTP/HTTPS workloads (so most workloads), you should complement the deployment with a WAF such as Azure Application Gateway or a 3rd-party product.

Securing access to and from Azure Functions

I am often asked how to secure access to and from Azure Functions that are not running in an App Service Environment (ASE). An App Service Environment allows you to safeguard your apps in a subnet of your Azure Virtual Network. In a sense, it gives you a private deployment of Azure App Service that you can secure with Azure Firewall, Network Security Groups (NSGs) or Network Virtual Appliances (NVAs).

When you use Azure Functions in a regular App Service Plan or Premium plan, you will need to rely on Virtual Network Service Endpoints and App Service network integration to achieve similar results.

In this post, we will look at an example of an Azure Function, running in a Premium plan, that queries CosmosDB. We will restrict incoming traffic to the Azure Function from a subnet and only allow CosmosDB to be queried by the same Azure Function. Here’s a diagram:

Incoming Traffic

To restrict incoming traffic to the Azure Function, navigate to the Function App in the portal and select Networking in Platform Features. You will see the following screen:

Azure Functions network features

We will configure the inbound restrictions via Configure Access Restrictions. You can configure restrictions for both the Function App itself and the scm site:

From the moment you add rules, a Deny All rule will appear. In the above rules, I allowed my private IP address and the default subnet in the virtual network. The subnet rule configures a service endpoint of type Microsoft.Web on that subnet. When you open the properties of the subnet, you will see:

Service Endpoint of type Microsoft.Web

Great! When you try to access the function from any other location, you will get a 403 error from the Azure Functions front-end. So don’t expect a connection timeout like with regular network security rules.
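The same inbound restrictions can also be scripted. The sketch below assumes a reasonably recent Azure CLI and uses placeholder names (rg-func, myfuncapp, func-vnet).

# allow the default subnet (uses the Microsoft.Web service endpoint on that subnet)
az functionapp config access-restriction add -g rg-func -n myfuncapp \
  --rule-name allow-default-subnet --action Allow --priority 100 \
  --vnet-name func-vnet --subnet default

# allow a single public IP address (placeholder)
az functionapp config access-restriction add -g rg-func -n myfuncapp \
  --rule-name allow-my-ip --action Allow --priority 200 \
  --ip-address 1.2.3.4/32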

Outgoing traffic

The example Azure Function uses an HTTP trigger and a Cosmos DB input (cosmos). Documents contain a name property. The query outputs the name found on the first document:

module.exports = async function (context, req, cosmos) {
    context.log(cosmos);
    context.res = {
        body: "hello " + cosmos[0].name
    };
};

In order to secure access to Cosmos DB, two features were used:

  1. Azure Functions VNet Integration (VNet integration is currently in preview)
  2. Cosmos DB network service endpoints to restrict access to the subnet that provides the Azure Function hosts with an IP address

Configuring the VNet integration is straightforward, especially when compared to the old style of integration which required a VPN tunnel:

App Service (including Azure Functions) VNet integration

As you can see in the above screenshot, you delegate a subnet to the App Service hosts. In my case, that is subnet func-sec:

Subnet delegated to a service (Microsoft.Web/serverFarms)

The bottom of the screenshot shows the subnet is delegated to the Microsoft.Web/serverFarms service. That is the result of the VNet integration.

You can also see the subnet has service endpoints configured for Cosmos DB. That is the result of the Cosmos DB configuration below:

Service endpoint config in Cosmos DB

In Cosmos DB, an existing virtual network was added. I did not enable the Accept connections from within public Azure datacenters option.
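Both sides of this configuration can be scripted as well. Here is a sketch with placeholder names (rg-func, myfuncapp, func-vnet, mycosmosaccount); the func-sec subnet is the one from the screenshots.

# regional VNet integration: delegates the func-sec subnet to Microsoft.Web/serverFarms
az functionapp vnet-integration add -g rg-func -n myfuncapp \
  --vnet func-vnet --subnet func-sec

# enable the virtual network filter on Cosmos DB and allow the same subnet
az cosmosdb update -g rg-func -n mycosmosaccount --enable-virtual-network true
az cosmosdb network-rule add -g rg-func -n mycosmosaccount \
  --vnet-name func-vnet --subnet func-sec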

When you remove the service endpoint and you run the Azure Function, the following error is thrown:

Unable to proceed with the request. Please check the authorization claims to ensure the required permissions to process the request. ActivityId: 03b2c11f-2b21-44c9-ab44-61b4864539fe, Microsoft.Azure.Documents.Common/2.2.0.0, Windows/10.0.14393 documentdb-netcore-sdk/2.2.0

Does it work from a VM in the default subnet?

If all went well, I should be able to call the Azure Function from the virtual machine in the default subnet. Let’s try with curl:

Yes, itsme!

The name field in the first document is set to itsme so it worked! Great, the function can be called from the default subnet. In case you are wondering about the use of -p in the ssh command: this virtual machine sat behind an Azure Firewall and the VM ssh port was exposed via a DNAT rule over a random port.

From another location, the following error is shown (wrapped around some HTML but this is the main error):

Error 403 - This web app is stopped

Conclusion

With virtual network service endpoints now available for most Azure PaaS (platform as a service) components, you can ensure those services are only accessed from intended locations. In this example, you saw how to secure access to Azure Functions and Cosmos DB. Service endpoints combined with the App Service VNet integration make it straightforward to secure a Function App end-to-end.

Revisiting Rancher

Several years ago, when we started our first adventures in the wonderful world of IoT, we created an application for visualizing real-time streams of sensor data. The sensor data came from custom-built devices that used 2G for connectivity. IoT networks and protocols such as SigFox, NB-IoT or Lora were not mainstream at that time. We leveraged what were then new and often preview-level Azure services such as IoT Hub, Stream Analytics, etc… The architecture was loosely based on lambda architecture with a hot and cold path and stateful window-based stream processing. Fun stuff!

Kubernetes already existed but had not taken off yet. Managed Kubernetes services such as Azure Kubernetes Service (AKS) weren’t a thing.

The application (end-user UI and management) was loosely based on a micro-services pattern and we decided to run the services as Docker containers. At that time, Karim Vaes, now a Program Manager for Azure Storage, worked at our company and was very enthusiastic about Rancher. Rancher was still v1 and we decided to use it in combination with their own container orchestration framework called Cattle.

Our experience with Rancher was very positive. It was easy to deploy and run in production. The combination of GitHub, Shippable and the Rancher CLI made it extremely easy to deploy our code. Rancher, including Cattle, was very stable for our needs.

In recent years though, the growth of Kubernetes as a container orchestration platform has far outpaced the others. Using an alternative orchestrator such as Cattle made less sense. Rancher 2.0 is now built around Kubernetes but maintains the same experience as earlier versions: simple deployment and flexible configuration and management.

In this post, I will look at deploying Rancher 2.0 and importing an existing AKS cluster. This is a basic scenario but it allows you to get a feel for how it works. Indeed, besides deploying your cluster with Rancher from scratch (even on-premises on VMware), you can import existing Kubernetes clusters including managed clusters from Google, Amazon and Azure.

Installing Rancher

For evaluation purposes, it is best to just run Rancher on a single machine. I deployed an Azure virtual machine with the following properties:

  • Operating system: Ubuntu 16.04 LTS
  • Size: DS2v3 (2 vCPUs, 8GB of RAM)
  • Public IP with open ports 22, 80 and 443
  • DNS name: somename.westeurope.cloudapp.azure.com

In my personal DNS zone on CloudFlare, I created a CNAME record for the above DNS name. Later, when you install Rancher you can use the custom DNS name in combination with Let’s Encrypt support.

On the virtual machine, install Docker. Use the guide here. You can use the convenience script as a quick way to install Docker.
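In short, the convenience script boils down to:

# quick Docker install for test machines only
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh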

With Docker installed, install Rancher with the following command:

docker run -d --restart=unless-stopped -p 80:80 -p 443:443 \
rancher/rancher:latest --acme-domain your-custom-domain

More details about the single node installation can be found here. Note that Rancher uses etcd as a datastore. With the command above, the data will be in /var/lib/rancher inside the container. This is ok if you are just doing a test drive. In other cases, use external storage and mount it on /var/lib/rancher.
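For example, a bind mount of a host folder is a minimal way to keep the data outside the container (the /opt/rancher path is just an example):

docker run -d --restart=unless-stopped -p 80:80 -p 443:443 \
  -v /opt/rancher:/var/lib/rancher \
  rancher/rancher:latest --acme-domain your-custom-domain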

A single-node install is great for test and development. For production, use the HA install. This will actually run Rancher on Kubernetes. Rancher recommends a dedicated cluster in this scenario.

After installation, just connect to https://your-custom-domain and provide a password for the default admin user.

Adding a cluster

To get started, I added an existing three-node AKS cluster to Rancher. After you add the cluster and turn on monitoring, you will see the following screen when you navigate to Clusters and select the imported cluster:

Dashboard for a cluster

To demonstrate the functionality, I deployed a 3-node cluster (1.11.9) with RBAC enabled and standard networking. After deployment, open up Azure Cloud shell and get your credentials:

az aks list -o table
az aks get-credentials -n cluster-name -g cluster-resource-group
kubectl cluster-info

The first command lists the clusters in your subscription, including their name and resource group. The second command configures kubectl, the Kubernetes command line admin tool, which is pre-installed in Azure Cloud Shell. To verify you are connected, the last command simply displays cluster information.

Now that the cluster is deployed, let’s try to import it. In Rancher, navigate to Global > Clusters and click Add Cluster:

Add cluster via Import

Click Import, type a name and click Create. You will get a screen with a command to run:

kubectl apply -f https://your-custom-dns/v3/import/somerandomtext.yaml

Back in the Azure Cloud Shell, run the command:

Running the command to prepare the cluster for import

Switch back to Rancher: the cluster will be added by the components you deployed with the command above:

Cluster appears in the list

Click on the cluster:

Top of the cluster dashboard

To see live metrics, you can click Enable Monitoring. This will install and configure Prometheus and Grafana. You can control several parameters of the deployment such as data retention:

Enabling monitoring

Notice that by default, persistent storage for Grafana and Prometheus is not configured.

Note: with monitoring enabled or not, you will notice the following error in the dashboard:

Controller manager and scheduler unhealthy?

The error is described here. In short, the components are probably healthy. The error is not related to a Rancher issue but an upstream Kubernetes issue.

When the monitoring API is ready, you will see live metrics and Grafana icons. Clicking on the Grafana icon next to Nodes gives you this:

Node monitoring with Prometheus and Grafana

Of course, Azure provides Container Insights for monitoring. The Grafana dashboards are richer though. On the other hand, querying and alerting on logs and metrics from Container Insights is powerful as well. You can of course enable them all and use the best of both worlds.

Conclusion

We briefly looked at Rancher 2.0 and how it can interact with an existing AKS cluster. An existing cluster is easy to add. Once it is added, adding monitoring is “easy peasy lemon squeezy” as my daughter would call it! 😉 As with Rancher 1.x, I am again pleasantly surprised at how Rancher is able to make complex matters simpler and more fun to work with. There is much more to explore and do of course. That’s for some follow-up posts!

Improving an Azure Function that writes IoT Hub data to TimescaleDB

In an earlier post, I used an Azure Function to write data from IoT Hub to a TimescaleDB hypertable on PostgreSQL. Although that function works for demo purposes, there are several issues. Two of those issues will be addressed in this post:

  1. the INSERT INTO statement used the NOW() function instead of the enqueuedTimeUtc field; that field is provided by IoT Hub and represents the time the message was enqueued
  2. the INSERT INTO query does not use upsert functionality; if for some reason you need to process the IoT Hub data again, you will end up with duplicate data; your code should be idempotent

Using enqueuedTimeUtc

Using the time the event was enqueued means we need to retrieve that field from the message that our Azure Function receives. The Azure Function receives outside information via two parameters: context and eventHubMessage. The enqueuedTimeUtc field is retrieved via the context variable: context.bindingData.enqueuedTimeUtc.

In the INSERT INTO statement, we need to use TIMESTAMP ‘UTC time’. In JavaScript, that results in the following:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\' ...

Using upsert functionality

Before adding upsert functionality, add a unique constraint to the hypertable like so (via pgAdmin):

CREATE UNIQUE INDEX on conditions (time, device); 

It needs to be on time and device because the time field on its own is not guaranteed to be unique. Now modify the INSERT INTO statement like so:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\',' + eventHubMessage.temperature + ',' + eventHubMessage.humidity + ') ON CONFLICT DO NOTHING'; 

Notice the ON CONFLICT clause? When any constraint is violated, we do nothing. We do not add or modify data, we leave it all as it was.

The full Azure Function code is below:

Azure Function code with IoT Hub enqueuedTimeUtc and upsert

Conclusion

The above code is a little bit better already. We are not quite there yet but the two changes make sure that the date of the event is correct and independent from when the actual processing is done. By adding the constraint and upsert functionality, we make sure we do not end up with duplicate data when we reprocess data from IoT Hub.

Multi-Tier Bitnami Grafana Stack on Azure

After seeing some tweets about Bitnami’s multi-tier Grafana Stack, I decided to give it a go. On the page describing the Grafana stack, there are several deployment offerings:

Grafana deployment offerings (Image: from Bitnami website)

I decided to use the multi-tier deployment, which deploys multiple Grafana nodes and a shared Azure Database for MariaDB.

On Azure, the Grafana stack is deployed via an Azure Resource Manager (ARM) template. You can easily find it via the Azure Marketplace:

Grafana multi-tier in Azure Marketplace

From the above page, click Create to start deploying the template. You will get a series of straightforward questions such as the resource group, the Grafana admin password, MariaDB admin password, virtual machine size, etc…

It will take about half an hour to deploy the template. When finished, you will find the following resources in the resource group you chose or created during deployment:

Deployed Grafana resources

Let’s take a look at the deployed resources. The database back-end is Azure Database for MariaDB server. The deployment uses a General Purpose, 2 vCore, 50GB database. The monthly cost is around €130.

The Grafana VMs are Standard D1 v2 virtual machines (this can be changed). These two machines cost around €100 per month. By default, these virtual machines have a public IP that allows SSH access on port 22. To log on, use the password or public key you configured during deployment.

To access the Grafana portal, Bitnami used an Azure Application Gateway. They used the Standard tier (not WAF) with the Medium SKU size and three nodes. The monthly cost for this setup is around €140.

The public IP address of the front-end can be found in the list of resources (e.g. in my case, mygrafanaagw-ip). The IP address will have an associated DNS name in the form of
mygrafanaRANDOMTEXT-agw-dns.westeurope.cloudapp.azure.com. Simply connect to that URL to access your Grafana instance:

Grafana instance (after logging on and showing a simple dashboard)

Naturally, you will want to access Grafana over SSL. That is something you will need to do yourself. For more information see this link.

It goes without saying that the template only takes care of deployment. Once deployed, you are responsible for the infrastructure! Security, backup, patching, etc. are your responsibility!

Note that the template does not allow you to easily select the virtual network to deploy to. By default, the template creates a virtual network with address space 10.0.0.0/16. If you have some ARM templating skills, you can download the template right after validation but before deployment and modify it:

Downloading the template for modification

Conclusion

Setting up a multi-tier Grafana stack with Bitnami is very easy. Note that the cost of this deployment is around €370 per month though. Instead of deploying and managing Grafana yourself, you can also take a look at hosted offerings such as Grafana Cloud or Aiven Grafana.

AKS Managed Pod Identity and access to Azure Storage

When you need to access Azure Storage (or other Azure resources) from a container in AKS (Kubernetes on Azure), you have many options. You can put credentials in your code (nooooo!), pass credentials via environment variables, use Kubernetes secrets, obtain secrets from Key Vault and so on. Usually, the credentials are keys but you can also connect to a Storage Account with an Azure AD account. Instead of a regular account, you can use a managed identity that you set up specifically for the purpose of accessing the storage account or a specific container.

A managed identity is created as an Azure resource and will appear in the resource group where it was created:

User assigned managed identity

This managed identity can be created from the Azure Portal but also with the Azure CLI:

az identity create -g storage-aad-rg -n demo-pod-id -o json 

The managed identity can subsequently be granted access rights, for instance, on a storage account. Storage accounts now also support Azure AD accounts (in preview). You can assign roles such as Blob Data Reader, Blob Data Contributor and Blob Data Owner. The screenshot below shows the managed identity getting the Blob Data Reader role on the entire storage account:

Granting the managed identity access to a storage account
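The role assignment can also be done from the CLI. Here is a sketch reusing the names from this post; note that while the feature is in preview, the exact role name may carry a (Preview) suffix.

# object id of the managed identity and resource id of the storage account
SP_ID=$(az identity show -g storage-aad-rg -n demo-pod-id --query principalId -o tsv)
STORAGE_ID=$(az storage account show -g storage-aad-rg -n storageaadgeba --query id -o tsv)

# grant the managed identity read access to blob data on the whole account
az role assignment create --assignee $SP_ID \
  --role "Storage Blob Data Reader" --scope $STORAGE_ID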

When you want to use this specific identity from a Kubernetes pod, you can use the aad-pod-identity project. Note that this is an open source project and that it is not quite finished. The project’s README contains all the instructions you need but here are the highlights:

  • Deploy the infrastructure required to support managed identities in pods; these are the MIC and NMI containers plus some custom resource definitions (CRDs)
  • Assign the AKS service principal the Managed Identity Operator role over the scope of the managed identity created above (use the resource id of the managed identity as the scope, such as /subscriptions/YOURSUBID/resourcegroups/YOURRESOURCEGROUP/providers/Microsoft.ManagedIdentity/userAssignedIdentities/YOURMANAGEDIDENTITY); see the sketch after this list
  • Define the pod identity via the AzureIdentity custom resource definition (CRD); in the YAML file you will refer to the managed identity by its resource id (/subscr…) and client id
  • Define the identity binding via the AzureIdentityBinding custom resource definition (CRD); in the YAML file you will set up a selector that you will use later in a pod definition to associate the managed identity with the pod; I defined a selector called myapp

Here is the identity definition (uses one of the CRDs defined earlier):

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
name: aks-pod-id
spec:
type: 0
ResourceID: /subscriptions/SUBID/resourcegroups/RESOURCEGROUP/providers/Microsoft.ManagedIdentity/userAssignedIdentities/demo-pod-id
ClientID: c35040d0-f73c-4c4e-a376-9bb1c5532fda

And here is the binding that defines the selector (other CRD defined earlier):

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
name: aad-identity-binding
spec:
AzureIdentity: aks-pod-id
Selector: myapp

Note that the installation of the infrastructure containers depends on whether RBAC is enabled or not. To check if RBAC is enabled on your AKS cluster, you can use https://resources.azure.com and search for your cluster. Check the enableRBAC property. In my cluster, RBAC was enabled:

Yep, RBAC enabled so make sure you use the RBAC YAML files
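You can also check this from the CLI (cluster and resource group names are placeholders):

az aks show -g my-aks-rg -n my-aks-cluster --query enableRbac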

With everything configured, we can spin up a container with a label that matches the selector defined earlier:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  labels:
    aadpodidbinding: myapp
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]

Save the above to a file called ubuntu.yaml and use kubectl apply -f ubuntu.yaml to launch the pod. The pod will keep running because of the forever while loop. The pod can use the managed identity because of the aadpodidbinding label of myapp. Next, get a shell to the container:

kubectl exec -it ubuntu /bin/bash

To check if it works, we have to know how to obtain an access token (which is a JWT or JSON Web Token). We can obtain it via curl. First use apt-get update and then use apt-get install curl to install it. Then issue the following command to obtain a token for https://storage.azure.com:

curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com%2F' -H Metadata:true -s

TIP: if you are not very familiar with curl, use https://curlbuilder.com. As a precaution, do not paste your access token in the command builder.

The request to 169.254.169.254 goes to the Azure Instance Metadata Service which provides, among other things, an API to obtain a token. The result will be in the following form:

{"access_token":"THE ACTUAL ACCESS TOKEN","refresh_token":"", "expires_in":"28800","expires_on":"1549083688","not_before":"1549054588","resource":"https://storage.azure.com/","token_type":"Bearer"

Note that many of the SDKs that Microsoft provides have support for managed identities baked in. That means that the SDK takes care of calling the Instance Metadata Service for you and presents you with a token to use in subsequent calls to Azure APIs.

Now that you have the access token, you can use it in a request to the storage account, for instance to list containers:

curl -XGET -H 'Authorization: Bearer THE ACTUAL ACCESS TOKEN' -H 'x-ms-version: 2017-11-09' -H "Content-type: application/json" 'https://storageaadgeba.blob.core.windows.net/?comp=list'

The result of the call is some XML with the container names. I only had a container called test:

OMG… XML

Wrap up

You have seen how to bind an Azure managed identity to a Kubernetes pod running on AKS. The aad-pod-identity project provides the necessary infrastructure and resources to bind the identity to a pod using a label in its YAML file. From there, you can work with the managed identity as you would on a virtual machine, calling the Instance Metadata Service to obtain the token (a JWT). Once you have the token, you can include it in REST calls to the Azure APIs by adding an authorization header. In this post we have used the storage APIs as an example.

Note that Microsoft has AKS Pod Identity marked as in development on the updates site. I am not aware if this is based on the aad-pod-identity project but it does mean that the feature will become an official part of AKS pretty soon!

Virtual Node support in Azure Kubernetes Service (AKS)

Although I am using Kubernetes a lot, I didn’t quite get to trying the virtual nodes support. Virtual nodes are basically the AKS implementation of the virtual kubelet project. The virtual kubelet project allows Kubernetes nodes to be backed by other services that support containers such as AWS Fargate, IoT Edge, Hyper.sh or Microsoft’s ACI (Azure Container Instances). The idea is that you spin up containers using the familiar Kubernetes API but on services like Fargate and ACI that can instantly scale and only charge you for the seconds the containers are running.

As expected, the virtual nodes support that is built into AKS uses ACI as the backing service. To use it, you need to deploy Kubernetes with virtual nodes support. Use either the CLI or the Azure Portal:

  • CLI: uses the Azure CLI on your machine or from cloud shell
  • Portal: uses the Azure Portal

Note that virtual nodes for AKS are currently in preview. Virtual nodes require AKS to be configured with the advanced networking option. You will need to provide a subnet for the virtual nodes that will be dedicated to ACI. The advanced networking option gives you additional control over IP ranges and also allows you to deploy a cluster in an existing virtual network. Note that advanced networking results in the use of the Azure Virtual Network container network interface. Each pod on a regular host gets its own IP address on the virtual network. You will see them in the network as connected devices:

Connected devices on the Kubernetes VNET (includes pods)

In contrast, the containers you will create in the steps below will not show up as connected devices since they are managed by ACI which works differently.
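If you prefer the CLI, a cluster with virtual node support could be created along these lines. This is a sketch: the resource group, cluster and subnet names are placeholders, and a vnet with an aks-subnet plus a dedicated aci-subnet is assumed to exist already.

AKS_SUBNET_ID=$(az network vnet subnet show -g rg-aks --vnet-name aks-vnet \
  -n aks-subnet --query id -o tsv)

az aks create -g rg-aks -n aks-virtual-nodes --node-count 3 \
  --network-plugin azure --vnet-subnet-id $AKS_SUBNET_ID \
  --enable-addons virtual-node --aci-subnet-name aci-subnet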

Ok, go ahead and deploy a Kubernetes cluster or just follow along. After deployment, kubectl get nodes will show a result similar to the screenshot below:

kubectl get nodes output with virtual node

With the virtual node online, we can deploy containers to it. Let’s deploy the ONNX ResNet50v2 container from an earlier post and scale it up. Create a YAML file like below and use kubectl apply -f path_to_yaml to create a deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnet
spec:
  replicas: 1
  selector:
    matchLabels:
      app: resnet
  template:
    metadata:
      labels:
        app: resnet
    spec:
      containers:
      - name: onnxresnet50v2
        image: gbaeke/onnxresnet50v2
        ports:
        - containerPort: 5001
        resources:
          requests:
            cpu: 1
          limits:
            cpu: 1
      nodeSelector:
        kubernetes.io/role: agent
        beta.kubernetes.io/os: linux
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule

With the nodeSelector, you constrain a pod to run on particular nodes in your cluster. In this case, we want to deploy on a node of type virtual-kubelet. With the tolerations, you specify that the pod may be scheduled on nodes that carry the specified taints. There are two taints here, virtual-kubelet.io/provider and azure.com/aci, which are applied to the virtual kubelet node.

After executing the above YAML, I get the following result after kubectl get pods -o wide:

The pod is pending on node virtual-node-aci-linux

After a while, the pod will be running, but it’s actually just a container on ACI.

Let’s expose the deployment with a public IP via an Azure load balancer:

kubectl expose deployment resnet --port=80 --target-port=5001 --type=LoadBalancer

The above command creates a service of type LoadBalancer that maps port 80 of the Azure load balancer to, eventually, port 5001 of the container. Just use kubectl get svc to see the external IP address. Configuring the load balancer usually takes around one minute.

Now let’s try to scale the deployment to 100 containers:

kubectl scale --replicas=100 deployments/resnet

Instantly, the containers will be provisioned on ACI via the virtual kubelet:

NAME                      READY   STATUS     RESTARTS   AGE
resnet-6d7954d5d7-26n6l   0/1     Waiting    0          30s
resnet-6d7954d5d7-2bjgp   0/1     Creating   0          30s
resnet-6d7954d5d7-2jsrs   0/1     Creating   0          30s
resnet-6d7954d5d7-2lvqm   0/1     Pending    0          27s
resnet-6d7954d5d7-2qxc9   0/1     Creating   0          30s
resnet-6d7954d5d7-2wnn6   0/1     Creating   0          28s
resnet-6d7954d5d7-44rw7   0/1     Creating   0          30s
.... repeat ....

When you run kubectl get endpoints you will see all the endpoints “behind” the resnet service:

NAME         ENDPOINTS                                                       
resnet 40.67.216.68:5001,40.67.219.10:5001,40.67.219.22:5001
+ 97 more…

In container monitoring:

Hey, one pod has an issue! Who cares right?

As you can see, it is really easy to get started with virtual nodes and to scale up a deployment. In a later post, I will take a look at auto scaling containers on a virtual node.