Azure Private Link and DNS

When you are just starting out with Azure Private Link, it can be hard to figure out how name resolution works and how DNS has to be configured. In this post, we will take a look at some of the internals and try to clear up some of the confusion. If you end up even more confused, then I’m sorry in advance. Drop me your questions in the comments if that happens. 😉 I will illustrate the inner workings with a Cosmos DB account. It is similar for other services.

Wait! What is Private Link?

Azure Private Link provides private IP addresses for services such as Cosmos DB, Azure SQL Database and many more. You choose where the private IP address comes from by specifying a VNET and subnet. Without Private Link, these services are normally accessed via a public IP address or via virtual network service endpoints (still the public IP, but traffic stays on the Azure backbone and access is restricted to selected subnets). There are several issues or shortcomings with those options:

  • for most customers, accessing databases and other services over the public Internet is just not acceptable
  • although network service endpoints provide a solution, this only works for systems that run inside an Azure Virtual Network (VNET)

When you want to access a service like Cosmos DB from on-premises networks and keep the traffic limited to your on-premises networks and Azure virtual networks, Azure Private Link is the way to go. In addition, you can filter the traffic with Azure Firewall or a virtual appliance, typically installed in a hub site. Now let’s take a look at how this works with Cosmos DB.

Azure Private Link for Cosmos DB

I deployed a Cosmos DB account in East US and called it geba-cosmos. To access this account and work with collections, I can use the following name: https://geba-cosmos.documents.azure.com:443/. As explained before, geba-cosmos.documents.azure.com resolves to a public IP address. Note that you can still control who can connect to this public IP address. Below, only my home IP address is allowed to connect:

Cosmos DB configured to allow access from selected networks

In order to connect to Cosmos DB using a private IP address in your Azure Virtual Network, just click Private Endpoint Connections below Firewall and virtual networks:

Private Endpoint Connections for a Cosmos DB account with one private endpoint configured

To create a new private endpoint, click + Private Endpoint and follow the steps. The private endpoint is a resource on its own which needs a name and region. It should be in the same region as the virtual network you want to grab an IP address from. In the second screen, you can select the resource you want the private IP to point to (can be in a different region):

Private endpoint that will connect to a Cosmos DB account in my directory (target sub-resource indicates the Cosmos DB API, here the Core SQL API is used)

In the next step, you select the virtual network and subnet you want to grab an IP address from:

VNET and subnet to grab the IP address for the private endpoint

In this third step (Configuration), you will be asked if you want Private DNS integration. The default is Yes but I will select No for now.

Note: it is not required to use a Private DNS zone with Private Link
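
As an aside, you don’t have to use the portal wizard for this. A rough Azure CLI sketch (the resource group name is an assumption, the group id for the Core SQL API is typically Sql and, depending on your CLI version, the parameter may be called --group-ids):

# hedged sketch: create the private endpoint for the Cosmos DB account
COSMOS_ID=$(az cosmosdb show --name geba-cosmos --resource-group rg-cosmos --query id -o tsv)
az network private-endpoint create \
  --name pe-geba-cosmos \
  --resource-group rg-cosmos \
  --vnet-name vnet-us1 \
  --subnet servers \
  --private-connection-resource-id $COSMOS_ID \
  --group-id Sql \
  --connection-name pe-geba-cosmos-connection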

When you finish the wizard and look at the created private endpoint, it will look similar to the screenshot below:

Private endpoint configured

In the background, a network interface was created and attached to the selected virtual network. Above, the network interface is pe-geba-cosmos.nic.a755f7ad-9d54-4074-996c-8a14e9434898. The network interface screen will look like the screenshot below:

Network interface attached to subnet servers in VNET vnet-us1; it grabbed the next available IP of 10.1.0.5 as primary (but also 10.1.0.6 as secondary; click IP configurations to see that)

The interesting part is the Custom DNS Settings. How can you resolve the name geba-cosmos.documents.azure.com to 10.1.0.5 when a client (either in Azure or on-premises) requests it? Let’s look at DNS resolution next…

DNS Resolution

Let’s use dig to check what a request for a Cosmos DB account returns without private link. I have another account, geba-test, that I can use for that:

dig with a Cosmos DB account without private link

The above DNS request was made on my local machine, using public DNS servers. The response from Microsoft DNS servers for geba-test.documents.azure.com is a CNAME to a cloudapp.net name which results in IP address 40.78.226.8.
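
The command and a trimmed, illustrative answer section look more or less like this (the intermediate CNAME differs per account and region; only the final A record matters here):

dig +noall +answer geba-test.documents.azure.com
# geba-test.documents.azure.com.        CNAME  <region-specific-name>.cloudapp.net.
# <region-specific-name>.cloudapp.net.  A      40.78.226.8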

The response from the DNS server will be different when private link is configured. When I resolve geba-cosmos.documents.azure.com, I get the following:

Resolving the Cosmos DB hostname with private link configured

As you can see, the Microsoft DNS servers respond with a CNAME of accountname.privatelink.documents.azure.com, but by default that CNAME goes to a cloudapp.net name that resolves to a public IP.

This means that, if you don’t take specific action to resolve accountname.privatelink.documents.azure.com to the private IP, you will just end up with the public IP address. In most cases, you will not be able to connect because you will restrict public access to Cosmos DB. It’s important to note that you do not have to restrict public access and that you can enable both private and public access. Most customers I work with though, restrict public access.

Resolving to the private IP address

Before continuing, it’s important to state that developers should connect to https://accountname.documents.azure.com (if they use the gateway mode). In fact, Cosmos DB expects you to use that name. Don’t try to connect with the IP address or some other name because it will not work. This is similar for services other than Cosmos DB. In the background though, we will make sure that accountname.documents.azure.com goes to the internal IP. So how do we make that happen? In what follows, I will list a couple of solutions. I will not discuss using a hosts file on your local pc, although it is possible to make that work.

Create privatelink DNS zones on your DNS servers
In this case, we create a zone for privatelink.documents.azure.com on our own DNS servers and add the following records:

  • geba-cosmos.privatelink.documents.azure.com. IN A 10.1.0.5
  • geba-cosmos-eastus.privatelink.documents.azure.com. IN A 10.1.0.6

Note: use a low TTL like 10s (similar to Azure Private DNS; see below)

When the DNS server has to resolve geba-cosmos.documents.azure.com, it will get the CNAME response of geba-cosmos.privatelink.documents.azure.com and will be able to answer authoritatively that it resolves to 10.1.0.5.

If you use this solution, you need to make sure that you register the custom DNS settings listed by the private endpoint resource manually. If you want to try this yourself, you can easily do this with a Windows virtual machine with the DNS role or a Linux VM with bind.
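
For a Linux VM with bind, a minimal sketch could look like the commands below (paths and the NS name are assumptions; the records and TTL come from the example above):

# write the zone file for privatelink.documents.azure.com
sudo tee /etc/bind/db.privatelink.documents.azure.com > /dev/null <<'EOF'
$TTL 10
@                   IN SOA ns1.example.local. admin.example.local. ( 1 3600 600 86400 10 )
                    IN NS  ns1.example.local.
geba-cosmos         IN A   10.1.0.5
geba-cosmos-eastus  IN A   10.1.0.6
EOF
# register the zone and restart bind
echo 'zone "privatelink.documents.azure.com" { type master; file "/etc/bind/db.privatelink.documents.azure.com"; };' | sudo tee -a /etc/bind/named.conf.local
sudo systemctl restart bind9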

Use Azure Private DNS zones
If you do not want to register the custom DNS settings of the private endpoint manually in your own DNS servers, you can use Azure Private DNS. You can create the private DNS zone during the creation of the private endpoint. An internal zone for privatelink.documents.azure.com will be created and Azure will automatically add the DNS records the private endpoint requires:

Azure Private DNS with automatic registration of the required Cosmos DB A records
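
For reference, the same setup can be scripted with the Azure CLI. A hedged sketch, assuming a resource group rg-cosmos and the names used earlier (the dns-zone-group command links the private endpoint to the zone so the A records are created and updated for you):

az network private-dns zone create -g rg-cosmos -n privatelink.documents.azure.com
az network private-dns link vnet create -g rg-cosmos -n link-vnet-us1 \
  --zone-name privatelink.documents.azure.com --virtual-network vnet-us1 --registration-enabled false
az network private-endpoint dns-zone-group create -g rg-cosmos \
  --endpoint-name pe-geba-cosmos --name default \
  --private-dns-zone privatelink.documents.azure.com --zone-name cosmos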

This is great for systems running in Azure virtual networks that are associated with the private DNS zone and that use the DNS servers provided by Azure, but you still need to integrate your on-premises DNS servers with these private DNS zones. The way to do that is explained in the documentation. In particular, the below diagram is important:

On-premises forwarding to Azure DNS
Source: Microsoft docs

The example above is for Azure SQL Database but it is similar to our Cosmos DB example. In essence, you need the following:

  • DNS forwarder in the VNET (above, that is 10.5.0.254): this is an extra (!!!) Windows or Linux VM configured as a DNS forwarder (see the sketch after this list); it should forward to 168.63.129.16, which points to the Azure-provided DNS servers; if the virtual network of the VM is integrated with the private DNS zone that hosts privatelink.documents.azure.com, the A records in that zone can be resolved properly
  • To allow the on-premises server to return the privatelink A records, set up conditional forwarding for documents.azure.com to the DNS forwarder in the virtual network
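
A hedged sketch of such a forwarder on a Linux VM running bind (paths are assumptions); it simply forwards everything to the Azure-provided resolver at 168.63.129.16:

sudo tee /etc/bind/named.conf.options > /dev/null <<'EOF'
options {
    directory "/var/cache/bind";
    recursion yes;
    allow-query { any; };
    forwarders { 168.63.129.16; };
    forward only;
};
EOF
sudo systemctl restart bind9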

What should you do?

That’s always difficult to answer but most customers I work with tend to go for option 1. They create a zone for privatelink.x.y.z and register the records manually. Although that could be automated, it’s often a manual step.

I actually prefer the private DNS method because of the automatic registration of the records. Although I don’t like the extra DNS server, it will not be needed most of the time because customers tend to work with the hub/spoke model and the hub already contains DNS servers. Those DNS servers can then be configured to enable the resolution of the privatelink zones.

Azure Security Center and Azure Kubernetes Service

Quick post and note to self today… Azure Security Center checks many of your resources for vulnerabilities or attacks. For a while now, it also does so for Azure Kubernetes Service (AKS). In my portal, I saw the following:

Attacked resources?!? Now what?

There are many possible alerts. These are the ones I got:

Some of the alerts for AKS in Security Center

The first one, for instance, reports that a container has mounted /etc/kubernetes/azure.json on the AKS worker node where it runs. That is indeed a sensitive path because azure.json contains the credentials of the AKS service principal. In this case, it’s Azure Key Vault Controller that has been configured to use this principal to connect to Azure Key Vault.

Another useful one is the alert for new high privilege roles. In my case, these alerts are the result of installing Helm charts that include such a role. For example, the helm-operator chart includes a role which uses a ClusterRoleBinding for [{"resources":["*"],"apiGroups":["*"],"verbs":["*"]}]. Yep, that’s high privilege indeed.
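
If you want to hunt for such roles yourself, something like the following should work (a sketch that assumes kubectl and jq are available):

# list ClusterRoles that grant all verbs on all resources in all API groups
kubectl get clusterroles -o json | \
  jq -r '.items[] | select(any(.rules[]?; .apiGroups==["*"] and .resources==["*"] and .verbs==["*"])) | .metadata.name'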

Remember, you will need Azure Security Center Standard for these capabilities. Azure Kubernetes Service is charged per VM core at $2/core/month in the preview (according to what I see in the portal).

Security Center Standard pricing in preview for AKS

Be sure to include Azure Security Center Standard when you are deploying Azure resources (not just AKS). The alerts you get are useful. In most cases, you will also learn a thing or two about the software you are deploying! 😆

Creating Kubernetes secrets from Key Vault

If you do any sort of development, you often have to deal with secrets. There are many ways to deal with secrets; one of them is retrieving the secrets from a secure system in your own code. When your application runs on Kubernetes and your code (or 3rd party code) cannot be configured to retrieve the secrets directly, you have several options. This post looks at one such solution: Azure Key Vault to Kubernetes from Sparebanken Vest, Norway.

In short, the solution connects to Azure Key Vault and does one of two things:

  • sync Key Vault objects to regular Kubernetes secrets (the controller, used in this post)
  • inject Key Vault secrets directly into your pods as environment variables (the Env Injector, mentioned in the conclusion)

In my scenario, I just wanted regular secrets to use in a KEDA project that processes IoT Hub messages. The following secrets were required:

  • Connection string to a storage account: AzureWebJobsStorage
  • Connection string to IoT Hub’s event hub: EventEndpoint

In the YAML that deploys the pods that are scaled by KEDA, the secrets are referenced as follows:

env:
 - name: AzureFunctionsJobHost__functions__0
   value: ProcessEvents
 - name: FUNCTIONS_WORKER_RUNTIME
   value: node
 - name: EventEndpoint
   valueFrom:
     secretKeyRef:
       name: kedasample-event
       key: EventEndpoint
 - name: AzureWebJobsStorage
   valueFrom:
     secretKeyRef:
       name: kedasample-storage
       key: AzureWebJobsStorage

Because the YAML above is deployed with Flux from a git repo, we need to get the secrets from an external system. That external system, in this case, is Azure Key Vault.

To make this work, we first need to install the controller that makes this happen. This is very easy to do with the Helm chart. By default, this Helm chart will work well on Azure Kubernetes Service as long as you give the AKS service principal read access to Key Vault:

Access policies in Key Vault (azure-cli-2019-… is the AKS service principal here)
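
If you prefer the CLI, granting that access looks roughly like this (the vault name comes from this post; the appId of the service principal is a placeholder):

az keyvault set-policy --name gebakv \
  --spn <AKS-SERVICE-PRINCIPAL-APPID> \
  --secret-permissions get list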

Next, define the secrets in Key Vault:

Secrets in Key Vault

With the access policies in place and the secrets defined in Key Vault, the controller installed by the Helm chart can do its work with the following YAML:

apiVersion: spv.no/v1alpha1
kind: AzureKeyVaultSecret
metadata:
  name: eventendpoint
  namespace: default
spec:
  vault:
    name: gebakv
    object:
      name: EventEndpoint
      type: secret
  output:
    secret: 
      name: kedasample-event
      dataKey: EventEndpoint
      type: opaque
---
apiVersion: spv.no/v1alpha1
kind: AzureKeyVaultSecret
metadata:
  name: azurewebjobsstorage
  namespace: default
spec:
  vault:
    name: gebakv
    object:
      name: AzureWebJobsStorage
      type: secret
  output:
    secret: 
      name: kedasample-storage
      dataKey: AzureWebJobsStorage
      type: opaque     

The above YAML defines two objects of kind AzureKeyVaultSecret. In each object we specify the Key Vault secret to read (vault) and the Kubernetes secret to create (output). The above YAML results in two Kubernetes secrets:

Two regular secrets

When you look inside such a secret, you will see:

Inside the secret

To double check the secret, just do echo RW5K… | base64 -d to see the decoded value and verify that it matches the secret stored in Key Vault. You can now reference the secret with valueFrom as shown earlier in this post.
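
Alternatively, you can pull the value straight from the cluster:

# print the decoded secret value
kubectl get secret kedasample-event -o jsonpath='{.data.EventEndpoint}' | base64 -d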

Conclusion

If you want to turn Azure Key Vault secrets into regular Kubernetes secrets for use in your manifests, give the solution from Sparebanken Vest a go. It is very easy to use. If you do not want regular Kubernetes secrets, opt for the Env Injector instead, which injects the environment variables directly in your pod.

Creating a Kubernetes operator on Windows and WSL

I have always wanted to create a Kubernetes operator with the operator framework and tried to give that a go on my Windows 10 system. Note that the emphasis is on creating an operator, not necessarily writing a useful one 😁. All I am doing is using the boilerplate that is generated by the framework. If you have never even seen how this is done, then this post is for you. 👍

An operator is an application-specific controller. A controller is a piece of software that implements a control loop, watching the state of the Kubernetes cluster via the API. It makes changes to the state to drive it towards the desired state.

An operator uses Kubernetes to create and manage complex applications. Many operators can be found here: https://operatorhub.io/. The Cassandra operator, for instance, has domain-specific knowledge embedded in it and knows how to deploy and configure this database. That’s great because it means some of the burden is shifted from you to the operator.

Installation

I installed the Operator SDK CLI from the GitHub releases in WSL, Windows Subsystem for Linux. I am using WSL 1, not WSL 2 as I am not running a Windows Insiders release. The commands to run:

RELEASE_VERSION=v0.13.0 

curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu 

chmod +x operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu && sudo mkdir -p /usr/local/bin/ && sudo cp operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu /usr/local/bin/operator-sdk && rm operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu 

You should now be able to run operator-sdk in WSL 1.

Creating an operator

In WSL, you should have Go installed. I am using version 1.13.5. Although not required, I used my Go path on Windows to generate the operator and not the %GOPATH set in WSL. My working directory was:

/mnt/c/Users/geert/go/src/github.com/baeke.info

To create the operator, I ran the following commands:

export GO111MODULE=on

operator-sdk new fun-operator --repo github.com/baeke.info/fun-operator

This creates a folder, fun-operator, under baeke.info and sets up the project:

Project structure in VS Code

Before continuing, cd into fun-operator and run go mod tidy. Now we can run the following command:

operator-sdk add api --api-version=fun.baeke.info/v1alpha1 --kind FunOp

This creates a new CRD (Custom Resource Definition) API called FunOp. The API version is fun.baeke.info/v1alpha1, which you choose yourself. With the above, you can create custom resources like the one below that the operator acts upon:

apiVersion: fun.baeke.info/v1alpha1
kind: FunOp
metadata:
  name: example-funop 

Now we can add a controller that watches for the above CRD resource:

operator-sdk add controller --api-version=fun.baeke.info/v1alpha1 --kind=FunOp

The above will generate a file, funop_controller.go, that contains some boilerplate code that creates a busybox pod. The Reconcile function is responsible for doing this work:

Reconcile function in the controller (incomplete)

As stated above, I will just use the boilerplate code and build the project:

operator-sdk build gbaeke/fun-operator

In WSL 1, you cannot run Docker so the above command will build the operator from the Go code but fail while building the container image. Can’t wait for WSL 2! The build creates the following artifact:

fun-operator in _output/bin

The supplied Dockerfile can be used to build the container images in Windows. In Windows, copy the Dockerfile from the build folder to the root of the operator project (in my case C:\Users\geert\go\src\github.com\baeke.info\fun-operator) and run docker build and push:

docker build -t gbaeke/fun-operator .

docker push gbaeke/fun-operator

Deploying the operator

The project folder structure contains a bunch of yaml in the deploy folder:

Great! Some YAML to deploy

The service account, role and role binding make sure your code can create (or delete/update) resources in the cluster. The operator.yaml actually deploys the operator on your cluster. You just need to update the container spec with the name of your image (here gbaeke/fun-operator).

Before you deploy the operator, make sure you deploy the CRD manifest (here fun.baeke.info_funops_crd.yaml).

As always, just use kubectl apply -f with the above YAML files.
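
In my case, that boiled down to something like the commands below, run from the project root (the file names under deploy/ can differ slightly per SDK version):

kubectl apply -f deploy/crds/fun.baeke.info_funops_crd.yaml
kubectl apply -f deploy/service_account.yaml
kubectl apply -f deploy/role.yaml
kubectl apply -f deploy/role_binding.yaml
kubectl apply -f deploy/operator.yaml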

Testing the operator

With the operator deployed, create a resource based on the CRD. For instance:

apiVersion: fun.baeke.info/v1alpha1
kind: FunOp
metadata:
  name: example-funop  

From the moment you create this resource with kubectl apply, a pod will be created by the operator.

pod created upon submitting the custom resource

When you delete example-funop, the pod will be removed by the operator.
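
To see that for yourself, save the manifest above as funop-cr.yaml (the file name is just an assumption) and watch the pod come and go:

kubectl apply -f funop-cr.yaml
kubectl get pods -w                 # a busybox pod created by the operator should appear
kubectl delete -f funop-cr.yaml     # ...and disappear again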

That’s it! We created a Kubernetes operator with the boilerplate code supplied by the operator-sdk cli. Another time, maybe we’ll create an operator that actually does something useful! 😉

Use a Power Automate Button to start an Azure DevOps build on the go

In a previous post, we built a pipeline to deploy AKS using Azure DevOps. Because it can take a while to deploy, it can be handy to start the deployment at any time without having to log on to Azure DevOps. There are many ways to achieve this, but one of the easiest is Power Automate.

Microsoft has made it easy to create such a flow because Azure DevOps is supported out of the box. The flow looks like this:

Flow to trigger an Azure DevOps build

The flow uses a manual trigger which allows you to start the flow from the iOS app using a button:

Button in the iOS app that triggers the flow

As simple as that… 👍

Trying Civo’s Kubernetes Service

In a previous post I talked about k3sup, a tool to easily install k3s on any system available over SSH. If you don’t know what k3s is, it’s a lightweight version of Kubernetes. It also runs on ARMv7 and ARM64 processors. That means it’s also compatible with a Raspberry Pi.

If I am not mistaken, Civo is the first cloud provider that offers a managed k3s service. Just like the other Civo services it is very easy to use. At this point in time, the service is in beta and you need to be accepted to participate.

Deploying the cluster

The cluster can be deployed via the portal, CLI or the REST API. Portal deployment is very simple:

  • set a name
  • set the size of the nodes
  • set the number of nodes

Creating a new cluster

After deployment, you will see the cluster as follows:

Yes, a deployed cluster

Marketplace

Kubernetes on Civo comes with a marketplace of Kubernetes apps to install during or after cluster deployment. By default, Traefik is selected but you can add other apps. I added Helm for instance:

My installed apps plus a view on the marketplace

Getting your Kubeconfig

You can use the portal to grab the Kubeconfig file:

Downloading Kubeconfig

Then, in your shell, set the KUBECONFIG environment variable to the path where you downloaded the file. Alternatively, you can use the Civo CLI to obtain the Kubeconfig file.
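
For example (the download path is an assumption):

export KUBECONFIG=~/Downloads/civo-kubeconfig
kubectl get nodes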

Deploying an application

Let’s install my image classifier app on the cluster and expose it via Traefik. First, let’s look at the Traefik service in the cluster:

Traefik service in the cluster

If you look closely, you will see that the Traefik service is exposed on each node. Currently, there is no integration with Civo’s load balancers. You do get a DNS name that uses round robin over the IP addresses of the nodes. The DNS name is something like 232b548e-897f-41d3-86f6-1a2a38516a58.k8s.civo.com.

Let’s install and expose my image classifier with the following basic YAML:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nasnet-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: IPADDRESS.nip.io
    http:
      paths:
      - path: /
        backend:
          serviceName: nasnet-svc
          servicePort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nasnet-svc
spec:
  selector:
    app: nasnet
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9090
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nasnet-app
  labels:
    app: nasnet
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nasnet
  template:
    metadata:
      labels:
        app: nasnet
    spec:
      containers:
      - name: nasnet
        image: gbaeke/nasnet
        resources:
          limits:
            cpu: "0.5"
          requests:
            cpu: "0.2"
        ports:
        - containerPort: 9090

In the above YAML, replace IPADDRESS with one of the IP addresses of your nodes. With a little help of nip.io, that name will resolve to the IP address that you specify.
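
Save the manifest as nasnet.yaml (the file name is an assumption), apply it and test the ingress:

kubectl apply -f nasnet.yaml
kubectl get ingress nasnet-ingress
curl -I http://IPADDRESS.nip.io/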

The result:

Creepy but it works!

Conclusion

This was just a quick look at Civo’s Kubernetes service. It is easy to install and comes with an easy to use marketplace to quickly get started. They were able to get this up and running in a relatively short time, and I am sure it will rapidly evolve into a great contender to the other managed Kubernetes services out there.

Token checking at the API Management layer

In the previous blog post, I talked about the OAuth client credentials flow and how to implement it with Azure Active Directory. At the end of the post, I briefly talked about the need to validate the token in either your application or an intermediary layer. In this post, we will take a look at Azure API Management as that intermediary layer.

Remember that we obtained a token for a specific resource. In this case, the resource is an Azure AD application (App Registration) that represents our API. I will call it the API app from now on. The API app has the following app id: 06b2a484-141c-42d3-9d73-32bec5910b06. In our token, the app id is in the aud (audience) claim.

To verify that our client has access rights to the API, we created an application role on the API app called invokeRole. That role should be in the roles claim of the token. If it is not, the client has no access.

We also want to pass the client application id as a header to our backend API. We can use the azp claim for this purpose. That claim will be extracted by API management and passed as a header.

To validate the API connection, we will check both the aud and the roles claims. If the expected audience or the invokeRole role is not present, we reject the call. Let’s take a look at how that works.

Configuring the API in API Management

I deployed an Azure API Management instance in the Developer tier (any tier will do). I created a simple API with just one GET operation that adds numbers:

Calc API with one operation – amaaaaaazing

At the All Operations level, the API has the following inbound policy defined:

<validate-jwt header-name="Authorization" failed-validation-httpcode="401" require-expiration-time="false" require-signed-tokens="false">
    <openid-config url="https://login.microsoftonline.com/625422dd-8ffb-45a9-9232-4132babb1324/v2.0/.well-known/openid-configuration" />
    <audiences>
        <audience>06b2a484-141c-42d3-9d73-32bec5910b06</audience>
    </audiences>
    <required-claims>
        <claim name="roles" match="any">
            <value>invokeRole</value>
        </claim>
    </required-claims>
</validate-jwt>

The validate-jwt does what it says. It validates a JWT (JSON Web Token) passed via the HTTP Authorization header. If the validation fails, a 401 code is returned. The openid-config element sets the URL to the openid configuration of our tenant. You can browse to that URL to see its content. It is open to anyone. Information in that document is used to validate the JWT.

Note: in the openid config URL you can use the domain name of your tenant instead of the tenant ID

In the audiences section we specify we want that specific value in the aud claim. It is the app id of our API app. In the required-claims section we check that the roles claim contains the invokeRole.

Testing the API

With the validate-jwt policy present, we need a valid token to test the API. We can simply use curl to get the token:

curl -d 'grant_type=client_credentials&client_id=f1f695cb-2d00-4c0f-84a5-437282f3f3fd&client_secret=SECRET&audience=api%3A%2F%2F06b2a484-141c-42d3-9d73-32bec5910b06&scope=api%3A%2F%2F06b2a484-141c-42d3-9d73-32bec5910b06%2F.default' -X POST 'https://login.microsoftonline.com/019486dd-8ffb-45a9-9232-4132babb1324/oauth2/v2.0/token' 

The result of this call is the access token. In API Management, we can use the access token to test the API:

Testing the API with the token added to the Authorization header (after the word Bearer)
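
You can of course also call the API directly with curl, pasting the access token from the previous call. The gateway host and operation path below are assumptions and, depending on your product settings, you may also need an Ocp-Apim-Subscription-Key header:

curl -H "Authorization: Bearer <ACCESS_TOKEN>" "https://<APIM-NAME>.azure-api.net/calc/add?a=1&b=2"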

If the token is invalid, the following response is received:

Oops! Something wrong with the JWT!

Retrieving a claim and setting the value as a header

To retrieve the azp claim and set it as a header, just add the set-header policy AFTER the validate-jwt policy (in API design; all operations):

<set-header name="client" exists-action="override">
    <value>@(context.Request.Headers["Authorization"].First().Split(' ')[1].AsJwt()?.Claims["azp"].FirstOrDefault())</value>
</set-header>

Oh, this is so readable! Well not really but it does extract the azp claim from the token and sets the client header to that value. When you test the API and trace the backend call, the header will be shown in the trace. It is up to the backend API to process it.

If you want, you can remove the Authorization header and not send it to the backend.

Conclusion

When you protect APIs with OAuth, you can perform the validation at the API Management layer. Azure API Management can do this very easily with the validate-jwt policy. You can also extract claims from the token and set them as headers so that the backend can handle them without having to know anything about OAuth. Happy coding!

Azure DevOps multi-stage YAML pipelines

A while ago, the Azure DevOps blog posted an update about multi-stage YAML pipelines. The concept is straightforward: define both your build (CI) and release (CD) pipelines in a YAML file and stick that file in your source code repository.

In this post, we will look at a simple build and release pipeline that builds a container, pushes it to ACR and deploys it to Kubernetes, linked to an environment. Something like this:

Two stages in the pipeline – build and deploy (as simple as it can get, almost)

Note: I used a simple go app, a Dockerfile and a Kubernetes manifest as source files, check them out here.

Note: there is also a video version πŸ˜‰

Note: if you start from a repository without manifests and azure-pipelines.yaml, the pipeline build wizard will propose Deploy to Azure Kubernetes Service. The wizard that follows will ask you some questions but in the end you will end up with a configured environment, the necessary service connections to AKS and ACR and even a service.yaml and deployment.yaml with the bare minimum to deploy your container!

“Show me the YAML!!!”

The file, azure-pipelines.yaml contains the two stages. Check out the first stage (plus trigger and variables) below:

trigger:
- master

variables:
  imageName: 'gosample'
  registry: 'REGNAME.azurecr.io'

stages:
- stage: build
  jobs:
  - job: 'BuildAndPush'
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: Docker@2
      inputs:
        containerRegistry: 'ACR'
        repository: '$(imageName)'
        command: 'buildAndPush'
        Dockerfile: '**/Dockerfile'
    - task: PublishPipelineArtifact@0
      inputs:
        artifactName: 'manifests'
        targetPath: 'manifests' 

The pipeline runs on a commit to the master branch. The variables imageName and registry are referenced later using $(imageName) and $(registry). Replace REGNAME with the name of your Azure Container Registry.

It’s a multi-stage pipeline, so we start with stages: and then define the first stage, build. That stage has one job, which consists of two steps:

  • Docker task (v2): build a Docker image based on the Dockerfile in the source code repository and push it to the container registry called ACR; ACR is a reference to a service connection defined in the project settings
  • PublishPipelineArtifact: the source code repository contains Kubernetes deployment manifests in YAML format in the manifests folder; the contents of that folder is published as a pipeline artifact, to be picked up in a later stage

Now let’s look at the deployment stage:

- stage: deploy
  jobs:
  - deployment: 'DeployToK8S'
    pool:
      vmImage: 'ubuntu-latest'
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
            - task: DownloadPipelineArtifact@1
              inputs:
                buildType: 'current'
                artifactName: 'manifests'
                targetPath: '$(System.ArtifactsDirectory)/manifests'
            - task: KubernetesManifest@0
              inputs:
                action: 'deploy'
                kubernetesServiceConnection: 'dev-kub-gosample-1558821689026'
                namespace: 'gosample'
                manifests: '$(System.ArtifactsDirectory)/manifests/deploy.yaml'
                containers: '$(registry)/$(imageName):$(Build.BuildId)' 

The second stage uses a deployment job (quite new; see this). In a deployment job, you can specify an environment to link to. In the above job, the environment is called dev. In Azure DevOps, the environment is shown as below:

dev environment

The environment functionality has Kubernetes integration which is pretty neat. You can drill down to the deployed objects such as deployments and services:

Kubernetes deployment in an Azure DevOps environment

The deployment has two tasks:

  • DownloadPipelineArtifact: download the artifact published in the first stage to $(System.ArtifactsDirectory)/manifests
  • KubernetesManifest: this task can deploy Kubernetes manifests; it uses an AKS service connection that was created during creation of the environment; a service account was created in a specific namespace and with access rights to that namespace only; the manifests property will look for an image name in the Kubernetes YAML files and append the tag which is the build id here

Note that the release stage will actually download the pipeline artifact automatically. The explicit DownloadPipelineArtifact task gives additional control over the download location.

The KubernetesManifest task is relatively new at the time of this writing (end of May 2019). Its image substitution functionality could be enough in many cases, without having to revert to Helm or manual text substitution tasks. There is more to this task than what I have described here. Check out the docs for more info.

Conclusion

If you are just starting out building CI/CD pipelines in YAML, you will probably have a hard time getting used to the schema. I know I had! 😑 In the end though, doing it this way with the pipeline stored in source control will pay off in the long run. After some time, you will have built up a useful library of these pipelines to quickly get up and running in new projects. Recommended!!! 😉🚀🚀🚀

Update on restricting egress traffic on Azure Kubernetes Service

In an earlier post, I discussed the combination of Azure Firewall and Azure Kubernetes Service (AKS) to secure ingress and egress AKS traffic.

A few days ago, Microsoft added documentation that describes the ports and URLs to allow when you route traffic through Azure Firewall or a virtual appliance. Some of the allowed ports and addresses are required for the operation of the cluster, while some others are optional. It’s highly recommended to allow the optional ports and addresses though.

The top of the document mentions registering an additional feature called AKSLockingDownEgressPreview:

az feature register --name AKSLockingDownEgressPreview --namespace Microsoft.ContainerService
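
You can check the registration state and propagate it to the resource provider with the usual commands for AKS preview features:

az feature show --name AKSLockingDownEgressPreview --namespace Microsoft.ContainerService --query properties.state
az provider register --namespace Microsoft.ContainerService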

The document is not very clear on what the feature does but the comments contain the following:

The feature registration tells the cluster to only pull core system images from container image repositories housed in the Microsoft Container Registry (MCR). Otherwise, clusters could try to pull container images for the core components from external repositories. There is some additional routing that also occurs for the cluster to do this. The list of ports and addresses are then what's required for you to permit when the egress traffic is restricted. You can't simply limit the egress traffic to only those address without the feature being enabled for the cluster. 

In summary, to limit egress traffic, use Azure Firewall or a Network Virtual Appliance, allow the listed ports and URLs AND register the AKSLockingDownEgressPreview feature.

Trying Google Cloud Run

With the release of Google’s Cloud Run, I decided to check it out with my nasnet container.

With Cloud Run, you simply deploy your container and let Google scale it based on the requests it receives. When your container is not used, it gets scaled to zero. As such, it combines the properties of a serverless offering such as Azure Functions with standard containers. Today, you cannot put limits on scaling (e.g. max X instances).

You might be tempted to compare Cloud Run to something like Azure Container Instances but it is not exactly the same. True, Azure Container Instances (ACI) allows you to simply deploy a container without the need for an orchestrator such as Kubernetes. With ACI however, memory and CPU capacity are reserved and it does not scale your container based on the requests it receives. ACI can be used in conjunction with virtual nodes in AKS (Kubernetes on Azure) to achieve somewhat similar results at higher cost and complexity. However, ACI can be used in broader scenarios such as stateful applications beyond the simple HTTP use case.

Prerequisites

Cloud Run containers should be able to fit in 2GB of memory. They should be stateless and all computation should be scoped to a HTTP request.

Your container needs to be invocable via HTTP requests on port 8080. It is against best practices though to hardcode this port. Instead, you should check the PORT environment variable that is automatically injected into the container by Cloud Run. Google might change the port in the future! In the nasnet container, the code checks this as follows:

port := getEnv("PORT", "9090") 

getEnv is a custom function that checks the environment variable. If it is not set, port is set to the value of the second parameter:

func getEnv(key, fallback string) string {
    value, exists := os.LookupEnv(key)
    if !exists {
        value = fallback
    }
    return value
}

Later, in the call to ListenAndServe, the port variable is used as follows:

log.Fatal(http.ListenAndServe(":"+port, nil)) 

Deploying to Cloud Run

Make sure you have access to Google Cloud and create a project. I created a project called CRTest.

Next, clone the nasnet-go repository:

git clone https://github.com/gbaeke/nasnet-go.git

If you have Docker installed, issue the following command to build and tag the container (from the nasnet-go folder created by the git clone command above):

docker build -t gcr.io/<PROJECT>/nasnet:latest .

In the above command, replace <PROJECT> with your Google Cloud project name.

To push the container image to Google Container Registry, install gcloud. When you run gcloud init, you will have to authenticate to Google Cloud. We install gcloud here to make authenticating to Google Container Registry easier. To do that, run the following command:

gcloud auth configure-docker

Next, authenticate to the registry:

docker login gcr.io/<PROJECT>

Now that you are logged in, push the image:

docker push gcr.io/<PROJECT>/nasnet:latest

If you don’t want to bother yourself with local build and push, you can use Google Cloud Build instead:

gcloud builds submit --tag gcr.io/crtest/nasnet .

The above command will package your source files, submit them to Cloud Build and build the container in Google Cloud. When finished, the container image will be pushed to gcr.io/crtest. Either way, when the push is done, check the image in the console:

The nasnet container image in gcr

Now that we have the image in the registry, we can use it with Cloud Run. In the console, navigate to Cloud Run and click Create Service:

Creating a Cloud Run service for nasnet

By default, allocated memory is set to 256MB which is too low for this image. Set allocated memory to 1GB. If you set it too low, your container will be restarted. Click Optional Settings and change the allocated memory:

Change the allocated memory for the nasnet container

Note: this particular container writes files you upload to the local file system; this is not recommended since data written to the file system is counted as memory

Note: the container will handle multiple requests up to a maximum of 80; currently 80 concurrent requests per container is the maximum
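
If you prefer the command line over the console, a deployment sketch could look like this (the region and flags are assumptions; check the gcloud docs for your version):

gcloud run deploy nasnet \
  --image gcr.io/<PROJECT>/nasnet:latest \
  --memory 1Gi \
  --region us-central1 \
  --allow-unauthenticated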

Now finish the configuration. The image will be pulled and the Cloud Run service will be started:

Cloud Run gives you a https URL to connect to your container. You can configure custom domains as well. When you browse to the URL, you should see the following:

Try it by uploading an image to classify it!

Conclusion

Google Cloud Run makes it easy to deploy HTTP-invocable containers in a serverless fashion. In this example, I modified the nasnet code to check the PORT environment variable. In the runtime configuration, I set the amount of memory to 1GB. That was all that was needed to get this container to run. Note that Cloud Run can also be used in conjunction with GKE (Google Kubernetes Engine). That’s a post for some other time!