Trying Civo’s Kubernetes Service

In a previous post I talked about k3sup, a tool to easily install k3s on any system reachable over SSH. If you don’t know k3s: it’s a lightweight Kubernetes distribution that also runs on ARMv7 and ARM64 processors, which makes it compatible with a Raspberry Pi.

If I am not mistaken, Civo is the first cloud provider that offers a managed k3s service. Just like the other Civo services it is very easy to use. At this point in time, the service is in beta and you need to be accepted to participate.

Deploying the cluster

The cluster can be deployed via the portal, CLI or the REST API. Portal deployment is very simple:

  • set a name
  • set the size of the nodes
  • set the number of nodes
Creating a new cluster
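
The CLI follows the same pattern. Creating a cluster should look roughly like this (the cluster name and node size below are just examples; check civo kubernetes --help for the exact flags in your version of the CLI):

civo kubernetes create civo-demo --size=g2.medium --nodes=3 --wait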

After deployment, you will see the cluster as follows:

Yes, a deployed cluster

Marketplace

Kubernetes on Civo comes with a marketplace of Kubernetes apps to install during or after cluster deployment. By default, Traefik is selected but you can add other apps. I added Helm for instance:

My installed apps plus a view on the marketplace

Getting your Kubeconfig

You can use the portal to grab the Kubeconfig file:

Downloading Kubeconfig

Then, in your shell, set the KUBECONFIG environment variable to the path where you downloaded the file. Alternatively, you can use the Civo CLI to obtain the Kubeconfig file.
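
For example (the path below is just where I happened to save the file; adjust it to your download location):

export KUBECONFIG=~/Downloads/civo-kubernetes-demo-kubeconfig
kubectl get nodes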

Deploying an application

Let’s install my image classifier app on the cluster and expose it via Traefik. First, let’s look at the Traefik service:

Traefik service in the cluster

If you look closely, you will see that the Traefik service is exposed on each node. Currently, there is no integration with Civo’s load balancers. You do get a DNS name that uses round robin over the IP addresses of the nodes. The DNS name is something like 232b548e-897f-41d3-86f6-1a2a38516a58.k8s.civo.com.

Let’s install and expose my image classifier with the following basic YAML:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nasnet-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: IPADDRESS.nip.io
    http:
      paths:
      - path: /
        backend:
          serviceName: nasnet-svc
          servicePort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nasnet-svc
spec:
  selector:
    app: nasnet
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9090
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nasnet-app
  labels:
    app: nasnet
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nasnet
  template:
    metadata:
      labels:
        app: nasnet
    spec:
      containers:
      - name: nasnet
        image: gbaeke/nasnet
        resources:
          limits:
            cpu: "0.5"
          requests:
            cpu: "0.2"
        ports:
        - containerPort: 9090

In the above YAML, replace IPADDRESS with one of the IP addresses of your nodes. With a little help of nip.io, that name will resolve to the IP address that you specify.
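
Save the YAML above to a file and apply it with kubectl (the file name below is just an example):

kubectl apply -f nasnet.yaml
kubectl get ingress,service,deployment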

The result:

Creepy but it works!

Conclusion

This was just a quick look at Civo’s Kubernetes service. It is easy to install and comes with an easy-to-use marketplace to quickly get started. They got this service up and running in a relatively short time, and I am sure it will rapidly evolve into a strong contender to the other managed Kubernetes services out there.

Token checking at the API Management layer

In the previous blog post, I talked about the OAuth client credentials flow and how to implement it with Azure Active Directory. At the end of the post, I briefly talked about the need to validate the token in either your application or an intermediary layer. In this post, we will take a look at Azure API Management as that intermediary layer.

Remember that we obtained a token for a specific resource. In this case, the resource is an Azure AD application (App Registration) that represents our API. I will call it the API app from now on. The API app has the following app id: 06b2a484-141c-42d3-9d73-32bec5910b06. In our token, the app id is in the aud (audience) claim.

To verify that our client has access rights to the API, we created an application role on the API app called invokeRole. That role should be in the roles claim of the token. If it is not, the client has no access.

We also want to pass the client application id as a header to our backend API. We can use the azp claim for this purpose. That claim will be extracted by API management and passed as a header.
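
To make this concrete, the relevant claims in a decoded token payload would look something like the snippet below (abbreviated; the aud and azp values are the API app id and the client app id used in the examples in this post):

{
  "aud": "06b2a484-141c-42d3-9d73-32bec5910b06",
  "roles": [ "invokeRole" ],
  "azp": "f1f695cb-2d00-4c0f-84a5-437282f3f3fd"
}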

To validate the API connection, we will check both the aud and the roles claims. If the expected audience or the invokeRole role is not present, we reject the call. Let’s take a look at how that works.

Configuring the API in API Management

I deployed an Azure API Management instance in the Developer tier (any tier will do). I created a simple API with just one GET operation that adds numbers:

Calc API with one operation – amaaaaaazing

At the All Operations level, the API has the following inbound policy defined:

<validate-jwt header-name="Authorization" failed-validation-httpcode="401" require-expiration-time="false" require-signed-tokens="false">
    <openid-config url="https://login.microsoftonline.com/625422dd-8ffb-45a9-9232-4132babb1324/v2.0/.well-known/openid-configuration" />
    <audiences>
        <audience>06b2a484-141c-42d3-9d73-32bec5910b06</audience>
    </audiences>
    <required-claims>
        <claim name="roles" match="any">
            <value>invokeRole</value>
        </claim>
    </required-claims>
</validate-jwt>

The validate-jwt policy does what it says: it validates a JWT (JSON Web Token) passed via the HTTP Authorization header. If the validation fails, a 401 code is returned. The openid-config element sets the URL to the OpenID configuration of our tenant. You can browse to that URL to see its content; it is open to anyone. Information in that document is used to validate the JWT.

Note: in the openid config URL you can use the domain name of your tenant instead of the tenant ID

In the audiences section we specify we want that specific value in the aud claim. It is the app id of our API app. In the required-claims section we check that the roles claim contains the invokeRole.

Testing the API

With the validate-jwt policy present, we need a valid token to test the API. We can simply use curl to get the token:

curl -X POST 'https://login.microsoftonline.com/019486dd-8ffb-45a9-9232-4132babb1324/oauth2/v2.0/token' \
  -d 'grant_type=client_credentials' \
  -d 'client_id=f1f695cb-2d00-4c0f-84a5-437282f3f3fd' \
  -d 'client_secret=SECRET' \
  -d 'audience=api%3A%2F%2F06b2a484-141c-42d3-9d73-32bec5910b06' \
  -d 'scope=api%3A%2F%2F06b2a484-141c-42d3-9d73-32bec5910b06%2F.default'

The result of this call is the access token. In API Management, we can use the access token to test the API:

Testing the API with the token added to the Authorization header (after the word Bearer)

If the token is invalid, the following response is received:

Oops! Something wrong with the JWT!

Retrieving a claim and setting its value as a header

To retrieve the azp claim and set it as a header, just add the set-header policy AFTER the validate-jwt policy (in API design; all operations):

<set-header name="client" exists-action="override">

   <value>@(context.Request.Headers["Authorization"].First().Split(' ')[1].AsJwt()?.Claims["azp"].FirstOrDefault())</value>

</set-header>

Oh, this is so readable! Well, not really, but it does extract the azp claim from the token and sets the client header to that value. When you test the API and trace the backend call, the header will be shown in the trace. It is up to the backend API to process it.

If you want, you can remove the Authorization header and not send it to the backend.
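
That is a one-liner with the same set-header policy, this time with a delete action (place it after validate-jwt as well):

<set-header name="Authorization" exists-action="delete" />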

Conclusion

When you protect APIs with OAuth, you can perform the validation at the API Management layer. Azure API Management can do this very easily with the validate-jwt policy. You can extract claims from the token and set them as headers so that the backend can handle them without having to know anything about OAuth. Happy coding!

Azure DevOps multi-stage YAML pipelines

A while ago, the Azure DevOps blog posted an update about multi-stage YAML pipelines. The concept is straightforward: define both your build (CI) and release (CD) pipelines in a YAML file and stick that file in your source code repository.

In this post, we will look at a simple build and release pipeline that builds a container, pushes it to ACR and deploys it to Kubernetes, linked to an environment. Something like this:

Two stages in the pipeline – build and deploy (as simple as it can get, almost)

Note: I used a simple go app, a Dockerfile and a Kubernetes manifest as source files, check them out here.

Note: there is also a video version 😉

Note: if you start from a repository without manifests and azure-pipelines.yaml, the pipeline build wizard will propose Deploy to Azure Kubernetes Service. The wizard that follows will ask you some questions but in the end you will end up with a configured environment, the necessary service connections to AKS and ACR and even a service.yaml and deployment.yaml with the bare minimum to deploy your container!

“Show me the YAML!!!”

The file, azure-pipelines.yaml contains the two stages. Check out the first stage (plus trigger and variables) below:

trigger:
- master

variables:
  imageName: 'gosample'
  registry: 'REGNAME.azurecr.io'

stages:
- stage: build
  jobs:
  - job: 'BuildAndPush'
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: Docker@2
      inputs:
        containerRegistry: 'ACR'
        repository: '$(imageName)'
        command: 'buildAndPush'
        Dockerfile: '**/Dockerfile'
    - task: PublishPipelineArtifact@0
      inputs:
        artifactName: 'manifests'
        targetPath: 'manifests' 

The pipeline runs on a commit to the master branch. The variables imageName and registry are referenced later using $(imageName) and $(registry). Replace REGNAME with the name of your Azure Container Registry.

It’s a multi-stage pipeline, so we start with stages: and then define the first stage build. That stage has one job which consists of two steps:

  • Docker task (v2): build a Docker image based on the Dockerfile in the source code repository and push it to the container registry called ACR; ACR is a reference to a service connection defined in the project settings
  • PublishPipelineArtifact: the source code repository contains Kubernetes deployment manifests in YAML format in the manifests folder; the contents of that folder is published as a pipeline artifact, to be picked up in a later stage

Now let’s look at the deployment stage:

- stage: deploy
  jobs:
  - deployment: 'DeployToK8S'
    pool:
      vmImage: 'ubuntu-latest'
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
            - task: DownloadPipelineArtifact@1
              inputs:
                buildType: 'current'
                artifactName: 'manifests'
                targetPath: '$(System.ArtifactsDirectory)/manifests'
            - task: KubernetesManifest@0
              inputs:
                action: 'deploy'
                kubernetesServiceConnection: 'dev-kub-gosample-1558821689026'
                namespace: 'gosample'
                manifests: '$(System.ArtifactsDirectory)/manifests/deploy.yaml'
                containers: '$(registry)/$(imageName):$(Build.BuildId)' 

The second stage uses a deployment job (quite new; see this). In a deployment job, you can specify an environment to link to. In the above job, the environment is called dev. In Azure DevOps, the environment is shown as below:

dev environment

The environment functionality has Kubernetes integration which is pretty neat. You can drill down to the deployed objects such as deployments and services:

Kubernetes deployment in an Azure DevOps environment

The deployment has two tasks:

  • DownloadPipelineArtifact: download the artifact published in the first stage to $(System.ArtifactsDirectory)/manifests
  • KubernetesManifest: this task can deploy Kubernetes manifests; it uses an AKS service connection that was created during creation of the environment; a service account was created in a specific namespace and with access rights to that namespace only; the manifests property will look for an image name in the Kubernetes YAML files and append the tag, which here is the build id (see the sketch below)
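
For reference, here is a sketch of what manifests/deploy.yaml could look like (the actual file in the repository may differ; the important part is the unversioned image name that the task matches and tags):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gosample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gosample
  template:
    metadata:
      labels:
        app: gosample
    spec:
      containers:
      - name: gosample
        # the KubernetesManifest task matches this image name and appends :$(Build.BuildId)
        image: REGNAME.azurecr.io/gosample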

Note that the release stage will actually download the pipeline artifact automatically. The explicit DownloadPipelineArtifact task gives additional control over the download location.

The KubernetesManifest task is relatively new at the time of this writing (end of May 2019). Its image substitution functionality could be enough in many cases, without having to revert to Helm or manual text substitution tasks. There is more to this task than what I have described here. Check out the docs for more info.

Conclusion

If you are just starting out building CI/CD pipelines in YAML, you will probably have a hard time getting used to the schema. I know I had! 😡 In the end though, doing it this way, with the pipeline stored in source control, will pay off in the long run. After some time, you will have built up a useful library of these pipelines to quickly get up and running in new projects. Recommended!!! 😉🚀🚀🚀

Update on restricting egress traffic on Azure Kubernetes Service

In an earlier post, I discussed the combination of Azure Firewall and Azure Kubernetes Service (AKS) to secure ingress and egress AKS traffic.

A few days ago, Microsoft added documentation that describes the ports and URLs to allow when you route traffic through Azure Firewall or a virtual appliance. Some of the allowed ports and addresses are required for the operation of the cluster, while some others are optional. It’s highly recommended to allow the optional ports and addresses though.

The top of the document mentions registering an additional feature called AKSLockingDownEgressPreview:

az feature register --name AKSLockingDownEgressPreview --namespace Microsoft.ContainerService
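
It can take a while before the registration completes. You would typically check the state and then refresh the resource provider, along these lines:

az feature show --name AKSLockingDownEgressPreview --namespace Microsoft.ContainerService --query properties.state
az provider register --namespace Microsoft.ContainerService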

The document is not very clear on what the feature does but the comments contain the following:

The feature registration tells the cluster to only pull core system images from container image repositories housed in the Microsoft Container Registry (MCR). Otherwise, clusters could try to pull container images for the core components from external repositories. There is some additional routing that also occurs for the cluster to do this. The list of ports and addresses are then what's required for you to permit when the egress traffic is restricted. You can't simply limit the egress traffic to only those address without the feature being enabled for the cluster. 

In summary, to limit egress traffic, use Azure Firewall or a Network Virtual Appliance, allow the listed ports and URLs AND register the AKSLockingDownEgressPreview feature.

Trying Google Cloud Run

With the release of Google’s Cloud Run, I decided to check it out with my nasnet container.

With Cloud Run, you simply deploy your container and let Google scale it based on the requests it receives. When your container is not used, it gets scaled to zero. As such, it combines the properties of a serverless offering such as Azure Functions with standard containers. Today, you cannot put limits on scaling (e.g. max X instances).

You might be tempted to compare Cloud Run to something like Azure Container Instances but it is not exactly the same. True, Azure Container Instances (ACI) allows you to simply deploy a container without the need for an orchestrator such as Kubernetes. With ACI however, memory and CPU capacity are reserved and it does not scale your container based on the requests it receives. ACI can be used in conjunction with virtual nodes in AKS (Kubernetes on Azure) to achieve somewhat similar results at higher cost and complexity. However, ACI can be used in broader scenarios such as stateful applications beyond the simple HTTP use case.

Prerequisites

Cloud Run containers should be able to fit in 2GB of memory. They should be stateless and all computation should be scoped to an HTTP request.

Your container needs to be invocable via HTTP requests on port 8080. It is against best practices though to hardcode this port. Instead, you should check the PORT environment variable that is automatically injected into the container by Cloud Run. Google might change the port in the future! In the nasnet container, the code checks this as follows:

port := getEnv("PORT", "9090") 

getEnv is a custom function that checks the environment variable. If it is not set, port is set to the value of the second parameter:

func getEnv(key, fallback string) string {
    value, exists := os.LookupEnv(key)
    if !exists {
        value = fallback
    }
    return value
}

Later, in the call to ListenAndServe, the port variable is used as follows:

log.Fatal(http.ListenAndServe(":"+port, nil)) 
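
Putting these pieces together, a minimal HTTP server that honors the PORT variable could look like the sketch below (stripped down; the real nasnet code obviously does the image classification in the handler):

package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

// getEnv returns the value of an environment variable or a fallback value
func getEnv(key, fallback string) string {
    value, exists := os.LookupEnv(key)
    if !exists {
        value = fallback
    }
    return value
}

func main() {
    // Cloud Run injects PORT; default to 9090 for local testing
    port := getEnv("PORT", "9090")

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "hello from Cloud Run")
    })

    log.Fatal(http.ListenAndServe(":"+port, nil))
}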

Deploying to Cloud Run

Make sure you have access to Google Cloud and create a project. I created a project called CRTest.

Next, clone the nasnet-go repository:

git clone https://github.com/gbaeke/nasnet-go.git

If you have Docker installed, issue the following command to build and tag the container (from the nasnet-go folder created by the git clone command above):

docker build -t gcr.io/<PROJECT>/nasnet:latest .

In the above command, replace <PROJECT> with your Google Cloud project name.

To push the container image to Google Container Registry, install gcloud. When you run gcloud init, you will have to authenticate to Google Cloud. We install gcloud here to make authentication to Google Container Registry easier. To do that, run the following command:

gcloud auth configure-docker

Next, authenticate to the registry:

docker login gcr.io/<PROJECT>

Now that you are logged in, push the image:

docker push gcr.io/<PROJECT>/nasnet:latest

If you don’t want to bother yourself with local build and push, you can use Google Cloud Build instead:

gcloud builds submit --tag gcr.io/crtest/nasnet .

The above command will package your source files, submit them to Cloud Build and build the container in Google Cloud. When finished, the container image will be pushed to gcr.io/crtest. Either way, when the push is done, check the image in the console:

The nasnet container image in gcr

Now that we have the image in the repository, we can use it with Cloud Run. In the console, navigate to Cloud Run and click Create Service:

Creating a Cloud Run service for nasnet

By default, allocated memory is set to 256MB which is too low for this image. Set allocated memory to 1GB. If you set it too low, your container will be restarted. Click Optional Settings and change the allocated memory:

Change the allocated memory for the nasnet container

Note: this particular container writes files you upload to the local file system; this is not recommended since data written to the file system is counted as memory

Note: the container will handle multiple requests up to a maximum of 80; currently 80 concurrent requests per container is the maximum

Now finish the configuration. The image will be pulled and the Cloud Run service will be started:

Cloud Run gives you an HTTPS URL to connect to your container. You can configure custom domains as well. When you browse to the URL, you should see the following:

Try it by uploading an image to classify it!
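
If you prefer to script the deployment instead of clicking through the console, a command along the following lines should give the same result (at the time of writing the Cloud Run commands live under gcloud beta, the flags may change and the region is just an example):

gcloud beta run deploy nasnet \
  --image gcr.io/<PROJECT>/nasnet:latest \
  --memory 1Gi \
  --region us-central1 \
  --allow-unauthenticated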

Conclusion

Google Cloud Run makes it easy to deploy HTTP invocable containers in a serverless fashion. In this example, I modified the nasnet code to check the PORT environment variable. In the runtime configuration, I set the amount of memory to 1GB. That was all that was needed to get this container to run. Note that Cloud Run can also be used in conjunction with GKE (Google Kubernetes Engine). That’s a post for some other time!

Azure Front Door Revisited

A while ago, I wrote a post about Azure Front Door. In that post, I wrote that http to https redirection was not possible. With Azure Front Door being GA, let’s take a look if that is still the case.

In the previous post, I had the following configuration in Front Door Designer:

Azure Front Door Designer

The above configuration exposes a static website hosted in an Azure Storage Account (the backend in the backend pool). The custom domain deploy.baeke.info maps to geba.azurefd.net using a CNAME in my CloudFlare-hosted domain. The routing rule routeall maps all requests to the backend.

The above configuration does not, however, redirect http://deploy.baeke.info to https://deploy.baeke.info which is clearly not what we want. In order to achieve that goal, the routing rules can be changed. A redirect routing rule looks as follows:

Redirect routing rule (Replace destination host was not required)

The routeall rule looks like this:

Routing rule

The routing rule simply routes https://deploy.baeke.info to the azdeploy backend pool which only contains the single static website hosted in a storage account.

The full config looks like this:

Full config in Front Door designer

Although not very useful for this static website, I also added WAF (Web Application Firewall) rules to Azure Front Door. In the Azure Portal, just search for WAF and add a policy. I added a default policy and associated it with this Azure Front Door website:

WAF rules associated with the Azure Front Door frontend

If required, you can enable/disable WAF rules:

Using TensorFlow models in Go

Image via www.vpnsrus.com

In earlier posts, I discussed hosting a deep learning model such as Resnet50 on Kubernetes or Azure Container Instances. The model can then be used as any API which receives input as JSON and returns a result as JSON.

Naturally, you can also run the model in offline scenarios and directly from your code. In this post, I will take a look at calling a TensorFlow model from Go. If you want to follow along, you will need Linux or MacOS because the Go module does not support Windows.

Getting Ready

I installed an Ubuntu Data Science Virtual Machine on Azure and connected to it with X2Go:

Data Science Virtual Machine (Ubuntu) with X2Go

The virtual machine has all the required machine learning tools installed such as TensorFlow and Python. It also has Visual Studio Code. There are some extra requirements though:

  • Go: follow the instructions here to download and install Go
  • TensorFlow C API: follow the instructions here to download and install the C API; the TensorFlow package for Go requires this; it is recommended to also build and run the Hello from TensorFlow C program to verify that the library works (near the bottom of the instructions page)

After installing Go and the TensorFlow C API, install the TensorFlow Go package with the following command:

go get github.com/tensorflow/tensorflow/tensorflow/go

Test the package with go test:

go test github.com/tensorflow/tensorflow/tensorflow/go

The above command should return:

ok      github.com/tensorflow/tensorflow/tensorflow/go  0.104s

The go get command installed the package in $HOME/go/src/github.com if you did not specify a custom $GOPATH (see this wiki page for more info).

Getting a model

A model describes how the input (e.g. an image for image classification) gets translated to an output (e.g. a list of classes with probabilities). The model contains thousands or even millions of parameters which means a model can be quite large. In this example, we will use NASNetMobile which can be used to classify images.

Now we need some code to save the model in TensorFlow format so that it can be used from a Go program. The code below is based on the sample code on the NASNetMobile page from modeldepot.io. It also does a quick test inference on a cat image.

import keras
from keras.applications.nasnet import NASNetMobile
from keras.preprocessing import image
from keras.applications.xception import preprocess_input, decode_predictions
import numpy as np
import tensorflow as tf
from keras import backend as K

sess = tf.Session()
K.set_session(sess)

model = NASNetMobile(weights="imagenet")
img = image.load_img('cat.jpg', target_size=(224,224))
img_arr = np.expand_dims(image.img_to_array(img), axis=0)
x = preprocess_input(img_arr)
preds = model.predict(x)
print('Prediction:', decode_predictions(preds, top=5)[0])

#save the model for use with TensorFlow
builder = tf.saved_model.builder.SavedModelBuilder("nasnet")

#Tag the model, required for Go
builder.add_meta_graph_and_variables(sess, ["atag"])
builder.save()
sess.close()

On the Ubuntu Data Science Virtual Machine, the above code should execute without any issues because all Python packages are already installed. I used the py35 conda environment. Use activate py35 to make sure you are in that environment.

The above code results in a nasnet folder, which contains the saved_model.pb file for the graph structure. The actual weights are in the variables subfolder. In total, the nasnet folder is around 38MB.
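
If you want to double check what was saved, the saved_model_cli tool that ships with TensorFlow can dump the tags, signatures and tensor names of the model; that information comes in handy later when we need the input and output names from Go:

saved_model_cli show --dir nasnet --all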

Great! Now we need a way to use the model from our Go program.

Using the saved model from Go

The model can be loaded with the LoadSavedModel function of the TensorFlow package. That package is imported like so:

import (
    tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

LoadSavedModel is used like so:

model, err := tf.LoadSavedModel("nasnet", []string{"atag"}, nil)
if err != nil {
    log.Fatal(err)
}

The above code simply tries to load the model from the nasnet folder. We also need to specify the tag.

Next, we need to load an image and convert the image to a tensor with the following dimensions [1][224][224][3]. This is similar to my earlier ResNet50 post.
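
A rough sketch of turning an image into such a tensor is shown below. It assumes the image is already resized to 224x224 and scales the pixel values to [-1, 1], matching the preprocess_input used when saving the model:

package main

import (
    "image"
    _ "image/jpeg"
    "log"
    "os"

    tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

// imageToTensor converts a 224x224 image into a [1][224][224][3] tensor
func imageToTensor(path string) (*tf.Tensor, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    img, _, err := image.Decode(f)
    if err != nil {
        return nil, err
    }

    bounds := img.Bounds()
    data := make([][][][]float32, 1)
    data[0] = make([][][]float32, 224)
    for y := 0; y < 224; y++ {
        data[0][y] = make([][]float32, 224)
        for x := 0; x < 224; x++ {
            r, g, b, _ := img.At(bounds.Min.X+x, bounds.Min.Y+y).RGBA()
            // RGBA returns 16-bit values; shift to 8-bit and scale to [-1, 1]
            data[0][y][x] = []float32{
                float32(r>>8)/127.5 - 1,
                float32(g>>8)/127.5 - 1,
                float32(b>>8)/127.5 - 1,
            }
        }
    }
    return tf.NewTensor(data)
}

func main() {
    input, err := imageToTensor("cat.jpg")
    if err != nil {
        log.Fatal(err)
    }
    log.Println(input.Shape())
}

The returned tensor is then used as the input variable in the next snippet.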

Now we need to pass the tensor to the model as input, and retrieve the class predictions as output. The following code achieves this:

output, err := model.Session.Run(
    map[tf.Output]*tf.Tensor{
        model.Graph.Operation("input_1").Output(0): input,
    },
    []tf.Output{
        model.Graph.Operation("predictions/Softmax").Output(0),
    },
    nil,
)
if err != nil {
    log.Fatal(err)
}

What the heck is this? The Run method is defined as follows:

func (s *Session) Run(feeds map[Output]*Tensor, fetches []Output, targets []*Operation) ([]*Tensor, error)

When you build a model, you can give names to tensors and operations. In this case the input tensor (of dimensions [1][224][224][3]) is called input_1 and needs to be specified as a map. The inference operation is called predictions/Softmax and the output needs to be specified as an array.

The actual predictions can be retrieved from the output variable:

predictions, ok := output[0].Value().([][]float32)
if !ok {
    log.Fatal(fmt.Sprintf("output has unexpected type %T", output[0].Value()))
}

If you are not very familiar with Go: the code above uses a type assertion to verify that predictions is a two-dimensional slice of float32 ([][]float32). If the type assertion succeeds, the predictions variable will contain the actual predictions: [[<probability class 1 (tench)>, <probability class 2 (goldfish)>, …]]
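
Finding the class with the highest probability is then a simple loop over that slice. A minimal sketch, continuing from the predictions variable above:

// index of the highest probability in the first (and only) row
best := 0
for i, p := range predictions[0] {
    if p > predictions[0][best] {
        best = i
    }
}
fmt.Printf("class index %d with probability %v\n", best, predictions[0][best])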

You can now match the top prediction(s) with the list of classes for NASNet (actually the ImageNet classes). I get the following output with a cat image:

Yep, it’s a tabby!

If you are wondering what image I used:

Tabby?

Conclusion

With Go’s TensorFlow bindings, you can load TensorFlow models from disk and use them for inference locally, without having to call a remote API. We used Python to prepare the model with some help from Keras.