Giving linkerd a spin

A while ago, I gave linkerd a spin. Due to vacations and a busy schedule, I was not able to write about my experience. I will briefly discuss how to setup linkerd and then deploy a sample service to illustrate what it can do out of the box. Let’s go!

Wait! What is linkerd?

linkerd basically is a network proxy for your Kubernetes pods that’s designed to be deployed as a service mesh. When the pods you care about have been infused with linkerd, you will automatically get metrics like latency and requests per second, a web portal to check these metrics, live inspection of traffic and much more. Below is an example of a Kubernetes namespace that has been meshed:

A meshed namespace; all deployments in this particular namespace are meshed which means all pods get the linkerd network proxy that provides the metrics and features such as encryption

Installation

I can be very brief about this: installation is about as simple as it gets. Simply navigate to https://linkerd.io/2/getting-started to get started. Here are the simplified steps:

  • Download the linkerd executable as described in the Getting Started guide; I used WSL for this
  • Create a Kubernetes cluster with AKS (or another provider); for AKS, use the Azure CLI to get your credentials (az aks get-credentials); make sure the Azure CLI is installed in WSL and that you connected to your Azure subscription with az login
  • Make sure you can connect to your cluster with kubectl
  • Run linkerd check –pre to check if prerequisites are fulfilled
  • Install linkerd with linkerd install | kubectl apply -f –
  • Check the installation with linkerd check

The last step will nicely show its progress and end when the installation is complete:

linkerd check output

Exploring linkerd with the dashboard

linkerd automatically installs a dashboard. The dashboard is exposed as a Kubernetes service called linkerd-web. The service is of type ClusterIP. Although you could expose the service using an ingress, you can easily tunnel to the service with the following linkerd command (first line is the command; other lines are the output):

linkerd dashboard

Linkerd dashboard available at:
http://127.0.0.1:50750
Grafana dashboard available at:
http://127.0.0.1:50750/grafana
Opening Linkerd dashboard in the default browser
Failed to open Linkerd dashboard automatically
Visit http://127.0.0.1:50750 in your browser to view the dashboard

From WSL, the dashboard can not open automatically but you can manually browse to it. Note that linkerd also installs Prometheus and Grafana.

Out of the box, the linkerd deployment is meshed:

Adding linkerd to your own service

In this section, we will deploy a simple service that can add numbers and add linkerd to it. Although there are many ways to do this, I chose to create a separate namespace and enable auto-injection via an annotation. Here’s the yaml to create the namespace (add-ns.yaml):

apiVersion: v1
kind: Namespace
metadata:
  name: add
  annotations:
    linkerd.io/inject: enabled

Just run kubectl create -f add-ns.yaml to create the namespace. The annotation ensures that all pods added to the namespace get the linkerd proxy in the pod. All traffic to and from the pod will then pass through the proxy.

Now, let’s install the add service and deployment:

apiVersion: v1
kind: Service
metadata:
  name: add-svc
spec:
  ports:
  - port: 80
    name: http
    protocol: TCP
    targetPort: 8000
  - port: 8080
    name: grpc
    protocol: TCP
    targetPort: 8080
  selector:
    app: add
    version: v1
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: add
spec:
  replicas: 2
  selector:
    matchLabels:
      app: add
  template:
    metadata:
      labels:
        app: add
        version: v1
    spec:
      containers:
      - name: add
        image: gbaeke/adder

The deployment deploys to two pods with the gbaeke/adder image. To deploy the above, save it to a file (add.yaml) and use the following command to deploy:

kubectl create -f add-yaml -n add

Because the deployment uses the add namespace, the linkerd proxy will be added to each pod automatically. When you list the pods in the deployment, you see:

Each add pod has two containers: the actual add container based on gbaeke/adder and the proxy

To see more details about one of these pods, I can use the following command:

k get po add-5b48fcc894-2dc97 -o yaml -n add

You will clearly see the two containers in the output:

Two containers in the pod: actual service (gbaeke/adder) and the linkerd proxy

Generating some traffic

Let’s deploy a client that continuously uses the calculator service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: add-cli
spec:
  replicas: 1
  selector:
    matchLabels:
      app: add-cli
  template:
    metadata:
      labels:
        app: add-cli
    spec:
      containers:
      - name: add-cli
        image: gbaeke/adder-cli
        env:
        - name: SERVER
          value: "add-svc"

Save the above to add-cli.yaml and deploy with the below command:

kubectl create -f add-cli.yaml -n add

The deployment uses another image called gbaeke/adder-cli that continuously makes requests to the server specified in the SERVER environment variable.

Checking the deployment in the linkerd portal

When you now open the add namespace in the linked portal, you should see something similar to the below screenshot (note: I deployed 5 servers and 5 clients):

A view on the add namespace; linkerd has learned how the deployments talk to eachother

The linkerd proxy in all pods sees all traffic. From the traffic, it can infer that the add-cli deployment talks to the add deployment. The add deployment receives about 150 requests per second. The 99th percentile latency is relatively high because the cluster nodes are very small, I deployed more instances and the client is relatively inefficient.

When I click the deployment called add, the following screen is shown:

A view on the deployment

The deployment clearly shows where traffic is coming from plus relevant metrics such as RPS and P99 latency. You also get a view on the live calls now. Note that the client is using GRPC which uses a HTTP POST. When you scroll down on this page, you get more information about the caller and a view on the individual pods:

A view on the inbound calls to the deployment plus a view on the pods

To see live calls in more detail, you can click the Tap icon:

A live view on the calls with Tap

For each call, details can be requested:

Request details

Conclusion

This was just a brief look at linkerd. It is trivially easy to install and with auto-injection, very simple to add it to your own services. Highly recommended to give it a spin to see where it can add value to your projects!

Securing your API with Kong and CloudFlare

In the previous post, we looked at API Management with Kong and the Kong Ingress Controller. We did not care about security and exposed a sample toy API over a public HTTP endpoint that also required an API key. All in the clear, no firewall, no WAF, nothing… 👎👎👎

In this post, we will expose the API over TLS and configure Kong to use a CloudFlare origin certificate. An origin certificate is issued and trusted by CloudFlare to connect to the origin, which in our case is an API hosted on Kubernetes.

The API consumer will not connect directly to the Kubernetes-hosted API exposed via Kong. Instead, the consumer connects to CloudFlare over TLS and uses a certificate issued by CloudFlare that is fully trusted by browsers and other clients.

The traffic flow is as follows:

Consumer --> CloudFlare (TLS with fully trusted cert, WAF, ...) --> Kong Ingress (TLS with origin cert) --> API (HTTP)

Configuring Kong

Refer to the previous post for installation instructions. The YAML files to configure the Ingress, KongIngress, Consumer, etc… are almost the same. The Ingress resource has the following changes:

  • We use a new hostname api.baeke.info
  • We configure TLS for api.baeke.info by referring to a secret called baeke.info.tls which contains the CloudFlare origin certificate.
  • We use an additional Kong plugin which provides whitelisting of CloudFlare addresses; only CloudFlare is allowed to connect to the Ingress

Here is the full definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  namespace: default
  annotations:
    kubernetes.io/ingress.class: kong
    plugins.konghq.com: http-auth, whitelist
spec:
  tls:
  - hosts:
    - api.baeke.info
    secretName: baeke.info.tls # cloudflare origin cert
  rules:
    - host: api.baeke.info
      http:
        paths:
        - path: /users
          backend:
            serviceName: func
            servicePort: 80

Here is the plugin definition for whitelisting with the current (June 15th, 2019) list of IP ranges used by CloudFlare. Note that you have to supply the addresses and ranges as an array. The documentation shows a comma-separated list! 🤷‍♂️

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: whitelist
  namespace: default
config:
  whitelist: 
  - 173.245.48.0/20
  - 103.21.244.0/22
  - 103.22.200.0/22
  - 103.31.4.0/22
  - 141.101.64.0/18
  - 108.162.192.0/18
  - 190.93.240.0/20
  - 188.114.96.0/20
  - 197.234.240.0/22
  - 198.41.128.0/17
  - 162.158.0.0/15
  - 104.16.0.0/12
  - 172.64.0.0/13
  - 131.0.72.0/22
plugin: ip-restriction 

I also made a change to the KongIngress resource, to only allow https to the back-end service. Only the route section is shown below:

route:
 methods:
 - GET
 regex_priority: 0
 strip_path: true
 preserve_host: true
 protocols:
 - https 

In the previous post, the protocols array contained the http value.

Note: for whitelisting to work, the Kong proxy service needs externalTrafficPolicy set to Local. Use kubectl edit svc kong-kong-proxy to modify that setting. You can set this value at deployment time as well. This might or might not work for you. I used AKS where this produces the desired outcome.

CloudFlare

Get the external IP of the kong-kong-proxy service and create a DNS entry for it. I created a A record for api.baeke.info:

Make sure the orange cloud is active. In this case, this means that requests for api.baeke.info are proxied by CloudFlare. That allows us to cache, enable WAF (web application firewall), rate limiting and more!

In the Firewall section, WAF is turned on. Note that this is a paying feature!

WAF to protect your API

In Crypto, Universal SSL is turned on and set to Full (strict).

Full (strict) means that CloudFlare connects to your origin over HTTPS and that it expects a valid certificate, which is checked. An origin certificate, issued by CloudFlare but not trusted by your operating system is also valid. As stated above, I use such an origin certificate at the Ingress level.

The origin certificate can be issued and/or downloaded from the Crypto section:

Origin certs

I created an origin certificate for *.baeke.info and baeke.info and downloaded the certificate and private key in PEM format. I then encoded the contents of the certificate and key in base64 format and used them in a secret:

apiVersion: v1
kind: Secret
metadata:
  name: baeke.info.tls
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: base64-encoded-cert
  tls.key: base64-endoced-key

As you have seen in the Ingress definition, it referred to this secret via its name, baeke.info.tls.

When a consumer connects to the API, the fully trusted certificate issued by CloudFlare is used:

Universal SSL cert from CloudFlare

We also make sure consumers of the API need to use TLS:

Force HTTPS at the CloudFlare level

With the above configuration, consumers need to securely connect to https://api.baeke.info at CloudFlare. CloudFlare connects securely to the origin, which is the external IP of the ingress. Only CloudFlare is allowed to connect to that external IP because of the whitelisting configuration.

Testing the API

Let’s try the API with the http tool:

Connecting to the API

All sorts of headers are added by CloudFlare which makes it clear that CloudFlare is proxying the requests. When we don’t add a key or specify a wrong one:

Kong is still doing its work

The key is now securely sent from consumer to CloudFlare to origin. Phew! 😎

Conclusion

In this post, we hosted an API on Kubernetes, exposed it with Kong and secured it with CloudFlare. This example can easily be extended with multiple Kong proxies for high availability and multiple APIs (/users, /orders, /products, …) that are all protected by CloudFlare with end-to-end encryption and WAF. CloudFlare lends an extra helping hand by automatically generating both the “front-end” and origin certificates.

In a follow-up post, we will look at an alternative approach via Azure Front Door Service. Stay tuned!

Quick overview of Traefik Ingress Controller Installation

This post is mainly a note to self 📝📝📝 that describes a quick way to deploy a Kubernetes Ingress Controller with Traefik.

There is also a video version:

We will install Traefik with Helm and I assume the cluster has rbac enabled. If you deploy clusters with AKS, that is the default although you can turn it off. With rbac enabled, you need to install the server-side component of Helm, tiller, using the following commands:

kubectl apply -f tiller-rbac.yaml
helm init --service-account tiller

The file tiller-rbac.yaml should contain the following:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system 

Note that you create an account that has cluster-wide admin privileges. That’s guaranteed to work but might not be what you want.

Next, install the Traefik Ingress Controller with the following Helm one-liner:

helm install stable/traefik --name traefik --set serviceType=LoadBalancer,rbac.enabled=true,ssl.enabled=true,ssl.enforced=true,acme.enabled=true,acme.email=email@domain.com,onHostRule=true,acme.challengeType=tls-alpn-01,acme.staging=false,dashboard.enabled=true --namespace kube-system 

The above command uses Helm to install the stable/traefik chart. Note that the chart is maintained by the community and not by the folks at Traefik. Traefik itself is exposed via a service of type LoadBalancer, which results in a public IP address. Use kubectl get svc traefik -n kube-system to check. There are ways to make sure the service uses a static IP but that is not discussed in this post. Check out this doc for AKS. The other settings do the following:

  • ssl.enabled: yes, SSL 😉
  • ssl.enforced: redirect to https when user uses http
  • acme.enabled: enable Let’s Encrypt
  • acme.email: set the e-mail address to use with Let’s Encrypt; you will get certificate expiry mails on that address
  • onHostRule: issue certificates based on the host setting in the ingress definition
  • acme.challengeType: method used by Let’s Encrypt to issue the certificate; use this one for regular certs; use DNS verification for wildcard certs
  • acme.staging: set to false to issue fully trusted certs; beware of rate limiting
  • dashboard.enabled: enable the Traefik dashboard; you can expose the service via an ingress object as well

Note: to specify a specific version of Traefik, use the imageTag parameter as part of –set; for instance imageTag=1.7.12

When the installation is finished, run the following commands:

# check installation
helm ls

# check traefik service
kubectl get svc traefik --namespace kube-system -w

The first command should show that Traefik is installed. The second command returns the traefik service, which we configured with serviceType LoadBalancer. The external IP of the service will be pending for a while. When you have an address and you browse it, you should get a 404. Result from curl -v below:

 Rebuilt URL to: http://IP/
 Trying 137.117.140.116…
 Connected to 137.117.140.116 (IP) port 80 (#0) 
 GET / HTTP/1.1
 Host: IP
 User-Agent: curl/7.47.0
 Accept: /
 < HTTP/1.1 404 Not Found
 < Content-Type: text/plain; charset=utf-8
 < Vary: Accept-Encoding
 < X-Content-Type-Options: nosniff
 < Date: Fri, 24 May 2019 17:00:29 GMT
 < Content-Length: 19
 <
 404 page not found 

Next, install nginx just to have a simple website to securely publish. Yes I know, kubectl run… 🤷

kubectl run nginx --image nginx --expose --port 80

The above command installs nginx but also creates an nginx service of type ClusterIP. We can expose that service via an ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: your.domain.com
      http:
        paths:
        - path: /
          backend:
            serviceName: nginx
            servicePort: 80

Replace your.domain.com with a host that resolves to the external IP address of the Traefik service. The annotation is not technically required if Traefik is the only Ingress Controller in your cluster. I prefer being explicit though. Save the above contents to a file and then run:

kubectl apply -f yourfile.yaml

Now browse to whatever you used as domain. The result should be:

Yes… nginx exposed via Traefik and a Let’s Encrypt certificate

To expose the Traefik dashboard, use the yaml below. Note that we explicitly installed the dashboard by setting dashboard.enabled to true.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefikdb
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: yourother.domain.com
      http:
        paths:
        - path: /
          backend:
            serviceName: traefik-dashboard
            servicePort: 80

Put the above contents in a file and create the ingress object in the same namespace as the traefik-dashboard service. Use kubectl apply -f yourfile.yaml -n kube-system. You should then be able to access the dashboard with the host name you provided:

Traefik dashboard

Note: if you do not want to mess with DNS records that map to the IP address of the Ingress Controller, just use a xip.io address. In the ingress object’s host setting, use something like web.w.x.y.z.xip.io where web is just something you choose and w.x.y.z is the IP address of the Ingress Controller. Traefik will also request a certificate for such a name. For more information, check xip.io. Simple for testing purposes!

Hope it helps!

Cloud Run on Google Kubernetes Engine

In this short post, we will take a look at Cloud Run on Google Kubernetes Engine (GKE). To get this to work, you will need to deploy a Kubernetes cluster. Make sure you use nodes with at least 2 vCPUs and 7.5 GB of memory. Take a look here for more details. You will notice that you need to include Istio which will make the option to enable Cloud Run on GKE available.

To create a Cloud Run service on GKE, navigate to Cloud Run in the console and click Create Service. For location, you can select your Kubernetes cluster. In the screenshot below, the default namespace of my cluster gebacr in zone us-central1-a was chosen:

Cloud Run service on GKE

In Connectivity, select external:

External connectivity to the service

In the optional settings, you can specify the allocated memory and maximum requests per container.

When finished, you will see a deployment on your cluster:

Cloud Run Kubernetes deployment (note that the Cloud Run service is nasnet-gke)

Notice that, like with Cloud Run without GKE, the deployment is scaled to zero when it is not in use!

To connect to the service, check the URL given to you by Cloud Run. It will be in the form of: http://SERVICE.NAMESPACE.example.com. For example: http://nasnet-gke.default.example.com. Clearly, we will not be able to connect to that from the browser.

To fix that, you can patch the domain name to something that can be resolved, for instance a xip.io address. First get the external IP of the istio-ingressgateway:

kubectl get service istio-ingressgateway --namespace istio-system

Next, patch the config-domain configmap to replace example.com with <EXTERNALIP>.xip.io

kubectl patch configmap config-domain --namespace knative-serving --patch \
'{"data": {"example.com": null, "[EXTERNAL-IP].xip.io": ""}}'

In my example Cloud Run service, I now get the following URL (not the actual IP):

http://nasnet-gke.default.107.198.183.182.xip.io/

Note: instead of patching the domain, you could also use curl to connect to the external IP of the ingress and pass the host header nasnet-gke.default.example.com.

With that URL, I can connect to the service. In case of a cold start (when the ReplicaSet has been scaled to 0), it takes a bit longer that “native” Cloud Run which takes a second or so.

It is clear that connecting to the Cloud Run service on GKE takes a bit more work than with “native” Cloud Run. Enabling HTTPS is also more of a pain on GKE where in “native” Cloud Run, you merely need to validate your domain and Google will configure a Let’s Encrypt certificate for the domain name you have configured. Cloud Run cold starts also seem faster.

That’s it for this quick look. In general, try to use Cloud Run versus Cloud Run on GKE as much as possible. Less fuss, more productivity! 😉

Creating and containerizing a TensorFlow Go application

In an earlier post, I discussed using a TensorFlow model from a Go application. With the TensorFlow bindings for Go, you can load a model that was exported with TensorFlow’s SavedModelBuilder module. That module saves a “snapshot” of a trained model which can be used for inference.

In this post, we will actually use the model in a web application. The application presents the user with a page to upload an image:

The upload page

The class and its probability is displayed, including the processed image:

Clearly a hen!

The source code of the application can be found at https://github.com/gbaeke/nasnet-go. If you just want to try the application, use Docker and issue the following command (replace port 80 with another port if there is a conflict):

docker run -p 80:9090 -d gbaeke/nasnet

The image is around 2.55GB in size so be patient when you first run the application. When the container has started, open your browser at http://localhost to see the upload page.

To quickly try it, you can run the container on Azure Container Instances. If you use the Portal, specify port 9090 as the container port.

Nasnet container in ACI

A closer look at the appN

**UPDATE**: since first publication, the http handler code was moved into from main.go to handlers/handlers.go

In the init() function, the nasnet model is loaded with tf.LoadSavedModel. The ImageNet categories are also loaded with a call to getCategories() and stored in categories which is a map of int to a string array.

In main(), we simply print the TensorFlow version (1.12). Next, http.HandleFunc is used to setup a handler (upload func) when users connect to the root of the web app.

Naturally, most of the logic is in the upload function. In summary, it does the following:

  • when users just navigate to the page (HTTP GET verb), render the upload.gtpl template; that template contains the upload form and uses a bit of bootstrap to make it just a bit better looking (and that’s already an overstatement); to learn more about Go web templates, see this link.
  • when users submit a file (POST), the following happens:
    • read the image
    • convert the image to a tensor with the getTensor function; getTensor returns a *tf.Tensor; the tensor is created from a [1][224][224][3] array; note that each pixel value gets normalized by subtracting by 127.5 and then dividing by 127.5 which is the same preprocessing applied as in Keras (divide by 127.5 and subtract 1)
    • run a session by inputting the tensor and getting the categories and probabilities as output
    • look for the highest probability and save it, together with the category name in a variable of type ResultPageData (a struct)
    • the struct data is used as input for the response.gtpl template

Note that the image is also shown in the output. The processed image (resized to 224×224) gets converted to a base64-encoded string. That string can be used in HTML image rendering as follows (where {{.Picture}} in the template will be replaced by the encoded string):

 <img src="data:image/jpg;base64,{{.Picture}}"> 

Note that the application lacks sufficient error checking to gracefully handle the upload of non-image files. Maybe I’ll add that later! 😉

Containerization

To containerize the application, I used the Dockerfile from https://github.com/tinrab/go-tensorflow-image-recognition but removed the step that downloads the InceptionV3 model. My application contains a ready to use NasnetMobile model.

The container image is based on tensorflow/tensorflow:1.12.0. It is further modified as required with the TensorFlow C API and the installation of Go. As discussed earlier, I uploaded a working image on Docker Hub.

Conclusion

Once you know how to use TensorFlow models from Go applications, it is easy to embed them in any application, from command-line tools to APIs to web applications. Although this application does server-side processing, you can also use a model directly in the browser with TensorFlow.js or ONNX.js. For ONNX, try https://microsoft.github.io/onnxjs-demo/#/resnet50 to perform image classification with ResNet50 in the browser. You will notice that it will take a while to get started due to the model being downloaded. Once the model is downloaded, you can start classifying images. Personally, I prefer the server-side approach but it all depends on the scenario.

Infrastructure as Code: exploring Pulumi

Image: from the Pulumi website

In my Twitter feed, I often come across Pulumi so I decided to try it out. Pulumi is an Infrastructure as Code solution that allows you to use familiar development languages such as JavaScript, Python and Go. The idea is that you define your infrastructure in the language that you prefer, versus some domain specific language. When ready, you merely use pulumi up to deploy your resources (and pulumi update, pulumi destroy, etc…). The screenshot below shows the deployment of an Azure resource group, storage account, file share and a container group on Azure Container Instances. The file share is mapped as a volume to one of the containers in the container group:

Deploying infrastructure with pulumi up

Installation is extremely straightforward. I chose to write the code in JavaScript as I had all the tools already installed on my Windows box. It is also more polished than the Go option (for now). I installed Pulumi per their instructions over at https://pulumi.io/quickstart/install.html.

Next, I used their cloud console to create a new project. Eventually, you will need to run a pulumi new command on your local machine. The cloud console will provide you with the command to use which is handy when you are just getting started. The cloud console provides a great overview of all your activities:

Nice and green (because I did not include the failed ones 😉)

In Resources, you can obtain a graph of the deployed resources:

Don’t you just love pretty graphs like this?

Let’s take a look at the code. The complete code is in the following gist: https://gist.github.com/gbaeke/30ae42dd10836881e7d5410743e4897c.

Resource group, storage account and share

The above code creates the resource group, storage account and file share. It is so straightforward that there is no need to explain it, especially if you know how it works with ARM. The simplicity of just referring to properties of resources you just created is awesome!

Next, we create a container group with two containers:

Creating the container group

If you have ever created a container group with a YAML file or ARM template, the above code will be very familiar. It defines a DNS label for the group and sets the type to Linux (ACI also supports Windows). Then two containers are added. The realtime-go container uses CertMagic to obtain Let’s Encrypt certificates. The certificates should be stored in persistent storage and that is what the Azure File Share is used for. It is mounted on /.local/share/certmagic because that is where the files will be placed in a scratch container.

I did run into a small issue with the container group. The realtime-go container should expose both port 80 and 443 but the port setting is a single numeric value. In YAML or ARM, multiple ports can be specified which makes total sense. Pulumi has another cross-cloud option to deploy containers which might do the trick.

All in all, I am pleasantly surprised with Pulumi. It’s definitely worth a more in-depth investigation!

ResNet50v2 classification in Go with a local container

To quickly go to the code, go here. Otherwise, keep reading…

In a previous blog post, I wrote about classifying images with the ResNet50v2 model from the ONNX Model Zoo. In that post, the container ran on a Kubernetes cluster with GPU nodes. The nodes had an NVIDIA v100 GPU. The actual classification was done with a simple Python script with help from Keras and Numpy. Each inference took around 25 milliseconds.

In this post, we will do two things:

  • run the scoring container (CPU) on a local machine that runs Docker
  • perform the scoring (classification) in Go

Installing the scoring container locally

I pushed the scoring container with the ONNX ResNet50v2 image to the following location: https://cloud.docker.com/u/gbaeke/repository/docker/gbaeke/onnxresnet50v2. Run the container with the following command:

docker run -d -p 5001:5001 gbaeke/onnxresnet50

The container will be pulled and started. The scoring URI is on http://localhost:5001/score.

Note that in the previous post, Azure Machine Learning deployed two containers: the scoring container (the one described above) and a front-end container. In that scenario, the front-end container handles the HTTP POST requests (optionally with SSL) and route the request to the actual scoring container.

The scoring container accepts the same payload as the front-end container. That means it can be used on its own, as we are doing now.

Note that you can also use IoT Edge, as explained in an earlier post. That actually shows how easy it is to push AI models to the edge and use them locally, befitting your business case.

Scoring with Go

To actually classify images, I wrote a small Go program to do just that. Although there are some scientific libraries for Go, they are not really needed in this case. That means we do have to create the 4D tensor payload and interpret the softmax result manually. If you check the code, you will see that is not awfully difficult.

The code can be found in the following GitHub repository: https://github.com/gbaeke/resnet-score.

Remember that this model expects the input as a 4D tensor with the following dimensions:

  • dimension 0: batch (we only send one image here)
  • dimension 1: channels (one for each; RGB)
  • dimension 2: height
  • dimension 3: width

The 4D tensor needs to be serialized to JSON in a field called data. We send that data with HTTP POST to the scoring URI at http://localhost:5001/score.

The response from the container will be JSON with two fields: a result field with the 1000 softmax values and a time field with the inference time. We can use the following two structs for marshaling and unmarshaling

Input and output of the model

Note that this model expects pictures to be scaled to 224 by 224 as reflected by the height and width dimensions of the uint8 array. The rest of the code is summarized below:

  • read the image; the path of the image is passed to the code via the -image command line parameter
  • the image is resized with the github.com/disintegration/imaging package (linear method)
  • the 4D tensor is populated by iterating over all pixels of the image, extracting r,g and b and placing them in the BCHW array; note that the r,g and b values are uint16 and scaled to fit in a uint8
  • construct the input which is a struct of type InputData
  • marshal the InputData struct to JSON
  • POST the JSON to the local scoring URI
  • read the HTTP response and unmarshal the response in a struct of type OutputData
  • find the highest probability in the result and note the index where it was found
  • read the 1000 ImageNet categories from imagenet_class_index.json and marshal the JSON into a map of string arrays
  • print the category using the index with the highest probability and the map

What happens when we score the image below?

What is this thing?

Running the code gives the following result:

$ ./class -image images/cassette.jpg

Highest prob is 0.9981583952903748 at 481 (inference time: 0.3309464454650879 )
Probably [n02978881 cassette

The inference time is 1/3 of a second on my older Linux laptop with a dual-core i7.

Try it yourself by running the container and the class program. Download it from here (Linux).