Microsoft Face API with a local container

A few days ago, I obtained access to the Face container. It provides access to the Face API via a container you can run wherever you want: on your PC, at the network edge or in your datacenter. You should allocate 6 GB of RAM and 2 cores for the container to run well. Note that you still need to create a Face API resource in the Azure Portal. The container needs to be associated with the Azure Face API via the endpoint and access key:

Face API with a West Europe (Amsterdam) endpoint

I used the Standard tier, which charges 0.84 euros per 1000 calls. As noted, the container will not function without associating it with an Azure Face API resource.

When you gain access to the container registry, you can pull the container:

docker pull

After that, you can run the container as follows (with the Billing parameter pointing to the billing endpoint of your Face API resource, in my case in West Europe):

docker run --rm -it -p 5000:5000 --memory 6g --cpus 2 <face-container-image> Eula=accept Billing=YOUR_BILLING_ENDPOINT ApiKey=YOUR_API_KEY

The container will start. Because it runs interactively (-it), you will see its output:

Running Face API container

And here’s the spec:

API spec Face API v1

Before showing how to use the detection feature, note that the container needs Internet access for billing purposes. You will not be able to run the container in fully offline scenarios.

In the accompanying GitHub repository, you can find a simple example in Go that uses the container. The Face API can take a byte stream of an image or a URL to an image. The example takes the first approach and loads an image from disk as specified by the -image parameter. The resulting io.Reader is passed to the getFace function, which does the actual call to the API (uri = http://localhost:5000/face/v1.0/detect):

request, err := http.NewRequest("POST", uri+"?returnFaceAttributes="+params, m)
if err != nil {
    return "", err
}
request.Header.Add("Content-Type", "application/octet-stream")

// Send the request to the local web service
resp, err := client.Do(request)
if err != nil {
    return "", err
}
The response contains a Body field; its contents are unmarshalled into a variable of type interface{}. That variable is then marshalled with indentation to a byte slice (b), which the function returns as a string:

var response interface{}
err = json.Unmarshal(respBody, &response)
if err != nil {
    return "", err
}
b, err := json.MarshalIndent(response, "", "\t")
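
Putting these snippets together, here is a minimal sketch of what such a getFace function could look like. Only the request construction and JSON handling come from the snippets above; the attribute list in params, the default http.Client and the imports (net/http, io, io/ioutil, encoding/json) are assumptions:

func getFace(m io.Reader) (string, error) {
    uri := "http://localhost:5000/face/v1.0/detect"
    // illustrative attribute list; adjust to the attributes you need
    params := "age,gender,emotion,hair,facialHair,glasses,accessories,blur,exposure"

    request, err := http.NewRequest("POST", uri+"?returnFaceAttributes="+params, m)
    if err != nil {
        return "", err
    }
    request.Header.Add("Content-Type", "application/octet-stream")

    // send the request to the local container
    client := &http.Client{}
    resp, err := client.Do(request)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    // read the raw response body
    respBody, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }

    // unmarshal into a generic interface{} and re-marshal with indentation
    var response interface{}
    err = json.Unmarshal(respBody, &response)
    if err != nil {
        return "", err
    }
    b, err := json.MarshalIndent(response, "", "\t")
    if err != nil {
        return "", err
    }
    return string(b), nil
}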

Now you can use a picture like the one below:

Is he smiling?

Here are some parts of the output, after running the following command:
detectface -image smiling.jpg

The emotion is clearly happiness, with additional attributes such as age, gender, hair color and so on:

"faceAttributes": {
"accessories": [],
"age": 33,
"blur": {
"blurLevel": "high",
"value": 1
"emotion": {
"anger": 0,
"contempt": 0,
"disgust": 0,
"fear": 0,
"happiness": 1,
"neutral": 0,
"sadness": 0,
"surprise": 0
"exposure": {
"exposureLevel": "goodExposure",
"value": 0.71
"facialHair": {
"beard": 0.6,
"moustache": 0.6,
"sideburns": 0.6
"gender": "male",
"glasses": "NoGlasses",
"hair": {
"bald": 0.26,
"hairColor": [
"color": "black",
"confidence": 1
"faceId": "b6d924c1-13ef-4d19-8bc9-34b0bb21f0ce",
"faceRectangle": {
"height": 1183,
"left": 944,
"top": 167,
"width": 1183

That’s it! Give the Face API container a go with the detectface tool. A Windows build is available for download.

Building a real-time messaging server in Go

Often, I need a simple real-time server and a web interface that shows real-time events. Although there are many options available, like socket.io for Node.js or services like Azure SignalR and PubNub, I decided to create a real-time server in Go with a simple web front-end:

The impressive UI of the real-time web front-end

For a real-time server in Go, there are several options. You could use Gorilla WebSocket, of which there is an excellent tutorial, and use native WebSockets in the browser. There’s also Glue. However, if you want to use the socket.io client in the browser, you can use go-socket.io, a Go implementation of socket.io, although not a complete one. For production scenarios, I recommend using socket.io with Node.js because it is heavily used, has more features, better documentation, and so on.

With that out of the way, let’s take a look at the code. Some things to note in advance:

  • the code uses the concept of rooms (as in a chat room); clients can join a room and only see messages for that room; you can use that concept to create a “room” for a device and only subscribe to messages for that device
  • the code uses the excellent certmagic package to enable HTTPS via a Let’s Encrypt certificate (DNS-01 verification)
  • the code uses Redis as the back-end; applications send messages to Redis via a PubSub channel; the real-time Go server checks for messages via a subscription to one or more Redis channels

The code is on GitHub.


Let’s start with the imports. Naturally we need Redis support, the go-socket.io package and certmagic. The cloudflare package is needed because my domain is managed by CloudFlare; it gives certmagic the ability to create the DNS verification record that Let’s Encrypt will check before issuing the certificate:

import (
    "github.com/go-redis/redis"                        // Redis client (assumed import path)
    "github.com/mholt/certmagic"                       // automatic HTTPS via Let's Encrypt (assumed import path)
    "github.com/go-acme/lego/providers/dns/cloudflare" // DNS-01 provider for CloudFlare (assumed import path)
    socketio "github.com/googollee/go-socket.io"       // socket.io server implementation (assumed import path)
)

Next, the code checks if the RTHOST environment variable is set. RTHOST should contain the hostname you request the certificate for (e.g. a name like realtime.yourdomain.com).
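
As a sketch, that check could look like the snippet below. The getEnv helper is the same one used for REDISHOST further down; the exact log message is an assumption:

// getEnv returns the value of an environment variable or a fallback value
func getEnv(key, fallback string) string {
    if value, ok := os.LookupEnv(key); ok {
        return value
    }
    return fallback
}

// RTHOST must be set: it is the hostname the certificate is requested for
rthost := getEnv("RTHOST", "")
if rthost == "" {
    log.Fatal("RTHOST environment variable not set")
}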

Let’s check the block of code that sets up the Redis connection.

// redis connection
client := redis.NewClient(&redis.Options{
    Addr: getEnv("REDISHOST", "localhost:6379"),
})

// subscribe to all channels
pubsub := client.PSubscribe("*")
_, err := pubsub.Receive()
if err != nil {
    log.Fatal(err)
}

// messages received on a Go channel
ch := pubsub.Channel()

First, we create a new Redis client. We either use the address in the REDISHOST environment variable or default to localhost:6379. I will later run this server on Azure Container Instances (ACI) in a multi-container setup that also includes Redis.

With the call to PSubscribe, a pattern subscribe is used to subscribe to all PubSub channels (*). If the subscribe succeeds, a Go channel is set up to actually receive messages on.

Now that the Redis connection is configured, let’s turn to the socket.io server:

server, err := socketio.NewServer(nil)
if err != nil {
    log.Fatal(err)
}

server.On("connection", func(so socketio.Socket) {
    log.Printf("New connection from %s ", so.Id())

    so.On("channel", func(channel string) {
        log.Printf("%s joins channel %s\n", so.Id(), channel)
        so.Join(channel) // join the room (channel) the client asked for
    })

    so.On("disconnection", func() {
        log.Printf("disconnect from %s\n", so.Id())
    })
})
The above code is pretty simple. We create a new server and subsequently set up event handlers for the following events:

  • connection: code that runs when a web client connects; it gives us the socket the client connects on, which is further used by the channel and disconnection handlers
  • channel: this handler runs when a client sends a message of the chosen type channel; the channel contains the name of the room to join; this is used by the client to indicate what messages to show (e.g. just for device01); in the browser, the client sends a channel message that contains the text “device01”
  • disconnection: code to run when the client disconnects from the socket

Naturally, something crucial is missing. We need to check Redis for messages in Redis channels and broadcast them to matching “channels”. This is done in a goroutine that runs concurrently with the main code:

go func(srv *socketio.Server) {
    for msg := range ch {
        log.Println(msg.Channel, msg.Payload)
        srv.BroadcastTo(msg.Channel, "message", msg.Payload)
    }
}(server)

The anonymous function accepts a parameter of type socketio.Server. We use the BroadcastTo method of socketio.Server to broadcast messages arriving on the Redis PubSub channels to matching channels. Note that we send a message of type “message” so the client will have to check for “message” coming in as well. Below is a snippet of client-side code that does that. It adds messages to the messages array defined on the Vue.js app:

socket.on('message', function(msg){
    app.messages.push(msg); // add the message to the messages array on the Vue instance (variable name illustrative)
});

The rest of the server code basically configures certmagic to request the Let’s Encrypt certificate and sets up the HTTP handlers for the static web client and the socket.io server:

// certificate magic
certmagic.Agreed = true
certmagic.CA = certmagic.LetsEncryptStagingCA

cloudflare, err := cloudflare.NewDNSProvider()
if err != nil {
    log.Fatal(err)
}

certmagic.DNSProvider = cloudflare

mux := http.NewServeMux()
mux.Handle("/socket.io/", server) // socket.io requests go to the socket.io server
mux.Handle("/", http.FileServer(http.Dir("./assets")))

certmagic.HTTPS([]string{rthost}, mux)

Let’s try it out! The GitHub repository contains a file called multi.yaml, which deploys both the server and Redis to Azure Container Instances. The following images are used:

  • gbaeke/realtime-go-le: built with this Dockerfile; the image has a size of merely 14MB
  • redis: the official Redis image

To make it work, you will need to update the environment variables in multi.yaml with the domain name and your CloudFlare credentials. If you do not use CloudFlare, you can use one of the other providers. If you want to use the Let’s Encrypt production CA, you will have to change the code, rebuild the container, store it in your registry and modify multi.yaml accordingly.

In Azure Container Instances, the deployment looks like this:

Real-time server and Redis container in ACI

To test the setup, I can send a message with redis-cli, from a console to the realtime-redis container:

Testing with redis-cli in the Redis container
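
If you prefer to publish a test message from code instead of redis-cli, a minimal sketch in Go could look like this. It assumes the same go-redis package used by the server; the channel name device01 matches the room the web client joins, and the message payload is purely illustrative:

package main

import (
    "log"

    "github.com/go-redis/redis" // assumed import path, same client as the server
)

func main() {
    client := redis.NewClient(&redis.Options{
        Addr: "localhost:6379",
    })

    // publish a test message to the device01 PubSub channel; the real-time
    // server pattern-subscribes to * and broadcasts it to the matching room
    if err := client.Publish("device01", "hello from Go").Err(); err != nil {
        log.Fatal(err)
    }
}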

You should be aware that using CertMagic with ephemeral storage is NOT a good idea due to potential Let’s Encrypt rate limiting. You should store the requested certificates in persistent storage like an Azure File Share and mount it at /.local/share/certmagic!


The client is a Vue.js app. It was not created with the Vue CLI, so it just grabs the Vue.js library from a content delivery network (CDN) and has all logic in a single page. The socket.io client library (v1.3.7) is also pulled from a CDN. The client code is kept to a minimum for demonstration purposes:

var socket = io();
socket.on('message', function(msg){
    app.messages.push(msg); // add the message to the messages array (variable name illustrative)
});

When the page loads, the client emits a channel message to the server with a payload of device01. As you have seen in the server section, the server reacts to this message by joining this client to a room, in this case with name device01.

Whenever the client receives a message from the server, it adds the message to the messages array which is bound to a list item (li) with a v-for directive.

Surprisingly easy, no? With a few lines of code, you have a fully functional real-time messaging solution!

Azure API Management Consumption Tier

In the previous post, I talked about a personal application I use to deploy Azure resources to my lab subscription. The architecture is pretty straightforward:

After obtaining an ID token from Azure Active Directory (v1 endpoint), API calls go to API Management with the token in the authorization HTTP header.

API Management is available in several tiers:

API Management tiers

The consumption tier, with its 1,000,000 free calls per month per Azure subscription, is naturally the best fit for this application. I do not need virtual network support, multi-region support or even Active Directory support. And I don’t want the invoice either! 😉 Note that the lack of Active Directory support has nothing to do with the ability to verify the validity of a JWT (JSON Web Token).

I created an instance in West Europe but it gave me errors while adding operations (like POSTs or GETs). It complained about reaching the 1000 operations limit. Later, I created an instance in North Europe which had no issues.

Define a product

A product contains one or more APIs and has some configuration such as quotas. You can read up on API products here. You can also add policies at the product level. One example of a policy is a JWT check, which is exactly what I needed. Another example is adding basic authentication to the outgoing call:

Policies at the product level

The first policy, authentication, configures basic authentication and gets the password from the BasicAuthPassword named value:

Named values in API Management

The second policy is the JWT check. Here it is in full:

JWT Policy

The policy checks the validity of the JWT and returns a 401 error if it is invalid. The openid-config URL points to a document that contains useful information to validate the JWT, including a pointer to the public keys that can be used to verify the JWT’s signature (the jwks_uri). Note that I also check that the name claim matches mine.

Note that Active Directory is also configured to only issue a token to me. This is done via Enterprise Applications in the Azure portal.

Creating the API

With this out of the way, let’s take a look at the API itself:

Azure Deploy API and its defined operations

The operations are not very RESTful but they do the trick since they are an exact match with the webhookd server’s endpoints.

To avoid ending up with CORS errors, All Operations has a CORS policy defined:

CORS policy at the All operations level

Great! The front-end can now authenticate to Azure AD and call the API exposed by API Management. Each call has the Azure AD token (a JWT) in the authorization header so API Management can verify the token’s validity and pass along the request to webhookd.

With the addition of the consumption tier, it makes sense to use API Management in many more cases. And not just for smaller apps like this one!

Simple Azure AD Authentication in a single page application (SPA)

Adding Azure AD integration to a website is often confusing if you are just getting started. Let’s face it, not everybody has the opportunity to dig deep into such topics. For this application, I wanted to enable Azure AD authentication so that only a select group of users in our AD tenant can call the back-end webhooks exposed by webhookd. The architecture of the application looks like this:

Client to webhook

The process is as follows:

  • Load the client in the browser from its URL
  • Client obtains a token from Azure Active Directory; the user will have to authenticate; in our case that means that a second factor needs to be provided as well
  • When the user performs an action that invokes a webhook, the call is sent to API Management
  • API Management verifies the token and passes the request to webhookd over https with basic authentication
  • The response is received by API Management which passes it unmodified to the client

I know you are an observant reader who is probably thinking: “why not present the token to webhookd?”. That’s possible, but then I would not have had a reason to use API Management! 😉

Before we begin, you might want to get some background information about what we are going to do. Take a look at this excellent YouTube video that explains topics such as OAuth 2.0 and OpenID Connect in an easy to understand format:

Create an application in Azure AD

The first step is to create a new application registration. You can do this from the Azure portal: in Azure Active Directory, select App registrations or use the new App registrations (Preview) experience.

For single page applications (SPAs), the application type should be Web app / API. As the App ID URI and Home page URL, I used the URL of the application itself.

In my app, a user will authenticate to Azure AD with a Login button. Clicking that button brings the user to a Microsoft hosted page that asks for credentials:

Providing user credentials

Naturally, this implies that the authentication process, when finished, needs to find its way back to the application. In that process, it will also bring along the obtained authentication token. To configure this, specify the Reply URLs. If you also develop on your local machine, include the local URL of the app as well:

Reply URLs of the registered app

For a SPA, you need to set an additional option in the application manifest (via the Manifest button):

"oauth2AllowImplicitFlow": true

This implicit flow is well explained in the above video and also here.

This is basically all you have to do for this specific application. In other cases, you might want to grant access from this application to other applications such as an API. Take a look at this post for more information about calling the Graph API or your own API.

We will just present the token obtained by the client to API Management. In turn, API Management will verify the token. If it does not pass the verification steps, a 401 error will be returned. We will look at API Management in a later post.

A bit of client code

Just browse to the application and view the source. Authentication is performed with ADAL for JavaScript. ADAL stands for the Active Directory Authentication Library; the library is loaded from a CDN.

This is a simple Vue application, so we have a Vue instance for data and methods. In that Vue instance’s data, authContext is set up via a call to new AuthenticationContext. The clientId is the Application ID of the registered app we created above:

authContext: new AuthenticationContext({
    clientId: '1fc9093e-8a95-44f8-b524-45c5d460b0d8',
    postLogoutRedirectUri: window.location
}),

To authenticate, the Login button’s click handler calls authContext.login(). The login method uses a redirect. It is also possible to use a pop-up window by setting popUp: true in the object passed to new AuthenticationContext() above. Personally, I do not like that approach though.

In the created lifecycle hook of the Vue instance, there is some code that handles the callback. When not in the callback, getCachedUser() is used to check if the user is logged in. If she is, the token is obtained via acquireToken() and stored in the token variable of the Vue instance. The acquireToken() method allows the application to obtain tokens silently without prompting the user again. The first parameter of acquireToken is the same application ID of the registered app.

Note that the token (an ID token) is not encrypted. You can paste the token in a JWT decoder and look inside. Here’s an example (click to navigate):

Calling the back-end API

In this application, the calls go to API Management. Here is an example of a call with axios (the API Management URL is represented by the apiURL placeholder below):

axios.post(apiURL + this.createrg.rg, null, this.getAxiosConfig(this.token))
    .then(function(result) {
        console.log("Got response...")
        self.response = result.data;
    })
    .catch(function(error) {
        console.log("Error calling webhook: " + error)
    });
The third parameter is a call to getAxiosConfig that passes the token. getAxiosConfig uses the token to create the Authorization header:

getAxiosConfig: function(token) {
    const config = {
        headers: {
            "authorization": "bearer " + token
        }
    };
    return config;
}

As discussed earlier, the call goes to API Management which will verify the token before allowing a call to webhookd.


With the application’s source code and this post, it should be fairly straightforward to enable Azure AD authentication in a simple single page web application. Note that the code is kept as simple as possible and does not cover any edge cases. In a next post, we will take a look at API Management.

Detecting emotions with FER+

In an earlier post, I discussed classifying images with the ResNet50v2 model. Azure Machine Learning Service was used to create a container image that used the ONNX ResNet50v2 model and the ONNX Runtime for scoring.

Continuing on that theme, I created a container image that uses the ONNX FER+ model that can detect emotions in an image. The container image also uses the ONNX Runtime for scoring.

You might wonder why you would want to detect emotions this way when there are many services available that can do this for you with a simple API call! You could use Microsoft’s Face API or Amazon’s Rekognition for example. While those services are easy to use and provide additional features, they do come at a cost. If all you need is basic detection of emotions, using this FER+ container is sufficient and cost effective.

Azure Face API (image from Microsoft website)

A notebook to create the image and deploy a container to Azure Container Instances (ACI) can be found here. The notebook uses the Azure Machine Learning SDK to register the model to an Azure Machine Learning workspace, build a container image from that model and deploy the container to ACI. The scoring script is shown below.

The model expects a 64×64 grayscale image of a face in an array with the following dimensions: [1][1][64][64]. The output is JSON with a results array that contains the probabilities for each emotion and a time field with the inference time.

The emotion probabilities are in this order:

0: "neutral", 1: "happy", 2: "surprise", 3: "sadness", 4: "anger", 5: "disgust", 6: "fear", 7: "contempt

To actually capture the emotions, I wrote a small demo program in Go that uses OpenCV (via GoCV). You can find it on GitHub. You will need to install OpenCV and GoCV; installation instructions for Linux are on the GoCV website. There are similar instructions for Mac and Windows, but I have not tried those.

The program is still a little rough around the edges but it does the trick. The scoring URI is hard-coded to http://localhost:5002/score. With Docker installed, use the following command to run the scoring container:

 docker run -d -p 5002:5001 gbaeke/onnxferplus

Have fun with it!

ResNet50v2 classification in Go with a local container

To quickly go to the code, go here. Otherwise, keep reading…

In a previous blog post, I wrote about classifying images with the ResNet50v2 model from the ONNX Model Zoo. In that post, the container ran on a Kubernetes cluster with GPU nodes. The nodes had an NVIDIA v100 GPU. The actual classification was done with a simple Python script with help from Keras and Numpy. Each inference took around 25 milliseconds.

In this post, we will do two things:

  • run the scoring container (CPU) on a local machine that runs Docker
  • perform the scoring (classification) in Go

Installing the scoring container locally

I pushed the scoring container with the ONNX ResNet50v2 model to Docker Hub as gbaeke/onnxresnet50. Run the container with the following command:

docker run -d -p 5001:5001 gbaeke/onnxresnet50

The container will be pulled and started. The scoring URI is at http://localhost:5001/score.

Note that in the previous post, Azure Machine Learning deployed two containers: the scoring container (the one described above) and a front-end container. In that scenario, the front-end container handles the HTTP POST requests (optionally with SSL) and routes the request to the actual scoring container.

The scoring container accepts the same payload as the front-end container. That means it can be used on its own, as we are doing now.

Note that you can also use IoT Edge, as explained in an earlier post. That actually shows how easy it is to push AI models to the edge and use them locally, if that fits your business case.

Scoring with Go

To actually classify images, I wrote a small Go program to do just that. Although there are some scientific libraries for Go, they are not really needed in this case. That means we do have to create the 4D tensor payload and interpret the softmax result manually. If you check the code, you will see that it is not awfully difficult.

The code can be found in the accompanying GitHub repository.

Remember that this model expects the input as a 4D tensor with the following dimensions:

  • dimension 0: batch (we only send one image here)
  • dimension 1: channels (one for each color: R, G and B)
  • dimension 2: height
  • dimension 3: width

The 4D tensor needs to be serialized to JSON in a field called data. We send that data with HTTP POST to the scoring URI at http://localhost:5001/score.

The response from the container will be JSON with two fields: a result field with the 1000 softmax values and a time field with the inference time. We can use the following two structs for marshaling and unmarshaling:

Input and output of the model
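
The embedded code is not reproduced here, but based on the payload and response described above, a sketch of such structs could look like this (field and type names are assumptions):

// InputData is the 4D tensor payload: batch, channels, height, width.
// Note that fixed-size uint8 arrays marshal as JSON arrays of numbers,
// unlike []byte slices, which would be base64-encoded.
type InputData struct {
    Data [1][3][224][224]uint8 `json:"data"`
}

// OutputData holds the scoring response: softmax values and inference time
type OutputData struct {
    Result []float32 `json:"result"`
    Time   float64   `json:"time"`
}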

Note that this model expects pictures to be scaled to 224 by 224 as reflected by the height and width dimensions of the uint8 array. The rest of the code is summarized below:

  • read the image; the path of the image is passed to the code via the -image command line parameter
  • the image is resized to 224 by 224 with an image-resizing package (linear interpolation)
  • the 4D tensor is populated by iterating over all pixels of the image, extracting r, g and b and placing them in the BCHW array; note that the r, g and b values are uint16 and scaled to fit in a uint8 (see the sketch after this list)
  • construct the input which is a struct of type InputData
  • marshal the InputData struct to JSON
  • POST the JSON to the local scoring URI
  • read the HTTP response and unmarshal the response in a struct of type OutputData
  • find the highest probability in the result and note the index where it was found
  • read the 1000 ImageNet categories from imagenet_class_index.json and unmarshal the JSON into a map of string arrays
  • print the category using the index with the highest probability and the map
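
A sketch of the tensor-filling and scoring steps described above (img is assumed to be the resized 224×224 image.Image; the InputData struct is the one sketched earlier):

var in InputData

bounds := img.Bounds()
for y := 0; y < bounds.Dy(); y++ {
    for x := 0; x < bounds.Dx(); x++ {
        // RGBA returns 16-bit color values (0-65535); shift right by 8 to scale them to 0-255
        r, g, b, _ := img.At(x, y).RGBA()
        in.Data[0][0][y][x] = uint8(r >> 8)
        in.Data[0][1][y][x] = uint8(g >> 8)
        in.Data[0][2][y][x] = uint8(b >> 8)
    }
}

// marshal the tensor to JSON and POST it to the local scoring URI
payload, _ := json.Marshal(in)
resp, err := http.Post("http://localhost:5001/score", "application/json", bytes.NewBuffer(payload))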

What happens when we score the image below?

What is this thing?

Running the code gives the following result:

$ ./class -image images/cassette.jpg

Highest prob is 0.9981583952903748 at 481 (inference time: 0.3309464454650879 )
Probably [n02978881 cassette]

The inference time is 1/3 of a second on my older Linux laptop with a dual-core i7.

Try it yourself by running the container and the class program. A Linux build of the program is available for download.

Recognizing images with Azure Machine Learning and the ONNX ResNet50v2 model


In a previous post, I discussed the creation of a container image that uses the ResNet50v2 model for image classification. If you want to perform tasks such as localization or segmentation, there are other models that serve that purpose. The image was built with GPU support. Adding GPU support was pretty easy:

  • Use the enable_gpu flag in the Azure Machine Learning SDK or check the GPU box in the Azure Portal; the service will build an image that supports NVIDIA cuda
  • Add GPU support in your pip and/or conda dependencies file (the scoring script uses the ONNX runtime, so we added the onnxruntime-gpu package)

In this post, we will deploy the image to a Kubernetes cluster with GPU nodes. We will use Azure Kubernetes Service (AKS) for this purpose. Check my previous post if you want to use NVIDIA V100 GPUs. In this post, I use hosts with one V100 GPU.

To get started, make sure you have the Kubernetes cluster deployed and that you followed the steps in my previous post to create the GPU container image. Make sure you attached the cluster to the workspace’s compute.

Deploy image to Kubernetes

Click the container image you created from the previous post and deploy it to the Kubernetes cluster you attached to the workspace by clicking + Create Deployment:

Starting the deployment from the image in the workspace

The Create Deployment screen is shown. Select AKS as deployment target and select the Kubernetes cluster you attached. Then press Create.

Azure Machine Learning now deploys the containers to Kubernetes. Note that I said containers in plural. In addition to the scoring container, another frontend container is added as well. You send your requests to the front-end container using HTTP POST. The front-end container talks to the scoring container over TCP port 5001 and passes the result back. The front-end container can be configured with certificates to support SSL.

Check the deployment and wait until it is healthy. We did not specify advanced settings during deployment so the default settings were chosen. Click the deployment to see the settings:

Deployment settings including authentication keys and scoring URI

As you can see, the deployment has authentication enabled. When you send your HTTP POST request to the scoring URI, make sure you pass an authentication header like so: bearer primary-or-secondary-key. The primary and secondary key are in the settings above. You can regenerate those keys at any time.
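
As a sketch, a scoring request with such a header could look like this in Go (scoringURI, apiKey and payload are placeholders; take the URI and key from the deployment settings shown above):

req, err := http.NewRequest("POST", scoringURI, bytes.NewBuffer(payload))
if err != nil {
    log.Fatal(err)
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+apiKey)

resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()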

Checking the deployment

From the Azure Cloud Shell, issue the following commands in order to list the pods deployed to your Kubernetes cluster:

  • az aks list -o table
  • az aks get-credentials -g RESOURCEGROUP -n CLUSTERNAME
  • kubectl get pods
Listing the deployed pods

Azure Machine Learning has deployed three front-ends (default; can be changed via Advanced Settings during deployment) and one scoring container. Let’s check the container with: kubectl get pod onnxgpu-5d6c65789b-rnc56 -o yaml. Replace the container name with yours. In the output, you should find the following:

limits: "1"
cpu: 100m
memory: 500m "1"

The above allows the pod to use the GPU on the host. The nvidia drivers on the host are mapped to the pod with a volume:

- mountPath: /usr/local/nvidia
  name: nvidia

Great! We did not have to bother with doing this ourselves. Let’s now try to recognize an image by sending requests to the front-end pods.

Recognizing images

To recognize an image, we need to POST a JSON payload to the scoring URI. The scoring URI can be found in the deployment properties in the workspace. In my case, the URI is:

The JSON payload needs to be in the below format:

{"data": [[[[143.06100463867188, 130.22100830078125, 122.31999969482422, ... ]]]]} 

The data field is a multi-dimensional array, serialized to JSON. The shape of the array is (1,3,224,224). The dimensions correspond to the batch size, channels (RGB), height and width.

You only have to read an image and put the pixel values in the array! Easy, right? Well, as usual the answer is: “it depends”! The easiest way to do it, in my view, is with Python and a collection of helper packages. The code is in a GitHub gist. You need to run the code on a machine with Python 3 installed. Make sure you also install Keras and NumPy (pip3 install keras / pip3 install numpy). The code uses two images, cat.jpg and car.jpg, but you can use your own. When I run the code, I get the following result:

Using TensorFlow backend.
Loading and preprocessing image… cat.jpg
Array shape (224, 224, 3)
Array shape afer moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.025304794311523438
Probably a: Egyptian_cat 0.9460222125053406
Loading and preprocessing image… car.jpg
Array shape (224, 224, 3)
Array shape afer moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.02526378631591797
Probably a: sports_car 0.948998749256134

It takes about 25 milliseconds to classify an image, or 40 images/second. By increasing the number of GPUs and scoring containers (we only deployed one), we can easily scale out the solution.

With a bit of help from Keras and NumPy, the code does the following:

  • check the image format reported by the keras back-end: it reports channels_last which means that, by default, the RGB channels are the last dimensions of the image array
  • load the image; the resulting array has a (224,224,3) shape
  • our container expects the channels_first format; we use moveaxis to move the last axis to the front; the array now has a (3,224,224) shape
  • our container expects a first dimension with a batch size; we use expand_dims to end up with a (1,3,224,224) shape
  • we convert the 4D array to a list and construct the JSON payload
  • we send the payload to the scoring URI and pass an authorization header
  • we get a JSON response with two fields: result and time; we print the inference time as reported by the container
  • from keras.applications.resnet50, we use the decode_predictions function to process the result field; result contains the 1000 values computed by the softmax function in the container; decode_predictions knows the categories and returns the first five
  • we print the name and probability of the category with the highest probability (item 0)

What happens when you use a scoring container that uses the CPU? In that case, you could run the container in Azure Container Instances (ACI). Using ACI is much less costly! In ACI with the default setting of 0.1 CPU, it will take around 2 seconds to score an image. Ouch! With a full CPU (in ACI), the scoring time goes down to around 180-220ms per image. To achieve better results, simply increase the number of CPUs. On the Standard_NC6s_v3 Kubernetes node with 6 cores, scoring time with CPU hovers around 60ms.


In this post, you have seen how Azure Machine Learning makes it straightforward to deploy GPU scoring images to a Kubernetes cluster with GPU nodes. The service automatically configures the resource requests for the GPU and maps the NVIDIA drivers to the scoring container. The only thing left to do is to start scoring images with the service. We have seen how easy that is with a bit of help from Keras and NumPy. In practice, always start with CPU scoring and scale out that solution to match your requirements. But if you do need GPUs for scoring, Azure Machine Learning makes it pretty easy to do so!