Building a real-time messaging server in Go

Often, I need a simple server and web interface that shows events in real time. Although there are many options available, such as socket.io for Node.js or services like Azure SignalR and PubNub, I decided to create a real-time server in Go with a simple web front-end:

The impressive UI of the real-time web front-end

For a real-time server in Go, there are several options. You could use Gorilla WebSocket, for which there is an excellent tutorial, and use native WebSockets in the browser. There’s also Glue. However, if you want to use the socket.io client, you can use https://github.com/googollee/go-socket.io. It is an implementation of socket.io, although not a complete one. For production scenarios, I recommend socket.io with Node.js because it is widely used and has more features, better documentation, and so on.

With that out of the way, let’s take a look at the code. Some things to note in advance:

  • the code uses the concept of rooms (as in a chat room); clients can join a room and only see messages for that room; you can use that concept to create a “room” for a device and only subscribe to messages for that device
  • the code uses the excellent https://github.com/mholt/certmagic to enable HTTPS via a Let’s Encrypt certificate (DNS-01 verification)
  • the code uses Redis as the back-end; applications send messages to Redis via a PubSub channel; the real-time Go server checks for messages via a subscription to one or more Redis channels

The code is over at https://github.com/gbaeke/realtime-go.

Server

Let’s start with the imports. Naturally, we need Redis support, the actual go-socket.io package and certmagic. The cloudflare package is needed because my domain, baeke.info, is managed by Cloudflare. It gives certmagic the ability to create the DNS verification record that Let’s Encrypt checks before issuing the certificate:

import (
    "log"
    "net/http"
    "os"

    "github.com/go-redis/redis"
    socketio "github.com/googollee/go-socket.io"
    "github.com/mholt/certmagic"
    "github.com/xenolf/lego/providers/dns/cloudflare"
)

Next, the code checks if the RTHOST environment variable is set. RTHOST should contain the hostname you request the certificate for (e.g. rt.baeke.info).
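That check, together with the small getEnv helper used further down, looks roughly like this (a sketch; the exact code is in the repository):

// in main(): require the hostname used to request the certificate
rthost := os.Getenv("RTHOST")
if rthost == "" {
    log.Fatal("RTHOST environment variable not set")
}

// getEnv returns the value of an environment variable or a fallback when it is not set
func getEnv(key, fallback string) string {
    if value, ok := os.LookupEnv(key); ok {
        return value
    }
    return fallback
}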

Let’s check the block of code that sets up the Redis connection.

// redis connection
client := redis.NewClient(&redis.Options{
    Addr: getEnv("REDISHOST", "localhost:6379"),
})

// subscribe to all channels
pubsub := client.PSubscribe("*")
_, err := pubsub.Receive()
if err != nil {
    panic(err)
}

// messages received on a Go channel
ch := pubsub.Channel()

First, we create a new Redis client. We either use the address in the REDISHOST environment variable or default to localhost:6379. I will later run this server on Azure Container Instances (ACI) in a multi-container setup that also includes Redis.

With the call to PSubscribe, a pattern subscription is used to subscribe to all Redis PubSub channels (*). If the subscription succeeds, a Go channel is set up to receive the messages on.

Now that the Redis connection is configured, let’s turn to socket.io:

server, err := socketio.NewServer(nil)
if err != nil {
    log.Fatal(err)
}

server.On("connection", func(so socketio.Socket) {
    log.Printf("New connection from %s ", so.Id())

    so.On("channel", func(channel string) {
        log.Printf("%s joins channel %s\n", so.Id(), channel)
        so.Join(channel)
    })

    so.On("disconnection", func() {
        log.Printf("disconnect from %s\n", so.Id())
    })
})

The above code is pretty simple. We create a new socket.io server and subsequently set up event handlers for the following events:

  • connection: code that runs when a web client connects; it gives us the socket the client connects on, which is then used by the channel and disconnection handlers
  • channel: this handler runs when a client sends a message of the custom channel type; the payload contains the name of the socket.io room to join; the client uses it to indicate which messages to show (e.g. just for device01); in the browser, the client sends a channel message containing the text “device01”
  • disconnection: code to run when the client disconnects from the socket

Naturally, something crucial is missing. We need to pick up the messages arriving on the Redis PubSub channels and broadcast them to matching socket.io “channels”. This is done in a goroutine that runs concurrently with the main code:

go func(srv *socketio.Server) {
    for msg := range ch {
        log.Println(msg.Channel, msg.Payload)
        srv.BroadcastTo(msg.Channel, "message", msg.Payload)
    }
}(server)

The anonymous function accepts a parameter of type *socketio.Server. We use the BroadcastTo method of socketio.Server to broadcast messages arriving on the Redis PubSub channels to matching socket.io channels. Note that we send a message of type “message”, so the client has to listen for “message” as well. Below is a snippet of client-side code that does that. It adds messages to the messages array defined on the Vue.js app:

socket.on('message', function(msg){
    app.messages.push(msg)
})

The rest of the server code basically configures certmagic to request the Let’s Encrypt certificate and sets up the http handlers for the static web client and the socket.io server:

// certificate magic
certmagic.Agreed = true
certmagic.CA = certmagic.LetsEncryptStagingCA

cloudflare, err := cloudflare.NewDNSProvider()
if err != nil {
    log.Fatal(err)
}

certmagic.DNSProvider = cloudflare

mux := http.NewServeMux()
mux.Handle("/socket.io/", server)
mux.Handle("/", http.FileServer(http.Dir("./assets")))

certmagic.HTTPS([]string{rthost}, mux)

Let’s try it out! The GitHub repository contains a file called multi.yaml, which deploys both the socket.io server and Redis to Azure Container Instances. The following images are used:

  • gbaeke/realtime-go-le: built with this Dockerfile; the image has a size of merely 14MB
  • redis: the official Redis image

To make it work, you will need to update the environment variables in multi.yaml with the domain name and your CloudFlare credentials. If you do not use CloudFlare, you can use one of the other providers. If you want to use the Let’s Encrypt production CA, you will have to change the code, rebuild the container, store it in your registry and modify multi.yaml accordingly.

In Azure Container Instances, the following is shown:

socket.io and Redis container in ACI

To test the setup, I can send a message with redis-cli from a console attached to the realtime-redis container:

Testing with redis-cli in the Redis container
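For instance, publishing a test message to the device01 channel (the room the web client joins) makes it appear in the browser. From redis-cli, that is a single command (channel name and payload are just examples):

PUBLISH device01 "hello from redis-cli"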

You should be aware that using CertMagic with ephemeral storage is NOT a good idea due to potential Let’s Encrypt rate limiting. You should store the requested certificates in persistent storage like an Azure File Share and mount it at /.local/share/certmagic!
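As an illustration only (the actual multi.yaml is in the repository and the values below are placeholders), mounting an Azure File Share in an ACI YAML file looks roughly like this:

# fragment of an ACI YAML file: mount an Azure File Share into the realtime-go container
containers:
- name: realtime-go
  properties:
    volumeMounts:
    - name: certmagic
      mountPath: /.local/share/certmagic
volumes:
- name: certmagic
  azureFile:
    shareName: certs
    storageAccountName: <storage account name>
    storageAccountKey: <storage account key>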

Client

The client is a Vue.js app. It was not created with the Vue CLI, so it just grabs the Vue.js library from a content delivery network (CDN) and has all logic in a single page. The socket.io library (v1.3.7) is also pulled from a CDN. The socket.io client code is kept to a minimum for demonstration purposes:

var socket = io();
socket.emit('channel','device01');
socket.on('message', function(msg){
    app.messages.push(msg)
})

When the page loads, the client emits a channel message to the server with a payload of device01. As you have seen in the server section, the server reacts to this message by joining this client to a socket.io room, in this case with name device01.

Whenever the client receives a message from the server, it adds the message to the messages array which is bound to a list item (li) with a v-for directive.

Surprisingly easy, no? With a few lines of code you have a fully functional real-time messaging solution!

Infrastructure as Code: exploring Pulumi

Image: from the Pulumi website

In my Twitter feed, I often come across Pulumi so I decided to try it out. Pulumi is an Infrastructure as Code solution that allows you to use familiar development languages such as JavaScript, Python and Go. The idea is that you define your infrastructure in the language that you prefer, rather than in a domain-specific language. When ready, you merely use pulumi up to deploy your resources (and pulumi update, pulumi destroy, etc.). The screenshot below shows the deployment of an Azure resource group, storage account, file share and a container group on Azure Container Instances. The file share is mapped as a volume to one of the containers in the container group:

Deploying infrastructure with pulumi up

Installation is extremely straightforward. I chose to write the code in JavaScript as I had all the tools already installed on my Windows box; the JavaScript support is also more polished than the Go option (for now). I installed Pulumi per their instructions over at https://pulumi.io/quickstart/install.html.

Next, I used their cloud console to create a new project. Eventually, you will need to run a pulumi new command on your local machine. The cloud console will provide you with the command to use which is handy when you are just getting started. The cloud console provides a great overview of all your activities:

Nice and green (because I did not include the failed ones 😉)

In Resources, you can obtain a graph of the deployed resources:

Don’t you just love pretty graphs like this?

Let’s take a look at the code. The complete code is in the following gist: https://gist.github.com/gbaeke/30ae42dd10836881e7d5410743e4897c.

Resource group, storage account and share
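The embedded gist does not render here, but a minimal sketch of what this part of the program could look like with the @pulumi/azure package follows. Resource names, locations and options are illustrative and not copied from the gist:

"use strict";
const azure = require("@pulumi/azure");

// resource group that will hold everything
const resourceGroup = new azure.core.ResourceGroup("rt-rg", {
    location: "westeurope",
});

// storage account for the file share
const storageAccount = new azure.storage.Account("rtstorage", {
    resourceGroupName: resourceGroup.name,
    location: resourceGroup.location,
    accountTier: "Standard",
    accountReplicationType: "LRS",
});

// file share used as persistent storage for the Let's Encrypt certificates
const certShare = new azure.storage.Share("certmagic", {
    resourceGroupName: resourceGroup.name,
    storageAccountName: storageAccount.name,
    quota: 5,
});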

The above code creates the resource group, storage account and file share. It is so straightforward that there is no need to explain it, especially if you know how it works with ARM. The simplicity of just referring to properties of resources you just created is awesome!

Next, we create a container group with two containers:

Creating the container group

If you have ever created a container group with a YAML file or ARM template, the above code will be very familiar. It defines a DNS label for the group and sets the type to Linux (ACI also supports Windows). Then two containers are added. The realtime-go container uses CertMagic to obtain Let’s Encrypt certificates. The certificates should be stored in persistent storage and that is what the Azure File Share is used for. It is mounted on /.local/share/certmagic because that is where the files will be placed in a scratch container.

I did run into a small issue with the container group. The realtime-go container should expose both port 80 and 443 but the port setting is a single numeric value. In YAML or ARM, multiple ports can be specified which makes total sense. Pulumi has another cross-cloud option to deploy containers which might do the trick.

All in all, I am pleasantly surprised with Pulumi. It’s definitely worth a more in-depth investigation!

Azure API Management Consumption Tier

In the previous post, I talked about a personal application I use to deploy Azure resources to my lab subscription. The architecture is pretty straightforward:

After obtaining an id token from Azure Active Directory (v1 endpoint), API calls go to API Management with the token in the authorization HTTP header.

API Management is available in several tiers:

API Management tiers

The consumption tier, with its 1,000,000 free calls per month per Azure subscription, is naturally the best fit for this application. I do not need virtual network support, multi-region support or even Active Directory support. And I don’t want the invoice either! 😉 Note that the lack of Active Directory support has nothing to do with the ability to verify the validity of a JWT (JSON Web Token).

I created an instance in West Europe but it gave me errors while adding operations (like POSTs or GETs). It complained about reaching the 1000 operations limit. Later, I created an instance in North Europe which had no issues.

Define a product

A product contains one or more APIs and has some configuration such as quotas. You can read up on API products here. You can also add policies at the product level. One example of a policy is a JWT check, which is exactly what I needed. Another example is adding basic authentication to the outgoing call:

Policies at the product level

The first policy, authentication, configures basic authentication and gets the password from the BasicAuthPassword named value:

Named values in API Management
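The basic authentication policy itself is a one-liner. A sketch (the username is illustrative; the password is pulled from the named value with double curly braces):

<!-- product-level inbound policy: add basic authentication to the call to the back-end -->
<authentication-basic username="api" password="{{BasicAuthPassword}}" />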

The second policy is the JWT check. Here it is in full:

JWT Policy
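The embedded policy is not reproduced here, but a validate-jwt policy along these lines matches the description below (the openid-config URL is the common v1 endpoint; the claim value is a placeholder):

<validate-jwt header-name="Authorization" require-scheme="Bearer"
              failed-validation-httpcode="401" failed-validation-error-message="Unauthorized">
    <openid-config url="https://login.microsoftonline.com/common/.well-known/openid-configuration" />
    <required-claims>
        <claim name="name">
            <value>your full name</value>
        </claim>
    </required-claims>
</validate-jwt>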

The policy checks the validity of the JWT and returns a 401 error if invalid. The openid-config url points to a document that contains useful information to validate the JWT, including a pointer to the public keys that can be used to verify the JWT’s signature (https://login.microsoftonline.com/common/discovery/keys). Note that I also check for the name claim to match mine.

Note that Active Directory is also configured to only issue a token to me. This is done via Enterprise Applications in https://aad.portal.azure.com.

Creating the API

With this out of the way, let’s take a look at the API itself:

Azure Deploy API and its defined operations

The operations are not very RESTful but they do the trick since they are an exact match with the webhookd server’s endpoints.

To avoid CORS errors, All operations has a CORS policy defined:

CORS policy at the All operations level
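A CORS policy of this shape does the trick (origins, methods and headers below are permissive placeholders; lock them down for real use):

<cors allow-credentials="false">
    <allowed-origins>
        <origin>*</origin>
    </allowed-origins>
    <allowed-methods>
        <method>GET</method>
        <method>POST</method>
        <method>OPTIONS</method>
    </allowed-methods>
    <allowed-headers>
        <header>*</header>
    </allowed-headers>
</cors>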

Great! The front-end can now authenticate to Azure AD and call the API exposed by API Management. Each call has the Azure AD token (a JWT) in the authorization header so API Management can verify the token’s validity and pass along the request to webhookd.

With the addition of the consumption tier, it makes sense to use API Management in many more cases. And not just for smaller apps like this one!

Recognizing images with Azure Machine Learning and the ONNX ResNet50v2 model

Featured image from: https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852

In a previous post, I discussed the creation of a container image that uses the ResNet50v2 model for image classification. If you want to perform tasks such as localization or segmentation, there are other models that serve that purpose. The image was built with GPU support. Adding GPU support was pretty easy:

  • Use the enable_gpu flag in the Azure Machine Learning SDK or check the GPU box in the Azure Portal; the service will build an image that supports NVIDIA CUDA
  • Add GPU support in your score.py file and/or conda dependencies file (scoring script uses the ONNX runtime, so we added the onnxruntime-gpu package)

In this post, we will deploy the image to a Kubernetes cluster with GPU nodes. We will use Azure Kubernetes Service (AKS) for this purpose. Check my previous post if you want to know how to attach a cluster with NVIDIA V100 GPUs; in this post, I use nodes with one V100 GPU each.

To get started, make sure you have the Kubernetes cluster deployed and that you followed the steps in my previous post to create the GPU container image. Make sure you attached the cluster to the workspace’s compute.

Deploy image to Kubernetes

Click the container image you created from the previous post and deploy it to the Kubernetes cluster you attached to the workspace by clicking + Create Deployment:

Starting the deployment from the image in the workspace

The Create Deployment screen is shown. Select AKS as deployment target and select the Kubernetes cluster you attached. Then press Create.

Azure Machine Learning now deploys the containers to Kubernetes. Note that I said containers in plural. In addition to the scoring container, a front-end container is added as well. You send your requests to the front-end container using HTTP POST. The front-end container talks to the scoring container over TCP port 5001 and passes the result back. The front-end container can be configured with certificates to support SSL.

Check the deployment and wait until it is healthy. We did not specify advanced settings during deployment so the default settings were chosen. Click the deployment to see the settings:

Deployment settings including authentication keys and scoring URI

As you can see, the deployment has authentication enabled. When you send your HTTP POST request to the scoring URI, make sure you pass an authorization header like so: Bearer <primary-or-secondary-key>. The primary and secondary key are in the settings above. You can regenerate those keys at any time.
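A raw request then looks something like this (the scoring URI is shown later in this post; payload.json is a placeholder file containing the JSON payload described under Recognizing images):

curl -X POST "<scoring URI>" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <primary-or-secondary-key>" \
  -d @payload.json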

Checking the deployment

From the Azure Cloud Shell, issue the following commands in order to list the pods deployed to your Kubernetes cluster:

  • az aks list -o table
  • az aks get-credentials -g RESOURCEGROUP -n CLUSTERNAME
  • kubectl get pods

Listing the deployed pods

Azure Machine Learning has deployed three front-ends (the default; this can be changed via Advanced Settings during deployment) and one scoring container. Let’s check the scoring container with: kubectl get pod onnxgpu-5d6c65789b-rnc56 -o yaml. Replace the pod name with yours. In the output, you should find the following:

resources:
  limits:
    nvidia.com/gpu: "1"
  requests:
    cpu: 100m
    memory: 500m
    nvidia.com/gpu: "1"

The above allows the pod to use the GPU on the host. The NVIDIA drivers on the host are mapped into the pod with a volume:

volumeMounts:
- mountPath: /usr/local/nvidia
  name: nvidia

Great! We did not have to bother with doing this ourselves. Let’s now try to recognize an image by sending requests to the front-end pods.

Recognizing images

To recognize an image, we need to POST a JSON payload to the scoring URI. The scoring URI can be found in the deployment properties in the workspace. In my case, the URI is:

http://23.97.218.34/api/v1/service/onnxgpu/score

The JSON payload needs to be in the below format:

{"data": [[[[143.06100463867188, 130.22100830078125, 122.31999969482422, ... ]]]]} 

The data field is a multi-dimensional array, serialized to JSON. The shape of the array is (1,3,224,224). The dimensions correspond to the batch size, channels (RGB), height and width.

You only have to read an image and put the pixel values in the array! Easy, right? Well, as usual the answer is: “it depends”! The easiest way to do it, in my opinion, is with Python and a couple of helper packages. The code is in the following GitHub gist: https://gist.github.com/gbaeke/b25849f3813e9eb984ee691659d1d05a. You need to run the code on a machine with Python 3 installed. Make sure you also install Keras and NumPy (pip3 install keras / pip3 install numpy). The code uses two images, cat.jpg and car.jpg, but you can use your own. When I run the code, I get the following result:

Using TensorFlow backend.
channels_last
Loading and preprocessing image… cat.jpg
Array shape (224, 224, 3)
Array shape after moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.025304794311523438
Probably a: Egyptian_cat 0.9460222125053406
Loading and preprocessing image… car.jpg
Array shape (224, 224, 3)
Array shape after moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.02526378631591797
Probably a: sports_car 0.948998749256134

It takes about 25 milliseconds to classify an image, or 40 images/second. By increasing the number of GPUs and scoring containers (we only deployed one), we can easily scale out the solution.

With a bit of help from Keras and NumPy, the code does the following (a condensed sketch follows the list):

  • check the image format reported by the keras back-end: it reports channels_last which means that, by default, the RGB channels are the last dimensions of the image array
  • load the image; the resulting array has a (224,224,3) shape
  • our container expects the channels_first format; we use moveaxis to move the last axis to the front; the array now has a (3,224,224) shape
  • our container expects a first dimension with a batch size; we use expand_dims to end up with a (1,3,224,224) shape
  • we convert the 4D array to a list and construct the JSON payload
  • we send the payload to the scoring URI and pass an authorization header
  • we get a JSON response with two fields: result and time; we print the inference time as reported by the container
  • from keras.applications.resnet50, we use the decode_predictions function to process the result field; result contains the 1000 values computed by the softmax function in the container; decode_predictions knows the categories and returns the top five
  • we print the name and probability of the category with the highest probability (item 0)
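The gist is the reference; the sketch below condenses the same flow. The scoring URI, key and file name are placeholders, and it assumes the response returns the 1000 probabilities as a nested list in the result field:

import json
import numpy as np
import requests
from keras.preprocessing import image
from keras.applications.resnet50 import decode_predictions

scoring_uri = "http://<scoring URI>/api/v1/service/onnxgpu/score"  # from the deployment settings
key = "<primary-or-secondary-key>"                                 # from the deployment settings

# load and resize the image; the resulting array has shape (224, 224, 3)
img = image.img_to_array(image.load_img("cat.jpg", target_size=(224, 224)))

# channels first (3, 224, 224), then add the batch dimension (1, 3, 224, 224)
img = np.expand_dims(np.moveaxis(img, -1, 0), axis=0)

# serialize the 4D array to JSON and POST it with the authorization header
headers = {"Content-Type": "application/json", "Authorization": "bearer " + key}
response = requests.post(scoring_uri, data=json.dumps({"data": img.tolist()}), headers=headers).json()

# result holds the 1000 softmax values; decode_predictions maps them to category names
predictions = decode_predictions(np.array(response["result"]), top=5)
print("prediction time (as measured by the scoring container)", response["time"])
print("Probably a:", predictions[0][0][1], predictions[0][0][2])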

What happens when you use a scoring container that uses the CPU? In that case, you could run the container in Azure Container Instances (ACI). Using ACI is much less costly! In ACI with the default setting of 0.1 CPU, it will take around 2 seconds to score an image. Ouch! With a full CPU (in ACI), the scoring time goes down to around 180-220ms per image. To achieve better results, simply increase the number of CPUs. On the Standard_NC6s_v3 Kubernetes node with 6 cores, scoring time with CPU hovers around 60ms.

Conclusion

In this post, you have seen how Azure Machine Learning makes it straightforward to deploy GPU scoring images to a Kubernetes cluster with GPU nodes. The service automatically configures the resource requests for the GPU and maps the NVIDIA drivers to the scoring container. The only thing left to do is to start scoring images with the service. We have seen how easy that is with a bit of help from Keras and NumPy. In practice, always start with CPU scoring and scale out that solution to match your requirements. But if you do need GPUs for scoring, Azure Machine Learning makes it pretty easy to do so!

Creating a GPU container image for scoring with Azure Machine Learning

In a previous post, I discussed how you can add an existing Kubernetes cluster to an Azure Machine Learning workspace. Adding an existing cluster is necessary when the workspace does not support auto creation of a cluster. That is the case when you want to use the Standard_NC6s_v3 virtual machine image. I also used a container for scoring pictures with the ResNet50v2 model from the ONNX Model Zoo. Now we will take a look at actually creating that container image with GPU support. Note that in many cases, inference with CPUs is more than sufficient but the GPU case is more interesting to look at!

To get started, you need an Azure subscription with an Azure Machine Learning workspace. Take a look here for instructions.

Once you have a workspace, there are a few steps to take. If you look at the diagram at the top of this post, we will perform the steps starting from Register and manage your model:

  • Register model: we will add the Resnet50v2 model from the ONNX Model Zoo; we are using this existing model instead of our own; ResNet50v2 can recognize pictures in 1000 categories
  • Create container image: from the model in the workspace, we create a container image with GPU support
  • Deploy container image: from the image in the workspace, we deploy the image to compute that supports GPUs

Machine Learning SDK

The Azure Machine Learning service has a Machine Learning SDK for Python. All the steps discussed above can be performed with code. You can find an example of the Python code to use in the following Jupyter notebook hosted on Azure Notebooks: https://gebaml-geba.notebooks.azure.com/j/notebooks/ONNXResnet.ipynb. Note that the Azure Notebooks service is still in preview and a bit rough around the edges. The Machine Learning SDK is available by default in Azure Notebooks.

At the beginning of the notebook, we import azureml.core which allows you to check the version of the SDK (among other things):
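In code, that is just a couple of lines (a sketch of the notebook cell):

import azureml.core

# print the version of the Azure Machine Learning SDK
print(azureml.core.VERSION)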

Registering the model

First, we download the model to the notebook project. In the notebook, the urllib module is used to download the compressed version of the ResNet50v2 model. The tarball is extracted to resnet50v2/resnet50v2.onnx. You can think of the model as a complex function with, in this case, millions of parameters (weights). The input to the function consists of the pixels of your picture (their red, green and blue values). The output of the function is a category: cat, guitar, …
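A sketch of that download step (the download URL is an assumption; check the ONNX Model Zoo for the current location):

import urllib.request
import tarfile

# download the compressed ResNet50v2 model from the ONNX Model Zoo
url = "https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet50v2/resnet50v2.tar.gz"
urllib.request.urlretrieve(url, "resnet50v2.tar.gz")

# extract the tarball; it contains resnet50v2/resnet50v2.onnx
with tarfile.open("resnet50v2.tar.gz") as tar:
    tar.extractall()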

Now that we have the model, we need to add it to the workspace, which means we also have to authenticate. Create a file called config.json with the following contents:

{
    "subscription_id": "your Azure subscription ID",
    "resource_group": "your Azure ML resource group",
    "workspace_name": "your Azure ML workspace name"
}

With the Workspace class from azureml.core we authenticate to Azure and grab a reference to the workspace with the ws variable. The Workspace.from_config() function searches for the config.json file.
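In the notebook, that boils down to the following (a sketch):

from azureml.core import Workspace

# from_config() finds config.json and triggers an interactive Azure login when needed
ws = Workspace.from_config()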

Now we can finally register the model in the workspace using Model.register:
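The registration code in the notebook looks roughly like this (model name and description are illustrative):

from azureml.core.model import Model

model = Model.register(workspace=ws,
                       model_path="resnet50v2/resnet50v2.onnx",  # local file to upload
                       model_name="resnet50v2",                  # name of the model in the workspace
                       description="ResNet50v2 from the ONNX Model Zoo")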

The above is the same as adding a model using the Azure Portal. You might hit file upload limits in the portal so adding the model via code is the better approach. Your model is now registered in the workspace:

Creating a GPU container image from the model

Now that we have the model, we can create the container image. The model will be included in the image which will add about 100MB to its size. The container image in Azure Machine Learning is created from four settings/artifacts:

  • model: registered in the workspace
  • score file: a file score.py with an init() and run() function; helper functions can also be included
  • dependency file: used to indicate the Python modules that need to be installed in the image (see https://conda.io/docs/)
  • GPU support: set to True or False

You will find the score file in the notebook. It was copied from a Microsoft-supplied sample. If you do not have experience with machine learning and neural networks, it will be difficult to create this from scratch. The ResNet50v2 model expects a 4-dimensional tensor with the following dimensions:

  • 0: batch (1 when you send 1 image)
  • 1: channels (3 channels for red, green and blue; RGB)
  • 2: height (224 pixels)
  • 3: width (224 pixels)

For inference, you will actually send the above data in a JSON payload as the data field. The preprocess() function in score.py grabs the data field and converts it to a NumPy array. The data is then normalized by dividing each pixel by 255, subtracting the mean values (of each channel) and dividing by the standard deviation (of each channel). The normalized data is then sent to the model, which outputs an array with 1000 probabilities that sum to 1 (via a softmax function).

Why are there a thousand probabilities? The model was trained on a thousand different categories of images and for each of these categories, a probability is output. After inference we will need a list of these categories so we can find the one that matches with our uploaded image and that has the highest probability!

This particular score.py file uses the ONNX runtime for inference. To enable GPU support, make sure you include the onnxruntime-gpu package in your conda dependencies as shown below:
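The conda dependencies file (myenv.yml) then contains something along these lines (a minimal sketch; pin versions as needed):

# myenv.yml: conda environment for the scoring image
name: project_environment
dependencies:
  - python=3.6
  - pip:
    - azureml-defaults
    - numpy
    - onnxruntime-gpu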

With score.py and myenv.yml, the container image with GPU support can be created. Note that we are specifying the score.py file, the conda file and the model. GPU support is enabled as well via enable_gpu=True.
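With the ContainerImage class from the SDK, that step looks roughly as follows (the image name is an assumption):

from azureml.core.image import ContainerImage

# image configuration: scoring script, conda dependencies and GPU support
image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                  runtime="python",
                                                  conda_file="myenv.yml",
                                                  enable_gpu=True)

# build the image in the workspace; the registered model is baked into the image
image = ContainerImage.create(name="onnxgpu",
                              models=[model],
                              image_config=image_config,
                              workspace=ws)
image.wait_for_creation(show_output=True)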

The code above should result in the following image in your workspace (after several minutes of building):

In the background, this image is stored in the container registry that got created when you deployed the Azure Machine Learning workspace. You are now ready for the third step, deploying the image to compute that supports GPUs (for instance Kubernetes). That step, together with some code to actually recognize images, will be for another post. In that post, we will also compare CPU to GPU speed.

Conclusion

In this post, we looked at creating a scoring (inference) container image with GPU support. Instead of creating and using our own model, we used the ResNet50v2 model from the ONNX Model Zoo. The model file, together with a score.py file and conda dependency file was used to build a container image. Azure Machine Learning builds the container image for you and stores it in a container registry. Although Azure Machine Learning takes care of most of the infrastructure work, you still need to know how to write the scoring file. In this post, the scoring file uses the ONNX runtime but you can use other runtimes or frameworks such as TensorFlow or MXNET.


Attaching Kubernetes clusters with NVIDIA V100 GPUs to Azure Machine Learning Service

Azure Machine Learning Service allows you to easily deploy compute for training and inference via a machine learning workspace. Although one of the compute types is Kubernetes, the workspace is a bit picky about the node VM sizes. I wanted to use two Standard_NC6s_v3 instances with NVIDIA Tesla V100 GPUs but that was not allowed. Other GPU instances, such as the Standard_NC6 type (K80 GPU) can be deployed from the workspace.

Luckily, you can deploy clusters on your own and then attach the cluster to your Azure Machine Learning workspace. You can create the cluster with the below command. Make sure you ask for a quota increase that allows 12 cores of Standard_NC6s_v3.

az aks create -g RESOURCE_GROUP --generate-ssh-keys --node-vm-size Standard_NC6s_v3 --node-count 2 --disable-rbac --name NAME --admin-username azureuser --kubernetes-version 1.11.5

Before I ran the above command, I created an Azure Machine Learning workspace in a resource group called ml-rg. The above command was run with RESOURCE_GROUP set to ml-rg and NAME set to mlkub. After a few minutes, you should have your cluster up and running. Be mindful of the price of this cluster. GPU instances are not cheap!

Now we can Add Compute to the workspace. In your workspace, navigate to Compute and use the + Add Compute button. Complete the form as below. The compute name does not need to match the cluster name.

After a while, the Kubernetes cluster should be attached:

Manually deployed cluster attached

Note that detaching a cluster does not remove it. Be sure to remove the cluster manually!

You can now deploy container images to the cluster that take advantage of the GPU on each node. When you deploy an image marked as a GPU image, Azure Machine Learning takes care of all the parameters that allow your container to use the GPU on the Kubernetes node.

The screenshot below shows a deployment of an image that can be used for inference. It uses an ONNX ResNet50v2 model.

Deployment of container for scoring (inference; ResNet50v2)

With the below picture of a cat, the model used by the container guesses it is an Egyptian Cat (it’s not but it is close) with close to 94% certainty.

Egyptian Cat (not)

Using your own compute with the Azure Machine Learning service is very easy to do. The more interesting and somewhat more complicated parts, such as the creation of the inference container that supports GPUs, are something I will discuss in a later post. In a follow-up post, I will also discuss how you send image data to the scoring container.

Deploying Azure resources using webhookd

In the previous blog post, I discussed adding SSL to webhookd. In this post, I will briefly show how to use this solution to deploy Azure resources.

To run webhookd, I deployed a small Standard_B1s machine (1GB RAM, 1 vCPU) with a system assigned managed identity. After deployment, information about the managed identity is available via the Identity link.

Code running on a machine with a managed identity needs to do something specific to obtain a token for that identity. With curl, you would issue the following command:

curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -H Metadata:true -s

The response is JSON that contains a field called access_token. You could parse out the access_token and then use it in a call to the Azure Resource Manager APIs, passing the token in the authorization header. Full details about acquiring these tokens can be found here. On that page, you will find details about acquiring the token with Go, JavaScript and several other languages.
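Parsing out the token and calling ARM could be done like this (a sketch using jq; the subscriptions call and API version are just an example):

# grab a token for the ARM endpoint from the instance metadata service
access_token=$(curl -s 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' \
  -H Metadata:true | jq -r .access_token)

# use the token in the authorization header of an ARM call (list subscriptions)
curl -s -H "Authorization: Bearer $access_token" \
  "https://management.azure.com/subscriptions?api-version=2016-06-01"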

Because we are using webhookd and shell scripts, the Azure CLI is the ideal way to create Azure resources. The Azure CLI can easily authenticate with the managed identity using a simple command: az login --identity. Here’s a shell script that uses it to create a virtual machine:

#!/bin/bash echo "Authenticating...`az login --identity`" 

echo "Creating the resource group...`az group create -n $rg -l westeurope`"

echo "Creating the vm...`az vm create --no-wait --size Standard_B1s --resource-group $rg --name $vmname --image win2016datacenter --admin-username azureuser --admin-password $pw`"

The script expects three parameters: rg, vmname and pw. We can pass these parameters as HTTP query parameters. If the above script were saved as create.sh in the ./scripts/vm folder, I could make the following call to webhookd:

curl --user api -XPOST "https://<public_server_dns>/vm/create?vmname=myvm&rg=myrg&pw=Abcdefg$$$$!!!!" 

The response to the above call would contain the output from the three az commands. The az login command would output the following:

 data:   {
data: "environmentName": "AzureCloud",
data: "id": "<id>",
data: "isDefault": true,
data: "name": "<subscription name>",
data: "state": "Enabled",
data: "tenantId": "<tenant_id>",
data: "user": {
data: "assignedIdentityInfo": "MSI",
data: "name": "systemAssignedIdentity",
data: "type": "servicePrincipal"
data: }

Notice the user object, which clearly indicates we are using a system-assigned managed identity. In my case, the managed identity has the contributor role on an Azure subscription used for testing. With that role, the shell script has the required access rights to deploy the virtual machine.

As you can see, it is very easy to use webhookd to deploy Azure resources if the Azure virtual machine that runs webhookd has a managed identity with the required access rights.