Creating and deploying a model with Azure Machine Learning Service

In this post, we will take a look at creating a simple machine learning model for text classification and deploying it as a container with Azure Machine Learning service. This post is not intended to discuss the finer details of creating a text classification model. In fact, we will use the Keras library and its Reuters newswire dataset to create a simple dense neural network. You can find many online examples based on this dataset. For further information, be sure to check out and buy 👍 Deep Learning with Python by François Chollet, the creator of Keras and now at Google. It contains a section that explains using this dataset in much more detail!

Machine Learning service workspace

To get started, you need an Azure subscription. Once you have the subscription, create a Machine Learning service workspace. Below, you see such a workspace:

My Machine Learning service workspace (gebaml)

Together with the workspace, you also get a storage account, a key vault, application insights and a container registry. In later steps, we will create a container and store it in this registry. That all happens behind the scenes though. You will just write a few simple lines of code to make that happen!

Note the Authoring (Preview) section! The items in it were added just before Build 2019 started. For now, we will not use them.

Azure Notebooks

To create the model and interact with the workspace, we will use a free Jupyter notebook in Azure Notebooks. At this point in time (8 May 2019), Azure Notebooks is still in preview. To get started, find the link below in the Overview section of the Machine Learning service workspace:

Getting Started with Notebooks

To quickly get the notebook, you can clone my public project: ⏩⏩⏩

Creating the model

When you open the notebook, you will see the following first four cells:

Getting the dataset

It’s always simple when a prepared dataset is handed to you, as in the example above. You simply call the load_data method of the reuters module in keras.datasets and assign the result directly to variables that hold the train and test data plus labels.

In this case, the data consists of newswires with a corresponding label that indicates the category of the newswire (e.g. an earnings call newswire). There are 46 categories in this dataset. In the real world, you would have the newswire in text format. In this case, the newswire has already been converted (preprocessed) for you in an array of integers, with each integer corresponding to a word in a dictionary.

A bit further in the notebook, you will find a Vectorization section:


In this section, the train and test data are vectorized using a one-hot encoding method. Because we specified, in the very first cell of the notebook, to only use the 10,000 most frequently occurring words, each article can be converted to a vector with 10,000 values. Each value is either 1 or 0, indicating whether the word is in the text or not.

This bag-of-words approach is one of the ways to represent text in a data structure that can be used in a machine learning model. Besides vectorizing the training and test samples, the categories are also one-hot encoded.
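As a rough, dependency-free sketch of the encoding described above: the notebook itself most likely works with NumPy arrays, and the function names and the tiny sample data below are illustrative assumptions, not taken from the notebook.

```python
def vectorize_sequences(sequences, dimension=10000):
    """Turn lists of word indices into 0/1 vectors of length `dimension`."""
    results = []
    for sequence in sequences:
        vector = [0.0] * dimension
        for word_index in sequence:
            vector[word_index] = 1.0  # mark the word as present in the article
        results.append(vector)
    return results


def to_one_hot(labels, num_classes=46):
    """One-hot encode integer category labels."""
    results = []
    for label in labels:
        vector = [0.0] * num_classes
        vector[label] = 1.0
        results.append(vector)
    return results


# Tiny made-up example: a vocabulary of 8 words and 4 categories
sample_articles = [[1, 3, 3, 5], [0, 7]]
sample_labels = [3, 0]

x = vectorize_sequences(sample_articles, dimension=8)
y = to_one_hot(sample_labels, num_classes=4)
print(x[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(y[0])  # [0.0, 0.0, 0.0, 1.0]
```

Note that a word occurring twice (index 3 in the first sample) still yields a single 1: the encoding records presence, not frequency or word order.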

Now the dense neural network model can be created:

Dense neural net with Keras

The above code defines a very simple dense neural network. A dense neural network is not necessarily the best type but that’s ok for this post. The specifics are not that important. Just note that the nn variable is our model. We will use this variable later when we convert the model to the ONNX format.

The last cell (16 above) does the actual training in 9 epochs. Training will be fast because the dataset is relatively small and the neural network is simple. Using the Azure Notebooks compute is sufficient. After 9 epochs, this is the result:

Training result

Not exactly earth-shattering: 78% accuracy on the test set!

Saving the model in ONNX format

ONNX is an open format to store deep learning models. When your model is in that format, you can use the ONNX runtime for inference.

Converting the Keras model to ONNX is easy with the onnxmltools package:

Converting the Keras model to ONNX

The result of the above code is a file called reuters.onnx in your notebook project.

Predict with the ONNX model

Let’s try to predict the category of the first newswire in the test set. Its real label is 3, which means it’s a newswire about an earnings call (earn class):

Inferencing with the ONNX model

We will use similar code later in the scoring script, a file that will be used in a container we will create to expose the model as an API. The code is pretty simple: start an inference session based on the reuters.onnx file, grab the input and output and use run to predict. The resulting array is the output of the softmax layer and we use argmax to extract the category with the highest probability.
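The argmax step can be illustrated without the model itself. The sketch below uses a made-up probability list with only five classes (the real softmax output has 46 values), and the helper name is an assumption for illustration:

```python
def argmax(values):
    """Return the index of the largest value in a list."""
    return max(range(len(values)), key=lambda index: values[index])


# Made-up softmax output for a 5-class problem; the values sum to 1
probabilities = [0.02, 0.05, 0.03, 0.85, 0.05]

predicted_category = argmax(probabilities)
print(predicted_category)                  # 3
print(probabilities[predicted_category])   # 0.85
```

With the real model, the same idea applies to the 46 probabilities returned for the article, and index 3 would correspond to the earn class mentioned above.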

Saving the model to the workspace

With the model in reuters.onnx, we can add it to the workspace:

Saving the model in the workspace

You will need a file in your Azure Notebook project called config.json with the following contents:

     {
         "subscription_id": "<subscription-id>",
         "resource_group": "<resource-group>",
         "workspace_name": "<workspace-name>"
     }
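If you want to check the file before authenticating, a small sketch like the following can help. It only uses the Python standard library; the helper name load_workspace_config is hypothetical and not part of the Azure SDK, and the example writes a placeholder file to a temporary directory so that it runs anywhere.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Read config.json and verify that the three expected keys are present."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)  # fails loudly if the file is not valid JSON
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config


# Write a placeholder config to a temporary directory and read it back
example = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w", encoding="utf-8") as handle:
    json.dump(example, handle, indent=4)

config = load_workspace_config(config_path)
print(sorted(config))
```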

With that file in place, when you run cell 27 (see above), you will need to authenticate to Azure to be able to interact with the workspace. The code is pretty self-explanatory: the reuters.onnx model will be added to the workspace:

Models added to the workspace

As you can see, you can save multiple versions of the model. This happens automatically when you save a model with the same name.

Creating the scoring container image

The scoring (or inference) container image is used to expose an API to predict categories of newswires. Obviously, you will need to give some instructions on how scoring needs to be done. This is done via a scoring script.

The code is similar to the code we wrote earlier to test the ONNX model. The scoring script needs an init() and a run() function. The other functions are helper functions. In init(), we need to grab a reference to the ONNX model. The ONNX model file will be placed in the container during the build process. Next, we start an InferenceSession via the ONNX runtime. In run(), the code is similar to our earlier example. It predicts via the inference session and returns the result as JSON. We do not have to worry about the rest of the code that runs the API. That is handled by Machine Learning service.
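To make the init()/run() contract concrete, here is a minimal sketch that runs without Azure or the ONNX runtime. It is not the actual scoring script: StubSession is a hypothetical stand-in for the inference session, and the "data" and "result" JSON keys are assumptions chosen for illustration.

```python
import json

session = None  # set once by init(), reused by every call to run()


class StubSession:
    """Hypothetical stand-in for the ONNX runtime inference session."""

    def predict(self, vector):
        # Fake model: puts all probability on one of 46 categories
        scores = [0.0] * 46
        scores[int(sum(vector)) % 46] = 1.0
        return scores


def init():
    """Called once when the container starts: load the model."""
    global session
    session = StubSession()  # the real script would load the ONNX model file here


def run(raw_data):
    """Called for every request: parse JSON, predict, return JSON."""
    try:
        vector = json.loads(raw_data)["data"]
        probabilities = session.predict(vector)
        return json.dumps({"result": probabilities})
    except Exception as error:  # report problems to the caller instead of crashing
        return json.dumps({"error": str(error)})


init()
response = json.loads(run(json.dumps({"data": [1, 0, 1, 1]})))
print(response["result"].index(1.0))  # 3
```

Loading the model in init() rather than in run() means the cost of reading the model file is paid once per container, not once per request.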

Note: using ONNX is not a requirement; we could have persisted and used the native Keras model for instance

In this post, we only need the scoring script since we do not train our model via Azure Machine Learning service. If you train a model with the service, you would also create a training script to instruct how training should be done, based on data in a storage account for instance. You would also provision compute resources for training. In our case, that is not required so we train, save and export the model directly from the notebook.

Training and scoring with Machine Learning service

Now we need to create an environment file to indicate the required Python packages and start the image build process:

Create an environment yml file via the API and build the container

The build process is handled by the service and makes sure the model file is in the container, in addition to the scoring script and myenv.yml. The result is a fully functional container that exposes an API that takes an input (a newswire) and outputs an array of probabilities. Of course, it is up to you to define what the input and output should be. In this case, you are expected to provide a one-hot encoded article as input.

The container image will be listed in the workspace, potentially multiple versions of it:

Container images for the reuters ONNX model

Deploy to Azure Container Instances

When the image is ready, you can deploy it via the Machine Learning service to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS). To deploy to ACI:

Deploying to ACI

When the deployment is finished, the deployment will be listed:

Deployment (ACI)

When you click on the deployment, the scoring URI will be shown (e.g. http://IPADDRESS:80/score). You can now use Postman or any other method to score an article. To quickly test the service from the notebook:

Testing the service

The helper method run of aci_service will post the JSON in test_sample to the service. It knows the scoring URI from the deployment earlier.
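If you prefer to call the scoring URI yourself instead of using the helper method, the request could be built roughly as follows with the Python standard library. This is a sketch under assumptions: the {"data": ...} payload shape depends on how your scoring script parses its input, the URI is the placeholder from the text above, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, vector):
    """Build (but do not send) a JSON POST request for the scoring endpoint."""
    # Assumption: the scoring script expects {"data": [...]}; adjust as needed
    body = json.dumps({"data": vector}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A real article vector has 10000 values; four are used here for brevity
request = build_scoring_request("http://IPADDRESS:80/score", [0.0, 1.0, 1.0, 0.0])
print(request.get_method(), request.full_url)

# To actually call the service, replace IPADDRESS and uncomment:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```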


Containerizing a machine learning model and exposing it as an API is made surprisingly simple with Azure Machine Learning service. It saves time so you can focus on the hard work of creating a model that performs well in the field. In this post, we used a sample dataset and a simple dense neural network to illustrate how you can build such a model, convert it to ONNX format and use the ONNX runtime for scoring.

Creating and containerizing a TensorFlow Go application

In an earlier post, I discussed using a TensorFlow model from a Go application. With the TensorFlow bindings for Go, you can load a model that was exported with TensorFlow’s SavedModelBuilder module. That module saves a “snapshot” of a trained model which can be used for inference.

In this post, we will actually use the model in a web application. The application presents the user with a page to upload an image:

The upload page

After the upload, the class and its probability are displayed, along with the processed image:

Clearly a hen!

The source code of the application is available online. If you just want to try the application, use Docker and issue the following command (replace port 80 with another port if there is a conflict):

docker run -p 80:9090 -d gbaeke/nasnet

The image is around 2.55GB in size so be patient when you first run the application. When the container has started, open your browser at http://localhost to see the upload page.

To quickly try it, you can run the container on Azure Container Instances. If you use the Portal, specify port 9090 as the container port.

Nasnet container in ACI

A closer look at the app

**UPDATE**: since first publication, the http handler code was moved from main.go into handlers/handlers.go

In the init() function, the nasnet model is loaded with tf.LoadSavedModel. The ImageNet categories are also loaded with a call to getCategories() and stored in categories which is a map of int to a string array.

In main(), we simply print the TensorFlow version (1.12). Next, http.HandleFunc is used to set up a handler (the upload func) that is called when users connect to the root of the web app.

Naturally, most of the logic is in the upload function. In summary, it does the following:

  • when users just navigate to the page (HTTP GET verb), render the upload.gtpl template; that template contains the upload form and uses a bit of bootstrap to make it just a bit better looking (and that’s already an overstatement); to learn more about Go web templates, see this link.
  • when users submit a file (POST), the following happens:
    • read the image
    • convert the image to a tensor with the getTensor function; getTensor returns a *tf.Tensor; the tensor is created from a [1][224][224][3] array; note that each pixel value gets normalized by subtracting 127.5 and then dividing by 127.5, which is equivalent to the preprocessing applied in Keras (divide by 127.5 and subtract 1)
    • run a session by inputting the tensor and getting the categories and probabilities as output
    • look for the highest probability and save it, together with the category name in a variable of type ResultPageData (a struct)
    • the struct data is used as input for the response.gtpl template
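The pixel normalization mentioned in the steps above can be checked with a few lines of code. The sketch below is in Python rather than Go (the application itself is Go) and is not the application's getTensor function; the function names and the tiny 1×2 image are illustrative assumptions. It shows that both formulations give the same result and how an image is wrapped in a batch dimension of 1.

```python
import math


def normalize_subtract_first(pixel):
    """Formulation used in the text: (p - 127.5) / 127.5."""
    return (pixel - 127.5) / 127.5


def normalize_divide_first(pixel):
    """Keras-style formulation: p / 127.5 - 1."""
    return pixel / 127.5 - 1.0


def to_batch(image):
    """Normalize a height x width x 3 image and wrap it in a batch of size 1."""
    normalized = [
        [[normalize_subtract_first(channel) for channel in pixel] for pixel in row]
        for row in image
    ]
    return [normalized]


# Both formulations agree for every possible 8-bit pixel value
both_agree = all(
    math.isclose(normalize_subtract_first(p), normalize_divide_first(p), abs_tol=1e-12)
    for p in range(256)
)
print(both_agree)  # True

# A made-up 1x2 RGB image; the real input is 224x224
batch = to_batch([[[0, 255, 0], [255, 0, 255]]])
print(batch)  # [[[[-1.0, 1.0, -1.0], [1.0, -1.0, 1.0]]]]
```

Either way, pixel values in the range 0 to 255 end up in the range -1 to 1.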

Note that the image is also shown in the output. The processed image (resized to 224×224) gets converted to a base64-encoded string. That string can be used in HTML image rendering as follows (where {{.Picture}} in the template will be replaced by the encoded string):

 <img src="data:image/jpg;base64,{{.Picture}}"> 
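The encoding step itself is short. As an illustration in Python (the application does this in Go), a data URI can be built as follows; the function name is an assumption, the sample bytes are only the first four bytes of a JPEG file rather than a complete image, and the sketch uses image/jpeg, the registered MIME type, where the template above uses image/jpg.

```python
import base64


def to_data_uri(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


# First four bytes of a JPEG file; a real image would be read from the upload
fake_jpeg = b"\xff\xd8\xff\xe0"

html = '<img src="{}">'.format(to_data_uri(fake_jpeg))
print(html)  # <img src="data:image/jpeg;base64,/9j/4A==">
```

Embedding the image this way avoids having to store the upload on disk and serve it from a separate URL, at the cost of a response that is roughly a third larger than the raw image.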

Note that the application lacks sufficient error checking to gracefully handle the upload of non-image files. Maybe I’ll add that later! 😉


To containerize the application, I started from an existing Dockerfile but removed the step that downloads the InceptionV3 model. My application contains a ready-to-use NasnetMobile model.

The container image is based on tensorflow/tensorflow:1.12.0. It is further modified as required with the TensorFlow C API and the installation of Go. As discussed earlier, I uploaded a working image on Docker Hub.


Once you know how to use TensorFlow models from Go applications, it is easy to embed them in any application, from command-line tools to APIs to web applications. Although this application does server-side processing, you can also use a model directly in the browser with TensorFlow.js or ONNX.js. For ONNX, try to perform image classification with ResNet50 in the browser. You will notice that it will take a while to get started due to the model being downloaded. Once the model is downloaded, you can start classifying images. Personally, I prefer the server-side approach but it all depends on the scenario.