Deploying Azure Cognitive Services Containers with IoT Edge

Introduction

Azure Cognitive Services is a collection of APIs that make your applications smarter. Some of those APIs are listed below:

  • Vision: image classification, face detection (including emotions), OCR
  • Language: text analytics (e.g. key phrase or sentiment analysis), language detection and translation

To use one of the APIs you need to provision it in an Azure subscription. After provisioning, you will get an endpoint and API key. Every time you want to classify an image or detect sentiment in a piece of text, you will need to post an appropriate payload to the cloud endpoint and pass along the API key as well.

What if you want to use these services but you do not want to pass your payload to a cloud endpoint for compliance or latency reasons? In that case, the Cognitive Services containers can be used. In this post, we will take a look at the Text Analytics containers, specifically the one for Sentiment Analysis. Instead of deploying the container manually, we will deploy the container with IoT Edge.

IoT Edge Configuration

To get started, create an IoT Hub. The free tier will do just fine. When the IoT Hub is created, create an IoT Edge device. Next, configure your actual edge device to connect to IoT Hub with the connection string of the device you created in IoT Hub. Microsoft has a great tutorial that covers all of the above, using a virtual machine in Azure as the edge device. The tutorial I linked to is the one for an edge device running Linux. When finished, the device should report its status to IoT Hub:

If you want to install IoT Edge on an existing device like a laptop, follow the procedure for Linux x64.

Once you have your edge device up and running, you can use the following command to check the status of the IoT Edge runtime: sudo systemctl status iotedge. The result:

Deploy Sentiment Analysis container

With the IoT Edge daemon up and running, we can deploy the Sentiment Analysis container. In IoT Hub, select your IoT Edge device and select Set modules:

In Set Modules you can configure the modules for this specific device. Modules are always deployed as containers and they do not have to be specifically designed or developed for use with IoT Edge. In the three-step wizard, add the Sentiment Analysis container in the first step. Click Add and then select IoT Edge Module. Provide the following settings:

Although the container image can be pulled freely from the Image URI, the container needs to be configured with billing information and an API key. In the Billing environment variable, specify the endpoint URL of the Text Analytics API you provisioned in the cloud. In ApiKey, set your API key. Note that the container always needs to be connected to the cloud to verify that you are allowed to use the service. Remember that although your payload is not sent to the cloud, your container usage is. The full container create options are listed below:

{
  "Env": [
    "Eula=accept",
    "Billing=https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.0",
    "ApiKey=<yourKEY>"
  ],
  "HostConfig": {
    "PortBindings": {
      "5000/tcp": [
        {
          "HostPort": "5000"
        }
      ]
    }
  }
}

In HostConfig we ask the container runtime (Docker) to map port 5000 of the container to port 5000 of the host. You can specify other create options as well.

On the next page, you can configure routing between IoT Edge modules. Because we do not use actual IoT Edge modules, leave the configuration as shown below:

Now move to the last page in the Set Modules wizard to review the configuration and click Submit.

Give the deployment some time to finish. After a while, check your edge device with the following command: sudo iotedge list. Your TextAnalytics container should be listed. Alternatively, use sudo docker ps to list the Docker containers on your edge device.

Testing the Sentiment Analysis container

If everything went well, you should be able to go to http://localhost:5000/swagger to see the available endpoints. Open Sentiment Analysis to try out a sample:

You can use curl to test as well:

curl -X POST "http://localhost:5000/text/analytics/v2.0/sentiment" -H  "accept: application/json" -H  "Content-Type: application/json-patch+json" -d "{  \"documents\": [    {      \"language\": \"en\",      \"id\": \"1\",      \"text\": \"I really really despise this product!! DO NOT BUY!!\"    }  ]}"

As you can see, the API expects a JSON payload with a documents array. Each document object has three fields: language, id and text. When you run the above command, the result is:

{"documents":[{"id":"1","score":0.0001703798770904541}],"errors":[]}

In this case, the text I really really despise this product!! DO NOT BUY!! clearly results in a very bad score. As you might have guessed, 0 is the absolute worst and 1 is the absolute best.

Just for fun, I created a small Go program to test the API:

The Go program can be found here: https://github.com/gbaeke/sentiment. You can download the executable for Linux with: wget https://github.com/gbaeke/sentiment/releases/download/v0.1/ta. Make ta executable and use ./ta --help for help with the parameters.
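If you just want to see what such a call looks like in Go, the minimal sketch below posts the same payload as the curl example above to the containerized endpoint (http://localhost:5000, as configured in the create options). It is an illustration only, not the code from the repository:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

type document struct {
    Language string `json:"language"`
    ID       string `json:"id"`
    Text     string `json:"text"`
}

type payload struct {
    Documents []document `json:"documents"`
}

func main() {
    p := payload{Documents: []document{
        {Language: "en", ID: "1", Text: "I really really despise this product!! DO NOT BUY!!"},
    }}
    body, _ := json.Marshal(p)

    // Post the payload to the Sentiment Analysis endpoint exposed by the container.
    resp, err := http.Post("http://localhost:5000/text/analytics/v2.0/sentiment",
        "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var result map[string]interface{}
    json.NewDecoder(resp.Body).Decode(&result)
    fmt.Println(result) // documents array with a score between 0 (negative) and 1 (positive)
}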

Summary

IoT Edge is a great way to deploy containers to edge devices running Linux or Windows. Besides deploying actual IoT Edge modules, you can deploy any container you want. In this post, we deployed a Cognitive Services container that does Sentiment Analysis at the edge.

Deploying Azure resources using webhookd

In the previous blog post, I discussed adding SSL to webhookd. In this post, I will briefly show how to use this solution to deploy Azure resources.

To run webhookd, I deployed a small Standard_B1s machine (1GB RAM, 1 vCPU) with a system assigned managed identity. After deployment, information about the managed identity is available via the Identity link.

Code running on a machine with a managed identity needs to explicitly request a token for that identity from the local instance metadata endpoint. With curl, you would issue the following command:

curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -H Metadata:true -s

The response is JSON that contains a field called access_token. You can parse out the access_token and then use the token in a call to the Azure Resource Manager APIs, passing it in the authorization header. Full details about acquiring these tokens can be found here. On that page, you will find details about acquiring the token with Go, JavaScript and several other languages.
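If you prefer code over curl, here is a rough Go sketch of that flow: grab a token from the metadata endpoint and use it as a bearer token in the authorization header of an ARM call. The subscription ID is a placeholder and the resource listing URL is just one example of an ARM call you could make:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Request a token for the ARM resource from the instance metadata service.
    req, _ := http.NewRequest("GET",
        "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F", nil)
    req.Header.Set("Metadata", "true")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var token struct {
        AccessToken string `json:"access_token"`
    }
    json.NewDecoder(resp.Body).Decode(&token)

    // Use the token in the authorization header of an ARM request.
    armReq, _ := http.NewRequest("GET",
        "https://management.azure.com/subscriptions/{yourSubID}/resources?api-version=2015-01-01", nil)
    armReq.Header.Set("Authorization", "Bearer "+token.AccessToken)

    armResp, err := http.DefaultClient.Do(armReq)
    if err != nil {
        panic(err)
    }
    defer armResp.Body.Close()

    body, _ := io.ReadAll(armResp.Body)
    fmt.Println(string(body))
}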

Because we are using webhookd and shell scripts, the Azure CLI is the ideal way to create Azure resources. The Azure CLI can easily authenticate with the managed identity using a simple command: az login --identity. Here’s a shell script that uses it to create a virtual machine:

#!/bin/bash
echo "Authenticating...`az login --identity`"

echo "Creating the resource group...`az group create -n $rg -l westeurope`"

echo "Creating the vm...`az vm create --no-wait --size Standard_B1s --resource-group $rg --name $vmname --image win2016datacenter --admin-username azureuser --admin-password $pw`"

The script expects three parameters: rg, vmname and pw. We can pass these parameters as HTTP query parameters. If the above script would be in the ./scripts/vm folder as create.sh, I could do the following call to webhookd:

curl --user api -XPOST "https://<public_server_dns>/vm/create?vmname=myvm&rg=myrg&pw=Abcdefg$$$$!!!!" 

The response to the above call would contain the output from the three az commands. The az login command would output the following:

data: {
data:   "environmentName": "AzureCloud",
data:   "id": "<id>",
data:   "isDefault": true,
data:   "name": "<subscription name>",
data:   "state": "Enabled",
data:   "tenantId": "<tenant_id>",
data:   "user": {
data:     "assignedIdentityInfo": "MSI",
data:     "name": "systemAssignedIdentity",
data:     "type": "servicePrincipal"
data:   }

Notice the user object, which clearly indicates we are using a system-assigned managed identity. In my case, the managed identity has the contributor role on an Azure subscription used for testing. With that role, the shell script has the required access rights to deploy the virtual machine.

As you can see, it is very easy to use webhookd to deploy Azure resources if the Azure virtual machine that runs webhookd has a managed identity with the required access rights.

Draft: a simpler way to deploy to Kubernetes during development

If you work with containers and Kubernetes, Draft makes it easier to deploy your code while you are in the early development stages. You use Draft while you are working on your code but before you commit it to version control. The idea is simple:

  • You have some code written in something like Node.js, Go or another supported language
  • You then use draft create to containerize the application based on Draft packs; several packs come with the tool and provide a Dockerfile and a Helm chart depending on the development language
  • You then use draft up to deploy the application to Kubernetes; the application is made accessible via a public URL

Let’s demonstrate how Draft is used, based on a simple Go application that is just a bit more complex than the Go example that comes with Draft. I will use the go-data service that I blogged about earlier. You can find the source code on GitHub. The go-data service is a very simple REST API. By calling the endpoint /data/{deviceid}, it will check if a “device” exists and then actually return no data. Hey, it’s just a sample! The service uses the Gorilla router but also Go Micro to call a device service running in the Kubernetes cluster. If the device service does not run, the data service will just report that the device does not exist.
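To give you an idea of the code we are deploying, the sketch below shows roughly what the /data/{device} handler looks like with gorilla/mux. The Go Micro call to the device service is left out, every device is simply reported as inactive, and the port is an assumption here; see the GitHub repository for the real thing:

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/gorilla/mux"
)

func getData(w http.ResponseWriter, r *http.Request) {
    // Extract the device ID from the route /data/{device}.
    deviceID := mux.Vars(r)["device"]
    log.Printf("data requested for device %s", deviceID)

    // The real service checks with the device service via Go Micro;
    // in this sketch the device is always reported as inactive.
    fmt.Fprintln(w, "Device active:  false")
    fmt.Fprintln(w, "Oh and, no data for you!")
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/data/{device}", getData)
    log.Fatal(http.ListenAndServe(":8080", r))
}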

Note that this post does not cover how to install Draft and its prerequisites like Helm and a Kubernetes Ingress Controller. You will also need a Kubernetes cluster (I used Azure ACS) and a container registry (I used Docker Hub). I installed all client-side components in the Windows 10 Linux shell which works great!

The only thing you need on your development box (which has Helm and Draft installed) is main.go and an empty glide.yaml file. The first command to run is draft create.

This results in several files and folders being created, based on the Golang Draft pack. Draft detected you used Go because of glide.yaml. No Docker container is created at this point.

  • Dockerfile: a simple Dockerfile that builds an image based on the golang:onbuild image
  • draft.toml: the Draft configuration file that contains the name of the application (set randomly), the namespace to deploy to and if the folder needs to be watched for changes after you do draft up
  • chart folder: contains the Helm chart for your application; you might need to make changes here if you want to modify the Kubernetes deployment as we will do soon

When you deploy, Draft will do several things. It packages up the chart and your code and sends it to the Draft server-side component running in Kubernetes. That component then builds your container, pushes it to a configured registry and installs the application in Kubernetes. All those tasks are performed by the Draft server component, not your client!

In my case, after running draft up, I get the following on my prompt (after the build, push and deploy steps):


In my case, the name of the application was set to exacerbated-ragdoll (in draft.toml). Part of what makes Draft so great is that it then makes the service available using that name and the configured domain. That works because of the following:

  • During installation of Draft, you need to configure an Ingress Controller in Kubernetes; you can use a Helm chart to make that easy; the Ingress Controller does the magic of mapping the incoming request to the correct application
  • When you configure Draft for the first time with draft init you can pass the domain (in my case baeke.info); this requires a wildcard A record (e.g. *.baeke.info) that points to the public IP of the Ingress Controller; note that in my case, I used Azure Container Services which makes that IP the public IP of an Azure load balancer that load balances traffic between the Ingress Controller instances (nginx)

So, with only my source code and a few simple commands, the application was deployed to Kubernetes and made available on the Internet! There is only one small problem here. If you check my source code, you will see that there is no route for /. The Draft pack for Golang includes a livenessProbe on / and a readinessProbe on /. The probes are in deployment.yaml which is the file that defines the Kubernetes deployment. You will need to change the path in livenessProbe and readinessProbe to point to /data/device like so:

- containerPort: {{ .Values.service.internalPort }}
livenessProbe:
  httpGet:
    path: /data/device
    port: {{ .Values.service.internalPort }}
readinessProbe:
  httpGet:
    path: /data/device
    port: {{ .Values.service.internalPort }}

If you already deployed the application but Draft is still watching the folder, you can simply make the above changes and save the deployment.yaml file (in chart/templates). The container will then be rebuilt and the deployment will be updated. When you now check the service with curl, you should get something like:

curl http://exacerbated-ragdoll.baeke.info/data/device1

Device active:  false
Oh and, no data for you!

To actually make the Go Micro features work, we will have to make another change to deployment.yaml. We will need to add an environment variable that instructs our code to find other services developed with Go Micro using the kubernetes registry:

- name: {{ .Chart.Name }}
  image: "{{ .Values.image.registry }}/{{ .Values.image.org }}/{{ .Values.image.name }}:{{ .Values.image.tag }}"
  imagePullPolicy: {{ .Values.image.pullPolicy }}
  env:
   - name: MICRO_REGISTRY
     value: kubernetes

To actually test this, use the following command to deploy the device service:

kubectl create -f https://raw.githubusercontent.com/gbaeke/go-device/master/go-device-dep.yaml

You can then check if it works by running the curl command again. It should now return the following:

Device active:  true
Oh and, no data for you!

Hopefully, you have seen how you can work with Draft from your development box and that you can modify the files generated by Draft to control how your application gets deployed. In our case, we had to modify the health checks to make sure the service can be reached. In addition, we had to add an environment variable because the code uses the Go Micro microservices framework.

IoT Hub Scaling

When you work with Azure IoT Hub, it is not always easy to tell what will happen when you reach its limits and what to do about it. As a reminder, the scale of IoT Hub is defined by its tier and the number of units in that tier. There are three paid tiers, besides the free tier:


Although these tiers make it clear how many messages you can send, other limits such as the number of messages per second cannot be seen here. To get an idea of the number of messages you can send and the sustained throughput, see https://azure.microsoft.com/en-us/documentation/articles/iot-hub-scaling/#device-to-cloud-and-cloud-to-device-message-throughput

The specific burst performance numbers can be found here: https://azure.microsoft.com/en-us/documentation/articles/iot-hub-devguide-quotas-throttling/. Typically, the limit you are concerned with is the number of device-to-cloud sends, which is as follows:

  • S1: 12/sec/unit (but you get at least 100/sec in total; not per unit obviously); 10 units give you 120/sec and not 100+120/sec
  • S2: 120/sec/unit
  • S3: 6000/sec/unit

Now suppose you think about deploying 300 devices that send data every half second. What tier should you use and how many units? It is clear that you need to send 600 messages per second, so 5 units of S2 will suffice. You could also take 50 units of S1 for the same performance and price. With 5 units of S2 though, you can send more messages per day.
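The arithmetic is simple enough to script. The back-of-the-envelope sketch below uses the device-to-cloud throttle rates from the list above to compute the number of units you would need per tier for a given device count and send interval (it ignores the 100/sec minimum for S1 and the daily message quotas):

package main

import (
    "fmt"
    "math"
)

func main() {
    // Device-to-cloud send limits in messages/sec/unit (from the list above).
    throttle := map[string]float64{"S1": 12, "S2": 120, "S3": 6000}

    devices := 300.0
    interval := 0.5 // seconds between sends per device

    required := devices / interval // messages per second
    fmt.Printf("Required throughput: %.0f msg/sec\n", required)

    for tier, rate := range throttle {
        units := int(math.Ceil(required / rate))
        fmt.Printf("%s: %d unit(s)\n", tier, units)
    }
}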

Now it would be nice to test the above in advance. At ThingTank we use Docker containers for this and we schedule them with Rancher, a great and easy-to-use Docker orchestration tool. If you want to try it, use the container you can find on Docker Hub or the new Docker Store (still in beta). Search for gbaeke and you will find the following container:


If you want to check out the code (warning: written hastily!), you can find it on GitHub here: https://github.com/xyloscloudservices/docker-itproceed. It is a simple Node.js script that uses the Azure IoT Hub libraries to create a new device in the registry with a GUID for the name. Afterwards, the code sends a simple JSON payload to IoT Hub every half second.

To use the script, start it as follows with three parameters:

app.js IoT_Hub_Short_Name IoT_Hub_Connection_String millis

Note: the millis parameter is the number of milliseconds to wait between each send
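The script itself is Node.js, but the send loop is easy to reproduce in any language. Below is a rough Go equivalent that posts a JSON payload to the IoT Hub HTTPS device endpoint at a configurable interval. The hub name, device ID and device SAS token are placeholders you have to provide yourself, and the payload is made up for the example:

package main

import (
    "bytes"
    "fmt"
    "net/http"
    "time"
)

func main() {
    // Placeholders: your IoT Hub name, a registered device ID and a device SAS token.
    hub := "<your-hub>"
    deviceID := "<your-device>"
    sasToken := "SharedAccessSignature sr=..."
    interval := 500 * time.Millisecond // the "millis" parameter

    url := fmt.Sprintf("https://%s.azure-devices.net/devices/%s/messages/events?api-version=2016-11-14",
        hub, deviceID)

    for {
        payload := []byte(fmt.Sprintf(`{"device":"%s","ts":%d}`, deviceID, time.Now().Unix()))
        req, _ := http.NewRequest("POST", url, bytes.NewReader(payload))
        req.Header.Set("Authorization", sasToken)
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            fmt.Println("send failed:", err)
        } else {
            resp.Body.Close()
        }
        time.Sleep(interval)
    }
}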

Now you can run the containers in Rancher (for instance). I won’t go into the details of how to add Docker Hosts to Rancher and how to create a new Stack (as they call it). Alternatively, you can run the containers on Azure Container Service or similar solutions.

In the PowerBI chart below, you see the event count per five seconds, which is around 420-440 events, a bit lower than expected for one S1 unit:


Note: the spike you see happens after the launch of 300 containers; throttling quickly kicks in

When switched to 5 S2 units, the graph looks as follows:


You see the event count jump to 3000 (near the end), which is what you would expect: 300 containers send 600 messages per second, or 3000 messages per 5 seconds, which is possible with 5 S2 units that deliver 120 messages/sec/unit.

You really need to consider whether you want to send data every half second or every second. For our ThingTank Air Quality solution, we take measurements every second but aggregate them over a minute at the edge. Sending every minute with 5 S2 units would allow for thousands of devices before you reach the limits of IoT Hub!

Bots in an IoT context

At ThingTank (@thingtankBE), we are constantly looking to expose IoT data in different ways. A chat bot can be a great way to ask for device measurements or even instruct devices to perform actions. In this post, I will describe a bot that gets air quality data for a meeting room with Slack.

I chose to write the bot in Node.js for simplicity reasons and publish it to Azure’s App Service. The basics about writing a bot with Node.js can be found in the documentation of Microsoft’s Bot Framework here: https://docs.botframework.com/en-us/node/builder/overview.

Our bot is really simple for now. After getting the basics up and running, the bot can be enhanced with a natural language interface. What we want to do now:

  • Set the room name and save it in the session (UserData)
  • Change the room name and save it in the session
  • Simple help: list the commands you can use
  • Get air quality measurements (a subset)

To achieve the above, you use dialogs, intents and some simple regular expressions. Check out the source code to see how it is done (remember, this is a basic script to get it working at a minimum). The basic logic is as follows:

  • If the intent is unknown, check if the room name is set. If not, switch to the /roomName dialog that asks for the room name and stores it in session.UserData
  • if the intent matches commands, respond with a list of commands
  • if the intent matches change room, switch to the /roomName dialog that asks for the room name
  • if the intent matches air quality, get the measurements for the selected room using the getRoom function in an external module airq.js. Our real-time air quality data comes from a pubsub channel and the getRoom function just retrieves it from there

Writing an intent is very simple. The change room intent for instance:

intents.matches(/^change room/i, [
  function (session) {
    session.send("Ok, let's change the room name…");
    session.beginDialog('/roomName');
  },
  function (session, results) {
    session.send('Changed room to %s', session.userData.roomName);
  }
]);

If you look at the source code, you will see we use the ChatConnector. When you are writing your bot in the beginning, I recommend you use the ConsoleConnector instead. You can then simply run your bot with node and interact with it from the command line. In our case, we use the ChatConnector, so you should use the Bot Framework Channel Emulator from here to interact with and test your bot.


To get the emulator working, you need to obtain an App Id and App Password from Microsoft and make sure you use those in both your bot source code and the emulator. In the source code, these two values come from environment variables. Note that for local testing, you can leave these values blank.

Now it’s time to publish the bot on the web so we can register it with Microsoft and then enable it on Slack. To publish the bot, use the instructions here. You will use the Azure CLI and git to make this work so be sure to install both on your machine. After the bot is installed and running on App Service, set the environment variables for App Id and App Password in the website properties. Next, you can test your bot using the Channel Emulator.

Important: when you test your bot in the cloud using the Channel Emulator, be sure to use ngrok as specified here: https://docs.botframework.com/en-us/tools/bot-framework-emulator/#using-the-emulator-with-ngrok-to-interact-with-your-bot-in-the-cloud.

Now that we have the bot running, it’s time to register it with Microsoft at https://dev.botframework.com/bots/new. As part of the registration process, you need to supply the URL to your bot in the cloud and obtain a new App Id and App Password. Update the website settings with these new values. After registration, you get:


From the above page, you can test your bot and add other channels. One of those channels is Slack. When you add Slack as a channel, you will be guided to create an app in Slack, authenticate, and of course, create a Slack bot. In Slack, you will get something like:


To summarize:

  • Creating a simple bot with the Bot Framework is easy; the fun starts when you want to enable things like natural language processing
  • When you deploy your bot to the cloud and want to test it with the Channel Emulator, use ngrok
  • When you want to deploy the bot to Slack, register the bot with Microsoft and simply add Slack as a channel

Azure Automation and PowerApps

One of our applications in our “test playground” is running some code in an Azure WebApp that needs to be restarted once in a while. Rather than trying to fix the underlying problem (no fun in that, right?), I decided to create a small mobile app to restart the WebApp when needed. To make it a bit more fun, I used the following “code-less” solutions to make it work:

  • Azure Automation: Graphical Runbook to restart the WebApp; use a Webhook to call the Runbook using a simple HTTP POST
  • Microsoft Flow: calls the Azure Automation Webhook when a control is selected in a PowerApp
  • PowerApp: simple app with a button that calls the above Flow

Azure Automation

I created an Azure Automation account with the option to create a service principal. This results in a service principal that is added as a Contributor to the subscription in which the Azure Automation account was created. This also means that a runbook that uses this account is allowed to restart a WebApp in the same subscription. In my case, the Automation Account and the WebApp are in the same subscription.

Now, before you can use the Restart-AzureRMWebApp cmdlet, you need to add the AzureRM.Websites module to the Automation Account. To do so, navigate to https://www.powershellgallery.com/packages/AzureRM.Websites/1.1.2 and use the Deploy to Azure Automation button. Follow the instructions to add the module to an existing Azure Automation account. When you are finished, click Assets in the Automation Account’s main pane and then click Modules. You should see the following:


Now you can duplicate the AzureAutomationTutorial graphical runbook. In Runbooks, click that Runbook and use the Export option to export the definition to a local file on your computer. Now add a new Runbook and use the Import an existing runbook option together with the export file you just created. Your copied Runbook will look like the one below:


You can remove everything after Login to Azure (that’s the login with the Service Principal that has Contributor rights). Just add the Restart-AzureRMWebApp cmdlet like so:


The Restart-AzureRmWebApp only needs two parameters: the name of the WebApp and the resource group of the WebApp. To be able to call the Runbook using HTTP POST, create a Webhook for it. In the properties of the Runbook, click Webhooks and then add a Webhook. Note that there is no authentication for these Webhooks. It’s just a long, unique URL with an expiration date that you set. Make sure you copy the URL before you save the Webhook because it will not be shown later. I created a RunFromPowerApps webhook like so:


You can try the Webhook with Postman (https://www.getpostman.com/) or curl and see if a job gets started.
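If you would rather script that test, a few lines of Go (or any language that can do an HTTP POST) will do. The webhook URL below is a placeholder for the one you copied when creating the webhook:

package main

import (
    "fmt"
    "io"
    "net/http"
    "strings"
)

func main() {
    // Paste the webhook URL you copied when you created the webhook.
    webhook := "<your-webhook-url>"

    // The runbook does not need a body; you could pass runbook parameters as JSON here.
    resp, err := http.Post(webhook, "application/json", strings.NewReader("{}"))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body)) // inspect the response to see if a job was started
}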

Microsoft Flow

Well, this could not be simpler. Go to https://flow.microsoft.com and login with your credentials (the same credentials for PowerApps, in my case they are Azure AD organization credentials). From My flows create a new flow that looks like this:


In the URI, enter the Webhook address from Azure Automation. Save the flow. We will now use this flow in PowerApps.

PowerApps

To create a PowerApp, install the Windows PowerApps application (a Windows Store app) and log on with the same credentials you used with Flow. I created a blank app with a simple button, nothing fancy. With the button selected, click Flows from the Action menu. You should see the flow you created. Just select it to link it to the button selection. You should see something like:


Note that it is possible to pass data to the flow as parameters to the Run() command. You could for instance create a list of WebApps to restart and pass the WebApp to be restarted to the Flow and the Webhook.

Test the PowerApp with the play button in the menu bar. When you click Restart, check that the Automation Job fired properly:


Now you can run the PowerApp on your iOS or Android device with the PowerApp app for those platforms. Enjoy!

This simple example shows that a lot can be accomplished with tools like Azure Automation, Flow and PowerApps for prototyping or even actual applications with a quick time to value.

Azure Resource Manager REST API from Node

In our video about Fault Domains in Azure IaaSv2, we mentioned Azure Resource Manager and the use of templates to deploy IaaSv2 resources such as virtual machines in a fault domain, a load balancer, a public IP address and more. Azure Resource Manager also has a REST API that can be used from any language. This post discusses the use of the REST API from node.js, including obtaining a token from Azure Active Directory using adal-node.

Before obtaining the token, you need to decide which account to use. In this case I created a service principal in Azure AD to be used as a service account. The process to create a service principal is well documented here and here. I created the service principal using the procedure in the first link, by creating a dummy application in Azure Active Directory. When you create such a dummy application, you obtain two of the several pieces of information you need to get the token:

  • The client ID (a GUID) which basically serves as the user name
  • A generated key with a validity of 1 or 2 years that serves as the password

Other information you will need is the tenant ID (also a GUID), which is used to construct the authorization URL. To actually obtain the token using ADAL (Active Directory Authentication Library) for node.js with adal-node, take a look at adal-node on npm, in the server-to-server with client credentials sample. There are some issues with that sample code, so I modified it as follows:

var adal = require('adal-node');
var AuthenticationContext = adal.AuthenticationContext;

var tenantID = "TenantIDGUID";   // your Azure AD tenant ID
var clientID = "ClientIDGUID";   // client ID (acts as the user name)
var resource = "https://management.azure.com/";
var authURL = "https://login.windows.net/" + tenantID;
var secret = "ClientSecret";     // the generated key (acts as the password)

var context = new AuthenticationContext(authURL);
context.acquireTokenWithClientCredentials(resource, clientID, secret, function (err, tokenResponse) {
  // on success, tokenResponse contains the access token (see below)
});

Some things to note: once you obtain the token, you get a tokenResponse object in the callback function. The tokenResponse contains:

{ tokenType: 'Bearer',
expiresIn: 3600,
expiresOn: Fri Jun 05 2015 10:46:48 GMT+0200 (Romance Daylight Time),
resource: 'https://management.azure.com/',
accessToken: 'long, long token',
isMRRT: true,
_clientId: 'clientID',
_authority: 'https://login.windows.net/tenantID' }

So basically, you are getting an OAuth bearer token you can use in a call to a Web API that expects such a token. The Azure Resource Manager REST APIs will be called with this token.

To actually make the API request, I do the following:

  • Use the restler module: see https://www.npmjs.com/package/restler
  • Get the access token from the token response above: the token is obtained with tokenResponse['accessToken']
  • Build the request URL, in this case to list all resources in my subscription. To interactively find out the kind of requests you can make, use Resource Explorer
  • Make the REST call with restler, passing the accessToken

The code looks like this where you replace {yourSubID} with your Azure subscription ID:

var rest = require('restler');

// The access token from the token response is used as an OAuth bearer token.
var authHeader = tokenResponse['accessToken'];
var requestURL = "https://management.azure.com/subscriptions/{yourSubID}/resources?api-version=2015-01-01";

rest.get(requestURL, { accessToken: authHeader }).on('complete', function (result) {
  console.log(result);
});

If you go to https://github.com/gbaeke/armnode you will find the full samples to get you started. Hope this is helpful. Leave a comment if you have further questions.

Fault Domains in Azure IaaSv2

With the availability of IaaSv2 in Microsoft Azure, several new features are available that dramatically change the way resources are deployed and maintained. One profound change is the introduction of three fault domains for IaaSv2 virtual machines as opposed to two fault domains for IaaSv1 virtual machines. In the case of Azure, a fault domain is basically a rack of servers. A power failure at the rack level will impact all servers in the rack or fault domain. To make sure your application can survive a fault domain failure, you will need to spread your application’s components, for instance front-end web servers, across fault domains. The way to do this in Azure is to assign virtual machines to an availability set. Upon deployment but also during service healing, Azure’s fabric controller will spread the virtual machines that belong to the same availability set across the fault domains automatically. As an administrator, you cannot control this assignment.

If you deploy virtual machines in cloud services (IaaSv1 style), the maximum number of fault domains is two, which can present a problem. For instance, when you deploy a majority node set cluster with three nodes across two fault domains, it is entirely possible that the fault domain that hosts two of the three nodes fails. When that happens, the surviving node does not have majority and will go offline as well. For such deployments, three fault domains are a requirement to survive a failure in one fault domain.

Now that you understand what a fault domain is and the requirement for three fault domains, how do you get three fault domains in Azure? Well, you will need to deploy virtual machines using the IaaSv2 model. This model is based on Azure Resource Manager which also enables rich template based deployment of virtual machines, network interfaces, IP addresses, load balancers, web sites and more. Many Microsoft and community templates can be found at http://azure.microsoft.com/en-us/documentation/templates/

To get a feel for how such a deployment works and to check if your resources are spread across three fault domains, take a look at our Cloud Chat video:
