Further improvements to the IoT Hub to TimescaleDB Azure Function

In the post Improving an Azure Function that writes IoT Hub data to TimescaleDB, we added some improvements to an Azure Function that uses the Event Hub trigger to write messages from IoT Hub to TimescaleDB:

  • use of the Event Hub enqueuedTime timestamp instead of NOW() in the INSERT statement (yes, I know, using NOW() did not make sense πŸ˜‰)
  • make the code idempotent to handle duplicates (basically do nothing when a unique constraint is violated)

In general, I prefer to use application time (time at the event publisher) versus the time the message was enqueued. If you don’t have that timestamp, enqueuedTime is the next best thing.

How can we optimize the function even further? Read on about the cardinality setting!

Event Hub trigger cardinality setting

Our JavaScript Azure Function has its settings in function.json. For reference, here is its content:

{
"bindings": [
{
"type": "eventHubTrigger",
"name": "IoTHubMessages",
"direction": "in",
"eventHubName": "hub-pg",
"connection": "EH",
"cardinality": "one",
"consumerGroup": "pg"
}
]
}

Clearly, the function uses the eventHubTrigger for an Event Hub called hub-pg. In connection, EH refers to an Application Setting which contains the connections string to the Event Hub. Yes, I excel at naming stuff! The Event Hub has defined a consumer group called pg that we are using in this function.

The cardinality setting is currently set to “one”, which means that the function can only process one message at a time. As a best practice, you should use a cardinality of “many” in order to process batches of messages. A setting of “many” is the default.

To make the required change, modify function.json and set cardinality to “many”. You will also have to modify the Azure Function to process a batch of messages versus only one:

Processing batches of messages

With cardinality set to many, the IoTHubMessages parameter of the function is now an array. To retrieve the enqueuedTime from the messages, grab it from the enqueuedTimeUtcArray array using the index of the current message. Notice I also switched to JavaScript template literals to make the query a bit more readable.

The number of messages in a batch is controlled by maxBatchSize in host.json. By default, it is set to 64. Another setting,prefetchCount, determines how many messages are retrieved and cached before being sent to your function. When you change maxBatchSize, it is recommended to set prefetchCount to twice the maxBatchSize setting. For instance:

{
"version": "2.0",
"extensions": {
"eventHubs": {
"batchCheckpointFrequency": 1,
"eventProcessorOptions": {
"maxBatchSize": 128,
"prefetchCount": 256
}
}
}
}

It’s great to have these options but how should you set them? As always, the answer is in this book:

Afbeeldingsresultaat voor it depends joke

A great resource to get a feel for what these settings do is this article. It also comes with a Power BI report that allows you to set the parameters to see the results of load tests.

Conclusion

In this post, we used the function.json cardinality setting of “many” to process a batch of messages per function call. By default, Azure Functions will use batches of 64 messages without prefetching. With the host.json settings of maxBatchSize and prefetchCount, that can be changed to better handle your scenario.

Improving an Azure Function that writes IoT Hub data to TimescaleDB

In an earlier post, I used an Azure Function to write data from IoT Hub to a TimescaleDB hypertable on PostgreSQL. Although that function works for demo purposes, there are several issues. Two of those issues will be addressed in this post:

  1. the INSERT INTO statement used the NOW() function instead of the enqueuedTimeUtc field; that field is provided by IoT Hub and represents the time the message was enqueued
  2. the INSERT INTO query does not use upsert functionality; if for some reason you need to process the IoT Hub data again, you will end up with duplicate data; you code should be idempotent

Using enqueuedTimeUtc

Using the time the event was enqueued means we need to retrieve that field from the message that our Azure Function receives. The Azure Function receives outside information via two parameters: context and eventHubMessage. The enqueuedTimeUtc field is retrieved via the context variable: context.bindingData.enqueuedTimeUtc.

In the INSERT INTO statement, we need to use TIMESTAMP ‘UCT time’. In JavaScript, that results in the following:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\' ...

Using upsert functionality

Before adding upsert functionality, add a unique constraint to the hypertable like so (via pgAdmin):

CREATE UNIQUE INDEX on conditions (time, device); 

It needs to be on time and device because the time field on its own is not guaranteed to be unique. Now modify the INSERT INTO statement like so:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\',' + eventHubMessage.temperature + ',' + eventHubMessage.humidity + ') ON CONFLICT DO NOTHING'; 

Notice the ON CONFLICT clause? When any constraint is violated, we do nothing. We do not add or modify data, we leave it all as it was.

The full Azure Function code is below:

Azure Function code with IoT Hub enqueuedTimeUtc and upsert

Conclusion

The above code is a little bit better already. We are not quite there yet but the two changes make sure that the date of the event is correct and independent from when the actual processing is done. By adding the constraint and upsert functionality, we make sure we do not end up with duplicate data when we reprocess data from IoT Hub.

Dashboard your TimescaleDB data with Grafana

In an earlier post, I looked at storing time-series data with TimescaleDB on Azure Database for PostgreSQL. To visualize your data, there are many options as listed here. Because TimescaleDB is built on PostgreSQL, you can use any tool that supports PostgreSQL such as Power BI or Tableau.

Grafana is a bit of a special case because TimescaleDB engineers actually built the data source, which is designed to take advantage of the time-series capabilities. For a detailed overview of the capabilities of the data source, see the Grafana documentation.

Let’s take a look at a simple example to get started. I have a hypertable called conditions with four columns: time, device, temperature, humidity. An IoT Simulator is constantly writing data for five devices: pg-1 to pg-5.

On a multi-tier deployment of Grafana, I added the PostgreSQL data source:

PostgreSQL data source in Grafana

One setting in the data source is particularly noteworthy:

TimescaleDB support in the PostgreSQL datasource

Grafana has the concept of macro’s such as $_timeGroup or $_interval, as noted in the preceding image. The macro is translated to what the underlying data source supports. In this case, with TimescaleDB enabled, the macro results in the use of time_bucket, which is specific for TimescaleDB.

Creating a dashboard

Create a dashboard from the main page:

Creating a new dashboard

You will get a new dashboard with an empty panel:

Click Add Query. You will notice Grafana proposes a query. In this case it is very close because we only have one data source and table:

Grafana proposes the following query

Let’s modify this a bit. In the top right corner, I switched the time interval to last 30 minutes. Because the default query uses WHERE Macro: $_timeFilter, only the last 30 minutes will be shown. That’s another example of a macro. I would like to show the average temperature over 10 second intervals. That is easy to do with a GROUP BY and $_interval. In GROUP BY, click the + and type or select time to use the time field. You will notice the following:

GROUP BY with $_interval

Just click $_interval and select 10s. Now add the humidity column to the SELECT statement:

Adding humidity

When you click the Generated SQL link, you will see the query built by the query builder:

Generated SQL

Notice that the query uses time_bucket. The GROUP BY 1 and ORDER BY 1 just means group and order on the first field which is the time_bucket. If the query builder is not sufficient, you can click Edit SQL and specify your query directly. When you switch back to query builder, your custom SQL statement might be overwritten if the builder does not support it.

When you save your dashboard, you should see something like:

Pretty boring temperature and humidity graphWi

Now, let’s add a few gauges. In the top right row of icons, the first one should be Add panel. Choose the Gauge visualization and set your query:

Temperature Gauge

In Visualization, set Stat to Current:

Stat field on current

When the panel is finished, navigate back to the dashboard and duplicate the gauge. Modify the duplicated gauge to show humidity. Also change the titles. The dashboard now looks like:

Conditions dashboard

Grafana can be configured to auto refresh the dashboard. In the image below, refresh was set to every 5 seconds:

Setting auto refresh

Your dashboard will now update every 5 seconds for a more dynamic experience.

Joins

You can join hypertables with regular tables quite easily. This is one of the advantages of using a relational database such as PostgreSQL for your time-series data. The screenshot below shows a graph of the temperature per device location. The device location is stored in a regular table.

Join between hypertable and regular table: they are all just tables in the end

Here is the full dashboard:

Conclusion

Grafana, in combination with PostgreSQL and TimescaleDB, is a flexible solution for dashboarding your IoT time-series data. We have only scratched the surface here but it’s clear you can be up and running fast! Give it a go and tell me what you think in the comments or via @geertbaeke!

Azure Functions with Consumption Plan on Linux

In a previous post, I talked about saving time-series data to TimescaleDB, which is an extension on top of PostgreSQL. The post used an Azure Function with an Event Hub trigger to save the data in TimescaleDB with a regular INSERT INTO statement.

The Function App used the Windows runtime which gave me networking errors (ECONNRESET) when connecting to PostgreSQL. I often encounter those issues with the Windows runtime. In general, for Node.js, I try to stick to the Linux runtime whenever possible. In this post, we will try the same code but with a Function App that uses the Linux runtime in a Consumption Plan.

Make sure Azure CLI is installed and that you are logged in. First, create a Storage Account:

az storage account create --name gebafuncstore --location westeurope --resource-group funclinux --sku Standard_LRS

Next, create the Function App. It references the storage account you created above:

az functionapp create --resource-group funclinux --name funclinux --os-type Linux --runtime node --consumption-plan-location westeurope --storage-account gebafuncstore

You can also use a script to achieve the same results. For an example, see
https://docs.microsoft.com/en-us/azure/azure-functions/scripts/functions-cli-create-serverless.

Now, in the Function App, set the following Application Settings. These settings will be used in the code we will deploy later.

  • host: hostname of the PostgreSQL server (e.g. servername.postgres.database.azure.com)
  • user: user name (e.g. user@servername)
  • password
  • database: name of the PostgreSQL database
  • EH: connection string to the Event Hub interface of your IoT Hub; if your are unsure how to set this, see this post

You can set the above values from the Azure Portal:

Application Settings of the Function App

The function uses the first four Application Settings in the function code via process.env:

Using Application Settings in JavaScript

The application setting EH is used to reference the Event Hub in function.json:

function.json with Event Hub details such as the connection, cardinality and the consumerGroup

Now let’s get the code from my GitHub repo in the Azure Function. First install Azure Function Core Tools 2.x. Next, create a folder called funcdemo. In that folder, run the following commands:

git clone https://github.com/gbaeke/pgfunc.git
cd pgfunc
npm install
az login
az account show

The npm install command installs the pg module as defined in package.json. The last two commands log you in and show the active subscription. Make sure that subscription contains the Function App you deployed above. Now run the following command:

func init

Answer the questions: we use Node and JavaScript. You should now have a local.settings.json file that sets the FUNCTIONS_WORKER_RUNTIME to node. If you do not have that, the next command will throw an error.

Now issue the following command to package and deploy the function to the Function App we created earlier:

func azure functionapp publish funclinux

This should result in the following feedback:

Feedback from function deployment

You should now see the function in the Function App:

Deployed function

To verify that the function works as expected, I started my IoT Simulator with 100 devices that send data every 5 seconds. I also deleted all the existing data from the TimescaleDB hypertable. The Live Metrics stream shows the results. In this case, the function is running smoothly without connection reset errors. The consumption plan spun up 4 servers:

Live Metrics Stream of IoT Hub to PostgreSQL function

IoT with Azure Database for PostgreSQL and TimescaleDB

In IoT projects, the same question always comes up: “Where do we store our telemetry data?”. As usual, the answer to that question is not straightforward. We have seen all kinds of solutions in the wild:

  • save directly to a relational database (SQL Server, MySQL, …)
  • save to a data lake and/or SQL
  • save to Cosmos DB or similar (e.g. MongoDB)
  • save to Azure Table Storage or similar
  • save to Time Series Insights

Saving the data to a relational database is often tempting. It fits in existing operational practices and it is easy to extract, transform and visualize the data. In practice, I often recommend against this approach except in the simplest of use cases. The reason is clear: these databases are not optimized for fast ingestion of time-series data. Instead, you should use a time-series database which is optimized for fast ingest and efficient processing of time-series data.

TimescaleDB

TimescaleDB is a an open-source time-series databases optimized for fast ingest even when the amount of data stored becomes large. It does not stand on its own, as it runs on PostgreSQL as an extension. Note that you can store time-series in a regular table or as a TimescaleDB hypertable. The graphic below (from this post), shows the difference:

A comparison on Azure PostgreSQL with and without TimescaleDB and observed degradation in insert performance over time.
Test on general purpose compute Gen 5 with 8 vCores, 45GB RAM with Premium Storage

The difference is clear. With a regular table, the insert rate is lowered dramatically when the amount of data becomes large.

The TimescaleDB extension can easily be installed on Azure Database for PostgreSQL. Let’s see how that goes shall we?

Installing TimescaleDB

To create an Azure Database for PostgreSQL instance, I will use the Azure CLI with the db-up extension:

az postgres up -g RESOURCEGROUP -s SERVERNAME -d DBNAME -u USER -p PASSWORD 

The server name you provide should result in a unique URL for your database (e.g. servername.postgres.database.azure.com).

Tip: do not use admin as the user name πŸ‘

When the server has been provisioned, modify the server confguration for the TimescaleDB extension:

az postgres server configuration set --resource-group RESOURCEGROUP ––server-name SERVERNAME --name shared_preload_libraries --value timescaledb

Now you need to actually install the extension. Install pgAdmin and issue the following query:

CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

In pgAdmin, you should see extra schemas for TimescaleDB:

Extra schemas for TimescaleDB

Creating a hypertable

A hypertable uses partitioning to optimize writing and reading time-series data. Creating such a table is straightforward. You start with a regular table:

CREATE TABLE conditions (   
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL );

Next, convert to a hypertable:

SELECT create_hypertable('conditions', 'time');

The above command partitions the data by time, using the values in the time column. By default, the time interval for partitioning is set to 7 days, starting from version 0.11.0 of TimescaleDB. You can override this by setting chunk_time_interval when creating the hypertable. You should make sure that the chunk belonging to the most recent interval can fit into memory. According to best practices, such a chunk should not use more than 25% of main memory.

Now that we have the hypertable, we can write time-series data to it. One advantage of being built on top of a relational database such as PostgreSQL is that you can use standard SQL INSERT INTO statements. For example:

const query = 'insert into conditions(time, device, temperature, humidity) values(NOW(),\'' + eventHubMessage.device + '\',' +         + eventHubMessage.temperature + ',' + eventHubMessage.humidity + ');';

The example above is from an Azure Function we will look at in a moment. In extracts values from a message received via IoT Hub and inserts them into the hypertable via an INSERT INTO query.

Let’s take a look at the Azure Function next.

Azure Function: from IoT Hub to the Hypertable

The Azure Function is kept bare bones to focus on the essentials. Note that you will need to open the console and install the pg module with the following command:

npm install pg

The image below shows the Azure Function (based on this although the article does not use a hypertable and stores the telemetry as JSON).

Bare bones Azure Function to write IoT Hub data to the hypertable

Naturally, the Azure Function above requires an Azure Event Hubs trigger. In this case, event hub cardinality was set to One. More information here. Note that you should NOT use the NOW() function to set the time. It’s only used here for demo purposes. Instead, you should take the timestamp sent by the device or the time the data was queued at the Event Hub!

Naturally, you will also need an IoT Hub where you send your data. In this case, I created a standard IoT Hub and used the IoT Hub Visual Studio Code extension to generate code (1 in image below) to send sample messages. I modified the code somewhat to include the device name (2 in image below):

Visual Studio toolkit used to create the device, generate code and modify the code

Now we can run the code (saved as sender.js) with:

node sender.js

Note: do not forget to first run npm install azure-iot-device

Data is being sent:

Data sent to IoT Hub

Data processed by the Azure Function as viewed in Application Insights Live Metrics stream:

Application Insights Live Metrics stream

With only one device sending data, there isn’t that much to do! In pgAdmin, you should see connections from at least one of the Azure Function hosts that are active:

Connections to PostgreSQL

Note: I encountered some issues with ECONNRESET errors under higher load; take a look at this post which runs the same function on a Linux Consumption Plan

Querying the data

TimescaleDB, with help from PostgreSQL, has a rich query language especially when compared to some other offerings. Yes, I am looking at you Cosmos DB! πŸ˜‰ Below are some examples (based on the documentation at https://docs.timescale.com/v1.2/using-timescaledb/reading-data:

SELECT COUNT(*) FROM conditions   WHERE time > NOW() - interval '1 minute';

The above query simply counts the messages in the last minute. Notice the flexibility in expressing the time which is what we want from time-series databases.

SELECT time_bucket('1 minute', time) AS one_min, device, COUNT(*),     MAX(temperature) AS max_temp, MAX(humidity) AS max_hum FROM conditions   WHERE time > NOW() - interval '10 minutes' GROUP BY one_min, device   ORDER BY one_min DESC, max_temp DESC;

The above query displays the following result:

Conclusion

When dealing with time-series data, it is often beneficial to use a time-series database. They are optimized to ingest time-series data at high speed and greater efficiency than general purpose SQL or NoSQL databases. The fact that TimescaleDB is built on PostgreSQL means that it can take advantage of the flexibility and stability of PostgreSQL. Although there are many other time-series databases, TimescaleDB is easy to use when coupled with PaaS (platform-as-a-service) PostgreSQL offerings such as Azure Database for PostgreSQL.

Building a real-time messaging server in Go

Often, I need a simple real-time server and web interface that shows real-time events. Although there are many options available like socket.io for Node.js or services like Azure SignalR and PubNub, I decided to create a real-time server in Go with a simple web front-end:

The impressive UI of the real-time web front-end

For a real-time server in Go, there are several options. You could use Gorilla WebSocket of which there is an excellent tutorial, and use native WebSockets in the browser. There’s also Glue. However, if you want to use the socket.io client, you can use https://github.com/googollee/go-socket.io. It is an implementation, although not a complete one, of socket.io. For production scenarios, I recommend using socket.io with Node.js because it is heavily used, has more features, better documentation, etc…

With that out of the way, let’s take a look at the code. Some things to note in advance:

  • the code uses the concept of rooms (as in a chat room); clients can join a room and only see messages for that room; you can use that concept to create a “room” for a device and only subscribe to messages for that device
  • the code use the excellent https://github.com/mholt/certmagic to enable https via a Let’s Encrypt certificate (DNS-01 verification)
  • the code uses Redis as the back-end; applications send messages to Redis via a PubSub channel; the real-time Go server checks for messages via a subscription to one or more Redis channels

The code is over at https://github.com/gbaeke/realtime-go.

Server

Let’s start with the imports. Naturally we need Redis support, the actual go-socket.io packages and certmagic. The cloudflare package is needed because my domain baeke.info is managed by CloudFlare. The package gives certmagic the ability to create the verification record that Let’s Encrypt will check before issuing the certificate:

import (
"log"
"net/http"
"os"

"github.com/go-redis/redis"
socketio "github.com/googollee/go-socket.io"
"github.com/mholt/certmagic"
"github.com/xenolf/lego/providers/dns/cloudflare"
)

Next, the code checks if the RTHOST environment variable is set. RTHOST should contain the hostname you request the certificate for (e.g. rt.baeke.info).

Let’s check the block of code that sets up the Redis connection.

// redis connection
client := redis.NewClient(&redis.Options{
Addr: getEnv("REDISHOST", "localhost:6379"),
})

// subscribe to all channels
pubsub := client.PSubscribe("*")
_, err := pubsub.Receive()
if err != nil {
panic(err)
}

// messages received on a Go channel
ch := pubsub.Channel()

First, we create a new Redis client. We either use the address in the REDISHOST environment variable or default to localhost:6379. I will later run this server on Azure Container Instances (ACI) in a multi-container setup that also includes Redis.

With the call to PSubscribe, a pattern subscribe is used to subscribe to all PubSub channels (*). If the subscribe succeeds, a Go channel is setup to actually receive messages on.

Now that the Redis connection is configured, let’s turn to socket.io:

server, err := socketio.NewServer(nil)
if err != nil {
log.Fatal(err)
}

server.On("connection", func(so socketio.Socket) {
log.Printf("New connection from %s ", so.Id())

so.On("channel", func(channel string) {
log.Printf("%s joins channel %s\n", so.Id(), channel)
so.Join(channel)
})

so.On("disconnection", func() {
log.Printf("disconnect from %s\n", so.Id())
})
})

The above code is pretty simple. We create a new socket.io server and subsequently setup event handlers for the following events:

  • connection: code that runs when a web client connects; gives us the socket the client connects on which is further used by the channel and disconnection handler
  • channel: this handler runs when a client sends a message of the chosen type channel; the channel contains the name of the socket.io room to join; this is used by the client to indicate what messages to show (e.g. just for device01); in the browser, the client sends a channel message that contains the text “device01”
  • disconnection: code to run when the client disconnects from the socket

Naturally, something crucial is missing. We need to check Redis for messages in Redis channels and broadcast them to matching socket.io “channels”. This is done in a Go routine that runs concurrently with the main code:

 go func(srv *socketio.Server) {
   for msg := range ch {
      log.Println(msg.Channel, msg.Payload)
      srv.BroadcastTo(msg.Channel, "message", msg.Payload)
   }
 }(server)

The anonymous function accepts a parameter of type socketio.Server. We use the BroadcastTo method of socketio.Server to broadcast messages arriving on the Redis PubSub channels to matching socket.io channels. Note that we send a message of type “message” so the client will have to check for “message” coming in as well. Below is a snippet of client-side code that does that. It adds messages to the messages array defined on the Vue.js app:

socket.on('message', function(msg){
app.messages.push(msg)
}

The rest of the server code basically configures certmagic to request the Let’s Encrypt certificate and sets up the http handlers for the static web client and the socket.io server:

// certificate magic
certmagic.Agreed = true
certmagic.CA = certmagic.LetsEncryptStagingCA

cloudflare, err := cloudflare.NewDNSProvider()
if err != nil {
log.Fatal(err)
}

certmagic.DNSProvider = cloudflare

mux := http.NewServeMux()
mux.Handle("/socket.io/", server)
mux.Handle("/", http.FileServer(http.Dir("./assets")))

certmagic.HTTPS([]string{rthost}, mux)

Let’s try it out! The GitHub repository contains a file called multi.yaml, which deploys both the socket.io server and Redis to Azure Container Instances. The following images are used:

  • gbaeke/realtime-go-le: built with this Dockerfile; the image has a size of merely 14MB
  • redis: the official Redis image

To make it work, you will need to update the environment variables in multi.yaml with the domain name and your CloudFlare credentials. If you do not use CloudFlare, you can use one of the other providers. If you want to use the Let’s Encrypt production CA, you will have to change the code, rebuild the container, store it in your registry and modify multi.yaml accordingly.

In Azure Container Instances, the following is shown:

socket.io and Redis container in ACI

To test the setup, I can send a message with redis-cli, from a console to the realtime-redis container:

Testing with redis-cli in the Redis container

You should be aware that using CertMagic with ephemeral storage is NOT a good idea due to potential Let’s Encrypt rate limiting. You should store the requested certificates in persistent storage like an Azure File Share and mount it at /.local/share/certmagic!

Client

The client is a Vue.js app. It was not created with the Vue cli so it just grabs the Vue.js library from the content delivery network (CDN) and has all logic in a single page. The socket.io library (v1.3.7) is also pulled from the CDN. The socket.io client code is kept at a minimum for demonstration purposes:

 var socket = io();
socket.emit('channel','device01');
socket.on('message', function(msg){
app.messages.push(msg)
})

When the page loads, the client emits a channel message to the server with a payload of device01. As you have seen in the server section, the server reacts to this message by joining this client to a socket.io room, in this case with name device01.

Whenever the client receives a message from the server, it adds the message to the messages array which is bound to a list item (li) with a v-for directive.

Surprisingly easy no? With a few lines of code you have a fully functional real-time messaging solution!

Deploying Azure Cognitive Services Containers with IoT Edge

Introduction

Azure Cognitive Services is a collection of APIs that make your applications smarter. Some of those APIs are listed below:

  • Vision: image classification, face detection (including emotions), OCR
  • Language: text analytics (e.g. key phrase or sentiment analysis), language detection and translation

To use one of the APIs you need to provision it in an Azure subscription. After provisioning, you will get an endpoint and API key. Every time you want to classify an image or detect sentiment in a piece of text, you will need to post an appropriate payload to the cloud endpoint and pass along the API key as well.

What if you want to use these services but you do not want to pass your payload to a cloud endpoint for compliance or latency reasons? In that case, the Cognitive Services containers can be used. In this post, we will take a look at the Text Analytics containers, specifically the one for Sentiment Analysis. Instead of deploying the container manually, we will deploy the container with IoT Edge.

IoT Edge Configuration

To get started, create an IoT Hub. The free tier will do just fine. When the IoT Hub is created, create an IoT Edge device. Next, configure your actual edge device to connect to IoT Hub with the connection string of the device you created in IoT Hub. Microsoft have a great tutorial to do all of the above, using a virtual machine in Azure as the edge device. The tutorial I linked to is the one for an edge device running Linux. When finished, the device should report its status to IoT Hub:

If you want to install IoT Edge on an existing device like a laptop, follow the procedure for Linux x64.

Once you have your edge device up and running, you can use the following command to obtain the status of your edge device: sudo systemctl status iotedge. The result:

Deploy Sentiment Analysis container

With the IoT Edge daemon up and running, we can deploy the Sentiment Analysis container. In IoT Hub, select your IoT Edge device and select Set modules:

In Set Modules you have the ability to configure the modules for this specific device. Modules are always deployed as containers and they do not have to be specifically designed or developed for use with IoT Edge. In the three step wizard, add the Sentiment Analysis container in the first step. Click Add and then select IoT Edge Module. Provide the following settings:

Although the container can freely be pulled from the Image URI, the container needs to be configured with billing info and an API key. In the Billing environment variable, specify the endpoint URL for the API you configured in the cloud. In ApiKey set your API key. Note that the container always needs to be connected to the cloud to verify that you are allowed to use the service. Remember that although your payload is not sent to the cloud, your container usage is. The full container create options are listed below:

{
"Env": [
"Eula=accept",
"Billing=https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.0",
"ApiKey=<yourKEY>"
],
"HostConfig": {
"PortBindings": {
"5000/tcp": [
{
"HostPort": "5000"
}
]
}
}
}

In HostConfig we ask the container runtime (Docker) to map port 5000 of the container to port 5000 of the host. You can specify other create options as well.

On the next page, you can configure routing between IoT Edge modules. Because we do not use actual IoT Edge modules, leave the configuration as shown below:

Now move to the last page in the Set Modules wizard to review the configuration and click Submit.

Give the deployment some time to finish. After a while, check your edge device with the following command: sudo iotedge list. Your TextAnalytics container should be listed. Alternatively, use sudo docker ps to list the Docker containers on your edge device.

Testing the Sentiment Analysis container

If everything went well, you should be able to go to http://localhost:5000/swagger to see the available endpoints. Open Sentiment Analysis to try out a sample:

You can use curl to test as well:

curl -X POST "http://localhost:5000/text/analytics/v2.0/sentiment" -H  "accept: application/json" -H  "Content-Type: application/json-patch+json" -d "{  \"documents\": [    {      \"language\": \"en\",      \"id\": \"1\",      \"text\": \"I really really despise this product!! DO NOT BUY!!\"    }  ]}"

As you can see, the API expects a JSON payload with a documents array. Each document object has three fields: language, id and text. When you run the above command, the result is:

{"documents":[{"id":"1","score":0.0001703798770904541}],"errors":[]}

In this case, the text I really really despise this product!! DO NOT BUY!! clearly results in a very bad score. As you might have guessed, 0 is the absolute worst and 1 is the absolute best.

Just for fun, I created a small Go program to test the API:

The Go program can be found here: https://github.com/gbaeke/sentiment. You can download the executable for Linux with: wget https://github.com/gbaeke/sentiment/releases/download/v0.1/ta. Make ta executable and use ./ta –help for help with the parameters.

Summary

IoT Edge is a great way to deploy containers to edge devices running Linux or Windows. Besides deploying actual IoT Edge modules, you can deploy any container you want. In this post, we deployed a Cognitive Services container that does Sentiment Analysis at the edge.