From MQTT to InfluxDB with Dapr

In a previous post, we looked at using the Dapr InfluxDB component to write data to InfluxDB Cloud. In this post, we will take a look at reading data from an MQTT topic and storing it in InfluxDB. We will use Dapr 0.10, which includes both components.

To get up to speed with Dapr, please read the previous post and make sure you have an InfluxDB instance up and running in the cloud.

If you want to see a video instead:

MQTT to Influx with Dapr

Note that the video sends output to both InfluxDB and Azure SignalR. In addition, the video uses Dapr 0.8 with a custom compiled Dapr because I was still developing and testing the InfluxDB component.

MQTT Server

Although there are cloud-based MQTT servers you can use, let’s mix it up a little and run the MQTT server from Docker. If you have Docker installed, type the following:

docker run -it -p 1883:1883 -p 9001:9001 eclipse-mosquitto

The above command runs Mosquitto and exposes port 1883 on your local machine. You can use a tool such as MQTT Explorer to send data. Install MQTT Explorer on your local machine and run it. Create a connection like in the below screenshot:

MQTT Explorer connection

Now, click Connect to connect to Mosquitto. With MQTT, you send data to topics of your choice. Publish a json message to a topic called test as shown below:

Publish json data to the test topic

You can now click the topic in the list of topics and see its most recent value:

Subscribing to the test topic

Using MQTT with Dapr

You are now ready to read data from an MQTT topic with Dapr. If you have Dapr installed, you can run the following code to read from the test topic and store the data in InfluxDB:

const express = require('express');
const bodyParser = require('body-parser');

const app = express();
app.use(bodyParser.json());

const port = 3000;

// mqtt component will post messages from influx topic here
app.post('/mqtt', (req, res) => {
    console.log("MQTT Binding Trigger");
    console.log(req.body)

    // body is expected to contain room and temperature
    room = req.body.room
    temperature = req.body.temperature

    // room should not contain spaces
    room = room.split(" ").join("_")

    // create message for influx component
    message = {
        "measurement": "stat",
        "tags": `room=${room}`,
        "values": `temperature=${temperature}`
    };
    
    // send the message to influx output binding
    res.send({
        "to": ["influx"],
        "data": message
    });
});

app.listen(port, () => console.log(`Node App listening on port ${port}!`));

In this example, we use Node.js instead of Python to illustrate that Dapr works with any language. You will also need this package.json and run npm install:

{
  "name": "mqttapp",
  "version": "1.0.0",
  "description": "",
  "main": "app.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "body-parser": "^1.18.3",
    "express": "^4.16.4"
  }
}

In the previous post about InfluxDB, we used an output binding. You use an output binding by posting data to a Dapr HTTP URI.

To use an input binding like MQTT, you will need to create an HTTP server. Above, we create an HTTP server with Express, and listen on port 3000 for incoming requests. Later, we will instruct Dapr to listen for messages on an MQTT topic and, when a message arrives, post it to our server. We can then retrieve the message from the request body.

To tell Dapr what to do, we’ll create a components folder in the same folder that holds the Node.js code. Put a file in that folder with the following contents:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: mqtt
spec:
  type: bindings.mqtt
  metadata:
  - name: url
    value: mqtt://localhost:1883
  - name: topic
    value: test

Above, we configure the MQTT component to list to topic test on mqtt://localhost:1883. The name we use (in metadata) is important because that needs to correspond to our HTTP handler (/mqtt).

Like in the previous post, there’s another file that configures the InfluxDB component:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: influx
spec:
  type: bindings.influx
  metadata:
  - name: Url
    value: http://localhost:9999
  - name: Token
    value: ""
  - name: Org
    value: ""
  - name: Bucket
    value: ""

Replace the parameters in the file above with your own.

Saving the MQTT request body to InfluxDB

If you look at the Node.js code, you have probably noticed that we send a response body in the /mqtt handler:

res.send({
        "to": ["influx"],
        "data": message
    });

Dapr is written to accept responses that include a to and a data field in the JSON response. The above response simply tells Dapr to send the message in the data field to the configured influx component.

Does it work?

Let’s run the code with Dapr to see if it works:

dapr run --app-id mqqtinflux --app-port 3000 --components-path=./components node app.js

In dapr run, we also need to specify the port our app uses. Remember that Dapr will post JSON data to our /mqtt handler!

Let’s post some JSON with the expected fields of temperature and room to our MQTT server:

Posting data to the test topic

The Dapr logs show the following:

Logs from the APP (appear alongside the Dapr logs)

In InfluxDB Cloud table view:

Data stored in InfluxDB Cloud (posted some other data points before)

Conclusion

Dapr makes it really easy to retrieve data with input bindings and send that data somewhere else with output bindings. There are many other input and output bindings so make sure you check them out on GitHub!

Further improvements to the IoT Hub to TimescaleDB Azure Function

In the post Improving an Azure Function that writes IoT Hub data to TimescaleDB, we added some improvements to an Azure Function that uses the Event Hub trigger to write messages from IoT Hub to TimescaleDB:

  • use of the Event Hub enqueuedTime timestamp instead of NOW() in the INSERT statement (yes, I know, using NOW() did not make sense 😉)
  • make the code idempotent to handle duplicates (basically do nothing when a unique constraint is violated)

In general, I prefer to use application time (time at the event publisher) versus the time the message was enqueued. If you don’t have that timestamp, enqueuedTime is the next best thing.

How can we optimize the function even further? Read on about the cardinality setting!

Event Hub trigger cardinality setting

Our JavaScript Azure Function has its settings in function.json. For reference, here is its content:

{
"bindings": [
{
"type": "eventHubTrigger",
"name": "IoTHubMessages",
"direction": "in",
"eventHubName": "hub-pg",
"connection": "EH",
"cardinality": "one",
"consumerGroup": "pg"
}
]
}

Clearly, the function uses the eventHubTrigger for an Event Hub called hub-pg. In connection, EH refers to an Application Setting which contains the connections string to the Event Hub. Yes, I excel at naming stuff! The Event Hub has defined a consumer group called pg that we are using in this function.

The cardinality setting is currently set to “one”, which means that the function can only process one message at a time. As a best practice, you should use a cardinality of “many” in order to process batches of messages. A setting of “many” is the default.

To make the required change, modify function.json and set cardinality to “many”. You will also have to modify the Azure Function to process a batch of messages versus only one:

Processing batches of messages

With cardinality set to many, the IoTHubMessages parameter of the function is now an array. To retrieve the enqueuedTime from the messages, grab it from the enqueuedTimeUtcArray array using the index of the current message. Notice I also switched to JavaScript template literals to make the query a bit more readable.

The number of messages in a batch is controlled by maxBatchSize in host.json. By default, it is set to 64. Another setting,prefetchCount, determines how many messages are retrieved and cached before being sent to your function. When you change maxBatchSize, it is recommended to set prefetchCount to twice the maxBatchSize setting. For instance:

{
"version": "2.0",
"extensions": {
"eventHubs": {
"batchCheckpointFrequency": 1,
"eventProcessorOptions": {
"maxBatchSize": 128,
"prefetchCount": 256
}
}
}
}

It’s great to have these options but how should you set them? As always, the answer is in this book:

Afbeeldingsresultaat voor it depends joke

A great resource to get a feel for what these settings do is this article. It also comes with a Power BI report that allows you to set the parameters to see the results of load tests.

Conclusion

In this post, we used the function.json cardinality setting of “many” to process a batch of messages per function call. By default, Azure Functions will use batches of 64 messages without prefetching. With the host.json settings of maxBatchSize and prefetchCount, that can be changed to better handle your scenario.

Improving an Azure Function that writes IoT Hub data to TimescaleDB

In an earlier post, I used an Azure Function to write data from IoT Hub to a TimescaleDB hypertable on PostgreSQL. Although that function works for demo purposes, there are several issues. Two of those issues will be addressed in this post:

  1. the INSERT INTO statement used the NOW() function instead of the enqueuedTimeUtc field; that field is provided by IoT Hub and represents the time the message was enqueued
  2. the INSERT INTO query does not use upsert functionality; if for some reason you need to process the IoT Hub data again, you will end up with duplicate data; you code should be idempotent

Using enqueuedTimeUtc

Using the time the event was enqueued means we need to retrieve that field from the message that our Azure Function receives. The Azure Function receives outside information via two parameters: context and eventHubMessage. The enqueuedTimeUtc field is retrieved via the context variable: context.bindingData.enqueuedTimeUtc.

In the INSERT INTO statement, we need to use TIMESTAMP ‘UCT time’. In JavaScript, that results in the following:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\' ...

Using upsert functionality

Before adding upsert functionality, add a unique constraint to the hypertable like so (via pgAdmin):

CREATE UNIQUE INDEX on conditions (time, device); 

It needs to be on time and device because the time field on its own is not guaranteed to be unique. Now modify the INSERT INTO statement like so:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\',' + eventHubMessage.temperature + ',' + eventHubMessage.humidity + ') ON CONFLICT DO NOTHING'; 

Notice the ON CONFLICT clause? When any constraint is violated, we do nothing. We do not add or modify data, we leave it all as it was.

The full Azure Function code is below:

Azure Function code with IoT Hub enqueuedTimeUtc and upsert

Conclusion

The above code is a little bit better already. We are not quite there yet but the two changes make sure that the date of the event is correct and independent from when the actual processing is done. By adding the constraint and upsert functionality, we make sure we do not end up with duplicate data when we reprocess data from IoT Hub.

Dashboard your TimescaleDB data with Grafana

In an earlier post, I looked at storing time-series data with TimescaleDB on Azure Database for PostgreSQL. To visualize your data, there are many options as listed here. Because TimescaleDB is built on PostgreSQL, you can use any tool that supports PostgreSQL such as Power BI or Tableau.

Grafana is a bit of a special case because TimescaleDB engineers actually built the data source, which is designed to take advantage of the time-series capabilities. For a detailed overview of the capabilities of the data source, see the Grafana documentation.

Let’s take a look at a simple example to get started. I have a hypertable called conditions with four columns: time, device, temperature, humidity. An IoT Simulator is constantly writing data for five devices: pg-1 to pg-5.

On a multi-tier deployment of Grafana, I added the PostgreSQL data source:

PostgreSQL data source in Grafana

One setting in the data source is particularly noteworthy:

TimescaleDB support in the PostgreSQL datasource

Grafana has the concept of macro’s such as $_timeGroup or $_interval, as noted in the preceding image. The macro is translated to what the underlying data source supports. In this case, with TimescaleDB enabled, the macro results in the use of time_bucket, which is specific for TimescaleDB.

Creating a dashboard

Create a dashboard from the main page:

Creating a new dashboard

You will get a new dashboard with an empty panel:

Click Add Query. You will notice Grafana proposes a query. In this case it is very close because we only have one data source and table:

Grafana proposes the following query

Let’s modify this a bit. In the top right corner, I switched the time interval to last 30 minutes. Because the default query uses WHERE Macro: $_timeFilter, only the last 30 minutes will be shown. That’s another example of a macro. I would like to show the average temperature over 10 second intervals. That is easy to do with a GROUP BY and $_interval. In GROUP BY, click the + and type or select time to use the time field. You will notice the following:

GROUP BY with $_interval

Just click $_interval and select 10s. Now add the humidity column to the SELECT statement:

Adding humidity

When you click the Generated SQL link, you will see the query built by the query builder:

Generated SQL

Notice that the query uses time_bucket. The GROUP BY 1 and ORDER BY 1 just means group and order on the first field which is the time_bucket. If the query builder is not sufficient, you can click Edit SQL and specify your query directly. When you switch back to query builder, your custom SQL statement might be overwritten if the builder does not support it.

When you save your dashboard, you should see something like:

Pretty boring temperature and humidity graphWi

Now, let’s add a few gauges. In the top right row of icons, the first one should be Add panel. Choose the Gauge visualization and set your query:

Temperature Gauge

In Visualization, set Stat to Current:

Stat field on current

When the panel is finished, navigate back to the dashboard and duplicate the gauge. Modify the duplicated gauge to show humidity. Also change the titles. The dashboard now looks like:

Conditions dashboard

Grafana can be configured to auto refresh the dashboard. In the image below, refresh was set to every 5 seconds:

Setting auto refresh

Your dashboard will now update every 5 seconds for a more dynamic experience.

Joins

You can join hypertables with regular tables quite easily. This is one of the advantages of using a relational database such as PostgreSQL for your time-series data. The screenshot below shows a graph of the temperature per device location. The device location is stored in a regular table.

Join between hypertable and regular table: they are all just tables in the end

Here is the full dashboard:

Conclusion

Grafana, in combination with PostgreSQL and TimescaleDB, is a flexible solution for dashboarding your IoT time-series data. We have only scratched the surface here but it’s clear you can be up and running fast! Give it a go and tell me what you think in the comments or via @geertbaeke!

Azure Functions with Consumption Plan on Linux

In a previous post, I talked about saving time-series data to TimescaleDB, which is an extension on top of PostgreSQL. The post used an Azure Function with an Event Hub trigger to save the data in TimescaleDB with a regular INSERT INTO statement.

The Function App used the Windows runtime which gave me networking errors (ECONNRESET) when connecting to PostgreSQL. I often encounter those issues with the Windows runtime. In general, for Node.js, I try to stick to the Linux runtime whenever possible. In this post, we will try the same code but with a Function App that uses the Linux runtime in a Consumption Plan.

Make sure Azure CLI is installed and that you are logged in. First, create a Storage Account:

az storage account create --name gebafuncstore --location westeurope --resource-group funclinux --sku Standard_LRS

Next, create the Function App. It references the storage account you created above:

az functionapp create --resource-group funclinux --name funclinux --os-type Linux --runtime node --consumption-plan-location westeurope --storage-account gebafuncstore

You can also use a script to achieve the same results. For an example, see
https://docs.microsoft.com/en-us/azure/azure-functions/scripts/functions-cli-create-serverless.

Now, in the Function App, set the following Application Settings. These settings will be used in the code we will deploy later.

  • host: hostname of the PostgreSQL server (e.g. servername.postgres.database.azure.com)
  • user: user name (e.g. user@servername)
  • password
  • database: name of the PostgreSQL database
  • EH: connection string to the Event Hub interface of your IoT Hub; if your are unsure how to set this, see this post

You can set the above values from the Azure Portal:

Application Settings of the Function App

The function uses the first four Application Settings in the function code via process.env:

Using Application Settings in JavaScript

The application setting EH is used to reference the Event Hub in function.json:

function.json with Event Hub details such as the connection, cardinality and the consumerGroup

Now let’s get the code from my GitHub repo in the Azure Function. First install Azure Function Core Tools 2.x. Next, create a folder called funcdemo. In that folder, run the following commands:

git clone https://github.com/gbaeke/pgfunc.git
cd pgfunc
npm install
az login
az account show

The npm install command installs the pg module as defined in package.json. The last two commands log you in and show the active subscription. Make sure that subscription contains the Function App you deployed above. Now run the following command:

func init

Answer the questions: we use Node and JavaScript. You should now have a local.settings.json file that sets the FUNCTIONS_WORKER_RUNTIME to node. If you do not have that, the next command will throw an error.

Now issue the following command to package and deploy the function to the Function App we created earlier:

func azure functionapp publish funclinux

This should result in the following feedback:

Feedback from function deployment

You should now see the function in the Function App:

Deployed function

To verify that the function works as expected, I started my IoT Simulator with 100 devices that send data every 5 seconds. I also deleted all the existing data from the TimescaleDB hypertable. The Live Metrics stream shows the results. In this case, the function is running smoothly without connection reset errors. The consumption plan spun up 4 servers:

Live Metrics Stream of IoT Hub to PostgreSQL function

IoT with Azure Database for PostgreSQL and TimescaleDB

In IoT projects, the same question always comes up: “Where do we store our telemetry data?”. As usual, the answer to that question is not straightforward. We have seen all kinds of solutions in the wild:

  • save directly to a relational database (SQL Server, MySQL, …)
  • save to a data lake and/or SQL
  • save to Cosmos DB or similar (e.g. MongoDB)
  • save to Azure Table Storage or similar
  • save to Time Series Insights

Saving the data to a relational database is often tempting. It fits in existing operational practices and it is easy to extract, transform and visualize the data. In practice, I often recommend against this approach except in the simplest of use cases. The reason is clear: these databases are not optimized for fast ingestion of time-series data. Instead, you should use a time-series database which is optimized for fast ingest and efficient processing of time-series data.

TimescaleDB

TimescaleDB is a an open-source time-series databases optimized for fast ingest even when the amount of data stored becomes large. It does not stand on its own, as it runs on PostgreSQL as an extension. Note that you can store time-series in a regular table or as a TimescaleDB hypertable. The graphic below (from this post), shows the difference:

A comparison on Azure PostgreSQL with and without TimescaleDB and observed degradation in insert performance over time.
Test on general purpose compute Gen 5 with 8 vCores, 45GB RAM with Premium Storage

The difference is clear. With a regular table, the insert rate is lowered dramatically when the amount of data becomes large.

The TimescaleDB extension can easily be installed on Azure Database for PostgreSQL. Let’s see how that goes shall we?

Installing TimescaleDB

To create an Azure Database for PostgreSQL instance, I will use the Azure CLI with the db-up extension:

az postgres up -g RESOURCEGROUP -s SERVERNAME -d DBNAME -u USER -p PASSWORD 

The server name you provide should result in a unique URL for your database (e.g. servername.postgres.database.azure.com).

Tip: do not use admin as the user name 👍

When the server has been provisioned, modify the server confguration for the TimescaleDB extension:

az postgres server configuration set --resource-group RESOURCEGROUP ––server-name SERVERNAME --name shared_preload_libraries --value timescaledb

Now you need to actually install the extension. Install pgAdmin and issue the following query:

CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

In pgAdmin, you should see extra schemas for TimescaleDB:

Extra schemas for TimescaleDB

Creating a hypertable

A hypertable uses partitioning to optimize writing and reading time-series data. Creating such a table is straightforward. You start with a regular table:

CREATE TABLE conditions (   
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL );

Next, convert to a hypertable:

SELECT create_hypertable('conditions', 'time');

The above command partitions the data by time, using the values in the time column. By default, the time interval for partitioning is set to 7 days, starting from version 0.11.0 of TimescaleDB. You can override this by setting chunk_time_interval when creating the hypertable. You should make sure that the chunk belonging to the most recent interval can fit into memory. According to best practices, such a chunk should not use more than 25% of main memory.

Now that we have the hypertable, we can write time-series data to it. One advantage of being built on top of a relational database such as PostgreSQL is that you can use standard SQL INSERT INTO statements. For example:

const query = 'insert into conditions(time, device, temperature, humidity) values(NOW(),\'' + eventHubMessage.device + '\',' +         + eventHubMessage.temperature + ',' + eventHubMessage.humidity + ');';

The example above is from an Azure Function we will look at in a moment. In extracts values from a message received via IoT Hub and inserts them into the hypertable via an INSERT INTO query.

Let’s take a look at the Azure Function next.

Azure Function: from IoT Hub to the Hypertable

The Azure Function is kept bare bones to focus on the essentials. Note that you will need to open the console and install the pg module with the following command:

npm install pg

The image below shows the Azure Function (based on this although the article does not use a hypertable and stores the telemetry as JSON).

Bare bones Azure Function to write IoT Hub data to the hypertable

Naturally, the Azure Function above requires an Azure Event Hubs trigger. In this case, event hub cardinality was set to One. More information here. Note that you should NOT use the NOW() function to set the time. It’s only used here for demo purposes. Instead, you should take the timestamp sent by the device or the time the data was queued at the Event Hub!

Naturally, you will also need an IoT Hub where you send your data. In this case, I created a standard IoT Hub and used the IoT Hub Visual Studio Code extension to generate code (1 in image below) to send sample messages. I modified the code somewhat to include the device name (2 in image below):

Visual Studio toolkit used to create the device, generate code and modify the code

Now we can run the code (saved as sender.js) with:

node sender.js

Note: do not forget to first run npm install azure-iot-device

Data is being sent:

Data sent to IoT Hub

Data processed by the Azure Function as viewed in Application Insights Live Metrics stream:

Application Insights Live Metrics stream

With only one device sending data, there isn’t that much to do! In pgAdmin, you should see connections from at least one of the Azure Function hosts that are active:

Connections to PostgreSQL

Note: I encountered some issues with ECONNRESET errors under higher load; take a look at this post which runs the same function on a Linux Consumption Plan

Querying the data

TimescaleDB, with help from PostgreSQL, has a rich query language especially when compared to some other offerings. Yes, I am looking at you Cosmos DB! 😉 Below are some examples (based on the documentation at https://docs.timescale.com/v1.2/using-timescaledb/reading-data:

SELECT COUNT(*) FROM conditions   WHERE time > NOW() - interval '1 minute';

The above query simply counts the messages in the last minute. Notice the flexibility in expressing the time which is what we want from time-series databases.

SELECT time_bucket('1 minute', time) AS one_min, device, COUNT(*),     MAX(temperature) AS max_temp, MAX(humidity) AS max_hum FROM conditions   WHERE time > NOW() - interval '10 minutes' GROUP BY one_min, device   ORDER BY one_min DESC, max_temp DESC;

The above query displays the following result:

Conclusion

When dealing with time-series data, it is often beneficial to use a time-series database. They are optimized to ingest time-series data at high speed and greater efficiency than general purpose SQL or NoSQL databases. The fact that TimescaleDB is built on PostgreSQL means that it can take advantage of the flexibility and stability of PostgreSQL. Although there are many other time-series databases, TimescaleDB is easy to use when coupled with PaaS (platform-as-a-service) PostgreSQL offerings such as Azure Database for PostgreSQL.

Adaptable IoT

On May 24, 2017 I gave a short partner session at Techorama, a technology event in Belgium for both developers and IT Pros. You can find the slides on SlideShare:

Since it was a short session and a short slide deck, this post provides a bit more background information.

First, what do I mean with Adaptable IoT? Basically, an IoT solution should be adaptable at two levels:

  1. The IoT platform: use a platform that can be easily adapted to new conditions such as changed business needs or higher scaling requirements; a platform that allows you to plug in new services
  2. The application you write on the platform: use a flexible architecture that can easily be changed according to changing business needs; and no, that does not mean you have to use microservices

The presentation mainly focuses on the first point, which deals with the platform aspects that should be adaptable end-to-end at the following levels:

  • Devices and edge: devices should not be isolated in the field which means you should provide a two-way communication channel, a way to update firmware and write robust device code as a base requirement
  • Ingestion and management: with most platforms, the service used for ingestion of telemetry also provides management
  • Processing: the platform should be easy to extend with extra processing steps with limited impact on the existing processing pipeline
  • Storage: the platform should provide flexible storage options for both structured and unstructured data
  • Analytics: the platform should provide both descriptive and predictive analytics options that can be used to answer relevant business questions

Before continuing, note that this post focuses on Microsoft Azure with its Azure IoT Suite. The concepts laid out in this post can apply to other platforms as well!

Devices and Edge

There is a lot to say about devices and edge. What we see in the field is that most tend to think that the devices are the easy part. In fact, devices tend to be the most difficult part in an end-to-end IoT solution. Prototyping is easy because you can skip many of the hard parts you encounter in production:

  • Use Arduino or platforms such as particle.io: they are easy to use but do not give you full access to the underlying hardware and speed might be an issue
  • To demonstrate that it works, you can use simple and cheap sensors. But do they work in the long run? What about calibration?
  • You can use any library you find on the net but stability and accuracy might be an issue in production and even in the prototyping phase!
  • You can store secrets to connect to your back-end application directly in the sketch. In production however, you will need to store them securely.
  • Using TLS for secure connections is easy, provided the hardware and libraries support it. But what about certificate checks and expiry of root and leaf certificates?
  • You can just use WiFi because it is easy and convenient.

When you move to production and you want to create truly adaptable devices, you will need to think about several things:

  • Drop Arduino and move to C/C++ directly on the metal; heck, maybe you even have to throw in some assembler depending on the use case (though I hope not!); your focus should be on stability, speed and power usage.
  • Provide two-way communications so that devices can send telemetry and status messages to the back-end and the back-end can send messages back.
  • Make sure you can send messages to groups of devices (e.g. based on some query)
  • Provide a firmware update mechanism. Easier said than done!
  • Make sure the device is secure. Store secrets in a crypto chip.
  • Use stable and supported libraries such as the Azure IoT device SDK for C

Take into account that many devices will not be able to connect to your back-end directly, requiring a gateway at the edge. The edge should be adaptable as well, with options to do edge processing beyond merely relaying messages. What are some of those additional edge features?

  • Inference based on a machine learning algorithm trained in the cloud (e.g. anomaly detection)
  • Aggregation of data (e.g. stream processing with windowing)
  • Launch compute tasks based on conditions (e.g. launch an Azure Function when an anomaly is detected)

Ideally, the edge components are developed and tested in the cloud and then exported to the edge. Azure IoT Edge provides that functionality and uses containers to encapsulate the functionality described above.

Ingestion and management

The central service in the Azure IoT Suite for ingestion and management is Azure IoT Hub. It is highly scalable and makes your IoT solution adaptable by providing configuration and reporting mechanisms for devices. The figure below illustrates what is possible:

iothub

Device Twin functionality provides you with several options to make the solution adaptable and highly configurable:

  • From the back-end, you set desired properties that your devices can pick up. For instance, set a reporting interval to instruct the device to send telemetry more often
  • From the device, you send reported properties like battery status or available memory so you can act accordingly (e.g. send the user an alert to charge the device)
  • From the back-end, set tags to group devices (e.g. set the device location such as building, floor, room, etc…)

In a previous post, I already talked about setting desired properties with Device Twins and that today, you need to use the MQTT protocol to make this work. You can use the MQTT protocol directly or as part of one of the Azure Device SDKs where the protocol can simply be set as configuration.

The concept of jobs makes the solution even more adaptable since desired properties can be set on a group of devices using a query. By creating a query like ‘all devices where tag.building=buildingX’, you can set a desired property like the reporting interval on hundreds of devices at once.

Processing

The selected cloud platform should allow you to create an adaptable processing pipeline. With IoT Hub, the telemetry is made available to downstream components with a multi-consumer queue. An example is shown below:

processing

It is relatively easy to plug in new downstream components or modiy components. As an example, Microsoft recently made Time Series Insights available that uses an IoT Hub or an Event Hub as input. In a recent blogpost, I already described that service. Even if you already have an existing pipeline, it is simple to plug in Time Series Insights and to start using it to analyze your data.

IoT Hub Device Twin and MQTT

When you connect to IoT Hub with MQTT directly, you need to connect with a ClientId, username and password. Those three values need to be set according to Azure IoT Hub specificiations:

  • ClientId: use the IoT Hub deviceId
  • Username: use {iothubhostname}/{deviceId}/api-version=2016-11-14
  • Password: use a SAS token

When you connect with MQTT, you will notice it also works if you just use {iothubhostname}/{device_id}. You will be able to send telemetry to the devices/{deviceId}/messages/events/ topic and receive cloud-to-device messages by subscribing to the devices/{deviceId}/messages/devicebound/# topic.

With MQTT, you can also update a reported property in the Device Twin. You should do that as follows:

  • Subscribe to $iothub/twin/res/# to receive a message after you report a property; the message will indicate success or failure like a 204 status when a property is updated
  • Send a message to topic $iothub/twin/PATCH/properties/reported/?$rid={rid} with the properties in the Json payload; {rid} is a value you set to match it up with the message you get back

If I want to set a property called freeRam, I would send the following message to topic $iothub/twin/PATCH/properties/reported/?$rid={rid}:

{ “freeRam”: 27364 }

Although this is easy enough, do not make the same mistake as I did: include the api-version=2016-11-14 in the MQTT username. If you don’t, IoT Hub will disconnect your client because Device Twins are only supported in recent incarnations of IoT Hub. Took me a few hours to troubleshoot… Winking smile

You can test all this from a client such as MQTT.fx. Install that client and in the settings, add a new connection profile. In the profile, specify the IoT Hub hostname in broker address, set the port to 8883 and set the client to a device Id that exists in your IoT Hub. Also set the MQTT version to 3.1.1 specifically. In User Credentials, specify the username and password and do not forget the api version. In SSL/TLS, enable SSL/TLS. Note: use Device Explorer to create a SAS token for your device from the Management tab.

Next, subscribe to $iothub/twin/res/#:

image

 

Then, send a freeRam property to the device like so (on topic $iothub/twin/PATCH/properties/reported/?$rid={rid} where you set {rid} to any value):

image

 Note: to delete a property, send the null value

In Subscribe, you will get the result of the PATCH operation which mentions the {rid} you specified and also reports the version which indicates the amount of times the property was changed. Also notice the status of 204 which means the property was updated.

image

 

By the way, if you want to retrieve the twin properties, just send an empty message to $iothub/twin/GET/?$rid={rid}. The result will be the desired and reported properties of the Device Twin in Json:

image

 

In the Azure Portal:

image

Hope this helps when trying to work with Device Twins from a device with MQTT directly (and not the IoT Hub Device SDKs)!

IoT Hub and Azure Time Series Insights

Azure Time Series Insights is a new service that makes it very easy to store and visualize time series data. In this blog post, we will create a dashboard that looks like the one below (click to enlarge):

image

The dashboard has four sections:

  • Query1: a heat map of events per device; in this case there are 20 devices sending data every 2 seconds
  • Query2: a line graph with random “temperature” data
  • Query3: a line graph with both “temperature” and “humidity” data
  • Query4: a line graph with “humidity” data

The events are sent to an IoT Hub using the following JSON shape: {temperature: x, humidity: y} where x and y are randomized floating point numbers, generated by an IoT device simulator.

Step 1: Create IoT Hub

Install Azure CLI 2.0, and then use az login to login. Use az account list to list your subscriptions and use az account set –subscription name_or_id to set the default subscription. Next, issue the following commands to create a resource group and an IoT Hub (set location to your preference):

az group create --name resource_group_name --location westeurope
az iot hub create --sku F1 --name iot_hub_name --resource-group resource_group_name

As a best practice, create a separate consumer group on the Events endpoint. In the Azure Portal, in the properties of the IoT Hub, click Endpoints. Then click Events and add a consumer group underneath $Default. Click Save.

Record the Connection String – primary key setting of the device or  iothubowner Shared access policy. Click Shared Access Policies, and device to find this connection string. It will be in the form of:

HostName=iot_hub_name.azure-devices.net;SharedAccessKeyName=keyname;SharedAccessKey=b5dARuGPhL6wdgHboUIhEC6LlcFalIjfEdh4aXYa1WI=

You will need this connection string later to configure the IoT Simulator.

Step 2: Create Time Series Insights Environment

In the Azure Portal, click the green + and navigate to Internet of Things. Click Time Series Insights and follow the on-screen instructions. You will end up with:

image

I selected one unit of the S1 tier which is more than enough for this example.

Step 3: Set Data Access Policy

Even though you created the Time Series Insights Environment, you still need to grant yourself access to the data. Click Data Access Policies and add your user or group and a role of Contributor.

image

Step 4: Add Event Source

We will add the IoT Hub we created earlier as an event source. Click Event Sources and then click Add. Give the event source a name and set the source to IoT Hub. Then select an IoT Hub from your available subscriptions and do not forget to set the consumer group to the one you created in step 1. If your event data has a timestamp, you can enter the timestamp property name. If you do not specify the timestamp, the event enqueue time set by the IoT Hub will be used.

Note that Azure Time Series Insights also supports Event Hubs as an event source.

Step 5: Configure the IoT simulator

Head over to https://github.com/gbaeke/iot-simulator/releases/tag/v0.3 and download iot-simulator.exe to a folder of your choice. In the same folder add a file called config.json with the following contents:

{
     "Interval":5,
     "IoTHubs":["iot_hub_name.azure-devices.net”],
     "SasTokens":["SharedAccessSignature sr=..."],
     "DevGroups":[
        {"Prefix":"ts","DeviceNum":20,"Firmware":"1.0","IoTHub": 0}
     ]
}

In the SasTokens array, replace SharedAccessSignature sr=… with a Sas token that has the necessary rights to submit events to the IoT Hub. One way of doing so, is with Device Explorer. Once installed, copy the connection string from step 1 in the connection string box and click Generate SAS. Copy the Sas token in the config.json file.

image

With the config.json correctly configured, from a command prompt, start iot-simulator.exe. It will connect to the IoT Hub, create the devices and start sending data every 5 seconds from every device. In the sample config file, you can set the interval in seconds (Interval) and the amount of devices (DeviceNum). To clean up the devices, run iot-simulator.exe –r.

Step 6: Visualize the data

Now go to https://insights.timeseries.azure.com and login with the credentials you used in step 3. You will get a screen to select data. I selected Last 60 Mins from the quick times dropdown and then clicked the search icon:

image

In the following screen, click Heatmap and then configure the box at the left with a descriptive title. Also select a split by deviceid to have an idea about the number of events per time window per device and to spot devices that stopped sending data.

image

Now, at the right top corner, click the circle with the four squares. You end up with:

image

Now click the + in the top, right section. Select a time range again and then, at the left, change the measure from Events to Temperature. Automatically, the temperature will be averaged over the interval size. Change the term (Term 1) to Temperature and click the circle with the four squares again.

The temperature line graph has been added and you can now click the copy icon and create the same visualization for humidity.

image

Now it’s easy to create the other panel with both temperature and humidity. Give it a go and try out other visualizations. When you are finished, you can click the Save icon and save this perspective. Yep, these visualizations are called perspectives!

It’s still early days for the service and many features will be added in the near future. If you are already working with event data coming into an Event Hub and IoT Hub, it should be easy to add a new consumer group and start analyzing the data with this service.

Getting started with Kubernetes on Azure

As you may or may not know, at Xylos we have developed an IoT platform to support sensor networks of any kind. The back-end components are microservices running as containers on Rancher, a powerful and easy to use container orchestration tool. In the meantime, we are constantly evaluating other ways of orchestrating containers and naturally, Azure Container Services is one of the options. Recently, Microsoft added support for Kubernetes so we decided to check that out.

Instead of the default “look, here’s how you deploy an nginx container”, we will walk through an example of an extremely simple microservices application written in Go with the help of go-micro, a microservices toolkit. Now, I have to warn you that I am quite the newbie when it comes to Go and go-micro. If you have remarks about the code, just let me know. This post will not explain the Go services however, so let’s focus on deploying a Kubernetes cluster first and deploying the finished containers. Subsequent posts will talk about the services in more detail.

With the help of Azure CLI 2.0, deploying Kubernetes could not be simpler. You will find full details about installation on https://docs.microsoft.com/en-us/cli/azure/install-azure-cli. The CLI runs on Windows, Linux and macOS. For this post, I used macOS. If you are a bit unsure about how the Azure CLI works, check out this post: https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli.

After installation, use az login to authenticate and az account to set the default subscription. After that you are all set to deploy Kubernetes. First, create a resource group for the cluster:

az group create --name=rgname --location=westeurope

After the above command (use any name as resource group), use the following command to create a Kubernetes cluster with only one master and two agents and use a small virtual machine size. We do this to keep costs down while testing.

az acs create --orchestrator-type=kubernetes --resource-group=rgname --name=clustername --generate-ssh-keys --agent-count=2 --master-count=1 --agent-vm-size=Standard_A1_v2

Tip: to know the other virtual machine sizes in a region (like westeurope) use az vm list-sizes --location=westeurope

Note that in the az acs command, we auto-generate SSH keys. These are used to interact with the cluster and you can of course create your own. When you use generate-ssh-keys, you will find them in your home folder in the .ssh folder (id_rsa and id_rsa.pub files).

Now you need a way to administer the Kubernetes cluster. You do that with the kubectl command-line tool. Get kubectl with the following command:

az acs kubernetes install-cli

The kubectl tool needs a configuration file that instructs the tool where to connect and the credentials to use. Just use the following command to get this configured:

az acs kubernetes get-credentials --resource-group=rgname --name=clustername

Running the above command creates a config file in the .kube folder of your home folder. In the config file, you will see a https location that kubectl connects to, in addition to user information such as a user name and certificates.

Now, as a test, lets deploy a part of the microservices application that exposes a REST API endpoint to the outside world (I call it the data API). To do so, do the following:

kubectl create -f https://raw.githubusercontent.com/gbaeke/go-data/master/go-data-dep.yaml

The above command creates a deployment from a configuration file that makes sure that there are two containers running that use the image gbaeke/go-data. Each container runs in its own pod. You can check this like so:

kubectl get pods

You will see something like:

image-2

Run kubectl get deployment to see the deployment. Use kubectl describe deployment dataapito obtain more details about the deployment.

You will not be able to access this API from the outside world. To do this, let’s create a service of type LoadBalancer which will also configure an Azure load balancer automatically (could have been done from the YAML file as well):

kubectl expose deployments dataapi --port=8080 --type=LoadBalancer

You can check the service with kubectl get service. After a while and by running the last command again, the external IP will appear. You should now be able to hit the service with curl like so:

curl http://IP_of_service:8080/data/device1

No matter what device id you type at the end, you will always get Device active: false because the device API has not been deployed yet. How the data API talks to the device API and how they use service registration in Kubernetes will be discussed in another post.

Tip: for those that cannot wait, just run kubectl create -f https://raw.githubusercontent.com/gbaeke/go-device/master/go-device-dep.yaml and then use curl again with device1 at the end (should return true). The above command deploys the device API so that the data API can find and use it to check if a device exists.

%d bloggers like this: