Securing access to and from Azure Functions

I am often asked how to secure access to and from Azure Functions that are not running in an App Service Environment (ASE). An App Service Environment allows you to safeguard your apps in a subnet of your Azure Virtual Network. In a sense, it gives you a private deployment of Azure App Service that you can secure with Azure Firewall, Network Security Groups (NSGs) or Network Virtual Appliances (NVAs).

When you use Azure Functions in a regular App Service Plan or Premium plan, you will need to rely on Virtual Network Service Endpoints and App Service network integration to achieve similar results.

In this post, we will look at an example of an Azure Function, running in a Premium plan, that queries CosmosDB. We will restrict incoming traffic to the Azure Function from a subnet and only allow CosmosDB to be queried by the same Azure Function. Here’s a diagram:

Incoming Traffic

To restrict incoming traffic to the Azure Function, navigate to the Function App in the portal and select Networking in Platform Features. You will see the following screen:

Azure Functions network features

We will configure the inbound restrictions via Configure Access Restrictions. You can configure restrictions for both the Function App itself and the scm site:

From the moment you add rules, a Deny All rule will appear. In the above rules, I allowed my private IP and the default subnet in the virtual network. The second rule configures the service endpoints. When you open the properties of the subnet, you will see:

Service Endpoint of type Microsoft.Web

Great! When you try to access the function from any other location, you will get a 403 error from the Azure Functions front-end. So don’t expect a connection timeout like with regular network security rules.

Outgoing traffic

The example Azure Function uses an HTTP trigger and a Cosmos DB input (cosmos). Documents contain a name property. The query outputs the name found on the first document:

module.exports = async function (context, req, cosmos) {
context.log(cosmos);
context.res = {
body: "hello " + cosmos[0].name
}};

In order to secure access to Cosmos DB, two features were used:

  1. Azure Functions VNet Integration (VNet integration is currently in preview)
  2. Cosmos DB network service endpoints to restrict access to the subnet that provides the Azure Function hosts with an IP address

Configuring the VNet integration is straightforward, especially when compared to the old style of integration which required a VPN tunnel:

App Service (including Azure Functions) VNet integration

As you can see in the above screenshot, you delegate a subnet to the App Service hosts. In my case, that is subnet func-sec:

Subnet delegated to a service (Microsoft.Web/serverFarms)

The bottom of the screenshot shows the subnet is delegated to the Microsoft.Web/serverFarms service. That is the result of the VNet integration.

You can also see the subnet has service endpoints configured for Cosmos DB. That is the result of the Cosmos DB configuration below:

Service endpoint config in Cosmos DB

In Cosmos DB, an existing virtual network was added. I did not enable the Accept connections from within public Azure datacenters option.

When you remove the service endpoint and you run the Azure Function, the following error is thrown:

Unable to proceed with the request. Please check the authorization claims to ensure the required permissions to process the request. ActivityId: 03b2c11f-2b21-44c9-ab44-61b4864539fe, Microsoft.Azure.Documents.Common/2.2.0.0, Windows/10.0.14393 documentdb-netcore-sdk/2.2.0

Does it work from a VM in the default subnet?

If all went well, I should be able to call the Azure Function from the virtual machine in the default subnet. Let’s try with curl:

Yes, itsme!

The name field in the first document is set to itsme so it worked! Great, the function can be called from the default subnet. In case you are wondering about the use of -p in the ssh command: this virtual machine sat behind an Azure Firewall and the VM ssh port was exposed via a DNAT rule over a random port.

From another location, the following error is shown (wrapped around some HTML but this is the main error):

Error 403 - This web app is stopped

Conclusion

With virtual network service endpoints now available for most Azure PaaS (platform as a service) components, you can ensure those services are only accessed from intended locations. In this example, you saw how to secure access to Azure Functions and Cosmos DB. Service endpoints combined with the App Service VNet integration make it straightforward to secure a Function App end-to-end.

Querying Postgres with GraphQL

I wanted a quick and easy way to build an API that retrieves the ten latest events from a stream of data sent to a TimescaleDB hypertable. Since such a table can be queried by any means supported by Postgres, I decided to use Postgraphile, which automatically provides a GraphQL server for a database.

If you have Node.js installed, just run the following command:

npm install -g postgraphile

Then run the following command to start the GraphQL server:

postgraphile -c "postgres://USER@SERVER:PASSWORD@SERVER.postgres.database.azure.com/DATABASE?ssl=1" --simple-collections only --enhance-graphiql

Indeed, I am using Azure Database for PostgreSQL. Replace the strings in UPPERCASE with your values. I used simple-collections only to, eh, only use simple collections which makes it, well, simpler. πŸ‘πŸ‘πŸ‘

Note: the maintainer of Postgraphile provided a link to what simple-collections actually does; take a look there for a more thorough explanation πŸ˜‰

The result of the above command looks like the screenshot below:

GraphQL Server started

You can now navigate to http://localhost:5000/graphiql to try some GraphQL queries in an interactive environment:

GraphiQL, enhanced with the –enhance-graphiql flag when we started the server

In the Explorer to the left, you can easily click the query together. In this case, that is easy to do since I only want to query a single table an obtain the last ten events for a single device. The resulting query looks like so:

{
allConditionsList(condition: {device: "pg-1"}, orderBy: TIME_DESC, first: 10) {
time
device
temperature
}
}

allConditionsList gets created by the GraphQL server by looking at the tables of the database. Indeed, my database contains a conditions table with time, device, temperature and humidity columns.

To finish off, let’s try to obtain the data with a regular POST method to http://localhost:5000/graphql. This is the command to use:

curl -X POST -H “Content-Type: application/json” -d ‘{“query”:”{\n allConditionsList(condition: {device: \”pg-1\”}, orderBy: TIME_DESC, first: 10) {\n time\n device\n temperature\n }\n}\n”,”variables”:null}’ http://localhost:5000/graphql

Ugly but it works. To be honest, there is some noise in the above command because of the \n escapes. They are the result of me grabbing the body from the network traffic sent by GraphiQL:

Yes, lazy me grabbing the request payload from GraphiQL and not cleaning it up πŸ˜‰

There is much, much, much more you can do with GraphQL in general and PostGraphile in particular but this was all I needed for now. Hopefully this can help you if you have to throw something together quickly. In a production setting, there is of course much more to think about: hosting the API (preferably in a container), authentication, authorization, performance, etc…

Further improvements to the IoT Hub to TimescaleDB Azure Function

In the post Improving an Azure Function that writes IoT Hub data to TimescaleDB, we added some improvements to an Azure Function that uses the Event Hub trigger to write messages from IoT Hub to TimescaleDB:

  • use of the Event Hub enqueuedTime timestamp instead of NOW() in the INSERT statement (yes, I know, using NOW() did not make sense πŸ˜‰)
  • make the code idempotent to handle duplicates (basically do nothing when a unique constraint is violated)

In general, I prefer to use application time (time at the event publisher) versus the time the message was enqueued. If you don’t have that timestamp, enqueuedTime is the next best thing.

How can we optimize the function even further? Read on about the cardinality setting!

Event Hub trigger cardinality setting

Our JavaScript Azure Function has its settings in function.json. For reference, here is its content:

{
"bindings": [
{
"type": "eventHubTrigger",
"name": "IoTHubMessages",
"direction": "in",
"eventHubName": "hub-pg",
"connection": "EH",
"cardinality": "one",
"consumerGroup": "pg"
}
]
}

Clearly, the function uses the eventHubTrigger for an Event Hub called hub-pg. In connection, EH refers to an Application Setting which contains the connections string to the Event Hub. Yes, I excel at naming stuff! The Event Hub has defined a consumer group called pg that we are using in this function.

The cardinality setting is currently set to “one”, which means that the function can only process one message at a time. As a best practice, you should use a cardinality of “many” in order to process batches of messages. A setting of “many” is the default.

To make the required change, modify function.json and set cardinality to “many”. You will also have to modify the Azure Function to process a batch of messages versus only one:

Processing batches of messages

With cardinality set to many, the IoTHubMessages parameter of the function is now an array. To retrieve the enqueuedTime from the messages, grab it from the enqueuedTimeUtcArray array using the index of the current message. Notice I also switched to JavaScript template literals to make the query a bit more readable.

The number of messages in a batch is controlled by maxBatchSize in host.json. By default, it is set to 64. Another setting,prefetchCount, determines how many messages are retrieved and cached before being sent to your function. When you change maxBatchSize, it is recommended to set prefetchCount to twice the maxBatchSize setting. For instance:

{
"version": "2.0",
"extensions": {
"eventHubs": {
"batchCheckpointFrequency": 1,
"eventProcessorOptions": {
"maxBatchSize": 128,
"prefetchCount": 256
}
}
}
}

It’s great to have these options but how should you set them? As always, the answer is in this book:

Afbeeldingsresultaat voor it depends joke

A great resource to get a feel for what these settings do is this article. It also comes with a Power BI report that allows you to set the parameters to see the results of load tests.

Conclusion

In this post, we used the function.json cardinality setting of “many” to process a batch of messages per function call. By default, Azure Functions will use batches of 64 messages without prefetching. With the host.json settings of maxBatchSize and prefetchCount, that can be changed to better handle your scenario.

Revisiting Rancher

Several years ago, when we started our first adventures in the wonderful world of IoT, we created an application for visualizing real-time streams of sensor data. The sensor data came from custom-built devices that used 2G for connectivity. IoT networks and protocols such as SigFox, NB-IoT or Lora were not mainstream at that time. We leveraged what were then new and often preview-level Azure services such as IoT Hub, Stream Analytics, etc… The architecture was loosely based on lambda architecture with a hot and cold path and stateful window-based stream processing. Fun stuff!

Kubernetes already existed but had not taken off yet. Managed Kubernetes services such as Azure Kubernetes Service (AKS) weren’t a thing.

The application (end-user UI and management) was loosely based on a micro-services pattern and we decided to run the services as Docker containers. At that time, Karim Vaes, now a Program Manager for Azure Storage, worked at our company and was very enthusiastic about Rancher. , Rancher was still v1 and we decided to use it in combination with their own container orchestration framework called Cattle.

Our experience with Rancher was very positive. It was easy to deploy and run in production. The combination of GitHub, Shippable and the Rancher CLI made it extremely easy to deploy our code. Rancher, including Cattle, was very stable for our needs.

In recent years though, the growth of Kubernetes as a container orchestrator platform has far outpaced the others. Using an alternative orchestrator such as Cattle made less sense. Rancher 2.0 is now built around Kubernetes but maintains the same experience as earlier versions such as simple deployment and flexible configuration and management.

In this post, I will look at deploying Rancher 2.0 and importing an existing AKS cluster. This is a basic scenario but it allows you to get a feel for how it works. Indeed, besides deploying your cluster with Rancher from scratch (even on-premises on VMware), you can import existing Kubernetes clusters including managed clusters from Google, Amazon and Azure.

Installing Rancher

For evaluation purposes, it is best to just run Rancher on a single machine. I deployed an Azure virtual machine with the following properties:

  • Operating system: Ubuntu 16.04 LTS
  • Size: DS2v3 (2 vCPUs, 8GB of RAM)
  • Public IP with open ports 22, 80 and 443
  • DNS name: somename.westeurope.cloudapp.azure.com

In my personal DNS zone on CloudFlare, I created a CNAME record for the above DNS name. Later, when you install Rancher you can use the custom DNS name in combination with Let’s Encrypt support.

On the virtual machine, install Docker. Use the guide here. You can use the convenience script as a quick way to install Docker.

With Docker installed, install Rancher with the following command:

docker run -d --restart=unless-stopped -p 80:80 -p 443:443 \
rancher/rancher:latest --acme-domain your-custom-domain

More details about the single node installation can be found here. Note that Rancher uses etcd as a datastore. With the command above, the data will be in /var/lib/rancher inside the container. This is ok if you are just doing a test drive. In other cases, use external storage and mount it on /var/lib/rancher.

A single-node install is great for test and development. For production, use the HA install. This will actually run Rancher on Kubernetes. Rancher recommends a dedicated cluster in this scenario.

After installation, just connect https://your-custom-domain and provide a password for the default admin user.

Adding a cluster

To get started, I added an existing three-node AKS cluster to Rancher. After you add the cluster and turn on monitoring, you will see the following screen when you navigate to Clusters and select the imported cluster:

Dashboard for a cluster

To demonstrate the functionality, I deployed a 3-node cluster (1.11.9) with RBAC enabled and standard networking. After deployment, open up Azure Cloud shell and get your credentials:

az aks list -o table
az aks get-credentials -n cluster-name -g cluster-resource-group
kubectl cluster-info

The first command lists the clusters in your subscription, including their name and resource group. The second command configures kubectl, the Kubernetes command line admin tool, which is pre-installed in Azure Cloud Shell. To verify you are connected, the last command simply displays cluster information.

Now that the cluster is deployed, let’s try to import it. In Rancher, navigate to GlobalClusters and click Add Cluster:

Add cluster via Import

Click Import, type a name and click Create. You will get a screen with a command to run:

kubectl apply -f https://your-custom-dns/v3/import/somerandomtext.yaml

Back in the Azure Cloud Shell, run the command:

Running the command to prepare the cluster for import

Continue on in Rancher, the cluster will be added (by the components you deployed above):

Cluster appears in the list

Click on the cluster:

Top of the cluster dashboard

To see live metrics, you can click Enable Monitoring. This will install and configure Prometheus and Grafana. You can control several parameters of the deployment such as data retention:

Enabling monitoring

Notice that by default, persistent storage for Grafana and Prometheus is not configured.

Note: with monitoring enabled or not, you will notice the following error in the dashboard:

Controller manager and scheduler unhealthy?

The error is described here. In short, the components are probably healthy. The error is not related to a Rancher issue but an upstream Kubernetes issue.

When the monitoring API is ready, you will see live metrics and Grafana icons. Clicking on the Graphana icon next to Nodes gives you this:

Node monitoring with Prometheus and Grafana

Of course, Azure provides Container Insights for monitoring. The Grafana dashboards are richer though. On the other hand, querying and alerting on logs and metrics from Container Insights is powerful as well. You can of course enable them all and use the best of both worlds.

Conclusion

We briefly looked at Rancher 2.0 and how it can interact with a existing AKS cluster. An existing cluster is easy to add. Once it is added, adding monitoring is “easy peasy lemon squeezy” as my daughter would call it! πŸ˜‰ As with Rancher 1.x, I am again pleasantly surprised at how Rancher is able to make complex matters simpler and more fun to work with. There is much more to explore and do of course. That’s for some follow-up posts!

Improving an Azure Function that writes IoT Hub data to TimescaleDB

In an earlier post, I used an Azure Function to write data from IoT Hub to a TimescaleDB hypertable on PostgreSQL. Although that function works for demo purposes, there are several issues. Two of those issues will be addressed in this post:

  1. the INSERT INTO statement used the NOW() function instead of the enqueuedTimeUtc field; that field is provided by IoT Hub and represents the time the message was enqueued
  2. the INSERT INTO query does not use upsert functionality; if for some reason you need to process the IoT Hub data again, you will end up with duplicate data; you code should be idempotent

Using enqueuedTimeUtc

Using the time the event was enqueued means we need to retrieve that field from the message that our Azure Function receives. The Azure Function receives outside information via two parameters: context and eventHubMessage. The enqueuedTimeUtc field is retrieved via the context variable: context.bindingData.enqueuedTimeUtc.

In the INSERT INTO statement, we need to use TIMESTAMP ‘UCT time’. In JavaScript, that results in the following:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\' ...

Using upsert functionality

Before adding upsert functionality, add a unique constraint to the hypertable like so (via pgAdmin):

CREATE UNIQUE INDEX on conditions (time, device); 

It needs to be on time and device because the time field on its own is not guaranteed to be unique. Now modify the INSERT INTO statement like so:

'insert into conditions(time, device, temperature, humidity) values(TIMESTAMP \'' + context.bindingData.enqueuedTimeUtc + '\',\'' + eventHubMessage.device + '\',' + eventHubMessage.temperature + ',' + eventHubMessage.humidity + ') ON CONFLICT DO NOTHING'; 

Notice the ON CONFLICT clause? When any constraint is violated, we do nothing. We do not add or modify data, we leave it all as it was.

The full Azure Function code is below:

Azure Function code with IoT Hub enqueuedTimeUtc and upsert

Conclusion

The above code is a little bit better already. We are not quite there yet but the two changes make sure that the date of the event is correct and independent from when the actual processing is done. By adding the constraint and upsert functionality, we make sure we do not end up with duplicate data when we reprocess data from IoT Hub.

Dashboard your TimescaleDB data with Grafana

In an earlier post, I looked at storing time-series data with TimescaleDB on Azure Database for PostgreSQL. To visualize your data, there are many options as listed here. Because TimescaleDB is built on PostgreSQL, you can use any tool that supports PostgreSQL such as Power BI or Tableau.

Grafana is a bit of a special case because TimescaleDB engineers actually built the data source, which is designed to take advantage of the time-series capabilities. For a detailed overview of the capabilities of the data source, see the Grafana documentation.

Let’s take a look at a simple example to get started. I have a hypertable called conditions with four columns: time, device, temperature, humidity. An IoT Simulator is constantly writing data for five devices: pg-1 to pg-5.

On a multi-tier deployment of Grafana, I added the PostgreSQL data source:

PostgreSQL data source in Grafana

One setting in the data source is particularly noteworthy:

TimescaleDB support in the PostgreSQL datasource

Grafana has the concept of macro’s such as $_timeGroup or $_interval, as noted in the preceding image. The macro is translated to what the underlying data source supports. In this case, with TimescaleDB enabled, the macro results in the use of time_bucket, which is specific for TimescaleDB.

Creating a dashboard

Create a dashboard from the main page:

Creating a new dashboard

You will get a new dashboard with an empty panel:

Click Add Query. You will notice Grafana proposes a query. In this case it is very close because we only have one data source and table:

Grafana proposes the following query

Let’s modify this a bit. In the top right corner, I switched the time interval to last 30 minutes. Because the default query uses WHERE Macro: $_timeFilter, only the last 30 minutes will be shown. That’s another example of a macro. I would like to show the average temperature over 10 second intervals. That is easy to do with a GROUP BY and $_interval. In GROUP BY, click the + and type or select time to use the time field. You will notice the following:

GROUP BY with $_interval

Just click $_interval and select 10s. Now add the humidity column to the SELECT statement:

Adding humidity

When you click the Generated SQL link, you will see the query built by the query builder:

Generated SQL

Notice that the query uses time_bucket. The GROUP BY 1 and ORDER BY 1 just means group and order on the first field which is the time_bucket. If the query builder is not sufficient, you can click Edit SQL and specify your query directly. When you switch back to query builder, your custom SQL statement might be overwritten if the builder does not support it.

When you save your dashboard, you should see something like:

Pretty boring temperature and humidity graphWi

Now, let’s add a few gauges. In the top right row of icons, the first one should be Add panel. Choose the Gauge visualization and set your query:

Temperature Gauge

In Visualization, set Stat to Current:

Stat field on current

When the panel is finished, navigate back to the dashboard and duplicate the gauge. Modify the duplicated gauge to show humidity. Also change the titles. The dashboard now looks like:

Conditions dashboard

Grafana can be configured to auto refresh the dashboard. In the image below, refresh was set to every 5 seconds:

Setting auto refresh

Your dashboard will now update every 5 seconds for a more dynamic experience.

Joins

You can join hypertables with regular tables quite easily. This is one of the advantages of using a relational database such as PostgreSQL for your time-series data. The screenshot below shows a graph of the temperature per device location. The device location is stored in a regular table.

Join between hypertable and regular table: they are all just tables in the end

Here is the full dashboard:

Conclusion

Grafana, in combination with PostgreSQL and TimescaleDB, is a flexible solution for dashboarding your IoT time-series data. We have only scratched the surface here but it’s clear you can be up and running fast! Give it a go and tell me what you think in the comments or via @geertbaeke!

Multi-Tier Bitnami Grafana Stack on Azure

After seeing some tweets about Bitnami’s multi-tier Grafana Stack, I decided to give it a go. On the page describing the Grafana stack, there are several deployment offerings:

Grafana deployment offerings (Image: from Bitnami website)

I decided to use the multi-tier deployment, which deploys multiple Grafana nodes and a shared Azure Database for MariaDB.

On Azure, the Grafana stack is deployed via an Azure Resource Manager (ARM) template. You can easily find it via the Azure Marketplace:

Grafana multi-tier in Azure Marketplace

From the above page, click Create to start deploying the template. You will get a series of straightforward questions such as the resource group, the Grafana admin password, MariaDB admin password, virtual machine size, etc…

It will take about half an hour to deploy the template. When finished, you will find the following resources in the resource group you chose or created during deployment:

Deployed Grafana resources

Let’s take a look at the deployed resources. The database back-end is Azure Database for MariaDB server. The deployment uses a General Purpose, 2 vCore, 50GB database. The monthly cost is around €130.

The Grafana VMs are Standard D1 v2 virtual machines (can be changed). These two machine cost around €100 per month. By default, these virtual machines have a public IP that allows SSH access on port 22. To logon, use the password or public key you configured during deployment.

To access the Grafana portal, Bitnami used an Azure Application Gateway. They used the Standard tier (not WAF) with the Medium SKU size and three nodes. The monthly cost for this setup is around €140.

The public IP address of the front-end can be found in the list of resources (e.g. in my case, mygrafanaagw-ip). The IP address will have an associated DNS name in the form of
mygrafanaRANDOMTEXT-agw-dns.westeurope.cloudapp.azure.com. Simply connect to that URL to access your Grafana instance:

Grafana instance (after logging on and showing a simple dashboard

Naturally, you will want to access Grafana over SSL. That is something you will need to do yourself. For more information see this link.

It goes without saying that the template only takes care of deployment. Once deployed, you are responsible for the infrastructure! Security, backup, patching etc… is your responsibility!

Note that the template does not allow you to easily select the virtual network to deploy to. By default, the template creates a virtual network with address space 10.0.0.0/16. If you got some ARM templating skills, you can download the template right after validation but before deployment and modify it:

Downloading the template for modification

Conclusion

Setting up a multi-tier Grafana stack with Bitnami is very easy. Note that the cost of this deployment is around €370 per month though. Instead of deploying and managing Grafana yourself, you can also take a look at hosted offerings such as Grafana Cloud or Aiven Grafana.