Querying Postgres with GraphQL

I wanted a quick and easy way to build an API that retrieves the ten latest events from a stream of data sent to a TimescaleDB hypertable. Since such a table can be queried by any means supported by Postgres, I decided to use Postgraphile, which automatically provides a GraphQL server for a database.

If you have Node.js installed, just run the following command:

npm install -g postgraphile

Then run the following command to start the GraphQL server:

postgraphile -c "postgres://USER@SERVER:PASSWORD@SERVER.postgres.database.azure.com/DATABASE?ssl=1" --simple-collections only --enhance-graphiql

Indeed, I am using Azure Database for PostgreSQL. Replace the strings in UPPERCASE with your values. I used simple-collections only to, eh, only use simple collections which makes it, well, simpler. 👏👏👏

Note: the maintainer of Postgraphile provided a link to what simple-collections actually does; take a look there for a more thorough explanation 😉

The result of the above command looks like the screenshot below:

GraphQL Server started

You can now navigate to http://localhost:5000/graphiql to try some GraphQL queries in an interactive environment:

GraphiQL, enhanced with the –enhance-graphiql flag when we started the server

In the Explorer to the left, you can easily click the query together. In this case, that is easy to do since I only want to query a single table an obtain the last ten events for a single device. The resulting query looks like so:

{
allConditionsList(condition: {device: "pg-1"}, orderBy: TIME_DESC, first: 10) {
time
device
temperature
}
}

allConditionsList gets created by the GraphQL server by looking at the tables of the database. Indeed, my database contains a conditions table with time, device, temperature and humidity columns.

To finish off, let’s try to obtain the data with a regular POST method to http://localhost:5000/graphql. This is the command to use:

curl -X POST -H “Content-Type: application/json” -d ‘{“query”:”{\n allConditionsList(condition: {device: \”pg-1\”}, orderBy: TIME_DESC, first: 10) {\n time\n device\n temperature\n }\n}\n”,”variables”:null}’ http://localhost:5000/graphql

Ugly but it works. To be honest, there is some noise in the above command because of the \n escapes. They are the result of me grabbing the body from the network traffic sent by GraphiQL:

Yes, lazy me grabbing the request payload from GraphiQL and not cleaning it up 😉

There is much, much, much more you can do with GraphQL in general and PostGraphile in particular but this was all I needed for now. Hopefully this can help you if you have to throw something together quickly. In a production setting, there is of course much more to think about: hosting the API (preferably in a container), authentication, authorization, performance, etc…

Further improvements to the IoT Hub to TimescaleDB Azure Function

In the post Improving an Azure Function that writes IoT Hub data to TimescaleDB, we added some improvements to an Azure Function that uses the Event Hub trigger to write messages from IoT Hub to TimescaleDB:

  • use of the Event Hub enqueuedTime timestamp instead of NOW() in the INSERT statement (yes, I know, using NOW() did not make sense 😉)
  • make the code idempotent to handle duplicates (basically do nothing when a unique constraint is violated)

In general, I prefer to use application time (time at the event publisher) versus the time the message was enqueued. If you don’t have that timestamp, enqueuedTime is the next best thing.

How can we optimize the function even further? Read on about the cardinality setting!

Event Hub trigger cardinality setting

Our JavaScript Azure Function has its settings in function.json. For reference, here is its content:

{
"bindings": [
{
"type": "eventHubTrigger",
"name": "IoTHubMessages",
"direction": "in",
"eventHubName": "hub-pg",
"connection": "EH",
"cardinality": "one",
"consumerGroup": "pg"
}
]
}

Clearly, the function uses the eventHubTrigger for an Event Hub called hub-pg. In connection, EH refers to an Application Setting which contains the connections string to the Event Hub. Yes, I excel at naming stuff! The Event Hub has defined a consumer group called pg that we are using in this function.

The cardinality setting is currently set to “one”, which means that the function can only process one message at a time. As a best practice, you should use a cardinality of “many” in order to process batches of messages. A setting of “many” is the default.

To make the required change, modify function.json and set cardinality to “many”. You will also have to modify the Azure Function to process a batch of messages versus only one:

Processing batches of messages

With cardinality set to many, the IoTHubMessages parameter of the function is now an array. To retrieve the enqueuedTime from the messages, grab it from the enqueuedTimeUtcArray array using the index of the current message. Notice I also switched to JavaScript template literals to make the query a bit more readable.

The number of messages in a batch is controlled by maxBatchSize in host.json. By default, it is set to 64. Another setting,prefetchCount, determines how many messages are retrieved and cached before being sent to your function. When you change maxBatchSize, it is recommended to set prefetchCount to twice the maxBatchSize setting. For instance:

{
"version": "2.0",
"extensions": {
"eventHubs": {
"batchCheckpointFrequency": 1,
"eventProcessorOptions": {
"maxBatchSize": 128,
"prefetchCount": 256
}
}
}
}

It’s great to have these options but how should you set them? As always, the answer is in this book:

Afbeeldingsresultaat voor it depends joke

A great resource to get a feel for what these settings do is this article. It also comes with a Power BI report that allows you to set the parameters to see the results of load tests.

Conclusion

In this post, we used the function.json cardinality setting of “many” to process a batch of messages per function call. By default, Azure Functions will use batches of 64 messages without prefetching. With the host.json settings of maxBatchSize and prefetchCount, that can be changed to better handle your scenario.