Azure SQL Database High Availability

Creating a SQL Database in Azure is a simple matter: create the database and server, get your connection string and off you go! Before starting though, spend some time thinking about the level of high availability (HA) that you want:

  • What is the required level of HA within the deployment region (e.g. West Europe)?
  • Do you require failover to another region (e.g. from West Europe to North Europe)?

HA in a single region

To achieve the highest level of availability in a region, do the following:

  • Use the Premium (DTU) or Business Critical tier (vCore): Azure will use the premium availability model for your database
  • Enable Availability Zone support if the region supports it: copies of your database will be spread over the zones

The diagram below illustrates the premium availability model (from the Microsoft docs):

Premium Availability Model

The region will contain one primary read/write replica and several secondary replicas. Each replica uses local SSD storage. The database is replicated synchronously and failover is swift and without data loss. If required, you can enable a read replica and specify you want to connect to the read replica by adding ApplicationIntent=ReadOnly to the connection string. Great for reporting and BI!

Spreading the databases over multiple zones is as simple as checking a box. Availability zone support comes at no extra cost but can increase the write commit latency because the nodes are a few kilometers apart. The option to enable zone support is in the Configure section as shown below:

Enabling zone redundancy

To read more about high availability, including the standard availability model for other tiers, check out the docs.

For critical applications, we typically select the Premium/Business Critical model as it provides good performance coupled to the highest possible availability in a region.

Geo-replication

The geo-replication feature replicates a database asynchronously to another region. Suppose you have a database and server in West Europe that you want to replicate to France Central. In the portal, navigate to the database (not the server) and select Geo-Replication. Then check the region, in this case France Central. The following questions show up:

Geo-Replication

A database needs a logical server object that contains the database. To replicate the database to France Central, you need such a server object in that region. The UI above allows you to create that server.

Note that the databases need to use the same tier although the secondary can be configured with less DTUs or vCores. Doing so is generally not recommended.

After configuration, the UI will show the active replication. In this case, I am showing replication from North Europe to West Europe (and not France Central):

Geo-replication is easy to configure but in practice, we recommend to use Failover Groups. A Failover Group uses geo-replication under the hood but gives you some additional features such as:

  • Automated failover (vs manual with just geo-replication)
  • Two servers to use in your connection string that use CNAMEs that are updated in case of failover; one CNAME always points to the read/write replica, the other to the read replica

Failover groups are created at the server level instead of the database level:

Failover group

Below, there is a failover group aks-fo with the primary server in North Europe and the secondary in West Europe:

Failover group details

You can manually fail over the database if needed:

Failover and forced failover

Failover allows you to failover the database without data loss if both databases are still active. Forced failover performs a failover even if the primary is down, which might lead to data loss.

Note: when you configure a Failover Group, geo-replication will automatically be configured for you.

Connecting from an application

To connect to a database configured in a failover group, first get the failover group server names:

Read/write and read-only listener endpoints

Next, in your application, use the appropriate connection string. For instance, in Go:

var sqldb *sql.DB
var server = "aks-fo.database.windows.net"
var port = 1433
var user = "USERNAME"
var password = "PASSWORD"
var database = "DBNAME"

func init() {
    // Build connection string
    connString := fmt.Sprintf("server=%s;user id=%s;password=%s;port=%d;database=%s;",
        server, user, password, port, database)

    var err error

    // Create connection pool
    sqldb, err = sql.Open("sqlserver", connString)
    if err != nil {
        log.Fatal("Error creating connection pool: ", err.Error())
    }
    ctx := context.Background()

    //above commands actually do not connect to SQL but the ping below does
    err = sqldb.PingContext(ctx)
    if err != nil {
        log.Fatal(err.Error())
    }
    log.Printf("Connected!\n")
}

During a failover, there will be an amount of time that the database is not available. When that happens, the connection will fail. The error is shown below:

[db customers]: invalid response code 500, body: {"name":"fault","id":"7JPhcikZ","message":"Login error: mssql: Database 'aksdb' on server 'akssrv-ne' is not currently available.  Please retry the connection later.  If the problem persists, contact customer support, and provide them the session tracing ID of '6D7D70C3-D550-4A74-A69C-D689E6F5CDA6'.","temporary":false,"timeout":false,"fault":true}

Note: the Failover Group uses a CNAME record aks-fo.database.windows.net which resolves to the backend servers in either West or North Europe. Make sure you allow connections to these servers in the firewall or you will get the following error:

db customers]: invalid response code 500, body: {"name":"fault","id":"-p9TwZkm","message":"Login error: mssql: Cannot open server 'akssrv-ne' requested by the login. Client with IP address 'IP ADDRESS' is not allowed to access the server.  To enable access, use the Windows Azure Management Portal or run sp_set_firewall_rule on the master database to create a firewall rule for this IP address or address range.  It may take up to five minutes for this change to take effect.","temporary":false,"timeout":false,"fault":true} 

Conclusion

For the highest level of availability, use the regional premium availability model with Availability Zone support. In addition, use a Failover Group to enable replication of the database to another region. A Failover Group automatically connects your application to the primary (read/write) or secondary replica (read) via a CNAME and can failover automatically after some grace period.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s