Perform database failover with SQL Managed Instance Availability Groups

The following Jumpstart scenario will guide you on how to explore and test Azure Arc-enabled SQL Managed Instance Availability Groups, simulate failures and DB replication. In this scenario, you will be restoring a sample database, will initiate a failover to force HA event as well as validating database replication across multiple SQL nodes in an availability group.

NOTE: This guide assumes you already deployed a Azure Arc-enabled SQL Managed Instance on Azure Kubernetes Service (AKS). If you haven’t, this following bootstrap Jumpstart scenario offers you a way to do so in an automated fashion. All the steps and operations described in this scenario assume you used the mentioned bootstrap Jumpstart scenario and have the Client VM deployed as part of it.

Deployed Kubernetes Resources

When deploying Azure Arc-enabled SQL Managed Instance in an availability group, multiple Kubernetes resources are created to support it. The below section describes the main ones that are important to understand for this scenario.

SQL MI Pods Replicas

Three SQL pods replicas will be deployed to assemble the availability group. These can be seen using the kubectl get pods -n <deployment namespace> -o wide command, for example, kubectl get pods -n arc -o wide. It is also important to highlight that Kubernetes will spread the pods across the various nodes in the cluster.

SQL pods

Services & Endpoints

An external endpoint is automatically provisioned for connecting to databases within the availability group. This endpoint plays the role of the availability group listener.

In an availability group deployment, two endpoints, primary and secondary get created, both backed by a Kubernetes Service resource with a type of LoadBalancer.

  • Using the az sql mi-arc show -n jumpstart-sql --k8s-namespace arc --use-k8s command, validate the deployment endpoints details and the Availability Group health status.

    az sql Azure CLI extension

    az sql mi-arc show command

    NOTE: Initiating the command will also deploy az sql Azure CLI extension automatically.

  • Using the kubectl get svc -n arc command, you will be able to see the LoadBalancer services used by the endpoints.

    Kubernetes services

Database Restore

In order for you to test the HA functionality, a database restore RestoreDB PowerShell script is provided. The script will restore the AdventureWorks2019 sample database directly onto the primary SQL node pod container. From the C:\Temp folder, run the script using the .\RestoreDB.ps1 command.

RestoreDB script

Database Replication

All databases are automatically added to the availability group, including all users (including the AdventureWorks2019 database you just restored) and system databases like master and msdb. This capability provides a single-system view across the availability group replicas.

  • In addition to restoring the AdventureWorks2019 database, the script will also create a new text file and a desktop shortcut named Endpoints that includes both the primary and the secondary SQL endpoints.

    Endpoints desktop shortcut

    Endpoints text file

  • Open Microsoft SQL Server Management Studio (SSMS) which is installed automatically for you as part of the bootstrap Jumpstart scenario and use the primary endpoint IP address and login to the primary DB instance using the username and password provided in the text file mentioned above.

    Microsoft SQL Server Management Studio

  • Use the username and password you entered when provisioned the environment and select “SQL Server Authentication”. Alternatively, you can retrieve the username and password using the $env:AZDATA_USERNAME and $env:AZDATA_PASSWORD commands.

    SSMS login

    Primary endpoint connected

  • Follow the same process and connect to the secondary endpoint.

    Connect button

    Secondary endpoint connected

  • On both endpoints, expand the “Databases” and the “Always On High Availability” sections and see how the AdventureWorks2019 database is already automatically replicated and is part of the availability group.

    Databases replication

  • To test that the DB replication is working, a simple table modification is needed. For this example, on the primary replica, expand the “Tables” section for the database, select the “HumanResources.Employee” table, click on “Edit Top 200 Rows”, modify one or more records and commit the changes by saving (Ctrl+S). As you can see, in this example a change was made to “ken0” title and the number of vacation hours for “rob0”.

    Expending database for primary

    Edit Top 200 Rows

    Modifying a table

  • On the secondary replica, expand the “Tables” section for the database, click on “Select Top 1000 Rows”, and in the Results pane see how the table change is now replicated, showing the synchronization of the SQL instances in the availability group works as expected.

    Expending database for secondary

    Select Top 1000 Rows

    Replication works

Database Failover

As you already know, the availability group includes three Kubernetes replicas with a primary and two secondaries with all CRUD operations for the availability group are managed internally, including creating the availability group or joining replicas to the availability group created.

  • To test that failover between the replicas, we will simulate a “crash” that will trigger an HA event and will force one of the secondary replicas to get promoted to a primary replica. Open two side-by-side PowerShell sessions. On the left side session, use the kubectl get pods -n arc to review the deployed pods. The right-side session will be used to monitor the pods on the cluster using the kubectl get pods -n arc -w command. As you can see, three SQL replicas with four containers each are running.

    side-by-side PowerShell sessions

  • In SSMS, you can also see that jumpstart-sql-0 is acting as the primary replica and jumpstart-sql-1 and jumpstart-sql-2 are the secondary. At this point, close SSMS.

    Primary and secondary replicas

  • To trigger the HA event, delete the primary replica jumpstart-sql-0 using the kubectl delete pod jumpstart-sql-0 -n arc and watch how the pod gets deleted and then being deployed again due to being part of a Kubernetes ReplicaSet. Wait for the jumpstart-sql-0 pod to become ready again (and an additional few minutes for letting the availability group to recover).

    Pod deletion

  • Re-open SSMS and connect back to the previous secondary endpoint. You can now see that jumpstart-sql-0 is now acting as the secondary replica and jumpstart-sql-2 was promoted to primary. In addition, run the az sql mi-arc show -n jumpstart-sql --k8s-namespace arc --use-k8s command again and check the health status of the availability group.

    NOTE: It might take a few minutes for the availability group to return to an healthy state.

    Successful failover

    Availability group health

Re-Validating Database Replication

  • Now that we perform a successful failover, we can re-validate and make sure replication still works as expected. In SSMS, re-add the second instance.

    Re-adding instance

  • In the primary endpoint connection, repeat the process of performing a change on the AdventureWorks2019 database “HumanResources.Employee” table and check that replication is working. In the example below, you can see how new values in new rows are now replicated.

    Successful replication