Perform database failover with SQL Managed Instance Availability Groups
The following README will guide you on how to explore and test Azure Arc-enabled SQL Managed Instance Availability Groups, simulate failures and DB replication. In this scenario, you will be restoring a sample database, will initiate a failover to force HA event as well as validating database replication across multiple SQL nodes in an availability group.
Note: This guide assumes you already deployed a Azure Arc-enabled SQL Managed Instance on Azure Kubernetes Service (AKS). If you haven’t, this following bootstrap Jumpstart scenario offers you a way to do so in an automated fashion. All the steps and operations described in this readme assume you used the mentioned bootstrap Jumpstart scenario and have the Client VM deployed as part of it.
Note: Azure Arc-enabled SQL Managed Instance with Availability Groups is currently in preview.
Deployed Kubernetes Resources
When deploying Azure Arc-enabled SQL Managed Instance in an availability group, multiple Kubernetes resources are created to support it. The below section describes the main ones that are important to understand for this scenario.
SQL MI Pods Replicas
Three SQL pods replicas will be deployed to assemble the availability group. These can be seen using the
kubectl get pods -n <deployment namespace> -o wide command, for example,
kubectl get pods -n arc -o wide.
It is also important to highlight that Kubernetes will spread the pods across the various nodes in the cluster.
Services & Endpoints
An external endpoint is automatically provisioned for connecting to databases within the availability group. This endpoint plays the role of the availability group listener.
In an availability group deployment, two endpoints, primary and secondary get created, both backed by a Kubernetes Service resource with a type of LoadBalancer.
az sql mi-arc show -n jumpstart-sql --k8s-namespace arc --use-k8scommand, validate the deployment endpoints details and the Availability Group health status.
Note: Initiating the command will also deploy az sql Azure CLI extension automatically.
kubectl get svc -n arccommand, you will be able to see the LoadBalancer services used by the endpoints.
In order for you to test the HA functionality, a database restore RestoreDB PowerShell script is provided. The script will restore the AdventureWorks2019 sample database directly onto the primary SQL node pod container. From the C:\Temp folder, run the script using the
All databases are automatically added to the availability group, including all users (including the AdventureWorks2019 database you just restored) and system databases like master and msdb. This capability provides a single-system view across the availability group replicas.
In addition to restoring the AdventureWorks2019 database, the script will also create a new text file and a desktop shortcut named Endpoints that includes both the primary and the secondary SQL endpoints.
Open Microsoft SQL Server Management Studio (SSMS) which is installed automatically for you as part of the bootstrap Jumpstart scenario and use the primary endpoint IP address and login to the primary DB instance.
Use the username and password you entered when provisioned the environment and select “SQL Server Authentication”. Alternatively, you can retrieve the username and password using the
Follow the same process and connect to the secondary endpoint.
On both endpoints, expand the “Databases” and the “Always On High Availability” sections and see how the AdventureWorks2019 database is already automatically replicated and is part of the availability group.
To test that the DB replication is working, a simple table modification is needed. For this example, on the primary replica, expand the “Tables” section for the database, select the “HumanResources.Employee” table, click on “Edit Top 200 Rows”, modify one or more records and commit the changes by saving (
Ctrl+S). As you can see, in this example a change was made to “ken0” title and the number of vacation hours for “rob0”.
On the secondary replica, expand the “Tables” section for the database, click on “Select Top 1000 Rows”, and in the Results pane see how the table change is now replicated, showing the synchronization of the SQL instances in the availability group works as expected.
As you already know, the availability group includes three Kubernetes replicas with a primary and two secondaries with all CRUD operations for the availability group are managed internally, including creating the availability group or joining replicas to the availability group created.
To test that failover between the replicas, we will simulate a “crash” that will trigger an HA event and will force one of the secondary replicas to get promoted to a primary replica. Open two side-by-side PowerShell sessions. On the left side session, use the
kubectl get pods -n arcto review the deployed pods. The right-side session will be used to monitor the pods on the cluster using the
kubectl get pods -n arc -wcommand. As you can see, three SQL replicas with four containers each are running.
Note: As mentioned under the “Known Issues” in the bootstrap Jumpstart scenario, Webhook pods error state can be safely ignored and do not impact the functionality of Azure Arc-enabled data services and the Jumpstart automation.
In SSMS, you can also see that jumpstart-sql-0 is acting as the primary replica and jumpstart-sql-1 as the secondary. At this point, close SSMS.
To trigger the HA event, delete the primary replica jumpstart-sql-0 using the
kubectl delete pod jumpstart-sql-0 -n arcand watch how the pod gets deleted and then being deployed again due to being part of a Kubernetes ReplicaSet. Wait for the jumpstart-sql-0 pod to become ready again (and an additional few minutes for letting the availability group to recover).
Re-open SSMS and connect back to the primary endpoint. You can now see that jumpstart-sql-0 is now acting as the secondary replica and jumpstart-sql-2 was promoted to primary. In addition, run the
az sql mi-arc show -n jumpstart-sql --k8s-namespace arc --use-k8scommand again and check the health status of the availability group.
Re-Validating Database Replication
Now that we perform a successful failover, we can re-validate and make sure replication still works as expected. In SSMS, re-add the secondary endpoint connection.
In the primary endpoint connection, repeat the process of performing a change on the AdventureWorks2019 database “HumanResources.Employee” table and check that replication is working. In the example below, you can see how new values in new rows are now replicated.