SQL Server Change Data Capture (CDC) for real-time SQL Server Replication

What is SQL Server CDC?

SQL Server Change Data Capture or CDC is a way to capture all changes made to a Microsoft SQL Server database. Any inserts, updates or deletes made to any of the tables made in a specified time window are captured. Changes to the table structures are also captured, making this information available for SQL Server replication purposes.CDC (change data capture) reads the transaction logs to find data changes and records the changes in a separate table or database. This has a small overhead on server resources, it has no impact on the tables themselves. There are no locks, reads, or anything to block or slow down transactions.

SQL CDC is an integral part of SQL Server replication

When data needs to be transferred or replicated to data lakes or data warehouses from a SQL Server database, the SQL CDC functionality is especially useful as it helps in replicating incremental data efficiently, so that the data lake/ warehouse is always up to date and is a centralised place for data access from different SQL Server repositories.

SQL Server is a relational database management system developed by Microsoft. Its primary function is to store and retrieve data efficiently and is generally used by applications for data storage and retrieval. SQL Server versions 2008 onwards support Change Data Capture or CDC.

SQL Server Change Data Capture or CDC – points to note:

  • You need sysadmin privileges to turn this functionality on
  • Web, Express and Standard editions of SQL Server do not support this functionality
  • You need to enable CDC at database level AND  at the table level for all tables that need to be tracked
  • You need to start the SQL Server Agent.

These steps are explained in depth below:

To enable Change Data Capture (CDC)

1. At DB level:

Use <databasename>;
EXEC sys.sp_cdc_enable_db;

For Example:

Use Adventureworks2019;
EXEC sys.sp_cdc_enable_db;

2. At table level:

USE <databasename>
GO
EXEC sys.sp_cdc_enable_table
@source_schema = ‘<schema_name>’,
@source_name = ‘<table_name>’,
@role_name = null,
@supports_net_changes = 0;

For Example:
USE Adventureworks2019
GO
EXEC sys.sp_cdc_enable_table
@source_schema = ‘dbo’,
@source_name = ‘DimCustomer’,
@role_name = null,
@supports_net_changes = 0;

Checks to see if CDC has been enabled:

1. Check if CDC has been enabled at database level

USE master
GO

select name, is_cdc_enabled
from sys.databases
where name = ‘<databasename>’
GO

For example:

USE Adventureworks2019
GO

select name, is_cdc_enabled
from sys.databases where name = ‘Adventureworks2019’
GO

Results:

name is_cdc_enabled 
Adventureworks2019 1

If is_cdc_enabled is 0 then CDC is not enabled for the database
If is_cdc_enabled is 1 then CDC is already enabled for the database name

2. Check if CDC is enabled at the table level

USE databasename
GO
select name,type,type_desc,is_tracked_by_cdc
from sys.tables
where name = ‘<table_name>’
GO

For example:

USE Adventureworks2019
GO
select name,type,type_desc,is_tracked_by_cdc
from sys.tables
where name = ‘DimCustomer’
GO

Results:

name type type_desc is_tracked_by_cdc
DimCustomer U USER_TABLE 1

If is_tracked_by_cdc = 1 then CDC is enabled for the table
If is_tracked_by_cdc = 0 then CDC is not enabled for the table

3. Check if the SQL Server Agent has been started

Use Microsoft configuration Manager to turn on the SQL Server Agent.
The SQL Server Agent should be running.

1. To start this process on your SQL Server, launch SQL Server Configuration Manager.

Description: How to Enable SQL Server Agent Service-SQL Server Configuration Manager

2. Right-click the SQL Server Agentservice and click Properties.

Description: How to Enable SQL Server Agent Service-Properties

3. On the Properties Window, select an appropriate account. Here we are using the Local System account as an example. Then click on the Service tab.

Description: How to Enable SQL Server Agent Service-Log On

4. Change the Start Mode to Automatic and then click OK to close the window.

Description: How to Enable SQL Server Agent Service-Start Mode

5. Right-click the SQL Server Agentservice and click Start.

Description: How to Enable SQL Server Agent Service-Start

6. With that last step completed, the SQL Server Agentservice is ready to be used.

Description: How to Enable SQL Server Agent Service-Running

SQL Server Change Data Capture: Pros and Cons

Challenges

  • A limitation that we see with the SQL Server Change Data Capture functionality is that it uses I/O and has a small overhead on server resources; however, it has no impact on the tables themselves. There are no locks, reads, or anything to block or slow down transactions. There are several other mechanisms that support real-time data replication from SQL Server. When the underlying tables have large volumes of DML activity, SQL Server CDC may not be the optimum solution.
  • It is harder to set up than other mechanisms and may need some on-going intervention by the DBAs for transaction logs for transactional replication.
  • Also CDC functionality is not supported by Web, Express and Standard editions of SQL Server

Benefits

  • The benefit with SQL Server Change Data Capture is that most replication mechanisms from SQL Server require the tables to have a primary key. CDC will support tables without a primary key and this is the use case it is most useful for, as incremental data and real-time replication can be achieved without primary keys on the tables to be replicated.
  • The CDC mechanism can be used for Always On Availability Groups on SQL Server as CDC data is replicated to the replicas.

SQL Server Change Data Capture limitations or challenges with custom coding

  • Though the native SQL Server CDC functionality helps to capture changes, there is significant coding to be done, to capture these changes and use them for SQL replication to a Data Warehouse or Data Lake.
  • Not only does reliability and robustness need to be catered for but on-going maintenance, data issues and handling new tables and data types becomes an on-going issue for enterprises.
  • The alerting and monitoring solution needs to be hand coded as well, and when you look at the different components required to get this working, a hand coded solution does not offer quick time to value and can be expensive to develop and maintain.

The No Code, automated way to capture changes from SQL Server is by using BryteFlow

  • BryteFlow is a GUI based tool that helps to capture the changes from SQL Server with CDC and several other mechanisms. Read more about the solution
  • It is a no-code way of transferring your data to various data warehouses and data lakes
  • It creates the schemas automatically on the target with the optimum data types and implements best practices for the target
  • Data replicated to the target can be held automatically with SCD type 2 history if configured
  • Automated, sophisticated data reconciliation – this is another enterprise grade feature not ordinarily offered in the market
  • BryteFlow offers enterprise grade SQL server replication and watertight security for installations on-premises, on the cloud or using a hybrid approach. See how you can get SQL Server to Snowflake in just 4 easy steps.

Conclusion

In contrast to the coding-intensive approach, we offer easy, real-time SQL Server replication with CDC using BryteFlow. Just point and click to set up. Replicate large volumes of data easily with parallel threaded initial sync and partitioning followed by delta sync. Tables are created automatically with data upserted or with SCD Type 2 history. You get multiple CDC options appropriate for the SQL Server version or edition you are using (all versions/editions supported). Get ready to use data at the destination whether S3, Redshift, Snowflake, Azure Synapse or SQL Server.