If data has moved or changed at source, there are several methods to capture this:
Full Refresh and its related issues
One way to pick up changes in source data is a full refresh of the data in the destination. This is extremely time-consuming and creates a single point of failure (don’t say you weren’t warned!). Organisations usually rely on a nightly batch load to refresh their data from sources in full. If there is a spike in data volume, or a network or infrastructure issue, the full extract fails, leaving the business with no reports the next day. Re-running it during the day will have your people pulling their hair out in frustration, as it overloads the source and disrupts normal business workloads.
So the next available window for the full extract is the following night, and every further failure compounds the nightmare.
Issues with using timestamps or incrementing sequences:
Timestamps:
Organisations sometimes try to handle data updates by relying on the create or update dates on each record, assuming the source even logs these. In many cases these dates are populated incorrectly, so changed data is extracted incorrectly or, more importantly, missed completely. There is a high probability that some updates do not change the update date on the record and so are never picked up. But by far the biggest failing of this approach is that deletes can never be captured, because a deleted row is no longer there to be selected by date, leading to inconsistent, incorrect data on the destination.
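To make the timestamp problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table, its `updated_at` column and the `extract_since` helper are hypothetical names invented for illustration, not part of any particular tool. It shows an extract that picks up a well-behaved update but misses both an update that forgot to touch the timestamp and a delete.

```python
import sqlite3


def extract_since(conn, last_sync):
    """Return rows whose updated_at is later than the last sync time."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY id",
        (last_sync,),
    )
    return cur.fetchall()


# Hypothetical source table with an update timestamp on every row.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ann", "2024-01-01T09:00:00"),
        (2, "Bob", "2024-01-01T09:05:00"),
        (3, "Cy", "2024-01-01T09:10:00"),
    ],
)

# Initial load: every timestamp sorts after the empty string, so all rows come back.
destination = {row[0]: row for row in extract_since(source, "")}
last_sync = "2024-01-01T10:00:00"

# Changes made at the source after the initial load.
source.execute(
    "UPDATE customers SET name = 'Ann B', updated_at = '2024-01-02T08:00:00' "
    "WHERE id = 1"
)  # update that maintains the timestamp
source.execute("UPDATE customers SET name = 'Cyrus' WHERE id = 3")  # timestamp not touched
source.execute("DELETE FROM customers WHERE id = 2")  # the row simply disappears

# Incremental extract driven by the timestamp.
changed = extract_since(source, last_sync)
for row in changed:
    destination[row[0]] = row

print(changed)              # only the row for id 1 is returned
print(sorted(destination))  # id 2 is still in the destination
print(destination[3][1])    # still the old name for id 3
```

After the incremental run the destination still holds the deleted customer and the stale name, and nothing in the extract signals that anything is wrong.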
Incrementing sequences:
These can only capture inserts. Updates and deletes can never be captured. And even though organisations maintain that they never update or delete their records, the occasional update or delete does happen, and with this architectural approach it wreaks havoc on the downstream systems.
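The same limitation can be sketched for an incrementing sequence. This is an illustrative example only: the row layout, the `extract_new_rows` helper and the high-water-mark variable are assumptions, and plain Python lists stand in for the source table.

```python
def extract_new_rows(source_rows, last_seen_id):
    """Return rows whose sequence id is above the high-water mark."""
    return [row for row in source_rows if row["id"] > last_seen_id]


# Hypothetical source table, keyed by an incrementing id.
source_rows = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
]

# Initial load, then remember the highest id seen so far.
replica = {row["id"]: dict(row) for row in extract_new_rows(source_rows, 0)}
high_water_mark = max(replica)

# Changes at the source after the initial load.
source_rows.append({"id": 3, "status": "open"})  # insert
source_rows[0]["status"] = "closed"              # update to id 1
del source_rows[1]                               # delete of id 2

# Incremental extract driven by the sequence.
new_rows = extract_new_rows(source_rows, high_water_mark)
for row in new_rows:
    replica[row["id"]] = dict(row)

print([row["id"] for row in new_rows])  # only the inserted row is picked up
print(replica[1]["status"])             # the update to id 1 never arrives
print(2 in replica)                     # the deleted row lingers in the replica
```

Only the insert crosses the high-water mark. The update and the delete happen to ids below it, so the extract never looks at them.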
The case for log based Change Data Capture
Log based Change Data Capture is by far the most enterprise grade mechanism for getting access to your data from database sources. It has minimal impact on the source, and data can be extracted in real time or at a scheduled frequency, in bite-size chunks, so there is no single point of failure. Change Data Capture technology reads the database logs and picks up only the deltas. Most importantly, it captures all inserts, updates and deletes, making the data trustworthy at the destination.
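As a rough sketch of why log based capture keeps the destination trustworthy, the example below applies an ordered stream of change events to a replica. The event format (`op`, `key`, `row`) is a simplified assumption for illustration. Real database logs and CDC tools each have their own formats, and this is not how any specific product is implemented.

```python
def apply_changes(replica, events):
    """Apply an ordered list of change events to a replica keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            replica[key] = event["row"]
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# State of the replica after the initial load.
replica = {
    1: {"status": "open"},
    2: {"status": "open"},
}

# Hypothetical deltas read from the source's log since the last sync.
events = [
    {"op": "insert", "key": 3, "row": {"status": "open"}},
    {"op": "update", "key": 1, "row": {"status": "closed"}},
    {"op": "delete", "key": 2},
]

apply_changes(replica, events)
print(replica)  # id 1 is closed, id 2 is gone, id 3 is present
```

Because every insert, update and delete arrives as its own event, the replica ends up matching the source without rescanning the source table. The events can also be applied in small batches, which is what keeps each run bite-sized.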
However, not all CDC tools are created equal
Using Change Data Capture technology is optimal, but be warned: a lot of software claims to do CDC and replicate incremental changes to the destination, yet does not make the cut on some key points.
Speed: Is the data replication tool really, really fast?
Your data replication tool needs to replicate data in near real-time so your data stays relevant and has value. BryteFlow is 6x faster than Oracle GoldenGate and even faster than some other competitors. BryteFlow Ingest replicated one million records (approx. 1 GB) in just 30 seconds at a client trial. It is a data replication tool that uses log based CDC, and it is unique in that its throughput outpaces most of the competition while needing only minimal security privileges. You are assured of low impact, high throughput and guaranteed delivery of data.
Data verification: Can you TRUST your replicated data?
Many replication tools offer CDC but not the data reconciliation that should go with it. How would anyone know that their huge datasets landed in the destination incomplete or with errors if they weren’t alerted? You might only find out months later that the replicated data wasn’t accurate or had parts missing.
BryteFlow is probably the only data integration software that provides data replication with data reconciliation. BryteFlow TruData reconciles data with the source and verifies it for completeness, providing alerts if data is missing.
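To give a feel for what reconciliation involves in general, here is a minimal sketch that compares a source and a destination table by primary key and by a per-row hash. It is a generic illustration under assumed names (`row_fingerprint`, `reconcile`, and the sample rows are all hypothetical) and does not describe how BryteFlow TruData works internally.

```python
import hashlib


def row_fingerprint(row):
    """Hash a row's values in a fixed column order."""
    canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source, destination):
    """Compare two tables keyed by primary key and return the discrepancies."""
    missing = sorted(set(source) - set(destination))
    unexpected = sorted(set(destination) - set(source))
    mismatched = sorted(
        key
        for key in set(source) & set(destination)
        if row_fingerprint(source[key]) != row_fingerprint(destination[key])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical snapshots of the same table on both sides.
source_table = {
    1: {"customer": "Ann", "amount": 100},
    2: {"customer": "Bob", "amount": 250},
    3: {"customer": "Cy", "amount": 75},
}
destination_table = {
    1: {"customer": "Ann", "amount": 100},
    2: {"customer": "Bob", "amount": 205},  # value differs from the source
    4: {"customer": "Dee", "amount": 40},   # row that no longer exists at source
}

report = reconcile(source_table, destination_table)
print(report)
if any(report.values()):
    print("ALERT: destination does not match source")
```

In practice a check like this would run on a schedule and raise an alert as soon as any of the three lists is non-empty, rather than leaving the gap to be discovered months later.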