How CDC Checkpoints Work

AWS DMS CDC (Change Data Capture) checkpoints work as a tracking mechanism to ensure data consistency during ongoing replication. Here's how they work and how to recover from failures:

Checkpoint Creation:
- DMS periodically records its progress position in the source database's transaction logs
- For Oracle, this is the SCN (System Change Number)
- For PostgreSQL, this is the LSN (Log Sequence Number)
- For MySQL/MariaDB, this is the binary log position
Checkpoint Storage:
- These checkpoints are stored internally by DMS in a special control table
- DMS automatically manages these checkpoints behind the scenes
- They're used to track which transactions have been processed
Consistency Guarantee:
- Checkpoints ensure that DMS doesn't miss transactions or process them twice
- DMS operates in a "exactly-once" delivery model for transactions
- Transactions are committed in the same order they occurred in the source

Recovering from CDC Failures

When a DMS CDC task fails, you have several recovery options:

Automatic Restart:

Simply restart the task
DMS will automatically resume from the last checkpoint

Example command:

aws dms start-replication-task \  --replication-task-arn your-task-arn \  --start-replication-task-type resume-processing

Start from a Specific Checkpoint:

You can explicitly specify a CDC start position
Useful when you need to reprocess transactions from a particular point

For Oracle example:

aws dms modify-replication-task \  --replication-task-arn your-task-arn \  --cdc-start-position "Oracle SCN:6916533"aws dms start-replication-task \  --replication-task-arn your-task-arn \  --start-replication-task-type start-replication

Start from a Timestamp:

You can specify a point in time to resume CDC
DMS will identify the corresponding transaction log position

Example:

aws dms modify-replication-task \  --replication-task-arn your-task-arn \  --cdc-start-time "2023-03-15T14:30:00Z"aws dms start-replication-task \  --replication-task-arn your-task-arn \  --start-replication-task-type start-replication

Common Recovery Challenges and Solutions

Transaction Log Purging:
- Problem: Source database has purged logs needed for recovery
- Solution:
  - Perform a new full load with CDC
  - Configure longer log retention on source systems
  - Set up regular checkpoint validation
Large Backlogs:
- Problem: DMS falls far behind, with millions of changes to process
- Solution:
  - Scale up the replication instance
  - Consider batch change processing settings
  - Monitor CDC latency metrics in CloudWatch
Connectivity Issues:
- Problem: Network problems cause intermittent failures
- Solution:
  - Increase retry settings
  - Implement VPC endpoints for more reliable connectivity
  - Check security group and network ACL settings
Data Definition Language (DDL) Changes:
- Problem: Schema changes cause replication failures
- Solution:
  - Configure how DMS handles DDL changes
  - Test schema changes in non-production environments first
  - Consider stopping and reconfiguring tasks for major schema changes

The key to successful CDC recovery is proper monitoring and ensuring your source database retains transaction logs long enough for recovery to be possible. For critical systems, regularly testing recovery procedures and monitoring source database log retention is essential.

PreviousDMS NextOracle to PostgreSQL Time-Window Data Reload Implementation Guide

Last updated 3 months ago

Was this helpful?

aws dms modify-replication-task \ --replication-task-arn your-task-arn \ --cdc-start-position "Oracle SCN:6916533"aws dms start-replication-task \ --replication-task-arn your-task-arn \ --start-replication-task-type start-replication

aws dms modify-replication-task \ --replication-task-arn your-task-arn \ --cdc-start-time "2023-03-15T14:30:00Z"aws dms start-replication-task \ --replication-task-arn your-task-arn \ --start-replication-task-type start-replication