DMS

Reload data from a specific time windows in the past

DMS isn't designed for time-windowed selective reloads while maintaining ongoing CDC. The challenge is that CDC works sequentially from transaction logs, and introducing historical data without creating duplicates or inconsistencies requires careful coordination.

The ReloadTables API in AWS DMS doesn't natively support reloading data for a specific time window in the past while CDC is ongoing. It's designed to perform a complete reload of the specified tables, not a targeted time-based refresh.

If this is a critical requirement, we need to implement a custom solution using a combination of DMS tasks and application-specific data reconciliation logic. To achieve this we have a few alternative approaches:

  1. Custom Solution with Staging Tables:

    • Create staging tables in your target database

    • Use a separate DMS task to load historical data (from the time window) into these staging tables

    • Write custom merge logic to incorporate this historical data maintaining the current CDC stream

    • This keeps your ongoing CDC undisturbed

  2. Adjust CDC Start Position:

    • If we want to "replay" changes from a specific point in time, we could:

      • Create a new task that starts CDC from your desired historical point

      • Direct this to a separate target schema/database

      • Merge the results after completion

  3. Backup/Snapshot Restoration:

    • Restore a backup/snapshot from your desired time point to a separate environment

    • Migrate just the data you need from this point-in-time copy

    • Merge with your current state

  4. AWS Database Migration Assessment:

    • Run data validation to identify specific records that differ between source and target

    • Selectively reload just those records rather than entire tables

Custom Solution with Staging Tables

Key Requirements:

  • No interruption to ongoing CDC replication

  • Transactional consistency during merges

  • Selective time-based data refresh

  • Full audit trail of changes

For migrating from Oracle to PostgreSQL with a time-window data reload:

Oracle to PostgreSQL Migration with Time-Window Data Reload

Oracle to PostgreSQL Migration with Time-Window Data ReloadEdited:

Oracle to PostgreSQL Migration Considerations

1. Data Type Handling

  • Oracle NUMBER types map to PostgreSQL NUMERIC

  • Oracle DATE/TIMESTAMP handling requires careful conversion

  • Added transformation rules in DMS task to handle Oracle-specific data types

  • Included a verification view to identify data type mismatches

2. Oracle-Specific DMS Settings

  • Added Oracle-specific settings in the task configuration:

    • LogMiner reader for CDC (more efficient with Oracle)

    • Character length semantics settings

    • Timestamp with timezone conversion options

    • Supplemental logging options

3. PostgreSQL Syntax Adjustments

  • Used PostgreSQL's CTE (WITH clause) syntax for efficient updates and inserts

  • Adjusted the merge procedure to use PostgreSQL's transaction handling

  • Created PostgreSQL-compatible monitoring views

4. Performance Optimizations

  • Added additional polling time in the orchestration script (Oracle migrations often take longer)

  • Added progress percentage reporting to monitor large table migrations

  • Improved transaction handling for PostgreSQL's MVCC architecture

5. Oracle-PostgreSQL Schema Compatibility

  • Added data type verification view to identify potential data format issues

  • Use of PostgreSQL-specific syntax for stored procedures

  • Handling of Oracle's case-sensitivity vs PostgreSQL's case-insensitivity

This approach allows you to selectively migrate and reload data from a specific time window in your Oracle database to PostgreSQL while maintaining your ongoing CDC replication. The solution is specifically optimized for the Oracle to PostgreSQL migration path.

Last updated

Was this helpful?