Oracle to PostgreSQL Time-Window Data Reload Implementation Guide

Overview

This guide walks through reloading data for a specific time window from Oracle into PostgreSQL without disrupting ongoing CDC (Change Data Capture) replication: the windowed data is loaded into a staging schema and then merged into the production tables.

Prerequisites

  • AWS DMS (Database Migration Service) set up with:

    • Source Oracle endpoint configured

    • Target PostgreSQL endpoint configured

    • Replication instance with adequate capacity

  • Access to both Oracle and PostgreSQL databases with proper permissions

  • AWS CLI installed and configured on your management workstation

  • Basic knowledge of SQL, PostgreSQL, and Oracle database concepts

Step 1: Prepare Target PostgreSQL Environment

  1. Create Staging Schema and Tables:

    -- Connect to your PostgreSQL database first, e.g. from a shell:
    --   psql -h <postgres-host> -p <postgres-port> -U <username> -d <database>
    
    -- Create staging schema
    CREATE SCHEMA IF NOT EXISTS staging;
    
    -- Create staging tables mirroring your production tables
    CREATE TABLE staging.customers (
        customer_id INTEGER PRIMARY KEY,
        name VARCHAR(100),
        email VARCHAR(100),
        status VARCHAR(20),
        created_at TIMESTAMP,
        updated_at TIMESTAMP,
        _dms_processing_status VARCHAR(20) DEFAULT 'NEW'
    );
    
    CREATE TABLE staging.orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount NUMERIC(10,2),
        order_date TIMESTAMP,
        updated_at TIMESTAMP,
        _dms_processing_status VARCHAR(20) DEFAULT 'NEW'
    );
    
    -- Create indexes for optimization
    CREATE INDEX idx_customers_updated_at ON staging.customers(updated_at);
    CREATE INDEX idx_orders_updated_at ON staging.orders(updated_at);
    
    -- Create admin schema for logging and monitoring
    CREATE SCHEMA IF NOT EXISTS admin;
    
    -- Create log tables
    CREATE TABLE admin.merge_logs (
        id SERIAL PRIMARY KEY,
        table_name VARCHAR(100),
        merge_time TIMESTAMP,
        records_updated INT,
        records_inserted INT
    );
    
    CREATE TABLE admin.merge_summary (
        id SERIAL PRIMARY KEY,
        merge_start TIMESTAMP,
        merge_end TIMESTAMP,
        status VARCHAR(20),
        notes TEXT
    );
  2. Create Merge Procedure:

    CREATE OR REPLACE PROCEDURE merge_time_window_data()
    LANGUAGE plpgsql
    AS $$
    DECLARE
        merge_start_time TIMESTAMP;
        merge_end_time TIMESTAMP;
        records_updated INT;
        records_inserted INT;
    BEGIN
        -- Record start time for logging
        merge_start_time := CURRENT_TIMESTAMP;
        
        -- Inner block with an exception handler; PL/pgSQL runs it as a
        -- subtransaction, so a failure here can be caught and undone
        BEGIN
            -- Merge customers table
            WITH updated_rows AS (
                UPDATE production.customers p
                SET 
                    name = s.name,
                    email = s.email,
                    status = s.status,
                    updated_at = s.updated_at
                FROM staging.customers s
                WHERE p.customer_id = s.customer_id
                AND s.updated_at > p.updated_at
                RETURNING p.customer_id
            )
            SELECT COUNT(*) INTO records_updated FROM updated_rows;
            
            -- Insert new records
            WITH inserted_rows AS (
                INSERT INTO production.customers
                SELECT 
                    s.customer_id, s.name, s.email, s.status, s.created_at, s.updated_at
                FROM staging.customers s
                LEFT JOIN production.customers p ON s.customer_id = p.customer_id
                WHERE p.customer_id IS NULL
                RETURNING 1
            )
            SELECT COUNT(*) INTO records_inserted FROM inserted_rows;
            
            -- Log merge results
            INSERT INTO admin.merge_logs
            (table_name, merge_time, records_updated, records_inserted)
            VALUES ('customers', CURRENT_TIMESTAMP, records_updated, records_inserted);
            
            -- Repeat for orders table
            WITH updated_orders AS (
                UPDATE production.orders p
                SET 
                    customer_id = s.customer_id,
                    amount = s.amount,
                    order_date = s.order_date,
                    updated_at = s.updated_at
                FROM staging.orders s
                WHERE p.order_id = s.order_id
                AND s.updated_at > p.updated_at
                RETURNING p.order_id
            )
            SELECT COUNT(*) INTO records_updated FROM updated_orders;
            
            WITH inserted_orders AS (
                INSERT INTO production.orders
                SELECT 
                    s.order_id, s.customer_id, s.amount, s.order_date, s.updated_at
                FROM staging.orders s
                LEFT JOIN production.orders p ON s.order_id = p.order_id
                WHERE p.order_id IS NULL
                RETURNING 1
            )
            SELECT COUNT(*) INTO records_inserted FROM inserted_orders;
            
            INSERT INTO admin.merge_logs
            (table_name, merge_time, records_updated, records_inserted)
            VALUES ('orders', CURRENT_TIMESTAMP, records_updated, records_inserted);
            
            -- Mark all staging records as processed
            UPDATE staging.customers 
            SET _dms_processing_status = 'PROCESSED';
            
            UPDATE staging.orders 
            SET _dms_processing_status = 'PROCESSED';
            
            -- Note: no explicit COMMIT here. PostgreSQL forbids transaction
            -- control inside a block that has an EXCEPTION clause; the
            -- caller's transaction commits when the procedure returns.
            
            merge_end_time := CURRENT_TIMESTAMP;
            
            -- Log overall merge statistics
            INSERT INTO admin.merge_summary
            (merge_start, merge_end, status, notes)
            VALUES (merge_start_time, merge_end_time, 'SUCCESS', 
                    'Merged Oracle data from time window');
                    
        EXCEPTION WHEN OTHERS THEN
            -- No explicit ROLLBACK needed (and none is allowed here):
            -- the exception handler has already rolled back the inner
            -- block's subtransaction
            
            -- Log error
            INSERT INTO admin.merge_summary
            (merge_start, merge_end, status, notes)
            VALUES (merge_start_time, CURRENT_TIMESTAMP, 'FAILED', 
                    'Error: ' || SQLERRM);
                    
            -- Re-raise; if the caller does not catch this, its transaction
            -- aborts and the FAILED log row above is rolled back as well
            RAISE;
        END;
    END;
    $$;
  3. Create Monitoring Views:

    CREATE OR REPLACE VIEW admin.oracle_pg_migration_status AS
    SELECT 
        'customers' AS table_name,
        COUNT(*) AS total_records,
        SUM(CASE WHEN _dms_processing_status = 'NEW' THEN 1 ELSE 0 END) AS pending_records,
        SUM(CASE WHEN _dms_processing_status = 'PROCESSED' THEN 1 ELSE 0 END) AS processed_records,
        SUM(CASE WHEN _dms_processing_status = 'ERROR' THEN 1 ELSE 0 END) AS error_records
    FROM staging.customers
    UNION ALL
    SELECT 
        'orders' AS table_name,
        COUNT(*) AS total_records,
        SUM(CASE WHEN _dms_processing_status = 'NEW' THEN 1 ELSE 0 END) AS pending_records,
        SUM(CASE WHEN _dms_processing_status = 'PROCESSED' THEN 1 ELSE 0 END) AS processed_records,
        SUM(CASE WHEN _dms_processing_status = 'ERROR' THEN 1 ELSE 0 END) AS error_records
    FROM staging.orders;
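The merge procedure above is a timestamp-guarded upsert: a staging row replaces its production counterpart only when it carries a newer `updated_at`, and staging rows with no counterpart are inserted. A minimal Python sketch of that rule (illustrative only; in-memory dicts stand in for the tables):

```python
def merge_time_window(production, staging):
    """Timestamp-guarded upsert mirroring the SQL merge procedure.

    Both arguments are dicts keyed by primary key; each value is a row
    dict containing a comparable 'updated_at' field.
    Returns (records_updated, records_inserted), as logged by the procedure.
    """
    updated = inserted = 0
    for key, row in staging.items():
        if key in production:
            # UPDATE ... WHERE s.updated_at > p.updated_at
            if row["updated_at"] > production[key]["updated_at"]:
                production[key] = dict(row)
                updated += 1
        else:
            # INSERT ... anti-join (WHERE p.customer_id IS NULL)
            production[key] = dict(row)
            inserted += 1
    return updated, inserted
```

For example, a staging row dated `2025-01-15` overwrites a production row dated `2025-01-10`, while a staging row older than its production counterpart is skipped, so replaying a time window never regresses rows that CDC has already advanced.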

Step 2: Configure DMS Time-Window Migration

  1. Create DMS Task Settings File: Create a file named oracle-pg-task-settings.json with the following content:

    {
      "TargetMetadata": {
        "TargetSchema": "staging",
        "SupportLobs": true,
        "FullLobMode": false,
        "LobChunkSize": 64,
        "LimitedSizeLobMode": true,
        "LobMaxSize": 32
      },
      "FullLoadSettings": {
        "TargetTablePrepMode": "DO_NOTHING",
        "CreatePkAfterFullLoad": false,
        "StopTaskCachedChangesApplied": false,
        "StopTaskCachedChangesNotApplied": false,
        "MaxFullLoadSubTasks": 8,
        "TransactionConsistencyTimeout": 600,
        "CommitRate": 10000
      },
      "Logging": {
        "EnableLogging": true,
        "LogComponents": [
          {
            "Id": "SOURCE_UNLOAD",
            "Severity": "LOGGER_SEVERITY_DEFAULT"
          },
          {
            "Id": "TARGET_LOAD",
            "Severity": "LOGGER_SEVERITY_DEFAULT"
          },
          {
            "Id": "TRANSFORMATION",
            "Severity": "LOGGER_SEVERITY_DEFAULT"
          }
        ]
      }
    }

    Note: Oracle-specific options (e.g. UseLogminerReader, AddSupplementalLogging, CharLengthSemantics) are endpoint settings rather than task settings; configure them on the source Oracle endpoint (for example with aws dms modify-endpoint --oracle-settings), not in this file.
  2. Create Table Mappings File: Create a file named oracle-pg-time-window-mapping.json with the following content (adjust the date range as needed):

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "customers-time-window",
          "object-locator": {
            "schema-name": "SOURCE_SCHEMA",
            "table-name": "CUSTOMERS"
          },
          "rule-action": "include",
          "filters": [
            {
              "filter-type": "source",
              "column-name": "UPDATED_AT",
              "filter-conditions": [
                {
                  "filter-operator": "gte",
                  "value": "2025-01-01 00:00:00"
                },
                {
                  "filter-operator": "ste",
                  "value": "2025-01-31 23:59:59"
                }
              ]
            }
          ]
        },
        {
          "rule-type": "selection",
          "rule-id": "2",
          "rule-name": "orders-time-window",
          "object-locator": {
            "schema-name": "SOURCE_SCHEMA",
            "table-name": "ORDERS"
          },
          "rule-action": "include",
          "filters": [
            {
              "filter-type": "source",
              "column-name": "UPDATED_AT",
              "filter-conditions": [
                {
                  "filter-operator": "gte",
                  "value": "2025-01-01 00:00:00"
                },
                {
                  "filter-operator": "ste",
                  "value": "2025-01-31 23:59:59"
                }
              ]
            }
          ]
        },
        {
          "rule-type": "transformation",
          "rule-id": "3",
          "rule-name": "convert-oracle-timestamp",
          "rule-action": "change-data-type",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SOURCE_SCHEMA",
            "table-name": "%",
            "column-name": "%_AT"
          },
          "data-type": {
            "type": "datetime",
            "precision": 0,
            "scale": 0
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": "4",
          "rule-name": "convert-oracle-number",
          "rule-action": "change-data-type",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SOURCE_SCHEMA",
            "table-name": "ORDERS",
            "column-name": "AMOUNT"
          },
          "data-type": {
            "type": "numeric",
            "precision": 10,
            "scale": 2
          }
        }
      ]
    }
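The mapping file grows linearly with the number of tables, so the per-table selection rules can be generated instead of written by hand. A sketch in Python (the rule layout mirrors the file above; the `gte`/`ste` operator names follow the AWS DMS source-filter syntax, which you should verify against the DMS documentation for your engine version; this helper deliberately omits the transformation rules):

```python
import json

def time_window_rules(schema, tables, column, start, end):
    """Build a DMS table-mapping dict with one selection rule per table,
    including only rows whose `column` lies in [start, end]."""
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),
            "rule-name": f"{table.lower()}-time-window",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
            "filters": [{
                "filter-type": "source",
                "column-name": column,
                "filter-conditions": [
                    {"filter-operator": "gte", "value": start},
                    {"filter-operator": "ste", "value": end},
                ],
            }],
        })
    return {"rules": rules}

# Reproduce the mapping above for the two example tables
mapping = time_window_rules(
    "SOURCE_SCHEMA", ["CUSTOMERS", "ORDERS"], "UPDATED_AT",
    "2025-01-01 00:00:00", "2025-01-31 23:59:59",
)
print(json.dumps(mapping, indent=2))
```

Writing the result to `oracle-pg-time-window-mapping.json` (and appending any transformation rules) yields the same file the task creation step consumes.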

Step 3: Create and Run the Orchestration Script

  1. Create the Orchestration Script: Create a file named oracle-pg-time-window-reload.sh with the following content:

    #!/bin/bash
    
    # Configuration - Update these values
    SOURCE_ENDPOINT_ARN="arn:aws:dms:region:account:endpoint:oracle-source-endpoint"
    TARGET_ENDPOINT_ARN="arn:aws:dms:region:account:endpoint:postgres-target-endpoint"
    REPLICATION_INSTANCE_ARN="arn:aws:dms:region:account:rep:replication-instance-id"
    TABLE_MAPPINGS_FILE="oracle-pg-time-window-mapping.json"
    TASK_SETTINGS_FILE="oracle-pg-task-settings.json"
    START_DATE="2025-01-01T00:00:00"
    END_DATE="2025-01-31T23:59:59"
    PG_HOST="your-postgres-host"
    PG_PORT="5432"
    PG_NAME="your-postgres-db"
    PG_USER="your-postgres-user"
    PG_PASSWORD="your-postgres-password"
    
    echo "Starting Oracle to PostgreSQL time-window data reload from $START_DATE to $END_DATE"
    
    # Step 1: Create the migration task
    TASK_ARN=$(aws dms create-replication-task \
      --replication-task-identifier "oracle-pg-historical-reload-$(date +%Y%m%d%H%M%S)" \
      --source-endpoint-arn "$SOURCE_ENDPOINT_ARN" \
      --target-endpoint-arn "$TARGET_ENDPOINT_ARN" \
      --replication-instance-arn "$REPLICATION_INSTANCE_ARN" \
      --migration-type "full-load" \
      --table-mappings "file://$TABLE_MAPPINGS_FILE" \
      --replication-task-settings "file://$TASK_SETTINGS_FILE" \
      --tags Key=Purpose,Value=OraclePgHistoricalReload Key=TimeWindow,Value="${START_DATE}-${END_DATE}" \
      --query 'ReplicationTask.ReplicationTaskArn' \
      --output text)
    
    echo "Created Oracle to PostgreSQL replication task: $TASK_ARN"
    
    # Step 2: Start the task and wait for completion
    aws dms start-replication-task \
      --replication-task-arn "$TASK_ARN" \
      --start-replication-task-type start-replication
    
    echo "Task started. Waiting for completion..."
    
    # Poll for task completion
    while true; do
      STATUS=$(aws dms describe-replication-tasks \
        --filters Name=replication-task-arn,Values="$TASK_ARN" \
        --query 'ReplicationTasks[0].Status' \
        --output text)
      
      PERCENT=$(aws dms describe-replication-tasks \
        --filters Name=replication-task-arn,Values="$TASK_ARN" \
        --query 'ReplicationTasks[0].ReplicationTaskStats.FullLoadProgressPercent' \
        --output text || echo "N/A")
      
      echo "Current status: $STATUS - $PERCENT% completed"
      
      if [ "$STATUS" == "stopped" ] || [ "$STATUS" == "failed" ]; then
        break
      fi
      
      sleep 120
    done
    
    # Step 3: Check if task completed successfully
    if [ "$STATUS" == "stopped" ]; then
      echo "Oracle to PostgreSQL historical data migration completed successfully"
      
      # Step 4: Run the merge procedure
      echo "Starting data merge process..."
      PGPASSWORD="$PG_PASSWORD" psql -h "$PG_HOST" -p "$PG_PORT" -U "$PG_USER" -d "$PG_NAME" -c "CALL merge_time_window_data();"
      
      if [ $? -eq 0 ]; then
        echo "Data merge completed successfully"
      else
        echo "Data merge failed"
        exit 1
      fi
    else
      echo "Historical data migration failed"
      exit 1
    fi
    
    # Step 5: Clean up staging data (optional)
    read -p "Do you want to clean up staging tables? (y/n) " CLEANUP
    if [ "$CLEANUP" == "y" ]; then
      echo "Cleaning up staging tables..."
      PGPASSWORD="$PG_PASSWORD" psql -h "$PG_HOST" -p "$PG_PORT" -U "$PG_USER" -d "$PG_NAME" -c "TRUNCATE TABLE staging.customers; TRUNCATE TABLE staging.orders;"
    fi
    
    echo "Oracle to PostgreSQL time-window data reload process completed"
  2. Make the Script Executable:

    chmod +x oracle-pg-time-window-reload.sh
  3. Run the Orchestration Script:

    ./oracle-pg-time-window-reload.sh
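
A full-load-only DMS task ends in the `stopped` state whether it finished or was interrupted, so checking the task's `StopReason` gives a more reliable success signal than status alone. A hedged sketch: `task_succeeded` is an illustrative helper, and the `FULL_LOAD_ONLY_FINISHED` string is what DMS commonly reports for a completed full load; verify it against your own `describe-replication-tasks` output:

```shell
# Decide whether a finished DMS task actually completed its full load.
# $1 is the task Status, $2 the StopReason, both as reported by
# `aws dms describe-replication-tasks`.
task_succeeded() {
  status="$1"; stop_reason="$2"
  # The parameter expansion strips everything up to and including the marker;
  # if the result differs from the original, the marker substring was present.
  if [ "$status" = "stopped" ] && \
     [ "${stop_reason#*FULL_LOAD_ONLY_FINISHED}" != "$stop_reason" ]; then
    echo "success"
  else
    echo "failure"
  fi
}

# Example:
#   STOP_REASON=$(aws dms describe-replication-tasks \
#     --filters Name=replication-task-arn,Values="$TASK_ARN" \
#     --query 'ReplicationTasks[0].StopReason' --output text)
#   [ "$(task_succeeded "$STATUS" "$STOP_REASON")" = "success" ] && echo "Full load completed"
```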

Step 4: Monitor and Validate the Process

  1. Check Migration Status:

    # Connect to PostgreSQL (shell)
    psql -h <postgres-host> -p <postgres-port> -U <username> -d <database>
    
    -- Check migration status (SQL)
    SELECT * FROM admin.oracle_pg_migration_status;
  2. View Merge Logs:

    -- Check detailed merge logs
    SELECT * FROM admin.merge_logs ORDER BY merge_time DESC;
    
    -- Check merge summary
    SELECT * FROM admin.merge_summary ORDER BY merge_start DESC;
  3. Check for Data Type Issues:

    -- Create a verification view (if not already created)
    CREATE OR REPLACE VIEW admin.oracle_pg_type_verification AS
    WITH target_columns AS (
        SELECT 
            t.table_schema,
            t.table_name,
            c.column_name,
            c.data_type,
            c.character_maximum_length,
            c.numeric_precision,
            c.numeric_scale
        FROM 
            information_schema.tables t
        JOIN 
            information_schema.columns c ON t.table_schema = c.table_schema AND t.table_name = c.table_name
        WHERE 
            t.table_schema = 'production'
            AND t.table_type = 'BASE TABLE'
    ),
    staging_columns AS (
        SELECT 
            t.table_schema,
            t.table_name,
            c.column_name,
            c.data_type,
            c.character_maximum_length,
            c.numeric_precision,
            c.numeric_scale
        FROM 
            information_schema.tables t
        JOIN 
            information_schema.columns c ON t.table_schema = c.table_schema AND t.table_name = c.table_name
        WHERE 
            t.table_schema = 'staging'
            AND t.table_type = 'BASE TABLE'
    )
    SELECT
        t.table_name,
        t.column_name,
        t.data_type AS target_data_type,
        s.data_type AS staging_data_type,
        CASE
            WHEN t.data_type <> s.data_type THEN 'DATA TYPE MISMATCH'
            WHEN t.character_maximum_length <> s.character_maximum_length AND t.character_maximum_length IS NOT NULL THEN 'LENGTH MISMATCH'
            WHEN t.numeric_precision <> s.numeric_precision AND t.numeric_precision IS NOT NULL THEN 'PRECISION MISMATCH'
            WHEN t.numeric_scale <> s.numeric_scale AND t.numeric_scale IS NOT NULL THEN 'SCALE MISMATCH'
            ELSE 'OK'
        END AS status
    FROM
        target_columns t
    JOIN
        staging_columns s ON t.table_name = s.table_name AND t.column_name = s.column_name
    WHERE
        t.data_type <> s.data_type
        OR (t.character_maximum_length <> s.character_maximum_length AND t.character_maximum_length IS NOT NULL)
        OR (t.numeric_precision <> s.numeric_precision AND t.numeric_precision IS NOT NULL)
        OR (t.numeric_scale <> s.numeric_scale AND t.numeric_scale IS NOT NULL);
    
    -- Check for type issues
    SELECT * FROM admin.oracle_pg_type_verification;
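
The verification view can also gate the rest of the pipeline: zero rows means the staging and production schemas agree. A small sketch; `type_check_passed` is an illustrative helper, and the psql flags `-tAc` print a bare, unaligned result suitable for scripting:

```shell
# Classify the row count returned by admin.oracle_pg_type_verification.
# type_check_passed is an illustrative helper name.
type_check_passed() {
  # $1 is the number of mismatch rows; zero means the schemas agree.
  if [ "$1" -eq 0 ]; then
    echo "pass"
  else
    echo "fail"
  fi
}

# Example (connection variables as in the orchestration script):
#   MISMATCHES=$(PGPASSWORD="$PG_PASSWORD" psql -h "$PG_HOST" -p "$PG_PORT" -U "$PG_USER" -d "$PG_NAME" \
#     -tAc "SELECT count(*) FROM admin.oracle_pg_type_verification;")
#   [ "$(type_check_passed "$MISMATCHES")" = "pass" ] || exit 1
```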

Step 5: Troubleshooting Common Issues

  1. DMS Task Failures:

    • Check CloudWatch logs for detailed error messages

    • Verify Oracle supplemental logging is enabled for all source tables

    • Ensure PostgreSQL target has sufficient disk space

    • Validate that all Oracle data types are properly mapped to PostgreSQL equivalents

  2. Data Type Conversion Issues:

    • Oracle NUMBER to PostgreSQL NUMERIC may require explicit precision/scale definitions

    • Oracle DATE/TIMESTAMP handling might need special conversion rules

    • Character encoding differences between databases might cause string conversion issues

  3. Merge Procedure Errors:

    • Check admin.merge_summary for specific error messages

    • Verify that primary keys are defined correctly in both staging and production tables

    • Ensure there are no constraint violations during the merge process
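
When digging into DMS task failures, the detailed errors live in CloudWatch. DMS conventionally writes task logs to a log group named after the replication instance (`dms-tasks-<instance-name>`); confirm the exact group name in your account with `aws logs describe-log-groups`. A sketch:

```shell
# Build the CloudWatch log group name for a DMS replication instance.
# The dms-tasks-<instance-name> pattern is the usual DMS convention;
# verify it in your own account before relying on it.
dms_log_group() {
  echo "dms-tasks-$1"
}

# Tail the last hour of errors for the instance running the reload task
# (requires AWS CLI v2):
#   aws logs tail "$(dms_log_group my-replication-instance)" --since 1h | grep -i error
```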

Step 6: Finalizing the Process

  1. Cleanup Tasks:

    • After successful verification, truncate staging tables if needed

    • Consider deleting the one-time DMS task to save resources

    • Archive logs for audit purposes

  2. Documentation:

    • Document the entire process, including any issues encountered and resolutions

    • Record the time window that was reloaded for future reference

    • Note any data type conversion or mapping decisions made

Additional Considerations

  1. Performance Optimization:

    • Increase the DMS MaxFullLoadSubTasks setting so more tables are loaded in parallel (the default is 8)

    • Adjust PostgreSQL configuration for better bulk load performance (increase work_mem, maintenance_work_mem, etc.)

    • Consider dropping nonessential indexes on the target tables before the load and recreating them afterward (PostgreSQL has no way to temporarily disable an index)

  2. Parallel Ongoing CDC:

    • Ensure your continuous CDC task doesn't conflict with the time-window reload

    • Monitor both processes carefully for any signs of interference

  3. Recovery Options:

    • Create a backup of the target tables before running the merge process

    • Have a rollback plan in case of issues during the merge
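
The pre-merge backup can be a custom-format `pg_dump` of just the tables touched by the merge, restorable with `pg_restore` if the merge goes wrong. A sketch: the `backup_file` helper is illustrative, and the table names follow this guide's examples:

```shell
# Build a timestamped dump filename for a given schema-qualified table,
# e.g. production.orders -> backup_production_orders_<timestamp>.dump.
# backup_file is an illustrative helper name.
backup_file() {
  echo "backup_$(echo "$1" | tr '.' '_')_$(date +%Y%m%d%H%M%S).dump"
}

# Example (custom-format dump of the two merged tables; connection variables
# as in the orchestration script):
#   PGPASSWORD="$PG_PASSWORD" pg_dump -h "$PG_HOST" -p "$PG_PORT" -U "$PG_USER" -d "$PG_NAME" \
#     -Fc -t production.customers -t production.orders \
#     -f "$(backup_file production.orders)"
```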

This implementation guide provides a comprehensive approach to selectively reloading time-window data from Oracle to PostgreSQL while maintaining ongoing CDC replication.
