PuglieseWeb
  • Home
  • Software development
    • Cloud Data Security Principles
      • Separation of Duties (SoD)
      • Security Controls and Data Protection Framework
      • Vaultless Tokenization
    • Multi-cloud strategies
    • DMS
      • How CDC Checkpoints Work
      • Oracle to PostgreSQL Time-Window Data Reload Implementation Guide
      • Join tables separate PostgreSQL databases
      • Multi-Stage Migration Implementation Plan
      • Notes
      • Oracle Golden Gate to PostgreSQL Migration
      • Step-by-Step CDC Recovery Guide: Oracle to PostgreSQL Migration
    • AWS Pro
      • My notes
        • Data Migration Strategy
        • OpsWorks VS CloudFormation
      • Implementation Guides
        • AWS Lambda Scaling and Concurrency Optimization Guide
        • Understanding Cross-Account IAM Roles in AWS
        • HA TCP with Redundant DNS
        • Understanding 429 (Too Many Requests) & Throttling Pattern
        • EC2 Auto Scaling Log Collection Solutions Comparison
        • AWS PrivateLink Implementation Guide for Third-Party SaaS Integration
        • AWS Cross-Account Network Sharing Implementation Guide
        • Cross-Account Route 53 Private Hosted Zone Implementation Guide
          • Route 53
            • Routing Policies
              • Using a Weighted Routing Policy
              • Simple Routing Policy
              • Multivalue Answer Routing
            • Latency Routing Policy
            • Route 53 Traffic Flow
        • Direct Connect Gateway Implementation Guide
        • CICD for Lambda
        • AWS IAM Identity Center Integration with Active Directory
        • AWS Transit Gateway Multi-Account Implementation Guide
          • AWS Multi-Account Network Architecture with Infrastructure Account
      • Links
      • Cloud Adoption Framework
      • Data Stores
        • Data Store Types and Concepts in AWS
        • S3
          • Amazon S3 (Simple Storage Service)
            • Bucket Policies
          • Managing Permissions in Amazon S3
          • Amazon Glacier: AWS Archive Storage Service
          • Lab: Querying Data in Amazon S3 with Amazon Athena
          • LAB: Loading Data into a Redshift Cluster
        • Attached Storage
          • EBS
          • AWS Elastic File System (EFS): From Sun Microsystems to Modern Cloud Storage
          • AWS FSx Service Guide
          • Amazon Storage Gateway Guide
        • Databases
          • Amazon Storage Gateway Guide
          • Amazon RDS (Relational Database Service)
          • Aurora DB
          • Dynamo DB
          • Document DB
          • Amazon Redshift Overview
          • Data Pipeline
            • Data Lake VS Lake Formation
          • AWS Data Preparation Services
          • Amazon Neptune
          • Amazon ElastiCache
          • AWS Specialized Database Services
          • LAB - Deploy an Amazon RDS Multi-AZ and Read Replica in AWS
      • Networking
        • Concept
        • Basics
          • VPG
          • VPC
            • VPC endpoints
              • Interface Endpoint VS Elastic Network Interface (ENI)
            • PrivateLink
              • PrivateLink SAAS Use case
            • Transit Gateway
            • 5G Networks
            • VPN CloudHub
            • VPC security
            • VPC peering
            • VPC Endpoint
            • Route Table (and Routers)
            • Network Access Control List (NACL)
            • Network Security Group
            • NAT Gateway
              • NACL vs NAT
          • Elastic Load Balancing (ELB)
            • Gateway Load Balancer (GWLB)
          • CIDR ranges examples
          • Enhanced Networking
          • Elastic Fabric Adapter (EFA)
          • Elastic Network Interface (ENI)
        • Network to VPC Connectivity
          • Transit VS Direct Connect Gateway
          • Direct Connect
            • VIF (Virtual Interfaces)
            • VIF VS ENI
            • Customer Routers VS Customer Gateways
        • VPC-to-VPC
        • NAT & Internet Gateway
        • Routing
          • IPv4 Address Classes and Subnet Masks
          • VPC's DNS server
          • Transit VPC VS Transit Gateway
          • Example Routing tables configuration
          • Cross-regions failover
          • Loopback
        • Enhanced Networking
        • Hybrid and Cross-Account Networking
        • AWS Global Accelerator
        • Route 53
        • Cross-Account Route 53
        • CloudFront SSL/TLS and SNI Configuration
        • ELB
        • Lab: Creating a Multi-Region Network with VPC Peering Using SGs, IGW, and RTs
        • LAB - Creating a CloudFront Distribution with Regional S3 Origins
        • Lab: Creating and Configuring a Network Load Balancer in AWS
        • Lab: Troubleshooting Amazon EC2 Network Connectivity
        • Lab: Troubleshooting VPC Networking
      • Security
        • Cloud Security
          • IAM
            • SCIM
            • Use case 1
          • Core Concepts of AWS Cloud Security
            • OAuth VS OpenID Connect
          • Understanding User Access Security in AWS Organizations
          • Exploring Organizations
          • Controlling Access in AWS Organizations
            • SCP (Service Control Policy) implementation types
        • Network Controls and Security Groups
          • Firewalls
            • Network Controls and Security Groups Overview
          • AWS Directory Services
          • AWS Identity and Access Management (IAM) and Security Services
            • ASW Identity Sources
          • AWS Resource Access Manager (RAM): Cross-Account Resource Sharing
            • AWS App Mesh
        • Encryption
          • History and Modern Implementation of Encryption in AWS
          • Secret Manager
          • DDoS Attacks and AWS Protection Strategies: Technical Overview
          • AWS Managed Security Services Overview
          • IDS and IPS
          • AWS Service Catalog
      • Migrations
        • Migration Concepts
          • Hybrid Cloud Architectures
          • Migration Strategies
        • Migration Application
          • Services and Strategies
          • AWS Data Migration Services
          • Network Migrations and Cutovers
            • Network and Broadcast Addresses
            • VPC DNS
          • AWS Snow Family
      • Architecting to scale
        • Scaling Concepts and Services
          • Auto-Scaling
          • Compute Optimizer
          • Kinesis
          • DynamoDB Scaling
          • CloudFront Part Duex
            • CloudFront's Behavior
            • Lambda@Edge and CloudFront Functions
        • Event-Driven Architecture
          • SNS and Fan-out Architecture
            • SNS & outbox pattern
          • AWS Messaging Services: SQS and Amazon MQ
          • Lab: Scaling EC2 Using SQS
          • Lambda
          • Scaling Containers in AWS
          • Step Function and Batch
          • Elastic MapReduce
          • AWS Data Monitoring and Visualization Services
      • Business Continuity
        • AWS High Availability and Disaster Recovery
        • AWS Disaster Recovery Architectures
        • EBS Volumes
        • AWS Compute Options for High Availability
        • AWS Database High Availability Options
        • AWS Network High Availability Options
        • Lab: Connect Multiple VPCs with Transit Gateway
        • Deployment and Operations Management
          • Software Deployment Strategies
            • AWS CI/CD
            • Elastic Beanstalk
              • Elastic Beanstalk and App Runner
            • CloudFormation
            • Cross-Account Infrastructure Deployment
              • Example Code Pipeline
            • AWS Container Services
            • AWS API Gateway
            • LAB: Understanding CloudFormation Template Anatomy
          • Management Tool
            • Config and OpsWorks
            • System Manager
            • Enterprise Apps
            • AWS Machine Learning Landscape
            • AWS IoT Services
      • Cost Management and Optimization
        • Concepts
        • AWS Cost Optimization Strategies
        • AWS Tagging and Resource Groups
        • Managing Costs Across AWS Accounts
        • AWS Instance Purchasing Options
        • AWS Cost Management Tools
      • Others
        • SCPs vs AWS Config
        • Questions notes
        • Comparison of Deployment Strategies in AWS
        • Bedrock vs EMR
        • Software Deployment Strategies
    • AWS
      • Others
        • AWS Example architectures
          • Gaming application
          • Digital Payment System
            • Marketplace Application
            • Analytics & Reporting System MVP
            • Reporting System 2
            • Data Pipeline
            • Monitoring and visualization solution for your event-driven architecture (EDA) in AWS.
              • Visualize how services are linked together for each business flow
              • Visualize flow and metrics
            • Reporting
            • Data
        • AWS Key Learning
        • AWS NFRs
          • AWS Integration Pattern Comparison Matrix
          • AWS 99.999% Architecture
        • AWS Best Practices
          • use S3 for data migration
          • Principle of centralized control
          • For CPU Spikes in DB use RDS Proxy
          • API Security
          • Lambda VS ECS
          • Use CloudFront for Dynamic content
        • ECS Sizing
        • AWS Q&A
          • AWS Prep
          • prepexam
          • Big Data/ AI Q&A
          • DB Q&A
          • AWS Application Servers Q&A
          • General Q&A
          • VPC Q&A
      • DRs
      • AI, Analytics, Big Data, ML
        • EMR
          • Flink
          • Spark
          • Hadoop
            • Hive
        • Extra
          • Glue and EMR
          • Redshift Use Cases
        • AI
          • Media Services (Elastic Transcoder, Kinesis)
          • Textract
          • Rekognition (part of the exam)
          • Comprehend
          • Kendra
          • Fraud Detector
          • Transcribe, Polly, Lex
          • Translate
          • Time-series and Forecast
        • Big Data
          • Processing & Analytics
            • Amazon Athena VS Amazon Redshift
            • Athena & AWS Glue: Serverless Data Solutions
          • BigData Storage Solutions
          • EMR
        • Business intelligence
        • Sagemaker
          • SageMaker Neo
          • Elastic Inference (EI)
          • Integration patterns with Amazon SageMaker
          • Common Amazon SageMaker Endpoint usage patterns
          • Real-time interfaces
          • ML Example
        • Machine Learning
          • Data Engineering
            • Understanding Data Preparation
            • Feature Engineering: Transforming Raw Data into Powerful Model Inputs
            • Feature Transformation and Scaling in Machine Learning
            • Data Binning: Transforming Continuous Data into Meaningful Categories
          • Exploratory Data Analysis
            • Labs
              • Perform Feature Engineering Using Amazon SageMaker
            • Categorical Data Encoding: Converting Categories to Numbers
            • Text Feature Extraction for Machine Learning
            • Feature Extraction from Images and Speech: Understanding the Fundamentals
            • Dimensionality Reduction and Feature Selection in Machine Learning
          • Modelling
            • Prerequisites for Machine Learning Implementation
            • Classification Algorithms in Machine Learning
            • Understanding Regression Algorithms in Machine Learning
            • Time Series Analysis: Fundamentals and Applications
            • Clustering Algorithms in Machine Learning
      • Databases
        • Capturing data modification events
        • Time-Series Data (Amazon Timestream)
        • Graph DBs
          • Amazon Neptune
        • NoSQL
          • Apache Cassandra (Amazon Keyspaces)
          • Redshift
            • Redshift's ACID compliance
          • MongoDB (Amazon DocumentDB)
          • DynamoDB
            • Additional DynamoDB Features and Concepts
            • DynamoDB Consistency Models and ACID Properties
            • DynamoDB Partition Keys
          • Amazon Quantum Ledger DB (QLDB)
        • RDS
          • DR for RDS
          • RDS Multi-AZ VS RDS Proxy
          • Scaling Relational Databases
          • Aurora Blue/Green deployments
          • Aurora (Provisioned)
          • Amazon Aurora Serverless
        • Sharing RDS DB instance with an external auditor
      • Caching
        • DAX Accelerator
        • ElastiChache
        • CloudFront (External Cache)
        • Global Accelerator (GA)
      • Storages
        • S3
          • MFA Delete VS Object Lock
          • S3 Standard VS S3 Intelligent-Tiering
        • Instance Storage
        • EBS Volumes
          • Burst Capacity & Baseline IOPS
          • Provisioned IOPS vs GP3
          • EBS Multi-Attach
        • Snapshots
        • AWS Backup
        • File Sharing
          • FSx (File system for Windows or for Lustre)
          • EFS (Elastic File System)
      • Migration
        • Migration Hub
        • Application Discovery Service
        • Snow Family
        • DMS
        • SMS (Server Migration Service)
        • MGN (Application Migration Service)
        • Transfer family
        • DataSync
        • Storage Gateway
          • Volume gateway
          • Tape Gateway
          • File Gateway
          • Storage Gateway Volume Gateway VS Storage Gateway File Gateway
        • DataSync VS Storage Gateway File Gateway
      • AWS Regional Practices and Data Consistency Regional Isolation and Related Practices
      • Front End Web application
        • Pinpoint
        • Amplify
        • Device Farm
      • Glossary
      • Governance
        • Well-Architected Tool
        • Service Catalog and Proton
          • AWS Service Catalog
          • AWS Proton
        • AWS Health
        • AWS Licence Manager
        • AWS Control Tower
        • AWS Trusted Advisor
        • Saving Plans
        • AWS Compute Optimizer
        • AWS CUR
        • Cost Explorer and Budgets
        • Directory Service
        • AWS Config
        • Cross-Account Role Access
        • Resource Access Manager (RAM)
        • Organizations, Accouts, OU, SCP
      • Automation
        • System Manager (mainly for inside EC2 instances)
        • Elastic Beanstalk (for simple solutions)
        • IaC
          • SAM
          • CloudFormation
            • !Ref VS !GetAtt
            • CloudFormation examples
      • Security
        • Identity Management Services
          • IAM
            • Identity, Permission, Trust and Resource Policies
              • IAM Policy Examples
              • Trust policy
            • IAM roles cannot be attached to IAM Groups
            • AWS IAM Policies Study Guide
            • Cross-Account Access in AWS: Resource-Based Policies vs IAM Roles
            • EC2 instance profile VS Trust policy
          • Cognito
        • STS
        • AI based security
          • GuardDuty
          • Macie (S3)
        • AWS Network Firewall
        • Security Hub
        • Detective (Root Cause Analysis)
        • Inspector (EC2 and VPCs)
        • System Manager Parameter Store
        • Secret Manager
          • Secret Manger VS System Manager's Parameter Store
          • Secret Manager VS AWS KMS
        • Shield
          • DDoS
        • KMS vs CloudHSM
        • Firewall Manager
        • AWS WAF
      • Compute
        • Containers
          • ECS
            • ECS Anywhere
          • EKS
            • EKS Anywhere
          • Fargate
            • ECS Fargate VS EKS Fargate
          • ECR (Elastic Container Registry)
        • EC2
          • EC2 Purchase Options
            • Spot instances VS Spot Fleet
          • EC2 Instance Types
            • T Instance Credit System
          • Auto Scaling Groups (ASG)
          • Launch Template vs. Launch Configuration
          • AMI
          • EC2 Hibernation
        • Lambda
          • Publish VS deploy
      • Data Pipeline
      • ETL
      • AppFlow
      • AppSync
      • Step Functions
      • Batch
        • Spring Boot Batch VS AWS Batch
      • Decoupling Workflow
      • Elastic Load Balancers
      • Monitoring
        • OpenSearch
        • CloudWatch Logs Insights VS AWS X-Ray
        • QuickSight
        • Amazon Managed Service for Prometheus
        • Amazon Managed Grafana
        • CloudWatch Logs Insights
          • CloudWatch Logs Insights VS Kibana VS Grafana
        • CloudWatch Logs
        • CloudTrail
        • CloudWatch
        • X-Ray
      • On-Premises
        • ECS/EKS Anyware
        • SSM Agent
      • Serverless Application Repository
      • Troubleshooting
      • Messaging, Events and Streaming
        • Kinesis (Event Stream)
        • EventBridge (Event Router)
          • EventBridge Rule Example
          • EventBridge vs Apache Kafka
          • EventBridge VS Kinesis(Event Stream)
          • Event Bridge VS SNS
        • SNS (Event broadcaster)
        • SQS (Message Queue)
        • MSK
        • Amazon MQ
        • DLQ
    • Software Design
      • CloudEvents
        • CloudEvents Kafka
      • Transaction VS Operation DBs
      • Event-based Microservices
        • Relations database to event messages
      • Hexagonal Architecture with Java Spring
      • Distributed Systems using DDD
        • Scaling a distributed system
        • Zookeeper
        • Aggregates
        • Bounded Context
      • API Gateway
      • Cloud
        • The Twelve Factors
        • Open Service Broker API
      • Microservices
    • Design technique
    • Technologies
      • Kafka
      • Docker
        • Docker Commands
        • Artifactory
        • Dockerfile
      • ReactJs
        • Progressive Web App (PWA)
        • Guide to File Extensions in React Projects
    • Guides
      • OCP
      • AWS
        • Creating and Assuming an Administrator AWS IAM Role
        • Standing Up an Amazon Aurora Database with an Automatically Rotated Password Using AWS Secrets Manag
        • Standing Up an Apache Web Server EC2 Instance and Sending Logs to Amazon CloudWatch
        • Creating a Custom AMI and Deploying an Auto Scaling Group behind an Application Load Balancer
        • Assigning Static IPs to NLBs with ALB Target Groups
        • Hosting a Wordpress Application on ECS Fargate with RDS, Parameter Store, and Secrets Manager
        • Amazon Athena, Amazon S3, and VPC Flow Logs
      • Creating a CloudTrail Trail and EventBridge Alert for Console Sign-Ins
      • Load Balancer VS Reverse Proxy
      • Health check
      • Load Balancer
      • HTTP Protocol
      • TCP/IP Network Model
      • Event-base Microservices Implementation Guideline
      • How to write a service
      • Observability
      • Kafka Stream
      • Security
        • Securing Properties
          • HashiCorp Vault
      • Kubernates
      • Unix
        • Networking
        • Firewall
        • File system
        • alternatives
      • Setup CentOS 8 and Docker
    • Dev Tools
      • Docker Commands
      • Intellij
      • CheatSheets
        • Unix Commands
        • Vim Command
      • Templates
  • Working for an enterprise
    • Next step
    • Job roles
      • SME role
    • Common issues
Powered by GitBook
On this page
  • Core Concepts and Definitions
  • Service Level Concepts
  • Real-World Example: The S3 Outage of 2017
  • Types of Disasters
  • Best Practices
  • Werner Vogels' Principle
  • Implementation Considerations

Was this helpful?

  1. Software development
  2. AWS Pro
  3. Business Continuity

AWS High Availability and Disaster Recovery

PreviousBusiness ContinuityNextAWS Disaster Recovery Architectures

Last updated 5 months ago

Was this helpful?

Core Concepts and Definitions

Business Continuity vs. Disaster Recovery

  • Business Continuity (BC)

    • Focus on minimizing business disruption

    • Preventive and planning measures

    • Defines acceptable service levels

  • Disaster Recovery (DR)

    • Response to events threatening continuity

    • Reactive measures and procedures

    • Implementation of recovery plans

Fault Tolerance vs. High Availability

  • Fault Tolerance

    • Complete ability to tolerate faults

    • No user-visible disruption

    • Often more expensive to implement

    • Zero downtime objective

  • High Availability

    • Allows for minimal unplanned downtime

    • More cost-effective approach

    • Balance between reliability and cost

    • Accepts some potential disruption

Service Level Concepts

Service Level Agreements (SLA)

  • Commitments for quality/availability

  • Not absolute guarantees

  • May include compensation provisions

  • Requires customer documentation for claims

Recovery Objectives

  1. Recovery Time Objective (RTO)

    • Time to restore business processes

    • Measured from disruption to recovery

    • Defines acceptable downtime

    • Key factor in BC planning

  2. Recovery Point Objective (RPO)

    • Acceptable data loss measured in time

    • Gap between last backup and incident

    • Influences backup strategy

    • Drives data protection requirements

Real-World Example: The S3 Outage of 2017

Incident Overview

  • Date: February 28, 2017

  • Location: AWS Northern Virginia (us-east-1)

  • Duration: 252 minutes

  • Cause: Human error during system maintenance

Impact Analysis

  • Exceeded 99.99% availability goal

  • Affected multiple dependent services

  • Disrupted AWS's own health dashboard

  • Demonstrated cascading failure risks

Lessons Learned

  • Human error can bypass safeguards

  • Dependencies create vulnerability chains

  • Need for cross-region resilience

  • Importance of proper change management

Types of Disasters

Infrastructure Failures

  1. Hardware Failures

    • Network equipment malfunctions

    • Storage system crashes

    • Physical infrastructure issues

  2. Software Failures

    • Deployment errors

    • Configuration mistakes

    • Application crashes

External Threats

  1. Load-Related

    • DDoS attacks

    • Traffic spikes

    • Resource exhaustion

  2. Infrastructure

    • Fiber cuts

    • Power outages

    • Facility issues

Data and System Issues

  1. Data-Induced

    • Corruption

    • Type conversion errors

    • Invalid data processing

  2. Credential Issues

    • Certificate expirations

    • Key rotations

    • Authentication failures

Resource Limitations

  1. Capacity Issues

    • Instance type unavailability

    • Storage limitations

    • Network bandwidth constraints

  2. Identifier Exhaustion

    • IP address depletion

    • Resource ID limitations

    • Namespace conflicts

Best Practices

Planning and Prevention

  • Regular backup testing

  • Documentation maintenance

  • Staff training

  • Change management procedures

Implementation Strategies

  1. Multi-AZ Deployment

    • Cross-zone redundancy

    • Load balancing

    • Automatic failover

  2. Cross-Region Solutions

    • Geographic distribution

    • Data replication

    • Traffic routing

  3. Dependency Management

    • Service decoupling

    • Circuit breakers

    • Fallback mechanisms

Monitoring and Response

  • Real-time monitoring

  • Automated alerts

  • Incident response procedures

  • Regular testing and drills

Werner Vogels' Principle

"Everything fails all the time" - Werner Vogels, Amazon CTO

This principle emphasizes:

  • Assume failure will occur

  • Design for resilience

  • Plan for recovery

  • Test failure scenarios

Implementation Considerations

Cost vs. Reliability

  • Balance between protection and expense

  • Tiered recovery solutions

  • Risk-based investments

  • Cost-benefit analysis

Testing Requirements

  • Regular DR drills

  • Failure simulation

  • Recovery validation

  • Documentation updates

Compliance and Reporting

  • SLA monitoring

  • Incident documentation

  • Recovery metrics

  • Compliance validation