Cloud Infrastructure

AWS Cloud Migration: Zero Downtime, Six Months

Challenge: The managed hosting provider had become a bottleneck, with uneven performance and slow support responses. Infrastructure wasn't scaling with business growth, so 100% of it had to move from 4 data centers to AWS with no downtime for revenue-generating systems handling $1B+ in annual transactions.


Executive Summary

Led a complete AWS cloud migration for an affiliate marketing platform in 6 months with zero downtime for revenue-generating systems. Migrated 100% of infrastructure from managed hosting across 4 data centers to AWS while also modernizing observability, introducing DevSecOps practices, and establishing deployment automation.

Key Results:

  • Zero downtime for revenue-generating systems handling $1B+ annual transactions
  • 100% infrastructure migrated in 6 months
  • 4 data centers → 1 cloud platform consolidation
  • 400%+ deployment frequency increase through GitOps automation
  • MTTR improved from days to <1 hour for critical systems

The Challenge

Business Context

Situation (2022):

  • Managed hosting provider became performance and support bottleneck
  • Uneven performance, slow support responses
  • Infrastructure not scaling with business needs
  • Risk of service disruption from vendor dependency

Executive Decision:

  • Migrate everything to AWS
  • Timeline: 6 months (aggressive)
  • Constraint: Zero tolerance for revenue disruption
  • Budget approved for migration consultants

Technical Scope

Infrastructure to Migrate:

  • 100% of production systems across 4 data centers
  • Mix of physical servers and VMware virtual machines
  • Legacy SQL Server databases (2012 Standard Edition)
  • Windows file clusters
  • Web services and application servers
  • Network infrastructure (firewalls, load balancers)

Scale:

  • Production systems handling $1B+ annual transaction volume
  • 24/7 operations with high availability requirements
  • Legacy systems with 15+ years of accumulated complexity

Risk Profile:

  • Revenue disruption unacceptable
  • Customer trust at stake
  • Competitive market sensitive to service quality
  • Acquisition integration pressures

The Approach

Phase 1: Strategy & Planning

Migration Strategy Evaluation:

Option A: Lift-and-Shift

  • Move infrastructure as-is to AWS
  • Minimal code changes
  • Fastest path to cloud
  • Selected for most systems

Option B: Re-architecture

  • Redesign for cloud-native patterns
  • Microservices, containers, serverless
  • Maximum cloud benefits
  • Too slow for 6-month timeline

Hybrid Approach Selected:

  • Lift-and-shift foundation
  • Targeted modernization where the payoff was high
  • Phased approach enabling incremental improvement

Cost/Risk/Effort Assessment:

  • Produced detailed analysis for leadership
  • Mapped dependencies and migration waves
  • Identified technical debt requiring remediation
  • Established success criteria and risk thresholds
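
Mapping dependencies into migration waves can be sketched as a topological grouping: each wave contains only systems whose dependencies sit in earlier waves. This is a minimal illustration, not the planning tool used on the project; the system names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_waves(depends_on):
    """Group systems into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical map: system -> systems it depends on.
example = {
    "database": [],
    "file_shares": [],
    "web_services": ["database", "file_shares"],
    "reporting": ["database"],
}
print(plan_waves(example))
```

Systems in the same wave have no ordering constraint between them, so they can be scheduled in parallel or split further by risk.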

Consultant Onboarding:

  • Engaged external migration specialists
  • Provided architecture briefings on ShareASale systems
  • Shared documentation of quirks and gotchas
  • Established clear scope and deliverables

Phase 2: Observability Migration First

Why Observability First:

  • Can’t manage what you can’t measure
  • Establish visibility before making changes
  • Enable rapid incident response
  • Validate migration success with metrics

From LogicMonitor to DataDog + CloudWatch:

LogicMonitor Context:

  • Tied to managed hosting vendor being replaced
  • Adequate but not cloud-native
  • Migration opportunity to upgrade tooling

DataDog Benefits:

  • Awin Global standard (alignment with parent company)
  • Superior cloud-native integrations
  • Better APM and distributed tracing
  • More powerful query and alerting

Migration Execution:

  • Audited existing monitors and alerts
  • Identified gaps in coverage
  • Created new DataDog pipelines and dashboards
  • Integrated OpsGenie for on-call management
  • Established SLO-based monitoring
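
SLO-based monitoring usually comes down to tracking how much of an error budget remains. The sketch below shows that arithmetic in general terms; the SLO target and request counts are hypothetical and are not figures from this migration.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


# Hypothetical: 99.9% availability target, 1,000,000 requests, 250 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget remains")
```

An alert on budget burn rate, rather than on every individual failure, keeps paging tied to what customers actually experience.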

Severity Classifications:

  • Critical (Severity 1): Revenue-impacting, 15-min response SLA
  • High (Severity 2): Customer-facing issues, 1-hour response
  • Medium (Severity 3): Internal systems, 4-hour response
  • Low (Severity 4): Non-urgent, next business day
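
The severity tiers above can be written down as data so that alert routing and reporting share one definition. This is an illustrative sketch only; the 17:00 cutoff for "next business day" and the Monday-to-Friday week without holidays are assumptions, not details from the original setup.

```python
from datetime import datetime, time, timedelta
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


# Response targets taken from the severity list above.
RESPONSE_TARGETS = {
    Severity.CRITICAL: timedelta(minutes=15),
    Severity.HIGH: timedelta(hours=1),
    Severity.MEDIUM: timedelta(hours=4),
}


def next_business_day(day):
    """Next Monday-to-Friday date after `day` (holidays ignored: assumption)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def response_deadline(severity, opened_at):
    """Latest acceptable first response for an alert opened at `opened_at`."""
    if severity == Severity.LOW:
        # Assumed cutoff: 17:00 on the next business day.
        return datetime.combine(next_business_day(opened_at.date()), time(17, 0))
    return opened_at + RESPONSE_TARGETS[severity]
```

For example, a Severity 1 alert opened at 10:00 has a response deadline of 10:15, while a Severity 4 alert opened on a Friday is due the following Monday.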

Result:

  • Comprehensive observability before migration began
  • Clear metrics for validating each migration wave
  • Rapid incident detection and response capability
  • Foundation for DevSecOps culture shift

Phase 3: Infrastructure Migration Waves

Wave 1: SQL Server → AWS RDS

Source System:

  • SQL Server 2012 Standard Edition
  • Aging hardware approaching capacity
  • Limited memory and I/O
  • Manual backup processes

Target Architecture:

  • AWS RDS SQL Server 2019 Enterprise Edition
  • Always On Availability Groups (high availability)
  • Automated backups with point-in-time recovery
  • Multi-AZ deployment for failover

Migration Process:

  • Schema and data profiling
  • Performance testing in staging
  • Phased cutover with fallback plan
  • Monitoring and validation
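
One common validation step for a database cutover is comparing per-table row counts between source and target. The sketch below shows only that comparison, assuming the counts have already been collected from each side; the table names and numbers are hypothetical, and this is not necessarily the check used in this migration.

```python
def diff_row_counts(source_counts, target_counts):
    """Return tables whose counts differ or that exist on only one side."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)  # None means the table is missing
        dst = target_counts.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches


# Hypothetical counts gathered from the source and target databases.
source = {"dbo.Orders": 1200, "dbo.Clicks": 50000, "dbo.Audit": 10}
target = {"dbo.Orders": 1200, "dbo.Clicks": 49990}
print(diff_row_counts(source, target))
```

Row counts catch missing data but not altered data, so a checksum per table is a natural follow-up where the stakes justify it.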

Wave 2: File Clusters → AWS FSx

Source System:

  • Windows file clusters on VMware
  • Shared storage for application assets
  • Manual replication processes
  • Limited disaster recovery

Target Architecture:

  • AWS FSx for Windows File Server
  • Multi-AZ deployment
  • Automated backups
  • Integration with Active Directory

Benefits:

  • High-performance file storage
  • Storage capacity and throughput that can be increased on demand
  • Built-in disaster recovery
  • Simplified management

Wave 3: Web Services → EC2 + Auto Scaling

Source System:

  • Physical and virtual web servers
  • Manual scaling processes
  • Limited fault tolerance

Target Architecture:

  • EC2 instances with Auto Scaling Groups
  • Application Load Balancers (ALB)
  • Launch templates for consistency
  • Auto Scaling policies based on metrics
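
The idea behind a metric-based scaling policy can be sketched as a proportional rule: scale the fleet by the ratio of the observed metric to its target, then clamp to the group's limits. This is a simplified illustration, not the algorithm AWS Auto Scaling actually runs; the function name and the numbers are hypothetical.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Proportional scaling: keep the per-instance metric near its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, raw))  # clamp to the group's bounds


# Hypothetical: 4 instances at 75% average CPU with a 50% target.
print(desired_capacity(4, 75.0, 50.0, min_size=2, max_size=10))
```

Rounding up errs toward spare capacity, which is usually the safer direction for revenue-generating traffic.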

Modernization:

  • GitOps-based deployment workflows
  • Immutable infrastructure patterns
  • Blue/green deployment capability
  • Automated testing in pipeline

Wave 4: Network Configuration

Challenges:

  • Translate firewall rules to Security Groups
  • Load balancer configuration migration
  • Network segmentation design
  • VPN connectivity to legacy systems during transition

Solution:

  • VPC design with public/private subnets
  • Security Groups replacing firewalls
  • Network ACLs for additional protection
  • Transit Gateway for hybrid connectivity
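
Translating firewall rules into Security Groups is mostly a reshaping exercise, with one catch: Security Groups are allow-only, so deny rules have to be handled elsewhere (for example in Network ACLs). The sketch below converts a hypothetical rule export into the `IpPermissions` structure accepted by the EC2 `authorize_security_group_ingress` call; the input field names are invented for illustration.

```python
import ipaddress


def to_ip_permissions(firewall_rules):
    """Split rules into Security Group permissions and skipped deny rules."""
    permissions, skipped = [], []
    for rule in firewall_rules:
        if rule["action"] != "allow":
            skipped.append(rule["name"])  # needs a NACL or manual review
            continue
        ipaddress.ip_network(rule["source_cidr"])  # raises ValueError if invalid
        permissions.append({
            "IpProtocol": rule["protocol"],
            "FromPort": rule["port_start"],
            "ToPort": rule["port_end"],
            "IpRanges": [{"CidrIp": rule["source_cidr"],
                          "Description": rule["name"]}],
        })
    return permissions, skipped


# Hypothetical export from a legacy firewall.
rules = [
    {"name": "https-in", "action": "allow", "protocol": "tcp",
     "port_start": 443, "port_end": 443, "source_cidr": "0.0.0.0/0"},
    {"name": "block-legacy", "action": "deny", "protocol": "tcp",
     "port_start": 23, "port_end": 23, "source_cidr": "10.0.0.0/8"},
]
permissions, skipped = to_ip_permissions(rules)
```

Returning the skipped rules explicitly keeps deny logic from being silently dropped during translation.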

Phase 4: Cutover & Validation

Phased Cutover Strategy:

  • Non-critical systems first
  • Canary deployments for production
  • 24/7 monitoring during transition
  • Rapid rollback capability
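
A canary cutover with rollback can be sketched as a loop over traffic weights that stops and reverts as soon as the observed error rate crosses a threshold. This is a minimal illustration under stated assumptions: `error_rate_at` is a hypothetical callback standing in for a monitoring query, and the weights and threshold are invented.

```python
def run_canary(weights, error_rate_at, max_error_rate):
    """Step traffic through `weights`; roll back to 0% on a bad reading."""
    for weight in weights:
        observed = error_rate_at(weight)  # e.g. read from monitoring
        if observed > max_error_rate:
            return {"final_weight": 0, "rolled_back_at": weight}
    return {"final_weight": weights[-1], "rolled_back_at": None}


# Hypothetical run: errors spike once half the traffic is shifted.
result = run_canary(
    weights=[5, 25, 50, 100],
    error_rate_at=lambda w: 0.001 if w < 50 else 0.02,
    max_error_rate=0.01,
)
print(result)
```

In practice each step would also wait for a soak period before reading the metric, which is omitted here for brevity.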

Contingency Planning:

  • Detailed rollback procedures
  • Communication plans for stakeholders
  • War room coordination during cutover
  • Post-mortem process for issues

Validation:

  • Synthetic monitoring for health checks
  • Performance comparison vs. baseline
  • Customer-facing testing
  • Business metric validation (transaction volumes, etc.)
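
Comparing post-cutover metrics against a baseline can be sketched as a tolerance check per metric, taking into account whether higher or lower values are better. The metric names, values, and the 10% tolerance below are hypothetical.

```python
def regressions(baseline, current, higher_is_better=(), tolerance=0.10):
    """Return metrics that got worse than baseline by more than `tolerance`."""
    failed = {}
    for name, base in baseline.items():
        if base == 0:
            raise ValueError(f"baseline for {name} must be non-zero")
        change = (current[name] - base) / base  # relative change
        worse = -change if name in higher_is_better else change
        if worse > tolerance:
            failed[name] = round(change, 4)
    return failed


# Hypothetical readings before and after a migration wave.
before = {"p95_latency_ms": 200.0, "transactions_per_min": 1000.0}
after = {"p95_latency_ms": 230.0, "transactions_per_min": 980.0}
print(regressions(before, after, higher_is_better={"transactions_per_min"}))
```

Here latency is flagged (15% slower), while the 2% dip in transaction volume stays within tolerance.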

Results

Migration Outcomes

Execution:

  • Zero downtime for revenue-generating systems
  • 100% infrastructure migrated successfully in 6 months
  • 4 data centers → 1 cloud platform (AWS us-east-1)
  • All waves completed on or ahead of schedule

Operational Improvements:

  • Removed dependency on dysfunctional managed services provider
  • Gained direct infrastructure control
  • Enabled rapid scaling and provisioning
  • Simplified disaster recovery

Recognition:

  • Featured in AWS case study (since removed from public site)
  • Executive satisfaction with delivery
  • Zero customer impact during migration
  • Team morale boost from successful delivery

Cultural Transformation

DevSecOps Practices:

  • Introduced “Shift Left” security principles
  • Code review and automated testing
  • Security scanning in CI/CD pipeline
  • Compliance-as-code patterns

GitOps Workflows:

  • Migrated from SVN to GitLab
  • Git-based deployment automation
  • Infrastructure as Code (Terraform concepts)
  • Version-controlled configuration

Deployment Automation:

  • 400%+ deployment frequency increase
  • Manual deployments → GitLab CI automation
  • Reduced human error through automation
  • Faster time-to-market for features
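
For readers checking the arithmetic: a 400% increase means five times the original rate, not four. The deployment counts below are hypothetical and serve only to show the calculation.

```python
def percent_increase(before, after):
    """Percentage increase from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Hypothetical: 4 deployments per month rising to 20 per month.
print(percent_increase(4, 20))
```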

Incident Response:

  • MTTR: Days → <1 hour for critical systems
  • MTTR: Hours → ~5 minutes for highest-priority systems
  • Comprehensive monitoring and alerting
  • Runbooks and escalation procedures
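
MTTR is the average time from detection to resolution across incidents. A minimal sketch of the calculation, using hypothetical incident timestamps rather than real incident data:

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average of (resolved - detected) over (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents),
                timedelta())
    return total / len(incidents)


# Hypothetical incidents: one resolved in 30 minutes, one in 50.
incidents = [
    (datetime(2022, 9, 1, 10, 0), datetime(2022, 9, 1, 10, 30)),
    (datetime(2022, 9, 8, 14, 0), datetime(2022, 9, 8, 14, 50)),
]
print(mean_time_to_recovery(incidents))
```

Measuring from detection rather than from the start of the fault keeps the metric focused on response, while time-to-detect is tracked separately.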

Lessons Learned

What Worked Well

Observability First:

  • Having visibility before migration was critical
  • Enabled rapid issue detection and resolution
  • Validated migration success with metrics
  • Built confidence in migration process

Consultant Partnership:

  • External expertise accelerated execution
  • Knowledge transfer improved internal capability
  • Risk mitigation through experience
  • But required strong internal coordination

Phased Approach:

  • De-risked migration through incremental waves
  • Enabled learning and course corrections
  • Maintained business continuity
  • Rollback capability provided safety net

What I’d Do Differently

More Aggressive Modernization:

  • Lift-and-shift was safe but left technical debt
  • Could have re-architected more systems
  • Containers would have simplified operations
  • But timeline pressure justified pragmatism

Earlier Automation:

  • Some manual processes remained post-migration
  • Should have automated more during migration
  • Infrastructure as Code adoption could have been faster

Better Documentation:

  • Under time pressure, documentation lagged
  • Should have documented architecture decisions in real time
  • Knowledge transfer could have been more systematic

What This Demonstrates

For Cloud Migration Roles:

  • Complete AWS migration execution experience
  • Zero-downtime migration methodology
  • 6-month aggressive timeline delivered
  • Risk management and mitigation

For SRE / Infrastructure Roles:

  • High-availability architecture design
  • Disaster recovery planning and execution
  • Observability platform migration (DataDog, CloudWatch)
  • Incident response process establishment

For DevOps / Platform Engineering:

  • GitOps workflow implementation
  • CI/CD pipeline establishment
  • Deployment automation (400%+ frequency increase)
  • Infrastructure modernization

For Technical Leadership:

  • Multi-phase program execution
  • Consultant management and coordination
  • Stakeholder communication (C-level to engineers)
  • Cultural transformation (DevSecOps adoption)

Technologies Used

AWS Services:

  • RDS (SQL Server with Always On)
  • FSx (Windows File Server)
  • EC2 (Auto Scaling Groups, Launch Templates)
  • VPC (networking, Security Groups, NACLs)
  • Application Load Balancer
  • CloudWatch (monitoring and logging)

Observability:

  • DataDog (APM, logs, metrics, dashboards)
  • CloudWatch (AWS-native monitoring)
  • OpsGenie (incident management, on-call rotation)

Development & Deployment:

  • GitLab (source control, CI/CD)
  • GitLab CI (pipeline automation)
  • PowerShell (automation scripts)
  • Terraform concepts (infrastructure as code)

Migration Tools:

  • AWS Server Migration Service
  • Database migration utilities
  • Custom scripts for configuration replication

Contact

Have you led AWS cloud migrations, or are you considering one for your organization? Let’s discuss zero-downtime strategies, observability-first approaches, and cultural transformation.

Get in Touch: stevenleve.com/contact
LinkedIn: linkedin.com/in/steve-leve

