AWS Cloud Migration: Zero Downtime, Six Months
Executive Summary
Led complete AWS cloud migration for affiliate marketing platform in 6 months with zero downtime for revenue-generating systems. Migrated 100% of infrastructure from managed hosting across 4 data centers to AWS, while simultaneously modernizing observability, introducing DevSecOps practices, and establishing deployment automation.
Key Results:
- Zero downtime for revenue-generating systems handling $1B+ annual transactions
- 100% infrastructure migrated in 6 months
- 4 data centers → 1 cloud platform consolidation
- 400%+ deployment frequency increase through GitOps automation
- MTTR improved from days to <1 hour for critical systems
The Challenge
Business Context
Situation (2022):
- Managed hosting provider became performance and support bottleneck
- Uneven performance, slow support responses
- Infrastructure not scaling with business needs
- Risk of service disruption from vendor dependency
Executive Decision:
- Migrate everything to AWS
- Timeline: 6 months (aggressive)
- Constraint: Zero tolerance for revenue disruption
- Budget approved for migration consultants
Technical Scope
Infrastructure to Migrate:
- 100% of production systems across 4 data centers
- Mix of physical servers and VMware virtual machines
- Legacy SQL Server databases (2012 Standard Edition)
- Windows file clusters
- Web services and application servers
- Network infrastructure (firewalls, load balancers)
Scale:
- Production systems handling $1B+ annual transaction volume
- 24/7 operations with high availability requirements
- Legacy systems with 15+ years of accumulated complexity
Risk Profile:
- Revenue disruption unacceptable
- Customer trust at stake
- Competitive market sensitive to service quality
- Acquisition integration pressures
The Approach
Phase 1: Strategy & Planning
Migration Strategy Evaluation:
Option A: Lift-and-Shift
- Move infrastructure as-is to AWS
- Minimal code changes
- Fastest path to cloud
- Selected for most systems
Option B: Re-architecture
- Redesign for cloud-native patterns
- Microservices, containers, serverless
- Maximum cloud benefits
- Too slow for 6-month timeline
Hybrid Approach Selected:
- Lift-and-shift foundation
- Targeted modernization where high-value
- Phased approach enabling incremental improvement
Cost/Risk/Effort Assessment:
- Produced detailed analysis for leadership
- Mapped dependencies and migration waves
- Identified technical debt requiring remediation
- Established success criteria and risk thresholds
Consultant Onboarding:
- Engaged external migration specialists
- Provided architecture briefings on ShareASale systems
- Shared documentation of quirks and gotchas
- Established clear scope and deliverables
Phase 2: Observability Migration First
Why Observability First:
- Can’t manage what you can’t measure
- Establish visibility before making changes
- Enable rapid incident response
- Validate migration success with metrics
From LogicMonitor to DataDog + CloudWatch:
LogicMonitor Context:
- Tied to managed hosting vendor being replaced
- Adequate but not cloud-native
- Migration opportunity to upgrade tooling
DataDog Benefits:
- Awin Global standard (alignment with parent company)
- Superior cloud-native integrations
- Better APM and distributed tracing
- More powerful query and alerting
Migration Execution:
- Audited existing monitors and alerts
- Identified gaps in coverage
- Created new DataDog pipelines and dashboards
- Integrated OpsGenie for on-call management
- Established SLO-based monitoring
Severity Classifications:
- Critical (Severity 1): Revenue-impacting, 15-min response SLA
- High (Severity 2): Customer-facing issues, 1-hour response
- Medium (Severity 3): Internal systems, 4-hour response
- Low (Severity 4): Non-urgent, next business day
Result:
- Comprehensive observability before migration began
- Clear metrics for validating each migration wave
- Rapid incident detection and response capability
- Foundation for DevSecOps culture shift
Phase 3: Infrastructure Migration Waves
Wave 1: SQL Server → AWS RDS
Source System:
- SQL Server 2012 Standard Edition
- Aging hardware approaching capacity
- Limited memory and I/O
- Manual backup processes
Target Architecture:
- AWS RDS SQL Server 2019 Enterprise Edition
- Always On Availability Groups (high availability)
- Automated backups with point-in-time recovery
- Multi-AZ deployment for failover
Migration Process:
- Schema and data profiling
- Performance testing in staging
- Phased cutover with fallback plan
- Monitoring and validation
Wave 2: File Clusters → AWS FSx
Source System:
- Windows file clusters on VMware
- Shared storage for application assets
- Manual replication processes
- Limited disaster recovery
Target Architecture:
- AWS FSx for Windows File Server
- Multi-AZ deployment
- Automated backups
- Integration with Active Directory
Benefits:
- High-performance file storage
- Automatic scaling
- Built-in disaster recovery
- Simplified management
Wave 3: Web Services → EC2 + Auto Scaling
Source System:
- Physical and virtual web servers
- Manual scaling processes
- Limited fault tolerance
Target Architecture:
- EC2 instances with Auto Scaling Groups
- Application Load Balancers (ALB)
- Launch templates for consistency
- Auto Scaling policies based on metrics
Modernization:
- GitOps-based deployment workflows
- Immutable infrastructure patterns
- Blue/green deployment capability
- Automated testing in pipeline
Wave 4: Network Configuration
Challenges:
- Translate firewall rules to Security Groups
- Load balancer configuration migration
- Network segmentation design
- VPN connectivity to legacy systems during transition
Solution:
- VPC design with public/private subnets
- Security Groups replacing firewalls
- Network ACLs for additional protection
- Transit Gateway for hybrid connectivity
Phase 4: Cutover & Validation
Phased Cutover Strategy:
- Non-critical systems first
- Canary deployments for production
- 24/7 monitoring during transition
- Rapid rollback capability
Contingency Planning:
- Detailed rollback procedures
- Communication plans for stakeholders
- War room coordination during cutover
- Post-mortem process for issues
Validation:
- Synthetic monitoring for health checks
- Performance comparison vs. baseline
- Customer-facing testing
- Business metric validation (transaction volumes, etc.)
Results
Migration Outcomes
Execution:
- Zero downtime for revenue-generating systems
- 100% infrastructure migrated successfully in 6 months
- 4 data centers → 1 cloud platform (AWS us-east-1)
- All waves completed on or ahead of schedule
Operational Improvements:
- Removed dependency on dysfunctional managed services provider
- Gained direct infrastructure control
- Enabled rapid scaling and provisioning
- Simplified disaster recovery
Recognition:
- Featured in AWS case study (since removed from public site)
- Executive satisfaction with delivery
- Zero customer impact during migration
- Team morale boost from successful delivery
Cultural Transformation
DevSecOps Practices:
- Introduced “Shift Left” security principles
- Code review and automated testing
- Security scanning in CI/CD pipeline
- Compliance-as-code patterns
GitOps Workflows:
- Migrated from SVN to GitLab
- Git-based deployment automation
- Infrastructure as Code (Terraform concepts)
- Version-controlled configuration
Deployment Automation:
- 400%+ deployment frequency increase
- Manual deployments → GitLab CI automation
- Reduced human error through automation
- Faster time-to-market for features
Incident Response:
- MTTR: Days → <1 hour for critical systems
- MTTR: Hours → ~5 minutes for highest-priority systems
- Comprehensive monitoring and alerting
- Runbooks and escalation procedures
Lessons Learned
What Worked Well
Observability First:
- Having visibility before migration was critical
- Enabled rapid issue detection and resolution
- Validated migration success with metrics
- Built confidence in migration process
Consultant Partnership:
- External expertise accelerated execution
- Knowledge transfer improved internal capability
- Risk mitigation through experience
- But required strong internal coordination
Phased Approach:
- De-risked migration through incremental waves
- Enabled learning and course corrections
- Maintained business continuity
- Rollback capability provided safety net
What I’d Do Differently
More Aggressive Modernization:
- Lift-and-shift was safe but left technical debt
- Could have re-architected more systems
- Containers would have simplified operations
- But timeline pressure justified pragmatism
Earlier Automation:
- Some manual processes remained post-migration
- Should have automated more during migration
- Infrastructure as Code adoption could have been faster
Better Documentation:
- Under time pressure, documentation lagged
- Should have documented architecture decisions real-time
- Knowledge transfer could have been more systematic
What This Demonstrates
For Cloud Migration Roles:
- Complete AWS migration execution experience
- Zero-downtime migration methodology
- 6-month aggressive timeline delivered
- Risk management and mitigation
For SRE / Infrastructure Roles:
- High-availability architecture design
- Disaster recovery planning and execution
- Observability platform migration (DataDog, CloudWatch)
- Incident response process establishment
For DevOps / Platform Engineering:
- GitOps workflow implementation
- CI/CD pipeline establishment
- Deployment automation (400%+ frequency increase)
- Infrastructure modernization
For Technical Leadership:
- Multi-phase program execution
- Consultant management and coordination
- Stakeholder communication (C-level to engineers)
- Cultural transformation (DevSecOps adoption)
Technologies Used
AWS Services:
- RDS (SQL Server with Always On)
- FSx (Windows File Server)
- EC2 (Auto Scaling Groups, Launch Templates)
- VPC (networking, Security Groups, NACLs)
- Application Load Balancer
- CloudWatch (monitoring and logging)
Observability:
- DataDog (APM, logs, metrics, dashboards)
- CloudWatch (AWS-native monitoring)
- OpsGenie (incident management, on-call rotation)
Development & Deployment:
- GitLab (source control, CI/CD)
- GitLab CI (pipeline automation)
- PowerShell (automation scripts)
- Terraform concepts (infrastructure as code)
Migration Tools:
- AWS Server Migration Service
- Database migration utilities
- Custom scripts for configuration replication
Contact
Led AWS cloud migrations or considering one for your organization? Let’s discuss zero-downtime strategies, observability-first approaches, and cultural transformation.
Get in Touch: stevenleve.com/contact
LinkedIn: linkedin.com/in/steve-leve