Cloud Infrastructure

AWS Cloud Migration: Zero Downtime, Six Months

Challenge: The managed hosting provider had become a bottleneck, with uneven performance and slow support responses. Infrastructure was not scaling with business growth, so 100% of it, spread across 4 data centers, had to move to AWS without any downtime for revenue-generating systems handling $1B+ in annual transactions.

Executive Summary

Led a complete AWS cloud migration for an affiliate marketing platform in 6 months, with zero downtime for revenue-generating systems. Migrated 100% of infrastructure from managed hosting across 4 data centers to AWS while modernizing observability, introducing DevSecOps practices, and establishing deployment automation.

Key Results:

Zero downtime for revenue-generating systems handling $1B+ annual transactions
100% infrastructure migrated in 6 months
4 data centers → 1 cloud platform consolidation
400%+ deployment frequency increase through GitOps automation
MTTR improved from days to <1 hour for critical systems

The Challenge

Business Context

Situation (2022):

Managed hosting provider became performance and support bottleneck
Uneven performance, slow support responses
Infrastructure not scaling with business needs
Risk of service disruption from vendor dependency

Executive Decision:

Migrate everything to AWS
Timeline: 6 months (aggressive)
Constraint: Zero tolerance for revenue disruption
Budget approved for migration consultants

Technical Scope

Infrastructure to Migrate:

100% of production systems across 4 data centers
Mix of physical servers and VMware virtual machines
Legacy SQL Server databases (2012 Standard Edition)
Windows file clusters
Web services and application servers
Network infrastructure (firewalls, load balancers)

Scale:

Production systems handling $1B+ annual transaction volume
24/7 operations with high availability requirements
Legacy systems with 15+ years of accumulated complexity

Risk Profile:

Revenue disruption unacceptable
Customer trust at stake
Competitive market sensitive to service quality
Acquisition integration pressures

The Approach

Phase 1: Strategy & Planning

Migration Strategy Evaluation:

Option A: Lift-and-Shift

Move infrastructure as-is to AWS
Minimal code changes
Fastest path to cloud
Selected for most systems

Option B: Re-architecture

Redesign for cloud-native patterns
Microservices, containers, serverless
Maximum cloud benefits
Too slow for 6-month timeline

Hybrid Approach Selected:

Lift-and-shift foundation
Targeted modernization where it delivered high value
Phased approach enabling incremental improvement

Cost/Risk/Effort Assessment:

Produced detailed analysis for leadership
Mapped dependencies and migration waves
Identified technical debt requiring remediation
Established success criteria and risk thresholds
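
As an illustration of how mapped dependencies can be turned into migration waves, here is a minimal sketch using Python's standard `graphlib`. The system names and the dependency map are hypothetical placeholders, not the actual inventory from this project; the idea is only that a system moves after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it depends on.
DEPENDENCIES = {
    "sql-database": set(),
    "file-cluster": set(),
    "web-services": {"sql-database", "file-cluster"},
    "reporting": {"sql-database"},
}


def plan_waves(dependencies):
    """Group systems into waves so each system moves after its dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as migrated to unblock the next
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(plan_waves(DEPENDENCIES), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an error at planning time, which is a useful signal that two systems may need to be cut over together.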

Consultant Onboarding:

Engaged external migration specialists
Provided architecture briefings on ShareASale systems
Shared documentation of quirks and gotchas
Established clear scope and deliverables

Phase 2: Observability Migration First

Why Observability First:

Can’t manage what you can’t measure
Establish visibility before making changes
Enable rapid incident response
Validate migration success with metrics

From LogicMonitor to DataDog + CloudWatch:

LogicMonitor Context:

Tied to managed hosting vendor being replaced
Adequate but not cloud-native
Migration opportunity to upgrade tooling

DataDog Benefits:

Awin Global standard (alignment with parent company)
Superior cloud-native integrations
Better APM and distributed tracing
More powerful query and alerting

Migration Execution:

Audited existing monitors and alerts
Identified gaps in coverage
Created new DataDog pipelines and dashboards
Integrated OpsGenie for on-call management
Established SLO-based monitoring
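
To make SLO-based monitoring concrete, here is a small sketch of an error-budget calculation. The 99.9% target and the request counts in the usage note are illustrative assumptions, not figures from this migration.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is a success ratio such as 0.999 (an assumed example value).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Failures the SLO tolerates over this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures
```

For example, with a 0.999 target, 1,000,000 requests, and 250 failures, roughly three quarters of the budget remains. Alerting on how fast the budget burns tends to be less noisy than alerting on individual errors.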

Severity Classifications:

Critical (Severity 1): Revenue-impacting, 15-minute response SLA
High (Severity 2): Customer-facing issues, 1-hour response SLA
Medium (Severity 3): Internal systems, 4-hour response SLA
Low (Severity 4): Non-urgent, next-business-day response
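
The severity table above can be encoded so that tooling checks response times automatically. This is a sketch only: the names `Severity`, `RESPONSE_SLA`, `response_deadline`, and `is_breached` are hypothetical, and the Severity 4 window is left undefined because "next business day" depends on a business calendar.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    CRITICAL = 1  # revenue-impacting
    HIGH = 2      # customer-facing issues
    MEDIUM = 3    # internal systems
    LOW = 4       # non-urgent


# Response windows taken from the severity table. LOW has no fixed window
# in this sketch because it depends on a business calendar.
RESPONSE_SLA = {
    Severity.CRITICAL: timedelta(minutes=15),
    Severity.HIGH: timedelta(hours=1),
    Severity.MEDIUM: timedelta(hours=4),
}


def response_deadline(severity: Severity, opened_at: datetime) -> Optional[datetime]:
    """Return when a response is due, or None if no fixed window applies."""
    window = RESPONSE_SLA.get(severity)
    return opened_at + window if window is not None else None


def is_breached(severity: Severity, opened_at: datetime, acknowledged_at: datetime) -> bool:
    """True when the acknowledgement came after the response deadline."""
    deadline = response_deadline(severity, opened_at)
    return deadline is not None and acknowledged_at > deadline
```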

Result:

Comprehensive observability before migration began
Clear metrics for validating each migration wave
Rapid incident detection and response capability
Foundation for DevSecOps culture shift

Phase 3: Infrastructure Migration Waves

Wave 1: SQL Server → AWS RDS

Source System:

SQL Server 2012 Standard Edition
Aging hardware approaching capacity
Limited memory and I/O
Manual backup processes

Target Architecture:

AWS RDS SQL Server 2019 Enterprise Edition
Always On Availability Groups (high availability)
Automated backups with point-in-time recovery
Multi-AZ deployment for failover

Migration Process:

Schema and data profiling
Performance testing in staging
Phased cutover with fallback plan
Monitoring and validation

Wave 2: File Clusters → AWS FSx

Source System:

Windows file clusters on VMware
Shared storage for application assets
Manual replication processes
Limited disaster recovery

Target Architecture:

AWS FSx for Windows File Server
Multi-AZ deployment
Automated backups
Integration with Active Directory

Benefits:

High-performance file storage
Storage and throughput capacity adjustable on demand
Built-in disaster recovery
Simplified management

Wave 3: Web Services → EC2 + Auto Scaling

Source System:

Physical and virtual web servers
Manual scaling processes
Limited fault tolerance

Target Architecture:

EC2 instances with Auto Scaling Groups
Application Load Balancers (ALB)
Launch templates for consistency
Auto Scaling policies based on metrics
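
The arithmetic behind metric-based scaling can be sketched as a proportional rule: grow or shrink the fleet by the ratio of the observed metric to its target, then clamp to the group's bounds. This is a simplified illustration with assumed parameter names, not the exact algorithm AWS Auto Scaling uses internally.

```python
import math


def desired_capacity(current_instances, metric_value, target_value, min_size, max_size):
    """Proportional scaling: size the fleet so the metric lands near its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    if current_instances < 1 or min_size > max_size:
        raise ValueError("invalid capacity settings")
    # Round up so the fleet is never undersized for the observed load.
    raw = math.ceil(current_instances * metric_value / target_value)
    # Never go below the group minimum or above the group maximum.
    return max(min_size, min(max_size, raw))
```

With 4 instances averaging 75% CPU against a 50% target, the rule asks for 6 instances; a spike to 200% would ask for 16 but is capped at the group maximum.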

Modernization:

GitOps-based deployment workflows
Immutable infrastructure patterns
Blue/green deployment capability
Automated testing in pipeline

Wave 4: Network Configuration

Challenges:

Translate firewall rules to Security Groups
Load balancer configuration migration
Network segmentation design
VPN connectivity to legacy systems during transition

Solution:

VPC design with public/private subnets
Security Groups replacing firewalls
Network ACLs for additional protection
Transit Gateway for hybrid connectivity
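
One wrinkle in translating firewall rules is that Security Groups only support allow rules, so explicit denies have to be handled elsewhere (for example in Network ACLs). The sketch below shows that split. The legacy rule format and the sample rules are hypothetical; the output dictionaries follow the ingress permission shape the EC2 API expects (`IpProtocol`, `FromPort`, `ToPort`, `IpRanges`).

```python
from ipaddress import ip_network

# Hypothetical legacy firewall export; field names are assumptions.
LEGACY_RULES = [
    {"action": "allow", "protocol": "tcp", "port": 443,
     "source": "0.0.0.0/0", "note": "public HTTPS"},
    {"action": "allow", "protocol": "tcp", "port": 1433,
     "source": "10.0.1.0/24", "note": "app tier to SQL Server"},
    {"action": "deny", "protocol": "tcp", "port": 23,
     "source": "0.0.0.0/0", "note": "block telnet"},
]


def to_ingress_permissions(rules):
    """Split rules into Security Group ingress entries and leftover denies."""
    permissions, needs_nacl = [], []
    for rule in rules:
        if rule["action"] != "allow":
            # Security Groups cannot express deny; flag for NACL review.
            needs_nacl.append(rule)
            continue
        cidr = str(ip_network(rule["source"]))  # raises ValueError on bad CIDR
        permissions.append({
            "IpProtocol": rule["protocol"],
            "FromPort": rule["port"],
            "ToPort": rule["port"],
            "IpRanges": [{"CidrIp": cidr, "Description": rule["note"]}],
        })
    return permissions, needs_nacl
```

Keeping the deny rules in a separate list makes the manual review step explicit instead of silently dropping them.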

Phase 4: Cutover & Validation

Phased Cutover Strategy:

Non-critical systems first
Canary deployments for production
24/7 monitoring during transition
Rapid rollback capability
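
The canary-plus-rollback pattern can be sketched as a loop that shifts traffic in steps and abandons the rollout as soon as the error rate crosses a threshold. The step sizes, the threshold, and the `error_rate_at` callback (standing in for a real monitoring query) are all assumptions for illustration.

```python
from typing import Callable, Sequence


def run_canary(steps: Sequence[int],
               error_rate_at: Callable[[int], float],
               max_error_rate: float) -> dict:
    """Shift traffic to the new environment step by step, rolling back on errors.

    steps: increasing traffic percentages ending at 100, e.g. [5, 25, 50, 100].
    error_rate_at: hypothetical callback returning the observed error rate
        once the given percentage of traffic hits the new environment.
    """
    if not steps or steps[-1] != 100 or list(steps) != sorted(steps):
        raise ValueError("steps must increase and end at 100")
    for weight in steps:
        observed = error_rate_at(weight)
        if observed > max_error_rate:
            # Send all traffic back to the old environment.
            return {"status": "rolled_back", "new_weight": 0, "failed_at": weight}
    return {"status": "promoted", "new_weight": 100, "failed_at": None}
```

In practice each step would also hold for a soak period before the error rate is read; that wait is omitted here to keep the sketch short.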

Contingency Planning:

Detailed rollback procedures
Communication plans for stakeholders
War room coordination during cutover
Post-mortem process for issues

Validation:

Synthetic monitoring for health checks
Performance comparison vs. baseline
Customer-facing testing
Business metric validation (transaction volumes, etc.)
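
A performance comparison against baseline can be reduced to a simple check: flag any metric that moved in the wrong direction by more than a tolerance. The metric names, the 10% tolerance, and the sample values in the usage note are hypothetical.

```python
def find_regressions(baseline, observed, tolerance=0.10, higher_is_better=frozenset()):
    """Return {metric: relative_change} for metrics that regressed past tolerance.

    Metrics are treated as lower-is-better (latency, error rate) unless
    listed in higher_is_better (for example transaction volume).
    """
    regressions = {}
    for name, base in baseline.items():
        if base == 0:
            raise ValueError(f"baseline for {name!r} is zero; cannot compare")
        change = (observed[name] - base) / base
        # Flip the sign so a positive value always means "got worse".
        worse_by = -change if name in higher_is_better else change
        if worse_by > tolerance:
            regressions[name] = round(change, 4)
    return regressions
```

For example, with a baseline of 200 ms p95 latency and 1,000 transactions per minute, an observed 210 ms passes (5% slower) while 850 transactions per minute is flagged (15% lower), provided the transaction metric is listed in `higher_is_better`.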

Results

Migration Outcomes

Execution:

Zero downtime for revenue-generating systems
100% infrastructure migrated successfully in 6 months
4 data centers → 1 cloud platform (AWS us-east-1)
All waves completed on or ahead of schedule

Operational Improvements:

Removed dependency on dysfunctional managed services provider
Gained direct infrastructure control
Enabled rapid scaling and provisioning
Simplified disaster recovery

Recognition:

Featured in AWS case study (since removed from public site)
Executive satisfaction with delivery
Zero customer impact during migration
Team morale boost from successful delivery

Cultural Transformation

DevSecOps Practices:

Introduced “Shift Left” security principles
Code review and automated testing
Security scanning in CI/CD pipeline
Compliance-as-code patterns
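
As a rough illustration of compliance-as-code, the sketch below evaluates resource descriptions against two example policies. The resource format, the policies themselves, and the `ADMIN_PORTS` set are assumptions chosen for illustration, not the rules used on this project.

```python
# Ports that should never be open to the whole internet (SSH, RDP).
ADMIN_PORTS = {22, 3389}


def check_compliance(resources):
    """Return human-readable findings for resources that violate policy."""
    findings = []
    for resource in resources:
        name = resource["name"]
        # Policy 1: databases must be encrypted at rest.
        if resource["type"] == "database" and not resource.get("encrypted", False):
            findings.append(f"{name}: storage encryption is disabled")
        # Policy 2: admin ports must not be exposed to 0.0.0.0/0.
        if resource["type"] == "security_group":
            for rule in resource.get("ingress", []):
                if rule["cidr"] == "0.0.0.0/0" and rule["port"] in ADMIN_PORTS:
                    findings.append(
                        f"{name}: admin port {rule['port']} open to the internet"
                    )
    return findings
```

Run in a CI pipeline, a non-empty findings list would fail the build, which is what moves the check "left" of deployment.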

GitOps Workflows:

Migrated from SVN to GitLab
Git-based deployment automation
Infrastructure as Code (Terraform concepts)
Version-controlled configuration

Deployment Automation:

400%+ deployment frequency increase
Manual deployments → GitLab CI automation
Reduced human error through automation
Faster time-to-market for features

Incident Response:

MTTR: Days → <1 hour for critical systems
MTTR: Hours → ~5 minutes for highest-priority systems
Comprehensive monitoring and alerting
Runbooks and escalation procedures
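
For reference, MTTR is the average of time-to-recovery across incidents. A minimal sketch of the calculation, using made-up timestamps in the usage note rather than real incident data:

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery time over (detected_at, resolved_at) datetime pairs."""
    durations = [resolved - detected for detected, resolved in incidents]
    if not durations:
        raise ValueError("at least one incident is required")
    if any(d < timedelta(0) for d in durations):
        raise ValueError("resolved_at precedes detected_at")
    # Summing timedeltas needs an explicit timedelta start value.
    return sum(durations, timedelta()) / len(durations)
```

Two incidents resolved in 30 and 50 minutes give an MTTR of 40 minutes. Tracking the figure per severity level keeps one long low-priority incident from masking fast recovery on critical systems.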

Lessons Learned

What Worked Well

Observability First:

Having visibility before migration was critical
Enabled rapid issue detection and resolution
Validated migration success with metrics
Built confidence in migration process

Consultant Partnership:

External expertise accelerated execution
Knowledge transfer improved internal capability
Risk mitigation through experience
But required strong internal coordination

Phased Approach:

De-risked migration through incremental waves
Enabled learning and course corrections
Maintained business continuity
Rollback capability provided safety net

What I’d Do Differently

More Aggressive Modernization:

Lift-and-shift was safe but left technical debt
Could have re-architected more systems
Containers would have simplified operations
But timeline pressure justified pragmatism

Earlier Automation:

Some manual processes remained post-migration
Should have automated more during migration
Infrastructure as Code adoption could have been faster

Better Documentation:

Under time pressure, documentation lagged
Should have documented architecture decisions in real time
Knowledge transfer could have been more systematic

What This Demonstrates

For Cloud Migration Roles:

Complete AWS migration execution experience
Zero-downtime migration methodology
6-month aggressive timeline delivered
Risk management and mitigation

For SRE / Infrastructure Roles:

High-availability architecture design
Disaster recovery planning and execution
Observability platform migration (DataDog, CloudWatch)
Incident response process establishment

For DevOps / Platform Engineering:

GitOps workflow implementation
CI/CD pipeline establishment
Deployment automation (400%+ frequency increase)
Infrastructure modernization

For Technical Leadership:

Multi-phase program execution
Consultant management and coordination
Stakeholder communication (C-level to engineers)
Cultural transformation (DevSecOps adoption)

Technologies Used

AWS Services:

RDS (SQL Server with Always On)
FSx (Windows File Server)
EC2 (Auto Scaling Groups, Launch Templates)
VPC (networking, Security Groups, NACLs)
Application Load Balancer
CloudWatch (monitoring and logging)

Observability:

DataDog (APM, logs, metrics, dashboards)
CloudWatch (AWS-native monitoring)
OpsGenie (incident management, on-call rotation)

Development & Deployment:

GitLab (source control, CI/CD)
GitLab CI (pipeline automation)
PowerShell (automation scripts)
Terraform concepts (infrastructure as code)

Migration Tools:

AWS Server Migration Service
Database migration utilities
Custom scripts for configuration replication

Contact

Leading an AWS cloud migration or considering one for your organization? Let’s discuss zero-downtime strategies, observability-first approaches, and cultural transformation.

Get in Touch: stevenleve.com/contact
LinkedIn: linkedin.com/in/steve-leve
