MarTech

Tracking System: 250M Monthly Requests, 263ms Latency

Challenge: ShareASale's tracking system was responsible for capturing every affiliate transaction—the source of the company's revenue. During platform consolidation following acquisition, the tracking system required modernization to maintain reliability at scale (250M monthly requests) while preserving accuracy and minimizing latency for real-time commission attribution.

Executive Summary

Designed a multi-layer reverse proxy architecture that kept ShareASale’s revenue-critical tracking system fully operational during platform consolidation while appearing unchanged to 250K+ publisher integrations. Built the production system bridging two incompatible platforms on Cloudflare Workers and AWS Lambda, sustaining 263ms average latency across 250M monthly requests with 99.5% revenue protection and a 75% reduction in error rate.

Key Results:

  • 250M monthly requests handled seamlessly
  • 263ms average latency maintained
  • 99.5% revenue protection achieved
  • 75% error rate reduction through improved fault tolerance
  • Zero customer re-integration required

The Challenge

Mission-Critical System

Business Context: ShareASale’s tracking system was responsible for capturing every affiliate transaction—the source of the company’s revenue. During platform consolidation, this system needed to:

  • Continue operating through multi-year migration
  • Maintain 99.9%+ uptime (every dropped transaction = lost revenue)
  • Require zero customer re-integration (250K+ publisher sites)
  • Support 250M monthly tracking requests
  • Preserve backward compatibility with legacy parameters

Technical Reality:

  • ShareASale platform: Custom tracking URLs, parameters, cookie logic
  • Awin platform: Different URL structure, parameters, attribution model
  • Integration points: 250K+ publisher websites with hardcoded tracking pixels
  • Business constraint: Publishers cannot be asked to update their integrations

The Problem

Incompatible Platforms:

ShareASale tracking request:

https://shareasale.com/r.cfm?b=123&u=456&m=789&urllink=example.com

Awin platform expected completely different structure, parameters, and processing logic.
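
For contrast, an Awin-style click URL looks roughly like this (illustrative only; exact parameter names and structure vary by integration):

https://www.awin1.com/cread.php?awinmid=789&awinaffid=456&ued=https://example.com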

Cannot Change:

  • 250,000+ publisher websites have ShareASale tracking installed
  • JavaScript tracking pixels embedded across millions of pages
  • Deep links in emails, social media, and marketing materials
  • Asking publishers to re-integrate = massive customer disruption

Must Work:

  • Every tracking request must complete successfully
  • Attribution logic must remain consistent
  • Performance cannot degrade
  • Revenue cannot be impacted

The Solution: Multi-Layer Reverse Proxy

Architecture Overview

[Publisher Site] → [Cloudflare Edge]
                         ↓
                  [Routing Logic]
        ┌────────────────┴────────────────┐
        ↓                                 ↓
  [Awin Platform]                [ShareASale Legacy]
    (Primary)                         (Failover)

Architecture Layers:

Layer 1: Cloudflare (Edge)

  • WAF rules for traffic filtering
  • DDoS protection and rate limiting
  • Initial routing decisions based on rules
  • Custom Workers for intelligent request handling

Layer 2: Akamai (CDN)

  • Global traffic distribution
  • Caching for static content
  • Secondary failover routing
  • Geographic load balancing

Layer 3: AWS (Application)

  • Lambda functions for request transformation
  • Parameter translation ShareASale ↔ Awin
  • Retry and failover logic
  • Integration with Awin backend systems

Cloudflare Workers Implementation

Custom Routing Logic:

// Conceptual implementation (translateParameters, buildHeaders,
// failoverToLegacy, handleError, and awinPlatformUrl are defined
// elsewhere in the Worker)
export default {
  async fetch(request) {
    const url = new URL(request.url);

    // Parse ShareASale tracking parameters
    const params = {
      merchant: url.searchParams.get('m'),
      affiliate: url.searchParams.get('u'),
      banner: url.searchParams.get('b'),
      destination: url.searchParams.get('urllink')
    };

    // Translate to Awin format
    const awinParams = translateParameters(params);

    // Build the Awin request URL from the translated parameters
    const awinUrl = new URL(awinPlatformUrl);
    for (const [key, value] of Object.entries(awinParams)) {
      awinUrl.searchParams.set(key, value);
    }

    // Route to Awin platform (primary path)
    try {
      const response = await fetch(awinUrl.toString(), {
        method: 'GET',
        headers: buildHeaders(awinParams)
      });

      if (response.ok) {
        return response;
      }
      // Non-2xx from Awin: fail over to ShareASale legacy
      return await failoverToLegacy(params);
    } catch (error) {
      // Network or transient error: retry logic and graceful degradation
      return await handleError(error, params);
    }
  }
};

Key Features:

  • Parameter translation: ShareASale → Awin format
  • Intelligent routing: Primary (Awin) with failover (ShareASale)
  • Retry logic: Automatic retries on transient failures
  • Error handling: Graceful degradation
  • Logging: Request tracking for debugging

Cloudflare Services Used:

  • Workers: Edge compute for routing logic
  • KV: Configuration storage (routing rules, mappings)
  • D1: Request logging and analytics
  • Wrangler: Safe CI/CD deployment
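
As a sketch of how the Workers layer might pull its routing rules from KV (the ROUTING_CONFIG binding and key name below are assumptions, not the production names):

// Illustrative: reading routing rules from a KV binding (binding and key
// names are assumptions; KVNamespace comes from @cloudflare/workers-types)
interface Env {
  ROUTING_CONFIG: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Routing rules stored as JSON under a well-known key
    const rules = await env.ROUTING_CONFIG.get('routing-rules', { type: 'json' });

    // ...apply rules to choose the primary or failover path...
    return new Response(JSON.stringify(rules), {
      headers: { 'content-type': 'application/json' }
    });
  }
};

Because the rules live in KV, routing changes can roll out without redeploying the Worker.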

AWS Lambda Integration

Awin-Side Processing:

On the Awin platform side, AWS Lambda functions handled:

  • Request validation: Ensure required parameters present
  • Data transformation: Convert ShareASale data structures to Awin
  • Attribution logic: Apply correct commission rules
  • Database writes: Store transaction data in Awin systems
  • Response generation: Return appropriate tracking pixel
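
A minimal sketch of what such a handler could look like, using the Awin-style parameter names from the mapping layer (the validation rules and pixel response are illustrative, not the production logic):

// Illustrative Lambda entry point behind API Gateway
import type { APIGatewayProxyHandler } from 'aws-lambda';

// Standard 1x1 transparent GIF, base64-encoded
const PIXEL = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';

export const handler: APIGatewayProxyHandler = async (event) => {
  const q = event.queryStringParameters ?? {};

  // Request validation: required parameters must be present
  if (!q.advertiser_id || !q.publisher_id) {
    return { statusCode: 400, body: 'missing required tracking parameters' };
  }

  // ...data transformation, attribution logic, and database writes here...

  // Response generation: return the tracking pixel
  return {
    statusCode: 200,
    isBase64Encoded: true,
    headers: { 'Content-Type': 'image/gif', 'Cache-Control': 'no-store' },
    body: PIXEL
  };
};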

Technology Stack:

  • Lambda: TypeScript functions
  • API Gateway: HTTP endpoint management
  • DynamoDB: Fast key-value lookups
  • RDS: Transaction storage
  • CloudWatch: Logging and monitoring

Integration Points:

  • Cloudflare Workers call AWS API Gateway
  • Lambda functions process and route internally
  • Awin backend systems receive standardized data
  • Response flows back through proxy layers

Observability & Monitoring

Comprehensive Monitoring:

DataDog:

  • Distributed tracing across all layers (Cloudflare → AWS → Awin)
  • Request latency percentiles (p50, p95, p99)
  • Error rate tracking by type
  • Custom dashboards for tracking health
  • Anomaly detection and alerting

CloudWatch:

  • Lambda execution metrics
  • API Gateway performance
  • RDS query performance
  • Cost tracking

OpsGenie:

  • Incident escalation and on-call rotation
  • Integration with DataDog alerts
  • Escalation policies by severity
  • Post-mortem workflow

Custom Dashboards:

  • Real-time request volume
  • Latency trends
  • Error rates by category
  • Failover frequency
  • Geographic distribution

Synthetic Monitoring:

  • Health checks every 5 minutes
  • End-to-end tracking request simulation
  • Alerting on failures
  • Geographic diversity (multiple regions tested)
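
A synthetic probe along these lines can run on a schedule from multiple regions (the endpoint, latency budget, and alerting hook below are assumptions):

// Illustrative synthetic check; requires Node 18+ for the global fetch API
const ENDPOINT = 'https://shareasale.com/r.cfm?b=1&u=1&m=1&urllink=example.com';
const LATENCY_BUDGET_MS = 1000;

async function healthCheck(): Promise<void> {
  const start = Date.now();
  const res = await fetch(ENDPOINT, { redirect: 'manual' });
  const elapsedMs = Date.now() - start;

  // Tracking endpoints answer with a pixel (200) or a redirect (3xx)
  if (res.status >= 400) throw new Error(`unexpected status ${res.status}`);
  if (elapsedMs > LATENCY_BUDGET_MS) throw new Error(`${elapsedMs}ms exceeds budget`);

  console.log(`ok: ${res.status} in ${elapsedMs}ms`);
}

healthCheck().catch((err) => {
  console.error(err);
  process.exit(1); // non-zero exit feeds the alerting pipeline
});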

Technical Challenges Solved

Parameter Translation

Challenge: ShareASale and Awin used completely different parameter structures and naming.

Solution: Built translation layer with comprehensive mapping:

const parameterMap = {
  // ShareASale → Awin
  'm': 'advertiser_id',
  'u': 'publisher_id', 
  'b': 'creative_id',
  'urllink': 'destination_url',
  'afftrack': 'tracking_reference'
  // ... dozens more parameters
};

Complexity:

  • 50+ parameters to map
  • Different data types (integers, strings, URLs)
  • Conditional logic (some parameters only in certain contexts)
  • Validation rules (ensure data integrity)
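
A sketch of what the translation function might look like on top of that mapping (the validation rules shown are examples; the real rule set was far larger):

// Illustrative translation layer built on the mapping table above
function translateParameters(
  saleParams: Record<string, string | null>
): Record<string, string> {
  const parameterMap: Record<string, string> = {
    m: 'advertiser_id',
    u: 'publisher_id',
    b: 'creative_id',
    urllink: 'destination_url',
    afftrack: 'tracking_reference'
  };

  const awinParams: Record<string, string> = {};
  for (const [saleKey, awinKey] of Object.entries(parameterMap)) {
    const value = saleParams[saleKey];
    if (value == null || value === '') continue; // conditional: skip absent params

    // Type validation example: ID fields must be numeric
    if (awinKey.endsWith('_id') && !/^\d+$/.test(value)) {
      throw new Error(`invalid numeric value for '${saleKey}'`);
    }
    awinParams[awinKey] = value;
  }
  return awinParams;
}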

Cookie Translation

Challenge: ShareASale and Awin used different cookie structures for attribution.

Solution:

  • Read ShareASale cookies at edge
  • Translate to Awin cookie format
  • Set both cookie formats during transition
  • Gradually phase out ShareASale cookies
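
A sketch of the dual-cookie step as it might run at the edge (cookie names, lifetimes, and attributes are assumptions):

// Illustrative edge-side cookie translation
function translateCookies(request: Request, response: Response): Response {
  const cookieHeader = request.headers.get('Cookie') ?? '';

  // Read the legacy ShareASale attribution cookie (name assumed)
  const match = cookieHeader.match(/(?:^|;\s*)sas_ref=([^;]+)/);
  if (!match) return response;

  const attrs = 'Max-Age=2592000; Path=/; Secure; SameSite=Lax'; // 30 days
  const headers = new Headers(response.headers);

  // Set both cookie formats during the transition window
  headers.append('Set-Cookie', `sas_ref=${match[1]}; ${attrs}`);
  headers.append('Set-Cookie', `awin_ref=${match[1]}; ${attrs}`);

  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers
  });
}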

Considerations:

  • Cookie expiration handling
  • Domain scope differences
  • GDPR/privacy compliance
  • Cross-device attribution

High Availability

Challenge: 99.9%+ uptime requirement with zero tolerance for revenue loss.

Solution: Multi-Layer Failover

Primary Path: Publisher Site → Cloudflare → Awin Platform

Failover Path 1: If Awin fails → Route to ShareASale Legacy

Failover Path 2: If both fail → Return tracking pixel with retry queue

Retry Logic:

  • Exponential backoff for transient failures
  • Circuit breaker for persistent failures
  • Dead letter queue for manual recovery
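
As an illustration of the retry piece, a backoff helper along these lines (attempt counts and delays are assumptions; the circuit breaker and dead letter queue are omitted for brevity):

// Illustrative retry with exponential backoff
async function fetchWithRetry(
  url: string,
  maxAttempts = 3,
  baseDelayMs = 100
): Promise<Response> {
  let lastError: unknown;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const res = await fetch(url);
      if (res.status < 500) return res; // only 5xx counts as transient
      lastError = new Error(`upstream returned ${res.status}`);
    } catch (err) {
      lastError = err; // network failure: also transient
    }
    // Backoff doubles each attempt: 100ms, 200ms, 400ms, ...
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }

  // Persistent failure: hand off to circuit breaker / dead letter queue
  throw lastError;
}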

Result:

  • 99.5% revenue protection
  • No single point of failure
  • Graceful degradation under stress

Results

Performance Metrics

Scale:

  • 250M monthly tracking requests
  • ~340,000 requests per hour on average
  • ~5,700 requests per minute (~95 per second) sustained
  • Peaks 3-5x average during holidays

Latency:

  • 263ms average response time
  • p50: 180ms
  • p95: 450ms
  • p99: 800ms

Availability:

  • 99.5% revenue protection (transactions successfully captured)
  • Zero scheduled downtime
  • Unplanned outages < 4 hours annually

Reliability Improvements

Error Rate Reduction:

  • 75% error rate reduction vs. legacy system
  • Improved retry logic caught transient failures
  • Better error handling prevented cascading failures
  • Circuit breakers protected against overload

Fault Tolerance:

  • Multi-layer failover prevented revenue loss
  • Graceful degradation under stress
  • No single point of failure
  • Automatic recovery from transient issues

Business Impact

Revenue Protection:

  • 99.5% of transactions captured successfully
  • $100M+ annual revenue protected
  • Zero customer disruption required
  • Seamless migration from customer perspective

Customer Experience:

  • 250K+ publishers: Zero re-integration required
  • 10K+ merchants: Uninterrupted tracking
  • No change to user-facing experience
  • Maintained trust during complex migration

Migration Enablement:

  • Bought time for permanent Awin solution development
  • Allowed phased migration approach
  • Reduced migration risk significantly
  • Protected business continuity

Lessons Learned

What Worked Well

Multi-Layer Approach:

  • Defense in depth provided resilience
  • Multiple failover options prevented outages
  • Flexibility to iterate on routing logic

Edge Computing:

  • Cloudflare Workers = global low latency
  • Zero cold starts (vs. AWS Lambda)
  • Easy deployment and rollback
  • Cost-effective at scale

Comprehensive Monitoring:

  • Early detection of issues
  • Fast incident response
  • Data-driven optimization
  • Clear visibility for stakeholders

What I’d Do Differently

Earlier Load Testing:

  • Should have load tested at full scale earlier
  • Discovered some bottlenecks in production
  • More aggressive synthetic traffic testing

More Granular Alerts:

  • Initial alerts sometimes too noisy
  • Refined over time but could have been better day 1
  • More context in alert messages

Documentation:

  • Runbooks created reactively
  • Should have written comprehensive docs upfront
  • Knowledge transfer could have been smoother

What This Demonstrates

For Tracking & Attribution Roles:

  • Revenue-critical system design and implementation
  • High-availability architecture (99.5% revenue protection)
  • Cookie-based attribution logic
  • Cross-platform tracking integration

For Platform Engineering:

  • Distributed systems at scale (250M requests/month)
  • Multi-layer reverse proxy architecture
  • Edge computing with Cloudflare Workers
  • AWS Lambda serverless integration

For Site Reliability Engineering:

  • High-availability system design
  • Comprehensive observability (DataDog, CloudWatch)
  • Incident response and on-call processes
  • Performance optimization (263ms at scale)

For Technical Leadership:

  • Complex system integration across organizations
  • Risk management for revenue-critical systems
  • Stakeholder communication (business + technical)
  • Interim solution enabling long-term strategy

Technologies Used

Edge Layer:

  • Cloudflare Workers (JavaScript/Node.js)
  • Cloudflare KV (configuration storage)
  • Cloudflare D1 (request logging)
  • Cloudflare WAF (security)
  • Wrangler (deployment)

CDN Layer:

  • Akamai (global distribution)
  • Caching and routing rules

Application Layer:

  • AWS Lambda (TypeScript)
  • AWS API Gateway
  • AWS DynamoDB
  • AWS RDS (PostgreSQL)

Observability:

  • DataDog (tracing, metrics, logs)
  • CloudWatch (AWS monitoring)
  • OpsGenie (incident management)
  • Custom dashboards

Contact

Building revenue-critical tracking systems or complex platform integrations? Let’s discuss high-availability architecture, edge computing, and systems that protect business continuity.

Get in Touch: stevenleve.com/contact
LinkedIn: linkedin.com/in/steve-leve

