Designing Resilient API Gateways: Rate Limiting, Circuit Breakers & Observability Patterns
Why Your API Gateway Needs Resilience
An API gateway is the single entry point to your system. If it goes down, everything goes down. Building resilience into the gateway isn't optional — it's the difference between a cascading failure that takes down your entire platform and a graceful degradation that users barely notice.
Rate Limiting Strategies
Rate limiting protects downstream services from being overwhelmed. Here are three production-tested strategies:
1. Token Bucket (Most Common)
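class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number, // tokens per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Consume one token if available; false means the request should be rejected.
  tryConsume(): boolean {
    this.refill();
    // Require a whole token. refill() adds fractional amounts, so checking
    // `> 0` would admit a request on a partial token and drive the balance negative.
    if (this.tokens >= 1) {
      this.tokens--;
      return true;
    }
    return false;
  }

  // Add the tokens accrued since the last refill, capped at capacity.
  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate,
    );
    this.lastRefill = now;
  }
}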
2. Sliding Window Log
More memory-intensive than a token bucket, but precise. Each request's timestamp is stored in a sorted set (for example in Redis); on every request, entries older than the window are discarded and the remaining entries are counted against the limit.
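The bookkeeping can be sketched in memory before moving it to Redis. This is a minimal single-process illustration, not the article's implementation: the class name `SlidingWindowLog`, its parameters, and the choice not to record rejected requests are all assumptions. In a shared deployment the array would be replaced by a Redis sorted set (commands such as ZADD, ZREMRANGEBYSCORE and ZCARD cover the same three steps).

```typescript
// Hypothetical in-memory sliding window log (single process only).
class SlidingWindowLog {
  // Timestamps of admitted requests, oldest first.
  private timestamps: number[] = [];

  constructor(
    private limit: number, // max requests allowed per window
    private windowMs: number, // window length in milliseconds
  ) {}

  // `now` is injectable so the behaviour can be tested deterministically.
  tryAcquire(now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;

    // Step 1: evict timestamps that have slid out of the window.
    while (true) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || oldest > windowStart) break;
      this.timestamps.shift();
    }

    // Step 2: count what is left against the limit.
    if (this.timestamps.length >= this.limit) {
      return false;
    }

    // Step 3: record this request (rejected requests are not recorded).
    this.timestamps.push(now);
    return true;
  }
}
```

Memory grows with the limit, since up to `limit` timestamps are held per client, which is the cost the text above refers to.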
3. Adaptive Rate Limiting
The most sophisticated approach — adjust limits based on upstream latency:
function getDynamicLimit(upstreamLatency: number): number {
  if (upstreamLatency < 100) return 1000; // Healthy
  if (upstreamLatency < 500) return 500; // Degraded
  if (upstreamLatency < 1000) return 200; // Stressed
  return 50; // Overloaded
}
Circuit Breaker Pattern
The circuit breaker prevents cascading failures by failing fast when a downstream service is unhealthy.
Three States
┌───────────┐
│  CLOSED   │◄──────────────────────┐
└─────┬─────┘                       │
      │ Failures >= threshold       │ Probe succeeds
      ▼                             │
┌───────────┐                       │
│   OPEN    │◄──── Probe fails ─────┤
└─────┬─────┘                       │
      │ Timeout elapsed             │
      ▼                             │
┌───────────┐                       │
│ HALF_OPEN │───────────────────────┘
└───────────┘
Implementation
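import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import org.springframework.stereotype.Component;

@Component
public class CircuitBreaker {
    private static final int THRESHOLD = 5;        // consecutive failures before opening
    private static final long TIMEOUT_MS = 30_000; // how long to stay OPEN before probing

    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final AtomicReference<State> state =
        new AtomicReference<>(State.CLOSED);
    private volatile long openedAt;

    public boolean isAllowed() {
        State current = state.get();
        if (current == State.OPEN) {
            if (System.currentTimeMillis() - openedAt > TIMEOUT_MS) {
                // Timeout elapsed: move to HALF_OPEN and let traffic probe the service.
                state.compareAndSet(State.OPEN, State.HALF_OPEN);
                return true;
            }
            return false; // fail fast while the breaker is open
        }
        return true; // CLOSED and HALF_OPEN both admit requests
    }

    public void onSuccess() {
        // Any success closes the breaker and clears the failure streak.
        failureCount.set(0);
        state.set(State.CLOSED);
    }

    public void onFailure() {
        // The count is not reset on opening, so a failure in HALF_OPEN reopens at once.
        if (failureCount.incrementAndGet() >= THRESHOLD) {
            // Write the timestamp before publishing OPEN; otherwise a concurrent
            // isAllowed() could see OPEN with a stale openedAt and skip the timeout.
            openedAt = System.currentTimeMillis();
            state.set(State.OPEN);
        }
    }

    enum State { CLOSED, OPEN, HALF_OPEN }
}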
Observability: The Three Pillars
Logging (Structured)
Every request through the gateway gets a correlation ID and structured JSON logs:
{
"correlationId": "req_abc123",
"method": "POST",
"path": "/api/transactions",
"upstreamLatency": 142,
"gatewayLatency": 3,
"statusCode": 200,
"rateLimitRemaining": 42
}
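One way to produce such a line is sketched below. The field names come from the example above; the helper names (`newCorrelationId`, `resolveCorrelationId`, `toLogLine`) and the policy of reusing an ID supplied by the caller are assumptions made for illustration. The ID generator uses `Math.random` for brevity and is not suitable where unpredictable IDs are required.

```typescript
// Shape of one gateway log entry, mirroring the JSON example above.
interface GatewayLogEntry {
  correlationId: string;
  method: string;
  path: string;
  upstreamLatency: number;
  gatewayLatency: number;
  statusCode: number;
  rateLimitRemaining: number;
}

// Hypothetical helper: "req_" followed by 16 random hex characters.
function newCorrelationId(): string {
  let hex = "";
  for (let i = 0; i < 16; i++) {
    hex += Math.floor(Math.random() * 16).toString(16);
  }
  return "req_" + hex;
}

// Reuse the caller's ID when present so one ID follows the request end to end.
function resolveCorrelationId(incoming?: string): string {
  return incoming !== undefined && incoming.length > 0
    ? incoming
    : newCorrelationId();
}

// One JSON object per line keeps the output easy for log shippers to parse.
function toLogLine(entry: GatewayLogEntry): string {
  return JSON.stringify(entry);
}

// Usage: build the entry once the upstream response is known.
const line = toLogLine({
  correlationId: resolveCorrelationId("req_abc123"),
  method: "POST",
  path: "/api/transactions",
  upstreamLatency: 142,
  gatewayLatency: 3,
  statusCode: 200,
  rateLimitRemaining: 42,
});
console.log(line);
```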
Metrics (Prometheus)
Key metrics to export:
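Whatever set of metrics is chosen, Prometheus scrapes them as plain text. The sketch below renders a labelled counter in the Prometheus text exposition format. The metric name `gateway_requests_total`, its labels, and the `Counter` class itself are hypothetical and chosen only for illustration; a real gateway would more likely use an existing Prometheus client library than hand-roll this.

```typescript
// Hypothetical minimal counter that renders itself in Prometheus text format.
class Counter {
  // Maps a rendered label set (e.g. method="POST",status="200") to its value.
  private values: Record<string, number> = {};

  constructor(
    private name: string,
    private help: string,
  ) {}

  inc(labels: Record<string, string> = {}, amount: number = 1): void {
    const key = Counter.labelKey(labels);
    this.values[key] = (this.values[key] ?? 0) + amount;
  }

  render(): string {
    const lines = [
      "# HELP " + this.name + " " + this.help,
      "# TYPE " + this.name + " counter",
    ];
    for (const key of Object.keys(this.values)) {
      const series = key === "" ? this.name : this.name + "{" + key + "}";
      lines.push(series + " " + this.values[key]);
    }
    return lines.join("\n") + "\n";
  }

  // Sort label names so the same label set always maps to the same series.
  private static labelKey(labels: Record<string, string>): string {
    return Object.keys(labels)
      .sort()
      .map((k) => k + '="' + Counter.escapeValue(labels[k] ?? "") + '"')
      .join(",");
  }

  // Label values must escape backslash, double quote and newline.
  private static escapeValue(value: string): string {
    return value
      .replace(/\\/g, "\\\\")
      .replace(/"/g, '\\"')
      .replace(/\n/g, "\\n");
  }
}

// Usage with assumed metric and label names.
const requests = new Counter(
  "gateway_requests_total",
  "Requests handled by the gateway.",
);
requests.inc({ method: "POST", status: "200" });
requests.inc({ status: "200", method: "POST" });
requests.inc({ method: "POST", status: "429" });
console.log(requests.render());
```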
Tracing (Distributed)
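Use the W3C Trace Context headers (traceparent and tracestate) to propagate traces across service boundaries. Without them, each hop starts a new trace and a slow multi-hop request cannot be followed from the gateway to the service that caused the latency.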
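As a rough illustration of what propagation involves, the sketch below parses an incoming `traceparent` header and builds the value to send upstream. It handles only version 00 of the format (`version-traceid-parentid-flags`, all lowercase hex), and the function names are assumptions rather than part of any library. A tracing SDK would normally do this for you.

```typescript
interface TraceParent {
  traceId: string; // 32 hex chars, shared by every hop of the request
  parentId: string; // 16 hex chars, the span that made this call
  flags: string; // 2 hex chars; "01" means sampled
}

// Returns null for anything that is not a valid version-00 traceparent.
function parseTraceParent(header: string): TraceParent | null {
  const parts = header.trim().split("-");
  if (parts.length !== 4) return null;
  const [version = "", traceId = "", parentId = "", flags = ""] = parts;
  if (version !== "00") return null;
  // All-zero trace and parent IDs are defined as invalid.
  if (!/^[0-9a-f]{32}$/.test(traceId) || /^0+$/.test(traceId)) return null;
  if (!/^[0-9a-f]{16}$/.test(parentId) || /^0+$/.test(parentId)) return null;
  if (!/^[0-9a-f]{2}$/.test(flags)) return null;
  return { traceId, parentId, flags };
}

// Hypothetical span ID generator: 16 random hex chars, never all zeros.
function newSpanId(): string {
  let id = "";
  do {
    id = "";
    for (let i = 0; i < 16; i++) {
      id += Math.floor(Math.random() * 16).toString(16);
    }
  } while (/^0+$/.test(id));
  return id;
}

// Keep the trace ID and flags; the gateway's own span becomes the new parent.
function childTraceParent(
  parent: TraceParent,
  spanId: string = newSpanId(),
): string {
  return "00-" + parent.traceId + "-" + spanId + "-" + parent.flags;
}

// Usage: forward the trace to the upstream service.
const incoming = parseTraceParent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
);
if (incoming !== null) {
  console.log(childTraceParent(incoming));
}
```

If parsing fails, the usual choice is to start a fresh trace rather than forward a malformed header.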
Production Lessons
1. Always add jitter to retry intervals: without jitter, synchronized retries can cause thundering herd problems.
2. Rate limit at the gateway AND the service: defense in depth.
3. Circuit breakers need manual overrides: for emergency access during incidents.
4. Test with chaos engineering: Netflix's Chaos Monkey taught us that if you don't test failures, they will test you.
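The jitter advice in the first lesson can be sketched as exponential backoff with "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function name, the 100 ms base, and the 10 s cap are assumptions chosen for illustration, not values from the lessons above.

```typescript
// Delay in milliseconds before retry number `attempt` (0 for the first retry).
// `random` is injectable so the spread can be tested deterministically.
function backoffWithJitter(
  attempt: number,
  baseMs: number = 100, // assumed base delay
  capMs: number = 10000, // assumed upper bound on any single delay
  random: () => number = Math.random,
): number {
  // The ceiling doubles with each attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Full jitter: pick uniformly in [0, ceiling) so clients spread out
  // instead of retrying in lockstep.
  return Math.floor(random() * ceiling);
}

// Usage: print the delays for the first five retries.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log("retry", attempt, "after", backoffWithJitter(attempt), "ms");
}
```

Two clients that fail at the same instant will, with high probability, pick different delays, which is what breaks up the synchronized retry wave.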