Configuring AWS ALB Timeouts
Author’s Note: I recently ran into this exact issue. After solving it, I used Claude AI to help me structure this troubleshooting guide based on my experience. The problem and solution are genuine - I just wanted to create a quick resource that others could benefit from.
ALB Idle Timeout Configuration: Best Practices to Prevent 5xx Errors
If you’ve ever encountered mysterious 502 Bad Gateway or 504 Gateway Timeout errors from your AWS Application Load Balancer (ALB), you’re not alone. Many of these issues stem from improperly configured timeout settings between your ALB and backend applications. In this article, we’ll explore the best practices for configuring ALB idle timeouts.
Understanding the Problem
AWS Application Load Balancers manage connections between clients and your backend applications. When timeout configurations are misaligned, you create race conditions that result in intermittent 5xx errors and degraded user experience.
The root cause typically involves two types of timeouts that must work in harmony:
- Request timeouts: How long to wait for a single request/response cycle
- Keep-alive (idle) timeouts: How long to maintain connections between requests
Request Timeout Best Practices
The Golden Rule: Application Timeout < ALB Timeout
Your application’s request timeout should always be shorter than your ALB’s timeout. Here’s the recommended hierarchy:
- ALB Request Timeout: 60 seconds (the default; on an ALB this ceiling comes from the idle timeout attribute, since there is no separate request-timeout setting)
- Application Timeout: 50-55 seconds
- Database/External API Timeout: 45 seconds or less
Why This Matters
When your application timeout equals or exceeds the ALB timeout, you create a dangerous race condition:
- A request arrives at the ALB and is forwarded to your application
- Both timers are now counting down, and the ALB's started first
- The ALB timeout expires before your application's, so the ALB closes the connection and returns a 504 to the client
- Your application is still working and eventually produces its graceful error response
- That response never reaches the client, who sees an opaque ALB-generated 5xx error instead of your application's message
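To stay on the winning side of that race, give each request an internal deadline below the ALB's. The article doesn't prescribe a stack, so here is a minimal sketch using Express on Node 18+, assuming the ALB's default 60-second timeout; the 50s/45s budgets, the /report route, and the upstream URL are illustrative placeholders rather than recommendations.

```typescript
import express from "express";

const app = express();

// Illustrative budgets, assuming the ALB's default 60s timeout.
const APP_TIMEOUT_MS = 50_000;      // application answers before the ALB gives up
const UPSTREAM_TIMEOUT_MS = 45_000; // downstream calls answer before the application gives up

// Per-request deadline: return a graceful 503 instead of letting the ALB emit a 504.
app.use((req, res, next) => {
  const timer = setTimeout(() => {
    if (!res.headersSent) {
      res.status(503).json({ error: "request exceeded application timeout" });
    }
  }, APP_TIMEOUT_MS);
  res.on("finish", () => clearTimeout(timer));
  res.on("close", () => clearTimeout(timer));
  next();
});

app.get("/report", async (_req, res) => {
  try {
    // The external call gets the tightest budget, so it fails inside the application's deadline.
    const upstream = await fetch("https://api.example.com/report", {
      signal: AbortSignal.timeout(UPSTREAM_TIMEOUT_MS),
    });
    res.status(upstream.status).json(await upstream.json());
  } catch {
    // Upstream failed or ran out of budget; still answer well before the ALB's 60s.
    if (!res.headersSent) {
      res.status(503).json({ error: "upstream call timed out" });
    }
  }
});

app.listen(8080);
```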
Keep-Alive Timeout Best Practices
The Reverse Rule: Application Keep-Alive > ALB Idle Timeout
Unlike request timeouts, keep-alive timeouts follow the opposite pattern:
- ALB Idle Timeout: 60 seconds (default)
- Application Keep-Alive: 65+ seconds
- Database Connection Pool Idle: 70+ seconds
Why Application Keep-Alive Should Be Higher
If your application’s keep-alive timeout is shorter than ALB’s idle timeout:
- Client completes request through ALB to application
- Application closes connection after its shorter keep-alive period
- ALB still considers connection active and attempts reuse
- Next request fails with 502 Bad Gateway because connection is closed
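In Node.js, for example, this comes down to two server properties. Node's default keepAliveTimeout is only 5 seconds, which reproduces exactly the mismatch above behind a default-configured ALB. A minimal sketch, assuming the ALB's default 60-second idle timeout:

```typescript
import http from "node:http";

const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("ok");
});

// Node's default keepAliveTimeout is 5 seconds, far below the ALB's 60s idle timeout,
// which is exactly the 502 scenario described above. Raise it past the ALB's idle timeout.
server.keepAliveTimeout = 65_000;

// headersTimeout should exceed keepAliveTimeout so requests arriving on a reused socket
// are not cut off while their headers are still being read.
server.headersTimeout = 66_000;

server.listen(8080);
```

Frameworks that sit on top of Node's HTTP server expose the same underlying object; with Express, for example, app.listen() returns it, so the same two properties apply.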
Advanced Configuration Scenarios
High-Traffic Applications
For applications handling high request volumes:
- ALB Idle Timeout: 30 seconds (reduced)
- Application Keep-Alive: 35 seconds
- Request Timeout: 25 seconds
Shorter timeouts free up connections faster and improve resource utilization.
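If you manage the load balancer programmatically rather than through the console or infrastructure as code, the idle timeout is a load balancer attribute. Here is a sketch using the AWS SDK for JavaScript v3; the region and ARN are placeholders, and Terraform, CloudFormation, or CDK expose the same attribute.

```typescript
import {
  ElasticLoadBalancingV2Client,
  ModifyLoadBalancerAttributesCommand,
} from "@aws-sdk/client-elastic-load-balancing-v2";

const client = new ElasticLoadBalancingV2Client({ region: "us-east-1" });

// Drop the ALB idle timeout to 30s for a high-traffic service; the application's
// keep-alive (35s) and request timeout (25s) are configured separately on the backend.
await client.send(
  new ModifyLoadBalancerAttributesCommand({
    LoadBalancerArn:
      "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123", // placeholder
    Attributes: [{ Key: "idle_timeout.timeout_seconds", Value: "30" }],
  })
);
```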
Long-Running Operations
For applications with legitimate long-running operations:
- ALB Request Timeout: 300 seconds (5 minutes)
- Application Timeout: 280 seconds
- Keep-Alive: 310 seconds
Consider moving truly long operations to asynchronous processing with status polling endpoints.
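One common shape for that pattern, sketched below with Express: the initial request returns 202 Accepted with a job id almost immediately, well inside every timeout above, and clients poll a status endpoint until the work completes. The in-memory job store and the runLongOperation stand-in are placeholders for a real queue and worker.

```typescript
import express from "express";
import { randomUUID } from "node:crypto";

type Job = { status: "running" | "done" | "failed"; result?: unknown };

const app = express();
const jobs = new Map<string, Job>(); // placeholder store; use a queue/DB in practice

// Placeholder for the actual long-running work.
async function runLongOperation(): Promise<unknown> {
  return new Promise((resolve) => setTimeout(() => resolve({ ok: true }), 10 * 60_000));
}

// Kick off the work and respond immediately, well inside any ALB timeout.
app.post("/jobs", (_req, res) => {
  const id = randomUUID();
  jobs.set(id, { status: "running" });
  runLongOperation()
    .then((result) => jobs.set(id, { status: "done", result }))
    .catch(() => jobs.set(id, { status: "failed" }));
  res.status(202).json({ id, statusUrl: `/jobs/${id}` });
});

// Cheap, fast status-polling endpoint.
app.get("/jobs/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  if (!job) {
    res.status(404).json({ error: "unknown job" });
    return;
  }
  res.json(job);
});

app.listen(8080);
```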
Microservices Architecture
For service-to-service communication:
External ALB (Internet-facing):
- Request Timeout: 60s
- Idle Timeout: 60s
Internal ALB (Service-to-service):
- Request Timeout: 30s
- Idle Timeout: 30s
Application Configuration:
- Request Timeout: 25s
- Keep-Alive: 35s
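On the calling side, here is a minimal sketch of a service-to-service request that respects the 25-second application request timeout above (Node 18+ built-in fetch; the internal hostname and path are placeholders):

```typescript
// Minimal client sketch for the internal-ALB scenario above (Node 18+ fetch).
// The hostname and path are placeholders.
async function getOrder(orderId: string): Promise<unknown> {
  const response = await fetch(`http://orders.internal.example/api/orders/${orderId}`, {
    // Abort after 25s, inside the internal ALB's 30s timeout, so the caller
    // gets a clear client-side error rather than waiting on the ALB's 504.
    signal: AbortSignal.timeout(25_000),
  });
  if (!response.ok) {
    throw new Error(`orders service responded with ${response.status}`);
  }
  return response.json();
}
```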
Monitoring and Troubleshooting
Key Metrics to Monitor
ALB CloudWatch Metrics:
- HTTPCode_ELB_5XX_Count: Track 5xx errors generated by the ALB
- TargetResponseTime: Monitor backend response times
- ActiveConnectionCount: Watch connection patterns
Application Metrics:
- Connection pool utilization
- Request timeout occurrences
- Keep-alive connection reuse rates
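As a starting point for dashboards or ad-hoc checks, the sketch below pulls the last hour of ALB-generated 5xx counts with the AWS SDK for JavaScript v3. The LoadBalancer dimension value is the suffix of your ALB's ARN (app/&lt;name&gt;/&lt;id&gt;) and is a placeholder here.

```typescript
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Sum of ALB-generated 5xx responses over the last hour, in 5-minute buckets.
const { Datapoints } = await cloudwatch.send(
  new GetMetricStatisticsCommand({
    Namespace: "AWS/ApplicationELB",
    MetricName: "HTTPCode_ELB_5XX_Count",
    Dimensions: [{ Name: "LoadBalancer", Value: "app/my-alb/abc123" }], // placeholder
    StartTime: new Date(Date.now() - 60 * 60 * 1000),
    EndTime: new Date(),
    Period: 300,
    Statistics: ["Sum"],
  })
);

console.log(Datapoints);
```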
Common Error Patterns
502 Bad Gateway:
- Often indicates application closed connection unexpectedly
- Check: Application keep-alive < ALB idle timeout
- Check: Application crashed or became unresponsive
504 Gateway Timeout:
- ALB timeout expired waiting for response
- Check: Application timeout ≥ ALB request timeout
- Check: Long-running operations without proper timeout handling
Debugging Commands
Test connection behavior:
```bash
# Test keep-alive behavior
curl -H "Connection: keep-alive" -v http://your-alb-endpoint/

# Monitor connection reuse
curl -H "Connection: keep-alive" -v \
  http://your-alb-endpoint/endpoint1 \
  http://your-alb-endpoint/endpoint2
```
Check ALB configuration:
```bash
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:...
```
Best Practices Summary
- Request Timeouts: Application timeout should be 10-15% shorter than ALB timeout
- Keep-Alive Timeouts: Application keep-alive should be 5-10 seconds longer than ALB idle timeout
- Monitor Religiously: Set up alerts for 5xx error rates and connection metrics
- Test Thoroughly: Use load testing tools to validate timeout behavior under various conditions
- Document Configuration: Maintain clear documentation of timeout values across your infrastructure
- Gradual Changes: Modify timeout values incrementally and monitor impact
Conclusion
Proper timeout configuration is crucial for maintaining reliable web applications behind AWS Application Load Balancers. By following the principles outlined in this guide—keeping application request timeouts shorter than ALB timeouts and application keep-alive timeouts longer than ALB idle timeouts—you can eliminate most timeout-related 5xx errors.
Remember that timeout configuration is not a one-size-fits-all solution. Consider your application’s specific requirements, traffic patterns, and operational constraints when implementing these best practices. Regular monitoring and testing will help you fine-tune these settings for optimal performance and reliability.
The investment in properly configuring these timeouts will pay dividends in improved user experience, reduced operational overhead, and fewer 3 AM troubleshooting sessions.