Why am I getting 429 Rate Limit Errors during load testing on self-hosted LangSmith?

Last updated: December 9, 2025

Problem

When running load tests against a self-hosted LangSmith instance, you may encounter rate limit errors like:

langsmith.utils.LangSmithRateLimitError: Rate limit exceeded for https://your-langsmith-url/api/v1/runs/multipart
HTTPError('429 Client Error: Too Many Requests for url: https://your-langsmith-url/api/v1/runs/multipart', '')

The same problem can also show up as missing or incomplete traces in the LangSmith UI, because runs whose uploads are ultimately rejected are never recorded.

Cause

While LangSmith Cloud has documented rate limits (6,000 requests per 10 seconds for the /runs/multipart endpoint), 429 errors in self-hosted deployments are often caused by infrastructure components in front of LangSmith rather than by LangSmith itself.

Common sources of 429 errors in self-hosted environments:

Web Application Firewall (WAF) - Rate-limiting rules that block high-volume traffic
Load Balancer - Request rate limits or connection limits
Ingress Controller - Rate-limiting configuration in nginx or another ingress controller
API Gateway - Throttling or rate-limit policies, if an API gateway sits in front of LangSmith

Solution

Step 1: Identify the Source of 429 Errors

Check whether LangSmith itself is generating the 429 errors:

Review the LangSmith backend pod logs for 429 responses
If no 429s appear in the LangSmith logs, the errors are most likely coming from an infrastructure component in front of LangSmith
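If your backend logs use an access-log format, a short script can tally 429 responses per path. This is a minimal sketch that assumes each request is logged on one line with the quoted request line followed by the status code, for example `"POST /api/v1/runs/multipart HTTP/1.1" 429`. LangSmith's actual log format may differ, so treat the `ACCESS_LOG_429` pattern as an assumption to adjust. The script name `count_429.py` and the pod name `<backend-pod>` below are placeholders.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: access-log lines contain a quoted request line followed by the
# status code, e.g. "POST /api/v1/runs/multipart HTTP/1.1" 429
# Adjust this pattern to match your backend's real log format.
ACCESS_LOG_429 = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+" 429\b')


def count_429s(lines):
    """Count 429 responses per request path in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ACCESS_LOG_429.search(line)
        if match:
            # Drop any query string so the same endpoint is grouped together.
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts


def main(paths):
    """Print per-path 429 counts across one or more saved log files."""
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            totals.update(count_429s(f))
    if not totals:
        print("No 429s found; check the WAF, load balancer, ingress, and gateway.")
    for endpoint, n in totals.most_common():
        print(f"{n:>8}  {endpoint}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl logs <backend-pod> > backend.log
    #        python count_429.py backend.log
    main(sys.argv[1:])
```

Because the pattern is an assumption, an empty result only means something once you have confirmed the pattern matches a known request line from your own logs.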

Step 2: Check Infrastructure Rate Limits

WAF (Web Application Firewall):

Review WAF rules for request rate limits
Check for blocked requests in WAF logs/metrics
Example: a WAF rule that blocks any IP sending more than 10,000 requests in a 5-minute window
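To see how quickly a load test trips a rule like this, compare the rule's average allowed rate with the test's request rate. The sketch below assumes a constant request rate from a single IP and a window that counts every request in the trailing `window_s` seconds; real WAFs have their own evaluation logic, so treat the result as a rough estimate. Note that the rate that matters is HTTP requests as the WAF sees them, which may be lower than the number of traced runs if the client batches uploads.

```python
def seconds_until_blocked(request_rate_rps, limit, window_s):
    """Estimate when a constant-rate client first reaches `limit` requests
    within a `window_s`-second window. Returns None if it never does.

    Assumes the rate is constant from t=0 and the window counts every
    request in the trailing `window_s` seconds (a simplification).
    """
    if request_rate_rps <= 0:
        return None
    # Steady state: a full window holds rate * window_s requests.
    if request_rate_rps * window_s <= limit:
        return None
    # Before the window fills, the count grows as rate * t.
    return limit / request_rate_rps


# Example rule from the text: more than 10,000 requests per 5 minutes.
LIMIT, WINDOW_S = 10_000, 5 * 60
print(f"Average allowed rate: {LIMIT / WINDOW_S:.1f} requests/second")  # 33.3
for rate in (20, 50, 200):
    t = seconds_until_blocked(rate, LIMIT, WINDOW_S)
    print(rate, "rps ->", "never blocked" if t is None else f"blocked after ~{t:.0f} s")
```

Under these assumptions a test sending 50 requests per second from one IP would be blocked after roughly 200 seconds, well inside a typical load-test run.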

Load Balancer (AWS ALB, etc.):

Check for rate limiting or throttling configurations
Review connection limits and request quotas

Ingress Controller:

Check the nginx ingress annotations for rate limiting, for example:

nginx.ingress.kubernetes.io/limit-rps: "100"
nginx.ingress.kubernetes.io/limit-connections: "10"
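To find every ingress that carries rate-limit annotations, you can scan the JSON printed by `kubectl get ingress -A -o json`. This sketch flags any annotation whose key starts with `nginx.ingress.kubernetes.io/limit-` (the prefix shared by the examples above); it does not interpret the values, and other ingress controllers use different annotation names. The file names in the usage comment are placeholders.

```python
import json
import sys

# Prefix shared by the nginx ingress rate-limiting annotations shown above.
LIMIT_PREFIX = "nginx.ingress.kubernetes.io/limit-"


def find_rate_limit_annotations(ingress_list):
    """Return (namespace/name, annotation, value) tuples for rate-limit annotations.

    `ingress_list` is the parsed output of `kubectl get ingress -A -o json`,
    i.e. a dict with an "items" list of Ingress objects.
    """
    findings = []
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        name = f'{meta.get("namespace", "default")}/{meta.get("name", "?")}'
        # Annotations may be absent entirely, so default to an empty dict.
        for key, value in (meta.get("annotations") or {}).items():
            if key.startswith(LIMIT_PREFIX):
                findings.append((name, key, value))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get ingress -A -o json > ingresses.json
    #        python find_limits.py ingresses.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for name, key, value in find_rate_limit_annotations(json.load(f)):
            print(f"{name}: {key}={value}")
```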

Step 3: Adjust Infrastructure Settings

Once you have identified the source, do one of the following:

Exempt (allowlist) LangSmith traffic from the rate-limiting rules
Increase rate limit thresholds to accommodate load testing volumes
Disable rate limiting for internal/trusted traffic sources during load tests
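When raising a threshold, size it from the load test's expected peak request rate rather than its average: required limit = peak requests per second x window length x headroom. The sketch below uses a headroom factor of 1.5, which is an illustrative assumption, not a LangSmith recommendation.

```python
import math


def required_limit(peak_rps, window_s, headroom=1.5):
    """Smallest whole-number request limit for a `window_s`-second window that
    tolerates `peak_rps` sustained for the whole window, plus a safety margin.

    `headroom` is an illustrative assumption, not a LangSmith recommendation.
    """
    return math.ceil(peak_rps * window_s * headroom)


# A load test peaking at 200 requests/second:
print(required_limit(200, 5 * 60))  # per 5-minute WAF window -> 90000
print(required_limit(200, 1))       # per-second ingress limit -> 300
```

For rules that count per client IP, use the rate from the busiest source address: a load generator running on a single host sends all of its traffic from one IP.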

Step 4: Scale LangSmith for High Throughput

For sustained high-throughput workloads, ensure your self-hosted LangSmith is properly scaled:

# Helm values.yaml for high-throughput scenarios
platformBackend:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70

queue:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    targetCPUUtilizationPercentage: 70

backend:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
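Assuming the chart renders these settings into standard Kubernetes HorizontalPodAutoscalers, each component scales by desiredReplicas = ceil(currentReplicas x currentUtilization / targetUtilization), clamped to [minReplicas, maxReplicas]. The sketch below reproduces that core formula, including the HPA's default 10% tolerance, to show how far each component can grow; it ignores stabilization windows, pod readiness, and scaling policies, which the real controller also applies.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Core Kubernetes HPA calculation for a CPU utilization target.

    Assumes the Helm values map onto a standard HPA. Simplified: ignores
    stabilization windows, pod readiness, and scaling policies.
    """
    ratio = current_cpu_pct / target_cpu_pct
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))


# platformBackend from the values above: min 2, max 10, target 70% CPU.
print(desired_replicas(2, 95, 70, 2, 10))   # ceil(2 * 95/70) = 3
print(desired_replicas(6, 140, 70, 2, 10))  # ceil(6 * 2.0) = 12, capped at 10
print(desired_replicas(4, 20, 70, 2, 10))   # ceil(4 * 20/70) = 2
```

CPU utilization is measured against each pod's CPU request, so the components need CPU requests set for these targets to take effect. If a component sits at maxReplicas during a load test, raising maxReplicas is the next lever.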

Important Notes

Self-hosted LangSmith's internal rate limits are not currently configurable via Helm values
Running a single backend pod is insufficient for load testing scenarios
Keep your LangSmith deployment up to date, as newer versions include performance improvements and rate limiter fixes
The LangSmith Python SDK includes retry logic for transient 429 errors, but sustained rate limiting will still cause trace loss
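Why retries do not prevent trace loss under sustained throttling comes down to arithmetic: a client with a bounded retry budget can only ride out a limited window of 429s. The sketch below uses generic capped exponential backoff with hypothetical parameters (base delay, factor, cap, retry count). It is not the LangSmith SDK's actual retry policy; check the SDK source for your version to see its real settings.

```python
def backoff_delays(max_retries, base_s=1.0, factor=2.0, cap_s=30.0):
    """Delays (seconds) before each retry under capped exponential backoff.

    All parameters are hypothetical; real clients often add random jitter.
    """
    return [min(cap_s, base_s * factor ** attempt) for attempt in range(max_retries)]


def survives_throttling(throttle_duration_s, max_retries, **kwargs):
    """True if the total backoff outlasts a continuous 429 period of the given
    length, i.e. at least one retry lands after throttling stops."""
    return sum(backoff_delays(max_retries, **kwargs)) > throttle_duration_s


delays = backoff_delays(5)
print(delays, "total:", sum(delays))  # [1.0, 2.0, 4.0, 8.0, 16.0] total: 31.0
print(survives_throttling(10, 5))     # True: a 10 s burst is absorbed
print(survives_throttling(120, 5))    # False: the upload is dropped
```

With these example settings, five retries cover about 31 seconds of throttling; a load test that keeps a WAF or ingress limit saturated for minutes will exhaust any such budget, which is why the infrastructure limits themselves need adjusting.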