Why am I getting 429 Rate Limit Errors during load testing on self-hosted LangSmith?

Last updated: December 9, 2025

Problem

When running load tests against a self-hosted LangSmith instance, you may encounter rate limit errors like:

langsmith.utils.LangSmithRateLimitError: Rate limit exceeded for https://your-langsmith-url/api/v1/runs/multipart
HTTPError('429 Client Error: Too Many Requests for url: https://your-langsmith-url/api/v1/runs/multipart', '')

The same problem can also show up as missing traces in the LangSmith UI: traces appear incomplete or are not recorded at all.

Cause

While LangSmith Cloud has documented rate limits (6000 requests per 10 seconds for the /runs/multipart endpoint), 429 errors in self-hosted deployments are often returned by infrastructure components in front of LangSmith, not by LangSmith itself.

Common sources of 429 errors in self-hosted environments:

  1. Web Application Firewall (WAF) - Rate limiting rules blocking high-volume traffic

  2. Load Balancer - Request rate limits or connection limits

  3. Ingress Controller - nginx or other ingress rate limiting configurations

  4. API Gateway - Throttling policies, if an API gateway sits in front of LangSmith

Solution

Step 1: Identify the Source of 429 Errors

Check if LangSmith is actually generating the 429 errors:

  1. Review LangSmith backend pod logs for 429 responses

  2. If no 429s appear in LangSmith logs, the error is coming from an upstream infrastructure component
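
One way to carry out this check is to save the backend pod logs to a file and count the lines that report a 429. The sketch below assumes the status code appears in each log line as a standalone token (as in a typical access log or a JSON field); the actual LangSmith log format may differ, so adjust the pattern to match your logs. The function name and the sample lines are hypothetical.

```python
import re

# Assumption: the HTTP status appears as a standalone "429" token, for example
# '"POST /api/v1/runs/multipart HTTP/1.1" 429 0' or '"status_code":429'.
# Digits that are part of a longer number (1429, 0.429s) are ignored.
STATUS_429 = re.compile(r"(?<![\w.])429(?![\w.])")


def count_429_lines(lines):
    """Return how many log lines contain a standalone 429 token."""
    return sum(1 for line in lines if STATUS_429.search(line))


# Hypothetical log lines, for illustration only.
sample_logs = [
    '10.0.0.1 "POST /api/v1/runs/multipart HTTP/1.1" 429 0',
    '10.0.0.1 "POST /api/v1/runs/multipart HTTP/1.1" 202 1429',
    '{"path": "/runs/multipart", "status_code":429}',
    "request took 0.429s status=200",
]
print(count_429_lines(sample_logs))
```

To run it against real logs, write the pod logs to a file (for example with kubectl logs) and pass the open file object to count_429_lines. A count of zero while clients still receive 429s points to an upstream component.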

Step 2: Check Infrastructure Rate Limits

WAF (Web Application Firewall):

  • Review WAF rules for request rate limits

  • Check for blocked requests in WAF logs/metrics

  • Example: A WAF rule blocking IPs that send >10,000 requests in a 5-minute window
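
To see how quickly a load test can trip a rule like the one in the example, it helps to convert the threshold into a per-second rate. The sketch below assumes the rule counts requests per source IP over a sliding window and that the test sends traffic at a constant rate from a single IP; both are simplifications, and the function names are hypothetical.

```python
def max_sustained_rps(request_limit, window_seconds):
    """Highest average request rate that stays within the limit."""
    return request_limit / window_seconds


def seconds_until_limit(test_rps, request_limit, window_seconds):
    """Seconds of constant-rate traffic before the limit is reached.

    Returns None if the rate never exceeds the limit within one window.
    """
    if test_rps * window_seconds <= request_limit:
        return None
    return request_limit / test_rps


# Example rule from above: 10,000 requests in a 5-minute (300 s) window.
print(max_sustained_rps(10_000, 300))         # about 33.3 requests/second
print(seconds_until_limit(200, 10_000, 300))  # a 200 req/s test reaches the limit after 50 s
```

For comparison, the LangSmith Cloud limit quoted earlier averages 6000 / 10 = 600 requests per second, so a WAF rule of this size would start blocking well before a LangSmith-level limit of that size came into play.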

Load Balancer (AWS ALB, etc.):

  • Check for rate limiting or throttling configurations

  • Review connection limits and request quotas

Ingress Controller:

  • Check nginx ingress annotations for rate limiting:

    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "10"
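
A quick way to audit these annotations across a cluster is to export the Ingress objects as JSON and filter for the rate limiting prefix. The sketch below assumes input shaped like the output of kubectl get ingress -o json without a resource name (a List object with an "items" array); the Ingress name "langsmith" and the function name are hypothetical.

```python
import json

# Prefix shared by the ingress-nginx rate limiting annotations shown above.
LIMIT_PREFIX = "nginx.ingress.kubernetes.io/limit-"


def find_rate_limit_annotations(ingress_list):
    """Map each Ingress name to its rate limiting annotations, if any."""
    found = {}
    for item in ingress_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        limits = {k: v for k, v in annotations.items() if k.startswith(LIMIT_PREFIX)}
        if limits:
            found[metadata.get("name", "<unnamed>")] = limits
    return found


# Hypothetical sample standing in for real kubectl output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "langsmith",
                "annotations": {"nginx.ingress.kubernetes.io/limit-rps": "100",
                                "kubernetes.io/ingress.class": "nginx"}}},
  {"metadata": {"name": "other"}}
]}
""")
print(find_rate_limit_annotations(sample))
```

Note that ingress-nginx, by default, rejects rate-limited requests with a 503 rather than a 429 unless the controller's limit-req-status-code setting has been changed, so it is worth checking that setting alongside the annotations.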

Step 3: Adjust Infrastructure Settings

Once you have identified the component that returns the 429 errors, do one of the following:

  1. Exempt LangSmith traffic from the rate limiting rules (allowlist it)

  2. Increase rate limit thresholds to accommodate load testing volumes

  3. Disable rate limiting for internal/trusted traffic sources during load tests

Step 4: Scale LangSmith for High Throughput

For sustained high-throughput workloads, ensure your self-hosted LangSmith is properly scaled:

# Helm values.yaml for high-throughput scenarios
platformBackend:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70

queue:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    targetCPUUtilizationPercentage: 70

backend:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70

Important Notes

  • Self-hosted LangSmith's internal rate limits are not currently configurable via Helm values

  • Running a single backend pod is insufficient for load testing scenarios

  • Keep your LangSmith deployment up to date, as newer versions include performance improvements and rate limiter fixes

  • The LangSmith Python SDK includes retry logic for transient 429 errors, but sustained rate limiting will still cause trace loss
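
The last point can be illustrated with a little arithmetic: a bounded number of retries only covers a bounded amount of time. The sketch below models a generic capped exponential backoff. The retry count, delays, and function names are hypothetical and are not the LangSmith SDK's actual settings.

```python
def retry_coverage_seconds(max_retries, base_delay, factor=2.0, max_delay=30.0):
    """Total time spent waiting across all retries (capped exponential backoff)."""
    total = 0.0
    delay = base_delay
    for _ in range(max_retries):
        total += min(delay, max_delay)
        delay *= factor
    return total


def survives_throttling(throttle_seconds, max_retries, base_delay):
    """True if the retries outlast a throttling period of the given length."""
    return retry_coverage_seconds(max_retries, base_delay) >= throttle_seconds


# Hypothetical policy: 3 retries starting at 1 s -> waits of 1 + 2 + 4 = 7 s.
print(retry_coverage_seconds(3, 1.0))
print(survives_throttling(5, 3, 1.0))    # a brief 5 s throttle is absorbed
print(survives_throttling(300, 3, 1.0))  # a 5-minute block is not
```

Under these assumed values, a request throttled for longer than about 7 seconds exhausts its retries, which is why a sustained block (such as a 5-minute WAF window) still results in lost traces.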

Related Resources