What are the hardware specifications and resource limits for LangSmith Deployments?

Last updated: December 29, 2025

Context

Users deploying applications on LangSmith Deployments need to know the available hardware resources and autoscaling behavior so they can plan capacity and keep their workloads performing well.

Answer

Each LangSmith Deployment runs with the following specifications:

  • CPU: 2 cores per container

  • Memory: 2 GB RAM per container

  • Containers: Autoscales up to 10 containers

Important considerations:

  • Each worker is limited to 10 concurrent runs by default (N_JOBS_PER_WORKER)

  • Autoscaling is triggered based on:

    • CPU usage (75% threshold)

    • Memory usage (75% threshold)

    • Pending runs (~10 per container)

  • Connection timeouts: Long-running requests (>10 minutes) may encounter timeout errors. Configure appropriate HTTP timeouts and implement retry logic for operations that may exceed the default timeout limits.

  • You may experience brief pending job spikes during container warm-up periods, even when CPU/memory usage is low
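
The figures above can be turned into a rough capacity estimate. The following is a minimal sketch, not the platform's actual autoscaler: it assumes one worker per container and assumes that any single signal crossing its threshold is enough to trigger a scale-up. The function names are hypothetical and exist only for illustration.

```python
import math

# Figures quoted in this article.
MAX_CONTAINERS = 10
DEFAULT_JOBS_PER_WORKER = 10        # default value of N_JOBS_PER_WORKER
CPU_THRESHOLD = 0.75
MEMORY_THRESHOLD = 0.75
PENDING_RUNS_PER_CONTAINER = 10


def max_concurrent_runs(containers=MAX_CONTAINERS,
                        jobs_per_worker=DEFAULT_JOBS_PER_WORKER):
    """Upper bound on simultaneous runs, ASSUMING one worker per container."""
    return containers * jobs_per_worker


def containers_needed(concurrent_runs,
                      jobs_per_worker=DEFAULT_JOBS_PER_WORKER):
    """Containers required for a given load, capped at the autoscaling limit."""
    needed = math.ceil(concurrent_runs / jobs_per_worker)
    return min(max(needed, 1), MAX_CONTAINERS)


def would_trigger_scale_up(cpu, memory, pending_runs, containers):
    """ASSUMPTION: any one signal crossing its threshold triggers scaling."""
    return (
        cpu >= CPU_THRESHOLD
        or memory >= MEMORY_THRESHOLD
        or pending_runs / containers >= PENDING_RUNS_PER_CONTAINER
    )


print(max_concurrent_runs())       # 10 containers x 10 runs each
print(containers_needed(35))       # 35 runs at 10 per worker
print(would_trigger_scale_up(cpu=0.40, memory=0.50,
                             pending_runs=25, containers=2))
```

Under these assumptions, a fully scaled deployment with default settings tops out at about 100 concurrent runs. The last call shows how pending runs alone could trigger scaling while CPU and memory usage stay low.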

To optimize performance:

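  1. Adjust N_JOBS_PER_WORKER to reduce queuing (a higher value lets each worker accept more concurrent runs, at the cost of more CPU and memory pressure per container)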

  2. Monitor your deployment's resource usage through the deployment monitoring tab

  3. Consider breaking up memory-intensive workloads across multiple deployments if needed

  4. For long-running operations, configure HTTP timeouts appropriately:

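    import httpx
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        model="your-model",
        timeout=httpx.Timeout(
            timeout=1200.0,   # default (20 minutes) for any timeout not set below, e.g. pool
            read=1200.0,      # 20 minutes to wait for response data
            connect=30.0,     # 30 seconds to establish connection
            write=30.0,       # 30 seconds for write operations
        ),
        max_retries=2,        # retry failed requests up to 2 times
    )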
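
The retry logic recommended for long-running operations can be sketched with the standard library alone. This is a minimal illustration, not an official LangSmith utility: `call_with_retries` and `flaky_operation` are hypothetical names, and the built-in `TimeoutError` is only a placeholder for whichever exception types your HTTP client actually raises.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=1.0,
                      max_delay=30.0, retry_on=(TimeoutError,),
                      sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on retry_on errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


# Simulated long-running call that times out twice, then succeeds.
attempts = {"count": 0}


def flaky_operation():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "done"


result = call_with_retries(flaky_operation, base_delay=0.01)
print(result, "after", attempts["count"], "attempts")
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without real waiting. If the client already retries internally (as `max_retries=2` does above), keep the outer attempt count small so the two layers do not multiply into a very long total wait.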
    

Note: Resource limits cannot be increased beyond these specifications. If you need additional capacity, consider architectural changes to your application to work within these constraints.