Resolving common issues when upgrading the LangSmith Helm chart from 0.11.x to 0.12.x

Last updated: December 18, 2025

Context

When upgrading self-hosted LangSmith from Helm chart version 0.11.x to 0.12.x, you may encounter several issues including:

  1. Migration job OOM errors - The feedbackDataMigration or other PostgreSQL migration jobs may run out of memory (OOMKilled) during the upgrade process

  2. Blank screen after SSO login - Some users may experience a blank screen after authenticating via AWS SSO or other identity providers

  3. Bulk export validation failures - After upgrade, bulk export destinations may fail validation with errors about include_bucket_in_prefix

  4. Bulk export format_version errors - Existing bulk exports may fail with ValidationError: Input should be 'v1' or 'v2_beta' due to incomplete migrations

These issues are more likely to occur on deployments with:

  • Large amounts of historical data (200+ days of traces)

  • High queue worker counts (60+ workers)

  • Large ClickHouse memory consumption (30+ GB)

Answer

Issue 1: Migration Job OOM Errors (feedbackDataMigration)

Root Cause: The feedbackDataMigration schema migration job processes all existing feedback data in memory. Unlike horizontally-scaled workloads, this single migration pod must process ALL feedback data, which can exceed default memory limits on deployments with extensive historical data.

Solution: Increase memory allocation for migration jobs before upgrading.

In your Helm values file, add or update the migration job resources:

migrations:
  resources:
    requests:
      memory: "8Gi"
      cpu: "1"
    limits:
      memory: "16Gi"
      cpu: "2"

Recommended memory settings based on deployment age:

  • < 90 days of data: 4Gi limit

  • 90-180 days of data: 8Gi limit

  • 180+ days of data: 16Gi limit (or higher)
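The tiers above can be turned into a small helper for scripting the values file. This is a minimal sketch: the function name is hypothetical, and because the tiers overlap at exactly 180 days, it assumes that 180 days falls into the highest tier.

```shell
# Map days of historical data to the migration memory limit suggested above.
# Assumption: exactly 180 days is treated as the "180+" tier.
recommend_migration_memory_limit() {
  if [ "$1" -lt 90 ]; then
    echo "4Gi"
  elif [ "$1" -lt 180 ]; then
    echo "8Gi"
  else
    echo "16Gi"
  fi
}

# Usage: the printed value goes under migrations.resources.limits.memory
#   recommend_migration_memory_limit 200
```

Treat the result as a starting point; very large deployments may need more than 16Gi.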

Important: During the upgrade and after any failed migration attempt:

  1. Collect diagnostics immediately using the diagnostics script before Kubernetes garbage collection removes logs

  2. Tail migration pod logs during the upgrade: kubectl logs -f job/langsmith-pg-migrations -n <namespace>

  3. Monitor memory usage during migration
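One possible way to confirm an OOM kill and watch memory is sketched below. The function names are hypothetical, the job name is taken from the kubectl command above, and the memory check assumes metrics-server is installed in the cluster. The pod selector relies on the job-name label that Kubernetes adds to pods created by a Job.

```shell
# Job name as used in the log-tailing command above.
JOB_NAME="langsmith-pg-migrations"

# Print each migration pod with its container termination reason
# (for example "OOMKilled"). If the container restarted in place,
# inspect .lastState.terminated.reason instead of .state.
migration_termination_reasons() {
  kubectl get pods -n "$1" -l "job-name=$JOB_NAME" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.terminated.reason}{"\n"}{end}'
}

# Show current memory usage per container (requires metrics-server).
migration_memory_usage() {
  kubectl top pod -n "$1" -l "job-name=$JOB_NAME" --containers
}

# Usage:
#   migration_termination_reasons <namespace>
#   migration_memory_usage <namespace>
```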

Issue 2: Blank Screen After SSO Login

Root Cause: This issue may be related to session handling changes in the 0.12.x release or browser cache conflicts after the upgrade.

Solution:

  1. Have affected users clear browser cache and cookies for the LangSmith domain

  2. Try an incognito/private browser window

  3. If the issue persists, check backend logs for authentication errors:

    kubectl logs -l app=langsmith-backend -n <namespace> | grep -i "auth\|sso\|session"

Issue 3: Bulk Export Destination Validation Failures

Root Cause: Helm chart 0.12.x introduced the include_bucket_in_prefix parameter for bulk export destinations. Existing destinations may require this parameter to be explicitly set.

Solution: When creating or updating bulk export destinations, add the include_bucket_in_prefix parameter:

{
  "destination_type": "s3",
  "config": {
    "bucket_name": "your-bucket-name",
    "prefix": "langsmith-exports",
    "s3_region": "us-east-1",
    "include_bucket_in_prefix": true
  }
}

For existing destinations that were working before the upgrade, setting "include_bucket_in_prefix": true should restore functionality.

Issue 4: Bulk Export format_version Validation Errors

Symptoms:

ValidationError: 1 validation error for BulkExport
format_version
  Input should be 'v1' or 'v2_beta' [type=enum, input_value=None, input_type=NoneType]

The /bulk-exports API may also fail to list existing exports with the same error.

Root Cause: The database migration that sets the default format_version value for existing bulk exports may not have completed successfully. This migration should update all existing bulk export records with NULL format_version to 'v1'.

The relevant migrations are:

  • 9e3fe47a4500 → bulk export format_version (adds the column)

  • ce08d43fb55d → bulk export format_version default (sets default value for existing records)

Solution:

  1. Verify the issue by checking the bulk_exports table:

    SELECT id, format_version FROM bulk_exports WHERE format_version IS NULL;

  2. Apply the fix manually if records have NULL format_version:

    UPDATE bulk_exports
    SET format_version = 'v1'
    WHERE format_version IS NULL;

  3. Verify the alembic version to confirm migrations ran:

    SELECT * FROM alembic_version;

    For Helm 0.12.31 (app 0.12.69), the alembic version should be 09f3b8e4b21f or a later revision in the migration chain.
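The check and the manual fix can also be run as one transaction, so the row counts before and after are visible in a single pass. The sketch below only prints the SQL; the function name and the LANGSMITH_PG_URL variable are hypothetical, and how you connect to PostgreSQL depends on your deployment.

```shell
# Print the check-and-fix statements from the steps above as one transaction.
format_version_fix_sql() {
  cat <<'SQL'
BEGIN;
-- Rows that still lack a format_version before the fix
SELECT count(*) AS null_before FROM bulk_exports WHERE format_version IS NULL;
UPDATE bulk_exports SET format_version = 'v1' WHERE format_version IS NULL;
-- Should report 0 after the fix
SELECT count(*) AS null_after FROM bulk_exports WHERE format_version IS NULL;
COMMIT;
SQL
}

# Usage (LANGSMITH_PG_URL is a hypothetical variable holding your connection string):
#   format_version_fix_sql | psql "$LANGSMITH_PG_URL" -v ON_ERROR_STOP=1
```

Printing the SQL first makes it easy to review before anything touches the database. Back up the database before applying it.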

Pre-Upgrade Checklist

Before upgrading from 0.11.x to 0.12.x:

  1. Review the Self-hosted LangSmith Changelog for breaking changes

  2. Note the breaking change in v0.12.0: The langgraphPlatform option is deprecated. Use config.deployment instead:

    # Old (deprecated)
    langgraphPlatform:
      enabled: true
    
    # New (v0.12.0+)
    config:
      deployment:
        enabled: true

  3. Increase migration job memory as described above

  4. Back up your databases before upgrading

  5. Run the upgrade during a maintenance window when you can monitor logs and respond to issues

  6. Have the diagnostics script ready to capture logs immediately if issues occur
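As a pre-flight check for item 2, a values file can be scanned for the deprecated key before running the upgrade. This is a rough sketch with a hypothetical function name. It matches langgraphPlatform: at any indentation, since the key's exact position may differ between values files.

```shell
# Print matching lines and return 0 if the values file still contains the
# deprecated langgraphPlatform key; return non-zero if it does not.
uses_deprecated_langgraph_platform() {
  grep -nE '^[[:space:]]*langgraphPlatform:' "$1"
}

# Usage:
#   if uses_deprecated_langgraph_platform values.yaml; then
#     echo "Replace langgraphPlatform with config.deployment before upgrading"
#   fi
```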

Post-Upgrade Verification

After the upgrade completes:

  1. Verify all pods are running:

    kubectl get pods -n <namespace>

  2. Verify migration jobs completed:

    kubectl get jobs -n <namespace>
    # Both langsmith-pg-migrations and langsmith-ch-migrations should show COMPLETIONS: 1/1

  3. Check the migration logs:

    kubectl logs job/langsmith-pg-migrations -n <namespace>

    Look for: Running upgrade ... -> ... messages confirming migrations ran

  4. Test bulk exports if you use this feature

  5. Test SSO login with multiple users
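The job check in step 2 can be scripted so that it fails loudly in automation. The sketch below uses hypothetical function names and assumes the two job names mentioned above. It reads each Job's status.succeeded count instead of parsing the COMPLETIONS column.

```shell
# Succeed only if the Job ($1) in namespace ($2) reports at least one
# successful completion. An absent status.succeeded field counts as 0.
job_completed() {
  succeeded="$(kubectl get job "$1" -n "$2" -o jsonpath='{.status.succeeded}')"
  [ "${succeeded:-0}" -ge 1 ]
}

# Check both migration jobs; stop at the first one that has not completed.
verify_migration_jobs() {
  for job in langsmith-pg-migrations langsmith-ch-migrations; do
    if job_completed "$job" "$1"; then
      echo "$job: complete"
    else
      echo "$job: NOT complete"
      return 1
    fi
  done
}

# Usage:
#   verify_migration_jobs <namespace>
```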

Resources