Troubleshooting Agent Builder and Insights (clio) Startup and DNS Failures

Last updated: March 13, 2026

Overview

Two common issues affect Agent Builder (and Insights) on self-hosted LangSmith deployments:

  1. The Agent Builder pod fails to start because license verification fails, or because stale deployment/session records from a previous install conflict with a reinstall.

  2. Agent Builder features fail at runtime because pods cannot resolve the external hostname from inside the cluster (DNS issue).


Issue 1 — Agent Builder Pod Fails to Start

Symptoms

  • Agent Builder pod crashes on startup with ValueError: License verification failed

  • Bootstrap job hangs or times out during helm upgrade

  • Bootstrap returns 409 Conflict because a tracing project named agent-builder or clio already exists

Cause

The license key was not updated before running the bootstrap job, or stale deployment/session records from a previous install are blocking re-creation.

Resolution

Step 1 — Clean up stale deployments via the API

List the tracing projects (sessions) named agent-builder and clio to get their IDs:

curl -s -X GET "https://<your-hostname>/api/v1/sessions" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "X-Tenant-ID: YOUR_WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  | jq '.[] | select(.name == "clio" or .name == "agent-builder") | {name, id}'

Delete each deployment (with its tracing project):

curl -X DELETE "https://<your-hostname>/api/v2/deployments/<DEPLOYMENT_ID>?delete_tracing_project=true" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "X-Tenant-ID: YOUR_WORKSPACE_ID"
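When several stale deployments need to go, the DELETE call can be wrapped in a small loop. This is a minimal sketch, assuming bash and three hypothetical environment variables (LANGSMITH_HOST, LANGSMITH_API_KEY, WORKSPACE_ID) holding your hostname, API key and workspace ID. It sends X-Tenant-ID on the DELETE, matching the GET above, and stops at the first HTTP error.

```shell
#!/usr/bin/env bash
# Sketch: delete stale agent-builder / clio deployments via the API.
# ASSUMPTION: LANGSMITH_HOST, LANGSMITH_API_KEY and WORKSPACE_ID are
# hypothetical variable names you export yourself before calling this.

# Build the DELETE URL for one deployment, also removing its tracing project.
deployment_delete_url() {
  local host="$1" deployment_id="$2"
  printf 'https://%s/api/v2/deployments/%s?delete_tracing_project=true\n' \
    "$host" "$deployment_id"
}

# Delete every deployment ID passed as an argument.
delete_deployments() {
  local id
  for id in "$@"; do
    echo "Deleting deployment ${id}..."
    # --fail makes curl exit non-zero (code 22) on HTTP 400+ responses,
    # so the loop stops instead of silently continuing.
    curl --fail --silent --show-error -X DELETE \
      "$(deployment_delete_url "$LANGSMITH_HOST" "$id")" \
      -H "x-api-key: ${LANGSMITH_API_KEY}" \
      -H "X-Tenant-ID: ${WORKSPACE_ID}" || return 1
  done
}

# Usage (IDs are placeholders):
#   delete_deployments <DEPLOYMENT_ID_1> <DEPLOYMENT_ID_2>
```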

Step 2 — If API deletion is blocked, use SQL

-- Soft delete (recommended; triggers reconciler cleanup)
UPDATE host_projects
SET status = 'AWAITING_DELETE', updated_at = now()
WHERE name IN ('clio', 'agent-builder') AND tenant_id = '<TENANT_UUID>';

-- If needed, also delete the associated tracing project.
-- This is a hard delete and cannot be undone.
DELETE FROM tracer_sessions
WHERE id = '<TRACER_SESSION_ID>';
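One way to run the soft delete is to pipe it into psql inside the Postgres pod. This sketch assumes the chart's in-cluster Postgres with a pod named <release>-postgres-0 and a user and database both named postgres; all three are assumptions, so confirm them with kubectl get pods and your values.yaml, and use your own client if Postgres is external. The UUID check keeps a malformed tenant ID out of the statement.

```shell
#!/usr/bin/env bash
# Sketch: apply the Step 2 soft delete through psql in the Postgres pod.
# ASSUMPTIONS (verify for your install): pod "<release>-postgres-0",
# database user "postgres", database "postgres".

# Accept only canonical UUIDs so the value can be inlined into SQL safely.
is_uuid() {
  [[ "$1" =~ ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$ ]]
}

# Print the soft-delete statement for one tenant (fails on a non-UUID).
soft_delete_sql() {
  local tenant_id="$1"
  is_uuid "$tenant_id" || { echo "not a UUID: $tenant_id" >&2; return 1; }
  cat <<SQL
UPDATE host_projects
SET status = 'AWAITING_DELETE', updated_at = now()
WHERE name IN ('clio', 'agent-builder') AND tenant_id = '${tenant_id}';
SQL
}

# Send the statement to psql in the (hypothetically named) Postgres pod.
run_soft_delete() {
  local namespace="$1" release="$2" tenant_id="$3" sql
  sql="$(soft_delete_sql "$tenant_id")" || return 1
  printf '%s\n' "$sql" | kubectl exec -i -n "$namespace" "${release}-postgres-0" -- \
    psql -U postgres -d postgres -v ON_ERROR_STOP=1
}

# Usage: run_soft_delete <namespace> langsmith <TENANT_UUID>
```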

Step 3 — Re-run the bootstrap

helm upgrade langsmith langsmith/langsmith -n <namespace> -f values.yaml
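If the bootstrap job hangs or fails again, its status and logs are the first place to look for the license or 409 errors listed under Symptoms. This sketch assumes the job's name contains "bootstrap"; that pattern is a guess, so list jobs with kubectl get jobs -n <namespace> and adjust it if needed.

```shell
#!/usr/bin/env bash
# Sketch: find the bootstrap job(s) and print their status and recent logs.
# ASSUMPTION: the job name contains "bootstrap" (hypothetical pattern).

# Keep only "job.batch/<name>" lines that look like bootstrap jobs.
filter_bootstrap_jobs() {
  grep -i 'bootstrap' || true   # empty output (not an error) when nothing matches
}

# Show status and the last 100 log lines of each matching job.
inspect_bootstrap_jobs() {
  local namespace="$1" job
  for job in $(kubectl get jobs -n "$namespace" -o name | filter_bootstrap_jobs); do
    kubectl get -n "$namespace" "$job"
    kubectl logs -n "$namespace" "$job" --tail=100
  done
}

# Usage: inspect_bootstrap_jobs <namespace>
```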

Step 4 — If Agent Builder UI doesn't appear after bootstrap, restart the frontend

kubectl rollout restart deployment langsmith-frontend -n <namespace>

Note: If the license key was updated after the bootstrap job ran, reinstall the chart so bootstrap picks up the new key.


Issue 2 — Agent Builder Fails at Runtime (DNS / External Hostname)

Symptoms

  • The Agent Builder UI loads, but features don't work: threads, assistants, and crons return errors

  • Pods cannot reach SMITH_BACKEND_ENDPOINT or GO_ENDPOINT

  • nslookup <your-hostname> from inside the cluster fails

  • Hostname is only resolvable via /etc/hosts on the host machine, not inside the cluster

Cause

The agent bootstrap script hardcodes SMITH_BACKEND_ENDPOINT, GO_ENDPOINT, HOST_BACKEND_ENDPOINT, and MCP_SERVER_URL to URLs built from the external hostname in config.hostname. When that hostname is not backed by real DNS (e.g., a dummy domain defined only in /etc/hosts on the host machine), pods cannot resolve it from inside the cluster.

Resolution

Option A — Patch the LGP custom resource directly (immediate workaround)

# Find the agent-builder LGP CR
kubectl get lgp -n <namespace>

# Edit it
kubectl edit lgp agent-builder -n <namespace>

In spec.serverSpec.env, update these entries to use internal K8s service names:

SMITH_BACKEND_ENDPOINT=http://<release-name>-backend.<namespace>.svc.cluster.local:1984
GO_ENDPOINT=http://<release-name>-platform-backend.<namespace>.svc.cluster.local:1986
HOST_BACKEND_ENDPOINT=http://<release-name>-host-backend.<namespace>.svc.cluster.local:1985
MCP_SERVER_URL=http://<release-name>-platform-backend.<namespace>.svc.cluster.local:1986/mcp

Replace <release-name> and <namespace> with your Helm release name and namespace.
These env vars are overwritten on the next helm upgrade because bootstrap re-derives them from config.hostname, so reapply the patch after each upgrade.
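Because the patch has to be reapplied after each upgrade, a non-interactive version can be handy. This is a sketch, assuming the CR is named agent-builder, that spec.serverSpec.env is a list of name/value entries (as in standard Kubernetes env), and that jq is installed locally. It rewrites only the four entries listed above, leaves other env entries untouched, and replaces the CR with kubectl replace.

```shell
#!/usr/bin/env bash
set -o pipefail
# Sketch: non-interactive alternative to "kubectl edit lgp agent-builder".
# ASSUMPTIONS: CR name "agent-builder", env stored as name/value pairs
# under spec.serverSpec.env, jq available on the workstation.

# Print NAME=value lines for the internal endpoints (names/ports from the list above).
internal_endpoints() {
  local release="$1" namespace="$2" svc="svc.cluster.local"
  echo "SMITH_BACKEND_ENDPOINT=http://${release}-backend.${namespace}.${svc}:1984"
  echo "GO_ENDPOINT=http://${release}-platform-backend.${namespace}.${svc}:1986"
  echo "HOST_BACKEND_ENDPOINT=http://${release}-host-backend.${namespace}.${svc}:1985"
  echo "MCP_SERVER_URL=http://${release}-platform-backend.${namespace}.${svc}:1986/mcp"
}

# Rewrite the matching env entries and replace the CR in place.
patch_lgp_env() {
  local release="$1" namespace="$2" cr_json name value
  cr_json="$(kubectl get lgp agent-builder -n "$namespace" -o json)" || return 1
  while IFS='=' read -r name value; do
    # Set .value only on the entry whose .name matches; no-op if absent.
    cr_json="$(jq --arg n "$name" --arg v "$value" \
      '(.spec.serverSpec.env[] | select(.name == $n) | .value) = $v' <<<"$cr_json")" || return 1
  done < <(internal_endpoints "$release" "$namespace")
  kubectl replace -f - <<<"$cr_json"
}

# Usage: patch_lgp_env <release-name> <namespace>
```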

Option B — Add a CoreDNS hosts block

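Add your hostname to the CoreDNS ConfigMap (coredns in kube-system on most clusters) so that it resolves to the ingress IP from inside the cluster. Place the hosts block inside the main server block (.:53 { ... }) of the Corefile: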

hosts {
    <INGRESS_IP> <your-hostname>
    fallthrough
}

Then restart CoreDNS:

kubectl rollout restart deployment coredns -n kube-system

Verify resolution from inside a pod:

kubectl exec -n <namespace> <any-pod> -- nslookup <your-hostname>
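The steps above can be scripted roughly as follows. This sketch assumes an Ingress object that publishes an IP in .status.loadBalancer.ingress[0].ip (some load balancers publish a hostname instead) and that pulling busybox:1.36 is allowed; the throwaway pod is useful because many application images do not ship nslookup.

```shell
#!/usr/bin/env bash
# Sketch: get the ingress IP, render the CoreDNS hosts block, and test
# resolution from a temporary pod.
# ASSUMPTIONS: an Ingress exposes an IP (not a hostname); busybox is pullable.

# Print the hosts block to paste inside the .:53 { ... } server block.
render_hosts_block() {
  local ip="$1" hostname="$2"
  printf 'hosts {\n    %s %s\n    fallthrough\n}\n' "$ip" "$hostname"
}

# Read the load balancer IP of the first Ingress in the namespace.
ingress_ip() {
  kubectl get ingress -n "$1" \
    -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'
}

# Resolve the hostname from a throwaway busybox pod, removed on exit.
check_resolution() {
  local namespace="$1" hostname="$2"
  kubectl run dns-check --rm -i --restart=Never -n "$namespace" \
    --image=busybox:1.36 -- nslookup "$hostname"
}

# Usage:
#   render_hosts_block "$(ingress_ip <namespace>)" <your-hostname>
#   kubectl -n kube-system edit configmap coredns   # paste the block, save
#   check_resolution <namespace> <your-hostname>
```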

Notes

  • From v0.13.17+, Agent Builder communicates with backend services via Kube DNS by default. Upgrading to this version or later eliminates the need for the workarounds above (except for the default MCP server, which still requires external DNS).

  • The root cause of the LGP DNS issue is that unlike tool-server and trigger-server (which use internal K8s service names), the bootstrap script sets SMITH_BACKEND_ENDPOINT, GO_ENDPOINT, HOST_BACKEND_ENDPOINT, and MCP_SERVER_URL to the external hostname — routing internal traffic externally.

  • X-Tenant-ID in API calls is the workspace ID, found in the LangSmith UI URL.
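To decide whether the workarounds are still needed, compare your installed version against 0.13.17. This sketch assumes that number refers to the LangSmith Helm chart version shown in the CHART column of helm list (for example langsmith-0.13.17); confirm against the release notes if your versioning differs. The comparison relies on sort -V from GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: check whether an installed version is at least 0.13.17.
# ASSUMPTION: the relevant version is the chart version from "helm list".

# Succeeds if version $1 >= version $2 (smallest of the two, by sort -V, is $2).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the "langsmith-" prefix from a helm CHART value.
chart_version() {
  printf '%s\n' "${1#langsmith-}"
}

# Usage:
#   helm list -n <namespace>        # read the CHART column
#   version_ge "$(chart_version langsmith-0.13.18)" 0.13.17 && echo "Kube DNS by default"
```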