Why are my agent health checks failing with GCP Gateway API on GKE?

Last updated: December 4, 2025

Context

When deploying LangSmith agents on Google Kubernetes Engine (GKE) using the Gateway API, you may encounter health check failures. This occurs because the GCP Load Balancer performs health checks on port 80 at path `/`, but LangSmith agents expose their health endpoint at `/ok` on port 8000. This mismatch causes the Load Balancer to incorrectly mark healthy agent pods as unhealthy, preventing proper traffic routing.

Answer

This is a known limitation of the GKE Gateway API implementation. The external load balancer health check for GKE Gateway cannot be disabled and is always created for backends attached to a Gateway. Its port and path are customized through a HealthCheckPolicy, which is a GKE-specific resource and not part of the standard Ingress/Gateway specification.

There are two approaches to resolve this issue:

Option 1: Create Manual HealthCheckPolicy Resources (Temporary Workaround)

For each agent deployment, create a HealthCheckPolicy that configures the correct port and path:

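# One HealthCheckPolicy per agent Service, created in the same namespace as that Service.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: agent-name-healthcheck
  namespace: langsmith
spec:
  default:
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 2
    config:
      type: HTTP
      httpHealthCheck:
        port: 8000          # port the agent listens on
        requestPath: /ok    # agent health endpoint
  targetRef:
    group: ""               # empty string selects the core API group (Service)
    kind: Service
    name: ${agent-service-name}   # placeholder: replace with the agent's Service name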

This approach requires you to create a policy by hand for every agent deployment, so it does not scale for production environments.
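
To illustrate what one policy per agent looks like in practice, here is a minimal sketch for two agents. The Service names `agent-a` and `agent-b` are hypothetical, and the `langsmith` namespace is carried over from the example above; substitute the names of your own agent Services.

```yaml
# Sketch only: "agent-a" and "agent-b" are hypothetical agent Service names.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: agent-a-healthcheck
  namespace: langsmith
spec:
  default:
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 2
    config:
      type: HTTP
      httpHealthCheck:
        port: 8000
        requestPath: /ok
  targetRef:
    group: ""
    kind: Service
    name: agent-a
---
# Second agent: identical settings, different policy name and target Service.
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: agent-b-healthcheck
  namespace: langsmith
spec:
  default:
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 2
    config:
      type: HTTP
      httpHealthCheck:
        port: 8000
        requestPath: /ok
  targetRef:
    group: ""
    kind: Service
    name: agent-b
```

Only `metadata.name` and `targetRef.name` differ between the two documents, which is why this workaround becomes tedious as the number of agents grows.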

Option 2: Use Envoy Gateway (Recommended)

The recommended long-term solution is to migrate from GKE Gateway to Envoy Gateway, which does not require external health checks and avoids this compatibility issue entirely.
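
As a rough sketch of what the Envoy Gateway side can look like, the manifest below defines a GatewayClass handled by the Envoy Gateway controller and a Gateway that uses it. It assumes Envoy Gateway is already installed in the cluster. The resource names `eg` and `langsmith-gateway` and the listener settings are hypothetical, and how LangSmith's routes attach to this Gateway depends on your deployment and is not shown here.

```yaml
# Sketch only: resource names and listener settings are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  # Controller name used by Envoy Gateway to claim this GatewayClass.
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: langsmith-gateway   # hypothetical name
  namespace: langsmith
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
```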

Additional Configuration

If you continue using the workaround, disable the `ingressHealthCheckEnable` value in your configuration. Otherwise, deployments can time out even though the agent backends are healthy.

Note: LangSmith does not officially support GKE Gateway due to this health check limitation. For production deployments, migrating to Envoy Gateway is strongly recommended.