Semantic Health Checks
The problem
Standard Kubernetes liveness and readiness probes check whether a process is alive and whether a port is accepting connections. For stateless web services, this is sufficient. For LLM agents, it is not.
An agent pod can be running, healthy from Kubernetes’ perspective, and consistently producing wrong, hallucinated, or off-task output. A broken API key, a misconfigured system prompt, or model degradation can cause this. None of these are detectable by an HTTP probe that checks /healthz.
The solution
Each agent pod exposes two health endpoints on port 8080:
| Endpoint | Type | Behavior |
|---|---|---|
GET /healthz | Liveness | Always returns 200 OK if the process is running. Used by Kubernetes to decide whether to restart the container. |
GET /readyz | Semantic readiness | Calls the configured LLM provider with a validation prompt and checks the response. Returns 200 OK if the output passes validation; returns 503 Service Unavailable if it fails. |
When /readyz returns 503, Kubernetes marks the pod NotReady. The ArkonisService stops routing tasks to that pod. The operator logs the failure and the pod continues trying — if the underlying issue resolves (e.g., transient API error), the pod recovers automatically.
How the readiness probe works
- The agent runtime receives an HTTP
GET /readyzfrom the kubelet. - The runtime sends a validation prompt to the configured LLM provider using the agent’s model.
- The response is checked against expected output characteristics (correct format, non-empty, no error indicators).
- If the check passes, the endpoint returns
200. If it fails or the API call errors, it returns503.
The probe runs on the kubelet’s schedule, configured via standard Kubernetes readinessProbe fields that the operator injects automatically.
Spec field
spec:
livenessProbe:
type: semantic # "semantic" | "ping"
intervalSeconds: 60 # how often to run the semantic check
| Field | Type | Default | Description |
|---|---|---|---|
type | string | ping | ping — HTTP reachability only. semantic — enables /readyz API validation. |
intervalSeconds | int | 60 | Interval between semantic checks. |
validatorPrompt | string | (internal default) | Custom prompt sent to the LLM during /readyz validation. Falls back to a built-in default if not set. |