Scaling
Manual scaling
In v1alpha1, replica count is set directly in spec.replicas:
spec:
replicas: 5
The operator reconciles the backing pod count to match. Scale up or down by editing the ArkonisDeployment and applying it:
kubectl patch aodep research-agent --type=merge -p '{"spec":{"replicas":10}}'
Queue-depth autoscaling (planned)
CPU and memory are poor proxies for agent load. What matters is task queue depth — how many tasks are waiting to be processed.
Queue-depth-based autoscaling via KEDA is planned for v1beta1. The AgentScaler resource will allow defining scale-up/scale-down triggers based on Redis Streams queue length, with configurable thresholds and cooldown periods.
Resource limits
The spec.limits block controls per-agent resource consumption, not Kubernetes CPU/memory limits (those are set on the backing pods separately):
spec:
limits:
maxTokensPerCall: 8000
maxConcurrentTasks: 5
timeoutSeconds: 120
| Field | Type | Default | Description |
|---|---|---|---|
maxTokensPerCall | int | 8000 | Maximum tokens (input + output) per LLM API call. |
maxConcurrentTasks | int | 5 | Maximum tasks a single agent pod will process simultaneously. |
timeoutSeconds | int | 120 | Per-task timeout. The agent pod abandons the task and returns an error after this duration. |
These values are injected as environment variables (AGENT_MAX_TOKENS, AGENT_TIMEOUT_SECONDS) into agent pods and enforced by the agent runtime.