flowchart TB
    subgraph AWS["AWS account"]
        VPC["Shared lab VPC"]
        subgraph C["Per-attendee EKS cluster (x60)"]
            ARGO["in-cluster ArgoCD"]
            PLAT["Platform controls"]
            OBS["Observability"]
            AI["AI layer"]
            BURN["Burn targets"]
        end
    end
    TF["Terraform fleet"] -->|provisions| C
    VPC --- C
    ARGO --> PLAT & OBS & AI & BURN
Each attendee gets their own cluster (no hub/spoke). The Terraform fleet provisions each cluster, which then reconciles itself from Git through its in-cluster ArgoCD.
flowchart LR
    FB["fork bomb"] -->|"pids.max=1024"| BLOCK["EAGAIN<br/>node survives"]
    FB -. "syscall" .-> FALCO["Falco<br/>CRITICAL"]
    FALCO --> SK["Falcosidekick"]
    SK -->|"forward"| TALON["Falco Talon"]
    TALON -->|"terminate"| KILL["pod killed (~4s)"]
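podPidsLimit is the inline block (the wall): the fork bomb hits pids.max=1024, fork fails with EAGAIN, and the node survives. Falco→Talon is detect-and-respond (the alarm): the offending pod is terminated in about 4 seconds. Both were validated live.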
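
To make the wall concrete, here is a minimal Python sketch of how a workload could observe it from inside the pod. It assumes a cgroup v2 node where the limit is exposed at `/sys/fs/cgroup/pids.max`; that path and the helper names `parse_pids_max`, `read_pids_max`, and `is_pid_limit_error` are illustrative assumptions, not part of this repo.

```python
import errno
from typing import Optional


def parse_pids_max(raw: str) -> Optional[int]:
    """Parse the contents of a cgroup v2 pids.max file.

    Returns None when the file holds "max" (no limit configured).
    """
    value = raw.strip()
    return None if value == "max" else int(value)


def read_pids_max(path: str = "/sys/fs/cgroup/pids.max") -> Optional[int]:
    """Read the PID limit for the current cgroup (path is an assumption)."""
    with open(path, encoding="ascii") as handle:
        return parse_pids_max(handle.read())


def is_pid_limit_error(exc: OSError) -> bool:
    """True when a failed fork or thread start reported EAGAIN.

    EAGAIN can also come from other resource limits, so treat this as a
    hint that the PID limit was hit, not as proof.
    """
    return exc.errno == errno.EAGAIN
```

Wrapping `os.fork()` in `try/except OSError` and passing the exception to `is_pid_limit_error` would be one way to tell "blocked by the wall" apart from other failures in a burn-target test.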
flowchart LR
    APP["workloads<br/>guard-proxy<br/>kagent"] -->|"OTLP"| OTEL["OTel Collector"]
    OTEL -->|"metrics"| PROM["Prometheus<br/>Grafana"]
    OTEL -->|"traces"| TEMPO["Tempo"]
    ALLOY["Alloy"] -->|"logs"| LOKI["Loki"]
    OTEL -->|"spanmetrics"| DD["Datadog"]
    FALCO["Falcosidekick"] --> DD
Path-independent: Datadog is required for the event but the OTel layer keeps it swappable (drop the Datadog exporter to run OSS-only).
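
The swap can be pictured as a single toggle on the collector pipelines. The sketch below is a hypothetical Python helper that builds a `service.pipelines`-style mapping; the component names (`otlp`, `otlp/tempo`, `spanmetrics`, `prometheus`, `datadog`) and the choice to attach the Datadog exporter to the metrics pipeline are assumptions for illustration, not the configuration shipped in this repo.

```python
from typing import Dict, List

Pipelines = Dict[str, Dict[str, List[str]]]


def build_pipelines(use_datadog: bool) -> Pipelines:
    """Return a collector-style pipelines mapping.

    All component names here are illustrative assumptions.
    """
    pipelines: Pipelines = {
        # Traces go to Tempo and feed the spanmetrics connector.
        "traces": {
            "receivers": ["otlp"],
            "exporters": ["otlp/tempo", "spanmetrics"],
        },
        # Metrics (including span-derived ones) go to Prometheus.
        "metrics": {
            "receivers": ["otlp", "spanmetrics"],
            "exporters": ["prometheus"],
        },
    }
    if use_datadog:
        # The only Datadog-specific piece: one extra exporter.
        pipelines["metrics"]["exporters"].append("datadog")
    return pipelines
```

Calling `build_pipelines(False)` yields the OSS-only layout; nothing upstream of the exporter list changes, which is the point of routing through the OTel layer.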
flowchart LR
    UI["chat-ui"] --> GP["guard-proxy"]
    GP -->|"input guard"| LG1["LLM Guard"]
    GP -->|"forward"| AG["kagent agent"]
    AG -->|"Pod Identity"| BR["Claude<br/>on Bedrock"]
    AG -->|"tools"| MCP["MCP<br/>good / rogue"]
    AG -->|"response"| GP
    GP -->|"output guard"| LG2["LLM Guard"]
    GP -->|"meter"| COST["cost counter"]
gitops/ai-layer/ - guard-proxy fronts the kagent-owned agent pod and is the inspection point: it runs the input and output guards and meters cost.
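
The request path through guard-proxy can be sketched as follows. This is a minimal Python model of the flow in the diagram, not the actual guard-proxy code: the `GuardProxy` class, its callables, the block messages, and the per-request counter standing in for the cost counter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardProxy:
    input_guard: Callable[[str], bool]   # True when the prompt is allowed
    output_guard: Callable[[str], bool]  # True when the reply is allowed
    agent: Callable[[str], str]          # forwards to the agent, returns its reply
    requests_metered: int = 0            # stand-in for the cost counter

    def handle(self, prompt: str) -> str:
        # 1. Input guard: reject before anything reaches the agent.
        if not self.input_guard(prompt):
            return "blocked: input"
        # 2. Forward to the agent and meter the call.
        reply = self.agent(prompt)
        self.requests_metered += 1
        # 3. Output guard: inspect the reply before returning it.
        if not self.output_guard(reply):
            return "blocked: output"
        return reply
```

In this sketch a call is metered once it reaches the agent, even when the output guard later blocks the reply; whether the real counter behaves that way is not stated here.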