Architecture at a glance

flowchart TB
  subgraph AWS["AWS account"]
    VPC["Shared lab VPC"]
    subgraph C["Per-attendee EKS cluster (x60)"]
      ARGO["in-cluster ArgoCD"]
      PLAT["Platform controls"]
      OBS["Observability"]
      AI["AI layer"]
      BURN["Burn targets"]
    end
  end
  TF["Terraform fleet"] -->|provisions| C
  VPC --- C
  ARGO --> PLAT & OBS & AI & BURN
  

Each attendee gets their own cluster (no hub/spoke), reconciling itself from Git. Provisioned by Terraform.

4 · Runtime security

Fork bomb: a wall + an alarm

flowchart LR
  FB["fork bomb"] -->|"pids.max=1024"| BLOCK["EAGAIN
node survives"] FB -. "syscall" .-> FALCO["Falco
CRITICAL"] FALCO --> SK["Falcosidekick"] SK -->|"forward"| TALON["Falco Talon"] TALON -->|"terminate"| KILL["pod killed (~4s)"]

podPidsLimit is the inline block (the wall). Falco→Talon is detect-and-respond (the alarm). Both validated live.

6 · Observability

The lens for every beat

flowchart LR
  APP["workloads
guard-proxy
kagent"] -->|"OTLP"| OTEL["OTel Collector"] OTEL -->|"metrics"| PROM["Prometheus
Grafana"] OTEL -->|"traces"| TEMPO["Tempo"] ALLOY["Alloy"] -->|"logs"| LOKI["Loki"] OTEL -->|"spanmetrics"| DD["Datadog"] FALCO["Falcosidekick"] --> DD

Path-independent: Datadog is required for the event but the OTel layer keeps it swappable (drop the Datadog exporter to run OSS-only).

7 · The AI layer

How a prompt reaches a model

flowchart LR
  UI["chat-ui"] --> GP["guard-proxy"]
  GP -->|"input guard"| LG1["LLM Guard"]
  GP -->|"forward"| AG["kagent agent"]
  AG -->|"Pod Identity"| BR["Claude
on Bedrock"] AG -->|"tools"| MCP["MCP
good / rogue"] AG -->|"response"| GP GP -->|"output guard"| LG2["LLM Guard"] GP -->|"meter"| COST["cost counter"]

gitops/ai-layer/ - guard-proxy fronts the kagent-owned agent pod (the inspection point).