frit — GPU reliability engineering at homelab scale

frit

work in progress

GPU reliability engineering at homelab scale. Running the full inference stack with real SLOs, load tests, and chaos, practicing the patterns that hold at 1000.

milestones

M0 · GPU Foundation (shipped): driver · DCGM · k3s · GPU-in-k8s · vLLM on the T4 · bare VM → stack → bare, one command

M1 · GPU Metrics + Observability (shipped): GPU Operator + DCGM exporter → Prometheus → Grafana, via Flux

M2 · Inference Layer + Token Path (shipped): vLLM + Ray Serve + LiteLLM + Open WebUI serving on the T4 · OpenAI-compatible token path

M3 · Load Testing + Benchmarks (active): guidellm harness, GitOps-triggered · TTFT / ITL / TPOT + throughput · native per-run reports + public benchmarks page
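
For readers new to the M3 latency metrics, here is a minimal sketch of how TTFT, ITL, and TPOT relate for a single streamed request. It is not the guidellm implementation; the names `latency_metrics` and `LatencyMetrics` are hypothetical, and it assumes one timestamp per output token, measured in seconds on the same clock as the request start.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LatencyMetrics:
    ttft: float                 # time to first token, seconds
    mean_itl: float             # mean inter-token latency, seconds
    tpot: float                 # time per output token after the first, seconds
    output_tokens_per_s: float  # output tokens divided by end-to-end latency


def latency_metrics(request_start: float, token_times: list[float]) -> LatencyMetrics:
    """Derive per-request latency metrics from token arrival timestamps.

    Assumption: token_times holds one arrival timestamp per output token,
    in ascending order, on the same clock as request_start.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")

    e2e = token_times[-1] - request_start
    if e2e <= 0:
        raise ValueError("last token must arrive after request_start")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_start

    # ITL: gaps between consecutive tokens; reported here as the mean gap.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = mean(gaps) if gaps else 0.0

    # TPOT: decode time (everything after the first token) spread over
    # the remaining tokens.
    n = len(token_times)
    tpot = (e2e - ttft) / (n - 1) if n > 1 else 0.0

    return LatencyMetrics(ttft, mean_itl, tpot, n / e2e)


if __name__ == "__main__":
    # Hypothetical timestamps: first token at 0.5 s, then one every 0.1 s.
    print(latency_metrics(0.0, [0.5, 0.6, 0.7, 0.8]))
```

With exactly one timestamp per token, mean ITL and TPOT coincide; they diverge when a server streams several tokens per chunk, which is one reason load-test reports tend to list both.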

M4 – M8 · SLOs, chaos, postmortems, OSS cadence