Today I worked through the basic model loading and input/output path for a Triton Python backend.

Checklist

  • Create a minimal repro for each error
  • Break down bottlenecks from a serving perspective
  • Define latency targets at p50 and p95
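
To make the last item concrete, here is a minimal sketch of checking p50/p95 latency against targets. It assumes per-request latencies have already been collected in milliseconds. The sample values, the target numbers, and the helper names percentile and check_targets are all hypothetical, and nearest-rank is just one of several common percentile definitions.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_targets(latencies_ms, targets_ms):
    """Return {name: (observed_ms, target_ms, ok)} for each percentile target."""
    report = {}
    for name, (q, target) in targets_ms.items():
        observed = percentile(latencies_ms, q)
        report[name] = (observed, target, observed <= target)
    return report


# Hypothetical per-request latencies in milliseconds (illustrative, not measured).
latencies_ms = [12.0, 15.0, 11.0, 14.0, 13.0, 40.0, 12.5, 13.5, 16.0, 90.0]

# Hypothetical targets: name -> (percentile, budget in ms).
targets_ms = {"p50": (50, 20.0), "p95": (95, 80.0)}

report = check_targets(latencies_ms, targets_ms)
for name, (observed, target, ok) in report.items():
    status = "ok" if ok else "over budget"
    print(f"{name}: observed {observed} ms, target {target} ms, {status}")
```

Tracking p95 alongside p50 matters because a few slow requests (the 40 ms and 90 ms samples here) barely move the median but dominate the tail.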