API-first by default

9 out of 10 typical workloads should stay on the API.

At typical scale, hosted API providers compete hard enough on price that self-hosting usually loses — and that is before counting your time. But exceptions exist.

Exception traits

Check if any of these apply.

  • Sustained high-volume API spend
  • Strict data residency or air-gap requirements
  • Predictable batch or steady GPU utilization
  • Small open-weight substitute is acceptable
  • Latency requirements API providers cannot meet

Four questions

No parameter anxiety.

The primary flow asks only for API spend band, hosted model, traffic shape, and hard constraints.

Serving modalities

Self-hosting is not one thing.

Exception results compare API, managed endpoints, serverless GPU, VMs, batch GPU, and owned hardware.

Advanced path

Open-weight details stay available.

If you already know the model and cloud, inspect the precomputed GPU × quantization × shape matrix.