API-first by default

9 out of 10 typical workloads should stay on the API.

At typical scale, hosted API providers compete hard enough on price that self-hosting usually loses — and that is before counting your time. But exceptions exist.

Exception traits

Check if any of these apply.

  • Sustained high-volume API spend
  • Strict data residency or air-gap requirements
  • Predictable batch or steady GPU utilization
  • Small open-weight substitute is acceptable
  • Latency requirements API providers cannot meet

Common scenarios

Compare specific decisions.

Four questions

No parameter anxiety.

The primary flow asks only for API spend band, hosted model, traffic shape, and hard constraints.

Serving modalities

Self-hosting is not one thing.

Exception results compare API, managed endpoints, serverless GPU, VMs, batch GPU, and owned hardware.

Advanced path

Open-weight details stay available.

If you already know the model and cloud, inspect the precomputed GPU × quantization × shape matrix.