Four questions
No parameter anxiety.
The primary flow asks only for API spend band, hosted model, traffic shape, and hard constraints.
API-first by default
At typical scale, hosted API providers compete hard enough on price that self-hosting usually loses — and that is before counting your time. But exceptions exist.
Exception traits
Four questions
The primary flow asks only for API spend band, hosted model, traffic shape, and hard constraints.
Serving modalities
Exception results compare API, managed endpoints, serverless GPU, VMs, batch GPU, and owned hardware.
Advanced path
If you already know the model and cloud, inspect the precomputed GPU × quantization × shape matrix.