Methodology
Regimes, not false precision.
RunWhere precomputes model-cloud artifacts, then classifies each composition into one of three directional regimes.
Regime Thresholds
| Condition | Regime |
|---|---|
| Cost ratio below 1.5x | Close call |
| Self-host at least 1.5x cheaper | Self-host wins clearly |
| API at least 1.5x cheaper, or self-host is infeasible | API wins clearly |
Cost Formulas
Self-host monthly cost is GPU hourly price times required GPU count times 720 monthly hours, divided by utilization, plus operational labor. API monthly cost is tokens per day times 30 days times the blended input/output price.
self_host = (gpu_hourly * gpu_count * 24 * 30 / utilization) + labor
api = monthly_mtok * (input_share * input_price + output_share * output_price) Self-Host Throughput
Throughput is estimated from memory bandwidth divided by active FP16 parameter bytes, then adjusted by utilization and quantization multipliers. MoE models use active parameters for serving pressure; dense and hybrid models use total parameters.
Bounded Defaults
- GPU utilization: 60%
- Input/output token split: 70% input, 30% output
- INT8 throughput multiplier: 1.7x
- INT4 throughput multiplier: 2.5x
- Operational labor: 0.1 FTE at $150,000/year
- Peak-to-average traffic ratio: 3:1
- Storage/network overhead: off by default
Source Provenance
Artifact values are labeled as calculated, cited, or measured. The current seed snapshots are deterministic local inputs so the static app works end to end; production refresh scripts are the intended replacement for live cloud, Hugging Face, and OpenRouter data.
Pricing data last refreshed: April 29, 2026.
Curation Criteria
Models must be self-hostable open weights, available on Hugging Face, compatible with common inference runtimes, and small enough for a single cloud GPU node. Preview entries disclose unresolved hard-filter questions.
Excluded From V1
- Reserved capacity, committed-use discounts, and spot pricing
- Multi-cloud compositions
- User accounts or saved private scenarios
- Precision cost accounting beyond the directional regime
Escalation Policy
Live analysis is reserved for close-call regimes. Clear wins stay deterministic because the precomputed artifact already answers the question.
Versioning
Every artifact includes an ID, generation timestamp, ruleset version, and source commit placeholders. Shareable URLs encode identity in the path and user knobs in the query string.