"Should we self-host an open-source model or use OpenAI's API?" is asked at every growth-stage company building AI features in 2026. The answer is usually buy. The discussion gets dragged out anyway by sunk-cost reasoning, sovereignty concerns, and the AI-company-identity question — "if we're an AI company, shouldn't we run our own models?"
The honest framing: most growth-stage companies don't have the AI-platform team to operate self-hosted models reliably, the volume to make the unit economics work, or the strategic reason to do it. Buying — paying per token to a hosted provider — is usually the right default. Self-hosting becomes the right answer in specific cases that are easy to identify if you're honest about the math.
The cost math
Hosted APIs charge per million tokens. Pricing varies by model and provider, but for most production workloads in 2026, it's well-understood and predictable. Self-hosting requires GPUs (or rented GPU instances), the infrastructure to keep them running, on-call coverage when they fail, model updates as new versions ship, and an evaluation pipeline to make sure none of the upgrades broke anything.
Break-even depends on volume. Below a certain monthly token count — and the threshold is usually higher than engineers initially estimate — hosted APIs are cheaper than the all-in cost of running your own. Above it, self-hosting starts winning on per-token cost, but the fixed costs of operating the platform don't disappear.
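To make the break-even concrete, here's a back-of-envelope sketch in Python. Every number in it is an illustrative assumption (a blended hosted price, a GPU rental figure, a platform-overhead figure), not a quote from any provider; swap in your own before drawing conclusions.

```python
# Back-of-envelope break-even: hosted API vs self-hosting an open-weight model.
# Every dollar figure below is an illustrative placeholder, not a vendor quote.

HOSTED_PRICE_PER_M_TOKENS = 2.00       # blended $/1M tokens on a hosted API (assumption)
SELF_HOSTED_PRICE_PER_M_TOKENS = 0.30  # marginal serving cost once the platform exists (assumption)
GPU_NODES_MONTHLY = 6_000.00           # rented GPU capacity, $/month (assumption)
PLATFORM_OVERHEAD_MONTHLY = 25_000.00  # on-call, upgrades, eval pipeline: a slice of an ML-ops team (assumption)


def hosted_monthly_cost(tokens_per_month: float) -> float:
    return tokens_per_month / 1e6 * HOSTED_PRICE_PER_M_TOKENS


def self_hosted_monthly_cost(tokens_per_month: float) -> float:
    fixed = GPU_NODES_MONTHLY + PLATFORM_OVERHEAD_MONTHLY
    return fixed + tokens_per_month / 1e6 * SELF_HOSTED_PRICE_PER_M_TOKENS


if __name__ == "__main__":
    for millions_per_day in (1, 10, 100, 1_000):
        tokens = millions_per_day * 1e6 * 30  # tokens per month
        hosted = hosted_monthly_cost(tokens)
        self_hosted = self_hosted_monthly_cost(tokens)
        winner = "buy" if hosted <= self_hosted else "build"
        print(f"{millions_per_day:>5}M tokens/day: hosted ${hosted:>9,.0f}/mo, "
              f"self-hosted ${self_hosted:>9,.0f}/mo -> {winner}")
```

With these particular placeholders the crossover lands around six hundred million tokens a day, and it is dominated by the platform-overhead line, which is exactly the cost the initial engineering estimate tends to leave out.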
When buy clearly wins
- Low to medium volume. If you're running thousands or low millions of tokens per day, you don't have the volume to amortize self-hosting overhead.
- Frontier-model quality matters. The flagship GPT, Claude, and Gemini models are still measurably better than the best open-weight models for many use cases. If your product's quality bar depends on that gap, you want the latest hosted models.
- The team doesn't have ML ops experience. Running models reliably in production is a real specialization. If you don't have it, you'll spend a year building it before you ship anything customers see.
- The use case requires the latest models. Hosted providers ship new capabilities quickly. Self-hosted teams are usually 3–9 months behind on whichever frontier feature matters this quarter.
When build can win
- Very high volume and cost-sensitive. When you're running tens of millions of tokens per day on a workload where each token's value is small, self-hosting an open-weight model can be much cheaper.
- Regulated environment requiring on-prem. Some industries (defense, certain financial use cases, healthcare with specific data classifications) require data and inference to stay in your own infrastructure.
- Specialized fine-tuning over base models. If your competitive advantage is a model fine-tuned on your proprietary data, you may need to host it. (See the RAG-vs-fine-tuning piece; most companies that think they need fine-tuning don't.)
- You have ML ops as a team strength. If you've already invested in the platform, the marginal cost of running another model is low.
The hybrid most production systems converge on
Most production AI systems in 2026 use hosted APIs for high-quality, low-volume cases (frontier reasoning, complex generation, customer-facing features) and self-hosted open-weight models for high-volume, narrower tasks (classification, embedding, simple extraction). The split is not a strategic position; it emerges because the costs and the use cases pull in different directions.
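A minimal sketch of what that split looks like in code, assuming a simple task-type router. The task names and the `call_hosted_api` / `call_self_hosted` helpers are hypothetical stand-ins for whatever clients you actually run:

```python
# Minimal sketch of the hybrid split: route by task type, not by ideology.
# `call_hosted_api` and `call_self_hosted` are hypothetical stand-ins for
# your real clients (a hosted provider's SDK, an internal inference service).

HOSTED_TASKS = {"customer_chat", "complex_generation", "frontier_reasoning"}
SELF_HOSTED_TASKS = {"classification", "embedding", "simple_extraction"}


def call_hosted_api(prompt: str) -> str:
    raise NotImplementedError("wrap your hosted provider's client here")


def call_self_hosted(prompt: str) -> str:
    raise NotImplementedError("wrap your internal open-weight inference endpoint here")


def route(task: str, prompt: str) -> str:
    """Send low-volume, quality-sensitive work to the hosted frontier model
    and high-volume, narrow work to the self-hosted open-weight model."""
    if task in HOSTED_TASKS:
        return call_hosted_api(prompt)
    if task in SELF_HOSTED_TASKS:
        return call_self_hosted(prompt)
    # Unknown tasks default to hosted: a quality failure usually costs more
    # than the extra token spend.
    return call_hosted_api(prompt)
```

The useful property is that the routing decision lives in one place, so moving a task from one column to the other as prices or model quality shift is a one-line change rather than a migration.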
Purity is overrated. The companies that hold out for "all open" or "all hosted" usually do it for identity reasons, not cost reasons.
What the discussion usually misses
Two things. First, velocity matters more than per-token cost at most growth-stage companies. The opportunity cost of spending six months building self-hosted infrastructure instead of shipping features is almost always higher than the cumulative API spend you'd save. Second, vendor lock-in is real but reversible. Switching from one hosted provider to another is now a meaningful but solvable engineering project — six weeks of work, not six quarters.
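What keeps the switch closer to six weeks than six quarters is usually a thin internal interface in front of the provider. A minimal sketch, with illustrative names rather than any particular SDK's API:

```python
# Thin provider abstraction: application code depends on `Completions`,
# so swapping hosted providers (or moving a task to a self-hosted model)
# means writing one new adapter, not editing every call site.
# All class and method names here are illustrative.

from typing import Protocol


class Completions(Protocol):
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        ...


class HostedProviderAdapter:
    """Wraps one hosted provider's SDK (implementation omitted)."""

    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        raise NotImplementedError


class SelfHostedAdapter:
    """Wraps an internal open-weight inference endpoint (implementation omitted)."""

    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        raise NotImplementedError


def summarize(client: Completions, document: str) -> str:
    # Feature code sees only the interface; which backend serves it is configuration.
    return client.complete(f"Summarize:\n{document}", max_tokens=256)
```

Call sites depend on the interface, so changing providers is one new adapter plus a configuration flip, and the same seam is what later lets you move individual tasks onto a self-hosted model without rewriting features.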
If you're running this debate internally and want a vendor-neutral perspective, let's talk. We've built on hosted, on self-hosted, and on hybrid — the right answer depends on your specifics, not on a default.
