Models, Pricing, and Limits

The gateway supports models from multiple providers through one endpoint.

Choosing a Model

Select models based on:

Quality requirements
Latency tolerance
Cost sensitivity
Feature support (vision, tool use, long context, and more)

Not all models support the same capabilities.

Current and Legacy Versions

By default, users generally retain access to:

Current model versions
One previous version (when available)

If you need older legacy versions for compatibility, submit a request to the support team. Legacy availability is sometimes limited by providers. For example, if a vulnerability is discovered in a legacy model, a provider may withdraw that model.

Keep this in mind when designing long-running studies.

New Model Availability

The target turnaround for onboarding newly released models is typically 24-48 hours after provider release.

Some releases may take longer because of validation, policy review, or provider capacity constraints.

Managing Spend

Use layered controls:

Team budget caps
Key-level limits where needed
Rate limiting for burst control
Routine usage reviews and cleanup of unused keys

Practical Cost Controls

Start with lower-cost models during development.
Limit max output tokens for automated workflows.
Cache repeatable prompts where possible.
Move high-value workloads to stronger models only when justified.

Models, Pricing, and Limits ​

Choosing a Model ​

Current and Legacy Versions ​

New Model Availability ​

Managing Spend ​