Every AIsa API endpoint applies rate limits to protect both the gateway and upstream providers. Limits are enforced per API key. This page documents the defaults, how to read the rate-limit headers, and how to handle throttling gracefully.
What’s limited
AIsa enforces three dimensions of capacity:

| Dimension | Applies to | What it measures |
|---|---|---|
| RPM | All endpoints | Requests per minute |
| TPM | LLM inference endpoints | Input + output tokens per minute (combined) |
| Concurrency | Streaming + long-running endpoints | Simultaneous in-flight requests |
TPM counts input + output tokens together. A request that sends 10K tokens and generates 5K tokens consumes 15K TPM.
Default limits per tier
| Tier | RPM | TPM | Concurrency | Who gets it |
|---|---|---|---|---|
| Free | 60 | 60,000 | 5 | New accounts with $2 signup credit |
| Starter | 600 | 600,000 | 20 | After first paid top-up |
| Growth | 3,000 | 3,000,000 | 50 | $500+ topped up OR approved application |
| Enterprise | Custom | Custom | Custom | Contact sales |
Per-endpoint overrides
Some endpoints have tighter default limits independent of your tier because the upstream provider caps throughput:

| Endpoint group | Default override |
|---|---|
| POST /chat/completions (GPT-5.4) | RPM capped at upstream quota |
| POST /messages (Claude Opus) | RPM capped at upstream quota |
| POST /perplexity/sonar-deep-research | 5 RPM per key (long-running) |
| /aigc/video-generation | 3 concurrent video tasks per key |
| /v1/models/*:generateContent (image gen) | 30 RPM per key |
Reading rate-limit headers
Every response (including 429 responses) includes the following headers:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit-Requests | Your RPM cap |
| X-RateLimit-Remaining-Requests | Requests remaining this minute |
| X-RateLimit-Limit-Tokens | Your TPM cap |
| X-RateLimit-Remaining-Tokens | Tokens remaining this minute |
| X-RateLimit-Reset-Requests | UNIX timestamp when the request counter resets |
| X-RateLimit-Reset-Tokens | UNIX timestamp when the token counter resets |
| Retry-After | (429 only) Seconds until you can retry |
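The headers above arrive as strings and can be folded into a small budget snapshot before deciding whether to send the next request. A minimal sketch in Python, assuming a plain dict of header values (the sample numbers below are hypothetical, not real quota values):

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract the request/token budget and reset times from
    AIsa rate-limit response headers."""
    return {
        "rpm_limit": int(headers["X-RateLimit-Limit-Requests"]),
        "rpm_remaining": int(headers["X-RateLimit-Remaining-Requests"]),
        "tpm_limit": int(headers["X-RateLimit-Limit-Tokens"]),
        "tpm_remaining": int(headers["X-RateLimit-Remaining-Tokens"]),
        "requests_reset": int(headers["X-RateLimit-Reset-Requests"]),
        "tokens_reset": int(headers["X-RateLimit-Reset-Tokens"]),
    }

# Hypothetical header values from a Starter-tier response.
headers = {
    "X-RateLimit-Limit-Requests": "600",
    "X-RateLimit-Remaining-Requests": "597",
    "X-RateLimit-Limit-Tokens": "600000",
    "X-RateLimit-Remaining-Tokens": "584200",
    "X-RateLimit-Reset-Requests": "1735689600",
    "X-RateLimit-Reset-Tokens": "1735689600",
}
state = parse_rate_limit(headers)
```

Checking `state["rpm_remaining"]` and `state["tpm_remaining"]` before each request lets a client slow down proactively instead of waiting for a 429.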
Handling 429 responses
Detect a 429
The response body follows the standard error shape with error.type = "rate_limit_error" and a code of rate_limit_exceeded, upstream_rate_limit, or quota_exceeded.

Honor `Retry-After`
Always wait at least the number of seconds in Retry-After before the next attempt. Never retry immediately.

Use exponential backoff + jitter
After the initial wait, double the delay on each subsequent 429, with ±25% jitter. Cap at 30 seconds. See the retry example.
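Putting the three steps together — detect the 429, honor Retry-After, then back off exponentially with jitter — might look like the following sketch. The `send()` callable and the dict response shape are hypothetical stand-ins for your HTTP client:

```python
import random
import time

def retry_with_backoff(send, retry_after: float, max_attempts: int = 5):
    """Call send() until it stops returning 429.

    The first wait is exactly Retry-After; each subsequent wait
    doubles, gets +/-25% jitter, and is capped at 30 seconds.
    """
    delay = retry_after
    for _ in range(max_attempts):
        resp = send()
        if resp["status"] != 429:
            return resp
        time.sleep(min(delay, 30.0))
        # Double the delay, apply +/-25% jitter, cap at 30 s.
        delay = min(delay * 2 * random.uniform(0.75, 1.25), 30.0)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The jitter matters when many workers share one key: without it, throttled workers all retry at the same instant and trip the limit again in lockstep.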
Prefer queuing over retrying
Instead of tight retry loops, queue requests and drain them at a rate below your RPM. A simple token bucket with a leak rate of RPM/60 requests per second is robust.

Example: staying under the limit
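A client-side token bucket that refills at RPM/60 tokens per second could be sketched as follows. This is an illustration, not an official client; the burst capacity and the RPM figure passed in below are example values:

```python
import time

class TokenBucket:
    """Client-side throttle: refill at rpm/60 tokens per second,
    allow bursts of up to `capacity` requests."""

    def __init__(self, rpm: int, capacity: int = 5):
        self.rate = rpm / 60.0          # tokens added per second
        self.capacity = capacity
        self.tokens = float(capacity)   # start with a full bucket
        self.last = time.monotonic()

    def acquire(self):
        """Block until one request slot is available."""
        while True:
            now = time.monotonic()
            # Refill based on elapsed time, never above capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one token to accrue.
            time.sleep((1 - self.tokens) / self.rate)

# Example: throttle to a hypothetical 600 RPM (10 requests/second).
bucket = TokenBucket(rpm=600, capacity=5)
# Call bucket.acquire() before each API request.
```

Because the bucket drains at the refill rate once the initial burst is spent, sustained throughput stays at RPM/60 requests per second and the gateway never sees a violation.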
Requesting a quota increase
If you need higher limits for production traffic:

- Top up your wallet — moving from Free → Starter is automatic.
- For Growth or Enterprise tiers, email developer@aisa.one with:
  - Your workspace ID
  - Expected peak RPM and TPM
  - Which models or endpoints you need the increase for
  - A brief description of your use case
Related
Error Codes
Full list of HTTP status codes and recommended responses.
Usage Logs
Per-request billing and rate-limit telemetry in the dashboard.