Every AIsa API endpoint applies rate limits to protect both the gateway and upstream providers. Limits are enforced per API key. This page documents the defaults, how to read the rate-limit headers, and how to handle throttling gracefully.
What’s limited
AIsa enforces three dimensions of capacity:

| Dimension | Applies to | What it measures |
|---|---|---|
| RPM | All endpoints | Requests per minute |
| TPM | LLM inference endpoints | Input + output tokens per minute (combined) |
| Concurrency | Streaming + long-running endpoints | Simultaneous in-flight requests |
TPM counts input + output tokens together. A request that sends 10K tokens and generates 5K tokens consumes 15K TPM.
Default limits per tier
| Tier | RPM | TPM | Concurrency | Who gets it |
|---|---|---|---|---|
| Free | 60 | 60,000 | 5 | New accounts with $2 signup credit |
| Starter | 600 | 600,000 | 20 | After first paid top-up |
| Growth | 3,000 | 3,000,000 | 50 | $500+ topped up OR approved application |
| Enterprise | Custom | Custom | Custom | Contact sales |
Per-endpoint overrides
Some endpoints have tighter default limits independent of your tier because the upstream provider caps throughput:

| Endpoint group | Default override |
|---|---|
| POST /chat/completions (GPT-5.4) | RPM capped at upstream quota |
| POST /messages (Claude Opus) | RPM capped at upstream quota |
| POST /perplexity/sonar-deep-research | 5 RPM per key (long-running) |
| /aigc/video-generation | 3 concurrent video tasks per key |
| /v1/models/*:generateContent (image gen) | 30 RPM per key |
Reading rate-limit headers
Every response (including 429 responses) includes the following headers:
| Header | Meaning |
|---|---|
| X-RateLimit-Limit-Requests | Your RPM cap |
| X-RateLimit-Remaining-Requests | Requests remaining this minute |
| X-RateLimit-Limit-Tokens | Your TPM cap |
| X-RateLimit-Remaining-Tokens | Tokens remaining this minute |
| X-RateLimit-Reset-Requests | UNIX timestamp when the request counter resets |
| X-RateLimit-Reset-Tokens | UNIX timestamp when the token counter resets |
| Retry-After | (429 only) Seconds until you can retry |
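The headers above arrive as strings and can be folded into a small budget snapshot before deciding whether to send the next request. A minimal sketch in Python, assuming a plain dict of header values (the sample numbers below are hypothetical, not real quota values):

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract the request/token budget and reset times from
    AIsa rate-limit response headers."""
    return {
        "rpm_limit": int(headers["X-RateLimit-Limit-Requests"]),
        "rpm_remaining": int(headers["X-RateLimit-Remaining-Requests"]),
        "tpm_limit": int(headers["X-RateLimit-Limit-Tokens"]),
        "tpm_remaining": int(headers["X-RateLimit-Remaining-Tokens"]),
        "requests_reset": int(headers["X-RateLimit-Reset-Requests"]),
        "tokens_reset": int(headers["X-RateLimit-Reset-Tokens"]),
    }

# Hypothetical header values from a Starter-tier response.
headers = {
    "X-RateLimit-Limit-Requests": "600",
    "X-RateLimit-Remaining-Requests": "597",
    "X-RateLimit-Limit-Tokens": "600000",
    "X-RateLimit-Remaining-Tokens": "584200",
    "X-RateLimit-Reset-Requests": "1735689600",
    "X-RateLimit-Reset-Tokens": "1735689600",
}
state = parse_rate_limit(headers)
```

Checking `state["rpm_remaining"]` and `state["tpm_remaining"]` before each request lets a client slow down proactively instead of waiting for a 429.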
Handling 429 responses
Detect a 429
The response body follows the standard error shape with error.type = "rate_limit_error" and a code of rate_limit_exceeded, upstream_rate_limit, or quota_exceeded.

Honor `Retry-After`
Always wait at least the number of seconds in Retry-After before the next attempt. Never retry immediately.

Use exponential backoff + jitter
After the initial wait, double the delay on each subsequent 429, with ±25% jitter. Cap at 30 seconds. See the retry example.
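Putting the three steps together — detect the 429, honor Retry-After, then back off exponentially with jitter — might look like the following sketch. The `send()` callable and the dict response shape are hypothetical stand-ins for your HTTP client:

```python
import random
import time

def retry_with_backoff(send, retry_after: float, max_attempts: int = 5):
    """Call send() until it stops returning 429.

    The first wait is exactly Retry-After; each subsequent wait
    doubles, gets +/-25% jitter, and is capped at 30 seconds.
    """
    delay = retry_after
    for _ in range(max_attempts):
        resp = send()
        if resp["status"] != 429:
            return resp
        time.sleep(min(delay, 30.0))
        # Double the delay, apply +/-25% jitter, cap at 30 s.
        delay = min(delay * 2 * random.uniform(0.75, 1.25), 30.0)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The jitter matters when many workers share one key: without it, throttled workers all retry at the same instant and trip the limit again in lockstep.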
Prefer queuing over retrying
Instead of tight retry loops, queue requests and drain them at a rate below your RPM. A simple token bucket with a leak rate of RPM/60 requests per second is robust.

Example: staying under the limit
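A client-side token bucket that refills at RPM/60 tokens per second could be sketched as follows. This is an illustration, not an official client; the burst capacity and the RPM figure passed in below are example values:

```python
import time

class TokenBucket:
    """Client-side throttle: refill at rpm/60 tokens per second,
    allow bursts of up to `capacity` requests."""

    def __init__(self, rpm: int, capacity: int = 5):
        self.rate = rpm / 60.0          # tokens added per second
        self.capacity = capacity
        self.tokens = float(capacity)   # start with a full bucket
        self.last = time.monotonic()

    def acquire(self):
        """Block until one request slot is available."""
        while True:
            now = time.monotonic()
            # Refill based on elapsed time, never above capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one token to accrue.
            time.sleep((1 - self.tokens) / self.rate)

# Example: throttle to a hypothetical 600 RPM (10 requests/second).
bucket = TokenBucket(rpm=600, capacity=5)
# Call bucket.acquire() before each API request.
```

Because the bucket drains at the refill rate once the initial burst is spent, sustained throughput stays at RPM/60 requests per second and the gateway never sees a violation.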
Requesting a quota increase
If you need higher limits for production traffic:

- Top up your wallet — moving from Free → Starter is automatic.
- For Growth or Enterprise tiers, email developer@aisa.one with:
  - Your workspace ID
  - Expected peak RPM and TPM
  - Which models or endpoints you need the increase for
  - A brief description of your use case
Related
Error Codes
Full list of HTTP status codes and recommended responses.
Usage Logs
Per-request billing and rate-limit telemetry in the dashboard.