Rate Limits

The API enforces per-key and per-organization request limits to keep the service responsive. Every response includes rate limit headers so your integration always knows where it stands.

Request rate limits

Limits are enforced in a 60-second fixed window, applied both per API key and per organization. If either limit is reached, the request receives a 429 response.

Route type	Per key	Per organization	Applies to
Read	120 / min	360 / min	jobs, credits
Create	60 / min	180 / min	classify
Scan	20 / min	60 / min	scan, scan/lite, scan/deep (same numeric policy per path; each path has its own burst bucket)

These are default launch limits. If you need higher throughput for a production workload, contact support@on-page.ai.
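To avoid most 429s up front, you can track your own request budget on the client. The sketch below is a hypothetical `FixedWindowCounter` (not part of any SDK). It counts requests in 60-second windows and only sends when both the per-key and per-organization counters have room. It assumes windows align to whole minutes of the Unix clock, but the server's actual window boundaries aren't documented here, so treat it as a rough guard rather than a guarantee.

```python
import time
from typing import Callable, Iterable, Optional


class FixedWindowCounter:
    """Counts requests in fixed windows aligned to multiples of window_seconds.

    Hypothetical client-side helper; the server's own windows may not line up
    with this client's clock.
    """

    def __init__(self, limit: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._window: Optional[int] = None
        self._count = 0

    def _roll(self) -> None:
        # A new window index means the previous window ended: reset the count.
        window = int(self.clock() // self.window_seconds)
        if window != self._window:
            self._window = window
            self._count = 0

    def has_capacity(self) -> bool:
        self._roll()
        return self._count < self.limit

    def consume(self) -> None:
        self._roll()
        self._count += 1


def try_acquire(counters: Iterable[FixedWindowCounter]) -> bool:
    """Spend one request from every counter, but only if all of them have room."""
    counters = list(counters)
    if all(counter.has_capacity() for counter in counters):
        for counter in counters:
            counter.consume()
        return True
    return False
```

For the Scan row, where each path has its own bucket, you would keep a separate pair of counters per path (for example, `FixedWindowCounter(20)` and `FixedWindowCounter(60)` for `scan/deep`). The org counter only sees traffic from this process, so if several services share one organization, the server's count can still be higher.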

Per-org job concurrency

Counts apply to the combined total across classify and scan jobs, so one organization can't saturate the worker pool with any single job type.

Concurrent active jobs per org: 5

Up to 5 jobs can run simultaneously for your organization, across any combination of classify and scan jobs. Additional submissions queue and start as slots free up. If you hit the cap, you get a 429 with ORG_ACTIVE_LIMIT_REACHED.

Max queued jobs per org: 100

If your organization already has 100 jobs waiting, new submissions are rejected with a 429 and ORG_QUEUED_LIMIT_REACHED until the queue drains.
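To stay under the active-job cap without relying on 429s, a client can count the jobs it has submitted but not yet seen finish. The hypothetical `OutstandingJobs` helper below does that, with a default cap of 5 to match the limit above. You would call `finished()` when a webhook or a GET /v1/jobs/:id poll reports a job as done. It only knows about jobs submitted through this instance, so other clients in the same organization still draw on the same shared caps.

```python
from typing import Set


class OutstandingJobs:
    """Tracks jobs this client submitted that have not yet been seen to finish.

    Hypothetical helper: it only sees jobs submitted through this instance.
    """

    def __init__(self, max_outstanding: int = 5) -> None:
        self.max_outstanding = max_outstanding
        self._job_ids: Set[str] = set()

    def can_submit(self) -> bool:
        # Submit only while this client's unfinished jobs are below the cap.
        return len(self._job_ids) < self.max_outstanding

    def submitted(self, job_id: str) -> None:
        self._job_ids.add(job_id)

    def finished(self, job_id: str) -> None:
        # discard() is a no-op for unknown ids (e.g. a duplicate webhook).
        self._job_ids.discard(job_id)
```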

What happens under heavy load

When the global scan queue is saturated, new requests may receive a 429 with error code SCAN_QUEUE_SATURATED. The Retry-After header tells you how long to wait. This is rare — it only occurs when the entire system is under exceptional load, not just your organization.

Response headers

X-RateLimit-* appears on every authenticated response so you can track your remaining budget in real time. Retry-After and X-Concurrency-* are conditional — present only on 429 responses (see the per-header notes below).

X-RateLimit-Limit

Maximum requests allowed in the current window.

X-RateLimit-Remaining

Requests remaining before the limit resets.

X-RateLimit-Reset

Unix timestamp (seconds) when the current window resets.

Retry-After

Seconds to wait before retrying. Present on 429 responses when the limiter computed one.

X-Concurrency-Limit

The concurrency cap that was hit. Present on 429 responses with ORG_ACTIVE_LIMIT_REACHED or ORG_QUEUED_LIMIT_REACHED.

X-Concurrency-Remaining

Remaining in-flight capacity — always 0 when the caller just hit the cap. Retry after jobs drain (observe via webhooks or GET /v1/jobs/:id polling).
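Reading these headers into one structure makes them easier to log or act on. The sketch below is illustrative; `parse_rate_limit_headers` and `RateLimitInfo` are hypothetical names. It treats header names case-insensitively, as HTTP requires, and assumes `Retry-After` carries seconds as described above. HTTP also allows a date form for `Retry-After`, which this sketch leaves as `None`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]                  # X-RateLimit-Limit
    remaining: Optional[int]              # X-RateLimit-Remaining
    reset_at: Optional[int]               # X-RateLimit-Reset (Unix seconds)
    retry_after: Optional[float]          # Retry-After (seconds), 429s only
    concurrency_limit: Optional[int]      # X-Concurrency-Limit, 429s only
    concurrency_remaining: Optional[int]  # X-Concurrency-Remaining, 429s only


def _number(headers: Mapping[str, str], name: str, cast: Callable[[str], float]):
    """Return the header converted with `cast`, or None if absent or malformed."""
    value = headers.get(name)
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize to lower case first.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=_number(h, "x-ratelimit-limit", int),
        remaining=_number(h, "x-ratelimit-remaining", int),
        reset_at=_number(h, "x-ratelimit-reset", int),
        retry_after=_number(h, "retry-after", float),
        concurrency_limit=_number(h, "x-concurrency-limit", int),
        concurrency_remaining=_number(h, "x-concurrency-remaining", int),
    )
```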

Handling 429 responses

A 429 means you've hit a limit. Here's what to do:

Always respect Retry-After

The header tells you exactly how long to wait. Don't guess — use the value.

Use idempotency keys on create routes

Pass an Idempotency-Key header so retries are safe and don't create duplicate jobs.

Back off on repeated 429s

If you see multiple 429s in a row, reduce your request rate rather than retrying immediately.

Poll existing jobs instead of resubmitting

If a scan was already submitted, poll its status with GET /v1/jobs/:id instead of submitting a new one.

Scans can take 30 seconds to 3 minutes

Don't treat slow scans as failures. Poll at reasonable intervals (every 3-5 seconds) and let the system work.
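A polling loop along these lines follows that guidance. It is a sketch: `fetch_status` is a hypothetical callable that would wrap GET /v1/jobs/:id and return the job's status string, and the terminal status names `"completed"` and `"failed"` are placeholders, since this section doesn't list the actual values. The default timeout of 300 seconds leaves headroom above the 3-minute upper end for scans.

```python
import random
import time
from typing import Callable, Collection, Tuple


def poll_job(fetch_status: Callable[[str], str], job_id: str,
             terminal: Collection[str] = ("completed", "failed"),
             interval: Tuple[float, float] = (3.0, 5.0),
             timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> str:
    """Poll a job until it reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} s")
        # A randomized 3-5 s interval keeps many pollers from syncing up.
        sleep(random.uniform(*interval))
```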

Rate limit error codes

Code	Meaning
`RATE_LIMITED`	Request rate limit exceeded for this key or organization.
`ORG_ACTIVE_LIMIT_REACHED`	Your org has 5 jobs running at once. Wait for one to finish before submitting more.
`ORG_QUEUED_LIMIT_REACHED`	Your org has 100 jobs waiting in queue. Wait for the queue to drain.
`SCAN_QUEUE_SATURATED`	Global scan capacity is full. Retry after the indicated delay.
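The handling practices and this table can be combined into one submission wrapper for create routes. This is a sketch, not an official client: `send` is a hypothetical callable standing in for your HTTP client, it assumes you have already extracted the error code from the 429 response body (whose format this section doesn't describe), and the attempt count and backoff parameters are arbitrary defaults. It honors Retry-After when present, falls back to jittered exponential backoff, reuses one Idempotency-Key across retries, and stops retrying on the two concurrency codes.

```python
import random
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical transport: takes request headers and returns
# (HTTP status, response headers, error code or None).
Result = Tuple[int, Mapping[str, str], Optional[str]]
Send = Callable[[Mapping[str, str]], Result]

# These limits clear only as jobs finish, not after a fixed delay.
CAPACITY_BASED = {"ORG_ACTIVE_LIMIT_REACHED", "ORG_QUEUED_LIMIT_REACHED"}


def _retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return float(value)
            except ValueError:
                return None  # HTTP-date form is not handled in this sketch
    return None


def submit_with_retries(send: Send, max_attempts: int = 5,
                        base_delay: float = 1.0, max_delay: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep) -> Result:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One Idempotency-Key per logical create request, reused on every retry
    # so a retried submission cannot create a duplicate job.
    request_headers = {"Idempotency-Key": str(uuid.uuid4())}
    attempt = 0
    while True:
        status, response_headers, code = send(request_headers)
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, response_headers, code
        if code in CAPACITY_BASED:
            # A timed retry won't free a slot; hand control back to the caller.
            return status, response_headers, code
        # Any other 429 (RATE_LIMITED, SCAN_QUEUE_SATURATED) is time-based.
        delay = _retry_after_seconds(response_headers)
        if delay is None:
            # No Retry-After: exponential backoff with full jitter, so
            # repeated 429s lead to progressively longer waits.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
        sleep(delay)
```

Returning early on the concurrency codes is a deliberate choice: retrying on a timer would only spend request budget while no slot is free. The caller can resubmit once a webhook or a GET /v1/jobs/:id poll shows a job has finished.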