REST API · v1

The Syttra API is plain HTTP + JSON. Async-first: create a job, poll status, download results. All timestamps are ISO 8601 UTC.

Base URL: https://api.syttra.com · Auth header: Authorization: Bearer sk_live_...

Crawl modes

Every job picks one of three modes. Each has a dedicated guide with worked examples:

Single page (single)

Crawl exactly one URL. Fastest mode — good for quick lookups, snippet extraction, recurring checks of one page.

Full site (full)

Start at one URL and follow internal links breadth-first (BFS) up to max_depth. The total number of pages is bounded by max_pages. Best when you want everything.

Select pages (select)

Crawl an explicit list of URLs you've picked from the sitemap. Lets you target only the pages worth quota.
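To make the interaction between max_depth and max_pages in full mode concrete, here is a minimal sketch of a generic breadth-first traversal. It is not Syttra's crawler: get_links is a hypothetical callback standing in for "fetch the page and extract internal links", and treating the start URL as depth 0 is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl_order(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical link extractor
    max_depth: int,
    max_pages: int,
) -> List[str]:
    """Return URLs in the order a breadth-first crawl would reach them."""
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); start page is depth 0 (assumption)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # links on this page would exceed max_depth
        for link in get_links(url):
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            if link not in seen:
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited
```

With a small in-memory link graph, lowering max_depth trims the deeper pages, while lowering max_pages cuts the traversal off as soon as the budget is reached.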

Authentication

Every request must include your API key as a Bearer token. Keys are SHA-256 hashed at rest — we never store the plaintext. You can create and revoke keys from your dashboard.

curl

curl https://api.syttra.com/v1/jobs \
  -H "Authorization: Bearer sk_live_abcdef123..." \
  -H "Content-Type: application/json"

Requests without a valid key return 401 Unauthorized. Requests from an over-quota account return 402 Payment Required.
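The same authenticated request can be sketched in Python using only the standard library. The helper name build_request is illustrative, and the Accept header is an assumption rather than a documented requirement.

```python
import urllib.request

BASE_URL = "https://api.syttra.com"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying the API key as a Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",  # assumption: not required by the docs
        },
    )
```

Passing the result to urllib.request.urlopen sends it. urlopen raises urllib.error.HTTPError for 4xx responses, so a 401 or 402 surfaces as an exception whose code attribute holds the status.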

Job lifecycle

Every crawl runs as a Job with one of five states:

queued — accepted, waiting for a worker
running — actively crawling
completed — done, results available
failed — a non-recoverable error
cancelled — cancelled by the user

Single-page jobs typically complete in under 5 seconds. Full-site crawl times scale with max_pages and site responsiveness.
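Client code often needs to know whether a job can still change state. The sketch below models the five states as an enum and treats completed, failed, and cancelled as terminal. That grouping is inferred from the descriptions above rather than stated explicitly.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: these three states are final and a job never leaves them.
TERMINAL_STATES = frozenset(
    {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED}
)


def is_terminal(status: str) -> bool:
    """True if the job has finished; raises ValueError on an unknown state."""
    return JobStatus(status) in TERMINAL_STATES
```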

Get job status

GET /v1/jobs/{id} · 200

Returns the current state of a job, including progress counters and any error details. Same shape regardless of mode.

Example response

200 OK

{
  "id": "9a47f5c1-...",
  "status": "running",
  "created_at": "2026-04-29T13:42:00Z",
  "started_at": "2026-04-29T13:42:03Z",
  "completed_at": null,
  "pages_discovered": 12,
  "pages_crawled": 7,
  "pages_failed": 0,
  "url": "https://example.com",
  "mode": "full",
  "error": null,
  "links": { "self": "/v1/jobs/...", "result": "/v1/jobs/.../result" }
}
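As a small illustration of reading this response, the sketch below turns the counters into a one-line progress summary. The function name is illustrative. Because pages_discovered can presumably grow while a crawl is running, the ratio is a rough indicator only.

```python
import json


def summarize_job(job: dict) -> str:
    """Format the status and progress counters of a job response as one line."""
    line = (
        f"{job['mode']} crawl of {job['url']} is {job['status']}: "
        f"{job['pages_crawled']}/{job['pages_discovered']} pages crawled, "
        f"{job['pages_failed']} failed"
    )
    if job["error"] is not None:
        # The shape of a non-null error is not shown above, so print it as-is.
        line += f" (error: {job['error']})"
    return line


example = json.loads("""
{
  "id": "9a47f5c1-...",
  "status": "running",
  "pages_discovered": 12,
  "pages_crawled": 7,
  "pages_failed": 0,
  "url": "https://example.com",
  "mode": "full",
  "error": null
}
""")
print(summarize_job(example))
```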

Polling pattern

Poll every 2-5 seconds for small jobs, backing off to every 15-30 seconds for large full-site crawls. A WebSocket / SSE streaming endpoint is on the post-launch roadmap.
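One way to implement this pattern is a loop whose delay starts at 2 seconds and grows geometrically up to a 30-second cap. This is a minimal sketch: fetch_status is a hypothetical callable that performs GET /v1/jobs/{id} and returns the decoded JSON, and the growth factor and timeout are arbitrary choices, not documented values.

```python
import time
from typing import Callable

TERMINAL = ("completed", "failed", "cancelled")


def wait_for_job(
    fetch_status: Callable[[], dict],  # hypothetical: returns the job JSON
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    factor: float = 1.5,               # arbitrary growth factor
    timeout: float = 3600.0,           # arbitrary overall limit, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state, backing off between polls."""
    delay = initial_delay
    waited = 0.0
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL:
            return job
        if waited + delay > timeout:
            raise TimeoutError(f"job still {job['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * factor, max_delay)
```

The sleep parameter exists so the loop can be tested without real waiting. In normal use the default time.sleep applies.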

Download results

GET /v1/jobs/{id}/result · 200

Streams the scraped content back. Multi-page jobs (full / select) return one document per crawled page, concatenated with --- separators.

Query parameters

Parameter	Type	Default	Description
format	string	—	Which format to download if the job produced multiple. Defaults to the first in export_formats.

Results are cached for 24 hours after completion, then permanently deleted. After that the endpoint returns 410 Gone. The download is named result.md for single-page jobs and crawl-{id}.md for multi-page jobs, so the difference is obvious on disk.
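The sketch below builds the download URL with the optional format parameter and mirrors the filename rule described above. The helper names are illustrative, and "markdown" in the usage note is a hypothetical format value. The accepted values are whatever the job listed in export_formats.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.syttra.com"


def result_url(job_id: str, fmt: Optional[str] = None) -> str:
    """URL of the result endpoint, optionally selecting one export format."""
    url = f"{BASE_URL}/v1/jobs/{quote(job_id, safe='')}/result"
    if fmt is not None:
        url += "?" + urlencode({"format": fmt})
    return url


def default_filename(job_id: str, mode: str) -> str:
    """Filename convention: result.md for single, crawl-{id}.md otherwise."""
    return "result.md" if mode == "single" else f"crawl-{job_id}.md"
```

For example, result_url("abc", "markdown") yields https://api.syttra.com/v1/jobs/abc/result?format=markdown. A client should be prepared for 410 Gone on any download attempted more than 24 hours after completion.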

List your jobs

GET /v1/jobs · 200

Returns a paginated list of jobs you own. Cursor-based pagination.

Query parameters

Parameter	Type	Default	Description
status	string enum	—	Filter by state: queued, running, completed, failed, cancelled.
limit	integer	20	Page size. Max 100.
cursor	string	—	Opaque cursor from the previous response's next_cursor.
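Cursor pagination is usually consumed with a loop that feeds each next_cursor back into the following request. In this sketch, fetch_page is a hypothetical callable that performs the GET and returns the decoded JSON. The "jobs" key for the list of items, and a null or missing next_cursor on the last page, are assumptions, since the response body is not shown here.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def jobs_url(status: Optional[str] = None, limit: int = 20,
             cursor: Optional[str] = None) -> str:
    """Build the list URL from the documented query parameters."""
    params = {"limit": limit}
    if status:
        params["status"] = status
    if cursor:
        params["cursor"] = cursor
    return "https://api.syttra.com/v1/jobs?" + urlencode(params)


def iter_jobs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every job across all pages, following next_cursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("jobs", [])   # assumption: items live under "jobs"
        cursor = page.get("next_cursor")
        if not cursor:                    # assumption: null/absent on last page
            return
```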

Cancel or delete a job

DELETE /v1/jobs/{id} · 204

For a running job: cancels it. For a completed job: deletes the cached result immediately. Metadata (ID, timing) is kept for 30 days either way.

No response body. Idempotent — calling on an already-cancelled job still returns 204.

Error envelope

Every error response uses a consistent shape:

4xx / 5xx response

{
  "error": {
    "code": "quota_exceeded",
    "message": "You've used 100 / 100 pages this month.",
    "request_id": "req_01HQ..."
  }
}

Common codes

400 validation_error — bad request body
401 unauthorized — missing or invalid API key
402 quota_exceeded — over your monthly page limit
403 forbidden — job belongs to a different user
404 not_found — job or result does not exist
410 gone — result cache expired (after 24h)
413 payload_too_large — request body over 64 KiB
422 unsafe_url — URL resolves to a private / internal address
422 tos_blocked — target site's ToS forbids automated access
429 rate_limited — slow down
5xx internal_error — it's us, not you. The request_id helps debugging.
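Because every error shares this envelope, a client can map it onto a single exception type. The sketch below is one possible shape, with illustrative class and function names. Treating 429 and 5xx as retryable is a common convention, not something the API promises.

```python
import json
from typing import Optional


class ApiError(Exception):
    """An error response decoded from the envelope."""

    def __init__(self, status: int, code: str, message: str,
                 request_id: Optional[str] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        # Assumption: only rate limiting and server errors are worth retrying.
        return self.status == 429 or self.status >= 500


def parse_error(status: int, body: bytes) -> ApiError:
    """Decode the error envelope, falling back gracefully on unexpected bodies."""
    try:
        err = json.loads(body)["error"]
        return ApiError(status, err["code"], err["message"], err.get("request_id"))
    except (ValueError, KeyError, TypeError, AttributeError):
        return ApiError(status, "unknown", body.decode("utf-8", "replace"))
```

Logging request_id alongside the code makes it easier to quote the failing request when reporting a 5xx.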

Rate limits

During private beta:

60 requests/minute on mutating endpoints (POST, DELETE)
120 requests/minute on read endpoints (GET)
See your usage page for your current monthly page allowance.

Every response carries X-RateLimit-Remaining and X-RateLimit-Reset headers. These limits may be tightened or loosened depending on your plan at public launch.
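A client can use these headers to pause before it hits a 429. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds. That format is not specified here, so verify it against a real response before relying on it. The headers argument is a plain dict with the exact header casing shown.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """How long to pause before the next request; 0.0 if quota remains."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset header is an absolute Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```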