Syttra
REST API · v1

The Syttra API is plain HTTP + JSON. Async-first: create a job, poll status, download results. All timestamps are ISO 8601 UTC.

Base URL: https://api.syttra.com · Auth header: Authorization: Bearer sk_live_...

Crawl modes

Every job picks one of three modes. Each mode has a dedicated guide with worked examples.

Authentication

Every request must include your API key as a Bearer token. Keys are SHA-256 hashed at rest — we never store the plaintext. You can create and revoke keys from your dashboard.

curl
curl https://api.syttra.com/v1/jobs \
  -H "Authorization: Bearer sk_live_abcdef123..." \
  -H "Content-Type: application/json"

Requests without a valid key return 401 Unauthorized. Requests from an over-quota account return 402 Payment Required.
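As a rough illustration of the same header setup outside curl, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The helper name build_request and the key sk_live_example are placeholders invented for this example, not part of the API.

```python
import urllib.request

BASE_URL = "https://api.syttra.com"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token.

    build_request is a hypothetical helper for this sketch.
    """
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder key for illustration only; substitute a key from your dashboard.
req = build_request("/v1/jobs", "sk_live_example")
```

Passing req to urllib.request.urlopen would send it. With urllib, a 401 or 402 response surfaces as urllib.error.HTTPError, so callers would normally wrap the call in a try/except.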

Job lifecycle

Every crawl runs as a Job with one of five states:

  • queued — accepted, waiting for a worker
  • running — actively crawling
  • completed — done, results available
  • failed — a non-recoverable error
  • cancelled — cancelled by the user

Single-page jobs typically complete in under 5 seconds. Full-site crawl time scales with max_pages and with how quickly the target site responds.
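One way a client might model these five states is sketched below. Treating completed, failed, and cancelled as terminal (no further transitions) is an assumption of this sketch; the names JobStatus and is_terminal are illustrative.

```python
from enum import Enum


class JobStatus(str, Enum):
    """The five job states listed above."""
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: these three states are final, so a client can stop polling.
TERMINAL = frozenset({JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED})


def is_terminal(status: str) -> bool:
    """Return True if the job will not change state again.

    Raises ValueError for a status string outside the five known states.
    """
    return JobStatus(status) in TERMINAL
```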

Get job status

GET /v1/jobs/{id} · 200

Returns the current state of a job, including progress counters and any error details. Same shape regardless of mode.

Example response

200 OK
{
  "id": "9a47f5c1-...",
  "status": "running",
  "created_at": "2026-04-29T13:42:00Z",
  "started_at": "2026-04-29T13:42:03Z",
  "completed_at": null,
  "pages_discovered": 12,
  "pages_crawled": 7,
  "pages_failed": 0,
  "url": "https://example.com",
  "mode": "full",
  "error": null,
  "links": { "self": "/v1/jobs/...", "result": "/v1/jobs/.../result" }
}

Polling pattern

Poll every 2-5 seconds for small jobs, backing off to 15-30 seconds for large full-site crawls. A WebSocket / SSE streaming endpoint is on the post-launch roadmap.
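A minimal sketch of such a polling loop follows. It starts at a 2-second interval and grows the delay up to a 30-second cap; the growth factor of 1.5 is an arbitrary choice for this example. fetch_status is a hypothetical callable, supplied by the caller, that performs GET /v1/jobs/{id} and returns the decoded JSON. The set of terminal states is likewise an assumption.

```python
import time
from typing import Callable

# Assumption: a job in one of these states will not change again.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_job(
    fetch_status: Callable[[], dict],
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    factor: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state, backing off between calls.

    fetch_status is a caller-supplied function returning the job JSON as a dict.
    sleep is injectable so the loop can be exercised without real waiting.
    """
    delay = initial_delay
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        sleep(delay)
        # Grow the interval, but never beyond the cap.
        delay = min(delay * factor, max_delay)
```

In production code you would probably also add an overall timeout so a stuck job cannot keep the loop running forever.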

Download results

GET /v1/jobs/{id}/result · 200

Streams the scraped content back. Multi-page jobs (full / select) return one document per crawled page, concatenated with --- separators.

Query parameters

Parameter  Type    Default                  Description
format     string  first in export_formats  Which format to download if the job produced multiple.

Results are cached for 24 hours after completion, then permanently deleted; after that, the endpoint returns 410 Gone. The filename is result.md for single-page jobs and crawl-{id}.md for multi-page jobs, so the difference is obvious on disk.
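To illustrate working with a multi-page download, the sketch below splits a result body back into per-page documents. It assumes each separator sits on a line of its own containing exactly ---, which this page does not spell out. A page whose own Markdown contains such a line (a horizontal rule, for instance) would be split incorrectly, so treat this as a starting point only. split_pages is a hypothetical helper.

```python
from typing import List


def split_pages(body: str) -> List[str]:
    """Split a multi-page result into one string per crawled page.

    Assumption: pages are separated by a line that is exactly '---'.
    """
    pages: List[str] = []
    current: List[str] = []
    for line in body.splitlines():
        if line.strip() == "---":
            # Separator reached: close the page collected so far.
            pages.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    pages.append("\n".join(current).strip())
    # Drop empty chunks, e.g. from a leading or trailing separator.
    return [page for page in pages if page]
```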

List your jobs

GET /v1/jobs · 200

Returns a list of the jobs you own, using cursor-based pagination.

Query parameters

Parameter  Type         Default  Description
status     string enum  (none)   Filter by state: queued, running, completed, failed, cancelled.
limit      integer      20       Page size. Max 100.
cursor     string       (none)   Opaque cursor from the previous response's next_cursor.
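The cursor parameter lends itself to a simple loop: request a page, yield its jobs, then pass next_cursor back until it is absent. The sketch below assumes the response body is an object with a jobs array alongside next_cursor. The jobs key is a guess, since the response shape is not shown here. fetch_page is a hypothetical callable that performs GET /v1/jobs with the given cursor and returns the decoded JSON.

```python
from typing import Callable, Iterator, Optional


def iter_jobs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every job across all pages.

    fetch_page(cursor) is caller-supplied; cursor is None for the first page.
    Assumption: each page looks like {"jobs": [...], "next_cursor": "..."}.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("jobs", [])
        cursor = page.get("next_cursor")
        if not cursor:
            # A missing or null cursor is taken to mean the last page.
            return
```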

Cancel or delete a job

DELETE /v1/jobs/{id} · 204

For a running job: cancels it. For a completed job: deletes the cached result immediately. Metadata (ID, timing) is kept for 30 days either way.

No response body. Idempotent — calling on an already-cancelled job still returns 204.

Error envelope

Every error response uses a consistent shape:

4xx / 5xx response
{
  "error": {
    "code": "quota_exceeded",
    "message": "You've used 100 / 100 pages this month.",
    "request_id": "req_01HQ..."
  }
}

Common codes

  • 400 validation_error — bad request body
  • 401 unauthorized — missing or invalid API key
  • 402 quota_exceeded — over your monthly page limit
  • 403 forbidden — job belongs to a different user
  • 404 not_found — job or result does not exist
  • 410 gone — result cache expired (after 24h)
  • 413 payload_too_large — request body over 64 KiB
  • 422 unsafe_url — URL resolves to a private / internal address
  • 422 tos_blocked — target site's ToS forbids automated access
  • 429 rate_limited — slow down
  • 5xx internal_error — it's us, not you. The request_id helps debugging.
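Because every error shares the same envelope, a client can map it onto a single exception type. The sketch below is one possible shape; SyttraAPIError and parse_error are names made up for this example, and the fallback values for missing fields are this sketch's own choice.

```python
import json
from typing import Optional


class SyttraAPIError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status: int, code: str, message: str, request_id: Optional[str]):
        super().__init__(f"{status} {code}: {message} (request_id={request_id})")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def parse_error(status: int, body: str) -> SyttraAPIError:
    """Turn an error response body into a SyttraAPIError.

    Missing fields fall back to placeholder values rather than raising.
    """
    err = json.loads(body).get("error", {})
    return SyttraAPIError(
        status,
        err.get("code", "unknown"),
        err.get("message", ""),
        err.get("request_id"),
    )
```

Logging request_id alongside the code makes it easier to quote the right identifier when reporting a 5xx.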

Rate limits

During private beta:

  • 60 requests/minute on mutating endpoints (POST, DELETE)
  • 120 requests/minute on read endpoints (GET)
  • See your usage page for your current monthly page allowance.

Every response carries X-RateLimit-Remaining and X-RateLimit-Reset headers. At public launch, rate limits will tighten or loosen depending on your plan.
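A client can use these two headers to pause before it hits a 429. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds. This page does not specify the format, so check a real response before relying on it. seconds_until_retry is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0.0 if budget remains).

    Assumption: X-RateLimit-Reset holds a Unix epoch time in seconds.
    Header lookup is case-sensitive for a plain dict, so pass names exactly
    as shown or use a case-insensitive mapping.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```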