Syttra
REST API · full mode

Full-site crawl

Start from one URL and follow internal links breadth-first. The crawl is bounded by max_depth (how far from the seed) and max_pages (the hard cap on total pages). Same-domain only: links to other hosts are ignored.
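
For example, with max_depth=2 (and a max_pages large enough not to interfere), the crawl covers the seed, the pages it links to, and the pages those pages link to, then stops. The paths below are illustrative:

https://example.com              seed (depth 0)
├── /about                       depth 1
│   └── /about/team              depth 2: crawled, but its links are not followed
└── /blog                        depth 1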

When to use it

  • You want everything reachable from a seed URL.
  • The site has no useful sitemap (or you want fresher discovery than the sitemap shows).
  • You don't mind paying quota for some pages you'll never read — full mode reserves the entire max_pages budget upfront.

If you only want specific pages, prefer select mode — it lets you pick exact URLs from the sitemap before any quota is spent.
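
For comparison, a select-mode submission might look like the sketch below. The complete field set lives in the select-mode guide; this sketch assumes the urls list sits alongside the usual url and mode fields:

curl https://api.syttra.com/v1/jobs \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "select",
    "urls": [
      "https://example.com/pricing",
      "https://example.com/docs/quickstart"
    ],
    "export_formats": ["markdown"]
  }'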

Create a full-site job

POST /v1/jobs · 202 Accepted

Submits a seed URL. The crawler walks all same-domain internal links via BFS until it hits max_depth or max_pages.

Request body

url* · string (URL) · required
  Seed URL. Must be a public HTTPS URL. Same-domain link discovery starts here.

mode* · string enum · required
  Set to "full" for this guide.

export_formats · array of strings · default: ["markdown"]
  Output formats. Currently: markdown, text. Multi-page output is concatenated with --- separators.

max_depth · integer · default: 3
  Maximum link depth from the seed. 1-10. Depth 1 = pages directly linked from the seed; 2 = links of those pages; etc.

max_pages · integer · default: 50
  Hard cap on total pages crawled. 1-1000. The quota pre-flight reserves this number, so pick conservatively.

concurrency · integer · default: 5
  How many pages to fetch in parallel. 1-20. Higher = faster, but harder on the target site.

respect_robots · boolean · default: true
  Check robots.txt before crawling. Disabling is only allowed with an Enterprise agreement.

check_tos · boolean · default: true
  Inspect the target site's ToS for anti-scraping clauses. Jobs fail with tos_blocked if detected.

The urls field is select-mode only and is rejected here with a 422 validation_error.
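
For instance, a full-mode body that includes urls is rejected before any quota is reserved. The envelope below is a sketch: only the status code and the validation_error code come from this page, and the message text is illustrative.

422 Unprocessable Entity
{
  "error": {
    "code": "validation_error",
    "message": "\"urls\" is not accepted when mode is \"full\""
  }
}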

Example request

POST /v1/jobs
curl https://api.syttra.com/v1/jobs \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "full",
    "max_depth": 2,
    "max_pages": 30,
    "concurrency": 5,
    "export_formats": ["markdown"]
  }'

Example response

202 Accepted
{
  "id": "b1c8e4d7-...",
  "status": "queued",
  "url": "https://example.com",
  "mode": "full",
  "links": {
    "self":   "/v1/jobs/b1c8e4d7-...",
    "result": "/v1/jobs/b1c8e4d7-.../result"
  }
}
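
The job runs asynchronously: poll links.self until the status settles, then fetch links.result. A minimal shell sketch, assuming jq is available and that in-flight jobs report queued or running (the authoritative status list and polling guidance live in the REST API overview):

JOB_ID="b1c8e4d7-..."   # id from the 202 response above
while :; do
  STATUS=$(curl -s "https://api.syttra.com/v1/jobs/$JOB_ID" \
    -H "Authorization: Bearer sk_live_..." | jq -r '.status')
  [ "$STATUS" != "queued" ] && [ "$STATUS" != "running" ] && break
  sleep 5
done
curl -s "https://api.syttra.com/v1/jobs/$JOB_ID/result" \
  -H "Authorization: Bearer sk_live_..."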

Behaviour notes

  • Same-domain only. Links to other hostnames are silently dropped. Use multiple jobs (or select mode) for cross-domain crawls.
  • Asset URLs are skipped. Links to .png, .pdf, .css, etc. are filtered before queuing; they don't count against max_pages.
  • Quota pre-flight reserves the budget. Submitting max_pages=50 when you've already used 60 of a 100-page allowance returns 402 quota_exceeded, even if the actual crawl would only fetch 12 (see the example after this list).
  • BFS, not DFS. Pages closer to the seed are crawled first — handy if the crawl is cancelled mid-flight: the most relevant pages land first.
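
A sketch of the quota rejection mentioned above, assuming the error envelope mirrors the code names used on this page (the exact body shape and message text are not documented here):

402 Payment Required
{
  "error": {
    "code": "quota_exceeded",
    "message": "max_pages=50 exceeds the remaining page allowance of 40"
  }
}

Lower max_pages until the reservation fits your remaining allowance.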

What's next

Polling, multi-page download format, and error handling live in the REST API overview. Quick links: