diff --git a/api-reference/endpoint/smartcrawler/start.mdx b/api-reference/endpoint/smartcrawler/start.mdx
index 60716e5..c28c7a6 100644
--- a/api-reference/endpoint/smartcrawler/start.mdx
+++ b/api-reference/endpoint/smartcrawler/start.mdx
@@ -36,8 +36,9 @@ Content-Type: `application/json`
     "same_domain": "boolean"
   },
   "sitemap": "boolean",
-  "stealth": "boolean"
-  "webhook_url": str
+  "stealth": "boolean",
+  "webhook_url": "string",
+  "wait_ms": "integer"
 }
 ```

@@ -58,7 +59,8 @@ Content-Type: `application/json`
 | rules | object | No | - | Crawl rules for filtering URLs. Object with optional fields: `exclude` (array of regex URL patterns), `include_paths` (array of path patterns to include, supports wildcards `*` and `**`), `exclude_paths` (array of path patterns to exclude, takes precedence over `include_paths`), `same_domain` (boolean, default: true). See Rules section below for details. |
 | sitemap | boolean | No | false | Use sitemap.xml for discovery |
 | stealth | boolean | No | false | Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds +4 credits to the request cost |
-| webhook_url | str | No | None | Webhook URL to send the job result to. When provided, a signed webhook notification will be sent upon job completion. See [Webhook Signature Verification](#webhook-signature-verification) below.
+| webhook_url | string | No | None | Webhook URL to send the job result to. When provided, a signed webhook notification will be sent upon job completion. See [Webhook Signature Verification](#webhook-signature-verification) below. |
+| wait_ms | integer | No | 3000 | Milliseconds to wait before scraping each page. Useful for pages with heavy JavaScript rendering that need extra time to load. |

 ### Example
 ```json
diff --git a/services/smartcrawler.mdx b/services/smartcrawler.mdx
index a0502c0..2dc004e 100644
--- a/services/smartcrawler.mdx
+++ b/services/smartcrawler.mdx
@@ -107,6 +107,7 @@ curl -X 'POST' \
 | rules | object | No | Crawl rules object with optional fields: `exclude` (array of regex URL patterns), `include_paths` (array of path patterns to include, supports wildcards `*` and `**`), `exclude_paths` (array of path patterns to exclude, takes precedence over `include_paths`), `same_domain` (boolean, default: true). See below for details. |
 | sitemap | bool | No | Use sitemap.xml for discovery (default: false). |
 | webhook_url | string | No | URL to receive webhook notification on job completion. |
+| wait_ms | int | No | Milliseconds to wait before scraping each page. Useful for pages with heavy JavaScript rendering that need extra time to load (default: 3000). |



@@ -463,6 +464,7 @@ POST https://api.scrapegraphai.com/v1/crawl
 | max_pages | int | No | Max pages to crawl |
 | rules | object | No | Crawl rules object with optional fields: `exclude` (regex URL patterns), `include_paths` (path patterns to include), `exclude_paths` (path patterns to exclude), `same_domain` (boolean) |
 | sitemap | bool | No | Use sitemap.xml |
+| wait_ms | int | No | Milliseconds to wait before scraping each page. Useful for pages with heavy JavaScript rendering (default: 3000). |

 #### Response Format
 ```json