---
name: just-scrape
description: "CLI tool for AI-powered web scraping, data extraction, search, and crawling via ScrapeGraph AI. Use when the user needs to scrape websites, extract structured data from URLs, convert pages to markdown, crawl multi-page sites, search the web for information, automate browser interactions (login, click, fill forms), get raw HTML, discover sitemaps, or generate JSON schemas. Triggers on tasks involving: (1) extracting data from websites, (2) web scraping or crawling, (3) converting webpages to markdown, (4) AI-powered web search with extraction, (5) browser automation, (6) generating output schemas for scraping. The CLI is just-scrape (npm package just-scrape)."
---

# Web Scraping with just-scrape

## Setup

```bash
npm install -g just-scrape
export SGAI_API_KEY="sgai-..."
```

API key resolution order: `SGAI_API_KEY` env var → `.env` file → `~/.scrapegraphai/config.json` → interactive prompt (saves to config).
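
For scripts and CI, set the key up front so the interactive prompt never fires. A minimal sketch, assuming a `.env` file in the working directory:

```bash
# .env file in the working directory, picked up automatically per the resolution order above
SGAI_API_KEY=sgai-...
```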

## Command Selection

| Need | Command |
|---|---|
| Extract structured data from a known URL | `smart-scraper` |
| Search the web and extract from results | `search-scraper` |
| Convert a page to clean markdown | `markdownify` |
| Crawl multiple pages from a site | `crawl` |
| Get raw HTML | `scrape` |
| Automate browser actions (login, click, fill) | `agentic-scraper` |
| Generate a JSON schema from description | `generate-schema` |
| Get all URLs from a sitemap | `sitemap` |
| Check credit balance | `credits` |
| Browse past requests | `history` |
| Validate API key | `validate` |

## Common Flags

All commands support `--json` for machine-readable output (suppresses banner, spinners, prompts).

Scraping commands share these optional flags:
- `--render-js` — render JavaScript (+1 credit)
- `--stealth` — bypass anti-bot detection (+4 credits)
- `--headers <json>` — custom HTTP headers as JSON string
- `--schema <json>` — enforce output JSON schema
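
These flags combine freely. A sketch of a typical combination (the URL, prompt, and header value are placeholders):

```bash
# JS-heavy page behind anti-bot protection, with a custom header, scripted output
just-scrape smart-scraper https://app.example.com -p "Extract pricing tiers" \
  --render-js --stealth \
  --headers '{"Accept-Language":"en-US"}' \
  --json
```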

## Commands

### Smart Scraper

Extract structured data from any URL using AI.

```bash
just-scrape smart-scraper <url> -p <prompt>
just-scrape smart-scraper <url> -p <prompt> --schema <json>
just-scrape smart-scraper <url> -p <prompt> --scrolls <n> # infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n> # multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --render-js # JS rendering (+1 credit)
just-scrape smart-scraper <url> -p <prompt> --stealth # anti-bot (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text
```

```bash
# E-commerce extraction
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"

# Strict schema + scrolling
just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
--schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
--scrolls 5

# JS-heavy SPA behind anti-bot
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
  --render-js --stealth
```

### Search Scraper

Search the web and extract structured data from results.

```bash
just-scrape search-scraper <prompt>
just-scrape search-scraper <prompt> --num-results <n> # sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction # markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>
just-scrape search-scraper <prompt> --stealth --headers <json>
```

```bash
# Research across sources
just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10

# Cheap markdown-only
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5

# Structured output
just-scrape search-scraper "Top 5 cloud providers pricing" \
--schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
```

### Markdownify

Convert any webpage to clean markdown.

```bash
just-scrape markdownify <url>
just-scrape markdownify <url> --render-js # +1 credit
just-scrape markdownify <url> --stealth # +4 credits
just-scrape markdownify <url> --headers <json>
```

```bash
just-scrape markdownify https://blog.example.com/my-article
just-scrape markdownify https://protected.example.com --render-js --stealth
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md
```

### Crawl

Crawl multiple pages and extract data from each.

```bash
just-scrape crawl <url> -p <prompt>
just-scrape crawl <url> -p <prompt> --max-pages <n> # default 10
just-scrape crawl <url> -p <prompt> --depth <n> # default 1
just-scrape crawl <url> --no-extraction --max-pages <n> # markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>
just-scrape crawl <url> -p <prompt> --rules <json> # include_paths, same_domain
just-scrape crawl <url> -p <prompt> --no-sitemap
just-scrape crawl <url> -p <prompt> --render-js --stealth
```

```bash
# Crawl docs site
just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3

# Filter to blog pages only
just-scrape crawl https://example.com -p "Extract article titles" \
--rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50

# Raw markdown, no AI extraction (cheaper)
just-scrape crawl https://example.com --no-extraction --max-pages 10
```

### Scrape

Get raw HTML content from a URL.

```bash
just-scrape scrape <url>
just-scrape scrape <url> --render-js # +1 credit
just-scrape scrape <url> --stealth # +4 credits
just-scrape scrape <url> --branding # extract logos/colors/fonts (+2 credits)
just-scrape scrape <url> --country-code <iso>
```

```bash
just-scrape scrape https://example.com
just-scrape scrape https://store.example.com --stealth --country-code DE
just-scrape scrape https://example.com --branding
```

### Agentic Scraper

Browser automation with AI — login, click, navigate, fill forms. Steps are comma-separated strings.

```bash
just-scrape agentic-scraper <url> -s <steps>
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session # persist browser session
```

```bash
# Login + extract dashboard
just-scrape agentic-scraper https://app.example.com/login \
-s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
--ai-extraction -p "Extract all dashboard metrics"

# Multi-step form
just-scrape agentic-scraper https://example.com/wizard \
-s "Click Next,Select Premium plan,Fill name with John,Click Submit"

# Persistent session across runs
just-scrape agentic-scraper https://app.example.com \
-s "Click Settings" --use-session
```

### Generate Schema

Generate a JSON schema from a natural language description.

```bash
just-scrape generate-schema <prompt>
just-scrape generate-schema <prompt> --existing-schema <json>
```

```bash
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"

# Refine an existing schema
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'
```

### Sitemap

Get all URLs from a website's sitemap.

```bash
just-scrape sitemap <url>
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
```

### History

Browse request history. Interactive by default (arrow keys to navigate, select to view details).

```bash
just-scrape history <service> # interactive browser
just-scrape history <service> <request-id> # specific request
just-scrape history <service> --page <n>
just-scrape history <service> --page-size <n> # max 100
just-scrape history <service> --json
```

Services: `markdownify`, `smartscraper`, `searchscraper`, `scrape`, `crawl`, `agentic-scraper`, `sitemap`

```bash
just-scrape history smartscraper
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'
```

### Credits & Validate

```bash
just-scrape credits
just-scrape credits --json | jq '.remaining_credits'
just-scrape validate
```

## Common Patterns

### Generate schema then scrape with it

```bash
just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"
```

### Pipe JSON for scripting

```bash
just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while read -r url; do
  # jq -c compacts each result to a single line so results.jsonl stays valid JSONL
  just-scrape smart-scraper "$url" -p "Extract title" --json | jq -c . >> results.jsonl
done
```

### Protected sites

```bash
# JS-heavy SPA behind Cloudflare
just-scrape smart-scraper https://protected.example.com -p "Extract data" --render-js --stealth

# With custom cookies/headers
just-scrape smart-scraper https://example.com -p "Extract data" \
--cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'
```

## Credit Costs

| Feature | Credits |
|---|---|
| `--render-js` | +1 per page |
| `--stealth` | +4 per request |
| `--branding` (scrape only) | +2 |
| `search-scraper` extraction | 10 per request |
| `search-scraper --no-extraction` | 2 per request |
| `crawl --no-extraction` | 2 per page |
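
As a worked example: `crawl --no-extraction --max-pages 10` should run about 2 × 10 = 20 credits if all ten pages are fetched, and adding `--render-js` would add roughly 1 credit per rendered page on top.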

## Environment Variables

```bash
SGAI_API_KEY=sgai-... # API key
JUST_SCRAPE_TIMEOUT_S=300 # Request timeout in seconds (default 120)
JUST_SCRAPE_DEBUG=1 # Debug logging to stderr
```
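
Long crawls can exceed the default 120-second timeout. A sketch of raising it for a single run (the values are illustrative):

```bash
# One-off timeout bump for a large crawl
JUST_SCRAPE_TIMEOUT_S=600 just-scrape crawl https://docs.example.com \
  -p "Extract all code snippets" --max-pages 50 --json > crawl.json
```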