From 4656847c5136ace7de575c5b48c590013ad0622d Mon Sep 17 00:00:00 2001
From: fra
Date: Fri, 13 Feb 2026 15:11:30 +0100
Subject: [PATCH 1/2] feat: add Claude Code skill for just-scrape CLI usage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a SKILL.md that teaches Claude how to use the just-scrape CLI —
command selection, flags, examples, workflows, and credit costs.

Installable via: npx skills add ScrapeGraphAI/just-scrape

Co-Authored-By: Claude Opus 4.6
---
 .claude/skills/just-scrape/SKILL.md | 284 ++++++++++++++++++++++++++++
 1 file changed, 284 insertions(+)
 create mode 100644 .claude/skills/just-scrape/SKILL.md

diff --git a/.claude/skills/just-scrape/SKILL.md b/.claude/skills/just-scrape/SKILL.md
new file mode 100644
index 0000000..f9c7f42
--- /dev/null
+++ b/.claude/skills/just-scrape/SKILL.md
@@ -0,0 +1,284 @@
+---
+name: just-scrape
+description: "CLI tool for AI-powered web scraping, data extraction, search, and crawling via ScrapeGraph AI. Use when the user needs to scrape websites, extract structured data from URLs, convert pages to markdown, crawl multi-page sites, search the web for information, automate browser interactions (login, click, fill forms), get raw HTML, discover sitemaps, or generate JSON schemas. Triggers on tasks involving: (1) extracting data from websites, (2) web scraping or crawling, (3) converting webpages to markdown, (4) AI-powered web search with extraction, (5) browser automation, (6) generating output schemas for scraping. The CLI is just-scrape (npm package just-scrape)."
+---
+
+# Web Scraping with just-scrape
+
+## Setup
+
+```bash
+npm install -g just-scrape
+export SGAI_API_KEY="sgai-..."
+```
+
+API key resolution order: `SGAI_API_KEY` env var → `.env` file → `~/.scrapegraphai/config.json` → interactive prompt (saves to config).
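A minimal shell sketch of this precedence, for illustration only (the CLI performs this lookup internally; the `api_key` field name in `config.json` is an assumption):

```bash
# Illustrative sketch of the documented resolution order; not the CLI's code.
resolve_key() {
  if [ -n "${SGAI_API_KEY:-}" ]; then
    # 1. Environment variable wins
    printf '%s\n' "$SGAI_API_KEY"
  elif [ -f .env ] && grep -q '^SGAI_API_KEY=' .env; then
    # 2. .env file in the current directory
    sed -n 's/^SGAI_API_KEY=//p' .env
  elif [ -f "$HOME/.scrapegraphai/config.json" ]; then
    # 3. Saved config (field name assumed here)
    jq -r '.api_key // empty' "$HOME/.scrapegraphai/config.json"
  else
    # 4. At this point the CLI prompts interactively and saves to the config
    return 1
  fi
}
```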
+
+## Command Selection
+
+| Need | Command |
+|---|---|
+| Extract structured data from a known URL | `smart-scraper` |
+| Search the web and extract from results | `search-scraper` |
+| Convert a page to clean markdown | `markdownify` |
+| Crawl multiple pages from a site | `crawl` |
+| Get raw HTML | `scrape` |
+| Automate browser actions (login, click, fill) | `agentic-scraper` |
+| Generate a JSON schema from a description | `generate-schema` |
+| Get all URLs from a sitemap | `sitemap` |
+| Check credit balance | `credits` |
+| Browse past requests | `history` |
+| Validate API key | `validate` |
+
+## Common Flags
+
+All commands support `--json` for machine-readable output (suppresses the banner, spinners, and prompts).
+
+Scraping commands share these optional flags:
+- `--render-js` — render JavaScript (+1 credit)
+- `--stealth` — bypass anti-bot detection (+4 credits)
+- `--headers <json>` — custom HTTP headers as a JSON string
+- `--schema <json>` — enforce an output JSON schema
+
+## Commands
+
+### Smart Scraper
+
+Extract structured data from any URL using AI.
+
+```bash
+just-scrape smart-scraper <url> -p <prompt>
+just-scrape smart-scraper <url> -p <prompt> --schema <json>
+just-scrape smart-scraper <url> -p <prompt> --scrolls <n>   # infinite scroll (0-100)
+just-scrape smart-scraper <url> -p <prompt> --pages <n>     # multi-page (1-100)
+just-scrape smart-scraper <url> -p <prompt> --render-js     # JS rendering (+1 credit)
+just-scrape smart-scraper <url> -p <prompt> --stealth       # anti-bot (+4 credits)
+just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
+just-scrape smart-scraper <url> -p <prompt> --plain-text
+```
+
+```bash
+# E-commerce extraction
+just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"
+
+# Strict schema + scrolling
+just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
+  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
+  --scrolls 5
+
+# JS-heavy SPA behind anti-bot
+just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
+  --render-js --stealth
+```
+
+### Search Scraper
+
+Search the web and extract structured data from the results.
+
+```bash
+just-scrape search-scraper <query>
+just-scrape search-scraper <query> --num-results <n>   # sources to scrape (3-20, default 3)
+just-scrape search-scraper <query> --no-extraction     # markdown only (2 credits vs 10)
+just-scrape search-scraper <query> --schema <json>
+just-scrape search-scraper <query> --stealth --headers <json>
+```
+
+```bash
+# Research across sources
+just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10
+
+# Cheap markdown-only
+just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5
+
+# Structured output
+just-scrape search-scraper "Top 5 cloud providers pricing" \
+  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
+```
+
+### Markdownify
+
+Convert any webpage to clean markdown.
+
+```bash
+just-scrape markdownify <url>
+just-scrape markdownify <url> --render-js   # +1 credit
+just-scrape markdownify <url> --stealth     # +4 credits
+just-scrape markdownify <url> --headers <json>
+```
+
+```bash
+just-scrape markdownify https://blog.example.com/my-article
+just-scrape markdownify https://protected.example.com --render-js --stealth
+just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md
+```
+
+### Crawl
+
+Crawl multiple pages and extract data from each.
+
+```bash
+just-scrape crawl <url> -p <prompt>
+just-scrape crawl <url> -p <prompt> --max-pages <n>       # default 10
+just-scrape crawl <url> -p <prompt> --depth <n>           # default 1
+just-scrape crawl <url> --no-extraction --max-pages <n>   # markdown only (2 credits/page)
+just-scrape crawl <url> -p <prompt> --schema <json>
+just-scrape crawl <url> -p <prompt> --rules <json>        # include_paths, same_domain
+just-scrape crawl <url> -p <prompt> --no-sitemap
+just-scrape crawl <url> -p <prompt> --render-js --stealth
+```
+
+```bash
+# Crawl a docs site
+just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3
+
+# Filter to blog pages only
+just-scrape crawl https://example.com -p "Extract article titles" \
+  --rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50
+
+# Raw markdown, no AI extraction (cheaper)
+just-scrape crawl https://example.com --no-extraction --max-pages 10
+```
+
+### Scrape
+
+Get raw HTML content from a URL.
+
+```bash
+just-scrape scrape <url>
+just-scrape scrape <url> --render-js            # +1 credit
+just-scrape scrape <url> --stealth              # +4 credits
+just-scrape scrape <url> --branding             # extract logos/colors/fonts (+2 credits)
+just-scrape scrape <url> --country-code <code>
+```
+
+```bash
+just-scrape scrape https://example.com
+just-scrape scrape https://store.example.com --stealth --country-code DE
+just-scrape scrape https://example.com --branding
+```
+
+### Agentic Scraper
+
+Browser automation with AI — login, click, navigate, fill forms. Steps are comma-separated strings.
+
+```bash
+just-scrape agentic-scraper <url> -s <steps>
+just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
+just-scrape agentic-scraper <url> -s <steps> --schema <json>
+just-scrape agentic-scraper <url> -s <steps> --use-session   # persist browser session
+```
+
+```bash
+# Login + extract dashboard
+just-scrape agentic-scraper https://app.example.com/login \
+  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
+  --ai-extraction -p "Extract all dashboard metrics"
+
+# Multi-step form
+just-scrape agentic-scraper https://example.com/wizard \
+  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"
+
+# Persistent session across runs
+just-scrape agentic-scraper https://app.example.com \
+  -s "Click Settings" --use-session
+```
+
+### Generate Schema
+
+Generate a JSON schema from a natural-language description.
+
+```bash
+just-scrape generate-schema <description>
+just-scrape generate-schema <description> --existing-schema <json>
+```
+
+```bash
+just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"
+
+# Refine an existing schema
+just-scrape generate-schema "Add an availability field" \
+  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'
+```
+
+### Sitemap
+
+Get all URLs from a website's sitemap.
+
+```bash
+just-scrape sitemap <url>
+just-scrape sitemap https://example.com --json | jq -r '.urls[]'
+```
+
+### History
+
+Browse request history. Interactive by default (arrow keys to navigate, select an entry to view details).
+
+```bash
+just-scrape history                    # interactive browser
+just-scrape history <request-id>       # specific request
+just-scrape history --page <n>
+just-scrape history --page-size <n>    # max 100
+just-scrape history --json
+```
+
+Services: `markdownify`, `smartscraper`, `searchscraper`, `scrape`, `crawl`, `agentic-scraper`, `sitemap`
+
+```bash
+just-scrape history smartscraper
+just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'
+```
+
+### Credits & Validate
+
+```bash
+just-scrape credits
+just-scrape credits --json | jq '.remaining_credits'
+just-scrape validate
+```
+
+## Common Patterns
+
+### Generate a schema, then scrape with it
+
+```bash
+just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
+just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"
+```
+
+### Pipe JSON for scripting
+
+```bash
+just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while read -r url; do
+  just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl
+done
+```
+
+### Protected sites
+
+```bash
+# JS-heavy SPA behind Cloudflare
+just-scrape smart-scraper https://protected.example.com -p "Extract data" --render-js --stealth
+
+# With custom cookies/headers
+just-scrape smart-scraper https://example.com -p "Extract data" \
+  --cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'
+```
+
+## Credit Costs
+
+| Feature | Extra Credits |
+|---|---|
+| `--render-js` | +1 per page |
+| `--stealth` | +4 per request |
+| `--branding` (scrape only) | +2 |
+| `search-scraper` extraction | 10 per request |
+| `search-scraper --no-extraction` | 2 per request |
+| `crawl --no-extraction` | 2 per page |
+
+## Environment Variables
+
+```bash
+SGAI_API_KEY=sgai-...        # API key
+JUST_SCRAPE_TIMEOUT_S=300    # Request timeout in seconds (default 120)
+JUST_SCRAPE_DEBUG=1          # Debug logging to stderr
+```

From 4c6f30bf34e6855f00052a6262c7e92207264e00 Mon Sep 17 00:00:00 2001
From: fra
Date: Fri, 13 Feb 2026 15:11:58 +0100
Subject: [PATCH 2/2] feat: add Claude Code skill for just-scrape CLI usage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a SKILL.md that teaches Claude how to use the just-scrape CLI —
command selection, flags, examples, workflows, and credit costs.

Installable via: npx skills add ScrapeGraphAI/just-scrape

Co-Authored-By: Claude Opus 4.6
---
 {.claude/skills => skills}/just-scrape/SKILL.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {.claude/skills => skills}/just-scrape/SKILL.md (100%)

diff --git a/.claude/skills/just-scrape/SKILL.md b/skills/just-scrape/SKILL.md
similarity index 100%
rename from .claude/skills/just-scrape/SKILL.md
rename to skills/just-scrape/SKILL.md