ScrapeGraphAI JS SDK

Official TypeScript SDK for the ScrapeGraphAI API.

Install

npm i scrapegraph-js
# or
bun add scrapegraph-js

Quick Start

import { ScrapeGraphAI } from "scrapegraph-js";

// reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI({ apiKey: "..." })
const sgai = ScrapeGraphAI();

const result = await sgai.scrape({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
});

if (result.status === "success") {
  console.log(result.data?.results.markdown?.data);
} else {
  console.error(result.error);
}

Every function returns an ApiResult<T> rather than throwing, so there are no exceptions to catch:

type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};
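
Because failures arrive as values rather than exceptions, a pair of small helpers can keep call sites tidy. The following is a minimal sketch: the `ApiResult` type is copied from above, while `unwrapOr` and `unwrapOrThrow` are hypothetical helpers written for this example and are not part of the SDK.

```typescript
// ApiResult as documented above, repeated so the snippet is self-contained.
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};

// Hypothetical helper (not part of the SDK): return the payload on success,
// otherwise fall back to a default value.
function unwrapOr<T>(result: ApiResult<T>, fallback: T): T {
  return result.status === "success" && result.data !== null
    ? result.data
    : fallback;
}

// Hypothetical helper: turn an error result into a thrown Error, for code
// paths that prefer exceptions.
function unwrapOrThrow<T>(result: ApiResult<T>): T {
  if (result.status !== "success" || result.data === null) {
    throw new Error(result.error ?? "request failed");
  }
  return result.data;
}
```

Either helper can wrap any SDK call, since they all share the same result shape.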

API

scrape

Scrape a webpage in multiple formats (markdown, html, screenshot, json, etc.).

const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
    { type: "json", prompt: "Extract product info" },
  ],
  contentType: "text/html",        // optional, auto-detected
  fetchConfig: {                   // optional
    mode: "js",                    // "auto" | "fast" | "js"
    stealth: true,
    timeout: 30000,
    wait: 2000,
    scrolls: 3,
    headers: { "Accept-Language": "en" },
    cookies: { session: "abc" },
    country: "us",
  },
});

Formats:

  • markdown — Clean markdown (modes: normal, reader, prune)
  • html — Raw HTML (modes: normal, reader, prune)
  • links — All links on the page
  • images — All image URLs
  • summary — AI-generated summary
  • json — Structured extraction with prompt/schema
  • branding — Brand colors, typography, logos
  • screenshot — Page screenshot (fullPage, width, height, quality)
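
The format entries listed above can be modeled as a discriminated union, so the compiler catches a misspelled format name or an option placed on the wrong format. This is a sketch inferred from this README only: the type names `TextMode` and `Format` and the helper `formatTypes` are assumptions, and the SDK's exported types may differ.

```typescript
// Modes listed above for the markdown and html formats.
type TextMode = "normal" | "reader" | "prune";

// One variant per format in the list above. Option names follow the README;
// which options are required is an assumption (all are optional here).
type Format =
  | { type: "markdown"; mode?: TextMode }
  | { type: "html"; mode?: TextMode }
  | { type: "links" }
  | { type: "images" }
  | { type: "summary" }
  | { type: "branding" }
  | { type: "json"; prompt?: string; schema?: Record<string, unknown> }
  | { type: "screenshot"; fullPage?: boolean; width?: number; height?: number; quality?: number };

// Hypothetical helper: list the format names requested, e.g. for logging.
function formatTypes(formats: Format[]): Format["type"][] {
  return formats.map((f) => f.type);
}

const requested: Format[] = [
  { type: "markdown", mode: "reader" },
  { type: "screenshot", fullPage: true, width: 1440, height: 900 },
  { type: "json", prompt: "Extract product info" },
];
```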

extract

Extract structured data from a URL, HTML, or markdown using AI.

const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract product names and prices",
  schema: { /* JSON schema */ },   // optional
  mode: "reader",                  // optional
  fetchConfig: { /* ... */ },      // optional
});
// Or pass html/markdown directly instead of url
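
The `schema` option is left as a placeholder above. As a rough illustration, a JSON Schema for the prompt "Extract product names and prices" might look like the sketch below. The property names (`products`, `name`, `price`), the `ProductList` type, and the `isProductList` guard are assumptions chosen for this example, not something the SDK prescribes.

```typescript
// Illustrative JSON Schema; the property names are assumptions for this example.
const productSchema = {
  type: "object",
  properties: {
    products: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "number" },
        },
        required: ["name", "price"],
      },
    },
  },
  required: ["products"],
};

// TypeScript shape matching the schema above.
type ProductList = {
  products: { name: string; price: number }[];
};

// Runtime guard: checks that an unknown value matches ProductList before use.
function isProductList(value: unknown): value is ProductList {
  if (typeof value !== "object" || value === null) return false;
  const products = (value as { products?: unknown }).products;
  return (
    Array.isArray(products) &&
    products.every(
      (p) =>
        typeof p === "object" &&
        p !== null &&
        typeof (p as { name?: unknown }).name === "string" &&
        typeof (p as { price?: unknown }).price === "number",
    )
  );
}
```

`productSchema` would be passed as the `schema` option. Running the guard on the returned data is a defensive choice, since AI-extracted output may not always match the requested shape.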

search

Search the web and optionally extract structured data.

const res = await sgai.search({
  query: "best programming languages 2024",
  numResults: 5,                   // 1-20, default 3
  format: "markdown",              // "markdown" | "html"
  prompt: "Extract key points",    // optional, for AI extraction
  schema: { /* ... */ },           // optional
  timeRange: "past_week",          // optional
  locationGeoCode: "us",           // optional
  fetchConfig: { /* ... */ },      // optional
});

crawl

Crawl a website and its linked pages.

// Start a crawl
const start = await sgai.crawl.start({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
  maxPages: 50,
  maxDepth: 2,
  maxLinksPerPage: 10,
  includePatterns: ["/blog/*"],
  excludePatterns: ["/admin/*"],
  fetchConfig: { /* ... */ },
});

// Check status (start.data is null if the request failed, so check start.status first)
const id = start.data?.id!;
const status = await sgai.crawl.get(id);

// Control
await sgai.crawl.stop(id);
await sgai.crawl.resume(id);
await sgai.crawl.delete(id);
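
Since a crawl is started with one call and checked with another, callers will usually poll until it finishes. The sketch below shows one way to do that. It is not SDK code: `CrawlStatus`, `waitForCrawl`, and the status strings passed as `terminal` are hypothetical, because this README does not document the fields of the crawl status payload.

```typescript
// ApiResult as documented earlier, repeated so the snippet is self-contained.
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};

// Hypothetical payload shape; the real field names and status values may differ.
type CrawlStatus = { id: string; status: string };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `getStatus` up to `maxAttempts` times, pausing `intervalMs` between
// calls, and stop early on an error result or a status listed in `terminal`.
async function waitForCrawl(
  getStatus: () => Promise<ApiResult<CrawlStatus>>,
  opts: { intervalMs: number; maxAttempts: number; terminal: string[] },
): Promise<ApiResult<CrawlStatus>> {
  let last = await getStatus();
  for (let attempt = 1; attempt < opts.maxAttempts; attempt++) {
    const done =
      last.status === "error" ||
      (last.data !== null && opts.terminal.includes(last.data.status));
    if (done) return last;
    await sleep(opts.intervalMs);
    last = await getStatus();
  }
  return last; // may still be unfinished if attempts ran out
}
```

A call might look like `waitForCrawl(() => sgai.crawl.get(id), { intervalMs: 5000, maxAttempts: 60, terminal: ["completed", "failed"] })`, where the two terminal status names are placeholders to replace with whatever the API actually reports.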

monitor

Monitor a webpage for changes on a schedule.

// Create a monitor
const mon = await sgai.monitor.create({
  url: "https://example.com",
  name: "Price Monitor",
  interval: "0 * * * *",           // cron expression
  formats: [{ type: "markdown" }],
  webhookUrl: "https://...",       // optional
  fetchConfig: { /* ... */ },
});

// Manage monitors
await sgai.monitor.list();
await sgai.monitor.get(cronId);
await sgai.monitor.update(cronId, { interval: "0 */6 * * *" });
await sgai.monitor.pause(cronId);
await sgai.monitor.resume(cronId);
await sgai.monitor.delete(cronId);
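
The `interval` values above are five-field cron expressions: minute, hour, day of month, month, day of week. The sketch below splits an expression into named fields to make the two examples easier to read. `CronFields` and `parseCron` are hypothetical helpers for illustration, and whether the service accepts anything beyond the standard five fields is not stated here.

```typescript
// The five fields of a standard cron expression, in order.
type CronFields = {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
};

// Split a five-field cron expression into named fields.
// Throws if the expression does not have exactly five fields.
function parseCron(expr: string): CronFields {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${parts.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

const hourly = parseCron("0 * * * *");      // minute 0 of every hour
const sixHourly = parseCron("0 */6 * * *"); // minute 0 of every sixth hour
```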

history

Fetch request history.

const list = await sgai.history.list({
  service: "scrape",               // optional filter
  page: 1,
  limit: 20,
});

const entry = await sgai.history.get("request-id");
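
With `page` and `limit` parameters, reading the full history means walking pages until one comes back short. The sketch below shows that loop in a generic form. `collectPages` is a hypothetical helper, and it assumes each page can be mapped to a plain array of items; the actual shape of the `history.list` response is not documented in this README.

```typescript
// ApiResult as documented earlier, repeated so the snippet is self-contained.
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};

// Fetch pages starting at 1 until a page comes back short, a request fails,
// or `maxPages` is reached. Returns whatever was collected up to that point.
async function collectPages<T>(
  fetchPage: (page: number, limit: number) => Promise<ApiResult<T[]>>,
  limit: number,
  maxPages: number,
): Promise<T[]> {
  const items: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const res = await fetchPage(page, limit);
    if (res.status !== "success" || res.data === null) break;
    items.push(...res.data);
    if (res.data.length < limit) break; // short page: nothing left to fetch
  }
  return items;
}
```

The `fetchPage` callback is where a call to `sgai.history.list({ page, limit })` would go, adapted to return the item array from its response.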

credits / healthy

const credits = await sgai.credits();
// { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }

const health = await sgai.healthy();
// { status: "ok", uptime: 12345 }

Examples

| Service   | Example                     | Description                                                     |
|-----------|-----------------------------|-----------------------------------------------------------------|
| scrape    | scrape_basic.ts             | Basic markdown scraping                                         |
| scrape    | scrape_multi_format.ts      | Multiple formats (markdown, links, images, screenshot, summary) |
| scrape    | scrape_json_extraction.ts   | Structured JSON extraction with schema                          |
| scrape    | scrape_pdf.ts               | PDF document parsing with OCR metadata                          |
| scrape    | scrape_with_fetchconfig.ts  | JS rendering, stealth mode, scrolling                           |
| extract   | extract_basic.ts            | AI data extraction from URL                                     |
| extract   | extract_with_schema.ts      | Extraction with JSON schema                                     |
| search    | search_basic.ts             | Web search with results                                         |
| search    | search_with_extraction.ts   | Search + AI extraction                                          |
| crawl     | crawl_basic.ts              | Start and monitor a crawl                                       |
| crawl     | crawl_with_formats.ts       | Crawl with screenshots and patterns                             |
| monitor   | monitor_basic.ts            | Create a page monitor                                           |
| monitor   | monitor_with_webhook.ts     | Monitor with webhook notifications                              |
| utilities | credits.ts                  | Check account credits and limits                                |
| utilities | health.ts                   | API health check                                                |
| utilities | history.ts                  | Request history                                                 |

Environment Variables

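| Variable     | Description                | Default                              |
|--------------|----------------------------|--------------------------------------|
| SGAI_API_KEY | Your ScrapeGraphAI API key |                                      |
| SGAI_API_URL | Override API base URL      | https://api.scrapegraphai.com/api/v2 |
| SGAI_DEBUG   | Enable debug logging ("1") | off                                  |
| SGAI_TIMEOUT | Request timeout in seconds | 120                                  |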
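
To make the defaults above concrete, the sketch below resolves the four variables from an environment-like object. It mirrors the documented defaults but is not the SDK's own loader: `SdkEnvConfig` and `readEnvConfig` are hypothetical names, and falling back to 120 seconds for an unparsable `SGAI_TIMEOUT` is an assumption.

```typescript
type SdkEnvConfig = {
  apiKey: string | undefined;
  baseUrl: string;
  debug: boolean;
  timeoutMs: number;
};

// Resolve settings from an env-like object using the documented defaults.
function readEnvConfig(env: Record<string, string | undefined>): SdkEnvConfig {
  const seconds = Number(env.SGAI_TIMEOUT ?? "120");
  // Assumption: invalid or non-positive values fall back to the 120 s default.
  const timeoutSeconds = Number.isFinite(seconds) && seconds > 0 ? seconds : 120;
  return {
    apiKey: env.SGAI_API_KEY,
    baseUrl: env.SGAI_API_URL ?? "https://api.scrapegraphai.com/api/v2",
    debug: env.SGAI_DEBUG === "1",
    timeoutMs: timeoutSeconds * 1000, // SGAI_TIMEOUT is given in seconds
  };
}
```

In Node or Bun this could be called as `readEnvConfig(process.env)`.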

Development

bun install
bun run test              # unit tests
bun run test:integration  # live API tests (requires SGAI_API_KEY)
bun run build             # tsup → dist/
bun run check             # tsc --noEmit + biome

License

MIT - ScrapeGraphAI

About

Official JavaScript/TypeScript SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction — all powered by AI.