chore: sync upstream firecrawl/main (2026-04-06) #4
Open
cayman-openclaw wants to merge 454 commits into main from
* ci(tests): add JUnit XML reporting for test server CI
  Add jest-junit reporter to produce JUnit XML test results, and configure the test-server workflow to upload reports as artifacts and render them in the GitHub PR checks UI via dorny/test-reporter.
* ci(tests): remove dorny/test-reporter in favor of Blacksmith auto-detection
  Blacksmith automatically detects and parses JUnit XML files written to disk during the job. The jest-junit reporter already writes the file, so the dorny/test-reporter step and artifact upload are unnecessary.
* ci(tests): restore dorny/test-reporter and artifact upload
  Blacksmith auto-detects JUnit XML files on disk (no config needed), but didn't trigger on the first run. Restore the dorny reporter and artifact upload so we get GitHub check annotations either way.
* ci(tests): match original working JUnit config from PR firecrawl#2462
  Align jest-junit and dorny/test-reporter config to match the original implementation that worked with Blacksmith: use <rootDir>/test-results, addFileAttribute, suiteNameTemplate, and fail-on-error: true.
Co-authored-by: Nicolas <20311743+nickscamara@users.noreply.github.com>
…rawl#2924)
* bump sdk versions
* Rename persistentSession to profile across APIs
…crawl#2923)
Replace plain Error throws in buildFallbackList with a new BrandingNotSupportedError (TransportableError subclass) so these expected user-input validation errors are handled as known errors instead of being captured as unhandled exceptions in Sentry.
Fixes https://firecrawl.sentry.io/issues/API-AV
Fixes https://firecrawl.sentry.io/issues/API-N7
* fix: pass X402_FACILITATOR_URL to resource server
* update lockfile
* feat(pdf) Check for %PDF magic bytes on Pdf Engine
* broader check with test
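The magic-byte check named in this commit can be illustrated with a short sketch. It is not the engine's actual code: the function names and the 1024-byte scan window are assumptions, with the window being one plausible reading of the "broader check" (some PDF readers tolerate a few leading bytes before the header).

```python
PDF_MAGIC = b"%PDF"


def starts_with_pdf_magic(data: bytes) -> bool:
    """Strict check: the payload must begin with the %PDF marker."""
    return data.startswith(PDF_MAGIC)


def contains_pdf_magic(data: bytes, window: int = 1024) -> bool:
    """Lenient check: accept the marker anywhere in the first `window` bytes.

    The window size is an assumption made for this sketch.
    """
    return PDF_MAGIC in data[:window]
```

A payload with a leading byte-order mark fails the strict check but passes the lenient one, while an HTML error page served with a PDF content type fails both.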
…2931)
- Pipeline Redis writes for queued jobs using redis.pipeline() instead of sequential awaits (3N round trips → 1 batch)
- Increase ZSCAN COUNT from 20 to 1000 to reduce Redis round trips when scanning existing concurrency queues
- Use ZCOUNT instead of ZRANGEBYSCORE().length to count active jobs without transferring all member strings
- Parallelize ACUC team lookups for Crawl and Extract concurrency limits
- Return detailed response with per-team stats (backlogged, queued, started) so callers can see progress
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
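To make the "3N round trips → 1 batch" claim concrete, here is a toy model of the difference. It does not use a real Redis client: `CountingClient` and `CountingPipeline` are hypothetical stand-ins that only count simulated network round trips, and the three keys written per job are invented for illustration.

```python
class CountingPipeline:
    """Buffers commands and sends them in a single simulated round trip."""

    def __init__(self, client):
        self.client = client
        self.commands = []

    def set(self, key, value):
        self.commands.append((key, value))
        return self

    def execute(self):
        self.client.round_trips += 1  # one round trip for the whole batch
        for key, value in self.commands:
            self.client.store[key] = value
        self.commands = []


class CountingClient:
    """Stand-in for a Redis client; every direct command is one round trip."""

    def __init__(self):
        self.round_trips = 0
        self.store = {}

    def set(self, key, value):
        self.round_trips += 1
        self.store[key] = value

    def pipeline(self):
        return CountingPipeline(self)


def enqueue_sequential(client, job_ids):
    # Three writes per job, each awaited on its own: 3N round trips.
    for job_id in job_ids:
        client.set(f"job:{job_id}:data", "payload")
        client.set(f"job:{job_id}:state", "queued")
        client.set(f"job:{job_id}:team", "team")


def enqueue_pipelined(client, job_ids):
    # Same writes, buffered and flushed once: 1 round trip.
    pipe = client.pipeline()
    for job_id in job_ids:
        pipe.set(f"job:{job_id}:data", "payload")
        pipe.set(f"job:{job_id}:state", "queued")
        pipe.set(f"job:{job_id}:team", "team")
    pipe.execute()
```

Both functions leave the store in the same state; only the number of simulated round trips differs.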
…irecrawl#2934)
The concurrency-queue-backfill endpoint was calling getACUCTeam with RateLimiterMode.Extract, which triggers a 15x multiplier in the Postgres ACUC function that can overflow integer columns for high-usage teams. Use only the crawl ACUC (no 15x multiplier) for concurrency limits.
Also wrap each team iteration in try-catch so one failing team doesn't abort the entire backfill.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
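The two fixes in this commit can be sketched briefly. The sketch assumes the overflowing columns are 32-bit signed integers (the usual meaning of `integer` in Postgres); `backfill_all` and its callback are hypothetical names, not the endpoint's real code.

```python
INT32_MAX = 2_147_483_647  # upper bound of a 32-bit signed integer (assumed column type)
MULTIPLIER = 15            # the Extract-mode multiplier mentioned in the commit

# Largest value that can be multiplied by 15 without exceeding INT32_MAX.
LARGEST_SAFE = INT32_MAX // MULTIPLIER


def overflows_int32(value: int, multiplier: int = MULTIPLIER) -> bool:
    """Return True if value * multiplier no longer fits in a 32-bit signed integer."""
    return value * multiplier > INT32_MAX


def backfill_all(team_ids, backfill_team):
    """Run backfill_team for every team; a failure is recorded, not propagated."""
    results = {}
    for team_id in team_ids:
        try:
            results[team_id] = backfill_team(team_id)
        except Exception as exc:  # one failing team must not abort the loop
            results[team_id] = f"failed: {exc}"
    return results
```

Under that assumption, any stored value above 143,165,576 overflows once the 15x multiplier is applied, which is why only high-usage teams were affected.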
…#2937)
The precrawl team (86100d9a) has ~2.2M backlogged jobs which causes the backfill endpoint to time out before it can process any other teams. Filter it out so the endpoint can unstick all other teams.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…erabilities (firecrawl#2935)
Bump minimatch override from >=10.2.1 to >=10.2.3 (GHSA-23c5-xmqv-rm74, GHSA-7r86-cg39-jmmj)
Bump fast-xml-parser override from ^5.3.6 to ^5.3.8 (GHSA-fj3w-jwp8-x2g3)
Add metrics for various jobs
* feat(api): promote from reconciler
* fix(api): unbounded loop
Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Chadha <gauravchadha1676@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Chadha <65453826+Chadha93@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
* fix: resolve npm audit vulnerabilities across all apps
  - API: override handlebars>=4.7.9, path-to-regexp>=0.1.13, brace-expansion>=5.0.5
  - Playwright Service: override path-to-regexp>=8.4.0
  - JS SDK Firecrawl: override handlebars>=4.7.9, brace-expansion>=5.0.5
  - Test Suite: override brace-expansion>=5.0.5
  - Ingestion UI: override brace-expansion>=5.0.5
  - Test Site: update astro to ^5.18.1
* fix(api): pin path-to-regexp override to 0.1.13 for express 4 compat
  The previous override >=0.1.13 resolved to 8.x, which has an incompatible API, breaking express 4 routing.
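The reason an open-ended `>=0.1.13` override can land on 8.x is that a lower-bound range has no upper limit, so the newest published release satisfies it. The sketch below shows that with a simplified version comparison; it ignores prerelease tags and is not a full semver implementation, and the helper names are made up for this example.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies_gte(version: str, minimum: str) -> bool:
    """Simplified '>=' range check: true for any version at or above minimum."""
    return parse_version(version) >= parse_version(minimum)


def satisfies_exact(version: str, pinned: str) -> bool:
    """Exact pin: only the named version matches."""
    return parse_version(version) == parse_version(pinned)
```

With the open range, `8.4.0` is an acceptable resolution for `>=0.1.13`; with the exact pin, only `0.1.13` matches, which keeps the resolved version on the 0.1.x line.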
…iment (firecrawl#3253)
Replace single Jaccard similarity with full precision/recall/F1 metrics to better diagnose OCR quality issues (e.g. hallucinated words vs missing content). Keeps wordSimilarity field as Jaccard for backward compat.
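The distinction this commit draws between Jaccard similarity and precision/recall/F1 can be shown on word sets. This is a sketch under assumptions: it treats each text as a set of lowercased whitespace-separated words, which may differ from the experiment's actual tokenization, and `word_metrics` is a hypothetical name.

```python
def word_metrics(reference: str, candidate: str) -> dict[str, float]:
    """Compare candidate (e.g. OCR output) against reference text on word sets."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    overlap = len(ref & cand)

    # Precision drops when the candidate contains words not in the reference.
    precision = overlap / len(cand) if cand else 0.0
    # Recall drops when reference words are missing from the candidate.
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    union = len(ref | cand)
    jaccard = overlap / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}
```

For the reference `the quick brown fox` and the candidate `the quick brown dog cat`, Jaccard gives a single 0.5, while precision (0.6) and recall (0.75) separate the two extra words from the one missing word.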
…firecrawl#3160)
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…ath (firecrawl#3256)
The /crawl/{id}/errors endpoint's Supabase fallback path (used when Redis crawl data has expired) queried .eq('success', false) on the scrapes table, but the actual column is named 'is_successful'. This caused a guaranteed 500 UNKNOWN_ERROR for any crawl errors request after Redis TTL expiry.
Also adds null-safety for scrape.error which is a nullable column, preventing a TypeError crash in deserializeTransportableError when a failed scrape has a null error field.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
firecrawl#3257)
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
The branding extraction pipeline never populated colors.secondary despite it being defined in the TypeScript type and documented in example output. The LLM schema, JS color inference, merge logic, and prompt all lacked secondary color support.
Add secondaryColor to the LLM colorRoles schema, infer a secondary color from the JS palette (distinct chromatic color after primary and accent), map it through the merge layer, and prompt the LLM to extract it.
Co-Authored-By: micahstairs <micah@sideguide.dev>
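One way to read "distinct chromatic color after primary and accent" is sketched below. Everything here is an assumption made for illustration: the saturation threshold, the use of exact hex inequality as the test for "distinct", and the function names. The real inference runs in the JS pipeline and may use a different rule.

```python
import colorsys
from typing import Optional, Sequence


def is_chromatic(hex_color: str, min_saturation: float = 0.15) -> bool:
    """Treat a '#rrggbb' color as chromatic if its HLS saturation passes a threshold."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    _hue, _lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    return saturation >= min_saturation


def infer_secondary(
    palette: Sequence[str], primary: Optional[str], accent: Optional[str]
) -> Optional[str]:
    """Return the first chromatic palette color that is neither primary nor accent."""
    taken = {color.lower() for color in (primary, accent) if color}
    for color in palette:
        if color.lower() not in taken and is_chromatic(color):
            return color
    return None
```

Whites, blacks, and greys have zero saturation, so they are skipped; a palette with no remaining chromatic color yields `None` rather than a guessed value.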
…ry-color fix(branding): populate colors.secondary in branding extraction
…recrawl#3259)
* fix(js-sdk): pin axios to 1.14.0 to mitigate supply chain attack
  axios 1.14.1 and 0.30.4 were compromised with a malicious dependency (plain-crypto-js@4.2.1) that deploys a RAT via postinstall hook. This pins the direct dependency to 1.14.0 and adds pnpm overrides to block the compromised versions for any transitive resolution.
  Advisory: https://www.aikido.dev/blog/axios-npm-compromised-maintainer-hijacked-rat
* fix(js-sdk): update pnpm-lock.yaml to lock axios at 1.14.0
  Regenerated lockfile so pnpm install resolves the pinned safe version deterministically. Without this, the lockfile could still resolve to a compromised version in CI pipelines using frozen lockfiles.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…ag (firecrawl#3263)
Add AUTO_RECHARGE_ENABLED environment variable (defaults to false) to globally disable the auto-recharge flow. When unset or false, auto-recharge will not trigger regardless of per-team database settings. Set AUTO_RECHARGE_ENABLED=true to re-enable the previous behavior.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
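The gating behavior described here (a global flag that defaults to off and overrides per-team settings) can be sketched as follows. Only the variable name `AUTO_RECHARGE_ENABLED` comes from the commit; the function names and the case-insensitive parsing of `true` are assumptions.

```python
import os
from typing import Mapping, Optional


def auto_recharge_globally_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the global flag; unset or anything other than 'true' means disabled."""
    source = os.environ if env is None else env
    return source.get("AUTO_RECHARGE_ENABLED", "false").strip().lower() == "true"


def should_auto_recharge(
    team_setting_enabled: bool, env: Optional[Mapping[str, str]] = None
) -> bool:
    """The global flag gates the per-team setting; both must be on to trigger."""
    return auto_recharge_globally_enabled(env) and team_setting_enabled
```

Passing the environment as a mapping keeps the sketch testable; in normal use the argument is omitted and `os.environ` is read.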
…rawl#3269)
The browser create controller and scrape-browser interact controller had ad-hoc credit checks that read req.acuc.remaining_credits directly, bypassing the Autumn billing system entirely. This caused 402 errors for customers whose credits are managed by Autumn even when they had sufficient balance.
Both checks now use autumnService.checkCredits() directly. The legacy ACUC remaining_credits check has been removed.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
The API supports a `profile` parameter for browser profile persistence across scrapes, but the Python SDK's scrape() method signature didn't include it. The ScrapeOptions type and validation logic already handled it, but callers had no way to pass it through the explicit kwargs.
…e-profile-param fix(python-sdk): expose profile parameter on scrape() method
…ecrawl#3272)
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devhims <thinktank.himanshu@gmail.com>
… pipeline (firecrawl#3281)
The deriveDiff transformer compares the current document.markdown against the previously stored document from GCS. The stored document has already been through the full pipeline including removeBase64Images, which replaces inline base64 data with '<Base64-Image-Removed>' placeholders. However, deriveDiff was running before removeBase64Images, so the current markdown still had raw base64 data during comparison.
This caused change tracking to always report 'changed' for pages with inline base64 images when removeBase64Images was true (the default).
Moving removeBase64Images before deriveDiff ensures both the current and stored markdown have the same base64 treatment before comparison.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
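The ordering problem can be reproduced in a few lines. This is a simplified model, not the pipeline's code: the regular expression, the exact replacement shape, and the equality-based `has_changed` are assumptions standing in for removeBase64Images and deriveDiff.

```python
import re

PLACEHOLDER = "<Base64-Image-Removed>"
# Assumed pattern for an inline base64 image data URI.
BASE64_IMAGE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def remove_base64_images(markdown: str) -> str:
    """Replace inline base64 image data with a fixed placeholder."""
    return BASE64_IMAGE.sub(PLACEHOLDER, markdown)


def has_changed(current: str, stored: str) -> bool:
    """Stand-in for the diff step: any textual difference counts as a change."""
    return current != stored


# The page content is identical between the two scrapes.
current_raw = "# Title\n![logo](data:image/png;base64,iVBORw0KGgo=)\n"
# The stored copy already went through the full pipeline on the previous run.
stored = remove_base64_images(current_raw)

# Old order: diff first, strip later, so an unchanged page looks changed.
changed_old_order = has_changed(current_raw, stored)
# New order: strip first, then diff, so both sides get the same treatment.
changed_new_order = has_changed(remove_base64_images(current_raw), stored)
```

With the old order the comparison reports a change even though the page is identical; with the new order it does not.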
…rawl#3283)
Add the ignore_robots_txt parameter to CrawlRequest and CrawlParamsData models, and add the corresponding field mapping (ignoreRobotsTxt) in the crawl method. This parameter was already supported by the API but missing from the Python SDK.
Bump version to 4.22.0.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
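The field mapping described in this commit follows a common SDK pattern: a snake_case Python attribute is translated to the camelCase key the API expects. The sketch below illustrates that pattern only; `CrawlRequestSketch` and `to_api_payload` are hypothetical and are not the SDK's actual CrawlRequest model or crawl method.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CrawlRequestSketch:
    """Hypothetical stand-in for a request model with a snake_case option."""
    url: str
    ignore_robots_txt: Optional[bool] = None


def to_api_payload(request: CrawlRequestSketch) -> Dict[str, Any]:
    """Build the JSON body, mapping ignore_robots_txt to ignoreRobotsTxt."""
    payload: Dict[str, Any] = {"url": request.url}
    # Only send the key when the caller set it, so server defaults still apply.
    if request.ignore_robots_txt is not None:
        payload["ignoreRobotsTxt"] = request.ignore_robots_txt
    return payload
```

Without the mapping step the attribute would exist on the model but never reach the request body, which matches the gap the commit describes.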
…l#3285)
ACUC is no longer the source of truth for credit balances now that we are migrating to Autumn. Temporarily disable the APPROACHING_LIMIT and LIMIT_REACHED email notifications until they are re-wired to use Autumn balances.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…wl#3288)
Auto-recharge is now handled entirely by Autumn. Comment out the legacy auto-recharge settings fetch (DB query + Redis cache) and the trigger block to avoid double-charging or firing at the wrong threshold.
- Hardcode isAutoRechargeEnabled=false in evaluateTeamCredits
- Comment out auto-recharge settings fetch from teams table
- Comment out auto-recharge trigger block
- Remove unused imports (autoCharge, supabase, redis, config, etc.)
- Update knip config to ignore orphaned auto-recharge files
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…l#3286)
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
- Update lodash to ^4.18.0 in apps/api (GHSA-r5fr-rjxr-66jc, GHSA-f23m-r3pf-42rh)
- Override lodash >=4.18.0 in apps/test-suite (transitive via artillery)
- Override defu >=6.1.5 in apps/test-site (transitive via astro>unstorage>h3)
* track search results clickhouse
* env rename
Summary
- upstream/main into fork main via sync/upstream-2026-04-06
- main
Upstream drift snapshot before sync
Review focus
- apps/api (v1/v2, browser/scrape/search/billing/autumn)
- apps/elixir-sdk, apps/java-sdk, expanded rust/python/js v2 support)
- .github/workflows, helm templates, docker/k8s files)
Validation
- git merge --no-ff upstream/main)
Notes