chore: sync upstream firecrawl/main (2026-04-06) #4

Open
cayman-openclaw wants to merge 454 commits into main from sync/upstream-2026-04-06
cayman-openclaw (Collaborator) commented Apr 6, 2026

Summary

  • merge upstream/main into fork main via sync/upstream-2026-04-06
  • preserve existing fork-specific history on main
  • no merge conflicts during integration

Upstream drift snapshot before sync

  • fork ahead of upstream by 21 commits
  • fork behind upstream by 453 commits

Review focus

  • API/controller changes in apps/api (v1/v2, browser/scrape/search/billing/autumn)
  • new SDK surfaces (apps/elixir-sdk, apps/java-sdk, expanded rust/python/js v2 support)
  • infra/workflow and deployment updates (.github/workflows, helm templates, docker/k8s files)

Validation

  • merge completed cleanly (git merge --no-ff upstream/main)
  • no conflict resolution edits were required

Notes

  • This PR is an upstream integration sync and is expected to be large.
  • Recommend CI pass plus targeted smoke checks before merge.

mogery and others added 30 commits February 26, 2026 19:37
* ci(tests): add JUnit XML reporting for test server CI

Add jest-junit reporter to produce JUnit XML test results, and
configure the test-server workflow to upload reports as artifacts
and render them in the GitHub PR checks UI via dorny/test-reporter.

* ci(tests): remove dorny/test-reporter in favor of Blacksmith auto-detection

Blacksmith automatically detects and parses JUnit XML files written to
disk during the job. The jest-junit reporter already writes the file,
so the dorny/test-reporter step and artifact upload are unnecessary.

* ci(tests): restore dorny/test-reporter and artifact upload

Blacksmith auto-detects JUnit XML files on disk (no config needed),
but didn't trigger on the first run. Restore the dorny reporter and
artifact upload so we get GitHub check annotations either way.

* ci(tests): match original working JUnit config from PR firecrawl#2462

Align jest-junit and dorny/test-reporter config to match the original
implementation that worked with Blacksmith: use <rootDir>/test-results,
addFileAttribute, suiteNameTemplate, and fail-on-error: true.
Co-authored-by: Nicolas <20311743+nickscamara@users.noreply.github.com>
…rawl#2924)

* bump sdk versions

* Rename persistentSession to profile across APIs
…crawl#2923)

Replace plain Error throws in buildFallbackList with a new
BrandingNotSupportedError (TransportableError subclass) so these
expected user-input validation errors are handled as known errors
instead of being captured as unhandled exceptions in Sentry.

Fixes https://firecrawl.sentry.io/issues/API-AV
Fixes https://firecrawl.sentry.io/issues/API-N7
* fix: pass X402_FACILITATOR_URL to resource server

* update lockfile
* feat(pdf) Check for %PDF magic bytes on Pdf Engine

* broader check with test
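
A minimal sketch of what a magic-byte check like this could look like, assuming the engine has the raw response bytes in hand. The function name `looks_like_pdf` and the 1024-byte scan window (one possible reading of the "broader check") are illustrative assumptions, not the upstream implementation.

```python
PDF_MAGIC = b"%PDF"


def looks_like_pdf(data: bytes, scan_window: int = 1024) -> bool:
    """Return True if the %PDF marker appears near the start of the payload.

    Scanning a small window instead of only offset 0 tolerates leading
    whitespace or a byte-order mark. The window size is an assumption.
    """
    return PDF_MAGIC in data[:scan_window]


# An HTML error page served from a .pdf URL is rejected before parsing.
is_real_pdf = looks_like_pdf(b"%PDF-1.7\n1 0 obj")
is_html_page = looks_like_pdf(b"<!DOCTYPE html><html></html>")
```

Checking the header first lets the PDF engine fail fast on mislabeled responses instead of handing non-PDF bytes to a parser.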
…2931)

- Pipeline Redis writes for queued jobs using redis.pipeline() instead
  of sequential awaits (3N round trips → 1 batch)
- Increase ZSCAN COUNT from 20 to 1000 to reduce Redis round trips when
  scanning existing concurrency queues
- Use ZCOUNT instead of ZRANGEBYSCORE().length to count active jobs
  without transferring all member strings
- Parallelize ACUC team lookups for Crawl and Extract concurrency limits
- Return detailed response with per-team stats (backlogged, queued,
  started) so callers can see progress

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…irecrawl#2934)

The concurrency-queue-backfill endpoint was calling getACUCTeam with
RateLimiterMode.Extract, which triggers a 15x multiplier in the
Postgres ACUC function that can overflow integer columns for
high-usage teams. Use only the crawl ACUC (no 15x multiplier) for
concurrency limits. Also wrap each team iteration in try-catch so
one failing team doesn't abort the entire backfill.
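
The per-team error isolation described above can be sketched as follows. This is an illustration in Python rather than the API's own code, and `backfill_all`, `_demo_process`, and the team IDs are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def backfill_all(
    team_ids: Iterable[str],
    process_team: Callable[[str], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Process every team, recording failures instead of aborting the loop."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for team_id in team_ids:
        try:
            process_team(team_id)
            succeeded.append(team_id)
        except Exception as exc:  # isolate the failure to this one team
            failed[team_id] = str(exc)
    return succeeded, failed


def _demo_process(team_id: str) -> None:
    # Hypothetical stand-in for the real per-team backfill work.
    if team_id == "team-b":
        raise OverflowError("integer out of range")


succeeded, failed = backfill_all(["team-a", "team-b", "team-c"], _demo_process)
```

Here `team-b` fails, but `team-a` and `team-c` are still processed, and the failure is reported alongside the successes.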

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…#2937)

The precrawl team (86100d9a) has ~2.2M backlogged jobs which causes
the backfill endpoint to time out before it can process any other
teams. Filter it out so the endpoint can unstick all other teams.

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…erabilities (firecrawl#2935)

Bump minimatch override from >=10.2.1 to >=10.2.3 (GHSA-23c5-xmqv-rm74, GHSA-7r86-cg39-jmmj)
Bump fast-xml-parser override from ^5.3.6 to ^5.3.8 (GHSA-fj3w-jwp8-x2g3)
* feat(api): promote from reconciler

* fix(api): unbounded loop
Signed-off-by: Gaurav Chadha <gauravchadha1676@gmail.com>
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Chadha <gauravchadha1676@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Gaurav Chadha <65453826+Chadha93@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
mogery and others added 30 commits March 30, 2026 16:37
* fix: resolve npm audit vulnerabilities across all apps

- API: override handlebars>=4.7.9, path-to-regexp>=0.1.13, brace-expansion>=5.0.5
- Playwright Service: override path-to-regexp>=8.4.0
- JS SDK Firecrawl: override handlebars>=4.7.9, brace-expansion>=5.0.5
- Test Suite: override brace-expansion>=5.0.5
- Ingestion UI: override brace-expansion>=5.0.5
- Test Site: update astro to ^5.18.1

* fix(api): pin path-to-regexp override to 0.1.13 for express 4 compat

The previous override >=0.1.13 resolved to 8.x which has an
incompatible API, breaking express 4 routing.
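
To illustrate why the open-ended range caused trouble, here is a simplified sketch of version matching. It compares plain `major.minor.patch` tuples only and is not the npm semver algorithm; the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' into a tuple of ints (no prerelease tags)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(version: str, minimum: str) -> bool:
    """Open-ended range such as '>=0.1.13': any later version matches."""
    return parse_version(version) >= parse_version(minimum)


def satisfies_pin(version: str, pinned: str) -> bool:
    """Exact pin such as '0.1.13': only that version matches."""
    return parse_version(version) == parse_version(pinned)


# '>=0.1.13' accepts an 8.x release, which is what broke express 4 routing.
open_range_accepts_8x = satisfies_minimum("8.4.0", "0.1.13")
# The exact pin rejects 8.x and keeps the 0.1.x line.
pin_accepts_8x = satisfies_pin("8.4.0", "0.1.13")
```

An open-ended lower bound has no upper limit, so a resolver is free to choose a new major version with a different API.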
…iment (firecrawl#3253)

Replace single Jaccard similarity with full precision/recall/F1 metrics
to better diagnose OCR quality issues (e.g. hallucinated words vs missing
content). Keeps wordSimilarity field as Jaccard for backward compat.
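
A small sketch of how the metrics differ, assuming words are compared as sets (the upstream code may count duplicates differently). The function name `word_metrics` and the sample words are illustrative.

```python
from typing import Dict, Set


def word_metrics(expected: Set[str], extracted: Set[str]) -> Dict[str, float]:
    """Compute precision, recall, F1, and Jaccard over two word sets."""
    overlap = len(expected & extracted)
    union = len(expected | extracted)
    precision = overlap / len(extracted) if extracted else 0.0
    recall = overlap / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    jaccard = overlap / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


expected_words = {"the", "quick", "brown", "fox"}
ocr_words = {"the", "quick", "brown", "dog", "cat"}  # 2 hallucinated, 1 missing
metrics = word_metrics(expected_words, ocr_words)
# precision = 3/5, recall = 3/4, jaccard = 3/6
```

Jaccard alone reports 0.5 here without saying why. Precision (0.6) points at hallucinated words, while recall (0.75) points at missing content.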
…firecrawl#3160)

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…ath (firecrawl#3256)

The /crawl/{id}/errors endpoint's Supabase fallback path (used when
Redis crawl data has expired) queried .eq('success', false) on the
scrapes table, but the actual column is named 'is_successful'. This
caused a guaranteed 500 UNKNOWN_ERROR for any crawl errors request
after Redis TTL expiry.

Also adds null-safety for scrape.error which is a nullable column,
preventing a TypeError crash in deserializeTransportableError when
a failed scrape has a null error field.

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
firecrawl#3257)

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
The branding extraction pipeline never populated colors.secondary despite
it being defined in the TypeScript type and documented in example output.
The LLM schema, JS color inference, merge logic, and prompt all lacked
secondary color support.

Add secondaryColor to the LLM colorRoles schema, infer a secondary color
from the JS palette (distinct chromatic color after primary and accent),
map it through the merge layer, and prompt the LLM to extract it.

Co-Authored-By: micahstairs <micah@sideguide.dev>
…ry-color

fix(branding): populate colors.secondary in branding extraction
…recrawl#3259)

* fix(js-sdk): pin axios to 1.14.0 to mitigate supply chain attack

axios 1.14.1 and 0.30.4 were compromised with a malicious dependency
(plain-crypto-js@4.2.1) that deploys a RAT via postinstall hook.

This pins the direct dependency to 1.14.0 and adds pnpm overrides to
block the compromised versions for any transitive resolution.

Advisory: https://www.aikido.dev/blog/axios-npm-compromised-maintainer-hijacked-rat

* fix(js-sdk): update pnpm-lock.yaml to lock axios at 1.14.0

Regenerated lockfile so pnpm install resolves the pinned safe version
deterministically. Without this, the lockfile could still resolve to
a compromised version in CI pipelines using frozen lockfiles.
Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…ag (firecrawl#3263)

Add AUTO_RECHARGE_ENABLED environment variable (defaults to false) to
globally disable the auto-recharge flow. When unset or false,
auto-recharge will not trigger regardless of per-team database settings.
Set AUTO_RECHARGE_ENABLED=true to re-enable the previous behavior.
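
The flag semantics described above can be sketched as follows. Only the variable name `AUTO_RECHARGE_ENABLED` and its default come from the commit message; the function names and the choice to accept only the literal string `true` (case-insensitive) are assumptions.

```python
import os
from typing import Mapping, Optional


def auto_recharge_globally_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when AUTO_RECHARGE_ENABLED is explicitly 'true'."""
    source = os.environ if env is None else env
    return source.get("AUTO_RECHARGE_ENABLED", "false").strip().lower() == "true"


def should_auto_recharge(team_setting_enabled: bool,
                         env: Optional[Mapping[str, str]] = None) -> bool:
    """The global flag gates the per-team database setting."""
    return auto_recharge_globally_enabled(env) and team_setting_enabled


# With the variable unset, a team that opted in still does not recharge.
unset_result = should_auto_recharge(True, env={})
enabled_result = should_auto_recharge(True, env={"AUTO_RECHARGE_ENABLED": "true"})
```

Defaulting to disabled means a deployment that never sets the variable cannot trigger charges by accident.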

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…rawl#3269)

The browser create controller and scrape-browser interact controller
had ad-hoc credit checks that read req.acuc.remaining_credits directly,
bypassing the Autumn billing system entirely. This caused 402 errors for
customers whose credits are managed by Autumn even when they had
sufficient balance.

Both checks now use autumnService.checkCredits() directly. The legacy
ACUC remaining_credits check has been removed.

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
The API supports a `profile` parameter for browser profile persistence
across scrapes, but the Python SDK's scrape() method signature didn't
include it. The ScrapeOptions type and validation logic already handled
it, but callers had no way to pass it through the explicit kwargs.
…e-profile-param

fix(python-sdk): expose profile parameter on scrape() method
…ecrawl#3272)

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devhims <thinktank.himanshu@gmail.com>
… pipeline (firecrawl#3281)

The deriveDiff transformer compares the current document.markdown against
the previously stored document from GCS. The stored document has already
been through the full pipeline including removeBase64Images, which
replaces inline base64 data with '<Base64-Image-Removed>' placeholders.
However, deriveDiff was running before removeBase64Images, so the current
markdown still had raw base64 data during comparison. This caused change
tracking to always report 'changed' for pages with inline base64 images
when removeBase64Images was true (the default).

Moving removeBase64Images before deriveDiff ensures both the current and
stored markdown have the same base64 treatment before comparison.
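
The ordering problem can be reproduced in a few lines. This is a Python illustration with simplified stand-ins for the two transformers: the regex, the function names, and the `same` status label are assumptions, while the placeholder text and the `changed` status come from the commit message.

```python
import re

PLACEHOLDER = "<Base64-Image-Removed>"
# Simplified pattern for inline base64 images in markdown link targets.
_BASE64_IMAGE = re.compile(r"\(data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+\)")


def remove_base64_images(markdown: str) -> str:
    """Replace inline base64 image data with the placeholder."""
    return _BASE64_IMAGE.sub(f"({PLACEHOLDER})", markdown)


def derive_status(current: str, stored: str) -> str:
    """Report 'changed' when the two markdown strings differ."""
    return "same" if current == stored else "changed"


raw = "![logo](data:image/png;base64,iVBORw0KGgo=)\nHello"
stored = remove_base64_images(raw)  # the stored copy was already normalized

# Old order: diff first, so raw base64 is compared against the placeholder.
status_before_fix = derive_status(raw, stored)
# New order: normalize first, then diff.
status_after_fix = derive_status(remove_base64_images(raw), stored)
```

With the old order an unchanged page is reported as `changed`; normalizing both sides first gives `same`.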

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…rawl#3283)

Add the ignore_robots_txt parameter to CrawlRequest and CrawlParamsData
models, and add the corresponding field mapping (ignoreRobotsTxt) in the
crawl method. This parameter was already supported by the API but missing
from the Python SDK.
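
A rough sketch of the kind of field mapping involved, assuming the SDK converts snake_case keyword arguments into the camelCase keys the API expects. The helper names are hypothetical and this is not the SDK's actual code; only the `ignore_robots_txt` to `ignoreRobotsTxt` pair comes from the commit message.

```python
from typing import Any, Dict


def to_camel_case(name: str) -> str:
    """Convert a snake_case name such as 'ignore_robots_txt' to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_payload(**options: Any) -> Dict[str, Any]:
    """Map provided options to API keys, skipping unset (None) values."""
    return {to_camel_case(key): value
            for key, value in options.items()
            if value is not None}


payload = build_payload(ignore_robots_txt=True, limit=None)
```

Dropping `None` values keeps unset options out of the request body, so the API applies its own defaults.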

Bump version to 4.22.0.

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…l#3285)

ACUC is no longer the source of truth for credit balances now that
we are migrating to Autumn. Temporarily disable the APPROACHING_LIMIT
and LIMIT_REACHED email notifications until they are re-wired to use
Autumn balances.

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…wl#3288)

Auto-recharge is now handled entirely by Autumn. Comment out the legacy
auto-recharge settings fetch (DB query + Redis cache) and the trigger
block to avoid double-charging or firing at the wrong threshold.

- Hardcode isAutoRechargeEnabled=false in evaluateTeamCredits
- Comment out auto-recharge settings fetch from teams table
- Comment out auto-recharge trigger block
- Remove unused imports (autoCharge, supabase, redis, config, etc.)
- Update knip config to ignore orphaned auto-recharge files

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
…l#3286)

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: micahstairs <micah@sideguide.dev>
- Update lodash to ^4.18.0 in apps/api (GHSA-r5fr-rjxr-66jc, GHSA-f23m-r3pf-42rh)
- Override lodash >=4.18.0 in apps/test-suite (transitive via artillery)
- Override defu >=6.1.5 in apps/test-site (transitive via astro>unstorage>h3)
* track search results clickhouse

* env rename