Add transient-error retry to SalesforceBulkOperator #64575

Open
nagasrisai wants to merge 19 commits into apache:main from nagasrisai:feat/salesforce-bulk-transient-retry

nagasrisai (Contributor) commented Apr 1, 2026

Follow-up to #64519, as suggested by eladkal, adding retry support for transient Salesforce Bulk errors.

The issue is that Salesforce Bulk API errors come back at the record level, not as exceptions. So when Salesforce is under concurrent write load and returns UNABLE_TO_LOCK_ROW for a few records, the operator just passes those through as success=False entries with no way to recover automatically. Your only option today is to pull the XCom, filter it yourself, build a retry payload, and call the operator again from the DAG.
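To illustrate the manual workaround described above, here is a minimal sketch of the filtering step a DAG author would write today. The result shape (a `success` flag plus an `errors` list whose entries carry a `statusCode`) is an assumption about what the Bulk API client returns, and `build_retry_payload` is a hypothetical helper, not part of the provider.

```python
from __future__ import annotations

from typing import Any


def build_retry_payload(
    payload: list[dict[str, Any]],
    results: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return the input records whose result failed with UNABLE_TO_LOCK_ROW.

    Assumes results[i] corresponds to payload[i] (hypothetical result shape).
    """
    retry: list[dict[str, Any]] = []
    for record, result in zip(payload, results):
        if result.get("success"):
            continue
        codes = {err.get("statusCode") for err in result.get("errors") or []}
        if "UNABLE_TO_LOCK_ROW" in codes:
            retry.append(record)
    return retry


# Illustrative data only; the record fields and messages are made up.
payload = [
    {"Id": "001A", "Name": "Acme"},
    {"Id": "001B", "Name": "Globex"},
    {"Id": "001C", "Name": "Initech"},
]
results = [
    {"success": True, "errors": []},
    {"success": False, "errors": [{"statusCode": "UNABLE_TO_LOCK_ROW", "message": "row locked"}]},
    {"success": False, "errors": [{"statusCode": "INVALID_FIELD", "message": "no such column"}]},
]

retry_payload = build_retry_payload(payload, results)
```

In this sketch only the second record ends up in `retry_payload`; the DAG would then need a second operator call to submit it, which is the boilerplate this PR aims to remove.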

I added three optional parameters to handle this inside the operator: max_retries controls how many retry passes to attempt (defaults to 0, so nothing changes for existing users), retry_delay is how long to wait between attempts, and transient_error_codes is the set of Salesforce status codes that qualify for retry, defaulting to UNABLE_TO_LOCK_ROW and API_TEMPORARILY_UNAVAILABLE.
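As a rough illustration of how transient_error_codes could decide which records qualify, the sketch below classifies a single record result against a configurable set of codes. The helper name `is_transient_failure`, the constant name, and the result shape are assumptions for illustration; the PR's actual code may differ.

```python
from __future__ import annotations

from collections.abc import Collection
from typing import Any

# Defaults taken from the PR description; the constant name is hypothetical.
DEFAULT_TRANSIENT_ERROR_CODES = frozenset(
    {"UNABLE_TO_LOCK_ROW", "API_TEMPORARILY_UNAVAILABLE"}
)


def is_transient_failure(
    result: dict[str, Any],
    transient_error_codes: Collection[str] = DEFAULT_TRANSIENT_ERROR_CODES,
) -> bool:
    """True if the record failed and any of its errors has a transient code."""
    if result.get("success"):
        return False
    return any(
        err.get("statusCode") in transient_error_codes
        for err in result.get("errors") or []
    )


locked = {"success": False, "errors": [{"statusCode": "UNABLE_TO_LOCK_ROW"}]}
invalid = {"success": False, "errors": [{"statusCode": "INVALID_FIELD"}]}
ok = {"success": True, "errors": []}
```

With the defaults, `locked` qualifies and `invalid` does not; passing a custom set such as `{"INVALID_FIELD"}` flips that, which is the behavior the transient_error_codes parameter is meant to expose.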

On each retry pass, only the failing records are re-submitted, not the whole payload. The retried results slot back into their original positions, since Salesforce guarantees the response order matches the input order. Permanent errors like INVALID_FIELD are not in the default set, so they are not retried by default.
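The retry pass described above can be sketched roughly as follows. This is not the PR's implementation: `retry_transient_failures`, the `submit` callable, and the injectable `sleep` are hypothetical names, and the result shape is assumed. The point is the index bookkeeping that lets retried results replace the originals in place.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Collection
from typing import Any

Record = dict[str, Any]
Result = dict[str, Any]


def _is_transient(result: Result, codes: Collection[str]) -> bool:
    if result.get("success"):
        return False
    return any(e.get("statusCode") in codes for e in result.get("errors") or [])


def retry_transient_failures(
    payload: list[Record],
    results: list[Result],
    submit: Callable[[list[Record]], list[Result]],
    max_retries: int,
    retry_delay: float,
    transient_error_codes: Collection[str],
    sleep: Callable[[float], None] = time.sleep,
) -> list[Result]:
    """Re-submit only transiently failing records; keep original positions."""
    merged = list(results)
    for _ in range(max_retries):
        # Original indexes of records that still fail with a transient code.
        failing = [i for i, r in enumerate(merged) if _is_transient(r, transient_error_codes)]
        if not failing:
            break
        sleep(retry_delay)
        retry_results = submit([payload[i] for i in failing])
        # Relies on the response order matching the submitted order.
        for i, new_result in zip(failing, retry_results):
            merged[i] = new_result
    return merged


OK = {"success": True, "errors": []}
LOCKED = {"success": False, "errors": [{"statusCode": "UNABLE_TO_LOCK_ROW"}]}
INVALID = {"success": False, "errors": [{"statusCode": "INVALID_FIELD"}]}

payload = [{"Id": "a"}, {"Id": "b"}, {"Id": "c"}, {"Id": "d"}]
first_pass = [OK, LOCKED, INVALID, LOCKED]
calls: list[list[Record]] = []
waits: list[float] = []


def fake_submit(records: list[Record]) -> list[Result]:
    """Stand-in for the Bulk call: "d" stays locked on the first retry only."""
    calls.append(list(records))
    if len(calls) == 1:
        return [OK, LOCKED]
    return [OK for _ in records]


final = retry_transient_failures(
    payload, first_pass, fake_submit,
    max_retries=3, retry_delay=5.0,
    transient_error_codes={"UNABLE_TO_LOCK_ROW", "API_TEMPORARILY_UNAVAILABLE"},
    sleep=waits.append,  # record delays instead of actually sleeping
)
```

In this run the first retry submits records "b" and "d", the second submits only "d", and the third pass is skipped because nothing transient remains. The INVALID_FIELD result for "c" is never re-submitted and stays at index 2.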

Closes #64519

Introduces max_retries, retry_delay, and transient_error_codes params.
When max_retries > 0, records that fail with a transient Salesforce error
(UNABLE_TO_LOCK_ROW or API_TEMPORARILY_UNAVAILABLE by default) are
re-submitted after retry_delay seconds, up to max_retries times.
Only the failed records are re-submitted, not the entire payload.

Related to apache#64519

potiuk (Member) commented Apr 1, 2026

@nagasrisai This PR has been converted to draft because it does not yet meet our Pull Request quality criteria.

Issues found:

  • Pre-commit / static checks: Failing: CI image checks / Static checks. Run prek run --from-ref main locally to find and fix issues. See Pre-commit / static checks docs.
  • mypy (type checking): Failing: CI image checks / MyPy checks (mypy-providers). Run prek --stage manual mypy-providers --all-files locally to reproduce. You need breeze ci-image build --python 3.10 for Docker-based mypy. See mypy (type checking) docs.
  • Provider tests: Failing: provider distributions tests / Compat 2.11.1:P3.10:, provider distributions tests / Compat 3.0.6:P3.10:, provider distributions tests / Compat 3.1.8:P3.10:, Non-DB tests: providers / Non-DB-prov::3.10:amazon...google, Low dep tests: providers / All-prov:LowestDeps:14:3.10:amazon...salesforce. Run provider tests with breeze run pytest <provider-test-path> -xvs. See Provider tests docs.

What to do next:

  • The comment informs you what you need to do.
  • Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
  • There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates.
  • Maintainers will then proceed with a normal review.

Converting a PR to draft is not a rejection; it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack.

Copilot AI (Contributor) left a comment

Pull request overview

Adds built-in, record-level retry handling to SalesforceBulkOperator for transient Salesforce Bulk API failures (e.g., lock/contention errors) so DAG authors don’t need to manually re-submit failed records from XCom.

Changes:

  • Introduces max_retries, retry_delay, and transient_error_codes parameters to control transient-error retries.
  • Refactors bulk submission into _run_operation() and adds _retry_transient_failures() to re-submit only transient-failing records while preserving result ordering.
  • Adds unit tests covering retry/no-retry behavior, delay handling, and custom transient error codes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| providers/salesforce/src/airflow/providers/salesforce/operators/bulk.py | Adds retry configuration, refactors operation execution, and implements transient record-level retries. |
| providers/salesforce/tests/unit/salesforce/operators/test_bulk_retry.py | Adds unit tests validating transient retry behavior and result placement. |

nagasrisai marked this pull request as ready for review on April 2, 2026, 11:48