Add OpenLineage parent info injection to GlueJobOperator #64513

Open
rahul-madaan wants to merge 2 commits into apache:main from rahul-madaan:rahul-madaan-glue-inject-parent
@rahul-madaan
Contributor


Add two new parameters to GlueJobOperator that automatically inject
OpenLineage lineage-linking properties into the Glue job's Spark --conf
argument at run time:

  • openlineage_inject_parent_job_info — injects parentRunFacet and
    rootParentRun Spark properties so the Glue Spark job's OpenLineage
    events carry a parent-run link back to the triggering Airflow task.
  • openlineage_inject_transport_info — injects
    spark.openlineage.transport.* Spark properties so the Glue job sends
    OpenLineage events to the same backend as Airflow.

Both parameters default to the existing
openlineage.spark_inject_parent_job_info /
openlineage.spark_inject_transport_info config values (already used by
the Databricks operators), so a setting made once at the instance level
applies to every Glue task without per-operator changes.
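
The fallback described above can be sketched roughly as follows. This is an illustration only: `INSTANCE_CONFIG` and `resolve_flag` are hypothetical names standing in for Airflow's configuration lookup, and the PR may express the same default directly in the operator's signature instead.

```python
from __future__ import annotations

# Hypothetical stand-in for the instance-wide [openlineage] settings.
# The values here are placeholders, not defaults taken from the PR.
INSTANCE_CONFIG = {
    "spark_inject_parent_job_info": True,
    "spark_inject_transport_info": False,
}


def resolve_flag(operator_value: bool | None, config_key: str) -> bool:
    """Prefer an explicit operator-level value; otherwise use the instance default."""
    if operator_value is not None:
        return operator_value
    # Unknown keys fall back to False so that injection stays opt-in.
    return INSTANCE_CONFIG.get(config_key, False)


# No per-operator value given: the instance-wide setting applies.
parent_enabled = resolve_flag(None, "spark_inject_parent_job_info")
# An explicit per-operator value overrides the instance-wide setting.
parent_disabled = resolve_flag(False, "spark_inject_parent_job_info")
```

With this shape, a single task can opt out (or in) without touching the instance configuration.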

The injection logic lives in two new helpers in
airflow.providers.openlineage.utils.spark:
inject_parent_job_information_into_glue_arguments and
inject_transport_information_into_glue_arguments. They mirror the
existing inject_*_into_spark_properties functions used by the Spark/
Databricks operators but produce a Glue-compatible --conf string: a
single argument value in which each additional Spark property is
introduced by a literal " --conf " separator, which is how AWS Glue's
Spark launcher parses them. Both helpers are idempotent: if the
relevant spark.openlineage.* keys are already present in --conf, the
injection is skipped.
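
A minimal sketch of the chaining and the idempotency check, for readers unfamiliar with the Glue convention. It is not the code from this PR: `merge_glue_conf` and its `guard_prefix` parameter are hypothetical, and the property values are placeholders.

```python
from __future__ import annotations


def merge_glue_conf(
    script_args: dict[str, str],
    properties: dict[str, str],
    guard_prefix: str,
) -> dict[str, str]:
    """Return a copy of script_args with properties appended to its "--conf" value.

    Hypothetical helper: if any key starting with guard_prefix already appears
    in the existing "--conf" value, the arguments are returned unchanged.
    """
    existing = script_args.get("--conf", "")
    if guard_prefix in existing:
        # The user (or an earlier call) already set these keys, so skip injection.
        return dict(script_args)
    pairs = [f"{key}={value}" for key, value in properties.items()]
    parts = ([existing] if existing else []) + pairs
    # Glue takes one "--conf" argument; extra properties are chained inside
    # its value with a literal " --conf " separator.
    return {**script_args, "--conf": " --conf ".join(parts)}


args = {"--conf": "spark.executor.memory=4g"}
parent_props = {
    "spark.openlineage.parentJobName": "my_dag.my_task",  # placeholder value
    "spark.openlineage.parentRunId": "placeholder-run-id",  # placeholder value
}
once = merge_glue_conf(args, parent_props, "spark.openlineage.parent")
twice = merge_glue_conf(once, parent_props, "spark.openlineage.parent")
```

Calling the helper a second time leaves the arguments unchanged, which is the property that makes the injection safe to apply on retries or when the user has already supplied their own spark.openlineage.* settings.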

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Generated-by: Claude Sonnet 4.6 following the guidelines

…ed via airfow

Signed-off-by: Rahul Madan <madan.rahul9@gmail.com>
@mobuchowski
Contributor

Signed-off-by: Rahul Madan <madan.rahul9@gmail.com>
2 participants