Add OpenLineage parent info injection to GlueJobOperator #64513
Open
rahul-madaan wants to merge 2 commits into apache:main from
…ed via airfow Signed-off-by: Rahul Madan <madan.rahul9@gmail.com>
@rahul-madaan there are some CI issues: https://github.com/apache/airflow/actions/runs/23773037756/job/69268638122?pr=64513
Signed-off-by: Rahul Madan <madan.rahul9@gmail.com>
Add two new parameters to `GlueJobOperator` that automatically inject OpenLineage lineage-linking properties into the Glue job's Spark `--conf` argument at run time:

- `openlineage_inject_parent_job_info`: injects `parentRunFacet` and `rootParentRun` Spark properties so the Glue Spark job's OpenLineage events carry a parent-run link back to the triggering Airflow task.
- `openlineage_inject_transport_info`: injects `spark.openlineage.transport.*` Spark properties so the Glue job sends OpenLineage events to the same backend as Airflow.

Both parameters default to the existing `openlineage.spark_inject_parent_job_info` / `openlineage.spark_inject_transport_info` config values (already used by the Databricks operators), so instance-level defaults work without per-operator changes.
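
As a rough illustration of what injecting Spark properties into a Glue job's `--conf` argument could look like, here is a minimal pure-Python sketch. It is not the code from this PR: the function name `inject_into_glue_conf`, its signature, the prefix-based skip check, and all sample values are assumptions made for illustration. It assumes the Glue script arguments are a dict whose `"--conf"` value holds several Spark properties chained with ` --conf `.

```python
def inject_into_glue_conf(script_args, properties, skip_prefix):
    """Return a copy of script_args with properties appended to "--conf".

    Hypothetical helper for illustration only; the PR's real helpers may
    differ in name, signature, and behavior.
    """
    existing = script_args.get("--conf", "")

    # Nothing to add, or matching keys already present: return an unchanged copy.
    # The substring check is a simplification of an "already injected" test.
    if not properties or skip_prefix in existing:
        return dict(script_args)

    # Glue takes a single "--conf" argument, so further properties are
    # chained inside its value using " --conf " as the separator.
    chained = " --conf ".join(f"{key}={value}" for key, value in properties.items())
    combined = f"{existing} --conf {chained}" if existing else chained
    return {**script_args, "--conf": combined}


# Sample values below are made up for the example.
parent_props = {
    "spark.openlineage.parentJobNamespace": "default",
    "spark.openlineage.parentJobName": "my_dag.run_glue",
    "spark.openlineage.parentRunId": "01941f29-7c00-7087-8906-40e512c257bd",
}
args = {"--conf": "spark.sql.shuffle.partitions=8"}

result = inject_into_glue_conf(args, parent_props, "spark.openlineage.parent")
print(result["--conf"])

# A second call finds the parent keys already present and changes nothing.
again = inject_into_glue_conf(result, parent_props, "spark.openlineage.parent")
print(again == result)
```

The sketch returns a new dict rather than mutating its input, so a caller can compare the arguments before and after injection.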

The injection logic lives in two new helpers in `airflow.providers.openlineage.utils.spark`: `inject_parent_job_information_into_glue_arguments` and `inject_transport_information_into_glue_arguments`. They mirror the existing `inject_*_into_spark_properties` functions used by the Spark/Databricks operators but produce a Glue-compatible `--conf` string: multiple Spark properties chained with ` --conf ` as separator, which is how AWS Glue's Spark launcher parses them. Both helpers are idempotent: if the relevant `spark.openlineage.*` keys are already present in `--conf`, the injection is skipped.

Was generative AI tooling used to co-author this PR?
Generated-by: Claude Sonnet 4.6 following the guidelines