Two-token mechanism for task execution to prevent token expiration while tasks wait in executor queues#60108
Conversation
As per my understanding this was removed in #55506 in favor of a middleware that refreshes the token. Are you running an instance with the Execution API separate from the api-server? Could this middleware approach be extended for Task SDK calls too?

Hi @tirkarthi, I took a stab at extending that pattern in #60197, handling expired tokens transparently in JWTBearer + middleware so no client-side changes are needed. Would love your thoughts on it. Totally happy to go with whichever approach the team feels is better!

Would love to hear @ashb's or @amoghrajesh's opinion on this one.
ashb
left a comment
We can't do this approach. It lets any Execution API token be resurrected, which fundamentally breaks lots of security assumptions -- it amounts to having tokens not expire. That is bad.
Instead, what we should do is generate a new token (i.e. one with an extra/different set of JWT claims) that is only valid for the /run endpoint and valid for longer (say 24 hours; make it configurable), and this is what gets sent in the workload.
The /run endpoint would then set a header to give the running task a "short-lived" token (basically the one we have right now) that is usable on the rest of the Execution API. This approach is safer as the existing controls in the /run endpoint already prevent a task being run more than once, which should also prevent against "resurrecting" an expired token and using it to access things like connections etc. And we should validate that the token used on all endpoints but /run is explicitly lacking this new claim.
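The two claim sets ashb describes could look roughly like this. A hedged sketch only: the `exp`/`iat` claims and the extra scope claim come from the discussion; the function names, the `sub` claim, and the default lifetimes are illustrative, not the actual implementation.

```python
import time

# Illustrative sketch of the two token payloads: a long-lived workload token
# restricted to /run, and a short-lived execution token for everything else.
# Claim names beyond exp/iat/scope and the helper names are hypothetical.
def workload_claims(ti_id: str, valid_for: int = 86400) -> dict:
    """Long-lived token that travels with the queued workload; only valid on /run."""
    now = int(time.time())
    return {"sub": ti_id, "iat": now, "exp": now + valid_for, "scope": "workload"}

def execution_claims(ti_id: str, valid_for: int = 600) -> dict:
    """Short-lived token issued by /run for the rest of the Execution API."""
    now = int(time.time())
    return {"sub": ti_id, "iat": now, "exp": now + valid_for}
```

The key point is that the workload token is distinguished by an explicit extra claim, so every endpoint except /run can reject it by checking for that claim's absence.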
ashb
left a comment
Much better approach, and on the right track, thanks.
Some changes though:
- "queue" is not the right thing to use, as these tokens could be used for executing other workloads soon (for instance, we have already talked about wanting Dag-level callbacks to be executed on the workers, not in the dag processor, which would be done by having a new type from the ExecuteTaskWorkload). So maybe we have `"scope": "ExecuteTaskWorkload"`?
- A little bit of refactoring is needed before we are ready to merge this.
amoghrajesh
left a comment
Mostly looks good now, just a few basic questions / feedback; otherwise I am good.
```python
        "exp": 9999999999,
        "iat": 1000000000,
    }
    lifespan.registry.register_value(JWTValidator, validator)
```
This JWTValidator registration is dead code -- the client fixture's mock_jwt_bearer overrides _jwt_bearer via FastAPI dependency overrides, so FastAPI never calls the real _jwt_bearer (which would use JWTValidator from the registry). Every request through client gets scope: "execution" regardless of what's registered here.
The test passes because execution-scoped tokens are allowed on /run, not because workload-scoped tokens are. To actually test workload token acceptance, the test needs to either:
- Remove the `_jwt_bearer` dependency override for this test and let the real auth flow use this `JWTValidator`, or
- Override `mock_jwt_bearer` to return `TIToken(..., claims={..., "scope": "workload"})` instead of the conftest's hardcoded `"scope": "execution"`.
```python
    kid: str = attrs.field(default=attrs.Factory(_generate_kid, takes_self=True))
    valid_for: float
    workload_valid_for: float = attrs.field(
```
The workload_valid_for default reads from config via _conf_factory, and _jwt_generator() in app.py also reads the same config key and passes it explicitly. The explicit kwarg takes precedence, so the default factory never runs in production. Having two code paths that reference the same config key is easy to get out of sync -- consider dropping the attrs default (make it required like valid_for) and always passing it explicitly, or drop the explicit kwarg in _jwt_generator() and let the default handle it.
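The first option could be sketched like this. Illustrated with a stdlib dataclass for brevity rather than the actual attrs class; the field names follow the quoted snippet, the class name and values are hypothetical.

```python
from dataclasses import dataclass

# Sketch of the first option: make workload_valid_for required like valid_for,
# so only the caller (_jwt_generator in app.py) reads the config key and there
# is a single code path referencing it. Plain dataclass used for brevity; the
# real class uses attrs.
@dataclass
class JWTGeneratorSketch:
    valid_for: float
    workload_valid_for: float  # required -- no default factory to drift out of sync

# The caller always passes both values explicitly:
gen = JWTGeneratorSketch(valid_for=600.0, workload_valid_for=86400.0)
```

Forgetting to pass `workload_valid_for` then fails loudly at construction time instead of silently falling back to a second config read.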
Summary
Tasks waiting in executor queues (Celery, Kubernetes) can have their JWT tokens expire before execution starts, causing auth failures on the Execution API. This is a real problem in production: when queues back up or workers are slow to pick up tasks, the original short-lived token expires and the worker gets a 403 when it finally tries to start the task.
Fixes: #53713
Related: #59553
Closes: #62129
Approach
Two-token mechanism: a long-lived workload token (24h default, configurable) travels with the task through the queue, and a short-lived execution token is issued when the task actually starts running.
The workload token carries a scope: "workload" claim and is restricted to the /run endpoint only, enforced via FastAPI SecurityScopes and a custom ExecutionAPIRoute. When /run succeeds, it returns an execution token in the X-Execution-Token response header. The SDK client picks it up and uses it for all subsequent API calls. The existing JWTReissueMiddleware handles refreshing execution tokens near expiry and skips workload tokens.
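The client-side swap described above can be sketched as follows. Hedged: the X-Execution-Token header name comes from this description, but the holder class and method names are illustrative, not the actual Task SDK client.

```python
# Illustrative sketch of the client-side token swap: start with the long-lived
# workload token from the queued workload, then switch to the short-lived
# execution token returned by a successful /run call.
class TokenHolder:
    def __init__(self, workload_token: str):
        # The workload token carried through the executor queue.
        self.token = workload_token

    def on_run_response(self, headers: dict) -> None:
        # Prefer the execution token for all subsequent Execution API calls.
        execution_token = headers.get("X-Execution-Token")
        if execution_token:
            self.token = execution_token

holder = TokenHolder("workload-jwt")
holder.on_run_response({"X-Execution-Token": "execution-jwt"})
```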
For dag.test() / InProcessExecutionAPI, auth is bypassed and a stub JWTGenerator with a random secret is used so no signing key configuration is needed.
New config: execution_api.jwt_workload_token_expiration_time (default 86400s)
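If the default needs adjusting, the option can be set like any other config key. A hedged example: the section and key names come from this description; the value is illustrative.

```ini
[execution_api]
# Shorten the workload token lifetime to 12 hours (default is 86400 = 24h)
jwt_workload_token_expiration_time = 43200
```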
Built on @ashb's SecurityScopes foundation.
Security considerations
Even if a workload token is intercepted, it can only call /run, which already guards against running a task more than once (returns 409 if the task isn't in QUEUED/RESTARTING state). All other endpoints reject workload tokens; they require execution scope. The execution token issued by /run is short-lived and automatically refreshed, keeping the existing security posture for all API calls during task execution.
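The scope rule above boils down to a simple check. A minimal sketch, assuming the claim values from this PR; the function is illustrative, while the real enforcement goes through FastAPI SecurityScopes and the custom ExecutionAPIRoute.

```python
# Minimal sketch of the enforcement rule: workload-scoped tokens are accepted
# only on /run; every other endpoint requires execution scope.
def is_authorized(claims: dict, endpoint: str) -> bool:
    scope = claims.get("scope", "execution")
    if endpoint == "/run":
        # /run accepts both the existing execution-scoped tokens and the new
        # long-lived workload-scoped tokens.
        return scope in {"workload", "execution"}
    return scope == "execution"
```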
Testing
Tested end-to-end with CeleryExecutor in Breeze, triggered a DAG, confirmed tasks completed successfully with the token swap happening transparently. Unit tests cover token generation, scope enforcement (accepted on /run, rejected elsewhere), invalid scope handling, execution token header in response, SDK client token swap and priority, and registry teardown to prevent test pollution.