Add code mode (Monty sandbox) to common.ai AgentOperator by kaxil · Pull Request #68407 · apache/airflow

kaxil · 2026-06-11T20:42:13Z

Summary

Adds an opt-in code_mode=True flag to AgentOperator (and @task.agent) that wraps the agent's tools in a single run_code tool executed in pydantic-ai's Monty sandbox, via the new [code-mode] extra (pydantic-ai-harness[codemode]). For multi-tool workflows the model writes one Python snippet that calls the tools as functions -- with loops, conditionals, and asyncio.gather -- instead of one model round-trip per tool call, cutting round-trips and token use.
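To make the round-trip saving concrete, here is a minimal sketch of the kind of glue code a model might emit as a single run_code call. It runs under plain asyncio, not Monty. The tool names (list_top_customers, total_spend) and their data are hypothetical stand-ins, not tools shipped by the provider.

```python
import asyncio


# Hypothetical stand-ins for tools registered on the agent. In code mode the
# real tools would still execute in the worker; these just return canned data.
async def list_top_customers(limit: int) -> list[str]:
    return ["acme", "globex", "initech"][:limit]


async def total_spend(customer: str) -> float:
    spend = {"acme": 1200.0, "globex": 830.5, "initech": 412.25}
    return spend[customer]


# The shape of snippet a model could write once, instead of issuing one
# tool call per model round-trip.
async def generated_snippet() -> dict[str, float]:
    customers = await list_top_customers(3)
    # Fan out the per-customer lookups concurrently.
    totals = await asyncio.gather(*(total_spend(c) for c in customers))
    return dict(zip(customers, totals))


result = asyncio.run(generated_snippet())
print(result)
```

In this sketch four tool invocations (one listing, three lookups) are covered by one generated snippet, which is the pattern the summary describes.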

Design rationale

Why an extra, not a base dependency. pydantic-monty is pre-1.0 and fast-moving (a Rust/native wheel). Pinning it as a hard dependency of every common.ai install would let its churn or any platform-wheel gap break the whole provider for users who never touch code mode. It is gated behind the code-mode extra and raises AirflowOptionalProviderFeatureException when used without it -- the same pattern the provider already uses for mcp, sql, and skills.
Why a bool flag rather than passing the capability through agent_params. Capability instances aren't round-trip-safe through DAG serialization (see the existing "Capabilities" docs note). code_mode is a plain boolean; the CodeMode capability is constructed at execution time in _build_agent, never stored on the serialized operator.
The tool boundary is unchanged. CodeMode collapses the tools you registered into one run_code tool; the generated code runs deny-by-default (no filesystem, network, or env access) and can only call those tools, which still execute in the worker. Code mode changes how the model invokes tools, not what it can reach.
Toolset return schemas. HookToolset and SQLToolset now set return_schema on their tool definitions so code mode renders -> str instead of -> Any. Both always return serialized strings (_serialize_for_llm / json.dumps), so {"type": "string"} is accurate. The kwarg is applied through a small version-guarded helper because ToolDefinition.return_schema is newer than the provider's pydantic-ai floor.
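The version-guarded helper mentioned for return_schema could look roughly like the sketch below. This is an illustration, not the PR's code: the two dataclasses are stand-ins for a newer and an older ToolDefinition, the helper name return_schema_kwargs is hypothetical, and field introspection is only one possible way to detect support.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class NewToolDefinition:
    """Stand-in for a ToolDefinition version that has return_schema."""

    name: str
    description: str = ""
    return_schema: Optional[dict[str, Any]] = None


@dataclass
class OldToolDefinition:
    """Stand-in for an older ToolDefinition without return_schema."""

    name: str
    description: str = ""


def return_schema_kwargs(tool_def_cls: type, schema: dict[str, Any]) -> dict[str, Any]:
    """Return the return_schema kwarg only if the class declares that field."""
    supported = {f.name for f in fields(tool_def_cls)}
    return {"return_schema": schema} if "return_schema" in supported else {}


STRING_SCHEMA = {"type": "string"}

new_def = NewToolDefinition(
    name="query", **return_schema_kwargs(NewToolDefinition, STRING_SCHEMA)
)
old_def = OldToolDefinition(
    name="query", **return_schema_kwargs(OldToolDefinition, STRING_SCHEMA)
)
```

Building the kwarg conditionally lets the same toolset code construct definitions on both sides of the version floor without raising a TypeError on the older one.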

Usage

AgentOperator(
    task_id="analyst",
    prompt="For the top 3 customers by order count, what was each one's total spend?",
    llm_conn_id="pydanticai_default",
    system_prompt="You are a SQL analyst. Write Python that calls the tools to answer.",
    toolsets=[SQLToolset(db_conn_id="postgres_default", allowed_tables=["customers", "orders"])],
    code_mode=True,  # pip install "apache-airflow-providers-common-ai[code-mode]"
)

Gotchas / limitations

Incompatible with durable=True: durable replay caches individual model/tool steps via a shared step counter that assumes a stable call order across runs, which code mode's free-form generated Python does not guarantee. The combination is rejected at construction (mirroring the existing durable + enable_hitl_review guard).
Monty supports a subset of Python and no third-party imports; it sandboxes the glue code between tool calls and is not a general-purpose code runtime.
Draft: opened for early review. The real CodeMode round-trip is exercised via a local breeze spike (the harness isn't in CI), and the unit tests cover the provider-owned wiring (build-or-raise, capability injection) with the harness mocked.
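The construction-time guard and the build-or-raise wiring described above can be sketched as follows. Everything here is a simplified stand-in: AgentOperatorSketch is not the real operator, OptionalFeatureError stands in for AirflowOptionalProviderFeatureException, the ValueError type and the harness module name are assumptions, and the returned string is a placeholder for the real CodeMode capability.

```python
import importlib.util


class OptionalFeatureError(Exception):
    """Stand-in for AirflowOptionalProviderFeatureException."""


class AgentOperatorSketch:
    """Simplified stand-in for the operator; only the code-mode wiring is shown."""

    def __init__(self, *, code_mode: bool = False, durable: bool = False) -> None:
        # Reject the unsupported combination up front, at construction time.
        # The exception type is an assumption; the PR does not name it.
        if code_mode and durable:
            raise ValueError("code_mode=True cannot be combined with durable=True")
        # Only the plain boolean is stored, so the operator stays serializable.
        self.code_mode = code_mode
        self.durable = durable

    def build_capabilities(
        self, harness_module: str = "hypothetical_codemode_harness"
    ) -> list[str]:
        """Build capabilities at execution time, or raise if the extra is missing."""
        if not self.code_mode:
            return []
        # harness_module is a hypothetical top-level module name used to
        # simulate whether the optional extra is installed.
        if importlib.util.find_spec(harness_module) is None:
            raise OptionalFeatureError(
                "code_mode=True requires the optional code-mode extra"
            )
        return ["CodeMode"]  # placeholder for the real capability instance


op = AgentOperatorSketch(code_mode=True)
caps = op.build_capabilities(harness_module="json")  # stdlib module simulates "installed"
```

A unit test in this style can mock the harness lookup in both directions, which matches the "harness mocked" testing approach the draft note describes.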

feat: Add code mode (Monty sandbox) to common.ai AgentOperator

9428340

boring-cyborg Bot added area:providers kind:documentation provider:common-ai labels Jun 11, 2026

docs: Clarify when to use common.ai code mode

3560e19

kaxil changed the title ~~feat: Add code mode (Monty sandbox) to common.ai AgentOperator~~ Add code mode (Monty sandbox) to common.ai AgentOperator Jun 11, 2026

kaxil wants to merge 2 commits into apache:main from astronomer:feat-common-ai-code-mode
