
Add code mode (Monty sandbox) to common.ai AgentOperator #68407

Draft
kaxil wants to merge 2 commits into apache:main from astronomer:feat-common-ai-code-mode

Conversation


@kaxil kaxil commented Jun 11, 2026

Member

Summary

Adds an opt-in code_mode=True flag to AgentOperator (and @task.agent) that wraps the agent's tools in a single run_code tool executed in pydantic-ai's Monty sandbox, via the new [code-mode] extra (pydantic-ai-harness[codemode]). For multi-tool workflows the model writes one Python snippet that calls the tools as functions -- with loops, conditionals, and asyncio.gather -- instead of one model round-trip per tool call, cutting round-trips and token use.
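
To make the round-trip saving concrete, here is a minimal sketch of the kind of snippet a model might write in code mode. The tool functions `top_customers` and `total_spend` and the `ORDERS` data are hypothetical stand-ins defined locally for illustration; they are not tools shipped by the provider, and the real sandbox would expose the agent's registered tools instead.

```python
import asyncio

# Hypothetical in-memory data and tools, used only so the sketch runs on its own.
ORDERS = {"acme": [120.0, 80.0, 40.0], "globex": [300.0, 25.0], "initech": [10.0]}


async def top_customers(limit: int) -> list[str]:
    """Return customer names ordered by order count, highest first."""
    ranked = sorted(ORDERS, key=lambda name: len(ORDERS[name]), reverse=True)
    return ranked[:limit]


async def total_spend(customer: str) -> float:
    """Return the summed order value for one customer."""
    return sum(ORDERS[customer])


async def generated_snippet() -> dict[str, float]:
    # One tool call to find the customers, then a concurrent fan-out over
    # the per-customer tool; all of it happens inside a single run_code call.
    customers = await top_customers(2)
    totals = await asyncio.gather(*(total_spend(c) for c in customers))
    return dict(zip(customers, totals))


result = asyncio.run(generated_snippet())
print(result)
```

Without code mode, the same workflow would typically need a model round-trip for the first tool call and at least one more for the follow-up calls, with each intermediate result passing back through the model's context.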

Design rationale

  • Why an extra, not a base dependency. pydantic-monty is pre-1.0 and fast-moving (a Rust/native wheel). Pinning it as a hard dependency of every common.ai install would let its churn or any platform-wheel gap break the whole provider for users who never touch code mode. It is gated behind the code-mode extra and raises AirflowOptionalProviderFeatureException when used without it -- the same pattern the provider already uses for mcp, sql, and skills.
  • Why a bool flag rather than passing the capability through agent_params. Capability instances aren't round-trip-safe through DAG serialization (see the existing "Capabilities" docs note). code_mode is a plain boolean; the CodeMode capability is constructed at execution time in _build_agent, never stored on the serialized operator.
  • The tool boundary is unchanged. CodeMode collapses the tools you registered into one run_code tool; the generated code runs deny-by-default (no filesystem, network, or env access) and can only call those tools, which still execute in the worker. Code mode changes how the model invokes tools, not what it can reach.
  • Toolset return schemas. HookToolset and SQLToolset now set return_schema on their tool definitions so code mode renders -> str instead of -> Any. Both always return serialized strings (_serialize_for_llm / json.dumps), so {"type": "string"} is accurate. The kwarg is applied through a small version-guarded helper because ToolDefinition.return_schema is newer than the provider's pydantic-ai floor.
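
The version-guarded helper mentioned in the last bullet could look roughly like the sketch below. This is an illustration under assumptions, not the PR's actual code: `OldToolDefinition` and `NewToolDefinition` are hypothetical dataclass stand-ins for two versions of pydantic-ai's `ToolDefinition`, and `with_return_schema` is a made-up helper name. The idea is to pass `return_schema` only when the installed class accepts it.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


# Hypothetical stand-ins for an older and a newer tool-definition class.
@dataclass
class OldToolDefinition:
    name: str
    description: str


@dataclass
class NewToolDefinition:
    name: str
    description: str
    return_schema: Optional[dict[str, Any]] = None


STRING_SCHEMA = {"type": "string"}


def with_return_schema(cls: type, **kwargs: Any) -> Any:
    """Build cls(**kwargs), adding return_schema only if cls declares that field."""
    supported = {f.name for f in fields(cls)}
    if "return_schema" in supported:
        # Copy the schema so instances never share one mutable dict.
        kwargs["return_schema"] = dict(STRING_SCHEMA)
    return cls(**kwargs)


old = with_return_schema(OldToolDefinition, name="query", description="Run SQL")
new = with_return_schema(NewToolDefinition, name="query", description="Run SQL")
print(old)
print(new)
```

Checking the declared fields rather than comparing version strings keeps the guard tied to the feature itself, so it keeps working if the field is backported or the version scheme changes.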

Usage

AgentOperator(
    task_id="analyst",
    prompt="For the top 3 customers by order count, what was each one's total spend?",
    llm_conn_id="pydanticai_default",
    system_prompt="You are a SQL analyst. Write Python that calls the tools to answer.",
    toolsets=[SQLToolset(db_conn_id="postgres_default", allowed_tables=["customers", "orders"])],
    code_mode=True,  # pip install "apache-airflow-providers-common-ai[code-mode]"
)
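
The comment on `code_mode=True` points at the extra. As a rough sketch of the build-or-raise behavior described under "Design rationale", the function below constructs nothing when the flag is off and raises when the optional dependency cannot be imported. `OptionalFeatureError` is a hypothetical stand-in for `AirflowOptionalProviderFeatureException`, `build_capabilities` is an illustrative name, and the module to import is passed in as a parameter because the harness's real import path is not stated here.

```python
import importlib
from typing import Any


class OptionalFeatureError(Exception):
    """Hypothetical stand-in for AirflowOptionalProviderFeatureException."""


def build_capabilities(code_mode: bool, module_name: str) -> list[Any]:
    """Return capabilities to attach at execution time, or raise if the extra is missing."""
    if not code_mode:
        return []  # flag off: nothing is imported, nothing is attached
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        raise OptionalFeatureError(
            f"code_mode=True requires the [code-mode] extra ({module_name!r} is not importable)"
        ) from err
    # Real code would construct the CodeMode capability here; the module
    # object is returned only as a placeholder for it.
    return [module]


# "json" stands in for an installed dependency; the other name is deliberately missing.
present = build_capabilities(True, "json")
try:
    build_capabilities(True, "module_that_is_not_installed")
    raised = False
except OptionalFeatureError:
    raised = True
print(len(present), raised)
```

Because only the boolean lives on the operator and the import happens inside the build step, the serialized operator stays free of capability instances.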

Gotchas / limitations

  • Incompatible with durable=True: durable replay caches individual model/tool steps via a shared step counter that assumes a stable call order across runs, which code mode's free-form generated Python does not guarantee. The combination is rejected at construction (mirroring the existing durable + enable_hitl_review guard).
  • Monty supports only a subset of Python and no third-party imports. It sandboxes the glue code between tool calls; it is not a general-purpose code runtime.
  • Draft: opened for early review. The real CodeMode round-trip is exercised via a local breeze spike (the harness isn't in CI), and the unit tests cover the provider-owned wiring (build-or-raise, capability injection) with the harness mocked.
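
The construction-time rejection of `durable=True` with `code_mode=True` can be pictured with the sketch below. `AgentConfig` is a hypothetical stand-in for the operator's constructor arguments, and `ValueError` is an assumed exception type; the PR text does not say which exception the operator raises.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    """Hypothetical stand-in for the relevant operator arguments."""

    durable: bool = False
    code_mode: bool = False

    def __post_init__(self) -> None:
        # Fail at construction so the conflict surfaces when the object is
        # built rather than midway through a task run.
        if self.durable and self.code_mode:
            raise ValueError(
                "code_mode=True cannot be combined with durable=True: "
                "durable replay assumes a stable call order across runs"
            )


ok_durable = AgentConfig(durable=True)
ok_code_mode = AgentConfig(code_mode=True)
try:
    AgentConfig(durable=True, code_mode=True)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```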

@kaxil kaxil changed the title feat: Add code mode (Monty sandbox) to common.ai AgentOperator Add code mode (Monty sandbox) to common.ai AgentOperator Jun 11, 2026