microsoft · westey-m · Jun 18, 2026 · Jun 18, 2026
diff --git a/python/packages/core/AGENTS.md b/python/packages/core/AGENTS.md
@@ -120,7 +120,10 @@ agent_framework/
 
 - **`AgentLoopMiddleware`** - `AgentMiddleware` that re-runs an agent in a loop by calling `call_next()` repeatedly (the pipeline re-reads `context.messages` each time). One configurable class covers two patterns: a required user `should_continue` predicate (sync or async, the first positional/keyword arg), and a chat-client judge built via the `.with_judge(...)` factory (a second chat client decides whether the original request was answered; loops while it is *not*, using a `JudgeVerdict` structured-output response — internally just an async `should_continue` predicate). The constructor covers the predicate pattern directly; only the judge has a convenience classmethod factory (`.with_judge(judge_client, ...)`) that forwards to `__init__`. Supports both streaming and non-streaming runs. By default a non-streaming run returns an aggregated `AgentResponse` containing every iteration's messages plus the injected `next_message` "nudge" messages (as `user` messages); set `return_final_only=True` to return only the last iteration's response. Streaming runs always yield each iteration's updates and emit the injected nudge messages as `user` updates between iterations (the `return_final_only` flag has no effect on streaming, and the final response reflects the last iteration; `MiddlewareTermination` is handled cleanly). `should_continue` is required; other constructor args are optional: `max_iterations` (safety cap; defaults to `DEFAULT_MAX_ITERATIONS`=10, explicit `None`→unbounded, positive int caps; `.with_judge` uses `DEFAULT_JUDGE_MAX_ITERATIONS`=5 as its default), `next_message` (defaults to a short "continue" nudge), `return_final_only`, and `additional_instructions` (an extra `system` message injected ahead of the input before the agent runs — becomes part of the original messages so it survives `fresh_context` resets and persists via a session). 
The judge is configured only through `.with_judge` (`judge_client`/`instructions`/`criteria`), not the constructor, and its `reasoning` is fed back to the agent as the next iteration's input; the judge forwards the original request messages and the agent's latest response messages verbatim so multi-modal content is preserved. `criteria` (a `list[str]`) is both injected as the agent's `additional_instructions` and rendered into the judge instructions wherever the `{{criteria}}` placeholder (`CRITERIA_PLACEHOLDER`) appears (`DEFAULT_JUDGE_INSTRUCTIONS` ends with it; custom `instructions` may include it, and it is stripped when no criteria are given). The `should_continue`/`next_message` callables are invoked with keyword args (`iteration`, `last_result`, `messages`, `original_messages`, `session`, `agent`, `progress`, `feedback`) and may be sync or async; declare only what you need plus `**kwargs`. `should_continue` may return a plain `bool` or a `(bool, str | None)` tuple whose second item is feedback surfaced to `next_message`/`record_feedback` via the `feedback` kwarg (the judge uses this to relay its `reasoning`). Stop precedence per iteration is `max_iterations` → `should_continue`, evaluated before `record_feedback` so the feedback is available to it.
   - **Feedback tracking** - `record_feedback` captures a per-iteration progress entry (called with the loop kwargs; if it returns a truthy string the entry is appended, otherwise the agent's response text is used as the fallback entry). The accumulated log is exposed to every callback via the `progress` keyword (a per-iteration copy of prior entries) and, when `inject_progress=True` (default), injected into the next iteration's input as a `user` message (the full log without a session, only the latest entry with a session to avoid duplicating history). `fresh_context=True` restarts each iteration from the original task plus the progress log; when a session is attached it is snapshotted (`to_dict()`) before the loop and restored (`from_dict` + field copy) between iterations so the local transcript and any service-side conversation id reset too (in-loop working-state is discarded, pre-loop state preserved, continuity carried only by the progress log).
-- **`todos_remaining(provider)`** / **`background_tasks_running(provider)`** - Helper factories returning `should_continue` predicates that loop while a `TodoProvider` has open items, or while a `BackgroundAgentsProvider`'s persisted state shows running tasks.
+- **`todos_remaining(*, modes=None)`** / **`todos_remaining_message`** - Helpers for todo-driven loops (the Python counterpart of .NET's `TodoCompletionLoopEvaluator`), designed for `create_harness_agent` but usable with any agent that registers a `TodoProvider` via `context_providers`. Both resolve the `TodoProvider`/`AgentModeProvider` from the *running agent* (`agent.context_providers`, via `_resolve_context_provider`) rather than taking the provider as an argument, so they can be wired directly into `loop_should_continue`/`loop_next_message`. `todos_remaining` is a factory that returns a `should_continue` predicate which loops while any todo is open; pass `modes=[...]` to restrict looping to specific operating modes (case-insensitive; honors the `AgentModeProvider`'s `source_id`/`available_modes`). `modes=None` (default) applies in every mode, and an empty sequence raises `ValueError`. `todos_remaining_message` is a `next_message` callable that lists the still-open todo titles and tells the agent to finish them. It returns `None` when the session/agent/provider is unavailable or nothing is open, in which case the middleware's default `None` handling applies: the previous iteration's messages are reused verbatim under the default `fresh_context=False`, and `DEFAULT_NEXT_MESSAGE` is used only when `fresh_context=True`.
+- **`background_tasks_running(provider)`** - Helper factory returning a `should_continue` predicate that loops while a `BackgroundAgentsProvider`'s persisted state shows running tasks (takes the provider explicitly, unlike `todos_remaining`).
+  - **Approval escape hatch** - `_has_pending_approval_request(result)` checks whether an iteration's response carries a pending tool-approval request (any content with `type == "function_approval_request"`). Both the streaming and non-streaming loops stop and return that response to the caller *before* evaluating `should_continue`/`max_iterations` or injecting `next_message`, so the loop is HITL-safe even when wrapped outermost around a `ToolApprovalMiddleware` (mirrors the C# `LoopAgent`'s `HasPendingApprovalRequests`).
+  - **Harness integration** - `create_harness_agent` enables the loop when a `loop_should_continue` callable is passed; it prepends `AgentLoopMiddleware(loop_should_continue, max_iterations=loop_max_iterations, next_message=loop_next_message)` ahead of `ToolApprovalMiddleware` so the loop is the outermost middleware (each iteration is a full agent run including tool approval, and the escape hatch hands pending approvals back to the caller). `loop_next_message` and `loop_max_iterations` only take effect together with `loop_should_continue`; without it no loop is added and both are ignored. `loop_max_iterations` defaults to `DEFAULT_MAX_ITERATIONS` (10); pass `None` for an unbounded loop.
 
 ### Workflows (`_workflows/`)
 

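The loop rules described in the `AGENTS.md` hunk above (keyword-only callbacks, `bool` or `(bool, feedback)` return values, the `max_iterations` cap, and the approval escape hatch) can be sketched as a small simulation. This is an illustrative sketch only, not the `AgentLoopMiddleware` implementation: `FakeResult`, `has_pending_approval`, `run_loop`, and `not_done` are hypothetical stand-ins, and the exact cap semantics are an assumption.

```python
import asyncio
import inspect
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for an agent response."""

    text: str
    content_types: list = field(default_factory=list)


def has_pending_approval(result):
    # Mirrors the described check: any content typed "function_approval_request".
    return "function_approval_request" in result.content_types


async def run_loop(run_once, should_continue, max_iterations=10):
    """Re-run `run_once` while `should_continue` allows it (illustrative only)."""
    iteration = 0
    feedback = None
    while True:
        iteration += 1
        result = await run_once(iteration, feedback)
        # Approval escape hatch: checked before any stop condition.
        if has_pending_approval(result):
            return result, iteration
        # Stop precedence: the iteration cap first, then the predicate.
        if max_iterations is not None and iteration >= max_iterations:
            return result, iteration
        verdict = should_continue(iteration=iteration, last_result=result)
        if inspect.isawaitable(verdict):
            verdict = await verdict
        # The predicate may return a bool or a (bool, feedback) tuple.
        if isinstance(verdict, tuple):
            verdict, feedback = verdict
        else:
            feedback = None
        if not verdict:
            return result, iteration


def not_done(*, last_result, **kwargs):
    # Declare only the keyword args you need; absorb the rest with **kwargs.
    done = "DONE" in last_result.text
    return (not done, None if done else "Reply DONE when finished.")


async def demo():
    async def agent(iteration, feedback):
        return FakeResult("DONE" if iteration == 3 else "working")

    async def needs_approval(iteration, feedback):
        return FakeResult("", ["function_approval_request"])

    _, finished_after = await run_loop(agent, not_done)
    _, paused_after = await run_loop(needs_approval, not_done)
    return finished_after, paused_after


iterations_done, iterations_paused = asyncio.run(demo())
```

In this sketch the first run stops once the predicate returns `False`, while the second returns on its first iteration because the pending approval is detected before the predicate is consulted.
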
diff --git a/python/packages/core/agent_framework/__init__.py b/python/packages/core/agent_framework/__init__.py
@@ -107,6 +107,7 @@
     JudgeVerdict,
     background_tasks_running,
     todos_remaining,
+    todos_remaining_message,
 )
 from ._harness._memory import (
     DEFAULT_MEMORY_SOURCE_ID,
@@ -598,6 +599,7 @@
     "set_agent_mode",
     "step",
     "todos_remaining",
+    "todos_remaining_message",
     "tool",
     "tool_call_args_match",
     "tool_called_check",

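To make the documented behavior of the newly exported `todos_remaining_message` and its companion `todos_remaining(*, modes=None)` concrete, here is a minimal sketch under stated assumptions. It does not use the real `TodoProvider` or `AgentModeProvider`; `FakeTodo`, `FakeAgentState`, and the `*_sketch` functions are hypothetical, and returning `False` when no agent is available is an assumption rather than confirmed behavior.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from dataclasses import dataclass, field


@dataclass
class FakeTodo:
    title: str
    done: bool = False


@dataclass
class FakeAgentState:
    """Hypothetical stand-in for what the real providers would expose."""

    todos: list[FakeTodo] = field(default_factory=list)
    mode: str = "execute"


def todos_remaining_sketch(*, modes: Sequence[str] | None = None) -> Callable[..., bool]:
    # An empty sequence is rejected; None means "apply in every mode".
    if modes is not None and len(modes) == 0:
        raise ValueError("modes must be None or a non-empty sequence")
    allowed = None if modes is None else {m.lower() for m in modes}

    def should_continue(*, agent: FakeAgentState | None = None, **kwargs) -> bool:
        if agent is None:
            return False  # assumption: nothing to resolve, so do not loop
        # Mode gating is case-insensitive.
        if allowed is not None and agent.mode.lower() not in allowed:
            return False
        return any(not todo.done for todo in agent.todos)

    return should_continue


def todos_remaining_message_sketch(*, agent: FakeAgentState | None = None, **kwargs) -> str | None:
    # Returning None defers to the loop's default next-message handling.
    if agent is None:
        return None
    open_titles = [todo.title for todo in agent.todos if not todo.done]
    if not open_titles:
        return None
    return "Finish the remaining todos: " + ", ".join(open_titles)


state = FakeAgentState(
    todos=[FakeTodo("write tests"), FakeTodo("update docs", done=True)],
    mode="Execute",
)
keep_going = todos_remaining_sketch(modes=["execute"])
nudge = todos_remaining_message_sketch(agent=state)
```

Note the asymmetry the sketch preserves: the predicate comes from a factory call (`todos_remaining_sketch(...)`), whereas the message helper is itself the callable.
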
diff --git a/python/packages/core/agent_framework/_harness/_agent.py b/python/packages/core/agent_framework/_harness/_agent.py
@@ -21,6 +21,7 @@
 from .._sessions import ContextProvider, HistoryProvider, InMemoryHistoryProvider
 from .._skills import SkillsProvider
 from ._background_agents import BackgroundAgentsProvider
+from ._loop import DEFAULT_MAX_ITERATIONS, AgentLoopMiddleware
 from ._memory import MemoryContextProvider, MemoryStore
 from ._mode import AgentModeProvider
 from ._todo import TodoProvider
@@ -35,6 +36,7 @@
     from .._compaction import CompactionStrategy, TokenizerProtocol
     from .._middleware import MiddlewareTypes
     from .._tools import ToolTypes
+    from ._loop import NextMessageCallable, ShouldContinueCallable
     from ._tool_approval import ToolApprovalRuleCallback
 
 logger = logging.getLogger(__name__)
@@ -254,6 +256,9 @@ def create_harness_agent(
     disable_web_search: bool = False,
     disable_tool_auto_approval: bool = False,
     auto_approval_rules: Sequence[ToolApprovalRuleCallback] | None = None,
+    loop_should_continue: ShouldContinueCallable | None = None,
+    loop_next_message: NextMessageCallable | None = None,
+    loop_max_iterations: int | None = DEFAULT_MAX_ITERATIONS,
     otel_provider_name: str | None = None,
     context_providers: Sequence[ContextProvider] | None = None,
     middleware: Sequence[MiddlewareTypes] | None = None,
@@ -273,6 +278,7 @@ def create_harness_agent(
     - **BackgroundAgentsProvider** — delegate work to background sub-agents
     - **Tool approval** — "don't ask again" standing approval rules plus heuristic
       auto-approval callbacks
+    - **Looping** — re-run the agent for as long as a ``loop_should_continue`` predicate returns ``True``
     - **OpenTelemetry** — observability via ``AgentTelemetryLayer``
 
     Each feature can be disabled or customized via keyword arguments.
@@ -380,6 +386,19 @@ def create_harness_agent(
             content and returns ``True`` to approve it. Rules are evaluated after standing rules
             (derived from prior user approvals) but before prompting the user. Only used when
             ``disable_tool_auto_approval`` is False.
+        loop_should_continue: Optional predicate that enables the looping middleware. When provided, the
+            agent is re-run in a loop (via :class:`~agent_framework.AgentLoopMiddleware`, wired as
+            the outermost middleware so each iteration is a full agent run including tool approval)
+            for as long as the predicate returns ``True``, up to ``loop_max_iterations``. If an
+            iteration returns a pending tool-approval request, the loop stops and returns it so the
+            caller can approve before continuing. When None (default), no loop is added.
+        loop_next_message: Optional callable controlling the input for the next loop iteration.
+            Only takes effect when ``loop_should_continue`` is set (otherwise no loop is added and
+            this is ignored).
+        loop_max_iterations: Safety cap on the number of loop iterations. ``None`` means unbounded;
+            a positive integer caps the loop (defaults to the loop middleware's default cap). Only
+            takes effect when ``loop_should_continue`` is set (otherwise no loop is added and this
+            is ignored).
         otel_provider_name: Custom OpenTelemetry provider/source name for telemetry.
         context_providers: Additional context providers to include after the built-in ones.
         middleware: Additional middleware to include.
@@ -475,9 +494,21 @@ def create_harness_agent(
     # placed first so it sits outermost: it intercepts inbound "always approve" responses and
     # outbound approval requests at the caller boundary, and its re-invocation loop re-runs any
     # user-supplied middleware. ToolApprovalMiddleware requires an AgentSession at run time.
+    # When loop_should_continue is supplied, the loop is prepended ahead of tool approval so it sits
+    # outermost of all: each loop iteration is a full agent run (including tool approval), and the
+    # loop's approval escape hatch returns any pending approval request to the caller.
     assembled_middleware: list[MiddlewareTypes] = []
     if not disable_tool_auto_approval:
         assembled_middleware.append(ToolApprovalMiddleware(auto_approval_rules=auto_approval_rules))
+    if loop_should_continue is not None:
+        assembled_middleware.insert(
+            0,
+            AgentLoopMiddleware(
+                loop_should_continue,
+                max_iterations=loop_max_iterations,
+                next_message=loop_next_message,
+            ),
+        )
     if middleware:
         assembled_middleware.extend(middleware)
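
The ordering produced by the assembly logic above can be checked in isolation. The following is a sketch that mirrors the hunk with hypothetical stub classes (`LoopStub`, `ApprovalStub`) in place of `AgentLoopMiddleware` and `ToolApprovalMiddleware`; `assemble_middleware` is an illustrative helper, not a function in the package, and the default of `10` is taken from the documented `DEFAULT_MAX_ITERATIONS`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class LoopStub:
    """Hypothetical stand-in for AgentLoopMiddleware."""

    should_continue: Callable[..., Any]
    max_iterations: int | None
    next_message: Callable[..., Any] | None


@dataclass
class ApprovalStub:
    """Hypothetical stand-in for ToolApprovalMiddleware."""

    auto_approval_rules: Sequence[Any] | None = None


def assemble_middleware(
    *,
    disable_tool_auto_approval: bool = False,
    auto_approval_rules: Sequence[Any] | None = None,
    loop_should_continue: Callable[..., Any] | None = None,
    loop_next_message: Callable[..., Any] | None = None,
    loop_max_iterations: int | None = 10,
    middleware: Sequence[Any] | None = None,
) -> list[Any]:
    assembled: list[Any] = []
    if not disable_tool_auto_approval:
        assembled.append(ApprovalStub(auto_approval_rules))
    if loop_should_continue is not None:
        # Index 0 makes the loop outermost, ahead of tool approval.
        assembled.insert(
            0, LoopStub(loop_should_continue, loop_max_iterations, loop_next_message)
        )
    if middleware:
        assembled.extend(middleware)
    return assembled


with_loop = assemble_middleware(
    loop_should_continue=lambda **kwargs: False,
    middleware=["user_middleware"],
)
without_loop = assemble_middleware(loop_max_iterations=3, middleware=["user_middleware"])
order_with_loop = [type(item).__name__ for item in with_loop]
order_without_loop = [type(item).__name__ for item in without_loop]
```

The second call shows that `loop_max_iterations` alone adds no loop: without `loop_should_continue`, only the approval stub and the user-supplied middleware are assembled.
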