diff --git a/crates/core/src/api/llm.rs b/crates/core/src/api/llm.rs
index 58039ddd..827001b7 100644
--- a/crates/core/src/api/llm.rs
+++ b/crates/core/src/api/llm.rs
@@ -459,9 +459,10 @@ fn emit_llm_end_without_output(handle: &LlmHandle, metadata: Option) -> Re
 
 /// Execute an LLM call through the managed middleware pipeline.
 ///
-/// This runs conditional-execution guardrails, request intercepts,
-/// sanitize-request guardrails, execution intercepts, the provider callback,
-/// and sanitize-response guardrails in the runtime-defined order.
+/// This runs conditional-execution guardrails, request intercepts, and
+/// sanitize-request guardrails, emits the LLM-start event, then runs execution
+/// intercepts, the provider callback when it is not replaced, and
+/// sanitize-response guardrails in the runtime-defined order.
 ///
 /// # Parameters
 /// - `name`: Logical provider or model family name recorded on emitted events.
@@ -488,6 +489,10 @@ fn emit_llm_end_without_output(handle: &LlmHandle, metadata: Option) -> Re
 /// execution intercepts, codecs, or the callback itself.
 ///
 /// # Notes
+/// The LLM-start event is emitted before execution intercepts run. When
+/// execution fails after that point, the runtime still emits an LLM-end event
+/// without an output payload.
+///
 /// Response codecs enrich observability output only and do not change the
 /// value returned to the caller.
 pub async fn llm_call_execute(params: LlmCallExecuteParams) -> Result {
@@ -587,9 +592,9 @@ pub async fn llm_call_execute(params: LlmCallExecuteParams) -> Result {
 
 /// Execute a streaming LLM call through the managed middleware pipeline.
 ///
-/// This runs the same pre-execution middleware as [`llm_call_execute`] and
-/// then wraps the provider stream so chunk callbacks and finalization can emit
-/// a single LLM-end event when streaming completes.
+/// This runs the same pre-execution middleware as [`llm_call_execute`], emits
+/// the LLM-start event, and then wraps the provider stream so chunk callbacks
+/// and finalization can emit a single LLM-end event when streaming completes.
 ///
 /// # Parameters
 /// - `name`: Logical provider or model family name recorded on emitted events.
@@ -617,6 +622,8 @@ pub async fn llm_call_execute(params: LlmCallExecuteParams) -> Result {
 /// execution intercepts, stream callbacks, codecs, or the provider callback.
 ///
 /// # Notes
+/// The LLM-start event is emitted before stream execution intercepts run.
+///
 /// The returned stream emits chunk-level results while the runtime defers the
 /// LLM-end event until the collector and finalizer complete.
 pub async fn llm_stream_call_execute(params: LlmStreamCallExecuteParams) -> Result {
diff --git a/docs/about/concepts/middleware.md b/docs/about/concepts/middleware.md
index 99debd53..741b1b27 100644
--- a/docs/about/concepts/middleware.md
+++ b/docs/about/concepts/middleware.md
@@ -108,7 +108,8 @@ arguments passed to the callback or the real value returned to the caller.
 
 ## Managed Execution Order
 
-For managed execution, NeMo Flow applies middleware in this order:
+For managed execution, NeMo Flow applies middleware and emits lifecycle events
+in this order:
 
 ```{mermaid}
 sequenceDiagram
@@ -130,7 +131,7 @@ sequenceDiagram
  else allowed
  Runtime->>Req: rewrite the real request
  Runtime->>San: sanitize emitted start payload
- Runtime->>Subs: emit start event
+ Runtime->>Subs: emit start event before execution
  Runtime->>Exec: wrap execution
  Exec->>Callback: invoke callback
  Callback-->>Exec: return real result
@@ -143,17 +144,17 @@ sequenceDiagram
 
 1. Conditional-execution guardrails
 2. Request intercepts
-3. Sanitize-request guardrails for emitted start events
+3. Sanitize-request guardrails and emit the start event
 4. Execution intercepts
-5. The real callback
-6. Sanitize-response guardrails for emitted end events
+5. The real callback, unless an execution intercept replaces it
+6. Sanitize-response guardrails and emit the end event
 
-For streaming LLM flows, **stream execution intercepts** sit inside the
-execution path between items 4 and 6. `sanitize-request` guardrails still apply
-at item 3 to the emitted start payload, execution intercepts still wrap the
-call boundary at item 4, and stream execution intercepts then run on emitted
-streaming start/chunk/end activity before `sanitize-response` guardrails rewrite
-the emitted response-side payloads at item 6.
+For streaming LLM flows, the same pre-execution order applies: the runtime
+applies `sanitize-request` guardrails and emits the LLM start event before the
+stream execution intercept chain runs. Stream execution intercepts are the
+execution family for streaming provider callbacks. The runtime then collects
+chunks and finalizes the stream before `sanitize-response` guardrails rewrite
+the emitted end-event payload at item 6.
 
 This ordering is what makes the semantic split between intercepts and guardrails
 important:
@@ -180,8 +181,9 @@ flowchart TB
  subgraph Invocation
  direction TB
  HasExecutionIntercept{{Has Valid Execution Intercept}}
- ExecutionIntercepts[/Execution Intercept/]
+ ExecutionIntercepts[/Execution Intercepts/]
  DefaultCallable[Default Callable]
+ InterceptResult[Execution Result]
  end
 
  subgraph Streaming
@@ -194,6 +196,8 @@ flowchart TB
  direction TB
  SanitizeRequestGuardrails[/Sanitize Request Guardrail/]
  SanitizeResponseGuardrails[/Sanitize Response Guardrail/]
+ StartEvent[Emit Start Event]
+ EndEvent[Emit End Event]
  EventSubscribers[["Event Subscribers"]]
  end
  end
@@ -201,34 +205,39 @@ flowchart TB
  Response([Response])
 
  Request --> ConditionalExecutionGuardrails
- RequestIntercepts -->|Transformed Request| SanitizeRequestGuardrails & Invocation
+ RequestIntercepts -->|Transformed Request| SanitizeRequestGuardrails
  ConditionalExecutionGuardrails -->|"(rejected)"| EventSubscribers
  ConditionalExecutionGuardrails -->|"(rejected)"| RaiseException
  ConditionalExecutionGuardrails -->|"(passed)"| RequestIntercepts
+ SanitizeRequestGuardrails -->|Sanitized Start Payload| StartEvent
+ StartEvent --> EventSubscribers
+ StartEvent -->|Before Execution Intercepts| HasExecutionIntercept
+ RequestIntercepts -.->|Real Request| HasExecutionIntercept
 
  HasExecutionIntercept -->|No| DefaultCallable
  HasExecutionIntercept -->|Yes| ExecutionIntercepts
- ExecutionIntercepts -.->|chain=yes| HasExecutionIntercept
- ExecutionIntercepts -.->|chain=no| DefaultCallable
+ ExecutionIntercepts -.->|calls next| HasExecutionIntercept
+ ExecutionIntercepts -->|returns or replaces| InterceptResult
+ DefaultCallable -->|returns| InterceptResult
 
- Invocation -->|Response| SanitizeResponseGuardrails
- Invocation -->|Response| Response
+ InterceptResult -->|Response| SanitizeResponseGuardrails
+ InterceptResult -->|Response| Response
 
- Invocation -.->|stream chunks| Collector
+ InterceptResult -.->|stream chunks| Collector
  Collector -..->|stream chunks| Response
- Invocation -.->|"(stream ends)"| Finalizer
+ InterceptResult -.->|"(stream ends)"| Finalizer
  Finalizer -.->|Aggregated Response| SanitizeResponseGuardrails
  Finalizer o--o|shared state| Collector
 
- SanitizeRequestGuardrails -->|Sanitized Request| EventSubscribers
- SanitizeResponseGuardrails -->|Sanitized Response| EventSubscribers
+ SanitizeResponseGuardrails -->|Sanitized End Payload| EndEvent
+ EndEvent --> EventSubscribers
 
  class Execution,Invocation,Streaming,Observability,Request,Response grey-lightest;
- class EventSubscribers teal-lightest;
+ class EventSubscribers,StartEvent,EndEvent teal-lightest;
  class RequestIntercepts,HasExecutionIntercept,ExecutionIntercepts yellow-lightest;
  class ConditionalExecutionGuardrails,SanitizeRequestGuardrails,SanitizeResponseGuardrails green-lightest;
  class RaiseException red-lightest;
- class DefaultCallable,Collector,Finalizer magenta-lightest;
+ class DefaultCallable,InterceptResult,Collector,Finalizer magenta-lightest;
 ```
 
 ## Choosing the Right Surface
diff --git a/docs/instrument-applications/advanced-guide.md b/docs/instrument-applications/advanced-guide.md
index a68cd111..a2d447b9 100644
--- a/docs/instrument-applications/advanced-guide.md
+++ b/docs/instrument-applications/advanced-guide.md
@@ -207,13 +207,14 @@ Run one allowed request and one rejected request:
 
 ## Debug Middleware Order
 
-Middleware runs by ascending priority inside each middleware family. Families run in this order for managed tool calls:
+Middleware runs by ascending priority inside each middleware family. Families
+and lifecycle emission run in this order for managed tool calls:
 
 1. Conditional-execution guardrails.
 2. Request intercepts.
-3. Sanitize-request guardrails for emitted start events.
-4. Execution intercepts and the real callback.
-5. Sanitize-response guardrails for emitted end events.
+3. Sanitize-request guardrails and start-event emission.
+4. Execution intercepts and the real callback or replacement.
+5. Sanitize-response guardrails and end-event emission.
 
 If a later middleware does not run, check whether an earlier conditional-execution guardrail rejected the call or a request intercept raised an error.
 
diff --git a/docs/instrument-applications/instrument-llm-call.md b/docs/instrument-applications/instrument-llm-call.md
index 5129c69f..f3c4fc18 100644
--- a/docs/instrument-applications/instrument-llm-call.md
+++ b/docs/instrument-applications/instrument-llm-call.md
@@ -218,7 +218,8 @@ Check both behavior and instrumentation:
 - The provider result matches what the application returned before the wrapper was added.
 - The subscriber prints an agent or request scope event.
 - The subscriber prints LLM start and LLM end events for `demo-provider`.
-- LLM start input contains the request after request intercepts.
+- LLM start input contains the request after request intercepts and
+ sanitize-request guardrails.
 - LLM end output contains the provider response after response guardrails.
 - The LLM event includes the normalized `model_name` when you provide one.
 
diff --git a/docs/instrument-applications/instrument-tool-call.md b/docs/instrument-applications/instrument-tool-call.md
index 76d286aa..dc8b0f3d 100644
--- a/docs/instrument-applications/instrument-tool-call.md
+++ b/docs/instrument-applications/instrument-tool-call.md
@@ -198,7 +198,8 @@ Check both behavior and instrumentation:
 - The tool result matches what the application returned before the wrapper was added.
 - The subscriber prints an agent or request scope event.
 - The subscriber prints tool start and tool end events for `search`.
-- Tool start input contains the request arguments after request intercepts.
+- Tool start input contains the request arguments after request intercepts and
+ sanitize-request guardrails.
 - Tool end output contains the tool result after response guardrails.
 
 If only the business result appears, the callback ran but instrumentation did not run. Confirm that the call goes through `tools.execute`, `toolCallExecute`, or `tool_call_execute`.
diff --git a/docs/resources/support-and-faqs.md b/docs/resources/support-and-faqs.md
index 7c535cb2..158d29c8 100644
--- a/docs/resources/support-and-faqs.md
+++ b/docs/resources/support-and-faqs.md
@@ -252,10 +252,10 @@ sanitization without moving that logic into every call site.
 For managed execution, the pipeline runs:
 - Conditional guardrails
 - Request intercepts
-- Request sanitization for emitted start events
+- Request sanitization and start-event emission
 - Execution intercepts
-- The original callback
-- Response sanitization for emitted end events
+- The original callback, unless an execution intercept replaces it
+- Response sanitization and end-event emission
 
 ### What Is The Difference Between Guardrails And Intercepts?
 
@@ -282,8 +282,10 @@ and [Middleware Registration Families](../instrument-applications/advanced-guide
 ### How Does Middleware Ordering Work?
 
 Managed execution applies conditional guardrails first, then request intercepts,
-then request sanitization for emitted start events, then execution intercepts
-and the real callback, then response sanitization for emitted end events.
+then request sanitization and start-event emission, then execution intercepts
+and the real callback, then response sanitization and end-event emission.
+The start event is emitted before execution intercepts run, so subscribers see
+a lifecycle start even when an execution intercept replaces the callback.
 
 Registries are priority ordered. When scope-local behavior is present, NeMo Flow
 combines applicable global and ancestor scope-local entries into the execution