19 changes: 13 additions & 6 deletions crates/core/src/api/llm.rs
@@ -459,9 +459,10 @@ fn emit_llm_end_without_output(handle: &LlmHandle, metadata: Option<Json>) -> Re

/// Execute an LLM call through the managed middleware pipeline.
///
-/// This runs conditional-execution guardrails, request intercepts,
-/// sanitize-request guardrails, execution intercepts, the provider callback,
-/// and sanitize-response guardrails in the runtime-defined order.
+/// This runs conditional-execution guardrails, request intercepts, and
+/// sanitize-request guardrails, emits the LLM-start event, then runs execution
+/// intercepts, the provider callback when it is not replaced, and
+/// sanitize-response guardrails in the runtime-defined order.
///
/// # Parameters
/// - `name`: Logical provider or model family name recorded on emitted events.
@@ -488,6 +489,10 @@ fn emit_llm_end_without_output(handle: &LlmHandle, metadata: Option<Json>) -> Re
/// execution intercepts, codecs, or the callback itself.
///
/// # Notes
+/// The LLM-start event is emitted before execution intercepts run. When
+/// execution fails after that point, the runtime still emits an LLM-end event
+/// without an output payload.
+///
/// Response codecs enrich observability output only and do not change the
/// value returned to the caller.
pub async fn llm_call_execute(params: LlmCallExecuteParams) -> Result<Json> {
@@ -587,9 +592,9 @@ pub async fn llm_call_execute(params: LlmCallExecuteParams) -> Result<Json> {

/// Execute a streaming LLM call through the managed middleware pipeline.
///
-/// This runs the same pre-execution middleware as [`llm_call_execute`] and
-/// then wraps the provider stream so chunk callbacks and finalization can emit
-/// a single LLM-end event when streaming completes.
+/// This runs the same pre-execution middleware as [`llm_call_execute`], emits
+/// the LLM-start event, and then wraps the provider stream so chunk callbacks
+/// and finalization can emit a single LLM-end event when streaming completes.
///
/// # Parameters
/// - `name`: Logical provider or model family name recorded on emitted events.
@@ -617,6 +622,8 @@ pub async fn llm_call_execute(params: LlmCallExecuteParams) -> Result<Json> {
/// execution intercepts, stream callbacks, codecs, or the provider callback.
///
/// # Notes
+/// The LLM-start event is emitted before stream execution intercepts run.
+///
/// The returned stream emits chunk-level results while the runtime defers the
/// LLM-end event until the collector and finalizer complete.
pub async fn llm_stream_call_execute(params: LlmStreamCallExecuteParams) -> Result<LlmJsonStream> {
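
The lifecycle guarantee stated in the Notes for `llm_call_execute` can be sketched in plain, synchronous Rust. This is a minimal model under stated assumptions, not the crate's implementation: `Event`, `run_managed_call`, and the closure-based `execute` argument are hypothetical names introduced only for illustration.

```rust
/// Hypothetical lifecycle events recorded by this sketch (not the crate's real types).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    LlmStart { input: String },
    LlmEnd { output: Option<String> },
}

/// Models the documented ordering: the start event is recorded before the
/// execution stage runs, and an end event is recorded even when that stage
/// fails, in which case it carries no output payload.
pub fn run_managed_call<F>(
    input: &str,
    execute: F,
    events: &mut Vec<Event>,
) -> Result<String, String>
where
    F: FnOnce(&str) -> Result<String, String>,
{
    // Start is emitted first, so subscribers always see it.
    events.push(Event::LlmStart { input: input.to_string() });

    // Execution intercepts and the provider callback are collapsed into `execute`.
    let result = execute(input);

    // On failure the end event is still emitted, with `output: None`.
    let output = result.as_ref().ok().cloned();
    events.push(Event::LlmEnd { output });
    result
}
```

Calling `run_managed_call` with a closure that returns `Err(..)` leaves two events in the vector: a start event and an end event whose `output` is `None`. That is the behavior the added Notes paragraph describes.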
55 changes: 32 additions & 23 deletions docs/about/concepts/middleware.md
@@ -108,7 +108,8 @@ arguments passed to the callback or the real value returned to the caller.

## Managed Execution Order

-For managed execution, NeMo Flow applies middleware in this order:
+For managed execution, NeMo Flow applies middleware and emits lifecycle events
+in this order:

```{mermaid}
sequenceDiagram
@@ -130,7 +131,7 @@ sequenceDiagram
else allowed
Runtime->>Req: rewrite the real request
Runtime->>San: sanitize emitted start payload
-Runtime->>Subs: emit start event
+Runtime->>Subs: emit start event before execution
Runtime->>Exec: wrap execution
Exec->>Callback: invoke callback
Callback-->>Exec: return real result
@@ -143,17 +144,17 @@ sequenceDiagram

1. Conditional-execution guardrails
2. Request intercepts
-3. Sanitize-request guardrails for emitted start events
+3. Sanitize-request guardrails and emit the start event
4. Execution intercepts
-5. The real callback
-6. Sanitize-response guardrails for emitted end events
+5. The real callback, unless an execution intercept replaces it
+6. Sanitize-response guardrails and emit the end event
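
The six stages listed above can be traced with a small Rust sketch. It assumes a single execution intercept that either passes through or replaces the callback. `Intercept`, `managed_order`, and the stage labels are hypothetical names for this illustration, and a rejected call is simplified to returning `None`.

```rust
/// Hypothetical outcome of an execution intercept in this sketch.
pub enum Intercept {
    /// Let the real callback run.
    PassThrough,
    /// Skip the callback and return this value instead.
    Replace(String),
}

/// Walks the six documented stages in order and records each one in `trace`.
/// Returns `None` when the conditional-execution guardrail rejects the call.
pub fn managed_order(
    allowed: bool,
    intercept: Intercept,
    trace: &mut Vec<&'static str>,
) -> Option<String> {
    trace.push("conditional-execution guardrails");
    if !allowed {
        return None; // nothing after the guardrail runs in this sketch
    }
    trace.push("request intercepts");
    trace.push("sanitize-request guardrails + start event");
    trace.push("execution intercepts");
    let result = match intercept {
        Intercept::PassThrough => {
            trace.push("real callback");
            "callback result".to_string()
        }
        // A replacing intercept means the callback stage never appears.
        Intercept::Replace(value) => value,
    };
    trace.push("sanitize-response guardrails + end event");
    Some(result)
}
```

With `Intercept::Replace`, the trace still contains the start-event stage but not `"real callback"`, which mirrors item 5 of the updated list.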

-For streaming LLM flows, **stream execution intercepts** sit inside the
-execution path between items 4 and 6. `sanitize-request` guardrails still apply
-at item 3 to the emitted start payload, execution intercepts still wrap the
-call boundary at item 4, and stream execution intercepts then run on emitted
-streaming start/chunk/end activity before `sanitize-response` guardrails rewrite
-the emitted response-side payloads at item 6.
+For streaming LLM flows, the same pre-execution order applies: the runtime
+applies `sanitize-request` guardrails and emits the LLM start event before the
+stream execution intercept chain runs. Stream execution intercepts are the
+execution family for streaming provider callbacks. The runtime then collects
+chunks and finalizes the stream before `sanitize-response` guardrails rewrite
+the emitted end-event payload at item 6.
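
The streaming behavior described above (start first, chunks delivered to the caller, one end event after finalization) can be modeled as follows. This sketch substitutes a plain iterator for the async stream and consumes it eagerly. `StreamEvent` and `run_stream` are hypothetical names, and the collector and finalizer are reduced to a string buffer.

```rust
/// Hypothetical events for the streaming sketch (not the crate's real types).
#[derive(Debug, PartialEq)]
pub enum StreamEvent {
    Start,
    End { aggregated: String },
}

/// Forwards every chunk to the caller unchanged, collects the chunks on the
/// side, and records exactly one end event after the last chunk is seen.
pub fn run_stream<I>(chunks: I, events: &mut Vec<StreamEvent>) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    // The start event is recorded before any chunk is consumed.
    events.push(StreamEvent::Start);

    let mut collected = String::new(); // collector state
    let mut delivered = Vec::new(); // what the caller receives
    for chunk in chunks {
        collected.push_str(&chunk);
        delivered.push(chunk);
    }

    // Finalizer: aggregate once, after the stream ends.
    events.push(StreamEvent::End { aggregated: collected });
    delivered
}
```

The caller receives the chunks as produced, while subscribers see a single end event carrying the aggregated text.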

This ordering is what makes the semantic split between intercepts and
guardrails important:
@@ -180,8 +181,9 @@ flowchart TB
subgraph Invocation
direction TB
HasExecutionIntercept{{Has Valid Execution Intercept}}
-ExecutionIntercepts[/Execution Intercept/]
+ExecutionIntercepts[/Execution Intercepts/]
DefaultCallable[Default Callable]
+InterceptResult[Execution Result]
end

subgraph Streaming
@@ -194,41 +196,48 @@ flowchart TB
direction TB
SanitizeRequestGuardrails[/Sanitize Request Guardrail/]
SanitizeResponseGuardrails[/Sanitize Response Guardrail/]
+StartEvent[Emit Start Event]
+EndEvent[Emit End Event]
EventSubscribers[["Event Subscribers"]]
end
end

Response([Response])

Request --> ConditionalExecutionGuardrails
-RequestIntercepts -->|Transformed Request| SanitizeRequestGuardrails & Invocation
+RequestIntercepts -->|Transformed Request| SanitizeRequestGuardrails
ConditionalExecutionGuardrails -->|"(rejected)"| EventSubscribers
ConditionalExecutionGuardrails -->|"(rejected)"| RaiseException
ConditionalExecutionGuardrails -->|"(passed)"| RequestIntercepts
+SanitizeRequestGuardrails -->|Sanitized Start Payload| StartEvent
+StartEvent --> EventSubscribers
+StartEvent -->|Before Execution Intercepts| HasExecutionIntercept
+RequestIntercepts -.->|Real Request| HasExecutionIntercept

HasExecutionIntercept -->|No| DefaultCallable
HasExecutionIntercept -->|Yes| ExecutionIntercepts
-ExecutionIntercepts -.->|chain=yes| HasExecutionIntercept
-ExecutionIntercepts -.->|chain=no| DefaultCallable
+ExecutionIntercepts -.->|calls next| HasExecutionIntercept
+ExecutionIntercepts -->|returns or replaces| InterceptResult
+DefaultCallable -->|returns| InterceptResult

-Invocation -->|Response| SanitizeResponseGuardrails
-Invocation -->|Response| Response
+InterceptResult -->|Response| SanitizeResponseGuardrails
+InterceptResult -->|Response| Response

-Invocation -.->|stream chunks| Collector
+InterceptResult -.->|stream chunks| Collector
Collector -..->|stream chunks| Response
-Invocation -.->|"(stream ends)"| Finalizer
+InterceptResult -.->|"(stream ends)"| Finalizer
Finalizer -.->|Aggregated Response| SanitizeResponseGuardrails
Finalizer o--o|shared state| Collector

-SanitizeRequestGuardrails -->|Sanitized Request| EventSubscribers
-SanitizeResponseGuardrails -->|Sanitized Response| EventSubscribers
+SanitizeResponseGuardrails -->|Sanitized End Payload| EndEvent
+EndEvent --> EventSubscribers

class Execution,Invocation,Streaming,Observability,Request,Response grey-lightest;
-class EventSubscribers teal-lightest;
+class EventSubscribers,StartEvent,EndEvent teal-lightest;
class RequestIntercepts,HasExecutionIntercept,ExecutionIntercepts yellow-lightest;
class ConditionalExecutionGuardrails,SanitizeRequestGuardrails,SanitizeResponseGuardrails green-lightest;
class RaiseException red-lightest;
-class DefaultCallable,Collector,Finalizer magenta-lightest;
+class DefaultCallable,InterceptResult,Collector,Finalizer magenta-lightest;
```
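
The "calls next" and "returns or replaces" edges in the flowchart describe a chain in which each execution intercept decides whether to continue. A minimal Rust sketch of that pattern follows. `ExecIntercept`, `run_chain`, `bracket`, `cached`, and `provider` are hypothetical names, and the real runtime's intercept signature may differ.

```rust
/// Hypothetical intercept type: receives the request and the rest of the chain.
pub type ExecIntercept = Box<dyn Fn(&str, &dyn Fn(&str) -> String) -> String>;

/// Runs the intercepts in order. Each intercept may call `next` to continue
/// toward the default callable, or return without calling it to replace it.
pub fn run_chain(
    intercepts: &[ExecIntercept],
    default_callable: &dyn Fn(&str) -> String,
    request: &str,
) -> String {
    match intercepts.split_first() {
        // No intercept left: invoke the default callable.
        None => default_callable(request),
        Some((first, rest)) => {
            // `next` continues with the remaining intercepts.
            let next = |req: &str| run_chain(rest, default_callable, req);
            first(request, &next)
        }
    }
}

/// Example intercept that calls `next` and decorates its result.
pub fn bracket(req: &str, next: &dyn Fn(&str) -> String) -> String {
    format!("[{}]", next(req))
}

/// Example intercept that never calls `next`, replacing the callable.
pub fn cached(_req: &str, _next: &dyn Fn(&str) -> String) -> String {
    "cached".to_string()
}

/// Stand-in for the default callable.
pub fn provider(req: &str) -> String {
    format!("provider:{req}")
}
```

Chaining `bracket` then `cached` yields `"[cached]"`: the outer intercept still wraps the result, but `provider` is never invoked.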

## Choosing the Right Surface
9 changes: 5 additions & 4 deletions docs/instrument-applications/advanced-guide.md
@@ -207,13 +207,14 @@ Run one allowed request and one rejected request:

## Debug Middleware Order

-Middleware runs by ascending priority inside each middleware family. Families run in this order for managed tool calls:
+Middleware runs by ascending priority inside each middleware family. Families
+and lifecycle emission run in this order for managed tool calls:

1. Conditional-execution guardrails.
2. Request intercepts.
-3. Sanitize-request guardrails for emitted start events.
-4. Execution intercepts and the real callback.
-5. Sanitize-response guardrails for emitted end events.
+3. Sanitize-request guardrails and start-event emission.
+4. Execution intercepts and the real callback or replacement.
+5. Sanitize-response guardrails and end-event emission.

If a later middleware does not run, check whether an earlier conditional-execution guardrail rejected the call or a request intercept raised an error.
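
When debugging order, it can help to compute the expected sequence by hand. The sketch below assumes integer priorities and models the rule stated above: family order first, then ascending priority inside each family. `Family`, `Registration`, and `run_order` are hypothetical names and not part of the NeMo Flow API.

```rust
/// Middleware families in the documented execution order for managed calls.
/// The derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Family {
    ConditionalExecution,
    RequestIntercept,
    SanitizeRequest,
    ExecutionIntercept,
    SanitizeResponse,
}

/// Hypothetical registration record used only by this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Registration {
    pub family: Family,
    pub priority: i32, // assumption: priorities are plain integers
    pub name: &'static str,
}

/// Returns names in the order the sketch would run them: by family first,
/// then by ascending priority inside each family.
pub fn run_order(mut regs: Vec<Registration>) -> Vec<&'static str> {
    regs.sort_by_key(|r| (r.family, r.priority));
    regs.into_iter().map(|r| r.name).collect()
}
```

Comparing this expected order with the events a subscriber actually prints makes it easier to spot a middleware that was skipped because an earlier stage rejected or failed.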

3 changes: 2 additions & 1 deletion docs/instrument-applications/instrument-llm-call.md
@@ -218,7 +218,8 @@ Check both behavior and instrumentation:
- The provider result matches what the application returned before the wrapper was added.
- The subscriber prints an agent or request scope event.
- The subscriber prints LLM start and LLM end events for `demo-provider`.
-- LLM start input contains the request after request intercepts.
+- LLM start input contains the request after request intercepts and
+sanitize-request guardrails.
- LLM end output contains the provider response after response guardrails.
- The LLM event includes the normalized `model_name` when you provide one.
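
The start-input check in the list above depends on two different views of the request: the real request that reaches the provider, and the sanitized payload that subscribers see. A minimal sketch of that split follows. `Prepared` and `prepare` are hypothetical names, and both stages are modeled as plain string functions.

```rust
/// Result of the pre-execution stages in this sketch.
#[derive(Debug, PartialEq)]
pub struct Prepared {
    /// What the provider callback receives.
    pub real_request: String,
    /// What subscribers see on the start event.
    pub start_payload: String,
}

/// Applies the request intercept to produce the real request, then applies
/// the sanitize-request guardrail only to the copy emitted on the start event.
pub fn prepare(
    request: &str,
    request_intercept: impl Fn(&str) -> String,
    sanitize_request: impl Fn(&str) -> String,
) -> Prepared {
    let real_request = request_intercept(request);
    let start_payload = sanitize_request(&real_request);
    Prepared { real_request, start_payload }
}
```

If a test asserts on LLM start input, it should expect `start_payload` (intercepted and sanitized), not the raw request passed to the wrapper.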

3 changes: 2 additions & 1 deletion docs/instrument-applications/instrument-tool-call.md
@@ -198,7 +198,8 @@ Check both behavior and instrumentation:
- The tool result matches what the application returned before the wrapper was added.
- The subscriber prints an agent or request scope event.
- The subscriber prints tool start and tool end events for `search`.
-- Tool start input contains the request arguments after request intercepts.
+- Tool start input contains the request arguments after request intercepts and
+sanitize-request guardrails.
- Tool end output contains the tool result after response guardrails.

If only the business result appears, the callback ran but instrumentation did not run. Confirm that the call goes through `tools.execute`, `toolCallExecute`, or `tool_call_execute`.
12 changes: 7 additions & 5 deletions docs/resources/support-and-faqs.md
@@ -252,10 +252,10 @@ sanitization without moving that logic into every call site.
For managed execution, the pipeline runs:
- Conditional guardrails
- Request intercepts
-- Request sanitization for emitted start events
+- Request sanitization and start-event emission
- Execution intercepts
-- The original callback
-- Response sanitization for emitted end events
+- The original callback, unless an execution intercept replaces it
+- Response sanitization and end-event emission

### What Is The Difference Between Guardrails And Intercepts?

@@ -282,8 +282,10 @@ and [Middleware Registration Families](../instrument-applications/advanced-guide
### How Does Middleware Ordering Work?

Managed execution applies conditional guardrails first, then request intercepts,
-then request sanitization for emitted start events, then execution intercepts
-and the real callback, then response sanitization for emitted end events.
+then request sanitization and start-event emission, then execution intercepts
+and the real callback, then response sanitization and end-event emission.
+The start event is emitted before execution intercepts run, so subscribers see
+a lifecycle start even when an execution intercept replaces the callback.

Registries are priority ordered. When scope-local behavior is present, NeMo Flow
combines applicable global and ancestor scope-local entries into the execution