feat(tracing): add export-stage OTEL span masking and media detection by hassiebp · Pull Request #1646 · langfuse/langfuse-python

hassiebp · 2026-05-07T12:44:55Z

Summary

Adds an export-stage transformation layer for spans before they are sent to the downstream OTLP exporter.

adds the public mask_otel_spans callback on Langfuse(...)
adds the batch-shaped public contract: MaskOtelSpansParams, MaskOtelSpansResult, OtelSpanIdentifier, OtelSpanData, and OtelSpanPatch
wires mask_otel_spans through the resource manager into the Langfuse span processor
runs export-stage media detection over exported span attributes before masking
prefilters serialized string attributes before JSON parsing so JSON-looking strings without supported media hints are left unchanged cheaply
applies sparse keyed span patches to post-media OTEL span attribute snapshots before export
keeps existing synchronous mask behavior unchanged
hardens media detection for Gemini inline_data / inlineData payloads

Why

Attribute masking and media detection currently happen when Langfuse SDK spans set attributes. That leaves third-party OTEL spans flowing through the Langfuse span processor without equivalent export-stage handling. This PR adds a batch-native export-path hook while preserving existing Langfuse SDK masking behavior.

Behavior

mask_otel_spans runs only after existing export filters accept a span.
Media detection runs before mask_otel_spans.
Direct base64 data URI string attributes are processed without JSON parsing.
JSON-looking string attributes are parsed only if they contain supported media-shape hints such as data:, inline_data, inlineData, media_type, mime_type, or mimeType plus data.
mask_otel_spans receives the whole OTEL export batch as params.spans, keyed by OtelSpanIdentifier(trace_id, span_id).
MaskOtelSpansResult.span_patches is sparse: omitted spans are exported unchanged.
OtelSpanPatch can delete exact attribute keys and set attribute values; delete runs before set, so set wins.
Callback exceptions or invalid outer batch results drop the whole export batch.
Patches for unknown span identifiers drop the whole export batch.
Invalid patch objects for known spans drop only those spans.
Invalid returned set values delete only that affected attribute and log the value type, not the value.
Span events and links are not transformed.

Validation

uv run --frozen ruff check .
uv run --frozen mypy langfuse --no-error-summary
uv run --frozen pytest tests/unit/test_mask_otel_spans.py tests/unit/test_media_manager.py -q
uv run --frozen pytest tests/unit/test_otel.py tests/unit/test_additional_headers_simple.py -q
uv run --frozen pytest -n auto --dist worksteal tests/unit

Disclaimer: Experimental PR review

Greptile Summary

This PR adds an export-stage transformation layer (LangfuseTransformingSpanExporter) that runs between the OTel batch span processor and the downstream OTLP exporter, enabling media detection and attribute masking on third-party spans that bypass the existing SDK-level mask hook.

Introduces mask_otel_spans as a new public Langfuse(...) parameter, wired through the resource manager into LangfuseSpanProcessor, and exposes a batch-native public contract (MaskOtelSpansParams, MaskOtelSpansResult, OtelSpanPatch, etc.) in langfuse.types.
Extends MediaManager._find_and_process_media with a fail_open flag and adds Gemini inline_data/inlineData detection; applies media replacement at export time with a substring prefilter to avoid JSON parsing on non-media strings.
The OTEL tracer initialization in resource_manager.py was moved after the media manager setup so media_manager is available when the span processor is constructed.

Confidence Score: 4/5

Safe to merge; all changes are additive and opt-in, existing mask and media paths are untouched.

The change is well-structured and defensively written, but cloned spans lose their dropped_attributes_count metadata — downstream OTLP consumers would see 0 even for spans that originally exceeded the attribute limit.

langfuse/_client/span_exporter.py — specifically the _clone_span method and attribute wrapping strategy.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant BSP as BatchSpanProcessor
    participant LTS as LangfuseTransformingSpanExporter
    participant MM as MediaManager
    participant MF as mask_otel_spans callback
    participant OE as OTLPSpanExporter

    App->>BSP: span ends (on_end)
    BSP->>LTS: export(batch of ReadableSpans)

    loop for each span
        LTS->>MM: "_find_and_process_media(attributes, fail_open=True)"
        MM-->>LTS: post-media attribute dict
    end

    alt mask_otel_spans configured
        LTS->>MF: MaskOtelSpansParams(spans)
        MF-->>LTS: MaskOtelSpansResult or None

        alt exception or invalid result
            LTS-->>BSP: SUCCESS (batch dropped)
        else patch for unknown identifier
            LTS-->>BSP: SUCCESS (batch dropped)
        else valid patches
            loop for each span
                LTS->>LTS: apply OtelSpanPatch (delete then set)
            end
        end
    end

    LTS->>LTS: _clone_span(span, patched_attributes)
    LTS->>OE: export(cloned ReadableSpans)
    OE-->>LTS: SpanExportResult
    LTS-->>BSP: SpanExportResult

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/_client/span_exporter.py:579-596
`_clone_span` converts `span.attributes` (a `BoundedAttributes`) to a plain `dict`. The OTel SDK's `ReadableSpan.dropped_attributes` property guards with `isinstance(self._attributes, BoundedAttributes)` and returns `0` for any other type, so the cloned span will always export `dropped_attributes_count=0` even when the original span had attributes silently dropped by the bounds limit. This loses span fidelity in the OTLP payload for any span that hit the attribute limit.

```suggestion
    @staticmethod
    def _clone_span(
        *, span: ReadableSpan, attributes: Dict[str, AttributeValue]
    ) -> ReadableSpan:
        return ReadableSpan(
            name=span.name,
            context=span.context,
            parent=span.parent,
            resource=span.resource,
            attributes=BoundedAttributes(
                maxlen=None,
                attributes=attributes,
                immutable=True,
                max_value_len=None,
            ),
            events=span.events,
            links=span.links,
            kind=span.kind,
            status=span.status,
            start_time=span.start_time,
            end_time=span.end_time,
            instrumentation_scope=span.instrumentation_scope,
        )
```

### Issue 2 of 2
langfuse/_client/span_exporter.py:661-669
**Prefilter false-positive on `data:` substrings in URLs** — `_may_contain_serialized_media` triggers on any string containing the literal `data:` (e.g., `"https://api.example.com/data:types"`), causing unnecessary `json.loads` parsing before the string is returned unchanged. No data is lost, but large attribute strings with `data:` in a URL incur a redundant parse on every export.

_{Reviews (1): Last reviewed commit: "test(tracing): cover export-stage span m..." | Re-trigger Greptile}

github-actions · 2026-05-07T15:47:18Z

@claude review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c9aca2e48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T16:58:06Z

+            context=span.context,
+            parent=span.parent,
+            resource=span.resource,
+            attributes=attributes,


Preserve dropped attribute counts when cloning exported spans

Rebuilding each ReadableSpan with a plain dict for attributes drops OpenTelemetry’s bounded-attribute metadata, so spans that exceeded attribute limits are exported with dropped_attributes_count=0. This silently degrades trace fidelity for any high-cardinality spans, and downstream OTLP consumers can no longer tell attributes were truncated. Keep the cloned attributes in a bounded container (or preserve the original dropped-count metadata) when constructing the transformed span.

Useful? React with 👍 / 👎.

claude · 2026-05-07T17:06:50Z

+        return ReadableSpan(
+            name=span.name,
+            context=span.context,
+            parent=span.parent,
+            resource=span.resource,
+            attributes=attributes,
+            events=span.events,
+            links=span.links,
+            kind=span.kind,
+            status=span.status,
+            start_time=span.start_time,
+            end_time=span.end_time,
+            instrumentation_scope=span.instrumentation_scope,
+        )


🟡 Cloned spans emit dropped_attributes_count=0 to OTLP regardless of how many attributes the original span dropped, because _clone_span constructs the new ReadableSpan with a plain dict. ReadableSpan.dropped_attributes only reports the real count when self._attributes is a BoundedAttributes instance and otherwise returns 0, so any third-party span that hit OTEL_ATTRIBUTE_COUNT_LIMIT before reaching this exporter loses that fidelity signal in the OTLP payload. Fix is mechanical: wrap the dict in BoundedAttributes(maxlen=None, attributes=attributes, immutable=True, max_value_len=None) when constructing the cloned span.

Extended reasoning...

What the bug is

_clone_span (langfuse/_client/span_exporter.py:373-386) builds the cloned ReadableSpan by passing the post-transform attributes argument as a plain Dict[str, AttributeValue]. The OTel SDK's ReadableSpan.__init__ stores the argument verbatim into self._attributes, and the dropped_attributes property is implemented as:

@property def dropped_attributes(self) -> int: if isinstance(self._attributes, BoundedAttributes): return self._attributes.dropped return 0

Because we hand it a plain dict, the property returns 0 for every cloned span, regardless of whether the source span originally had attributes silently dropped due to OTEL_ATTRIBUTE_COUNT_LIMIT.

Why existing code does not prevent it

The upstream call site (export, line ~50) already converts the source BoundedAttributes into a plain dict via dict(span.attributes or {}) so the transformation pipeline can mutate it. The cloned span is then constructed with that same dict. There is no path in _clone_span that ever wraps it back into a BoundedAttributes, so the type check inside dropped_attributes always fails on the cloned span.

Impact

The OTLP encoder at opentelemetry/exporter/otlp/proto/common/_internal/trace_encoder/__init__.py:130 reads sdk_span.dropped_attributes directly to populate dropped_attributes_count in the protobuf. Since LangfuseSpanProcessor now always wraps the underlying exporter in LangfuseTransformingSpanExporter whenever media_manager is provided (which is unconditional in resource_manager.py), every span exported by the Langfuse processor will report dropped_attributes_count=0 in OTLP. Practical scope is limited to spans that exceed OTEL_ATTRIBUTE_COUNT_LIMIT (default 128), so most workloads won't notice — but for instrumentations that emit very wide spans (e.g. heavily annotated genai prompts), downstream observability backends lose a useful fidelity indicator that this PR's export-stage transform path silently strips.

Step-by-step proof

Source span finishes with 200 attributes set, exceeding the default 128 limit. The SDK's BoundedAttributes keeps 128 and increments dropped to 72. span.attributes.dropped == 72 and the OTLP encoder, called on the original span, would emit dropped_attributes_count=72.

LangfuseTransformingSpanExporter.export runs dict(span.attributes or {}), producing a plain dict with the surviving 128 entries. The dropped count is not preserved on the dict.

Media processing and optional mask_otel_spans patches run on this dict.

_clone_span(span=span, attributes=attributes) constructs a new ReadableSpan(..., attributes=attributes, ...). self._attributes is now a plain dict, not a BoundedAttributes.

The wrapped OTLP exporter calls sdk_span.dropped_attributes on the cloned span. The isinstance(self._attributes, BoundedAttributes) guard fails, so the property returns 0.

The serialized protobuf carries dropped_attributes_count=0 instead of 72.

How to fix

Wrap the post-transform dict in a BoundedAttributes with effectively-unlimited bounds when building the cloned span, so the SDK property has something to introspect. Carry over the original dropped count so the cloned span reports the same number the original would have:

from opentelemetry.attributes import BoundedAttributes attributes=BoundedAttributes( maxlen=None, attributes=attributes, immutable=True, max_value_len=None, ),

Note: the upstream dict() conversion at the top of export already loses the .dropped value from the source BoundedAttributes, so the simplest fix that preserves fidelity is to keep a reference to the original dropped count alongside the dict and pass it back through (e.g. via BoundedAttributes's constructor accepting the original dropped count, or by tracking the source span's dropped separately and re-applying it). Greptile's suggested wrap is the minimum viable patch and at least removes the unconditional 0, even if reconstructing the exact original count requires plumbing through one extra integer.

blacksmith-sh · 2026-05-07T20:20:52Z

Found 3 test failures on Blacksmith runners:

Failures

Test	View Logs
`TestMultiProjectSetup/test_context_isolation_between_projects`	View Logs
`TestMultiProjectSetup/test_cross_project_tracing`	View Logs
`TestMultiProjectSetup/test_sdk_client_isolation`	View Logs

claude · 2026-05-07T20:33:20Z

+            events=span.events,
+            links=span.links,


🟡 Same root cause as the already-flagged dropped_attributes_count issue, but _clone_span also passes events=span.events and links=span.links (lines 444-445), and ReadableSpan.events/ReadableSpan.links are property accessors that return plain tuples — not the original BoundedList instances. The cloned span's dropped_events / dropped_links properties guard with isinstance(self._events, BoundedList) (resp. _links) and return 0 otherwise, so every cloned span will report dropped_events_count=0 and dropped_links_count=0 in OTLP regardless of how many were dropped on the original. While fixing the attributes case, wrap events/links back in BoundedList (or carry over the source .dropped counts) to close the symmetric fidelity gap.

Extended reasoning...

What the bug is

_clone_span at langfuse/_client/span_exporter.py:434-451 builds the cloned ReadableSpan by passing events=span.events and links=span.links. The OTel SDK's ReadableSpan.events and ReadableSpan.links properties return plain tuples, not the original BoundedList instances. The OTel SDK's ReadableSpan.dropped_events / dropped_links properties then guard with isinstance(self._events, BoundedList) (resp. self._links) and return 0 otherwise. As a result, every cloned span exported through LangfuseTransformingSpanExporter reports dropped_events_count=0 and dropped_links_count=0 in OTLP, even when the original span had events or links silently dropped by SpanLimits.max_events / max_links.

Verified against OTel SDK source

In opentelemetry/sdk/trace/__init__.py:

@property def dropped_events(self) -> int: if isinstance(self._events, BoundedList): return self._events.dropped return 0 @property def dropped_links(self) -> int: if isinstance(self._links, BoundedList): return self._links.dropped return 0 @property def events(self) -> Sequence[Event]: return tuple(event for event in self._events) @property def links(self) -> Sequence[trace_api.Link]: return tuple(link for link in self._links)

The OTLP encoder at opentelemetry/exporter/otlp/proto/common/_internal/trace_encoder/__init__.py:131-132 reads these directly:

dropped_events_count=sdk_span.dropped_events, dropped_links_count=sdk_span.dropped_links,

So the cloned span's tuples cause both encoder fields to be unconditionally 0.

Why existing code does not prevent it

Nothing in _clone_span rewraps the property-returned tuples into BoundedList. The ReadableSpan.__init__ simply stores them verbatim into self._events/self._links, and the type guard in dropped_events/dropped_links then fails on every cloned span.

Step-by-step proof for events

Source span ends with 200 events recorded; SpanLimits.max_events defaults to 128, so the SDK's BoundedList keeps 128 and increments dropped to 72. source._events is a BoundedList whose .dropped == 72. Encoded directly, the OTLP payload would carry dropped_events_count=72.

Export hits LangfuseTransformingSpanExporter._clone_span(span=span, attributes=attributes).

span.events runs the property: tuple(event for event in self._events) → a tuple of 128 events. The original BoundedList is not propagated.

The new ReadableSpan is constructed with events=<tuple of 128>. new._events is a tuple.

The wrapped OTLP exporter calls sdk_span.dropped_events on the cloned span. isinstance(self._events, BoundedList) → False, so the property returns 0.

The serialized protobuf carries dropped_events_count=0 instead of 72.

The same path applies to links with SpanLimits.max_links (default 128).

Impact

This is a fidelity regression on the export-stage transform path that this PR introduces. LangfuseSpanProcessor always wires media_manager into LangfuseTransformingSpanExporter whenever tracing is enabled, so every span exported by the Langfuse processor flows through this clone. Practical scope is narrow — only spans that exceeded max_events/max_links (default 128 each) are affected, which is uncommon but legitimately occurs for long-running spans (workflow / agent loops) or instrumentations that emit many events per span. Severity is nit, matching the symmetric dropped_attributes_count issue raised by the other reviewers.

How to fix

Same shape as Greptile's suggested fix for attributes — wrap events/links back into a BoundedList with effectively-unlimited bounds at clone time, and ideally carry over the source's .dropped count so the cloned span reports the original number rather than just preserving the type guard. The isinstance check is what flips the value from 0 to the real count, so the minimum viable patch is a BoundedList wrap; preserving the actual source .dropped is a small extra step that fully restores fidelity. Worth folding into the same fix the author makes for attributes.

hassiebp force-pushed the codex/export-stage-span-mask branch from 9092ee3 to 98b2a5e Compare May 7, 2026 12:45

hassiebp changed the title ~~[codex] feat(tracing): add export-stage span mask~~ feat(tracing): add export-stage span mask May 7, 2026