Support virtual gen ai analysis for otlp and zipkin traces #13767
peachisai wants to merge 7 commits into apache:master from
Conversation
open a new PR for #13766

Why open a new one? Usually keeping the old one updated is better: it keeps the context and change discussion clear.

I accidentally overwrote some code from the previous commit, so I started a new branch.

OK. You should be able to just revert the wrong changes. That's fine.
wu-sheng
left a comment
Good progress from #13766 — most bugs are fixed. A few remaining issues:
Bug

- Inverted error status logic

  Line 227: `metrics.setStatus(StringUtil.isNotBlank(getZipkinSpanTagValue(tags, "error")));` sets `status=true` (success) when the `error` tag is present, and `status=false` (failure) when it's absent. It's inverted. The original `extractMetricsFromSWSpan` uses `metrics.setStatus(!span.getIsError())` — so `status=true` means success. Should be: `metrics.setStatus(StringUtil.isBlank(getZipkinSpanTagValue(tags, "error")));`
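The corrected mapping can be sketched in a few self-contained lines (the plain tag map and `isBlank` helper stand in for `getZipkinSpanTagValue` and SkyWalking's `StringUtil`; the convention is the one from `extractMetricsFromSWSpan`, where `status == true` means success):

```java
import java.util.Map;

public class ErrorStatusDemo {
    // Stand-in for SkyWalking's StringUtil.isBlank.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // status == true means success, matching the native path's
    // metrics.setStatus(!span.getIsError()) convention: success is
    // the ABSENCE of a non-blank "error" tag.
    static boolean statusOf(Map<String, String> tags) {
        return isBlank(tags.get("error"));
    }

    public static void main(String[] args) {
        System.out.println(statusOf(Map.of()));                // true: no error tag -> success
        System.out.println(statusOf(Map.of("error", "true"))); // false: error tag present -> failure
    }
}
```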
Design

- `SpanForward` still has duplicate source conversion methods

  `SpanForward` has its own `toVirtualGenAIServiceMeta()`, `toVirtualGenAIInstance()`, `toProviderAccess()`, `toModelAccess()` (lines 474-515) that are never called — `processGenAILogic()` correctly uses `transferToSources()` from the service. These 4 dead methods should be removed.

- Cost tag mutation still lives in `SpanForward`

  The `setEstimatedCost()` method in `SpanForward` mutates the `ZipkinSpan` tags before it's persisted. This is fine functionally, but `BigDecimal.valueOf(totalEstimatedCost)` takes a `long` (already `Math.round()`-ed in the analyzer), so dividing an already-rounded integer by 1,000,000 loses most of the precision. Consider computing the cost display value from the raw `double totalCost` before rounding, or passing the raw cost through `GenAIMetrics`.

- `NamingControl` passed as a parameter to `transferToSources()` is awkward API design

  `NamingControl` is a core service — the analyzer could hold a reference to it (injected at construction time or via the module manager) instead of requiring every caller to pass it. This would simplify the interface.
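The precision issue can be demonstrated in a few lines (the `0.42` cost value and the micro-unit scale are illustrative, not taken from the PR):

```java
import java.math.BigDecimal;

public class CostPrecisionDemo {
    public static void main(String[] args) {
        double totalCost = 0.42;  // hypothetical raw cost from the analyzer

        // Current path: Math.round() to long first, then divide for display.
        long rounded = Math.round(totalCost);  // 0 -- the fractional cost is already gone
        BigDecimal fromRounded = BigDecimal.valueOf(rounded)
                .divide(BigDecimal.valueOf(1_000_000));

        // Suggested path: divide the raw double, round only where a long is required.
        BigDecimal fromRaw = BigDecimal.valueOf(totalCost)
                .divide(BigDecimal.valueOf(1_000_000));

        System.out.println(fromRounded);  // 0
        System.out.println(fromRaw);      // 4.2E-7
    }
}
```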
Minor

- Changes log trailing space: `"* Support virtual GenAI analysis for otlp and zipkin traces ."` — trailing space before the period.

- Missing trailing newline in `otlp-virtual-genai.yaml` and `zipkin-virtual-genai.yaml` (both end without a newline per the diff).

- `toVirtualGenAIInstance` inconsistency: in `transferToSources()` the instance's `serviceName` is set to `namingControl.formatServiceName(metrics.getProviderName())`, but the original `VirtualGenAIProcessor.toInstance()` set it to `metrics.getProviderName()` without formatting. Since `NamingControl.formatServiceName` may truncate, this could cause service name mismatches between the service and instance registrations when the provider name is very long. Verify this is intentional. (The same difference exists for the `ServiceMeta.name` in the original vs new code — in the original `VirtualGenAIProcessor`, `serviceName` on the instance was NOT formatted, but the `ServiceMeta.name` was.)
wu-sheng
left a comment
Please fix the following issues:

- Bug: Inverted error status logic — `StringUtil.isNotBlank(getZipkinSpanTagValue(tags, "error"))` sets `status=true` when the error tag IS present, which means "success when there's an error". Should be `StringUtil.isBlank(...)`.

- Dead code — the 4 private methods `toVirtualGenAIServiceMeta()`, `toVirtualGenAIInstance()`, `toProviderAccess()`, `toModelAccess()` in `SpanForward` are never called. `processGenAILogic()` already delegates to `transferToSources()`. Please remove them.

- Cost precision loss — `setEstimatedCost()` takes `metrics.getTotalEstimatedCost()`, which is a `long` (already rounded via `Math.round(totalCost)`), then divides by 1,000,000. The rounding has already lost the fractional cost. Consider passing the raw `double totalCost` through `GenAIMetrics` so the display value retains precision.

- API design — `transferToSources(GenAIMetrics, NamingControl)` requires every caller to pass `NamingControl`. Since `GenAIMeterAnalyzer` is a singleton service, inject `NamingControl` at construction time and simplify the signature to `transferToSources(GenAIMetrics)`.
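The suggested constructor injection could be sketched like this (all class bodies are hypothetical stubs; only the names come from the review):

```java
// Stand-in for SkyWalking's core NamingControl service.
class NamingControl {
    String formatServiceName(String name) {
        return name;  // the real service may truncate or normalize
    }
}

// Minimal stub of the metrics carrier referenced in the review.
class GenAIMetrics {
    private final String providerName;
    GenAIMetrics(String providerName) { this.providerName = providerName; }
    String getProviderName() { return providerName; }
}

class GenAIMeterAnalyzer {
    private final NamingControl namingControl;

    // NamingControl injected once at construction (e.g. by the module
    // provider), so callers no longer pass it on every call.
    GenAIMeterAnalyzer(NamingControl namingControl) {
        this.namingControl = namingControl;
    }

    // Simplified signature: transferToSources(GenAIMetrics).
    String transferToSources(GenAIMetrics metrics) {
        return namingControl.formatServiceName(metrics.getProviderName());
    }
}

public class InjectionDemo {
    public static void main(String[] args) {
        GenAIMeterAnalyzer analyzer = new GenAIMeterAnalyzer(new NamingControl());
        System.out.println(analyzer.transferToSources(new GenAIMetrics("openai")));
    }
}
```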
wu-sheng
left a comment
All 4 issues from the previous review are fixed:

- Error status logic — now correctly uses `StringUtil.isBlank(...)` (success = no error tag)
- Dead code removed — the 4 unused `toVirtualGenAI*`/`toProviderAccess`/`toModelAccess` methods deleted from `SpanForward`
- Cost precision — `GenAIMetrics.totalEstimatedCost` changed to `double`, `Math.round()` deferred to `transferToSources()` when writing to `GenAIProviderAccess`/`GenAIModelAccess` (which need `long`), and `SpanForward.setEstimatedCost()` now receives the raw `double` for the Zipkin tag
- `NamingControl` injected — now injected into the `GenAIMeterAnalyzer` constructor via `GenAIAnalyzerModuleProvider.prepare()`, and the `transferToSources()` signature simplified

Minor nit (non-blocking): `extractMetricsFromSWSpan` still has `metrics.setTotalEstimatedCost(Math.round(totalCost))`, which rounds and then widens back to `double` — it should be just `totalCost`, like the Zipkin path. And the changes.md trailing space (`"traces ."`) is still there.
And you need to fix the code style issue.
wu-sheng
left a comment
LGTM. All previous issues addressed:

- `extractMetricsFromSWSpan` now stores the raw `totalCost` (no premature rounding)
- `NamingControl` injected via `setNamingControl()` in the `start()` phase — the correct lifecycle for the SkyWalking module system (`prepare()` creates the service, `start()` wires cross-module deps)
- Private methods use the instance field directly instead of parameter passing
- Changes.md trailing space fixed
wu-sheng
left a comment
Two suggestions:

1. Share the Python app between the OTLP and Zipkin e2e tests

The two Python scripts (`otlp-virtual-genai/openai-call.py` and `zipkin-virtual-genai/openai-call.py`) and their Dockerfiles are nearly identical — only the exporter differs (3 lines).

Use `opentelemetry-instrument` (the auto-instrumentation CLI) to eliminate all the OTel boilerplate from the script. The script becomes pure business logic (just OpenAI client calls), and the exporter is selected entirely via env vars:

- One shared `openai-call.py` (no TracerProvider, no exporter code)
- One shared `Dockerfile.python` (installs `opentelemetry-distro` plus both exporters)
- OTLP docker-compose: `OTEL_TRACES_EXPORTER=otlp`, entrypoint `opentelemetry-instrument python3 /openai-call.py`
- Zipkin docker-compose: `OTEL_TRACES_EXPORTER=zipkin`, entrypoint `opentelemetry-instrument python3 /openai-call.py`

The expected files are also identical and could be shared.
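Sketched as a compose fragment (service names, ports, and endpoints are illustrative; the `OTEL_*` variables are standard OpenTelemetry SDK configuration):

```yaml
# otlp-virtual-genai/docker-compose.yml (sketch)
services:
  openai-call:
    build:
      dockerfile: Dockerfile.python   # shared between both e2e cases
    environment:
      OTEL_SERVICE_NAME: openai-call
      OTEL_TRACES_EXPORTER: otlp
      OTEL_EXPORTER_OTLP_ENDPOINT: http://oap:4317   # illustrative endpoint
    entrypoint: opentelemetry-instrument python3 /openai-call.py

# zipkin-virtual-genai/docker-compose.yml would differ only in:
#      OTEL_TRACES_EXPORTER: zipkin
#      OTEL_EXPORTER_ZIPKIN_ENDPOINT: http://oap:9411/api/v2/spans
```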
2. Documentation: clarify which semantic convention is used
`virtual-genai.md` should clarify that the tag keys follow the OpenTelemetry GenAI Semantic Conventions. Currently the "Span Contract" section only describes the SkyWalking native agent contract (Exit span, layer == GENAI). For OTLP/Zipkin sources there is no "Exit span" or "layer" — the OAP detects GenAI spans by checking for the `gen_ai.response.model` tag.
The doc should distinguish:

- SkyWalking native agent: Exit span + `SpanLayer.GenAI` + `gen_ai.*` tags
- OTLP / Zipkin: any span with the `gen_ai.response.model` tag present (following the OTel GenAI Semantic Conventions)
Also note: the current OTel Python instrumentation emits `gen_ai.system` (old semconv) by default, not `gen_ai.provider.name` (new semconv). SkyWalking falls back to model-name prefix matching when `gen_ai.provider.name` is absent. Consider also reading `gen_ai.system` as a fallback for broader compatibility.
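The suggested fallback could look like this (a sketch; `resolveProvider` and the plain tag map are illustrative stand-ins for the OAP's tag handling):

```java
import java.util.Map;

public class ProviderTagFallbackDemo {
    // Prefer the new semconv key, then fall back to the old key emitted by
    // current OTel Python instrumentation. A null return means "not found",
    // where the caller would fall back to model-name prefix matching.
    static String resolveProvider(Map<String, String> tags) {
        String provider = tags.get("gen_ai.provider.name");  // new semconv
        if (provider == null || provider.isEmpty()) {
            provider = tags.get("gen_ai.system");            // old semconv
        }
        return provider;
    }

    public static void main(String[] args) {
        // Old-semconv-only span still resolves.
        System.out.println(resolveProvider(Map.of("gen_ai.system", "openai")));
        // New key wins when both are present.
        System.out.println(resolveProvider(
            Map.of("gen_ai.provider.name", "openai", "gen_ai.system", "legacy")));
    }
}
```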
@peachisai Most are good. Could you polish a little further?
@wu-sheng
CHANGES log.