Skip to content

tracing: migrate opentracing-cpp → opentelemetry-cpp (API only)#201

Merged
chen3feng merged 7 commits into
masterfrom
tracing-opentelemetry
Jun 19, 2026
Merged

tracing: migrate opentracing-cpp → opentelemetry-cpp (API only)#201
chen3feng merged 7 commits into
masterfrom
tracing-opentelemetry

Conversation

@chen3feng

Copy link
Copy Markdown
Collaborator

opentracing-cpp is deprecated and superseded by OpenTelemetry. flare ships no real tracer — it exposes a pluggable abstraction (TracingOpsProvider / QuickerSpan / TracingOps) over the tracing-API types, with concrete providers plugged in by users. This re-bases that abstraction on the opentelemetry-cpp API (header-only, v1.27.0) and removes opentracing-cpp from both build systems.

Scope / key decisions

  • API only — no SDK/exporter, so no protobuf/grpc is pulled in and the deliberately-pinned, abseil-free protobuf 3.21.12 stays intact. (The SDK+OTLP path would conflict with it.)
  • Keep the abstraction (perf buffering + pluggability), rewriting only the types/bodies.
  • Keep the wire format: rpc_meta.proto's bytes tracing_context stays an opaque, provider-serialized blob (no W3C text-map propagators).

Source (flare/rpc/tracing/)

  • tracing_ops_provider.h: virtuals re-typed to OTel; new flare-owned SpanStartOptions (parent SpanContext + SpanKind + timestamps + start attrs) replaces opentracing::StartSpanOptions/ChildOf; Extractstd::optional<SpanContext>.
  • tracing_ops.h/.cc: span_ is nostd::shared_ptr<trace::Span>; SetAttribute/AddEvent/End; SpanContext is a value type; buffered ops keep flare's owning variant and convert to the non-owning AttributeValue only at flush (avoids a use-after-free).
  • framework_tags.hconstexpr std::string_view; string_view_interop.hToStd/ToOtel. Standard-tag table is a literal allowlist; span.kind moves from a tag to SpanStartOptions::kind.
  • Call sites rpc_channel.cc / normal_connection_handler.cc updated. rpc_meta.proto field unchanged (comment updated).
  • Tests/benchmark rewritten onto a recording fake otel::trace::Span (OTel's noop span discards everything and can't be asserted on). The integration test asserts span kind via a provider-recorded span.kind attribute.

Build

  • blade: BLADE_ROOT vcpkg_config gains opentelemetry-cpp (default features → no exporters); thin wrapper //thirdparty/opentelemetry-cpp:api. The opentelemetry_api component is header-only (no .a), so the wrapper references opentelemetry_common for the include dir (which carries both the otel + absl headers); flare still consumes only API headers. Vendored thirdparty/opentracing-cpp/ removed.
  • bazel: deps.bzl http_archive for opentelemetry-cpp 1.27.0 + api-only bazel/opentelemetry-cpp.BUILD (header-only; bundled nostdno abseil needed). patch_cmds drop the archive's nested BUILD files so the api headers can be globbed from one package. opentracing BUILD/patch + .bazelignore/.gitattributes entries removed.

Verification

Both build systems build //flare/rpc/tracing/... and pass tracing_ops_test + integration_test (full RPC flow: client/server spans, context propagation through the bytes field, framework-tag translation, logs).

🤖 Generated with Claude Code

chen3feng and others added 4 commits June 19, 2026 22:21
opentracing-cpp is deprecated/superseded by OpenTelemetry. flare ships no real
tracer -- it exposes a pluggable abstraction (TracingOpsProvider / QuickerSpan /
TracingOps) over the tracing-API types, with providers plugged in by users.
Re-base that abstraction on the opentelemetry-cpp **API** (header-only, v1.27.0)
and drop opentracing-cpp from both build systems.

Scope: API only -- no SDK/exporter, so no protobuf/grpc is pulled in and the
deliberately-pinned (abseil-free) protobuf 3.21.12 stays intact. The
abstraction and the opaque `bytes tracing_context` wire format are kept; only
the types/bodies change.

Source (flare/rpc/tracing/):
- tracing_ops_provider.h: re-typed virtuals; new flare-owned SpanStartOptions
  (parent SpanContext + SpanKind + timestamps + start attrs) replacing
  opentracing::StartSpanOptions/ChildOf. Extract -> std::optional<SpanContext>.
- tracing_ops.h/.cc: span_ is nostd::shared_ptr<trace::Span>; SetAttribute/
  AddEvent/End; SpanContext is a value type; buffered ops keep flare's owning
  variant and convert to (non-owning) AttributeValue only at flush.
- framework_tags.h -> constexpr std::string_view; string_view_interop.h ->
  ToStd/ToOtel (nostd<->std). Standard-tag table is a literal allowlist;
  span.kind moves from a tag to SpanStartOptions::kind.
- Call sites rpc_channel.cc / normal_connection_handler.cc updated; rpc_meta.proto
  comment updated (wire field unchanged).
- Tests/benchmark rewritten onto a recording fake otel::trace::Span (OTel's noop
  span can't be asserted on); integration test asserts span kind via the
  provider-recorded "span.kind" instead of an opentracing tag.

Build:
- blade: BLADE_ROOT vcpkg_config gains opentelemetry-cpp; thin wrapper
  //thirdparty/opentelemetry-cpp:api (refs opentelemetry_common -- the api
  component is header-only with no .a; that lib's include dir carries the api +
  absl headers). opentracing thirdparty dir removed.
- bazel: deps.bzl http_archive for opentelemetry-cpp 1.27.0 + api-only
  bazel/opentelemetry-cpp.BUILD (header-only, bundled nostd -> no abseil);
  patch_cmds drop the archive's nested BUILDs so the api headers can be globbed.
  opentracing BUILD/patch + .bazelignore/.gitattributes entries removed.

Verified: blade and bazel both build //flare/rpc/tracing/... and pass
tracing_ops_test + integration_test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The repo-wide `bazel build ...` job parses every BUILD file; the blade thin
wrapper thirdparty/opentelemetry-cpp/BUILD uses blade-only syntax
(vcpkg#...:lib, visibility='PUBLIC'). Add it to .bazelignore (the opentracing
entry was removed but its replacement wasn't added).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread .claude/settings.json Outdated
@@ -0,0 +1,8 @@
{

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不要提交

Comment thread .claude/settings.local.json Outdated
@@ -0,0 +1,250 @@
{

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不要提交

@chen3feng

Copy link
Copy Markdown
Collaborator Author

Code review 意见(非阻塞)

整体方向正确:API-only 迁移克制到位(不引入 protobuf/grpc、保留 bytes tracing_context 不透明 blob),缓冲层"owning variant → flush 时才转 non-owning AttributeValue"的生命周期处理正确,无 UAF,也无阻塞性 bug。以下几点建议留待后续处理。

1. forcibly_sampled 的 attribute 改名可能让现有 provider 失效

flare/rpc/tracing/tracing_ops.h:358

span_->SetAttribute("flare.sampling_priority", 1);

旧代码设的是 opentracing 标准 tag sampling.priority。新名字带了 flare. 前缀,但没有走 provider_->SetFrameworkTag 翻译(直接 SetAttribute)。结果是个混合体:监听标准 sampling.priority 的 provider 收不到这个信号;而带 flare. 前缀本应被翻译,这里却绕过了。"直接 set 不翻译"本身与旧代码一致(不是回归),但 key 改名 + 前缀语义不一致这点,建议在 flare/doc/tracing.md 里向 provider 实现者说明,或考虑让它走 framework-tag 通道。

2. Log → Event 的映射:空 event name

flare/rpc/tracing/tracing_ops.cc:136

span_->AddEvent(e.GetKey(), {{"value", attr}});

flare log 的 key 变成 OTel event 名,value 放在固定属性名 "value" 下。对常见的单参 AddTracingLog(value)(key 为空,见 integration_test.cc 断言 pair("", kLogValue))会产生空名 event。很多 OTel 后端按 event name 分组/展示,空名 event 渲染会比较难看。语义上可接受,但建议给空 key 一个默认 event 名(如 "log"),并在 doc 里说明:旧 opentracing 把多个 field 归到一条带时间戳的 log record,现在每条 log 是独立 event。

3. variant 对 C++ 标准的隐式依赖

flare/rpc/tracing/tracing_ops.h:174-176

QuickerValue variant 的 bool 在前、std::string 在后。调用方传 const char*(如 rpc_channel.ccfull_name().c_str())。当前两套构建都是 C++20(-std=gnu++2a / c++2a),P0608R3 保证 const char* 正确选中 std::string,没问题。但在 C++17 下这会静默选中 bool(经典陷阱)。当前安全,只是这层正确性隐式依赖编译标准且无显式保护——可考虑在调用点用 std::string(...) 代替 .c_str(),或加注释/static_assert 防御未来降级。

4. nostd / provider ABI(信息性)

当前保留 OTel nostd:: 类型(经讨论确认:与 blade 侧预编译 vcpkg 库的 bundled nostd 天然一致,故有意不切到 std)。需要注意的硬不变量:TracingOpsProvider 虚接口上的 nostd::shared_ptr/variant/string_view 要求 flare 与任何 provider 插件采用相同的 OTel STL 配置(都不定义 HAVE_ABSEIL / OPENTELEMETRY_STL_VERSION),否则会在接口边界 ABI 不匹配。建议在 doc 里写明这条约束,供未来写 provider 的人参考。

5. 测试中的小问题(可忽略)

  • tracing_ops_test.ccDummySpan::tags/logsinline static,跨用例不清空(当前只有一个用例写入,无影响)。
  • integration_test.ccreported_spansSleepFor 后无锁读取(测试场景 benign)。

已核对正确:无残留 opentracing 代码引用(仅注释);ReportViaDpc move 走 span_FLARE_CHECK(!Tracing()) 成立;非采样路径 span_ = nullptr 不再隐式 End()(符合 OTel 语义);FlushBufferedOpsscratch 生命周期覆盖 Set*/AddEvent;IsStandardTag 正确移除 span.kind;bazel patch_cmds-mindepth 2 删嵌套 BUILD 的思路正确;LICENSE/README/.bazelignore 归属更新干净。

chen3feng and others added 3 commits June 19, 2026 23:49
The TracingOpsProvider interface uses OpenTelemetry API types
(nostd::shared_ptr<Span>, nostd::string_view, AttributeValue) directly.
The nostd:: backing is a compile-time choice (bundled / std via
OPENTELEMETRY_STL_VERSION / abseil) and the three are mutually
incompatible types, so a provider must be built with the same
OpenTelemetry configuration as flare. flare leaves the default nostd
in place.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prevents accidentally re-committing .claude/settings*.json (cf. review
comments on PR #201).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- forcibly_sampled now routes `flare.sampling_priority` through the
  provider's `SetFrameworkTag` translation (before `IsSampled`) instead of
  a raw `SetAttribute`, keeping the `flare.`-prefixed tag consistent with
  every other framework tag and letting providers honor the force-sample
  signal. Add `ext::kSamplingPriority`.
- An empty log key (single-arg `AddTracingLog`) now materializes as the
  default event name "log" so backends that group/display by event name
  don't render a blank one; document the per-log-event mapping.
- Guard `QuickerValue`'s reliance on P0608R3 (C++20) with a static_assert
  so a future standard downgrade can't silently bind `const char*` to bool.
- Reset `DummySpan`'s inline-static tags/logs at the start of the recording
  test so assertions can't observe leftovers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chen3feng chen3feng merged commit 92a08ae into master Jun 19, 2026
14 of 15 checks passed
@chen3feng chen3feng deleted the tracing-opentelemetry branch June 19, 2026 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant