Validate agent-based LLMObs meta_struct export [python@0829a33d95872059d52363b6472a5f7735c6f30d]#7047

Draft
mabdinur wants to merge 2 commits into main from munir/test-llmo-system-test-changes

@mabdinur mabdinur commented May 29, 2026

Summary

Validation-only draft PR. Not for merge. The goal is purely to run the full system-tests CI and confirm it passes against:

  • dd-trace-py PR #18254 (munir/agentbased-llmo, commit 0829a33) — agent-based LLMObs export, where LLMObs payloads ride APM traces via meta_struct["_llmobs"] for kept traces.
  • dd-apm-test-agent PR #370 — extracts/synthesizes EVP LLMObs requests from meta_struct, published as the dev image ddapm-test-agent:dev-llmobs-meta-struct.
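
To make the meta_struct path concrete, here is a minimal sketch of pulling LLMObs payloads out of APM spans. It is not the dd-apm-test-agent implementation: the function name, the span dict layout, and the `kind` field in the sample payload are assumptions for illustration, and the traces are assumed to be already decoded into plain Python dicts. Only the `meta_struct["_llmobs"]` key comes from the description above.

```python
from typing import Any

LLMOBS_KEY = "_llmobs"  # key named in this PR description


def extract_llmobs_events(traces: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Collect the `_llmobs` entry of every span that carries one.

    Hypothetical helper: assumes each trace is a list of span dicts and that
    a span may hold an optional `meta_struct` mapping.
    """
    events = []
    for trace in traces:
        for span in trace:
            meta_struct = span.get("meta_struct") or {}
            payload = meta_struct.get(LLMOBS_KEY)
            if payload is not None:
                events.append({"span_id": span.get("span_id"), "payload": payload})
    return events


# Sample data is invented for the sketch; real payloads will look different.
sample_traces = [
    [
        {"span_id": 1, "name": "openai.request", "meta_struct": {"_llmobs": {"kind": "llm"}}},
        {"span_id": 2, "name": "http.request"},
    ],
    [{"span_id": 3, "name": "worker", "meta_struct": {}}],
]
events = extract_llmobs_events(sample_traces)
print(events)
```

Spans without a `meta_struct` entry, or with one that lacks `_llmobs`, are skipped, which mirrors the idea that only spans carrying LLMObs data produce synthesized EVP requests.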

Changes

Bump the test agent to the dev image (dev-llmobs-meta-struct) across all LLM-adjacent surfaces; other scenarios stay on the released image:

  • INTEGRATION_FRAMEWORKS — anthropic / openai / google_genai LLMObs tests.
  • PARAMETRIC — covers tests/parametric/test_llm_observability/.
  • VCRCassettesContainer — backs INTEGRATION_FRAMEWORKS, AI_GUARD, and AI_GUARD_TELEMETRY.

How the tracer build is selected

The [python@0829a33d95872059d52363b6472a5f7735c6f30d] marker in the PR title puts Python into system-tests dev mode. dd-trace-py no longer publishes wheels keyed by dev-branch name, but per-commit wheels are still available in S3, so the marker pins the exact commit SHA of the PR head (load-binary.sh python -> python-load-from-s3 -> prebuilt wheel from dd-trace-py-builds).

Note: S3 artifacts age out after ~2 weeks. If the build later fails to find the wheel, re-run the dd-trace-py GitLab CI for the commit (or update the SHA to a fresher commit).

Expected CI behavior

  • The Fail if target branch is specified job is expected to FAIL by design — it is a merge-guard that fires whenever a [lang@...] marker is present. This is not a test failure.
  • All actual system-tests jobs (LLMObs integration + parametric + AI Guard) should pass.
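
The merge-guard behavior can be pictured with a small sketch like the one below. It is an assumption about how such a check could work, not the actual job definition: the pattern, the function name, and the exit-code convention are all hypothetical.

```python
import re

# Hypothetical pattern for a "[lang@ref]" marker in a PR title.
MARKER_RE = re.compile(r"\[[a-z_]+@[^\]\s]+\]")


def merge_guard_exit_code(title: str) -> int:
    """Return 1 (fail) when the title pins a dev build, else 0 (pass)."""
    return 1 if MARKER_RE.search(title) else 0


pinned = merge_guard_exit_code(
    "Validate agent-based LLMObs meta_struct export "
    "[python@0829a33d95872059d52363b6472a5f7735c6f30d]"
)
clean = merge_guard_exit_code("Validate agent-based LLMObs meta_struct export")
print(pinned, clean)
```

Under this reading, the guard fails for as long as the marker stays in the title, independent of any test result.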

Test plan

  • INTEGRATION_FRAMEWORKS LLMObs suites pass in CI against the tracer commit + dev test agent
  • PARAMETRIC LLMObs tests pass
  • AI_GUARD / AI_GUARD_TELEMETRY pass
  • No unexpected regressions in other scenarios

Point the integration_frameworks scenario at the dev-tagged
dd-apm-test-agent image (dev-llmobs-meta-struct) that synthesizes EVP
LLMObs requests from APM trace meta_struct["_llmobs"], so the suite can
validate dd-trace-py's agent-based LLMObs export (PR #18254).

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions Bot commented May 29, 2026

CODEOWNERS have been resolved as:

utils/_context/_scenarios/integration_frameworks.py                     @DataDog/system-tests-core
utils/_context/_scenarios/parametric.py                                 @DataDog/system-tests-core
utils/_context/containers.py                                            @DataDog/system-tests-core

datadog-datadog-prod-us1 Bot commented May 29, 2026

⚠️ Warnings

🚦 15 Pipeline jobs failed

Testing the test | System Tests (golang, dev) / parametric / parametric (1) (GitHub Actions)

🔧 Fix in code. 2 failed assertions in test_distinct_aggregationkeys_TS003: expected 1 bucket containing stats but found 2 buckets instead.

🧪 1 Test failed

tests.parametric.test_library_tracestats.Test_Library_Tracestats.test_distinct_aggregationkeys_TS003[agent_env0-library_env0, parametric-golang] from system_tests_suite
AssertionError: There should be one bucket containing the stats
assert 2 == 1
 +  where 2 = len([{'Duration': 10000000000, 'Start': 1780085010000000000, 'Stats': [{'Duration': 3851002, 'ErrorSummary': store: {}}, m...key:0, offset:0, zero_count: 0.0, count: 0.0, sum: 0.0, min: inf, max: -inf, 'Errors': 0, 'GRPCStatusCode': '', ...}]}])

self = <tests.parametric.test_library_tracestats.Test_Library_Tracestats object at 0x7f8b0cb13440>
test_agent = <utils.docker_fixtures._test_agent.TestAgentAPI object at 0x7f8b0c547c20>
test_library = <utils.docker_fixtures._test_clients._test_client_parametric.ParametricTestClientApi object at 0x7f8adaa6b3b0>

    @enable_tracestats()
    @enable_agent_version()
...

DataDog/system-tests | Ubuntu_20_amd64.HOS: [test-app-php] (GitLab)

🔄 Retry job. This looks flaky and may succeed on retry. Exception launching AWS provision step remote command. AssertionError: Previous errors in the virtual machine provisioning steps.

Testing the test | Fail if target branch is specified (GitHub Actions)

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. This PR can't be merged due to the title specifying a target branch.

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🔗 Commit SHA: fd32dcb

Extend the dev-tagged dd-apm-test-agent (dev-llmobs-meta-struct) to the
remaining LLM-adjacent surfaces so the draft validates no regressions:

- PARAMETRIC: exercises tests/parametric/test_llm_observability.
- VCRCassettesContainer: backs INTEGRATION_FRAMEWORKS plus the AI_GUARD
  and AI_GUARD_TELEMETRY scenarios.

Non-LLM references (APMTestAgentContainer used by DOCKER_SSI, and the
k8s lib-injection test agent) are intentionally left on their pinned
versions.

Co-authored-by: Cursor <cursoragent@cursor.com>
@mabdinur mabdinur changed the title from "Validate agent-based LLMObs meta_struct export [python@munir/agentbased-llmo]" to "Validate agent-based LLMObs meta_struct export [python@0829a33d95872059d52363b6472a5f7735c6f30d]" May 29, 2026
@mabdinur mabdinur closed this May 29, 2026
@mabdinur mabdinur reopened this May 29, 2026