Adding core OTEL layer with accompanying sample and tests #342

Closed
rodrigobr-msft wants to merge 3 commits into main from users/robrandao/otel-core

Conversation

rodrigobr-msft commented Mar 26, 2026

This pull request introduces a new telemetry subsystem for the microsoft-agents-hosting-core package, providing a structured and maintainable way to instrument the codebase with OpenTelemetry spans and metrics. The design centralizes telemetry logic, which makes it easier to manage and extend and avoids scattering telemetry code throughout the application. It also adds the OpenTelemetry packages as dependencies and reformats ApplicationError raising for readability.

Key changes include:

Telemetry Subsystem Implementation

  • Added a new telemetry module, including core components like agents_telemetry, span wrappers (BaseSpanWrapper, SimpleSpanWrapper), attribute definitions, resource configuration, and utility functions for extracting telemetry-relevant data. This subsystem enables structured, consistent telemetry instrumentation throughout the codebase. [1] [2] [3] [4] [5] [6] [7] [8] [9]

  • Added a design note in telemetry/__init__.py explaining the rationale for centralizing telemetry logic and not auto-loading the module to avoid unnecessary overhead.

Dependency Management

  • Updated setup.py to add opentelemetry-api and opentelemetry-sdk as required dependencies for telemetry support.

Error Handling Improvements

  • Refactored error raising in agent_application.py to use consistent multi-line string formatting with ApplicationError, improving readability and maintainability. [1] [2] [3] [4] [5] [6]

Sample/Test Configuration

  • Added a sample .env template in test_samples/otel/env.TEMPLATE for configuring OpenTelemetry exporters and related environment variables, facilitating local testing and deployment of telemetry features.
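The contents of env.TEMPLATE are not shown in this conversation. As a rough illustration of how such a template is typically consumed, the sketch below reads three standard OpenTelemetry environment variables (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL). The helper name read_otel_settings and the fallback values are assumptions for illustration, not part of the PR.

```python
import os
from typing import Dict, Mapping, Optional


def read_otel_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect exporter settings from standard OTEL_* environment variables.

    Hypothetical helper; the PR's sample may wire its configuration differently.
    """
    env = os.environ if env is None else env
    return {
        # Fallback values below are assumptions chosen for this sketch.
        "service_name": env.get("OTEL_SERVICE_NAME", "unknown_service"),
        "endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        "protocol": env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc"),
    }
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise in tests without mutating the process environment.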

Copilot AI review requested due to automatic review settings March 26, 2026 18:32

Copilot AI left a comment

Pull request overview

Introduces a first-class OpenTelemetry (OTEL) telemetry subsystem in microsoft-agents-hosting-core, along with a sample app and test suite to validate spans/metrics behavior, and minor refactors to ApplicationError raising patterns for readability.

Changes:

  • Added microsoft_agents.hosting.core.telemetry (core tracer/meter access, span-wrapper abstractions, resource metadata, and small attribute/utils helpers).
  • Added OTEL-focused tests (span wrapper behavior, tracer/meter initialization, metric delta reader) and shared test fixtures/utilities.
  • Added an OTEL sample (test_samples/otel) and registered OTEL deps in hosting-core packaging.

Reviewed changes

Copilot reviewed 25 out of 28 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/hosting_core/telemetry/test_utils.py | Unit tests for telemetry utility helpers (scopes, conversation id, delivery mode). |
| tests/hosting_core/telemetry/test_simple_span_wrapper.py | Tests for span wrapper lifecycle, attributes, status, and exception behavior. |
| tests/hosting_core/telemetry/test_agents_telemetry.py | Tests for tracer/meter initialization and span callback behavior. |
| tests/hosting_core/telemetry/__init__.py | Telemetry test package marker. |
| tests/_common/telemetry_utils.py | Helper functions to locate/sum metrics in collected output. |
| tests/_common/fixtures/telemetry.py | In-memory OTEL exporter/metric reader fixtures + delta metric reader wrapper. |
| tests/_common/_tests/test_delta_metric_reader.py | Unit tests validating DeltaMetricReader semantics. |
| test_samples/otel/start_dashboard.ps1 | Helper script to launch an OTEL dashboard container. |
| test_samples/otel/src/telemetry.py | Sample OTEL provider configuration and library instrumentation hooks. |
| test_samples/otel/src/start_server.py | Sample aiohttp server startup wiring for an agent endpoint. |
| test_samples/otel/src/main.py | Sample entrypoint wiring telemetry + agent + server. |
| test_samples/otel/src/get_user_info.py | Sample Graph call used by the OTEL demo agent. |
| test_samples/otel/src/card.py | Sample adaptive card rendering helper. |
| test_samples/otel/src/agent.py | Sample agent application demonstrating auth + basic messaging flows. |
| test_samples/otel/src/__init__.py | Sample package marker. |
| test_samples/otel/requirements.txt | Sample runtime dependencies including OTEL instrumentations/exporters. |
| test_samples/otel/env.TEMPLATE | Sample environment configuration template including OTEL env vars. |
| libraries/microsoft-agents-hosting-core/setup.py | Adds opentelemetry-api / opentelemetry-sdk as hosting-core dependencies. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/utils.py | Telemetry utility helpers for extracting/formatting common activity values. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/core/type_defs.py | Shared callback/attribute type aliases for telemetry layer. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/core/simple_span_wrapper.py | Simple span wrapper implementation built atop the base wrapper + agents telemetry. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/core/resource.py | Defines OTEL Resource and service identity constants. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/core/base_span_wrapper.py | Base span wrapper lifecycle abstraction (context manager + manual start/end). |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/core/_agents_telemetry.py | Tracer/meter access and a timed span context manager with callback support. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/core/__init__.py | Core telemetry public surface exports. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/attributes.py | Telemetry attribute key constants + UNKNOWN value. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/telemetry/__init__.py | Telemetry package exports and design note header. |
| libraries/microsoft-agents-hosting-core/microsoft_agents/hosting/core/app/agent_application.py | Refactors ApplicationError raising blocks into a consistent multiline style. |
| libraries/microsoft-agents-activity/microsoft_agents/activity/otel/__init__.py | Adds initial instrumentor scaffold for activity package OTEL integration. |
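The DeltaMetricReader fixture listed above is not shown in this conversation. The general idea of a delta reader, reporting only the change in a cumulative counter since the previous read, can be sketched as follows. The class name DeltaTracker and its interface are hypothetical and independent of the PR's implementation.

```python
from typing import Dict


class DeltaTracker:
    """Turn cumulative counter readings into per-read deltas (hypothetical sketch)."""

    def __init__(self) -> None:
        # Last cumulative value seen for each metric name.
        self._last: Dict[str, float] = {}

    def read(self, cumulative: Dict[str, float]) -> Dict[str, float]:
        """Return the change in each metric since the previous call to read()."""
        deltas = {
            name: value - self._last.get(name, 0.0)
            for name, value in cumulative.items()
        }
        # Remember the new totals so the next read starts from here.
        self._last.update(cumulative)
        return deltas
```

In a test suite, this kind of wrapper lets each assertion see only the measurements recorded by the step under test rather than totals accumulated by earlier steps.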

Comment on lines +5 to +10

from opentelemetry.sdk.resources import Resource

SERVICE_NAME = "microsoft_agents"
SERVICE_VERSION = "1.0.0"

Copilot AI Mar 26, 2026

SERVICE_VERSION is hard-coded to "1.0.0", which will drift from the actual installed package version and make traces/metrics misleading. Consider deriving it from distribution metadata (e.g., importlib.metadata.version("microsoft-agents-hosting-core") with a safe fallback), similar to connector/get_product_info.py using importlib.metadata.version(...).

Suggested change
-from opentelemetry.sdk.resources import Resource
-SERVICE_NAME = "microsoft_agents"
-SERVICE_VERSION = "1.0.0"
+from importlib import metadata as importlib_metadata
+from opentelemetry.sdk.resources import Resource
+SERVICE_NAME = "microsoft_agents"
+def _get_service_version() -> str:
+    """Return the installed package version, or a safe fallback."""
+    try:
+        return importlib_metadata.version("microsoft-agents-hosting-core")
+    except importlib_metadata.PackageNotFoundError:
+        # Package metadata not available (e.g., editable install or non-standard env).
+        return "unknown"
+    except Exception:
+        # Any other unexpected error: do not break telemetry initialization.
+        return "unknown"
+SERVICE_VERSION = _get_service_version()

# This design hides the "mess" of telemetry to one location rather than throughout the codebase.
#
# NOTE: this module should not be auto-loaded from __init__.py in order to avoid

Copilot AI Mar 26, 2026

The module header note is incomplete (...in order to avoid) and doesn’t explain what should be avoided. Please complete or remove the comment so future readers aren’t left with a dangling design note.

Suggested change
# unintended side effects during import and to keep telemetry initialization explicit.

Comment on lines +37 to +40
try:
    run_app(APP, host="localhost", port=environ.get("PORT", 3978))
except Exception as error:
    raise error

Copilot AI Mar 26, 2026

run_app(..., port=environ.get("PORT", 3978)) will pass a string when PORT is set, but aiohttp expects an int port. Cast the env var to int (and handle invalid values) to avoid runtime failures. Also the surrounding try/except that immediately re-raises (raise error) is redundant and can be dropped (or use bare raise if you intend to add logging).

Suggested change
-try:
-    run_app(APP, host="localhost", port=environ.get("PORT", 3978))
-except Exception as error:
-    raise error
+port_env = environ.get("PORT")
+port = 3978
+if port_env is not None:
+    try:
+        port = int(port_env)
+    except ValueError:
+        logger.warning(
+            "Invalid PORT environment variable %r; falling back to default port %d",
+            port_env,
+            port,
+        )
+run_app(APP, host="localhost", port=port)
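
The fallback in the suggestion above could also be factored into a small function so that it can be unit-tested without starting a server. This is only a sketch: resolve_port is a hypothetical name, and the range check is an extra assumption that the suggestion does not include.

```python
import logging
from typing import Mapping

logger = logging.getLogger(__name__)

DEFAULT_PORT = 3978


def resolve_port(env: Mapping[str, str], default: int = DEFAULT_PORT) -> int:
    """Return env["PORT"] as an int, or the default if it is missing or invalid."""
    raw = env.get("PORT")
    if raw is None:
        return default
    try:
        port = int(raw)
    except ValueError:
        logger.warning("Invalid PORT %r; falling back to %d", raw, default)
        return default
    # Extra guard (an assumption of this sketch): reject values outside 1..65535.
    if not 0 < port < 65536:
        logger.warning("PORT %d out of range; falling back to %d", port, default)
        return default
    return port
```

The call site would then reduce to run_app(APP, host="localhost", port=resolve_port(environ)).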

Comment on lines +5 to +14

configure_otel_providers(service_name="quickstart_agent")

from .agent import AGENT_APP, CONNECTION_MANAGER
from .start_server import start_server

start_server(
    agent_application=AGENT_APP,
    auth_configuration=CONNECTION_MANAGER.get_default_connection_configuration(),
)

Copilot AI Mar 26, 2026

configure_otel_providers(...) is executed at import time and the module performs imports after side effects. This makes the sample harder to reuse/import (and violates standard import ordering). Consider moving the setup + server start into a main() function guarded by if __name__ == "__main__": so importing the module doesn’t automatically configure global OTEL providers and start a server.

Suggested change
-configure_otel_providers(service_name="quickstart_agent")
-from .agent import AGENT_APP, CONNECTION_MANAGER
-from .start_server import start_server
-start_server(
-    agent_application=AGENT_APP,
-    auth_configuration=CONNECTION_MANAGER.get_default_connection_configuration(),
-)
+from .agent import AGENT_APP, CONNECTION_MANAGER
+from .start_server import start_server
+
+def main() -> None:
+    configure_otel_providers(service_name="quickstart_agent")
+    start_server(
+        agent_application=AGENT_APP,
+        auth_configuration=CONNECTION_MANAGER.get_default_connection_configuration(),
+    )
+
+if __name__ == "__main__":
+    main()

        assert otel_span is not None

    def test_otel_span_raises_when_not_started(self):
        """Accessing otel_span before start raises RuntimeError."""

Copilot AI Mar 26, 2026

The docstring says otel_span access "raises RuntimeError" when not started, but the assertion checks that it is None. Update the docstring (or the behavior) so they match.
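
The two behaviors this comment contrasts can be sketched with a stand-in class. Everything here is hypothetical, since the real wrapper's internals are not shown in this conversation: one accessor returns None before start, the other raises RuntimeError.

```python
from typing import Optional


class SketchSpanWrapper:
    """Stand-in for a span wrapper; not the PR's SimpleSpanWrapper."""

    def __init__(self) -> None:
        self._span: Optional[object] = None

    def start(self) -> None:
        # A real wrapper would start an OpenTelemetry span here.
        self._span = object()

    @property
    def otel_span(self) -> Optional[object]:
        """Lenient accessor: None until start() has been called."""
        return self._span

    @property
    def otel_span_strict(self) -> object:
        """Strict accessor: raises if start() has not been called."""
        if self._span is None:
            raise RuntimeError("span has not been started")
        return self._span
```

Whichever behavior is chosen, the test name, its docstring, and its assertion should all describe the same one.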

Suggested change
-        """Accessing otel_span before start raises RuntimeError."""
+        """Accessing otel_span before start returns None."""
