
Add OpenTelemetry instrumentation for LiteLLM#88

Open
whoIam0987 wants to merge 3 commits into alibaba:main from whoIam0987:mingzhi/litellm

Conversation


@whoIam0987 whoIam0987 commented Dec 15, 2025

Description

Instrument LiteLLM with the GenAI util.
Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Modified related unit tests.

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for the style guide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

CLAassistant commented Dec 15, 2025

CLA assistant check
All committers have signed the CLA.

@whoIam0987 whoIam0987 changed the title [WIP] Add OpenTelemetry instrumentation for LiteLLM Add OpenTelemetry instrumentation for LiteLLM Feb 5, 2026
@ralf0131 ralf0131 requested a review from Copilot February 5, 2026 13:50
Copilot AI (Contributor) left a comment

Pull request overview

This pull request adds OpenTelemetry instrumentation for the LiteLLM library, which provides a unified interface to 100+ LLM providers. The instrumentation captures telemetry data for LLM operations including completions, embeddings, streaming, tool calls, and retry mechanisms.

Changes:

  • Added new instrumentation package for LiteLLM with support for sync/async completion and embedding APIs
  • Implemented streaming response wrappers with proper span lifecycle management
  • Added comprehensive test suite covering various LiteLLM features including tool calls, retries, and error handling

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 19 comments.

__init__.py: Main instrumentor class that wraps LiteLLM functions and manages telemetry handlers
_wrapper.py: Completion wrappers for sync/async calls with streaming support
_embedding_wrapper.py: Embedding operation wrappers for sync/async calls
_stream_wrapper.py: Stream response wrappers handling chunk accumulation and finalization
_utils.py: Utility functions for message conversion, provider parsing, and invocation creation
version.py, package.py: Package metadata and dependency declarations
pyproject.toml: Package configuration with build system and dependencies
README.rst: Documentation for installation, configuration, and usage
test_*.py: Comprehensive test suite covering completions, embeddings, streaming, tools, retries, and errors
test-requirements.txt: Test dependency specifications


Comment on lines 64 to 65

# Create invocation object
Copilot AI Feb 5, 2026

Duplicate comment. The comment "Create invocation object" appears twice consecutively on lines 63 and 65. Remove one of these duplicate lines.

Suggested change
# Create invocation object

whoIam0987 (Author)

Suggestion adopted.

Comment on lines 165 to 166

# Create invocation object
Copilot AI Feb 5, 2026

Duplicate comment. The comment "Create invocation object" appears twice consecutively on lines 164 and 166. Remove one of these duplicate lines.

Suggested change
# Create invocation object

whoIam0987 (Author)

Suggestion adopted.

Comment on lines 86 to 87

# For streaming, we need special handling
Copilot AI Feb 5, 2026

Duplicate comment. The comment "For streaming, we need special handling" appears twice consecutively on lines 85 and 87. Remove one of these duplicate lines.

Suggested change
# For streaming, we need special handling

whoIam0987 (Author)

Suggestion adopted.

Comment on lines 360 to 368
stream_wrapper = AsyncStreamWrapper(
    stream=response,
    span=invocation.span,  # For TTFT tracking
    callback=lambda span,
    last_chunk,
    error: self._handle_stream_end_with_handler(
        invocation, last_chunk, error, stream_wrapper
    ),
)
Copilot AI Feb 5, 2026

Potential circular reference in lambda callback. The lambda function references stream_wrapper (line 366) before it's assigned (line 360). While Python closures can handle this due to lazy evaluation, this creates a circular reference that could potentially cause issues with garbage collection. Consider restructuring to avoid the circular reference, perhaps by creating the callback after the AsyncStreamWrapper is instantiated or using a different callback pattern.

whoIam0987 (Author)

Set the callback after instantiating the stream_wrapper.
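That fix can be sketched as follows. `AsyncStreamWrapper` here is a minimal stand-in (illustrative only; the real wrapper also tracks TTFT and accumulates chunks): the wrapper is constructed first and the callback attached afterwards, so the lambda never closes over a name that is still unassigned at construction time.

```python
class AsyncStreamWrapper:
    """Minimal stand-in for the PR's stream wrapper (illustrative only)."""

    def __init__(self, stream, span, callback=None):
        self.stream = stream
        self.span = span
        self.callback = callback  # may be attached after construction

    def finish(self, last_chunk=None, error=None):
        # Called once the stream is exhausted or fails.
        if self.callback is not None:
            self.callback(self.span, last_chunk, error)


events = []
# Construct first...
stream_wrapper = AsyncStreamWrapper(stream=iter(()), span="chat qwen-max")
# ...then attach the callback, which can now safely reference stream_wrapper.
stream_wrapper.callback = lambda span, last_chunk, error: events.append(
    (span, last_chunk, error)
)
stream_wrapper.finish(last_chunk="final-chunk")
print(events)  # → [('chat qwen-max', 'final-chunk', None)]
```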

Comment on lines 33 to 34
* ``ENABLE_LITELLM_INSTRUMENTOR``: Enable/disable instrumentation (default: true)
* ``ARMS_LITELLM_INSTRUMENTATION_ENABLED``: Alternative enable/disable flag (default: true)
Copilot AI Feb 5, 2026

The documentation claims that the ENABLE_LITELLM_INSTRUMENTOR environment variable can be used to enable/disable instrumentation, but this variable is never referenced in the actual code. Only ARMS_LITELLM_INSTRUMENTATION_ENABLED is actually implemented. Either implement support for this environment variable or remove it from the documentation to avoid confusion.

Suggested change
* ``ENABLE_LITELLM_INSTRUMENTOR``: Enable/disable instrumentation (default: true)
* ``ARMS_LITELLM_INSTRUMENTATION_ENABLED``: Alternative enable/disable flag (default: true)
* ``ARMS_LITELLM_INSTRUMENTATION_ENABLED``: Enable/disable instrumentation (default: true)

whoIam0987 (Author)

All occurrences of ARMS_LITELLM_INSTRUMENTATION_ENABLED have been replaced with ENABLE_LITELLM_INSTRUMENTOR.

suppress_token = context.attach(
    context.set_value(SUPPRESS_LLM_SDK_KEY, True)
)
except Exception:
Copilot AI Feb 5, 2026

'except' clause does nothing but pass and there is no explanatory comment.

whoIam0987 (Author)

Added an explanatory comment.

Comment on lines 244 to 245
except Exception:
    pass
Copilot AI Feb 5, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except Exception:
    pass
except Exception as decode_error:
    # Ignore JSON parsing errors and fall back to the raw string,
    # but log at debug level for diagnosability.
    logger.debug(
        "Failed to JSON-decode tool call arguments %r: %s",
        arguments,
        decode_error,
    )

whoIam0987 (Author)

Added an explanatory comment.

Comment on lines 305 to 306
except Exception:
    pass
Copilot AI Feb 5, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except Exception:
    pass
except Exception as handler_error:
    # Swallow exceptions from telemetry failure reporting, but log them for diagnostics.
    logger.debug(
        "Error while reporting LLM failure in _handle_stream_end_with_handler: %s",
        handler_error,
    )

whoIam0987 (Author)

Suggestion adopted.

suppress_token = context.attach(
    context.set_value(SUPPRESS_LLM_SDK_KEY, True)
)
except Exception:
Copilot AI Feb 5, 2026

'except' clause does nothing but pass and there is no explanatory comment.

whoIam0987 (Author)

Added an explanatory comment.

    context.set_value(SUPPRESS_LLM_SDK_KEY, True)
)
except Exception:
    pass
Copilot AI Feb 5, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
pass
# Failed to attach suppression context; proceed without suppression.
logger.exception(
    "Failed to attach suppression context for LiteLLM instrumentation"
)

whoIam0987 (Author)

Added an explanatory comment.

@Cirilla-zmh Cirilla-zmh (Collaborator) left a comment

Thanks for the contribution! Please address the remaining comments so we can move this PR forward.

@@ -0,0 +1,8 @@
litellm>=1.0.0
Collaborator

Could you please add these tests as GitHub workflows?

Just refer to:

- Testing
- When adding a new instrumentation, remember to update `tox.ini`: add the appropriate rules to the `envlist`, `command_pre`, and `commands` sections

## Running tests locally
1. Go to your Python Agent repository directory: `git clone git@github.com:alibaba/loongsuite-python-agent.git && cd loongsuite-python-agent`
2. Make sure `tox` is installed: `pip install tox`
3. Run `tox` with no arguments to run the tests for all packages. Read more about [tox](https://tox.readthedocs.io/en/latest/).
Some tests can be slow because of the pre-steps that install dependencies. To help with this, you can run tox once and then run the tests with the dependencies previously installed in the tox dir, as follows:
1. First run (e.g., opentelemetry-instrumentation-aiopg)
```console
tox -e py312-test-instrumentation-aiopg
```
2. Run the tests again without the pre-steps:
```console
.tox/py312-test-instrumentation-aiopg/bin/pytest instrumentation/opentelemetry-instrumentation-aiopg
```

"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
Collaborator

We no longer support Python 3.8 in loongsuite.

pytest
pytest-asyncio
openai
-e aliyun-semantic-conventions
Collaborator

I believe this doesn't work here. Could you check again?

    Test handling when max_tokens is exceeded.
    """

    os.environ["DASHSCOPE_API_KEY"] = os.environ.get(
Collaborator

pytest.vcr can help record LLM-related requests and responses so that you can replay them the next time you run the same tests.

Please check:

# ==================== VCR Configuration ====================
@pytest.fixture(scope="module")
def vcr_config():
    """Configure VCR for recording and replaying HTTP requests"""
    return {
        "filter_headers": [
            ("authorization", "Bearer test_api_key"),
            ("api-key", "test_api_key"),
        ],
        "decode_compressed_response": True,
        "before_record_response": scrub_response_headers,
    }


class LiteralBlockScalar(str):
    """Format string as literal block scalar, preserving whitespace and not interpreting escape characters"""


def literal_block_scalar_presenter(dumper, data):
    """Represent scalar string as literal block using '|' syntax"""
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralBlockScalar, literal_block_scalar_presenter)


def process_string_value(string_value):
    """Format JSON or return long string as LiteralBlockScalar"""
    try:
        json_data = json.loads(string_value)
        return LiteralBlockScalar(json.dumps(json_data, indent=2))
    except (ValueError, TypeError):
        if len(string_value) > 80:
            return LiteralBlockScalar(string_value)
    return string_value


def convert_body_to_literal(data):
    """Search for body strings in data and attempt to format JSON"""
    if isinstance(data, dict):
        for key, value in data.items():
            # Handle response body case (e.g., response.body.string)
            if key == "body" and isinstance(value, dict) and "string" in value:
                value["string"] = process_string_value(value["string"])
            # Handle request body case (e.g., request.body)
            elif key == "body" and isinstance(value, str):
                data[key] = process_string_value(value)
            else:
                convert_body_to_literal(value)
    elif isinstance(data, list):
        for idx, choice in enumerate(data):
            data[idx] = convert_body_to_literal(choice)
    return data


class PrettyPrintJSONBody:
    """Make request and response body recordings more readable"""

    @staticmethod
    def serialize(cassette_dict):
        cassette_dict = convert_body_to_literal(cassette_dict)
        return yaml.dump(
            cassette_dict, default_flow_style=False, allow_unicode=True
        )

    @staticmethod
    def deserialize(cassette_string):
        return yaml.load(cassette_string, Loader=yaml.Loader)


@pytest.fixture(scope="module", autouse=True)
def fixture_vcr(vcr):
    """Register VCR serializer"""
    vcr.register_serializer("yaml", PrettyPrintJSONBody)
    return vcr


def scrub_response_headers(response):
    """
    Scrub sensitive response headers. Note they are case-sensitive!
    """
    # Clean response headers as needed
    if "Set-Cookie" in response["headers"]:
        response["headers"]["Set-Cookie"] = "test_set_cookie"
    return response

@pytest.mark.vcr()
def test_model_call_basic(instrument_no_content, span_exporter, request):
    """Test basic model call"""
    # Initialize agentscope
    agentscope.init(project="test_basic")
    # Create model
    model = DashScopeChatModel(
        api_key=request.config.option.api_key,
        model_name="qwen-max",
    )
    # Prepare messages
    messages = [{"role": "user", "content": "Hello!"}]

    # Call model
    async def call_model():
        response = await model(messages)
        if hasattr(response, "__aiter__"):
            result = []
            async for chunk in response:
                result.append(chunk)
            return result[-1] if result else response
        return response

    response = asyncio.run(call_model())
    assert response is not None
    # Verify spans
    spans = span_exporter.get_finished_spans()
    assert len(spans) >= 1, f"Expected at least 1 span, got {len(spans)}"
    # Find chat model span
    chat_spans = [span for span in spans if span.name.startswith("chat ")]
    assert len(chat_spans) >= 1, (
        f"No chat spans found. Available spans: {[s.name for s in spans]}"
    )
    # Verify span attributes
    chat_span = chat_spans[0]
    _assert_chat_span_attributes(
        chat_span,
        request_model="qwen-max",
        expect_input_messages=False,  # Do not capture content by default
        expect_output_messages=False,  # Do not capture content by default
        expect_time_to_first_token=True,
    )
    print("✓ Model call (basic) completed successfully")


5 participants