fix: add retry with exponential backoff to weather MCP tool by Ladas · Pull Request #222 · kagenti/agent-examples

Ladas · 2026-04-08T14:28:01Z

Summary

Add requests.Session with urllib3.Retry strategy (3 retries, exponential backoff)
Retry automatically on 429, 500, 502, 503, 504 status codes
Increase timeout from 10s to 30s (configurable via WEATHER_REQUEST_TIMEOUT)
Catch RequestException and return user-friendly message instead of crashing
All params configurable: WEATHER_MAX_RETRIES, WEATHER_BACKOFF_FACTOR, WEATHER_REQUEST_TIMEOUT

Context

The weather tool calls external Open-Meteo APIs that frequently timeout from CI runners, causing E2E test failures across ALL kagenti PRs. The tool had no error handling — any failure crashed the entire tool call.

The weather tool calls external Open-Meteo APIs that can timeout or return transient errors in CI environments. This caused E2E test failures across all kagenti PRs. Changes: - Use requests.Session with urllib3 Retry strategy (3 retries, exponential backoff, retry on 429/5xx) - Increase timeout from 10s to 30s (configurable via WEATHER_REQUEST_TIMEOUT) - Catch RequestException and return user-friendly error instead of crashing — lets the LLM retry the tool call gracefully - All retry params configurable via env vars (WEATHER_MAX_RETRIES, WEATHER_BACKOFF_FACTOR) Signed-off-by: Ladislav Smola <lsmola@redhat.com>

ChatOpenAI with default settings crashes on MaaS 503 responses (rate limit / service overload). Add max_retries=5 for automatic retry with backoff, and request_timeout=60 for slow MaaS endpoints. Signed-off-by: Ladislav Smola <lsmola@redhat.com>

pdettori

Good fix addressing a real CI reliability problem. The retry strategy and error handling are well-implemented — proper use of urllib3.Retry, sensible status code list, and user-friendly error messages that let the LLM retry gracefully.

One must-fix: unused import time is failing lint CI. Everything else is suggestions.

Areas reviewed: Python, Security, Commit conventions
Commits: 2, both signed-off ✓
CI: lint failing (unused import), all other checks passing

pdettori · 2026-04-09T16:12:28Z

mcp/weather_tool/weather_tool.py

 import logging
 import os
 import sys
+import time


must-fix: import time is unused — this is causing the lint CI failure. Remove this line.

pdettori · 2026-04-09T16:12:28Z

mcp/weather_tool/weather_tool.py

+    return session
+

 @mcp.tool(annotations={"readOnlyHint": True, "destructiveHint": False, "idempotentHint": True})


suggestion: _resilient_session() creates a new Session on every tool call. Since the retry config uses module-level constants, consider creating a single module-level session (_session = _resilient_session()) and reusing it. Avoids re-mounting adapters per call and requests.Session is thread-safe. Not blocking — current approach works correctly.

pdettori · 2026-04-09T16:12:28Z

a2a/weather_service/src/weather_service/graph.py

        openai_api_key=config.llm_api_key,
        openai_api_base=config.llm_api_base,
        temperature=0,
+        max_retries=5,


nit: max_retries=5 and request_timeout=60 are hardcoded. The weather tool side uses env vars for configurability. Consider the same pattern here for consistency, or add a brief comment explaining the values (e.g., "MaaS endpoints can be slow under load").

The has_valid_api_key check only allowed localhost but Kind clusters use 'dockerhost' for Ollama, and cluster-internal LiteLLM proxies use *.svc.cluster.local addresses. Both accept any API key. Also adds host.docker.internal for Docker Desktop compatibility. Signed-off-by: Ladislav Smola <lsmola@redhat.com>

…retry - Remove unused `import time` (lint failure) - Make HTTP session a module-level singleton for connection reuse - Make LLM max_retries and request_timeout configurable via env vars (LLM_MAX_RETRIES, LLM_REQUEST_TIMEOUT) - Accept dummy API keys for dockerhost and *.svc.cluster.local Signed-off-by: Ladislav Smola <lsmola@redhat.com>

Signed-off-by: Ladislav Smola <lsmola@redhat.com>

pdettori

LGTM

Ladas mentioned this pull request Apr 8, 2026

feat: support AGENT_EXAMPLES_BRANCH override for weather tool builds kagenti/kagenti#1173

Open

pdettori requested changes Apr 9, 2026

View reviewed changes

Ladas added 3 commits April 9, 2026 19:21

style: apply ruff-format to weather_tool.py

e3234aa

Signed-off-by: Ladislav Smola <lsmola@redhat.com>

pdettori approved these changes Apr 9, 2026

View reviewed changes

pdettori merged commit e194691 into main Apr 9, 2026
10 checks passed

clawgenti mentioned this pull request Apr 13, 2026

Weekly Report 2026-04-13 kagenti/kagenti#1231

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add retry with exponential backoff to weather MCP tool#222

fix: add retry with exponential backoff to weather MCP tool#222
pdettori merged 5 commits intomainfrom
fix/weather-tool-resilience

Ladas commented Apr 8, 2026

Uh oh!

pdettori left a comment

Uh oh!

pdettori Apr 9, 2026

Uh oh!

pdettori Apr 9, 2026

Uh oh!

pdettori Apr 9, 2026

Uh oh!

pdettori left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return session


		@mcp.tool(annotations={"readOnlyHint": True, "destructiveHint": False, "idempotentHint": True})

Conversation

Ladas commented Apr 8, 2026

Summary

Context

Uh oh!

pdettori left a comment

Choose a reason for hiding this comment

Uh oh!

pdettori Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

pdettori Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

pdettori Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

pdettori left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants