Migrate retry_with_backoff to Rust-Majority Architecture #50

@msureshkumar88

Summary

Migrate the retry_with_backoff plugin from its current hybrid architecture (Python integration + Rust fast path) to a Rust-majority architecture (thin Python shim + complete Rust implementation), following the url_reputation pattern.

Current State

Architecture: Hybrid (60% Python + 40% Rust by share of responsibilities; by line count Rust is already the larger part, 433 vs 280 lines)

  • Python integration layer: 280 lines (config, text parsing, metadata, path selection)
  • Rust fast path: 433 lines (retry algorithm only)
  • Performance: 2-4x improvement over Python-only
  • Test coverage: 78% effective (47/60 tests)

Proposed Target State

Architecture: Rust-majority (10% Python + 90% Rust)

  • Python thin shim: ~50 lines (plugin interface, delegation, error handling)
  • Rust core: ~800 lines (complete business logic)
  • Performance: 3-5x improvement over Python-only (25-50% better than the current 2-4x, comparing like ends of the ranges)
  • Test coverage: 100% (60/60 tests)

Architecture Comparison

Current Architecture (Hybrid)

┌─────────────────────────────────────────────────────────────┐
│         Python Integration Layer (280 lines)                 │
│  ✅ Plugin framework interface (tool_post_invoke, etc.)     │
│  ✅ Configuration management (Pydantic, clamping)           │
│  ✅ Text content parsing (check_text_content=True)          │
│  ✅ Metadata attachment (retry_policy)                      │
│  ✅ Path selection (Rust vs Python)                         │
│  ✅ Fallback state management                               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│         RetryStateManager (Rust - 433 lines)                 │
│  ✅ Exponential backoff calculation                         │
│  ✅ Failure detection (structured content only)             │
│  ✅ State tracking (consecutive_failures, TTL)              │
│  ✅ check_and_update() - atomic retry decision              │
└─────────────────────────────────────────────────────────────┘

Split by responsibility: 40% Rust (retry algorithm) + 60% Python (integration, config, parsing)

Target Architecture (Rust-Majority)

┌─────────────────────────────────────────────────────────────┐
│              Python Thin Shim (~50 lines)                    │
│  - Plugin framework interface (tool_post_invoke)            │
│  - Delegates to RetryPluginCore                             │
│  - Error handling wrapper                                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│         RetryPluginCore (Rust - ~800 lines)                  │
│  ✅ Configuration management (validation, clamping)         │
│  ✅ Text content parsing (JSON extraction)                  │
│  ✅ Failure detection (structured + text)                   │
│  ✅ Exponential backoff calculation                         │
│  ✅ State management (TTL, eviction)                        │
│  ✅ Metadata generation (retry_policy dict)                 │
│  ✅ Complete business logic                                 │
└─────────────────────────────────────────────────────────────┘

Split: 90% Rust (complete logic) + 10% Python (thin shim)

Benefits

Performance

  • 3-5x improvement over Python-only (vs current 2-4x)
  • 1.25-1.5x improvement over the current hybrid implementation (implied by 3-5x vs 2-4x over Python-only)
  • Faster JSON parsing (serde_json vs Python json)
  • Faster state management (Rust HashMap vs Python dict)
  • No Python/Rust boundary crossing for hot path

Maintainability

  • Single source of truth in Rust (vs split logic)
  • 82% reduction in Python code (280 → 50 lines)
  • Type-safe implementation with compile-time guarantees
  • Easier to test (pure Rust unit tests)
  • No logic duplication between Python and Rust

Consistency

  • Follows url_reputation pattern (standard plugin architecture)
  • Predictable behavior across all plugins
  • Easier onboarding for new developers

Scalability

  • Better memory efficiency
  • Lower CPU overhead
  • Handles high-throughput scenarios

Migration Strategy

Phase 1: Move Configuration to Rust (Week 1)

Goal: Migrate configuration management from Python (Pydantic) to Rust

Tasks:

  • Create RetryConfig struct with validation and clamping
  • Implement from_dict() constructor for Python interop
  • Add to_metadata_dict() for metadata generation
  • Write unit tests for configuration

Benefits:

  • Type-safe configuration in Rust
  • Validation and clamping in Rust
  • No Pydantic dependency
  • Faster configuration parsing

Deliverables: Rust config module, 10 unit tests
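
As a rough illustration of the Phase 1 tasks, the sketch below shows one way a RetryConfig with validation and clamping could look. The field names, default values, and ceilings are assumptions made for the example, not the plugin's actual schema, and a plain HashMap stands in for the dict that PyO3 would hand over from Python.

```rust
use std::collections::HashMap;

/// Retry settings after validation and clamping.
/// Field names and defaults are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub backoff_base_ms: u64,
    pub max_backoff_ms: u64,
    pub check_text_content: bool,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 3,
            backoff_base_ms: 200,
            max_backoff_ms: 5_000,
            check_text_content: false,
        }
    }
}

// Hypothetical ceilings used for clamping.
const MAX_RETRIES_CEILING: u32 = 10;
const MAX_BACKOFF_CEILING_MS: u64 = 60_000;

impl RetryConfig {
    /// Builds a config from loosely typed key/value pairs.
    /// Invalid numbers and unknown keys are rejected; out-of-range values are clamped.
    pub fn from_dict(raw: &HashMap<String, f64>) -> Result<Self, String> {
        let mut cfg = Self::default();
        for (key, &value) in raw {
            if !value.is_finite() || value < 0.0 {
                return Err(format!("{} must be a finite, non-negative number", key));
            }
            match key.as_str() {
                "max_retries" => cfg.max_retries = (value as u32).min(MAX_RETRIES_CEILING),
                "backoff_base_ms" => {
                    cfg.backoff_base_ms = (value as u64).clamp(1, MAX_BACKOFF_CEILING_MS)
                }
                "max_backoff_ms" => {
                    cfg.max_backoff_ms = (value as u64).min(MAX_BACKOFF_CEILING_MS)
                }
                "check_text_content" => cfg.check_text_content = value != 0.0,
                other => return Err(format!("unknown config key: {}", other)),
            }
        }
        // Keep the cap at or above the base so the backoff range is never empty.
        cfg.max_backoff_ms = cfg.max_backoff_ms.max(cfg.backoff_base_ms);
        Ok(cfg)
    }
}
```

Clamping rather than rejecting out-of-range values mirrors the "clamping" behaviour the current Python layer is described as providing; whether unknown keys should be an error or silently ignored would need to match the existing Pydantic model.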

Phase 2: Move Text Content Parsing to Rust (Week 2)

Goal: Migrate JSON parsing and content extraction from Python to Rust

Tasks:

  • Define ContentItem enum (Text, Image, Resource)
  • Implement is_failure() function with JSON parsing
  • Add serde_json for fast JSON extraction
  • Handle all content types

Benefits:

  • Fast JSON parsing with serde_json
  • No Python JSON overhead
  • Type-safe content handling
  • Unified failure detection logic

Deliverables: Content parsing module, 15 unit tests
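
The Phase 2 tasks could be sketched roughly as follows. This is a std-only illustration: the payload fields on ContentItem, the `"isError": true` marker, and the rule that a structured error flag takes precedence are all assumptions, and the naive text scan is only a placeholder for the serde_json parsing the plan calls for.

```rust
/// Content items a tool result may carry.
/// Payload fields are illustrative assumptions.
#[derive(Debug, Clone)]
pub enum ContentItem {
    Text(String),
    Image { mime_type: String },
    Resource { uri: String },
}

/// Placeholder for real JSON parsing: strips whitespace and looks for a
/// hypothetical `"isError":true` marker inside something shaped like an object.
/// A real implementation would deserialize with serde_json instead.
fn text_signals_error(text: &str) -> bool {
    let compact: String = text.chars().filter(|c| !c.is_whitespace()).collect();
    compact.starts_with('{') && compact.contains("\"isError\":true")
}

/// Unified failure detection over structured and text content.
/// Assumption: an explicit structured error flag wins; text is only inspected
/// when `check_text_content` is enabled.
pub fn is_failure(
    structured_is_error: Option<bool>,
    content: &[ContentItem],
    check_text_content: bool,
) -> bool {
    if structured_is_error == Some(true) {
        return true;
    }
    if !check_text_content {
        return false;
    }
    content.iter().any(|item| match item {
        ContentItem::Text(text) => text_signals_error(text),
        // Non-text items carry no parsable error payload in this sketch.
        ContentItem::Image { .. } | ContentItem::Resource { .. } => false,
    })
}
```

Matching exhaustively on the enum means that adding a new content type later forces the failure-detection logic to be revisited at compile time, which is the "type-safe content handling" benefit listed above.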

Phase 3: Move Metadata Generation to Rust (Week 3)

Goal: Migrate metadata dictionary construction from Python to Rust

Tasks:

  • Implement to_metadata_dict() in Rust
  • Add PyDict construction helpers
  • Verify Python compatibility

Benefits:

  • Metadata generation in Rust
  • No Python dict construction overhead
  • Type-safe metadata

Deliverables: Metadata generation, 5 unit tests
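
A minimal sketch of the metadata step, assuming the retry_policy metadata is a flat map of integers and booleans. The RetryPolicy struct, the MetaValue enum, and the key names are hypothetical; in the real plugin the map would be converted to a PyDict through PyO3 rather than returned as a BTreeMap.

```rust
use std::collections::BTreeMap;

/// Values the metadata map may hold in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue {
    Int(u64),
    Bool(bool),
}

/// Hypothetical view of the settings that get published as retry_policy.
pub struct RetryPolicy {
    pub max_retries: u32,
    pub backoff_base_ms: u64,
    pub max_backoff_ms: u64,
    pub check_text_content: bool,
}

impl RetryPolicy {
    /// Builds the metadata map; BTreeMap keeps key order deterministic,
    /// which makes comparisons against the Python output easier to test.
    pub fn to_metadata_dict(&self) -> BTreeMap<&'static str, MetaValue> {
        let mut dict = BTreeMap::new();
        dict.insert("max_retries", MetaValue::Int(u64::from(self.max_retries)));
        dict.insert("backoff_base_ms", MetaValue::Int(self.backoff_base_ms));
        dict.insert("max_backoff_ms", MetaValue::Int(self.max_backoff_ms));
        dict.insert("check_text_content", MetaValue::Bool(self.check_text_content));
        dict
    }
}
```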

Phase 4: Create Complete Rust Core (Week 4)

Goal: Implement complete business logic in Rust

Tasks:

  • Create RetryPluginCore struct
  • Implement tool_post_invoke() method
  • Add state management (HashMap with TTL)
  • Implement eviction logic
  • Add exponential backoff calculation with jitter

Benefits:

  • Complete business logic in Rust
  • Single source of truth
  • Type-safe state management
  • Fast payload extraction
  • Integrated eviction logic

Deliverables: Complete Rust core, 20 integration tests
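
To make the Phase 4 tasks concrete, here is a hedged sketch of how a RetryPluginCore might combine per-tool state, TTL eviction, and capped exponential backoff. The method signature, the Decision enum, and the "full jitter" variant (delay scaled by a factor in [0, 1]) are assumptions for illustration. The jitter factor and the current time are passed in by the caller so the logic stays deterministic under test; production code would draw the factor from rand.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct ToolState {
    consecutive_failures: u32,
    last_seen: Instant,
}

/// Outcome of a retry decision (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum Decision {
    Proceed,
    RetryAfter(Duration),
    GiveUp,
}

pub struct RetryPluginCore {
    max_retries: u32,
    base: Duration,
    cap: Duration,
    ttl: Duration,
    states: HashMap<String, ToolState>,
}

impl RetryPluginCore {
    pub fn new(max_retries: u32, base: Duration, cap: Duration, ttl: Duration) -> Self {
        Self { max_retries, base, cap, ttl, states: HashMap::new() }
    }

    /// delay = min(cap, base * 2^(attempt - 1)) * jitter, with jitter clamped to [0, 1].
    pub fn backoff_delay(&self, attempt: u32, jitter: f64) -> Duration {
        let factor = 2u32.checked_pow(attempt.saturating_sub(1));
        let uncapped = factor.and_then(|f| self.base.checked_mul(f));
        // Overflow in either step means the delay is far past the cap.
        let capped = uncapped.map_or(self.cap, |d| d.min(self.cap));
        let nanos = (capped.as_nanos() as f64 * jitter.clamp(0.0, 1.0)) as u64;
        Duration::from_nanos(nanos)
    }

    /// Updates state for `tool` and returns the retry decision.
    pub fn tool_post_invoke(
        &mut self,
        tool: &str,
        failed: bool,
        now: Instant,
        jitter: f64,
    ) -> Decision {
        // Evict entries that have not been touched within the TTL.
        let ttl = self.ttl;
        self.states
            .retain(|_, s| now.saturating_duration_since(s.last_seen) < ttl);

        if !failed {
            // A success resets the failure streak.
            self.states.remove(tool);
            return Decision::Proceed;
        }

        let state = self
            .states
            .entry(tool.to_string())
            .or_insert(ToolState { consecutive_failures: 0, last_seen: now });
        state.consecutive_failures += 1;
        state.last_seen = now;
        let failures = state.consecutive_failures;

        if failures > self.max_retries {
            self.states.remove(tool);
            Decision::GiveUp
        } else {
            Decision::RetryAfter(self.backoff_delay(failures, jitter))
        }
    }
}
```

Evicting on every call keeps the sketch short but is linear in the number of tracked tools; a real core might evict periodically or bound the map size instead.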

Phase 5: Create Python Thin Shim (Week 5)

Goal: Create minimal Python wrapper for plugin framework

Tasks:

  • Create new retry_with_backoff.py (~50 lines)
  • Implement RetryWithBackoffPlugin class
  • Add error handling wrapper
  • Optional Pydantic validation

Benefits:

  • Minimal Python code
  • Simple delegation to Rust
  • Optional Pydantic validation
  • Error handling wrapper

Deliverables: Python shim (~50 lines), package exports

Phase 6: Testing & Validation (Week 6)

Goal: Ensure migration maintains functionality and improves performance

Test Strategy:

  1. Unit Tests (Rust): Configuration, parsing, failure detection, backoff, state management, metadata
  2. Integration Tests (Python): Plugin loading, hook behavior, error handling, performance
  3. Migration Tests: Compare Rust-majority vs hybrid behavior, verify identical results

Validation Checklist:

  • All 60 original tests pass
  • Performance improvement: 3-5x (vs 2-4x current)
  • Memory usage: similar or better
  • Error handling: equivalent behavior
  • Configuration: identical validation
  • Text parsing: identical results

Deliverables: Test suite, benchmarks, migration guide

Implementation Checklist

Week 1: Configuration Migration

  • Create RetryConfig struct in Rust
  • Implement validation and clamping
  • Add from_dict() constructor
  • Add to_metadata_dict() method
  • Write unit tests for configuration
  • Update the existing Python integration layer to use the Rust config (the thin shim itself arrives in Week 5)

Week 2: Text Content Parsing

  • Define ContentItem enum in Rust
  • Implement is_failure() function
  • Add JSON parsing with serde_json
  • Handle all content types (text, image, resource)
  • Write unit tests for parsing
  • Benchmark parsing performance

Week 3: Metadata Generation

  • Implement to_metadata_dict() in Rust
  • Add PyDict construction helpers
  • Write unit tests for metadata
  • Verify Python compatibility

Week 4: Complete Rust Core

  • Create RetryPluginCore struct
  • Implement tool_post_invoke() method
  • Add state management (HashMap)
  • Implement eviction logic
  • Add delay calculation
  • Write integration tests
  • Benchmark end-to-end performance

Week 5: Python Thin Shim

  • Create new retry_with_backoff.py (~50 lines)
  • Implement RetryWithBackoffPlugin class
  • Add error handling wrapper
  • Optional Pydantic validation
  • Update package exports

Week 6: Testing & Validation

  • Run all 60 original tests
  • Add new Rust unit tests
  • Performance benchmarks
  • Memory profiling
  • Documentation updates
  • Migration guide

Success Criteria

Functional Requirements

  • All 60 original tests pass
  • Identical behavior to hybrid implementation
  • Error handling equivalent
  • Configuration validation equivalent
  • Text content parsing equivalent
  • Metadata generation equivalent

Performance Requirements

  • 3-5x improvement over Python-only
  • 1.25-1.5x improvement over hybrid (implied by 3-5x vs 2-4x over Python-only)
  • Memory usage: similar or better
  • Latency: <1ms for retry decision
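
One simple way to check the latency criterion is to time many calls and compare the mean against the 1 ms budget. The helper names below are hypothetical, and a mean hides tail latency, so a real benchmark would more likely use a dedicated harness and also look at percentiles.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `iterations` times and returns the mean wall-clock time per call.
pub fn mean_latency<F: FnMut()>(iterations: u32, mut op: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    start.elapsed() / iterations
}

/// True when a measured mean fits the <1ms budget from the success criteria.
pub fn meets_latency_budget(mean: Duration) -> bool {
    mean < Duration::from_millis(1)
}
```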

Code Quality Requirements

  • Python shim: <100 lines
  • Rust core: well-documented
  • Test coverage: >90%
  • No clippy warnings
  • No unsafe code (unless justified)

Documentation Requirements

  • Migration guide
  • Architecture documentation
  • Performance benchmarks
  • API documentation

Timeline Options

Option 1: 6-Week Standard Track (Recommended)

Risk: Low (comfortable timeline)

| Week | Focus | Deliverables |
|------|-------|--------------|
| 1 | Configuration | Rust config, validation, 10 tests |
| 2 | Text Parsing | Content parsing, JSON, 15 tests |
| 3 | Metadata | Metadata generation, 5 tests |
| 4 | Rust Core | Complete core, 20 tests |
| 5 | Python Shim | Thin shim, integration, 10 tests |
| 6 | Testing | All tests, benchmarks, docs |

Option 2: 4-Week Fast Track (Aggressive)

Risk: High (tight timeline)

| Week | Focus | Deliverables |
|------|-------|--------------|
| 1 | Config + Parsing | Rust config, text parsing, 20 tests |
| 2 | Rust Core | Complete RetryPluginCore, 30 tests |
| 3 | Python Shim | Thin shim, integration tests |
| 4 | Testing | All 60 tests, benchmarks, docs |

Option 3: 8-Week Conservative Track (Safe)

Risk: Very Low (plenty of buffer)

| Week | Focus | Deliverables |
|------|-------|--------------|
| 1-2 | Configuration | Rust config, validation, tests |
| 3-4 | Text Parsing | Content parsing, JSON, tests |
| 5 | Metadata | Metadata generation, tests |
| 6 | Rust Core | Complete core, tests |
| 7 | Python Shim | Thin shim, integration |
| 8 | Testing | All tests, benchmarks, docs |

Trade-offs Analysis

Advantages of Rust-Majority Architecture

Performance:

  • 3-5x improvement (vs 2-4x current)
  • Faster JSON parsing (serde_json vs Python json)
  • Faster state management (HashMap vs Python dict)
  • No Python/Rust boundary crossing for hot path

Maintainability:

  • Single source of truth (Rust)
  • No logic duplication
  • Type-safe implementation
  • Easier to test (pure Rust tests)

Consistency:

  • Follows url_reputation pattern
  • Standard plugin architecture
  • Predictable behavior

Scalability:

  • Better memory efficiency
  • Lower CPU overhead
  • Handles high-throughput scenarios

Disadvantages of Rust-Majority Architecture

Development Effort:

  • 4-6 weeks migration time
  • Requires Rust expertise
  • More complex PyO3 bindings

Flexibility:

  • Harder to modify logic (Rust vs Python)
  • Longer iteration cycles (compile time)
  • Less dynamic behavior

Dependencies:

  • Requires serde_json (JSON parsing)
  • Requires rand (jitter)
  • Larger binary size

Debugging:

  • Harder to debug Rust code
  • Less visibility into state
  • Requires Rust tooling

Comparison: Hybrid vs Rust-Majority

| Aspect | Hybrid (Current) | Rust-Majority (Target) |
|--------|------------------|------------------------|
| Performance | 2-4x improvement | 3-5x improvement |
| Python Code | 280 lines | 50 lines |
| Rust Code | 433 lines | 800 lines |
| Maintainability | Medium (split logic) | High (single source) |
| Flexibility | High (Python) | Medium (Rust) |
| Development Time | N/A (existing) | 4-6 weeks |
| Testing | 47/60 tests | 60/60 tests |
| Pattern | Unique hybrid | Standard (url_reputation) |

Alternatives Considered

Alternative 1: Keep Hybrid Architecture (Current)

Effort: 0 weeks (no change)
Performance: 2-4x improvement (current)
Maintainability: Medium (split logic)

When to choose:

  • Current performance is sufficient (2-4x)
  • Need maximum flexibility (Python)
  • Don't have time for migration
  • Prefer Python for business logic
  • Hybrid pattern works for your use case

Alternative 2: Hybrid with Improvements (Middle Ground)

Effort: 2-3 weeks
Performance: 2.5-3.5x improvement

Changes:

  1. Move text content parsing to Rust (Week 1-2)
  2. Move metadata generation to Rust (Week 3)
  3. Keep Python integration layer (no change)

Outcome:

  • Python integration: ~200 lines (vs 280)
  • Rust fast path: ~600 lines (vs 433)
  • Performance: 2.5-3.5x improvement
  • Maintainability: Medium-High

When to choose:

  • Want better performance (2.5-3.5x)
  • Want to keep flexibility
  • Have 2-3 weeks for improvements
  • Want incremental migration

Recommendation

Migrate to Rust-majority architecture (Option 1) using the 6-week standard track timeline.

Rationale:

  1. Significant performance improvement (up to 50% better than current)
  2. Follows established pattern (url_reputation)
  3. Better maintainability (single source of truth)
  4. Reasonable timeline (6 weeks with low risk)
  5. Complete test coverage (60/60 tests)
  6. Standard architecture across all plugins

Dependencies

  • Rust toolchain (stable)
  • PyO3 (Python bindings)
  • serde_json (JSON parsing)
  • rand (jitter generation)

Related Issues

  • Original implementation: [Link to original PR]
  • Test coverage analysis: [Link to test analysis]
  • Performance benchmarks: [Link to benchmarks]
