
Introduce a shared payload bridge for Rust-backed plugins #32

@lucarlig

Description


Summary

Several Rust-backed plugins receive nearly identical gateway payload objects, then each plugin manually:

  • reads the same payload attributes (args, result, content, prompt_id, name, uri)
  • traverses nested Python containers
  • applies plugin-specific logic
  • mutates the payload back into framework result objects

We already have a tiny shared crate for framework object construction in crates/framework_bridge/src/lib.rs, but the payload access/traversal layer is still reimplemented per plugin.

This is now large enough to justify a deeper abstraction, but the reuse is uneven: the scanning/redaction plugins share much more structure than the policy/enforcement plugins.

Problem

Today, each plugin owns its own version of "read payload -> walk Python object tree -> transform -> write back".

This creates a few concrete costs:

  • duplicated traversal logic and path formatting
  • inconsistent supported container types across plugins
  • repeated hook-stage boilerplate
  • higher porting cost for future Rust migrations
  • edge cases that are harder to fix, because the same payload-shape bug can exist in more than one plugin

What looks shared vs. what does not

Clearly shared

  • framework result construction
  • hook-stage payload field access
  • common payload subjects:
    • prompt args
    • prompt result/messages
    • tool args
    • tool result
    • resource content
  • nested traversal of Python objects into transformed Python objects
  • path tracking for findings/metadata
  • "changed vs unchanged" result handling

Not uniformly shared

  • plugin-specific detection logic and findings schemas
  • policy-only plugins like rate_limiter
  • some hook stages have special shapes:
    • pii_filter.prompt_post_fetch walks result.messages[*].content.text, not just one top-level field
    • secrets_detection.resource_post_fetch only scans content.text
    • encoded_exfil_detection scans richer mixed payloads and also parses JSON-like strings

So the right target is not "one crate that does everything for all plugins." The better target is "one shared payload bridge with pluggable traversal/scanner callbacks."
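To make the split concrete, here is a minimal sketch of what "bridge owns mechanics, plugin owns detection" could look like. All names (`Scanner`, `apply`, `Redactor`) are hypothetical, invented for illustration; they are not existing APIs in this repo.

```rust
/// What a scanner-style plugin would supply to the shared bridge:
/// one leaf-level decision, no traversal or payload plumbing.
trait Scanner {
    /// Inspect one string leaf at a given path; return a replacement
    /// only if it should change.
    fn scan_str(&self, path: &str, text: &str) -> Option<String>;
}

/// What the bridge would own: invoking the callback and tracking
/// whether anything changed (traversal elided here for brevity).
fn apply<S: Scanner>(scanner: &S, path: &str, text: &str) -> (String, bool) {
    match scanner.scan_str(path, text) {
        Some(new) => (new, true),
        None => (text.to_string(), false),
    }
}

/// A toy plugin-side implementation.
struct Redactor;

impl Scanner for Redactor {
    fn scan_str(&self, _path: &str, text: &str) -> Option<String> {
        text.contains("secret")
            .then(|| text.replace("secret", "[REDACTED]"))
    }
}

fn main() {
    let (out, changed) = apply(&Redactor, "args[0]", "my secret value");
    assert!(changed);
    assert_eq!(out, "my [REDACTED] value");
}
```

The point of the shape is that the trait boundary is the scope boundary: everything above `Scanner` stays plugin-specific, everything below it becomes shared.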

Options

Option 1: Expand cpex_framework_bridge into a real payload bridge

Add modules to the existing crate for:

  • hook payload readers/writers
  • stage result builders
  • nested Python traversal helpers
  • common path formatting helpers

Pros:

  • least crate sprawl
  • plugins already depend on cpex_framework_bridge
  • easy incremental rollout

Cons:

  • current crate is tiny and narrowly scoped; this would broaden its responsibility a lot
  • risk of mixing two concerns:
    • framework object construction
    • payload normalization/traversal

Option 2: Add a new crate dedicated to payload access/traversal

Create something like cpex_payload_bridge or cpex_plugin_payload with:

  • typed stage adapters
  • tree walker utilities
  • mutation result types
  • optional helpers for finding aggregation/path bookkeeping

Keep cpex_framework_bridge focused on framework object creation.

Pros:

  • cleaner separation of concerns
  • easier to test independently
  • clearer scope boundary for future contributors

Cons:

  • one more internal crate
  • slightly more wiring for each plugin

Option 3: Share only stage adapters, not tree traversal

Extract just:

  • read/write of args / result / content
  • common result/violation helpers

Leave recursive traversal inside each plugin.

Pros:

  • lowest risk
  • easiest short-term refactor

Cons:

  • leaves the biggest duplication in place
  • does not help much with future scanner-style plugins

Option 4: Normalize payloads into a canonical Rust value model

Convert incoming Python objects into an owned intermediate form, likely serde_json::Value plus a few escape hatches for non-JSON Python values, then run plugins on that model.

Pros:

  • strongest consistency
  • easiest scanner implementations once converted
  • opens door to non-Python-specific testing

Cons:

  • high risk of semantic mismatch with arbitrary Python objects
  • awkward for tuples, sets, custom objects, and rich framework payload objects
  • likely overkill for current repo
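The mismatch risk is easiest to see in a sketch of what such a canonical model would have to look like. `Canonical` below is hypothetical, not an existing type; it mirrors the serde_json::Value shape with an explicit escape hatch for Python values that have no JSON analogue.

```rust
/// Hypothetical intermediate value model for Option 4.
#[derive(Debug, Clone, PartialEq)]
enum Canonical {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    List(Vec<Canonical>),
    Map(Vec<(String, Canonical)>),
    /// Escape hatch: tuples, sets, and custom objects have no JSON
    /// analogue, so they can only be carried opaquely.
    Opaque(String),
}

/// Toy classifier standing in for a real Python-to-Rust conversion:
/// anything outside the JSON-friendly types falls into `Opaque`,
/// which is exactly where round-trip fidelity gets lost.
fn classify(py_type: &str, value: &str) -> Canonical {
    match py_type {
        "str" => Canonical::Str(value.to_string()),
        "list" => Canonical::List(vec![]),
        other => Canonical::Opaque(format!("<{other}>")),
    }
}

fn main() {
    assert_eq!(classify("str", "hi"), Canonical::Str("hi".into()));
    // A tuple survives only as an opaque marker, not a usable value.
    assert!(matches!(classify("tuple", "(1, 2)"), Canonical::Opaque(_)));
}
```

Note also that a Python tuple and list would both want to land in `List`, so the conversion back to Python has to remember which one it was; that bookkeeping is most of the "semantic mismatch" cost.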

Recommendation

Take Option 2 in stages.

Start with a new internal crate focused on the scanning/redaction family:

  • pii_filter
  • secrets_detection
  • encoded_exfil_detection

Do not force rate_limiter, retry_with_backoff, or url_reputation onto the same abstraction immediately. They share some framework plumbing, but not enough deep-tree behavior to justify coupling them to a scanner-oriented bridge yet.

Suggested scope for phase 1

Build a crate that provides:

  1. Stage adapters

    • read common source values from payloads
    • write back modified values
    • build unchanged / modified / blocked results
  2. Generic recursive walker

    • strings
    • dicts
    • lists
    • optionally tuples / sets / __dict__ objects
    • configurable depth and collection limits
    • stable path formatting
  3. Callback-based transform API

    • plugin supplies string or node transform
    • bridge owns recursion, cloning, and mutation rebuild
  4. Shared mutation result type

    • changed flag
    • rebuilt value
    • findings payload
  5. Test fixtures for framework payload shims

    • so each plugin does not re-prove the same payload mechanics
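Items 2 through 4 above can be sketched together. This uses a toy `Node` tree in place of real Python objects, and every name here (`Node`, `Mutation`, `walk`, `scan`) is illustrative rather than an existing crate API; the real crate would walk PyO3 objects instead.

```rust
/// Toy stand-in for a traversable Python payload.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Str(String),
    List(Vec<Node>),
    Map(Vec<(String, Node)>),
}

/// Shared mutation result type (phase-1 item 4).
#[derive(Debug)]
struct Mutation {
    changed: bool,          // "changed vs unchanged" handling
    value: Node,            // rebuilt value
    findings: Vec<String>,  // stable paths where the callback fired
}

const MAX_DEPTH: usize = 32; // configurable depth limit

/// Generic recursive walker (item 2) driving a plugin callback (item 3).
fn walk<F>(node: &Node, path: &str, depth: usize, f: &F, found: &mut Vec<String>) -> (Node, bool)
where
    F: Fn(&str) -> Option<String>,
{
    if depth > MAX_DEPTH {
        return (node.clone(), false); // refuse to recurse past the limit
    }
    match node {
        Node::Str(s) => match f(s) {
            Some(new) => {
                found.push(path.to_string());
                (Node::Str(new), true)
            }
            None => (node.clone(), false),
        },
        Node::List(items) => {
            let mut changed = false;
            let rebuilt = items
                .iter()
                .enumerate()
                .map(|(i, n)| {
                    let (new, c) = walk(n, &format!("{path}[{i}]"), depth + 1, f, found);
                    changed |= c;
                    new
                })
                .collect();
            (Node::List(rebuilt), changed)
        }
        Node::Map(entries) => {
            let mut changed = false;
            let rebuilt = entries
                .iter()
                .map(|(k, v)| {
                    let (new, c) = walk(v, &format!("{path}.{k}"), depth + 1, f, found);
                    changed |= c;
                    (k.clone(), new)
                })
                .collect();
            (Node::Map(rebuilt), changed)
        }
    }
}

fn scan<F: Fn(&str) -> Option<String>>(root: &Node, f: &F) -> Mutation {
    let mut findings = Vec::new();
    let (value, changed) = walk(root, "payload", 0, f, &mut findings);
    Mutation { changed, value, findings }
}

fn main() {
    // Shape like pii_filter.prompt_post_fetch: result.messages[*].content.text
    let payload = Node::Map(vec![(
        "messages".into(),
        Node::List(vec![Node::Map(vec![(
            "content".into(),
            Node::Map(vec![("text".into(), Node::Str("call 555-0100".into()))]),
        )])]),
    )]);
    let m = scan(&payload, &|s| {
        s.contains("555-0100").then(|| s.replace("555-0100", "[PHONE]"))
    });
    assert!(m.changed);
    assert_eq!(m.findings, vec!["payload.messages[0].content.text"]);
}
```

The walker produces exactly the path strings that findings/metadata need, which is why standardizing path formatting falls out of this design almost for free.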

pii_filter already contains the most complete traversal logic, so it is the strongest donor implementation for the walker. encoded_exfil_detection adds a useful extra requirement: optional JSON-string parsing during traversal.

Design constraints

  • Preserve Python object shapes on output. Do not silently coerce everything to JSON.
  • Keep plugin-specific findings/metadata schemas outside the shared crate.
  • Support incremental adoption plugin by plugin.
  • Avoid a macro-heavy API at first. Prefer plain Rust traits/functions until the common shape stabilizes.
  • Keep the crate internal to the workspace until at least two plugin migrations prove the boundary is correct.

Proposed rollout

  1. Extract common stage/payload helpers.
  2. Migrate secrets_detection first.
    • smallest useful target
    • should validate stage adapter ergonomics quickly
  3. Extract recursive walker from pii_filter into the new crate.
  4. Migrate pii_filter.
  5. Evaluate whether encoded_exfil_detection can reuse the walker directly or needs hook points for JSON-string parsing.
  6. Reassess whether rate_limiter benefits from only the stage/result helpers.

Acceptance criteria

  • At least secrets_detection and pii_filter use the shared crate for payload access and mutation.
  • The shared crate has direct unit tests for traversal behavior and path generation.
  • Plugin tests still cover end-to-end hook behavior.
  • encoded_exfil_detection either migrates or documents the gap that still blocks migration.
  • cpex_framework_bridge stays small, or its expanded scope is explicitly documented if we choose not to add a new crate.

Open questions

  • Should tuples/sets/custom Python objects be part of the phase-1 contract, or copied later from pii_filter if another plugin truly needs them?
  • Should path formatting be standardized across plugins now, even if that slightly changes existing finding output?
  • Is JSON-string parsing a plugin-specific extension point, or should the shared walker support optional secondary parses natively?
  • Do we want a typed enum for hook stages, or is a small stringly-typed adapter enough for now?
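For the last question, the typed-enum option could be as small as this. The stage names and the stage-to-field mapping below are assumptions for illustration (only prompt_post_fetch and resource_post_fetch appear earlier in this issue; the rest are guessed), not a committed design.

```rust
/// Hypothetical typed hook-stage enum; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HookStage {
    PromptPreFetch,
    PromptPostFetch,
    ToolPreInvoke,
    ToolPostInvoke,
    ResourcePostFetch,
}

impl HookStage {
    /// Which common payload field this stage reads (args / result /
    /// content, per the "clearly shared" list). The mapping is an
    /// assumption for illustration.
    fn payload_field(self) -> &'static str {
        match self {
            HookStage::PromptPreFetch | HookStage::ToolPreInvoke => "args",
            HookStage::PromptPostFetch | HookStage::ToolPostInvoke => "result",
            HookStage::ResourcePostFetch => "content",
        }
    }
}

fn main() {
    assert_eq!(HookStage::ResourcePostFetch.payload_field(), "content");
    assert_eq!(HookStage::PromptPostFetch.payload_field(), "result");
}
```

Even at this size, the enum buys exhaustive matching: adding a stage forces every adapter to handle it, which a stringly-typed adapter cannot do.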

Why this matters now

The repo has already crossed the point where each new Rust port re-discovers the same payload mechanics. A shared payload bridge will reduce porting cost, reduce subtle payload-shape bugs, and let future plugins focus on domain logic instead of PyO3 tree surgery.
