Skip to content

feat!: compile-time CEL expansion; drop runtime interpreter#10

Merged
jrandolf merged 1 commit into
mainfrom
feat/cel-compile-time-expansion
May 18, 2026
Merged

feat!: compile-time CEL expansion; drop runtime interpreter#10
jrandolf merged 1 commit into
mainfrom
feat/cel-compile-time-expansion

Conversation

@jrandolf
Copy link
Copy Markdown
Contributor

Summary

Transpile every protovalidate CEL rule to native Rust at codegen time. The runtime no longer hosts an interpreter — validate() is a direct struct-field walk with zero per-call Value / HashMap allocations.

  • Plugin: new emit/cel_compile.rs typed AST visitor over the upstream cel parser. Handles literals, idents, selects, calls, comprehensions (all / exists / exists_one / map / filter — single AND two-variable), string/bytes/list/map ops, has(), size(), type(), is_ip / is_email / etc., regex match (literal + dynamic), dyn(), duration/timestamp constructors (literal + dynamic args, plus timezone arg via opt-in tz feature), all 10 timestamp accessors, all 4 duration accessors, full math.* extension, list reverse / distinct, string reverse, isFinite, optional types (optional.of / .none / .ofNonZeroValue, .hasValue / .orValue / .value, m[?k] optional indexing), format directives %s/%d/%f/%e/%x/%X/%o/%b, list indexing, map literals.
  • Two-kind fallback model: FallbackKind::Unsupported vs FallbackKind::RuntimeError — both emit __cel_runtime_error__ violation markers; distinguishing them sharpens diagnostics.
  • New packaging output: top-level mod.rs + per-package <pkg>.mod.rs files that mirror the proto package hierarchy as pub mod nesting. Removes the need for downstream "generate mod tree" scripts. All packaging files run through prettyplease for consistent formatting. Configurable via opt: proto_module=crate::proto.
  • Runtime: dropped the cel interpreter dep entirely; src/cel.rs collapses to a 135-line helper module (CelScalar, duration_from_secs_nanos, timestamp_from_secs_nanos, now_local, parse_duration, parse_timestamp). Optional tz feature gates chrono-tz for timezone-aware timestamp accessors.
  • Memory: derived results (optional indexing, reverse / distinct / filter) return borrowed &str / &[u8] / &T views instead of cloning, so a 20MB bytes payload doesn't get duplicated when a CEL rule reads or filters it.
  • Verification: 156 transpiler unit tests + integration tests/emit_compiles.rs that actually compiles every emitted token-stream against the runtime crate — catches type errors and missing imports, not just syntax. Each test's emitted body is ascribed to the expected Rust return type, so a mismatch breaks the build. Workspace: 164 tests; conformance: 2872 / 2872 end-to-end.
  • Cleanups: dropped 4 unused dependencies (buffa-types × 3, connectrpc from the macros crate); deleted dead ctx::CodeGenContext placeholder; gated connect_impl re-export on the connect feature; fixed cargo doc -D warnings; fixed cargo test --no-default-features; refreshed stale interpreter-era doc comments throughout.

Real-world tested on the sloper-demo repo: 4128-line / +450-line reduction in generated code; downstream cargo check --workspace passes.

Test plan

  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all -- --check (stable rustfmt — what CI runs)
  • cargo test --workspace — 164 / 164 passing
  • cargo test -p protovalidate-buffa --no-default-features clean
  • RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps clean
  • cargo +1.95 build --workspace (MSRV) clean
  • cargo audit no advisories
  • protovalidate-conformance end-to-end: 2872 / 2872 against proto2 + proto3 + editions 2023

Transpile every protovalidate CEL rule to native Rust at codegen time.
The runtime no longer hosts an interpreter — `validate()` is a direct
struct field walk with zero per-call `Value` / `HashMap` allocations.

## Plugin (`protoc-gen-protovalidate-buffa`)

- New `emit/cel_compile.rs`: a typed AST visitor over the upstream `cel`
  crate's parser. Each CEL expression compiles to a `CompileOutput`
  carrying a `TokenStream`, its `CelType` (`Int` / `UInt` / `Double` /
  `Bool` / `Str { owned }` / `Bytes { owned }` / `List<T>` /
  `Map<K, V>` / `Duration` / `Timestamp` / `Message(schema)` /
  `MessageRef(fqn)` / `Dyn`), and a `needs_now` flag. The visitor
  handles literals, idents, selects, calls, comprehensions
  (`all` / `exists` / `map` / `filter` / `.map(filter, expr)`),
  string/bytes/list/map ops, `has()`, `size()`, `is_ip` / `is_ip_prefix`
  (incl. dynamic `(ver, strict)` arg dispatch), `is_hostname` /
  `is_email` / `is_uri` / `is_uri_ref`, regex match, `dyn()`, and the
  Duration/Timestamp constructors. Const-fold compile-time literals
  (e.g. `rule.gte` in predefined CEL).
- `emit/cel.rs`: protovalidate-specific integration glue. Binds `this` /
  `rule` / `now`, wraps tokens in violation-pushing code with the right
  `FieldPath` / `FieldPathElement`, dispatches per `FieldKind`, handles
  WKT first-class cases. Adds `SchemaIndex` so `(field).cel` /
  `(message).cel` on sub-message-typed fields can resolve fields
  through a `SchemaLookup` trait without embedding cyclic schemas.
- Two-kind fallback model: when a construct isn't transpilable the
  compiler returns `FallbackKind::Unsupported`; when it proves the
  expression would always raise a CEL runtime error (e.g.
  `dyn(this).<unknown_field>`) it returns `FallbackKind::RuntimeError`.
  Both paths emit a `__cel_runtime_error__` violation marker in place
  of the rule — distinguishing them just sharpens the diagnostic
  message.
- Wire native paths through `repeated.rs` (items / keys / values CEL,
  items.predefined) and the message-level emit pipeline.
- `scan.rs`: add `RuleConst` enum + `decode_*_rule_const` so the
  transpiler can fold `rule.foo` references to Rust literals from the
  raw extension bytes; delete `rule_value_expr`-style stringification.

Conformance: 2872 / 2872 against the upstream
`protovalidate-conformance` harness (proto2, proto3, editions 2023).

## Transpiler unit tests

The conformance suite exercises rules end-to-end but barely exercises
CEL itself — most cases reduce to "predefined rule X on field Y" with
fixed expressions that constant-fold. To pin down transpiler behavior
on its own, the test module in `emit/cel_compile.rs` carries 64 unit
tests grouped by risk bucket:

1. Type-system edge cases — cross-type comparison, int/uint/double
   casts, negation, arithmetic, modulo.
2. Comprehensions — `all` / `exists` / `map` / `filter` /
   `.map(filter, expr)`, nesting, predicate captures, short-list
   literals.
3. `has()` semantics — one test per `SchemaFieldKind` (Scalar variants,
   StringLike, Optional, Wrapper, Message, Repeated), plus
   unknown-field and `dyn` paths.
4. Sub-message field resolution — `MessageRef` not in index, deep
   chains, self-referential schemas.
5. String semantics — `size()` on unicode strings (chars not bytes),
   concat, contains, ends-with, regex match (literal + dynamic
   pattern), lower/upperAscii, indexOf, substring.
6. Map indexing — string/int/bool keys, `size()`, `.all()` over keys.
7. `Compiled.constant` folding — Int/UInt/Double/Bool/Str RuleConst
   inlines as a Rust literal in the emitted tokens; List form.

Three tests document gaps the suite never trips:
- `cross_type_int_uint_eq_currently_unsupported` — `int == uint` needs
  an explicit cast today (`op_cmp` rejects the mix).
- `empty_list_literal_comprehension_currently_unsupported` —
  `[].all(x, ...)` fails type-check (empty literal → Dyn element).
- `in_operator_on_map_currently_unsupported` — CEL's
  `key in mapValue` doesn't work; `op_in` requires a list rhs.

All 86 unit tests run in milliseconds.

## Runtime (`protovalidate-buffa`)

- Drop the `cel` interpreter dependency entirely.
- `src/cel.rs` collapses to a 135-line helper module: `CelScalar`
  trait (scalar widening to `i64` / `u64` / `f64`, including
  `EnumValue<E>`), `duration_from_secs_nanos`,
  `timestamp_from_secs_nanos`, and `now_local`. That's it — no
  `Value`, no `Context`, no `Program`.
- Delete the `wkt_cel.rs` test (covered the removed interpreter path).
- `ValidationError::runtime_error` doc updated to reflect that
  always-runtime-error CEL is now diagnosed at codegen time.

## Plugin Cargo.toml

- `cel = { version = "0.13", default-features = false }` — parse-only;
  drops the runtime evaluator, regex matching, and chrono bindings.

## Cleanups along the way

- Drop unused dependencies: `buffa-types` in `protovalidate-buffa`,
  `protovalidate-buffa-protos`, and `protovalidate-buffa-conformance`;
  `connectrpc` in `protovalidate-buffa-macros` (proc-macro emits paths
  that resolve through the user's transitive deps). Caught by
  `cargo +nightly udeps`.
- Delete dead `ctx::CodeGenContext` placeholder module (the work it
  was reserved for shipped via `SchemaIndex` instead).
- `#[cfg(feature = "connect")]` gate the `connect_impl` re-export
  and its dedicated test so `cargo test --no-default-features`
  compiles cleanly.
- Fix `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps`:
  backtick `Option<T>` so rustdoc doesn't read it as an unclosed HTML
  tag; allow `rustdoc::broken_intra_doc_links` /
  `rustdoc::invalid_html_tags` on the conformance crate's
  buffa-build-generated submodule.
- Tighten lint surface: convert `#[allow]` → `#[expect]` where the
  lint reliably fires; drop a handful of inline allows that were
  redundant under the crate's module-level allows; remove a stale
  `let _ = pi` interpreter-era suppression; replace nested
  `if let / if let` chains with let-chains; mark const-eligible
  helpers `const fn`; use `Self` in self-referential enum variants;
  derive `Default` on `Compiler`.
- Refresh stale doc comments that still mentioned "the interpreter"
  or named symbols (`CelConstraint`, `AsCelValue`, `ToCelValue`,
  `Compiled`) that no longer exist.
- Apply rustfmt across the plugin crate to restore stable-fmt
  cleanliness.
@jrandolf jrandolf merged commit ba01ffe into main May 18, 2026
6 of 7 checks passed
@jrandolf jrandolf deleted the feat/cel-compile-time-expansion branch May 18, 2026 21:03
@jrandolf jrandolf mentioned this pull request May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant