feat(compiler): FFI and C# bindings for RVM, incl. host-await built-ins#672
feat(compiler): FFI and C# bindings for RVM, incl. host-await built-ins#672kusha wants to merge 10 commits into
Conversation
fa62435 to
54fbb82
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends the RVM host-await feature end-to-end by wiring “registered host-await builtins” through the Rust compiler, VM accessors, the FFI layer, and the C# bindings, with accompanying docs and tests.
Changes:
- Add compiler support to register host-awaitable builtin names at compile time and emit
HostAwaitfor natural function-call syntax. - Expose host-await suspension inspection (identifier/argument) and run-to-completion response preloading via FFI + C# APIs.
- Add Rust YAML regression cases plus new C# binding tests and documentation updates.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/rvm/rego/mod.rs | Extends YAML harness to pass registered host-await builtins into compilation and (optionally) assert host-await arguments in suspendable mode. |
| tests/rvm/rego/cases/registered_host_await.yaml | Adds regression coverage for registered builtins (shadowing, queueing, arg-count validation, overrides). |
| src/rvm/vm/machine.rs | Adds VM accessors for host-await identifier/argument when suspended. |
| src/languages/rego/compiler/rules.rs | Adds compile_from_policy_with_host_await and wires registrations into compilation. |
| src/languages/rego/compiler/mod.rs | Adds storage + registration API for host-awaitable builtins (with validation). |
| src/languages/rego/compiler/function_calls.rs | Updates call resolution/emission to support registered host-await builtins and emit identifier literals. |
| docs/rvm/vm-runtime.md | Documents new VM host-await accessors. |
| docs/rvm/instruction-set.md | Documents registered host-await builtin behavior and resolution order. |
| docs/rvm/architecture.md | Describes explicit vs registered HostAwait emission paths. |
| bindings/ffi/src/rvm.rs | Adds FFI structs/APIs for compile-with-builtins, response preloading, and suspension JSON accessors. |
| bindings/csharp/Regorus/Rvm.cs | Adds C# wrappers for host-await inspection + response preloading. |
| bindings/csharp/Regorus/Program.cs | Adds CompileFromModules overload supporting host-await builtin registrations; refactors compile paths. |
| bindings/csharp/Regorus/NativeMethods.cs | Adds P/Invoke declarations and FFI struct for host-await builtins + new VM functions. |
| bindings/csharp/Regorus/ModuleMarshalling.cs | Generalizes UTF-8 string pinning helper and adds host-await builtin marshalling. |
| bindings/csharp/Regorus/Compiler.cs | Introduces public HostAwaitBuiltin struct for registration. |
| bindings/csharp/Regorus.Tests/RvmProgramTests.cs | Adds suspendable + run-to-completion tests for registered host-await builtins. |
| bindings/csharp/README.md | Adds usage examples for registered host-await in both execution modes. |
| bindings/csharp/API.md | Documents new/expanded Program/Rvm/HostAwaitBuiltin APIs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Allow hosts to register function names at compile time so that calls to those names emit HostAwait instructions directly, enabling natural syntax like fetch(x) instead of __builtin_host_await(x, "fetch"). - Add host_await_builtins map and register_host_await_builtin() to Compiler - Validate arg_count == 1 and reject reserved __builtin_host_await name - Extend determine_call_target() resolution: explicit > registered > user > builtin - Both explicit and registered paths emit identical HostAwait bytecode - Add compile_from_policy_with_host_await() entry point in rules.rs - Extended test harness with HostAwaitBuiltinSpec and args assertion - 9 YAML test cases: suspend/resume, run-to-completion, multiple names, queue, shadowing, object packing, arg_count rejection, reserved name rejection, standard builtin override - Documentation: instruction-set.md, architecture.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Mark Birger <birgerm@yandex.ru>
… registration - Compiler::register_host_await_builtin now rejects duplicate, empty, and whitespace-only names. Previously a duplicate registration would silently overwrite the existing entry, which could mask the host's own registration mistakes. - YAML test cases added: empty registration list as no-op, duplicate name rejection, empty/whitespace name rejection, out-param (a, out) calling syntax with a single-arg registered builtin, and mixed __builtin_host_await + registered builtins in the same policy consuming from their respective identifier queues. - Test harness: replace assert_eq! on HostAwait argument mismatch with anyhow::Error so mismatches propagate through the case reporter instead of panicking and skipping the harness's normal error path. - YAML comment fix: "Registration panics" -> "Registration fails with an error" (registration returns Err, never panics). Addresses anakrish + Copilot inline review comments on PR microsoft#667. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…riants Addresses PR microsoft#667 review item microsoft#8: at the emit site in `compile_function_call`, the discrimination between explicit `__builtin_host_await(arg, id)` and a registered host-awaitable builtin was being recovered by string-comparing `original_fcn_path` against `"__builtin_host_await"`. The information was already known in `determine_call_target` and was being thrown away. Replace the single `CallTarget::HostAwait` variant with two: * `ExplicitHostAwait` (unit) — the two-argument call form. The identifier register comes from the user's second argument. * `RegisteredHostAwait { identifier: String }` — the one-argument call form for registered builtins. The identifier is the registered name and is captured in the variant at recognition time, so the emit site never re-derives it from the function path. This removes the magic-string comparison at the emit site (the source of truth is now `determine_call_target`) and makes both match sites in `compile_function_call` exhaustive over the two forms — adding a third host-await form in the future would force a compile error at every match site instead of silently falling through. Arities are now hardcoded in the `expected_args` extraction (`Some(2)` for explicit, `Some(1)` for registered) rather than carried in the variant; registered builtins are constrained to `arg_count == 1` at registration time, so there is no per-call variability to carry. Bytecode output is unchanged; the full RVM test suite (97 cases) and the registered_host_await suite (15 cases) pass without modification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…calls only PR microsoft#667 review (Medium): the docs implied registered host-await names shadow user functions and builtins unconditionally, but determine_call_target matches only the bare original_fcn_path. A package-qualified call such as data.demo.resolve(x) is therefore not intercepted -- it resolves through the normal path like any other call. Rather than expand registration to qualified paths (which would let a registered name leak into every package exposing a same-named rule), document the unqualified-only behavior and pin it with tests. - register_host_await_builtin: doc now states only the unqualified call form is intercepted; qualified calls resolve normally. - determine_call_target: inline comment explaining the deliberate original_fcn_path-only match. - docs/rvm/instruction-set.md: describe qualified-call resolution, including that builtins have no qualified form. - tests: cross-package and same-package qualified calls resolve to the rule; bare-name shadowing of a standard builtin; Unknown-function outcome when no rule exists at the qualified path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR microsoft#667 review (Low): the suspendable test harness compared the host-await argument via process_value(argument), but argument is already a runtime Value. process_value is a YAML-fixture decoder -- it rewrites "#undefined" to Undefined, {set!: [...]} to a set, and errors on a runtime Value::Set. Re-running it on the runtime argument could coerce a legitimate payload into a fixture sentinel (passing for the wrong reason) or error outright on sets. Compare the runtime argument directly against the expected value, which is already decoded once at YAML load time. Add a regression case (registered_builtin_suspendable_set_argument) that passes a set payload: it fails under the old double-processing ("unexpected set in value read from json/yaml") and passes with the fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mode PR microsoft#667 review (Low): a run-to-completion host-await response could carry an `args:` payload expectation, but RTC execution pre-loads responses and never surfaces the call argument to the harness, so the expectation was parsed and silently dropped. A case with `args: "WRONG"` passed as long as the result matched -- asserting a payload that was never checked. Reject `args:` for run-to-completion fixtures at load time, directing the author to suspendable mode where arguments are validated. Also only build the run-to-completion response vector when the case actually runs in RTC mode, so a suspendable case using the shared host_await_responses field with `args:` is not wrongly rejected. Route the fixture-load error through the same want_error handling used for compilation errors, and add registered_builtin_run_to_completion_rejects_args which now fails loudly instead of passing silently. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…space PR microsoft#667 review (Low): register_host_await_builtin rejected all-whitespace names via name.trim().is_empty(), but accepted padded names like " lookup" or "lookup ". Those were inserted into host_await_builtins, but Rego function-call paths produce the trimmed identifier, so a padded registration could never match -- a silent dead registration. Reject any name that is not already trimmed (name != name.trim()) in addition to empty names, and update the error message accordingly. Add test cases for leading and trailing whitespace. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expose registered host-await builtins, Program compilation, and RVM accessors through the FFI layer and C# bindings. FFI (bindings/ffi/src/rvm.rs): - compile_from_modules with host-await builtin registration - set/get host-await responses, argument, identifier - RegorusHostAwaitBuiltin C struct C# (bindings/csharp/Regorus/): - Program.CompileFromModules overloads with HostAwaitBuiltin[] - Rvm.SetHostAwaitResponses, GetHostAwaitArgument, GetHostAwaitIdentifier - HostAwaitBuiltin readonly struct, ExecutionMode enum - ModuleMarshalling: PinnedUtf8Strings, PinnedHostAwaitBuiltins Rust (src/rvm/vm/machine.rs): - get_host_await_argument() and get_host_await_identifier() accessors Docs: README examples, API.md reference, vm-runtime.md accessors Tests: suspendable + run-to-completion C# scenarios
- HostAwaitBuiltin: single-arg (name) ctor; arg_count fixed to 1 - SetHostAwaitResponses: multi-identifier dictionary API - always route CompileFromModules through host-await FFI - dead-check/IIFE cleanup, pub(crate), expanded tests + docs
54fbb82 to
145ee89
Compare
|
@ review this PR using all the review skills in the repo. Use a separate agent for each skill. |
There was a problem hiding this comment.
Automated multi-perspective review of the host-await FFI / C# binding work. Inline comments below, roughly highest-impact first.
Two items I couldn't attach inline because they fall outside this PR's diff, but that are worth a look while you're in here:
RegoVM::set_host_await_responses(machine.rs) accumulates response queues with noenforce_limit()/fallible reservation. This is pre-existing code, not something you changed, but the new FFI surface makes it reachable with caller-supplied volume, so it may be worth hardening as a follow-up.- The
Vec::with_capacity(caller_len)panic-on-huge-length pattern I flagged inrvm.rsalready exists in the oldercompile_from_modulespaths too — a fallible-reservation pass across all of them would be a nice cleanup.
| ) -> RegorusResult { | ||
| with_unwind_guard(|| { | ||
| to_regorus_result(|| -> Result<()> { | ||
| let vm = to_ref(vm)?; |
There was a problem hiding this comment.
These new host-await entry points reach for the VM handle with to_ref(vm)?, but every other RVM method in this file (e.g. the run/step/result functions above) goes through to_shared_ref(vm as *const RegorusRvm)? and then takes the lock via try_read/try_write. That ordering isn't accidental: to_shared_ref is what lets the lock arbitrate concurrent access, whereas to_ref manufactures an exclusive &mut RegorusRvm straight from a caller-owned handle. If two foreign threads call into the same vm here, both can materialize that &mut before either lock is acquired, which is exactly the aliasing hazard the rest of the file is structured to avoid.
Could we switch these three (set_host_await_responses, get_host_await_argument, get_host_await_identifier) to to_shared_ref(...) followed by try_write/try_read, to match the established pattern?
| return Err(anyhow!("null response_sets pointer")); | ||
| } | ||
|
|
||
| let mut all = Vec::with_capacity(response_sets_len); |
There was a problem hiding this comment.
These Vec::with_capacity(response_sets_len) / VecDeque::with_capacity(set.values_len) (and the Vec::with_capacity(len) further down) reserve straight from caller-controlled usize values. A hostile or buggy caller passing something near usize::MAX will trip a capacity-overflow panic inside the unwind guard, which for this engine means process-wide poisoning rather than a clean error return.
I'll flag that this isn't new to your PR — the existing compile_from_modules paths in this file already use with_capacity(modules_len) the same way — so I don't think it should block the PR on its own. But since we're adding several more instances of the pattern, it'd be a good moment to switch to fallible reservation (try_reserve) or validate against a sane maximum before allocating, and return an FFI error instead of risking a panic.
|
|
||
| let id_str = from_c_str(set.identifier) | ||
| .map_err(|e| anyhow!("invalid identifier in response set at index {i}: {e}"))?; | ||
| let id_value = Value::String(id_str.into()); |
There was a problem hiding this comment.
Worth tracing the identifier round-trip here. On the way in we wrap the C string as Value::String(id_str.into()), but the getter below (get_host_await_identifier) hands the identifier back as to_json_str(), i.e. a JSON-encoded value. So a host that reads the current identifier and feeds it straight back gets Value::String("\"get_account\"") — the quotes become part of the string and it no longer matches.
It also means a HostAwait identifier that isn't a string (say an integer 42 baked into the policy) can never be matched from the FFI side at all, since everything coming through here is coerced to Value::String. If identifiers are intended to be string-only, it'd be clearer to document and enforce that and have the getter return the raw string; otherwise we probably want an identifier_json in and a parse back to Value.
| let vm = to_ref(vm)?; | ||
| let guard = vm.try_read()?; | ||
| match guard.get_host_await_argument() { | ||
| Some(arg) => Ok(Some(arg.to_json_str()?)), |
There was a problem hiding this comment.
to_json_str() is lossy for the Rego-only Value variants. If a policy hands a Set to HostAwait, it serializes here indistinguishably from an Array, and when the host resumes with that JSON it comes back as an Array — so a downstream is_set(resp) flips to false. Undefined has the same problem. For most payloads this is fine, but it's a silent shape change at the boundary. Either we should reject the non-JSON-representable variants explicitly at this boundary (so the failure is loud), or expose a typed marshalling path that preserves them.
| /// FFI boundary. The struct exists as a stable layout to allow future | ||
| /// expansion (e.g. an explicit `arg_count` field) without breaking ABI | ||
| /// when callers pin a fixed-size array of these. | ||
| #[repr(C)] |
There was a problem hiding this comment.
Since these #[repr(C)] structs get passed as C arrays, their size is baked into the caller's stride arithmetic. The moment we want to add a field later (an arity, a flags word, etc.) sizeof(RegorusHostAwaitBuiltin) changes and any client compiled against the old layout walks the array incorrectly — a silent ABI break rather than a link error. If we expect these to grow, it's cheap to add a struct_size/version (or a couple of reserved fields) now so future additions stay backward-compatible.
| pub fn compile_from_policy_with_host_await( | ||
| policy: &CompiledPolicy, | ||
| entry_points: &[&str], | ||
| host_await_builtins: &[(&str, usize)], |
There was a problem hiding this comment.
Minor design heads-up on the public signature: &[(&str, usize)] freezes the registration metadata to exactly "name + arity". The first time we want a third per-builtin attribute we're forced into either a breaking signature change or a parallel _v2 entry point. A small HostAwaitBuiltinSpec (with a constructor/builder) would let this grow additively. Not blocking, just easier to live with long-term while the API is still young.
| { | ||
| var result = API.regorus_engine_compile_program_with_entrypoints( | ||
| (RegorusEngine*)enginePtr, | ||
| var result = API.regorus_program_compile_from_modules_with_host_await( |
There was a problem hiding this comment.
Heads-up on a compatibility regression: the previous overloads used to call regorus_program_compile_from_modules, and now every path routes through regorus_program_compile_from_modules_with_host_await, including the no-builtins case. That means an existing app that updates only the managed package against an older native library will hit an EntryPointNotFoundException at runtime, even though it never asked for host-await.
Could we keep calling the original regorus_program_compile_from_modules when hostAwaitBuiltins is null/empty, and only reach for the new symbol when there's actually something to register? That keeps the old code path binary-compatible.
| { | ||
| if (modules is null) | ||
| return CompileFromModulesInner(dataJson, modules, entryPoints, hostAwaitBuiltins: null); | ||
| } /// <summary> |
There was a problem hiding this comment.
Tiny formatting bug: the /// <summary> starts on the same line as the closing }, so the compiler doesn't treat it as a doc comment for the following method — the XML docs for that member silently disappear. A newline before /// <summary> fixes it.
| { | ||
| for (int i = 0; i < count; i++) | ||
| { | ||
| var namePinned = Utf8Marshaller.Pin(builtins[i].Name); |
There was a problem hiding this comment.
These Utf8Marshaller.Pin(...) calls (the builtin name here, and the response key/value pins further down) hand the string to the native side as a C string, but nothing rejects an embedded \0. Rust's CStr stops at the first NUL, so new HostAwaitBuiltin("foo\0bar") silently registers "foo". It's an edge case, but it's the kind of silent truncation that's painful to debug, so it'd be worth rejecting embedded NULs in the public C# surface (or moving these to pointer+length strings).
| // test failure that has to be explicitly re-acknowledged. | ||
| var actual = vm.Execute(); | ||
| Assert.AreEqual( | ||
| "\"<undefined>\"", |
There was a problem hiding this comment.
This is the one I'd most want a second opinion on. The test codifies that a run-to-completion HostAwait with no available response yields the string "<undefined>" rather than surfacing an error. Functionally that means a missing response is swallowed as a rule-body failure and propagates as plain undefined.
In a policy engine that's a fail-open shape: if the host meant to supply a response and didn't (misconfig, dropped lookup, etc.), the rule quietly evaluates to undefined instead of telling anyone something went wrong. I'd argue HostAwaitResponseMissing should propagate as an execution error through RVM → FFI → C#, and this assertion should expect an exception. If undefined-on-missing is genuinely the desired contract, it'd be good to spell out why in a comment, because the default reading is surprising.
Summary
Exposes the registered host-await builtins,
Programcompilation, and RVM runtime accessors through the FFI layer and C# bindings. This is the companion to #667 (registered host-await builtins in the compiler/VM), making the feature usable from C# consumers.Motivation
The previous PR added registered host-await builtins to the Rust compiler and VM. However, the FFI boundary and C# bindings only exposed the raw
__builtin_host_awaitpath. This PR bridges the gap so C# consumers can:Program.CompileFromModules, which emitsHostAwaitinstructions directly.Rvm.SetHostAwaitResponsesto queue responses (for any number of identifiers) before execution.Rvm.GetHostAwaitIdentifier()andRvm.GetHostAwaitArgument()to determine which builtin suspended and with what argument.Changes
Rust VM (
src/rvm/vm/machine.rs)get_host_await_argument()andget_host_await_identifier()-const fnaccessors that return the argument/identifier when the VM is in a HostAwait-suspended state.FFI (
bindings/ffi/src/rvm.rs)RegorusHostAwaitBuiltin-#[repr(C)]struct carrying a builtin name. The argument count is fixed to 1 by the compiler, so it is not part of the FFI surface.regorus_program_compile_from_modules_with_host_await- compile from modules with registered host-await builtins; the non-host-await path is treated as the empty-builtins case.regorus_rvm_set_host_await_responses- pre-load responses for run-to-completion mode. Accepts an array ofRegorusHostAwaitResponseSet(one per identifier), so multiple identifiers are configured in a single call.regorus_rvm_get_host_await_argument/regorus_rvm_get_host_await_identifier- JSON accessors for suspension state.C# Bindings (
bindings/csharp/Regorus/)HostAwaitBuiltin- readonly struct wrapping a builtin name (single-argument constructor; the arg count is fixed to 1 at the compiler level).ExecutionMode- enum (RunToCompletion,Suspendable).Program.CompileFromModules- overload acceptingIReadOnlyList<HostAwaitBuiltin>.Rvm.SetHostAwaitResponses(IReadOnlyDictionary<string, IReadOnlyList<string>>)- queue JSON responses for any number of identifiers in one call. Replaces the entire response configuration.Rvm.GetHostAwaitArgument()/GetHostAwaitIdentifier()- read suspension state.ModuleMarshalling-PinnedUtf8Strings,PinnedHostAwaitBuiltins, andPinnedHostAwaitResponseSetshelpers for safe FFI marshalling.Documentation
bindings/csharp/README.md- suspendable and run-to-completion examples with registered builtins.bindings/csharp/API.md-Program,Rvm,HostAwaitBuiltin,ExecutionModeAPI reference.docs/rvm/vm-runtime.md- the new VM accessors.Tests (
bindings/csharp/Regorus.Tests/RvmProgramTests.cs)RegisteredHostAwait_Suspendable_SuspendAndResume- registersget_account, suspends, verifies identifier and argument, resumes with a response.RegisteredHostAwait_RunToCompletion_WithPreloadedResponses- registerstranslate, pre-loads a response, verifies end-to-end execution.RegisteredHostAwait_RunToCompletion_MultipleIdentifiersInSingleCall- preloads responses for two identifiers in oneSetHostAwaitResponsescall.RegisteredHostAwait_RunToCompletion_FailsWhenResponseQueueExhausted- missing response surfaces as an error.RegisteredHostAwait_GetAccessorsReturnNullWhenVmIsNotSuspended- accessors return null outside a suspension.RegisteredHostAwait_CompileRejectsEmptyOrWhitespaceName/CompileRejectsDuplicateRegistration/CompileRejectsReservedName- registration validation errors surface through the bindings.Notes
SetHostAwaitResponsesreplaces the full response set: it configures responses for all supplied identifiers in a single call and replaces any previously configured responses. This mirrors the underlyingRegoVM::set_host_await_responsesreplace-all semantics (pre-existing upstream behavior).CompileFromEnginedoes not yet support host-await builtins: OnlyCompileFromModulesacceptsHostAwaitBuiltin[]. This is a scoping choice to keep the PR smaller - there is no technical constraint preventing it. The engine-based path can be extended in a follow-up if needed.