feat: Allow xnet on engines by schneiderstefan · Pull Request #10215 · dfinity/ic

schneiderstefan · 2026-05-13T08:35:04Z

This commit opens engines up to send and receive XNet messages with 2
restrictions on messages that involve engines (engine->subnet,
subnet->engine, engine->engine, but not messages staying on the same
engine):

1. Only bounded wait calls are allowed
2. Messages cannot contain attached cycles

Messages that are not allowed will be rejected by the protocol and the
canister receives an error.

The implementation is in 3 places:

1. The StreamBuilder rejects these messages on the sending side.
2. The XNet PayloadBuilder rejects these messages on the receiving side.
   No honest subnet would send any of these messages, but this protects
   against malicious subnets.
3. The NetworkTopology now contains all other subnets again. Previously,
   when engines were not able to do any XNet, the NetworkTopology would
   only contain other subnets if it could send messages to them. The
   exception to this is the list of ECDSA subnets, which does not contain
   subnets that do not allow cycles to be sent to them. This is because
   the threshold ECDSA endpoints expect to be called with cycles attached.
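
The two restrictions described above can be sketched as a single predicate. This is a minimal illustration only: `SubnetKind`, `CallSketch`, `Verdict` and `check_engine_boundary` are hypothetical names invented for this example and are not the types or functions used by the StreamBuilder or the XNet PayloadBuilder.

```rust
// Hypothetical, simplified stand-ins; not the replica's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SubnetKind {
    Application,
    System,
    CloudEngine,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict {
    Allowed,
    Rejected,
}

/// Assumed minimal description of a call between two subnets.
pub struct CallSketch {
    pub src_subnet: u32,
    pub src_kind: SubnetKind,
    pub dst_subnet: u32,
    pub dst_kind: SubnetKind,
    /// `true` for bounded wait calls, `false` for unbounded wait calls.
    pub bounded_wait: bool,
    /// Cycles attached to the message.
    pub cycles: u128,
}

/// Applies the two restrictions from the description to one call.
pub fn check_engine_boundary(call: &CallSketch) -> Verdict {
    let stays_on_same_subnet = call.src_subnet == call.dst_subnet;
    let involves_engine = call.src_kind == SubnetKind::CloudEngine
        || call.dst_kind == SubnetKind::CloudEngine;

    // Messages staying on the same engine, and messages between
    // non-engine subnets, are not restricted.
    if stays_on_same_subnet || !involves_engine {
        return Verdict::Allowed;
    }

    // Across an engine boundary: bounded wait only, no attached cycles.
    if call.bounded_wait && call.cycles == 0 {
        Verdict::Allowed
    } else {
        Verdict::Rejected
    }
}
```

In the PR itself the same rule is enforced twice, once when building streams and once when validating incoming payloads, so that a malicious sender cannot bypass it.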


alin-at-dfinity

First round of comments, still going through it.

alin-at-dfinity · 2026-06-08T11:00:01Z

+        // subnets_for_certification also includes all three (via full_topology).
+        let cert_keys: Vec<_> = network_topology
            .subnets_for_certification()
            .keys()
            .copied()
            .collect();
-        assert!(all_keys.contains(&app_subnet_id));
-        assert!(all_keys.contains(&engine_subnet_id));
-        assert!(all_keys.contains(&nns_subnet_id));
+        assert!(cert_keys.contains(&app_subnet_id));
+        assert!(cert_keys.contains(&engine_subnet_id));
+        assert!(cert_keys.contains(&nns_subnet_id));


This appears to be the same as the previous two tests (only written differently). Do we still need full_topology now that all subnets see all subnets in every "view"?


alin-at-dfinity

More nitpicks, for the most part.

alin-at-dfinity · 2026-06-08T12:12:17Z

+
+    let chain_key_enabled_subnets = btreemap! {
+        shared_chain_key() => Valid(vec![app_subnet_id, engine_subnet_id]),
+        engine_only_chain_key() => Valid(vec![engine_subnet_id]),


Clueless question: do system subnets have access to / use chain keys?

alin-at-dfinity · 2026-06-08T12:16:08Z

+/// Tests that a guaranteed-response request from a CloudEngine subnet (own subnet) to a
+/// non-engine subnet is rejected with a synthetic reject response.
+#[test]
+fn build_streams_engine_src_rejects_gr_request() {


Nit: AFAIK we've never used the GR abbreviation before (although I might have just missed it). It took me a second to parse it. I'd rather just spell it out.

alin-at-dfinity · 2026-06-08T12:20:13Z

+            originator_reply_callback: CallbackId::from(1),
+            refund: Cycles::new(100),
+            response_payload: Payload::Data(vec![]),
+            deadline: NO_DEADLINE,


Nit: Maybe we should make this a best-effort response, to make it clear that it's dropped because of the refund, not because it's guaranteed response.

alin-at-dfinity · 2026-06-08T12:22:18Z

+            // The refund must NOT have been routed into the REMOTE_SUBNET stream.
+            let routed_refunds = result_state
+                .streams()
+                .get(&REMOTE_SUBNET)
+                .map_or(0, |s| s.refund_count());
+            assert_eq!(
+                0, routed_refunds,
+                "Refund leaked across engine boundary (own_subnet_type={own_subnet_type:?}, \
+                 remote_subnet_type={remote_subnet_type:?})",
+            );


Should we also check that the refund is gone from the refund pool (if that's what it's called)?

alin-at-dfinity · 2026-06-08T12:30:21Z

+/// Tests that a best-effort request carrying cycles from a CloudEngine subnet is dropped
+/// at the engine boundary: cycles observed as lost, accept signal pushed, and a critical
+/// error raised. Mirrors the sender-side test
+/// `build_streams_engine_src_rejects_cycles_request` on the receiving side, which is
+/// the security-critical filter against a malicious engine.


Just thinking out loud:

Should we be raising critical errors (i.e. page) if an engine misbehaves?

Should we be recording such cycles as lost to begin with? (Assuming it's trivial to not do so. I don't think this matters enough to add another 100 or even 50 lines of code.)

alin-at-dfinity · 2026-06-08T13:32:48Z

+                    // malicious peer. Drop it (any cycles are lost) and raise a critical
+                    // error.
+                    if let RequestOrResponse::Response(ref rep) = msg {
+                        let is_gr = rep.deadline == NO_DEADLINE;


Nit: Same observation regarding the GR abbreviation. You can either expand it or not bother with the intermediate boolean and just plug rep.deadline == NO_DEADLINE into the if condition below.

alin-at-dfinity · 2026-06-08T14:07:12Z

+                                    CompoundCycles::new(rep.refund, own_cost_schedule),
+                                );
+                            }
+                            stream.push_accept_signal();


Taking this even further, this branch only differs from the (two) Request branch(es) in that it produces an Accept instead of Reject(EngineNotAllowed).

Even if you don't feel like unifying them all, you can still put them under the same match block, there appears to be no need for a separate if.
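
One possible shape for the unified `match` suggested above, as a sketch under stated assumptions: `Msg`, `Signal` and `filter_at_engine_boundary` are hypothetical stand-ins, and `NO_DEADLINE` is modelled here as a plain `u32` constant rather than the replica's actual deadline type. Metrics and the critical error are left out.

```rust
// Hypothetical stand-ins; not the replica's real message or signal types.
pub const NO_DEADLINE: u32 = 0;

#[derive(Debug, PartialEq, Eq)]
pub enum Signal {
    Accept,
    RejectEngineNotAllowed,
}

pub enum Msg {
    Request { deadline: u32, cycles: u128 },
    Response { deadline: u32, refund: u128 },
}

/// Returns `None` if the message may cross the engine boundary, otherwise
/// the signal to push and the amount of cycles to record as lost.
pub fn filter_at_engine_boundary(msg: &Msg) -> Option<(Signal, u128)> {
    match *msg {
        // Guaranteed-response (no deadline) or cycle-carrying request:
        // answered with a reject signal.
        Msg::Request { deadline, cycles } if deadline == NO_DEADLINE || cycles > 0 => {
            Some((Signal::RejectEngineNotAllowed, cycles))
        }
        // Same conditions for a response; the only difference is that a
        // response is dropped with an accept signal, not a reject.
        Msg::Response { deadline, refund } if deadline == NO_DEADLINE || refund > 0 => {
            Some((Signal::Accept, refund))
        }
        _ => None,
    }
}
```

Both arms share the same condition, inlined as `deadline == NO_DEADLINE` rather than bound to an intermediate boolean, and differ only in the signal they produce.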

alin-at-dfinity · 2026-06-08T14:10:31Z

        ),
+        RejectReason::EngineNotAllowed => (
+            RejectCode::SysFatal,
+            "Guaranteed-response calls from CloudEngine subnets are not allowed".to_string(),


Nit:

Suggested change

-            "Guaranteed-response calls from CloudEngine subnets are not allowed".to_string(),
+            "Guaranteed-response calls and cycle transfers to / from CloudEngine subnets are not allowed".to_string(),

alin-at-dfinity · 2026-06-08T14:19:29Z

+    #[serde(default, skip_serializing_if = "Vec::is_empty")]
+    pub engine_not_allowed_deltas: Vec<u64>,


Damn, this just occurred to me now: whether we go with an explicit certification version or not, a canonical_state change requires a staged rollout: first deploy the new "variant" (to all subnets, in this case, since it affects stream structure) with the full logic required to handle it (in this case, to inflate the signal into a reject response, so not much); and only then deploy replica binaries that may produce it.

If we're to do this properly, the new certification version would be introduced with the field; and it would become the "current version" once we have code that can produce it. An explicit version would also allow us to rely on tests to tell us when it's safe to merge follow-up changes.

You should also make a copy of the original RejectSignals type (no engine_not_allowed_deltas field) under rs/canonical_state/encoding/old_types.rs (likely as RejectSignalsV25) and update encoding/tests/compatibility.rs and encoding/tests/test_fixtures.rs to also cover the new variant.
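
To illustrate why the order of deployment matters, here is a toy model of the `#[serde(default, skip_serializing_if = "Vec::is_empty")]` behaviour, written without serde so it stays self-contained. Everything in it is an assumption made for illustration: `RejectSignalsSketch`, the `other_deltas` field, the map-based encoding and the strict `decode_old` are invented and do not reflect the actual canonical state encoding.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the struct that gains the new field.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RejectSignalsSketch {
    /// Hypothetical placeholder for the fields that already exist.
    pub other_deltas: Vec<u64>,
    /// The newly added field.
    pub engine_not_allowed_deltas: Vec<u64>,
}

/// Toy wire format: field name -> values.
pub type Encoded = BTreeMap<&'static str, Vec<u64>>;

/// Mimics `skip_serializing_if = "Vec::is_empty"`: the new field is only
/// written when it is non-empty.
pub fn encode(s: &RejectSignalsSketch) -> Encoded {
    let mut out = Encoded::new();
    out.insert("other_deltas", s.other_deltas.clone());
    if !s.engine_not_allowed_deltas.is_empty() {
        out.insert("engine_not_allowed_deltas", s.engine_not_allowed_deltas.clone());
    }
    out
}

/// Mimics `default`: a missing field decodes as an empty vector.
pub fn decode_new(e: &Encoded) -> RejectSignalsSketch {
    RejectSignalsSketch {
        other_deltas: e.get("other_deltas").cloned().unwrap_or_default(),
        engine_not_allowed_deltas: e
            .get("engine_not_allowed_deltas")
            .cloned()
            .unwrap_or_default(),
    }
}

/// An old binary, assumed here to fail on fields it does not know.
pub fn decode_old(e: &Encoded) -> Result<Vec<u64>, String> {
    if let Some(unknown) = e.keys().find(|k| **k != "other_deltas") {
        return Err(format!("unknown field {unknown}"));
    }
    Ok(e.get("other_deltas").cloned().unwrap_or_default())
}
```

In this model, as long as no binary produces a non-empty `engine_not_allowed_deltas`, old and new decoders agree on every encoded value. The first non-empty value is only safe once every decoder is the new one, which is the motivation for the two-step rollout.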

alin-at-dfinity · 2026-06-08T14:30:11Z

            .map(|reason| reason as i32)
            .collect::<Vec<i32>>(),
-        [1, 2, 3, 4, 5, 6, 7]
+        [1, 2, 3, 4, 5, 6, 7, 8]


Similar (but less stringent) observation here as for rs/canonical_state. As per rs/replicated_state/best-practices-replicated-state.md, you should first deploy the ic_types and protobuf additions, without using them anywhere. Only once that has made it into a replica release can you proceed with the rest of the changes.

alin-at-dfinity

All done, LGTM modulo the one bug and the staged rollout.

alin-at-dfinity · 2026-06-08T15:28:25Z

-                    })
-            })
-            .collect();
+        let subnet_ids: Vec<_> = all_subnet_ids.into_iter().collect();


all_subnet_ids is already a Vec<SubnetId>.

Suggested change

-        let subnet_ids: Vec<_> = all_subnet_ids.into_iter().collect();
+        let subnet_ids = self
+            .registry
+            .get_subnet_ids(validation_context.registry_version)
+            .map_err(Error::RegistryGetSubnetsFailed)?
+            .unwrap_or_default();

alin-at-dfinity · 2026-06-08T15:34:16Z

            ("NNS", &uc_nns, "CloudEngine 2", &uc_ce_2a),
            ("CloudEngine 2", &uc_ce_2a, "NNS", &uc_nns),


Not part of your change, but these two appear redundant. AFAICT there is no difference between "CloudEngine 1" and "CloudEngine 2".

Suggested change

-            ("NNS", &uc_nns, "CloudEngine 2", &uc_ce_2a),
-            ("CloudEngine 2", &uc_ce_2a, "NNS", &uc_nns),

alin-at-dfinity · 2026-06-08T15:34:46Z

@@ -306,10 +315,10 @@
            ("CloudEngine 2", &uc_ce_2a, "CloudEngine 1", &uc_ce_1a),


Ditto.

Suggested change

-            ("CloudEngine 2", &uc_ce_2a, "CloudEngine 1", &uc_ce_1a),

alin-at-dfinity · 2026-06-08T16:18:27Z

        let fixture = PayloadBuilderTestFixture::with_xnet_state_and_subnet_types(
-            1,
+            0,
            btreemap![cloud_engine_subnet => SubnetType::CloudEngine],
            None,
        );


Rant: This fixture is amazingly hard to read. Does this mean we get 4 subnets out of which SUBNET_1 is a CloudEngine? And that own_subnet_type defaults to Application?

github-actions Bot added the feat label May 13, 2026

schneiderstefan force-pushed the stschnei/no-cycles-or-unbounded-xnet-on-engines branch from 8364950 to 35f20e0 Compare June 4, 2026 15:29

schneiderstefan changed the title ~~feat: Allow limited xnet on engines~~ feat: Allow xnet on engines Jun 4, 2026

schneiderstefan marked this pull request as ready for review June 4, 2026 15:58

schneiderstefan requested a review from a team as a code owner June 4, 2026 15:58

github-actions Bot added the @core-protocol label Jun 4, 2026

alin-at-dfinity mentioned this pull request Jun 8, 2026

chore: Drop SystemMetadata::full_topology #10411

Draft

alin-at-dfinity reviewed Jun 8, 2026


schneiderstefan and others added 3 commits June 8, 2026 14:15

Update rs/messaging/src/message_routing.rs

388214d

Co-authored-by: Alin Sinpalean <58422065+alin-at-dfinity@users.noreply.github.com>

Automatically fixing code for linting and formatting issues

0309536

fix comment manually

fd8d5cf

alin-at-dfinity reviewed Jun 8, 2026


2 participants
