feat: allow engine-controller canister to update cloud engines directly#10431
feat: allow engine-controller canister to update cloud engines directly#10431NikolaMilosa wants to merge 6 commits into
Conversation
…ud engine subnet records
| /// Builds a registry that already has a `CloudEngine` subnet, suitable | ||
| /// for exercising the engine-controller permission checks on | ||
| /// `do_update_subnet`. | ||
| fn make_registry_with_cloud_engine_subnet() -> (Registry, SubnetId) { |
There was a problem hiding this comment.
In the test helpers, we already have such a method: https://github.com/dfinity/ic/blob/master/rs/registry/canister/tests/common/test_helpers.rs#L216
You might want to use that.
There was a problem hiding this comment.
Those are not accessible here because they are for integration tests 😟
| /// Creates a registry with a single subnet of the given `subnet_type`. | ||
| fn make_registry_with_subnet(subnet_type: SubnetType) -> (Registry, SubnetId) { |
There was a problem hiding this comment.
Here, you could probably use the test_helper method:
https://github.com/dfinity/ic/blob/master/rs/registry/canister/tests/common/test_helpers.rs#L216 it creates a registry with a system and a cloud engine subnet
There was a problem hiding this comment.
But those are integration test level things? I will add the same thing in the shared crate but I cannot use those in unit tests here
There was a problem hiding this comment.
Pull request overview
This PR expands the registry canister’s authorization model so the engine controller canister can directly perform limited subnet-management operations for CloudEngine subnets, enabling user-facing controls to be implemented via the engine controller without governance proposals.
Changes:
- Allow the engine controller canister to call
update_subnetanddeploy_guestos_to_all_subnet_nodeson the registry canister. - Enforce that engine-controller-initiated calls only target
CloudEnginesubnets (governance remains unrestricted). - Expose proxy payload types and add engine controller API surface + tests to forward these operations.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| rs/registry/canister/unreleased_changelog.md | Documents the newly allowed callers and CloudEngine-only restriction. |
| rs/registry/canister/src/mutations/do_update_subnet.rs | Adds caller-aware logic to restrict engine-controller updates to CloudEngine subnets; updates tests accordingly. |
| rs/registry/canister/src/mutations/do_deploy_guestos_to_all_subnet_nodes.rs | Adds caller-aware CloudEngine-only restriction for engine-controller deploys; adds tests. |
| rs/registry/canister/canister/canister.rs | Updates registry canister endpoints to authorize engine controller and pass through the caller principal to mutation handlers. |
| rs/engine_controller/src/lib.rs | Re-exports registry payload types for clients of the engine controller. |
| rs/engine_controller/engine_controller.did | Extends the engine controller public interface with update_subnet and deploy_guestos_to_all_subnet_nodes plus required type definitions. |
| rs/engine_controller/canister/tests.rs | Adds unit tests for the engine controller’s update_subnet payload validation and admin normalization helpers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
## What Stacked on top of #10431. That PR lets the engine-controller canister proxy `update_subnet`, but its field allow-list (`ensure_only_subnet_admins_set`) currently permits **only** `subnet_admins`. This PR widens that allow-list to also permit `is_halted`, so the engine controller can halt / unhalt the cloud-engine subnets it manages. ## Changes - `rs/engine_controller/canister/canister.rs` - Renamed `ensure_only_subnet_admins_set` → `ensure_only_allowed_fields_set` and moved `is_halted` into the allowed group; every other field still must be left at its default (`None` / `false`) or the call is rejected. - Updated the validator error message and the `update_subnet` doc comments. - `rs/engine_controller/canister/tests.rs` - Renamed the validator tests, added coverage that `is_halted` (both `true` and `false`) and `subnet_admins + is_halted` are accepted, and switched the "rejects other fields" test to assert on a still-disallowed field. ## Scope notes - The registry side is unchanged: `do_update_subnet` already accepts `is_halted` and only gates the engine controller to `CloudEngine` subnets, so no registry-level changes are needed. - `halt_at_cup_height` is intentionally **not** added — only the direct `is_halted` flag. Adding a new allowed field stays a conscious, code-level decision, matching the design intent of the base PR. - No `.did` change: `is_halted` is already part of the mirrored `UpdateSubnetPayload`, and the candid interface-equality test still passes. ## Testing `cargo test -p ic-engine-controller --bin engine-controller-canister` — all 10 unit tests pass (including the candid interface-match test). `cargo fmt --check` is clean. > Draft because it stacks on the not-yet-merged #10431; base is set to that > PR's branch (`nim-update-subnet`).
There was a problem hiding this comment.
This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):
-
Update
unreleased_changelog.md(if there are behavior changes, even if they are
non-breaking). -
Are there BREAKING changes?
-
Is a data migration needed?
-
Security review?
How to Satisfy This Automatic Review
-
Go to the bottom of the pull request page.
-
Look for where it says this bot is requesting changes.
-
Click the three dots to the right.
-
Select "Dismiss review".
-
In the text entry box, respond to each of the numbered items in the previous
section, declare one of the following:
-
Done.
-
$REASON_WHY_NO_NEED. E.g. for
unreleased_changelog.md, "No
canister behavior changes.", or for item 2, "Existing APIs
behave as before.".
Brief Guide to "Externally Visible" Changes
"Externally visible behavior change" is very often due to some NEW canister API.
Changes to EXISTING APIs are more likely to be "breaking".
If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.
If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.
Reference(s)
For a more comprehensive checklist, see here.
GOVERNANCE_CHECKLIST_REMINDER_DEDUP
|
✅ No security or compliance issues detected. Reviewed everything up to 9d47bef. Security Overview
Detected Code Changes| Change Type | Relevant files ... (code changes summary truncated to fit VCS comment limits.) |
schneiderstefan
left a comment
There was a problem hiding this comment.
Implementation-wise LGTM, but some high-level concerns.
| impl Registry { | ||
| pub fn do_update_subnet(&mut self, payload: UpdateSubnetPayload) { | ||
| println!("{}do_update_subnet: {:?}", LOG_PREFIX, payload); | ||
| pub fn do_update_subnet(&mut self, caller: PrincipalId, payload: UpdateSubnetPayload) { |
There was a problem hiding this comment.
Is there rate limiting in place, ideally in the registry canister itself? It could be dangerous if the end user is allowed to make changes and can do as many as they want.
There was a problem hiding this comment.
For transparency: at the moment we aren't even allowing this call, we need this for something completely different that isn't user facing. So users for now cannot do anything with this specifically and won't be able for some time still until we figure out everything. And we won't be constantly changing any of these values. I expect that per engine it could be 2-3 times per month. Basically we want to be able to halt a subnet if an engine payment is overdue. This will eventually go into this canister (the whole accounting logic), but in the interest of time we have to take it bit by bit
| @@ -15,14 +16,32 @@ use serde::Serialize; | |||
| impl Registry { | |||
| pub fn do_deploy_guestos_to_all_subnet_nodes( | |||
There was a problem hiding this comment.
Note that any pair of blessed versions might be incompatible at any time. So it's easy to shoot yourself in the foot if we give too many options to the end users.
There was a problem hiding this comment.
Absolutely! Again this is not exposed to the end users. This is still heavily controlled by us. I highly doubt that any user will ever care about that so this will most likely go away.
The users should be able to control their own settings for the engines they use. For now we are scoping that to only
subnet_adminsandreplica_version_idbut in the future we might want to allow more things.