Adds live migration ttrpc service definition #2691

Open
rawahars wants to merge 1 commit into microsoft:main from rawahars:lm-proto
@rawahars
Contributor

Summary

This pull request introduces a new live migration service to the codebase, including its protocol definitions, generated Go bindings, and integration into the project configuration. The migration service manages sandbox live migration workflows, handling preparation, memory transfer, and finalization. Additionally, it adds a type for capturing sandbox state during migration.

Live migration service introduction:

  • Added the migration package, including proto definitions (migration.proto) for the live migration service, which defines the migration workflow (preparation, memory transfer, finalization, and socket duplication) and all related messages and enums.
  • Added generated ttrpc Go bindings (migration_ttrpc.pb.go) for the migration service, providing server and client interfaces for use in Go code.
  • Added a package-level doc comment describing the purpose of the migration package in doc.go.
  • Registered the new migration package in the Protobuild.toml file, ensuring proto code generation for the package.

Sandbox state capture:

  • Added StatedState type to sandbox-spec/vm/v2/migration.go, used to capture and transfer the exported configuration of a sandbox during live migration, and registered it with the typeurl registry.

Comment thread internal/migration/migration.proto Outdated

message MigrationInitializeOptions {
// Origin is the side of migration the workflow is performed on.
string origin = 1;
Contributor

What will this be used for?

Contributor Author

Comment thread internal/migration/migration.proto Outdated
// PrepareMemoryTransferMode extends timeout for cross-version live migration.
bool prepare_memory_transfer_mode = 8;
// CompatibilityData is opaque VM compatibility data.
bytes compatibility_data = 9;
Contributor

How will compatibility data be used?

Contributor Author

This data is sent to HCS as part of the initialize options: https://github.com/rawahars/hcsshim/blob/3a23596d02431453538cd967bd174889429c8038/internal/hcs/schema2/migration.go#L6

bytes compatibility_data = 9;
}

message MemoryMigrationTransferThrottleParams {
Contributor

Do we know if there are HCS test cases for validating all these attributes?

Contributor Author

Let's chat offline about these.

// Migration is done.
MIGRATION_EVENT_DONE = 6;
// Migration recovery has been performed.
MIGRATION_EVENT_RECOVERY_DONE = 7;
Contributor

When are the events MIGRATION_EVENT_RECOVERY_DONE and MIGRATION_EVENT_OFFLINE_DONE expected?

Contributor Author

This is the set of events sent by HCS. We can chat offline about these.

// Taking the VM offline is done.
MIGRATION_EVENT_OFFLINE_DONE = 4;
// The VM has successfully started again after blackout phase.
MIGRATION_EVENT_BLACKOUT_EXITED = 5;
Contributor

Is MIGRATION_EVENT_BLACKOUT_EXITED different from MIGRATION_EVENT_DONE?

Contributor Author

Yes, these are different. HCS sends them as separate events.
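As a rough illustration of treating the two as separate events, a consumer could switch on each value independently. The sketch below uses only the four event numbers visible in this thread; the Go type, constant names, and `Describe` helper are hypothetical (the generated bindings will name things differently), and the descriptions are taken from the proto comments quoted above.

```go
package main

import "fmt"

// MigrationEvent mirrors the enum under review. Only the values quoted in
// this thread are listed; the remaining values are left out of the sketch.
type MigrationEvent int32

const (
	EventOfflineDone    MigrationEvent = 4
	EventBlackoutExited MigrationEvent = 5
	EventDone           MigrationEvent = 6
	EventRecoveryDone   MigrationEvent = 7
)

// Describe returns the proto comment text for each known event, so that
// BLACKOUT_EXITED and DONE are handled as two distinct notifications.
func Describe(e MigrationEvent) string {
	switch e {
	case EventOfflineDone:
		return "taking the VM offline is done"
	case EventBlackoutExited:
		return "the VM has successfully started again after blackout phase"
	case EventDone:
		return "migration is done"
	case EventRecoveryDone:
		return "migration recovery has been performed"
	default:
		return fmt.Sprintf("event %d not covered by this sketch", int32(e))
	}
}

func main() {
	for _, e := range []MigrationEvent{EventBlackoutExited, EventDone} {
		fmt.Printf("%d: %s\n", e, Describe(e))
	}
}
```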

Comment thread internal/migration/migration.proto Outdated

enum FinalizeAction {
ACTION_UNSPECIFIED = 0;
STOP = 1;
Contributor

FINALIZE_ACTION_STOP, FINALIZE_ACTION_RESUME

Contributor Author

Made this change.
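For context on the rename: protobuf enum value names share the scope that encloses the enum rather than the enum's own scope, so a bare STOP could collide with a value of another enum in the same package, and prefixing with the enum name avoids that. The hand-written sketch below shows roughly what the Go constants look like after the rename, following protoc-gen-go's `EnumName_VALUE_NAME` convention. The zero-value name and `RESUME = 2` are assumptions; neither appears in the snippet above.

```go
package main

import "fmt"

// FinalizeAction is a hand-written approximation of the generated enum type.
type FinalizeAction int32

const (
	// Assumed name for the zero value after the rename.
	FinalizeAction_FINALIZE_ACTION_UNSPECIFIED FinalizeAction = 0
	FinalizeAction_FINALIZE_ACTION_STOP        FinalizeAction = 1
	// Assumed number; only STOP = 1 is visible in the review snippet.
	FinalizeAction_FINALIZE_ACTION_RESUME FinalizeAction = 2
)

// finalizeActionNames maps each value to its proto name for logging.
var finalizeActionNames = map[FinalizeAction]string{
	FinalizeAction_FINALIZE_ACTION_UNSPECIFIED: "FINALIZE_ACTION_UNSPECIFIED",
	FinalizeAction_FINALIZE_ACTION_STOP:        "FINALIZE_ACTION_STOP",
	FinalizeAction_FINALIZE_ACTION_RESUME:      "FINALIZE_ACTION_RESUME",
}

// String returns the proto name, or a numeric fallback for unknown values.
func (a FinalizeAction) String() string {
	if name, ok := finalizeActionNames[a]; ok {
		return name
	}
	return fmt.Sprintf("FinalizeAction(%d)", int32(a))
}

func main() {
	fmt.Println(FinalizeAction_FINALIZE_ACTION_STOP)
	fmt.Println(FinalizeAction(42))
}
```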

Introduces the Live Migration ttrpc service with RPCs for preparing, transferring
memory, and finalizing sandbox live migration between source and destination
hosts. Includes sandbox state serialization for cross-host config propagation.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>