Group E2EE before discovery: safe one-shot lifecycle orchestration #2
chgaowei wants to merge 15 commits
The secure direct CLI now consumes the published ANP Go SDK v0.8.7, persists sent secure messages as E2EE, redacts secure status output, auto-acks decrypted inbound init messages from polling/listener paths, and flushes queued secure outbox items once peers confirm. This removes the temporary workspace SDK replacement while keeping local repair and retry commands available for restart recovery.
Constraint: P5 requires direct.send operation_id/message_id equality, target key-service binding, and no leakage of ratchet material
Constraint: Mainline builds must consume github.com/agent-network-protocol/anp/golang v0.8.7 without a committed workspace replace
Rejected: Keep replace => ../anp/anp/golang | breaks CI and release portability
Confidence: high
Scope-risk: moderate
Directive: Do not print raw p5-e2ee-sessions or reintroduce workspace-local ANP SDK replace in committed go.mod
Tested: go test ./...
Tested: go vet ./...
Tested: awiki-system-test tests_v2 with message-service v2: 85 passed, 8 skipped
Not-tested: Cross-device production relay outside the local system-test stack
The CLI needs a stable integration seam for P6 before real MLS is connected, so this introduces exec-based provider plumbing and diagnostic commands while documenting that the architecture draft remains contract-test only for this slice.
Constraint: awiki-cli must stay pure Go and must not depend on a resident MLS process.
Constraint: plaintext must be passed through stdin rather than argv.
Rejected: Wire group create/add/send into live OpenMLS now | would imply production E2EE before service and SDK contracts are proven.
Confidence: high
Scope-risk: moderate
Directive: Do not advertise real group E2EE from CLI commands until anp-mls uses real OpenMLS state and system tests cover the loop.
Tested: go test ./internal/message ./internal/cli; git diff --check
Not-tested: Real anp-mls installation discovery, transparent group send/decrypt happy path, packaged release binary inclusion
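The stdin-rather-than-argv constraint can be illustrated with a minimal sketch. The request envelope, its field names, and the `encrypt` operation name are hypothetical and are not taken from the anp-mls contract; the only point shown is that argv carries the operation name while the plaintext travels in the JSON body attached to stdin.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
)

// providerRequest is a hypothetical envelope; the real anp-mls JSON
// contract may use different field names.
type providerRequest struct {
	Operation string `json:"operation"`
	Plaintext string `json:"plaintext,omitempty"`
}

// buildProviderCmd prepares a one-shot helper invocation. Only the
// operation name is placed in argv; the JSON request, including any
// plaintext, is attached as the process stdin.
func buildProviderCmd(binary string, req providerRequest) (*exec.Cmd, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("encode provider request: %w", err)
	}
	cmd := exec.Command(binary, req.Operation)
	cmd.Stdin = bytes.NewReader(body)
	return cmd, nil
}

// argvContains reports whether any command-line argument contains secret.
func argvContains(cmd *exec.Cmd, secret string) bool {
	for _, arg := range cmd.Args {
		if strings.Contains(arg, secret) {
			return true
		}
	}
	return false
}

func main() {
	// The command is only built here, not executed.
	cmd, err := buildProviderCmd("anp-mls", providerRequest{
		Operation: "encrypt", // hypothetical operation name
		Plaintext: "hello group",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("argv:", cmd.Args)
	fmt.Println("plaintext in argv:", argvContains(cmd, "hello group"))
}
```

Keeping plaintext out of argv matters because process arguments are typically visible to other local users through process listings, while stdin is not.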
The CLI now treats group E2EE as a real anp-mls exec-backed flow instead of a dry diagnostic skeleton. The implementation keeps Go pure-Go/no-CGO, sends plaintext only through stdin to anp-mls, publishes opaque P6 payloads to message-service, and stores only derived group summaries in the business SQLite database.
Constraint: awiki-cli must not link Rust/OpenMLS or store MLS private material in its business DB
Constraint: System tests require AWIKI_ANP_MLS_BINARY/runtime path discovery before PATH fallback
Rejected: Keep group E2EE as contract-test-only CLI commands | real MLS lane requires create/add/send/decrypt orchestration from normal CLI surfaces
Confidence: medium
Scope-risk: moderate
Directive: Do not put application plaintext in anp-mls argv or message-service group.e2ee.send bodies
Tested: go test ./internal/message ./internal/doctor -count=1
Tested: go test ./internal/cli -run TestGroupDryRunPlansRenderStableContracts -count=1
Tested: CGO_ENABLED=0 go test ./... -run '^$' -count=1
Tested: go vet ./internal/message ./internal/doctor ./internal/cli
Not-tested: Full internal/cli suite; existing TestRuntimeValidationErrorsUseStableCodes timed out during remote upgrade/DID replace path
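The discovery order named in the constraint (AWIKI_ANP_MLS_BINARY, then runtime paths, then PATH) might look roughly like the sketch below. The function name, the injected lookups, and the choice to fail when the environment override points at a missing file are assumptions for illustration, not the CLI's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

var errNotFound = errors.New("anp-mls binary not found")

// resolveBinary returns the first usable anp-mls path. Lookups are injected
// so the ordering can be exercised without touching the real environment.
func resolveBinary(
	getenv func(string) string,
	runtimeCandidates []string, // hypothetical runtime install locations
	exists func(string) bool,
	lookPath func(string) (string, error),
) (string, error) {
	// 1. Explicit override wins. Assumption: a broken override is an error
	//    rather than a silent fallback.
	if p := getenv("AWIKI_ANP_MLS_BINARY"); p != "" {
		if exists(p) {
			return p, nil
		}
		return "", fmt.Errorf("AWIKI_ANP_MLS_BINARY=%q: %w", p, errNotFound)
	}
	// 2. Runtime path discovery.
	for _, p := range runtimeCandidates {
		if exists(p) {
			return p, nil
		}
	}
	// 3. PATH fallback.
	if p, err := lookPath("anp-mls"); err == nil {
		return p, nil
	}
	return "", errNotFound
}

// fileExists reports whether path names an existing non-directory entry.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	path, err := resolveBinary(os.Getenv, nil, fileExists, exec.LookPath)
	fmt.Println(path, err)
}
```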
Task-8 focused E2E exposed that anp-mls may include device-scoped provider metadata in the generated KeyPackage result while message-service accepts a tighter public KeyPackage schema. The CLI now whitelists the service payload fields before signing and publishing, so device_id and any private provider-only fields stay local to anp-mls/CLI orchestration.
Constraint: message-service rejects unsupported body.group_key_package fields with RPC 1003
Constraint: MLS private/provider metadata must not leak into service storage
Rejected: Require message-service to accept all provider fields | unnecessarily broadens server storage contract and does not fix private-field leakage
Confidence: high
Scope-risk: narrow
Directive: Keep group.e2ee.publish_key_package body limited to service-supported public KeyPackage fields
Tested: go test ./internal/message -run 'TestBuildGroupE2EEPublishKeyPackageRPCParamsStripsProviderOnlyFields|TestBuildGroupE2EEAddRPCParamsIncludesConsumedKeyPackageID|TestBuildGroupE2EESendRPCParamsSendsOnlyOpaqueCipherObject|TestMLSExecProvider' -count=1
Tested: go test ./internal/message -count=1
Tested: go vet ./internal/message
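A minimal sketch of the allowlist idea follows. Only `device_id` is named in the commit as a provider-only field; the allowed field names (`key_package_id`, `cipher_suite`, `key_package_b64`) and the `provider_private_ref` key are placeholders, since the service schema is not shown here.

```go
package main

import "fmt"

// publicKeyPackageFields is a hypothetical allowlist; the real
// service-supported field set is defined by message-service.
var publicKeyPackageFields = map[string]bool{
	"key_package_id":  true,
	"cipher_suite":    true,
	"key_package_b64": true,
}

// sanitizeKeyPackage copies only allowlisted fields into a new map, so
// provider-only metadata never reaches the signed and published body.
func sanitizeKeyPackage(provider map[string]any) map[string]any {
	out := make(map[string]any, len(publicKeyPackageFields))
	for key, value := range provider {
		if publicKeyPackageFields[key] {
			out[key] = value
		}
	}
	return out
}

func main() {
	fromProvider := map[string]any{
		"key_package_id":       "kp-1",
		"cipher_suite":         "example-suite",
		"key_package_b64":      "AAAA",
		"device_id":            "bob-laptop", // provider-only in this flow
		"provider_private_ref": "local-only", // hypothetical private field
	}
	fmt.Println(sanitizeKeyPackage(fromProvider))
}
```

An allowlist is used rather than a denylist so that fields a future provider version adds stay local by default.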
Doctor now verifies the anp-mls compatibility contract and reports MLS state health before reviewers try the group E2EE path. The release helper documents and stages the Rust binary without changing the pure-Go CLI boundary.
Constraint: awiki-cli must stay pure Go/no CGO and invoke anp-mls through stdin/stdout
Rejected: vendor Rust into the Go binary | would violate the pure-Go packaging boundary
Confidence: high
Scope-risk: narrow
Tested: go test ./internal/message ./internal/doctor ./internal/cli
Tested: go vet ./internal/message ./internal/doctor
Tested: scripts/release/build-anp-mls.sh --dry-run
The real MLS Alice/Bob loop publishes Bob's KeyPackage under a named device, while one-shot message reads previously tried only the default device state. The CLI now keeps anp-mls state agent/device-scoped, processes local welcome notices for stored identities, scans local device state during decrypt, and strips provider-local plaintext/OpenMLS fields before sending opaque P6 objects to the service.
Constraint: Go CLI must remain pure Go / no CGO and invoke anp-mls as a one-shot helper.
Constraint: Group E2EE remains hidden/test-only; this does not enable public discovery.
Rejected: Share one MLS SQLite DB across local identities | OpenMLS private KeyPackage state is not namespaced and Alice add consumed Bob's local material.
Confidence: high
Scope-risk: moderate
Directive: Do not send provider-local OpenMLS fields or application plaintext in group.e2ee.send payloads.
Tested: go test ./internal/message ./internal/doctor ./internal/cli
Tested: go vet ./internal/message ./internal/doctor
Tested: focused awiki-system-test group E2EE local/negative target passed (2 passed).
Tested: root make local-test passed (84 passed, 11 skipped).
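Scanning local device state during decrypt could be sketched as trying each known device scope in turn and stopping at the first one that succeeds. The function and error names are hypothetical, and the per-device decrypt call stands in for a one-shot anp-mls invocation against that device's state.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoDeviceState = errors.New("no local device state could decrypt the message")

// decryptAcrossDevices tries each device-scoped state in order and returns
// the plaintext together with the device id that succeeded. It fails when
// no device state can process the ciphertext.
func decryptAcrossDevices(
	deviceIDs []string,
	decrypt func(deviceID string) ([]byte, error),
) ([]byte, string, error) {
	for _, id := range deviceIDs {
		plaintext, err := decrypt(id)
		if err == nil {
			return plaintext, id, nil
		}
	}
	return nil, "", errNoDeviceState
}

func main() {
	// Stand-in for an anp-mls call: only the named device holds group state.
	decrypt := func(deviceID string) ([]byte, error) {
		if deviceID == "bob-laptop" {
			return []byte("hello"), nil
		}
		return nil, errors.New("group not found in this device state")
	}
	plaintext, device, err := decryptAcrossDevices([]string{"default", "bob-laptop"}, decrypt)
	fmt.Println(string(plaintext), device, err)
}
```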
The CLI now drives the hidden Group E2EE loop through the P6 target/security matrix, ratchet-tree welcome replay, explicit send AAD metadata, and durable notice repair without introducing a background process or CGO. The change also strengthens doctor/install diagnostics around the one-shot anp-mls binary and scoped state.
Constraint: Group E2EE must remain hidden/test-only and must not imply public product support.
Constraint: anp-mls receives plaintext only over stdin/stdout JSON, never argv.
Rejected: Cache MLS epoch as P4 group_state_version | service state and MLS epochs are separate and must be bound independently.
Rejected: Implement broad group lifecycle commands | v1 scope is publish/create/add/welcome/send/decrypt plus repair diagnostics.
Confidence: high
Scope-risk: moderate
Directive: Keep help/discovery copy conservative until the service explicitly enables public discovery after security review.
Tested: go test ./internal/message ./internal/cli ./internal/cmdmeta ./internal/doctor -count=1; go vet ./internal/message ./internal/doctor; focused awiki-system-test CLI real-MLS loop 2 passed
Not-tested: Public beta packaging across all OS release artifacts
Security review completed for this PR set (2026-05-03): BLOCK_DISCOVERY. This PR should stay draft / hidden-test-only. Do not enable public discovery for
Key blockers before any separate discovery-enable PR:
Focused evidence remains:
The CLI now signs anp-mls KeyPackage did_wba_binding objects with the active identity key before publication, keeping private MLS material in anp-mls while giving message-service a cryptographic ownership proof to verify.
Constraint: awiki-cli must remain pure Go/no CGO and Group E2EE discovery stays hidden.
Rejected: Trust provider-supplied binding proof | the provider cannot know the CLI identity key and would leave publish verification shape-only.
Confidence: high
Scope-risk: moderate
Directive: Do not pass plaintext or private key material through argv; keep signing in-process and anp-mls JSON over stdin/stdout.
Tested: cd awiki-cli && go test ./internal/anpsdk ./internal/message
Tested: focused Group E2EE system test after deslop: 2 passed in 23.19s
PR-A needs awiki-cli to route E2EE membership exits through MLS pending commits without making same-epoch local-terminal self-leave look safe. The CLI now prepares remove commits, finalizes only after hidden service acceptance, repairs commit-delivery notices for one-shot clients, and blocks non-advancing self-leave before submitting any P6 leave mutation.
Constraint: Public discovery remains hidden/test-only for group E2EE.
Constraint: OpenMLS 0.8 self-leave can be local-terminal without advancing the group epoch.
Rejected: Fall back to public group.remove/group.leave for E2EE groups | would separate membership changes from cryptographic epoch changes.
Rejected: Submit same-epoch self-leave to group.e2ee.leave | service acceptance would rely on delivery suppression rather than MLS exclusion.
Confidence: high
Scope-risk: moderate
Directive: Do not enable E2EE self-leave until anp-mls exposes an epoch-advancing commit or a reviewed leave-request/remaining-member flow exists.
Tested: go test ./internal/message -run 'TestLeaveGroupE2EERejectsLocalTerminalSelfLeaveBeforeServiceSubmit|TestUnsupportedGroupE2EESelfLeaveReasonDetectsNonAdvancingEpoch' -v
Tested: go test ./internal/message ./internal/cli ./internal/cmdmeta
Tested: go test ./...
Tested: go vet ./...
Tested: git diff --check
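The non-advancing self-leave guard can be pictured as a check on the epochs reported by the provider before anything is submitted to the service. The `leaveArtifact` type, its fields, and the reason string are hypothetical; only the rule (reject a leave whose resulting epoch does not exceed the starting epoch) comes from the commit text.

```go
package main

import (
	"errors"
	"fmt"
)

// leaveArtifact is a hypothetical summary of what the provider produced
// for a self-leave attempt.
type leaveArtifact struct {
	EpochBefore uint64
	EpochAfter  uint64
}

// unsupportedSelfLeaveReason returns a non-empty reason when the artifact
// does not advance the group epoch, and "" when it does.
func unsupportedSelfLeaveReason(a leaveArtifact) string {
	if a.EpochAfter <= a.EpochBefore {
		return "self-leave is local-terminal and does not advance the MLS epoch"
	}
	return ""
}

// submitLeave refuses to call submit at all for a non-advancing artifact,
// so the service never sees a same-epoch leave mutation.
func submitLeave(a leaveArtifact, submit func(leaveArtifact) error) error {
	if reason := unsupportedSelfLeaveReason(a); reason != "" {
		return errors.New(reason)
	}
	return submit(a)
}

func main() {
	submit := func(leaveArtifact) error { return nil }
	fmt.Println(submitLeave(leaveArtifact{EpochBefore: 4, EpochAfter: 4}, submit))
	fmt.Println(submitLeave(leaveArtifact{EpochBefore: 4, EpochAfter: 5}, submit))
}
```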
PR-B1 requires one-shot CLI clients to avoid OpenMLS local-terminal self-leave artifacts. E2EE group leave now creates a hidden leave_request control-plane record, while owner/admin processing uses the existing epoch-advancing remove-member orchestration and carries the leave request id into the hidden remove payload.
Constraint: Public group E2EE discovery remains hidden/test-only during PR-B1
Constraint: Go CLI must remain pure Go and delegate MLS commits to anp-mls
Rejected: Submit group.e2ee.leave with a same-epoch local-terminal artifact | fails cryptographic exclusion semantics
Confidence: high
Scope-risk: moderate
Directive: Do not route E2EE group leave back to provider.LeaveGroup unless anp-mls can prove epoch-advancing remaining-member exclusion
Tested: go test -run 'TestBuildGroupE2EE|TestHTTPTransportGroupMethods|TestLeaveGroupE2EE|TestGroupDryRunPlans' ./internal/message ./internal/cli
Tested: go test ./...
Tested: go vet ./...
After an owner/admin remove is accepted, a one-shot CLI process may still hold a pending local MLS commit and initially encrypt at the previous epoch. Detect the service epoch-mismatch response, finalize any local pending commit reported by anp-mls status, repair notices, and retry encryption so remaining members can continue sending ciphertext without falling back to plaintext group messages.
Constraint: Group E2EE remains hidden/test-only; no multi-device, cloud snapshot, rejoin, or public discovery expansion.
Rejected: Fall back to group.base send on epoch mismatch | would store application plaintext in the service DB for E2EE groups.
Confidence: high
Scope-risk: moderate
Directive: Do not bypass group.e2ee.send for known E2EE groups; stale epochs must repair/fail closed, not downgrade.
Tested: go test -run 'Test.*GroupE2EE|Test.*Leave|TestGroup' ./internal/message ./internal/cli ./internal/cmdmeta
Tested: go vet ./...
Tested: live awiki-system-test lifecycle target passed: 3 passed in 26.50s
Tested: git diff --check
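The detect, finalize, repair, retry sequence described in this commit could be sketched as follows. The `sender` struct, its function fields, and the sentinel error are hypothetical stand-ins for the CLI's transport and anp-mls calls; the single-retry limit is also an assumption. Note that there is no plaintext fallback path: a second mismatch is returned to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// errEpochMismatch stands in for the service's epoch-mismatch response.
var errEpochMismatch = errors.New("service epoch mismatch")

// sender bundles the hypothetical steps as injectable functions.
type sender struct {
	encryptAndSend  func() error // encrypt via anp-mls, then group.e2ee.send
	finalizePending func() error // finalize a local pending commit, if any
	repairNotices   func() error // replay missed welcome/commit notices
}

// sendWithEpochRepair retries once after repairing local MLS state. Any
// other error, or a repeated mismatch, is returned unchanged (fail closed).
func sendWithEpochRepair(s sender) error {
	err := s.encryptAndSend()
	if !errors.Is(err, errEpochMismatch) {
		return err
	}
	if err := s.finalizePending(); err != nil {
		return fmt.Errorf("finalize pending commit: %w", err)
	}
	if err := s.repairNotices(); err != nil {
		return fmt.Errorf("repair notices: %w", err)
	}
	return s.encryptAndSend()
}

func main() {
	attempts := 0
	s := sender{
		encryptAndSend: func() error {
			attempts++
			if attempts == 1 {
				return errEpochMismatch // first try uses the stale epoch
			}
			return nil
		},
		finalizePending: func() error { return nil },
		repairNotices:   func() error { return nil },
	}
	fmt.Println(sendWithEpochRepair(s), attempts)
}
```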
PR-B1 update: safe Group E2EE leave-request before discovery
Scope added in this push:
Latest commits across the PR-B1 stack:
Validation evidence:
Notes:
One-shot CLI recovery now compares local MLS state with the hidden service crypto head, safely finalizes accepted pending commits, replays durable welcome/commit notices, treats duplicate already-applied commits as delivered, and fails closed with needs_snapshot_or_readd when continuity cannot be proven. Status/send paths scan agent/device-scoped MLS state so members added through non-default KeyPackages can repair missed add commits and resume encrypted sends without a resident process.
Constraint: PR-B2 is recovery hardening only; group E2EE remains hidden/test-only with no public discovery, multi-device, k1 compatibility, cloud snapshot, External Commit, or rejoin scope.
Rejected: Always encrypt/process repair on the default device | KeyPackage-published members store MLS state under their actual device id and would be stranded after welcome repair.
Rejected: Finalize any local pending commit during repair | local commits are finalized only when the service head proves the target epoch was accepted.
Confidence: high
Scope-risk: moderate
Directive: Keep recovery fail-closed on missing notice gaps until a separately reviewed snapshot/re-add protocol exists.
Tested: go test ./internal/message ./internal/cli ./internal/cmdmeta
Tested: AWIKI_GROUP_E2EE_CONTRACT_TEST=1 uv run python manage_local_test_env.py run-tests --with-message-v2 --use-local-anp tests_v2/cli/test_awiki_cli_group_e2ee_recovery_local.py
Not-tested: Full root make local-test.
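One way to picture the fail-closed recovery decision is a small planner that compares the local epoch with the service head and the set of durable notices. Everything here except the `needs_snapshot_or_readd` value is hypothetical, and the comparison is simplified to epoch numbers; a real check would also need to confirm that the commit accepted at the target epoch is this client's own.

```go
package main

import "fmt"

type action string

const (
	actionUpToDate             action = "up_to_date"
	actionFinalizePending      action = "finalize_pending"
	actionReplayNotices        action = "replay_notices"
	actionNeedsSnapshotOrReadd action = "needs_snapshot_or_readd"
)

// localState is a hypothetical view of what anp-mls status might report.
type localState struct {
	Epoch        uint64
	HasPending   bool
	PendingEpoch uint64 // epoch the pending commit would produce
}

// planRecovery picks the next step. It never guesses: a local epoch ahead
// of the service, or any gap in the notice sequence, yields
// needs_snapshot_or_readd.
func planRecovery(local localState, serviceEpoch uint64, noticeEpochs []uint64) action {
	switch {
	case local.Epoch == serviceEpoch:
		return actionUpToDate
	case local.Epoch > serviceEpoch:
		return actionNeedsSnapshotOrReadd
	case local.HasPending && local.PendingEpoch == serviceEpoch:
		// Simplification: the service head is taken as proof of acceptance.
		return actionFinalizePending
	}
	have := make(map[uint64]bool, len(noticeEpochs))
	for _, e := range noticeEpochs {
		have[e] = true
	}
	// Every epoch between local and service head must have a notice.
	for e := local.Epoch + 1; e <= serviceEpoch; e++ {
		if !have[e] {
			return actionNeedsSnapshotOrReadd
		}
	}
	return actionReplayNotices
}

func main() {
	fmt.Println(planRecovery(localState{Epoch: 3}, 5, []uint64{4, 5}))
	fmt.Println(planRecovery(localState{Epoch: 3}, 5, []uint64{5}))
}
```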
PR-B2 update: missed add-commit recovery before discovery
Scope added in the latest push:
What changed across the four-PR stack:
Latest PR-B2 commits:
Focused validation evidence:
CI / broad smoke after push:
Discovery remains blocked by the existing security-review gate; this update is recovery plumbing, not public readiness.
The CLI now exposes hidden/test-only same-device group E2EE recovery UX: recovery KeyPackage publication, recover-member orchestration through anp-mls prepare/finalize/abort, and needs_snapshot_or_readd recovery artifacts. The recovery path submits group.e2ee.recover_member and keeps P4 group.add out of the flow.
Constraint: Worker-3 scope is awiki-cli only after leader correction
Constraint: Recovery must stay hidden/test-only and must not mutate P4 membership
Rejected: Reuse group.add for recovery | violates PR-B3 P4/P6 separation
Confidence: medium
Scope-risk: moderate
Tested: gofmt on modified Go files
Tested: go test ./internal/message ./internal/cli ./internal/cmdmeta
Tested: go test ./...
Tested: go vet ./...
Not-tested: Live message-service/anp-mls PR-B3 end-to-end because sibling service lane is owned by other workers
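The prepare/finalize/abort shape of recover-member orchestration might be sketched like this. The step functions and the commit handle are hypothetical; the ordering (prepare locally, submit to the service, finalize only on acceptance, abort on rejection) is what the commit text describes.

```go
package main

import (
	"errors"
	"fmt"
)

var errRejected = errors.New("service rejected recover_member")

// recoverySteps injects the hypothetical provider and service calls.
type recoverySteps struct {
	prepare  func() (string, error)    // anp-mls prepare; returns a pending commit handle
	submit   func(commit string) error // group.e2ee.recover_member
	finalize func(commit string) error // anp-mls finalize after acceptance
	abort    func(commit string) error // anp-mls abort after rejection
}

// recoverMember keeps local MLS state and service state in step: the
// pending commit is merged only if the service accepted it.
func recoverMember(s recoverySteps) error {
	commit, err := s.prepare()
	if err != nil {
		return fmt.Errorf("prepare: %w", err)
	}
	if err := s.submit(commit); err != nil {
		if abortErr := s.abort(commit); abortErr != nil {
			return fmt.Errorf("submit failed (%v) and abort failed: %w", err, abortErr)
		}
		return fmt.Errorf("submit rejected, local commit aborted: %w", err)
	}
	return s.finalize(commit)
}

func main() {
	steps := recoverySteps{
		prepare:  func() (string, error) { return "commit-1", nil },
		submit:   func(string) error { return errRejected },
		finalize: func(string) error { return nil },
		abort:    func(string) error { return nil },
	}
	fmt.Println(recoverMember(steps))
}
```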
PR-B3 recovery KeyPackages must reach message-service with purpose, group_did, and device_id intact, while normal KeyPackages must not send empty optional recovery fields. The recover-member result also stops echoing a group.add debug marker so hidden/test-only recovery cannot be mistaken for P4 membership mutation.
Constraint: Group E2EE stays hidden/test-only and recover-member must not mutate or imply P4 group.add.
Rejected: Keep group.add as a forbidden-method debug echo | focused E2E treats any public group.add marker as an overclaim and it is unnecessary for orchestration.
Confidence: high
Scope-risk: narrow
Directive: Do not drop purpose/group_did from recovery group_key_package payloads; service recovery lookup depends on them.
Tested: go test ./internal/message -run 'PublishKeyPackage|SanitizeGroupKeyPackage|RecoverMember|GroupE2EE'
Tested: go test ./internal/message ./internal/cli ./internal/cmdmeta
Tested: go test ./...
Tested: go vet ./...
Tested: awiki-system-test focused CLI PR-B3 E2E 5 passed with --with-message-v2 --use-local-anp
Not-tested: Public discovery enablement; intentionally out of scope.
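The "present for recovery, absent otherwise" rule for optional fields maps naturally onto `omitempty` in `encoding/json`. The struct below is a sketch: `purpose`, `group_did`, and `device_id` are the field names from the commit, while `key_package_id`, the `"recovery"` purpose value, and the example DID are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupKeyPackage is a hypothetical subset of the published body.
// Recovery-only fields are dropped from the JSON when empty.
type groupKeyPackage struct {
	KeyPackageID string `json:"key_package_id"`
	Purpose      string `json:"purpose,omitempty"`
	GroupDID     string `json:"group_did,omitempty"`
	DeviceID     string `json:"device_id,omitempty"`
}

// mustEncode marshals the body; marshaling this struct cannot fail, so a
// failure is treated as a programming error.
func mustEncode(kp groupKeyPackage) string {
	b, err := json.Marshal(kp)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	normal := groupKeyPackage{KeyPackageID: "kp-1"}
	recovery := groupKeyPackage{
		KeyPackageID: "kp-2",
		Purpose:      "recovery", // assumed value
		GroupDID:     "did:example:group",
		DeviceID:     "dev-1",
	}
	fmt.Println(mustEncode(normal))
	fmt.Println(mustEncode(recovery))
}
```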
The PR branch had fallen behind main far enough for GitHub to mark the pull request DIRTY. This merge keeps the hidden/test-only group E2EE recovery work reviewable while preserving current main changes around handle completion, listener probes, and English skill references.
Constraint: Final PR confirmation must not claim merge readiness while GitHub reports conflicts
Rejected: Leave awiki-cli PR dirty and only document the blocker | it would fail the requested pre-merge confirmation
Confidence: medium
Scope-risk: moderate
Directive: Keep group E2EE discovery hidden; this merge only resolves base drift and must not enable public capability advertising
Tested: go test ./internal/cli ./internal/message ./internal/runtime/listener ./internal/cmdmeta ./internal/doctor ./internal/anpsdk
Not-tested: full cross-repo system test after base-drift merge
Withdrawn for now per maintainer request. Keeping the branch for later follow-up; not merging and not deleting the branch.
Summary
PR-B3 for hidden/test-only Group E2EE recovery before public discovery.
This CLI slice wires owner/admin same-device recovery orchestration around anp-mls and the hidden message-service P6 control plane. It keeps the user-facing feature gated and diagnostic/repair oriented; public discovery remains off.
Scope
- group e2ee recover-member orchestration.
- group.e2ee.recover_member -> finalize/abort without mutating P4 membership.
- […] (purpose, group_did, device_id) through CLI/service boundaries.
- […] main; PR merge state is now clean after commit 0ca3b48.
Guardrails
- […] group-e2ee discovery claim.
Validation evidence
- go test ./internal/cli ./internal/message ./internal/runtime/listener ./internal/cmdmeta ./internal/doctor ./internal/anpsdk -> pass.
- go test ./... && go vet ./... -> pass.
- […] unit-test checks green after the base-sync push.
- […] 3 passed.
- […] 5 passed in 24.96s.
Review status
Ready for reviewer attention as part of the four-PR PR-B3 set. This PR intentionally remains hidden/test-only and should not be used to announce production Group E2EE capability.