Group E2EE before discovery: add lifecycle MLS contracts#53

Closed
chgaowei wants to merge 13 commits into master from feature/changshan/group-e2ee

Conversation

@chgaowei
Collaborator

@chgaowei chgaowei commented May 3, 2026

Summary

PR-B3 for hidden/test-only Group E2EE recovery before public discovery.

This ANP SDK/Rust slice adds reusable P6 recovery contracts and anp-mls one-shot state-machine support for owner/admin same-device member recovery. It keeps the MLS implementation behind local SDK/CLI orchestration and does not advertise public Group E2EE discovery.

Scope

  • Add/align P6 recovery wire types and contract artifacts for recovery KeyPackage binding.
  • Extend anp-mls with recover-member prepare/finalize/abort flows.
  • Preserve device-scoped local MLS state, operation idempotency, and fail-closed recovery semantics.
  • Support PR-B3 owner/admin re-add recovery and epoch-repair contract needs used by CLI/message-service/system-test.

Guardrails

  • Hidden/test-only: no anp.group.e2ee.v1 public discovery enablement here.
  • No multi-device support.
  • No k1 DID compatibility.
  • No cloud snapshot or External Commit.
  • No claim that this is complete production MLS group lifecycle; it is PR-B3 recovery hardening before discovery.

Validation evidence

  • ANP PR checks: CodeQL matrix and Rust Python Interop are green on this PR (#53).
  • Prior PR-B3 ANP local verification covered Rust/Go group E2EE contract tests and OpenMLS recovery state-machine tests.
  • Cross-repo focused system evidence:
    • Hidden/default discovery + flag-off smoke: 3 passed.
    • CLI recovery + negative focused E2E: 5 passed in 24.96s.
    • Logs retained under .omx/reports/group-e2ee-pr-b3-final/ in the workspace.
  • Final review/security evidence: no CRITICAL/HIGH blocker; discovery remains blocked until a separate public-discovery security gate.

Review status

Ready for reviewer attention as part of the four-PR PR-B3 set. Do not merge as a standalone public capability PR; merge with the matching awiki-cli, message-service, and awiki-system-test branches.

chgaowei added 5 commits May 2, 2026 08:18
The product integrations need stable P6 wire contracts before real OpenMLS is wired in, so this adds shared Rust and Go contract surfaces plus a one-shot anp-mls binary that can be exercised by services and CLI code without claiming production cryptography.

Constraint: First slice must shape API/storage/provider/test scaffolding before real OpenMLS integration.
Constraint: Go SDK path must remain pure Go and avoid CGO.
Rejected: Implement OpenMLS state in this commit | would couple four repositories before the contract boundary is stable.
Confidence: high
Scope-risk: moderate
Directive: Do not remove contract-test/non_cryptographic markers until real OpenMLS vectors and persistence are in place.
Tested: cargo test --manifest-path rust/Cargo.toml --test group_e2ee_contract_tests -- --nocapture; cd golang && go test ./group_e2ee; git diff --check
Not-tested: Real OpenMLS group state, SQLite lock/state restore, cross-language MLS cryptographic vectors
The one-shot anp-mls boundary now defaults to real OpenMLS operations backed by local SQLite state, file-lock serialization, and operation-id replay protection while keeping the explicit contract-test path available for old fixtures.

Constraint: awiki-cli invokes anp-mls through stdin/stdout and keeps MLS private material out of Go and message-service.
Constraint: Worker lane owns only anp/anp/rust changes; CLI/service lanes coordinate via the existing ExecProvider envelope.
Rejected: Keep deterministic contract artifacts as default | real E2EE acceptance requires cryptographic OpenMLS create/add/welcome/encrypt/decrypt.
Confidence: high
Scope-risk: moderate
Directive: Do not expose MLS private state outside --data-dir/state.db; contract-test output must stay explicitly marked non_cryptographic.
Tested: cargo fmt --check; cargo check --manifest-path Cargo.toml --bin anp-mls; cargo test --manifest-path Cargo.toml --test group_e2ee_contract_tests --test group_e2ee_real_mls_tests -- --nocapture; cargo test --manifest-path Cargo.toml group_e2ee -- --nocapture; cargo clippy --manifest-path Cargo.toml --bin anp-mls --test group_e2ee_real_mls_tests --test group_e2ee_contract_tests -- --allow warnings
Not-tested: Full workspace local-test and cross-repo awiki-cli/message-service E2E are owned by other lanes.
The Go client docs now distinguish the real Rust anp-mls path from explicit contract-test compatibility artifacts, preventing future readers from treating the P6 surface as scaffold-only after the real MLS lane landed.

Constraint: P6 contract-test mode remains available for compatibility tests

Confidence: high

Scope-risk: narrow

Tested: rg stale scaffold wording in updated ANP docs
The group E2EE release lane needs a stable binary compatibility probe plus negative evidence around local MLS state before awiki-cli doctor and system tests can rely on the Rust helper. The change keeps the one-shot exec boundary and hidden feature posture while validating incoming group/cipher claims against local SQLite bindings before OpenMLS work, and redacts decrypted plaintext from idempotency records.

Constraint: Scope is limited to anp/anp Rust anp-mls hardening; no public discovery, no k1 DID compatibility, and no AAD binding release claim.
Rejected: Persist decrypt responses verbatim for idempotent replay | plaintext would be retained in local operation rows.
Confidence: high
Scope-risk: narrow
Directive: Do not expose group E2EE publicly or claim ANP/MLS AAD binding until the protocol contract exists.
Tested: cd rust && cargo fmt --check; cargo check --manifest-path Cargo.toml --bin anp-mls; cargo test --manifest-path Cargo.toml group_e2ee -- --nocapture; git diff --check
Not-tested: Cross-repo awiki-cli doctor integration and full workspace local-test are owned by other lanes.
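
The redaction rule for idempotency records can be sketched roughly as follows. This is a minimal illustration, not the anp-mls implementation: `OperationLog`, `Response`, and the in-memory `HashMap` are hypothetical stand-ins for the SQLite-backed operation rows, and the real response envelope is not shown in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical response shape used only for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub enum Response {
    /// A decrypt result that carries plaintext back to the caller.
    Decrypted { plaintext: Vec<u8> },
    /// What is stored for a decrypt: a marker without plaintext.
    DecryptedRedacted,
    /// Any non-sensitive result, stored verbatim.
    Other(String),
}

/// In-memory stand-in for the local operation table, keyed by operation id.
#[derive(Default)]
pub struct OperationLog {
    records: HashMap<String, Response>,
}

impl OperationLog {
    /// Runs `op` at most once per operation id. A replay returns the stored
    /// record, which never contains decrypted plaintext.
    pub fn run_once<F>(&mut self, operation_id: &str, op: F) -> Response
    where
        F: FnOnce() -> Response,
    {
        if let Some(stored) = self.records.get(operation_id) {
            return stored.clone();
        }
        let fresh = op();
        // Redact before persisting so plaintext is not retained locally.
        let stored = match &fresh {
            Response::Decrypted { .. } => Response::DecryptedRedacted,
            other => other.clone(),
        };
        self.records.insert(operation_id.to_string(), stored);
        fresh
    }
}
```

In this sketch the first caller still receives the plaintext, while a replayed operation id only sees the redacted marker and must not expect the message body again.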
The real anp-mls path now enforces the protocol details that must be true before any client or service can safely advertise Group E2EE: ratchet-tree welcome processing, canonical send AAD, and KeyPackage DID WBA binding checks. The feature remains hidden/test-only and does not claim complete MLS group lifecycle support.

Constraint: Public discovery must stay disabled until a separate security-reviewed enablement PR.
Constraint: CLI integration remains one-shot JSON stdin/stdout with no CGO requirement.
Rejected: Treat group_state_ref.group_state_version as MLS epoch | P4 service state versions and MLS epochs are independent and conflating them broke focused sends.
Rejected: Add k1 DID compatibility for Group E2EE | explicitly out of scope for this phase.
Confidence: high
Scope-risk: moderate
Directive: Do not advertise anp.group.e2ee.v1 from SDK/release surfaces until message-service discovery gate passes.
Tested: cargo fmt --manifest-path rust/Cargo.toml --check; cargo test --manifest-path rust/Cargo.toml group_e2ee --all-targets
Not-tested: MLS remove/leave, External Commit, multi-device sync, attachment E2EE, HTTP anp-mls serve
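
One way to keep the P4 state version and the MLS epoch from being conflated is to give them distinct types. The sketch below assumes both are plain `u64` counters; `GroupStateVersion`, `MlsEpoch`, `SendContext`, and `epoch_matches` are illustrative names, not the SDK's actual types.

```rust
/// P4 service-side state version (hypothetical newtype for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct GroupStateVersion(pub u64);

/// MLS epoch of the local group (hypothetical newtype for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MlsEpoch(pub u64);

/// Both counters travel together but are never compared with each other.
pub struct SendContext {
    pub group_state_version: GroupStateVersion,
    pub local_epoch: MlsEpoch,
}

/// Compares a claimed epoch against the local MLS epoch only. Passing a
/// `GroupStateVersion` here would be a compile-time type error.
pub fn epoch_matches(ctx: &SendContext, claimed: MlsEpoch) -> bool {
    ctx.local_epoch == claimed
}
```

With separate newtypes, the mistake described in the Rejected line becomes a type error instead of a failed send at runtime.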
@chgaowei
Collaborator Author

chgaowei commented May 3, 2026

Security review completed for this PR set (2026-05-03): BLOCK_DISCOVERY.

This PR should stay draft / hidden-test-only. Do not enable public discovery for anp.group.e2ee.v1 / group-e2ee in this PR set.

Key blockers before any separate discovery-enable PR:

  • DID WBA binding proof validation is currently shape-level; cryptographic proof verification and golden vectors are required.
  • Notice pull/fanout semantics need public-client hardening (missed live notification recovery, idempotent mark-delivered, observability).
  • Broad tests_v2 / root make local-test remains deferred and must be green or explicitly accepted before undrafting/merge.

Focused evidence remains:

  • real MLS CLI smoke: 2 passed in 19.52s
  • flag-off guard: 1 passed in 2.43s

chgaowei added 2 commits May 3, 2026 20:39
Shared Go/Rust golden vectors now exercise strict Appendix-B object proof verification for P6 did_wba_binding and prove tampering is rejected before any product surface advertises Group E2EE.

Constraint: Public discovery remains hidden until a separate APPROVE_DISCOVERY gate.
Rejected: Rust-only proof coverage | it would not prove SDK parity for CLI/service integrations.
Confidence: high
Scope-risk: narrow
Directive: Keep Go and Rust DID WBA binding vectors in sync before changing P6 proof semantics.
Tested: cd anp/anp/golang && go test ./proof
Tested: cd anp/anp/rust && cargo test --test proof_tests test_did_wba_binding_golden_vector_verifies_and_tamper_fails
PR-A remove/leave needs one-shot clients to prepare durable local state before the service accepts the composite P4/P6 mutation. The anp-mls binary now exposes remove-member, commit process/finalize/abort, and documented leave terminal-state handling while keeping service-facing artifacts opaque/public.

Constraint: OpenMLS 0.8 rejects same-member self-remove commits, so leave finalization marks local state left without pretending to advance MLS locally

Rejected: Merge remove commits during prepare | would let local epoch advance before service acceptance

Confidence: medium

Scope-risk: moderate

Tested: cargo fmt --check; cargo check --bin anp-mls; cargo test group_e2ee; cargo test --test group_e2ee_contract_tests -- --nocapture; cargo test --test group_e2ee_real_mls_tests -- --nocapture

Not-tested: End-to-end service/CLI orchestration owned by parallel workers
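
The prepare/finalize/abort discipline described in this commit can be sketched as a small state machine. This is an illustration under simplifying assumptions: the epoch is a bare counter, `LocalGroup` and `CommitError` are hypothetical names, and no OpenMLS calls are involved.

```rust
#[derive(Debug, PartialEq)]
pub enum CommitError {
    AlreadyPending,
    NothingPending,
}

/// Simplified local group state: the current epoch plus at most one staged commit.
#[derive(Debug, Default)]
pub struct LocalGroup {
    pub epoch: u64,
    /// Epoch the staged commit would move the group to, if any.
    pending_target: Option<u64>,
}

impl LocalGroup {
    /// Stages a commit without advancing the local epoch.
    pub fn prepare(&mut self) -> Result<u64, CommitError> {
        if self.pending_target.is_some() {
            return Err(CommitError::AlreadyPending);
        }
        let target = self.epoch + 1;
        self.pending_target = Some(target);
        Ok(target)
    }

    /// Applies the staged commit; intended to run only after service acceptance.
    pub fn finalize(&mut self) -> Result<u64, CommitError> {
        let target = self.pending_target.take().ok_or(CommitError::NothingPending)?;
        self.epoch = target;
        Ok(target)
    }

    /// Discards the staged commit and leaves the epoch unchanged.
    pub fn abort(&mut self) -> Result<(), CommitError> {
        self.pending_target
            .take()
            .map(|_| ())
            .ok_or(CommitError::NothingPending)
    }
}
```

The point of the split is that `prepare` never touches `epoch`, so a rejected service mutation can be rolled back with `abort` and the local group stays on its previous epoch.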
@chgaowei chgaowei changed the title from "Group E2EE Step B: keep anp-mls P6-aligned before discovery" to "Group E2EE before discovery: add lifecycle MLS contracts" on May 3, 2026
chgaowei added 3 commits May 3, 2026 23:33
PR-B1 needs SDK-level wire vocabulary for safe self-leave without changing OpenMLS local-terminal semantics. Add Rust and Go group_e2ee leave_request/process objects, canonical Rust control-plane AAD validation, focused tests, and docs that keep the flow hidden/test-only and tied to owner/admin epoch-advancing remove commits.

Constraint: Scope is anp/anp only; message-service and awiki-cli runtime behavior are owned by sibling lanes

Rejected: Implementing External Commit or changing anp-mls group leave | explicitly outside PR-B1 fallback scope

Rejected: Public discovery enablement | PR-B1 remains hidden/test-only until security gates pass

Confidence: medium

Scope-risk: narrow

Directive: Do not treat same-member local-terminal leave as a service success; process leave requests through an authorized epoch-advancing remove commit

Tested: cd rust && cargo test group_e2ee -- --nocapture

Tested: cd rust && cargo fmt --check

Tested: cd golang && go test ./group_e2ee

Tested: git diff --check

Not-tested: Full cross-repo PR-B1 E2E before message-service/awiki-cli lanes land
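
The Directive above amounts to an authorization check before a leave request is processed. A minimal sketch follows, assuming members are identified by DID strings and roles are a simple enum; `Role`, `LeaveDecision`, and `decide_leave_processing` are hypothetical names, not part of the SDK contract.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Role {
    Owner,
    Admin,
    Member,
}

#[derive(Debug, PartialEq)]
pub enum LeaveDecision {
    /// The actor may issue the epoch-advancing remove commit for the leaver.
    RemoveCommitAllowed,
    /// Fail closed with a reason.
    Rejected(&'static str),
}

/// Decides whether `actor_did` may process a leave request from `leaver_did`.
pub fn decide_leave_processing(actor_did: &str, actor_role: Role, leaver_did: &str) -> LeaveDecision {
    // A member cannot commit its own removal, so self-processing is rejected.
    if actor_did == leaver_did {
        return LeaveDecision::Rejected("a member cannot commit its own removal");
    }
    match actor_role {
        Role::Owner | Role::Admin => LeaveDecision::RemoveCommitAllowed,
        Role::Member => LeaveDecision::Rejected("only owner/admin may process leave requests"),
    }
}
```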
The service and CLI lanes use group.e2ee.process_leave_request for the hidden PR-B1 owner/admin control-plane step. Align the ANP Rust and Go contract constants so generated clients do not call the obsolete dotted leave_request.process name.

Constraint: Keep Group E2EE hidden/test-only and avoid changing public discovery.

Rejected: Keep group.e2ee.leave_request.process as an SDK alias | would leave cross-repo focused tests split across two wire methods.

Confidence: high

Scope-risk: narrow

Tested: cd rust && cargo fmt --check && cargo test group_e2ee -- --nocapture

Tested: cd golang && go test ./group_e2ee

Tested: git diff --check
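
As a rough illustration of the alignment, a client can resolve method names against a single constant and reject the obsolete spelling outright instead of aliasing it. The two method strings come from this commit message; the constant and function names are assumptions made for the sketch.

```rust
/// Wire method used by the service and CLI lanes.
pub const METHOD_PROCESS_LEAVE_REQUEST: &str = "group.e2ee.process_leave_request";

/// Obsolete dotted name, kept here only so it can be rejected explicitly.
pub const OBSOLETE_LEAVE_REQUEST_PROCESS: &str = "group.e2ee.leave_request.process";

/// Returns the canonical method name, or an error for obsolete or unknown names.
pub fn canonical_method(name: &str) -> Result<&'static str, String> {
    match name {
        METHOD_PROCESS_LEAVE_REQUEST => Ok(METHOD_PROCESS_LEAVE_REQUEST),
        OBSOLETE_LEAVE_REQUEST_PROCESS => Err(format!(
            "obsolete method {name}; use {METHOD_PROCESS_LEAVE_REQUEST}"
        )),
        other => Err(format!("unknown method: {other}")),
    }
}
```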
One-shot clients can observe a service-accepted epoch before the local OpenMLS pending commit has been finalized. Reporting pending commit summaries from group status gives the CLI a safe local recovery hook without exposing MLS private material or enabling public discovery.

Constraint: Keep Group E2EE hidden/test-only and single-device; no public discovery or multi-device recovery.

Rejected: Query the anp-mls SQLite database from awiki-cli | would break the provider boundary and couple Go CLI to Rust private schema.

Confidence: high

Scope-risk: narrow

Tested: cd rust && cargo fmt --check

Tested: cd rust && cargo test group_e2ee_remove_member_prepares_pending_commit_then_finalize_advances_epoch -- --nocapture

Tested: git diff --check
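
How a one-shot client might act on a pending-commit summary can be sketched as a pure decision function. This assumes the service head reports an epoch that is directly comparable to the local MLS epoch; `PendingCommitSummary`, `RepairAction`, and `plan_repair` are hypothetical names, and the real status payload is not shown in this PR.

```rust
/// Hypothetical summary of a locally staged commit, as a status call might report it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PendingCommitSummary {
    /// Epoch the local group reaches if the staged commit is finalized.
    pub target_epoch: u64,
}

#[derive(Debug, PartialEq)]
pub enum RepairAction {
    InSync,
    /// The service already accepted the epoch the staged commit leads to.
    FinalizePending,
    /// The service has not moved yet; keep waiting for acceptance or abort.
    KeepPending,
    /// No staged commit and the service is ahead: missed commits must be replayed.
    ReplayMissedCommits,
    /// Any other combination is treated as unsafe.
    FailClosed,
}

pub fn plan_repair(
    local_epoch: u64,
    service_epoch: u64,
    pending: Option<PendingCommitSummary>,
) -> RepairAction {
    match pending {
        Some(p) if p.target_epoch == service_epoch => RepairAction::FinalizePending,
        Some(_) if service_epoch == local_epoch => RepairAction::KeepPending,
        Some(_) => RepairAction::FailClosed,
        None if local_epoch == service_epoch => RepairAction::InSync,
        None if local_epoch < service_epoch => RepairAction::ReplayMissedCommits,
        None => RepairAction::FailClosed,
    }
}
```

The function only consumes public summary fields, which matches the stated goal of giving the CLI a recovery hook without reading MLS private state.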
@chgaowei
Collaborator Author

chgaowei commented May 4, 2026

PR-B1 update: safe Group E2EE leave-request before discovery

Scope added in this push:

  • Hidden/test-only PR-B1 safe leave-request flow only.
  • No public discovery enablement; anp.group.e2ee.v1 / group-e2ee remain hidden.
  • No multi-device, k1 compatibility, cloud snapshot, member update, rejoin, or External Commit.

Latest commits across the PR-B1 stack:

  • anp/anp: e9e0841, 3e2642a, 5bb6adb
  • awiki-cli: b71f55e, ba7f77f
  • message-service: 4304c5e, 99c82dd
  • awiki-system-test: 4a856b4, e119a9a, d800be5

Validation evidence:

  • Local env: awiki-system-test/manage_local_test_env.py check --with-message-v2 --use-local-anp passed.
  • Live lifecycle: AWIKI_GROUP_E2EE_CONTRACT_TEST=1 uv run pytest tests_v2/cli/test_awiki_cli_group_e2ee_lifecycle_local.py -q → 3 passed in 26.50s.
  • Discovery/flag-off: uv run python manage_local_test_env.py run-tests --keep-env --with-message-v2 --use-local-anp tests_v2/message_service/test_group_e2ee_contract.py tests_v2/message_service/test_group_e2ee_flag_off.py → 3 passed in 1.43s.
  • anp/anp: cargo fmt --check; cargo test group_e2ee_remove_member_prepares_pending_commit_then_finalize_advances_epoch -- --nocapture; git diff --check.
  • awiki-cli: focused group E2EE Go tests; go vet ./...; git diff --check.
  • awiki-system-test: focused ruff check / ruff format --check; git diff --check.

Notes:

  • Live local Postgres message-service test databases had stale migration checksums from prior runs; they were reset (message_service_a/b) before rerun.
  • The final lifecycle run proved the remaining-member post-remove send stays on group.e2ee.send and does not fall back to plaintext group.base delivery.

The recovery/repair path needs anp-mls status to distinguish a real
active group from persisted terminal metadata. Reporting local_epoch and
left/removed bindings gives CLI repair enough evidence to fail closed or
resume safely instead of treating any binding row as sendable group state.

Constraint: Group E2EE remains hidden/test-only and single-device; no public discovery or multi-device scope is added.
Rejected: Keep status=active whenever a binding exists | stale or locally-left devices would appear eligible to encrypt.
Confidence: high
Scope-risk: narrow
Directive: Do not collapse inactive binding status back to active without re-running CLI repair and safe-leave recovery tests.
Tested: cd rust && cargo fmt --check
Tested: cd rust && cargo test group_e2ee_leave_prepares_and_finalize_marks_local_state_left -- --nocapture
Not-tested: Full ANP Rust test suite.
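
The fail-closed rule described in this commit can be sketched as follows. The `local_epoch` field and the left/removed states come from the commit message; the `BindingStatus` and `GroupStatus` types and the `can_encrypt` helper are assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BindingStatus {
    Active,
    Left,
    Removed,
}

/// Hypothetical shape of a status report for one group binding.
pub struct GroupStatus {
    pub status: BindingStatus,
    /// `None` when no live MLS group state backs the binding row.
    pub local_epoch: Option<u64>,
}

/// A binding row alone is not enough to send: the binding must be active
/// and backed by a local epoch, otherwise the caller fails closed.
pub fn can_encrypt(status: &GroupStatus) -> bool {
    status.status == BindingStatus::Active && status.local_epoch.is_some()
}
```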
@chgaowei
Collaborator Author

chgaowei commented May 4, 2026

PR-B2 update: missed add-commit recovery before discovery

Scope added in the latest push:

  • Hidden/test-only PR-B2 recovery/repair hardening only.
  • No public discovery enablement; anp.group.e2ee.v1 / group-e2ee remain hidden.
  • No multi-device, k1 compatibility, cloud snapshot, member update, rejoin, or External Commit.

What changed across the four-PR stack:

  • missed add commit recovery: one-shot clients can diagnose local-vs-service epoch lag and replay durable commit-delivery notices after being offline.
  • device-scoped MLS state: CLI status/repair/send scan agent/device-scoped anp-mls state instead of assuming only default, so KeyPackage-published devices can recover and send.
  • service durable commit notice: group.e2ee.add stores commit-delivery notices for existing active members while keeping message-service opaque-only; no MLS private state or plaintext is stored server-side.
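
The epoch-lag replay in the first bullet can be sketched as an ordered walk over stored notices. This assumes each notice records the epoch it applies to and that every applied commit advances the epoch by one; `CommitNotice`, `ReplayError`, and `replay_notices` are hypothetical names, and the artifact stays opaque.

```rust
#[derive(Clone, Debug)]
pub struct CommitNotice {
    /// Epoch the commit applies to; applying it moves the group to `from_epoch + 1`.
    pub from_epoch: u64,
    /// Opaque commit artifact; this sketch never inspects it.
    pub artifact: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum ReplayError {
    /// A notice is missing between the local epoch and the next stored notice.
    Gap { expected: u64, found: u64 },
}

/// Replays notices in epoch order and returns the resulting local epoch.
/// Already-applied notices are skipped; any gap stops the replay (fail closed).
pub fn replay_notices(local_epoch: u64, mut notices: Vec<CommitNotice>) -> Result<u64, ReplayError> {
    notices.sort_by_key(|n| n.from_epoch);
    let mut epoch = local_epoch;
    for notice in notices {
        if notice.from_epoch < epoch {
            continue; // delivered earlier, nothing to do
        }
        if notice.from_epoch != epoch {
            return Err(ReplayError::Gap { expected: epoch, found: notice.from_epoch });
        }
        // A real client would hand `notice.artifact` to the MLS layer here.
        epoch += 1;
    }
    Ok(epoch)
}
```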

Latest PR-B2 commits:

  • anp/anp: 8eefd04 — status reports local_epoch and inactive left/removed bindings correctly.
  • awiki-cli: debbe39 — repair/status/send recovery across service head, durable notices, and device-scoped MLS state.
  • message-service: a050f28 — durable commit-delivery fanout for existing active members on add.
  • awiki-system-test: 6968177 — Alice/Bob/Carol recovery system test.

Focused validation evidence:

  • go test ./internal/message ./internal/cli ./internal/cmdmeta → passed.
  • cargo fmt --check → passed.
  • cargo test -p im-group group_e2ee_add_commit_notice_carries_replay_artifact_for_existing_members -- --nocapture → passed.
  • cargo test group_e2ee_leave_prepares_and_finalize_marks_local_state_left -- --nocapture → passed.
  • uv run ruff check manage_local_test_env.py tests_v2/message_service/test_group_e2ee_contract.py tests_v2/cli/test_awiki_cli_group_e2ee_recovery_local.py → passed.
  • uv run pytest tests_v2/message_service/test_group_e2ee_contract.py::test_group_e2ee_real_mls_loop_is_covered_by_focused_cli_tests -q → 1 passed in 0.03s.
  • AWIKI_GROUP_E2EE_CONTRACT_TEST=1 uv run python manage_local_test_env.py run-tests --with-message-v2 --use-local-anp tests_v2/cli/test_awiki_cli_group_e2ee_recovery_local.py → 1 passed in 11.63s and local env stopped.

CI / broad smoke after push:

  • anp/anp GitHub checks: CodeQL + Rust Python Interop all passed on feature/changshan/group-e2ee.
  • awiki-cli GitHub Actions: unit-test passed on debbe39.
  • message-service / awiki-system-test: no GitHub checks are currently reported on this branch.
  • awiki-cli broad: go test ./... → passed.
  • message-service broad: cargo test -p im-group -- --nocapture → 36 passed; cargo check --workspace → passed.
  • message-service v2 system broad: uv run python manage_local_test_env.py run-tests --with-message-v2 --use-local-anp tests_v2/message_service → 14 passed in 10.59s.
  • full tests_v2 smoke: uv run python manage_local_test_env.py run-tests --suite message-v2 --with-message-v2 --use-local-anp tests_v2 → 85 passed, 14 skipped in 151.89s; local env stopped.

Discovery remains blocked by the existing security-review gate; this update is recovery plumbing, not public readiness.

chgaowei added 2 commits May 4, 2026 18:02
Add the PR-B3 anp-mls recovery state-machine surface so owners can prepare a recovery add without advancing local state before service acceptance. Reuse the existing pending commit finalize/abort plumbing while requiring recovery-tagged KeyPackages bound to the target group, DID, and device.

Constraint: PR-B3 requires owner local MLS state to finalize only after service acceptance.

Rejected: Reuse immediate group add-member for recovery | it merges the owner MLS state before service acceptance.

Confidence: high

Scope-risk: moderate

Directive: Do not route recover-member through group add-member unless prepare/finalize/abort semantics remain intact.

Tested: cargo fmt --check; cargo check --workspace; cargo test --test group_e2ee_real_mls_tests; cargo test --test group_e2ee_contract_tests

Not-tested: message-service and awiki-cli integration are owned by separate task lanes; cargo clippy --workspace --all-targets currently fails on pre-existing unrelated lint warnings outside modified files.
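
The binding requirement for recovery KeyPackages can be sketched as a plain field check. The field names, the `"recovery"` tag value, and the error variants below are assumptions for illustration; the actual P6 wire shape is defined by the contract models in this PR, not by this snippet.

```rust
#[derive(Debug, PartialEq)]
pub enum BindingError {
    NotRecoveryTagged,
    WrongGroup,
    WrongDid,
    WrongDevice,
}

/// Hypothetical metadata carried alongside a recovery KeyPackage.
pub struct RecoveryKeyPackage {
    pub purpose: String,
    pub group_id: String,
    pub did: String,
    pub device_id: String,
}

/// The member the owner/admin intends to recover.
pub struct RecoveryTarget<'a> {
    pub group_id: &'a str,
    pub did: &'a str,
    pub device_id: &'a str,
}

/// Rejects any KeyPackage that is not recovery-tagged or not bound to the
/// exact target group, DID, and device.
pub fn check_recovery_binding(
    kp: &RecoveryKeyPackage,
    target: &RecoveryTarget<'_>,
) -> Result<(), BindingError> {
    if kp.purpose != "recovery" {
        return Err(BindingError::NotRecoveryTagged);
    }
    if kp.group_id != target.group_id {
        return Err(BindingError::WrongGroup);
    }
    if kp.did != target.did {
        return Err(BindingError::WrongDid);
    }
    if kp.device_id != target.device_id {
        return Err(BindingError::WrongDevice);
    }
    Ok(())
}
```

A check like this would run before the recovery add is prepared, so a mismatched KeyPackage never produces a pending commit.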
Downstream CLI and service lanes need stable PR-B3 recovery shapes beyond the anp-mls binary. Add Rust and Go contract models for recovery KeyPackages, recover-member payloads, and pending-commit finalize/abort requests, then cover the contract-mode terminal commands.

Constraint: Recovery remains hidden/test-only and must not imply public discovery readiness.

Rejected: Add public discovery metadata | PR-B3 explicitly keeps P6 discovery disabled.

Confidence: high

Scope-risk: narrow

Tested: cargo fmt --check; cargo check --workspace; cargo test group_e2ee --lib; cargo test --test group_e2ee_contract_tests; cargo test --test group_e2ee_real_mls_tests; go test ./group_e2ee

Not-tested: message-service and awiki-cli integration are separate team lanes.
@chgaowei chgaowei marked this pull request as ready for review May 4, 2026 11:40
@chgaowei
Collaborator Author

chgaowei commented May 4, 2026

Withdrawn for now per maintainer request. Keeping the branch for later follow-up; not merging and not deleting the branch.

@chgaowei chgaowei closed this May 4, 2026