Challenge flow overhaul#125
Open
ilchu wants to merge 8 commits into
Open
Conversation
Until now the storage-provider pallet had zero tests covering its challenge workflow — `challenge_checkpoint`, `challenge_replica`, `respond_to_challenge`, and the on_finalize timeout-slash hook were all untested. That made the security-critical extrinsics the riskiest area to change. Add a `challenge_tests` module with 14 scenarios covering current behavior: - challenge_checkpoint creates challenge in storage / fails without snapshot / fails if provider not in primaries - respond_to_challenge with valid Proof defends and applies the response-time-based cost split - respond_to_challenge with bogus chunk or MMR proof currently errors with InvalidChallengeProof (commits to come will turn these into immediate-slash tests) - respond_to_challenge with Superseded when canonical has advanced - respond_to_challenge rejects nonexistent challenge / wrong provider / past deadline - on_finalize at deadline slashes the provider's stake to zero and pays the challenger a 10% reward - challenge_replica reads last_sync, rejects primary providers, and rejects replicas without a confirmed sync Two helpers: - `advance_to(n)` — runs both system and storage-provider hooks each block (the default `run_to_block` in mock.rs only runs system hooks) - `single_chunk_proof(data)` — builds (mmr_root, MmrProof, MerkleProof) for a single-leaf single-chunk MMR so tests can construct genuinely valid proofs without standing up real chunking infrastructure
Audit (AUDIT.md, Tier 3) flagged that the same `CommitmentPayload` signed
by a provider could be replayed indefinitely — there was no nonce,
timestamp, or block number bound into the signed bytes. An attacker who
captured a single signature could submit `challenge_offchain` against
the provider forever.
Bind a `nonce: u64` field into `CommitmentPayload` (bumping
`CURRENT_VERSION` from `0x01` to `0x02` so V1-shaped signatures can't
verify against the V2 layout). The pallet validates the nonce via a new
`T::MaxNonceAge` config constant on three sites:
- `checkpoint` extrinsic — caller passes the nonce all signing
providers agreed on; stored in `BucketSnapshot::commitment_nonce`
so `extend_checkpoint` can verify late-arriving signatures
- `challenge_offchain` extrinsic — caller passes the nonce the
provider used when signing the captured commitment
- `respond_to_challenge::Deleted` arm — admin signs over the new
deletion commitment with a fresh nonce
The pallet rejects nonces that are in the future or older than
`MaxNonceAge` blocks. `provider_checkpoint` is unchanged: it already
signs over `CheckpointProposal` which has its own `window` field for
replay protection.
Test infra:
- 3 new pallet tests (`challenge_offchain_rejects_old_nonce`,
`_rejects_future_nonce`, `checkpoint_rejects_old_nonce`) exercise
the recency gate via the parameter check that fires before
signature verification — so a dummy signature suffices.
- Existing challenge_tests baseline updated for the new
`BucketSnapshot` field.
Plumbing through the surrounding stack:
- `CommitRequest` / `CommitmentQuery` / `CommitmentResponse` /
`CheckpointSignatureResponse` / `DeleteRequest` / `DeleteResponse`
in provider-node carry the nonce
- Provider-node API handlers thread the caller-supplied nonce into
the signed `CommitmentPayload`
- Rust SDK `commit` / `get_commitment` / `get_checkpoint_signature`
take an explicit `nonce: u64` argument
- JS demo helpers (`uploadChunk`, `fetchCheckpointSignature`,
`submitClientCheckpoint`, `challengeOffchain`) snapshot the current
block number from chain and pass it through
- Runtime + mock configs set `MaxNonceAge` (24 h on the live
runtimes, 200 blocks in tests)
- Benchmarks pass `0u64` as nonce (running at low block numbers
keeps them inside `MaxNonceAge`)
Before: a provider who couldn't produce a real challenge proof could submit garbage via respond_to_challenge — the extrinsic failed with `InvalidChallengeProof` but the challenge remained pending until the deadline. Slashing fired only on timeout, so a misbehaving provider got to stall instead of being penalised promptly. After: in `respond_to_challenge`, each response variant is adjudicated into one of: - Ok(()) — response defends, run the existing cost-split path - Err(SlashReason) — response is a demonstrable lie, slash now `Proof` arm slashes with `InvalidProof` if either the chunk-Merkle proof or the MMR proof fails to verify. `Deleted` arm slashes with `InvalidDeletionClaim` if the `new_start_seq` doesn't cover the challenged seq or the admin signature is forged. `Superseded` arm slashes with `InvalidSupersededClaim` if the bucket has no snapshot or the challenged seq is past the canonical end. Parameter-shape errors (stale nonce, non-admin signer) still bubble up as DispatchError — those represent caller mistakes, not adversarial responses. The provider isn't punished for typos. New primitives type `SlashReason` (Timeout | InvalidProof | InvalidDeletionClaim | InvalidSupersededClaim) is carried in the `ChallengeSlashed` event so observers can tell which failure mode fired. `on_finalize` passes `Timeout`; the new immediate-slash paths pass the matching variant. Three new tests; two existing tests renamed from `_currently_errors` to `_slashes_immediately` and rewritten to assert provider stake is zero post-response and `challenges_failed` reflects the loss. The extrinsic itself returns Ok(()) — the slash IS the valid state transition; the provider sees their tx succeed and pays the price.
Three audit-flagged correctness bugs in the challenge path, all now
addressed:
1. **leaf_count=0 placeholder in challenge_offchain**
The pallet rebuilt the signed CommitmentPayload with a hardcoded
leaf_count=0, which only verified if the provider also signed with
0. Add a `leaf_count` parameter to the extrinsic. Provider-node
/commit, /commitment, and /delete now sign with the real
post-mutation leaf_count and surface it in the response so the
challenger can pass it through.
2. **start_seq=0 placeholder in challenge_replica**
`Replica.last_sync` was `Option<(H256, BlockNumber)>` — no way to
recover the start_seq at sync time, so challenges always used 0.
Replace with `Option<ReplicaSyncRecord { mmr_root, start_seq,
leaf_count, block }>` populated by `confirm_replica_sync` from
the bucket snapshot's actual sequence metadata.
3. **Hardcoded `100u32` challenge deposit**
At 12 decimals this was ~1e-10 of a token — challenge spam was
effectively free. Replace with `T::ChallengeDeposit::get()`. Live
runtimes set 1 token (1e12); mocks keep 100 so existing test math
doesn't shift.
Test updates:
- `challenge_replica_uses_last_sync` now asserts the stored start_seq
(7) appears on the resulting Challenge (previously asserted 0).
- `challenge_offchain_rejects_old_nonce` /
`_rejects_future_nonce` updated for the new 10-arg extrinsic
signature.
Surface-area updates: pallet Config (`ChallengeDeposit`); mocks
(pallet, drive-registry, s3-registry); both runtimes; provider-node
CommitResponse adds `leaf_count`; storage_user.rs SDK
CommitResponse adds `leaf_count`; JS demos thread `upload.leafCount`
into `challengeOffchain`. Benchmarks updated for the extra extrinsic
arg.
The Rust SDK's `get_total_challenge_earnings` and `get_challenge_stats`
have been returning `Ok(0)` with comments noting "no on-chain
per-challenger aggregate stored." This commit adds the aggregate so the
SDK can answer truthfully in the next commit.
New primitives type `ChallengerStatRecord<Balance>` (total_challenges,
successful_challenges, failed_challenges, total_earnings) and new
pallet storage `ChallengerStats: StorageMap<AccountId, _>` with
ValueQuery (default for new challengers).
Stats are bumped at four sites:
- `create_challenge` — `total_challenges += 1`
- Defended path in `respond_to_challenge` — `failed_challenges += 1`
(the challenger lost; the provider successfully proved possession)
- `slash_provider_for_failed_challenge` (used by both timeout and
invalid-response paths) — `successful_challenges += 1` and the
10%-of-slashed-stake reward added to `total_earnings`
Four new pallet tests cover all four state transitions.
ChallengeResponder was defined and exported from the lib but never
instantiated in command.rs — the entire module was dead code in
production. CI integration passes happened to work because the JS test
driver orchestrates challenge responses end-to-end via HTTP proof
endpoints + client SDK extrinsics, so the provider never needed to
detect challenges on its own.
This commit closes that gap.
`poll_challenges` previously returned `Ok(vec![])` with a TODO. Now:
- Resolves the provider's 32-byte account from the signer's public
key (preferred) or from the SS58-encoded `provider_id` fallback
- Iterates `StorageProvider::Challenges` storage via subxt dynamic
queries
- For each entry, manually decodes the SCALE-encoded `Vec<Challenge>`
(the byte layout of `Challenge<T>` is stable for deployed runtimes)
- Keeps only entries whose `provider` field matches our account,
preserving the original index for `ChallengeId` reconstruction
A round-trip unit test (`decode_challenge_vec_filters_by_provider`)
hand-rolls the SCALE bytes for two `Challenge` entries, runs the
decoder, and asserts it picks the right one with the right index. If
the pallet's struct layout ever drifts from what the decoder expects,
this test fires immediately.
Wiring: new `ChallengeResponderParams` CLI flags
(`--enable-challenge-responder`, `--challenge-poll-interval`),
`start_challenge_responder` startup helper in command.rs alongside the
other coordinators, instantiation in `run()`. Like the other
coordinators, it requires `--keyfile` to sign and is opt-in via the
enable flag.
Cost is bounded by `ChallengeTimeout` — storage entries past their
deadline are reaped in `on_finalize`, so iteration size is at worst
the number of distinct deadlines with at least one open challenge.
Also drop a redundant `storage_primitives::` qualifier on the
`ChallengerStats` storage definition (the type is already imported in
the prelude).
`get_total_challenge_earnings` and `get_challenge_stats` were the last of the audit's "Ok(0) with an apology comment" SDK stubs. The pallet now maintains `ChallengerStats` per account (added in the previous commit), so the SDK can answer for real. Both methods now fetch `StorageProvider::ChallengerStats` keyed by this client's challenger account. The on-chain `ChallengerStatRecord` decodes via a small `read_u128` helper that pulls each named field out of the dynamic value. ValueQuery on the pallet side means a never-bumped account decodes to a zero record — no None handling needed at the call site. The integration tests `test_get_challenge_stats` and `test_get_total_challenge_earnings` no longer assert "must be zero" — that was a hardcoded reflection of the SDK stubs. They now assert the more general invariant `total >= successful + failed` and that the earnings read succeeds without erroring.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The provider was where trust was supposed to land but the chain couldn't actually punish a provider that lied —
respond_to_challengeslashed only on timeout, signed commitments could be replayed forever, and several pallet placeholders made the math wrong. This PR closes all of those.Seven independently-reviewable commits, starting from a baseline test commit so the security-critical changes land into a covered area (the pallet had zero challenge tests before this).
Pallet — security
Slash on demonstrably-false response, not just on timeout.
respond_to_challengenow adjudicates each variant intoOk(())orErr(SlashReason). InvalidProof(chunk-Merkle or MMR mismatch), forgedDeletedadmin signature, and bogusSupersededclaim all slash immediately. Parameter-shape errors (stale nonce, non-admin signer) still bubble asDispatchError— those are caller mistakes, not lies. NewSlashReasonenum carried in theChallengeSlashedevent.Replay protection on
CommitmentPayload. Newnonce: u64field;CURRENT_VERSIONbumped to0x02so V1-shaped signatures can't verify against the V2 layout. NewT::MaxNonceAgeruntime constant (24h on live runtimes), recency-checked on every signature path —checkpoint,extend_checkpoint,challenge_offchain,Deletedresponse arm.BucketSnapshotcarriescommitment_nonceso late-arrivingextend_checkpointsignatures verify against the same payload the original signers signed.Three placeholder fixes.
challenge_offchainnow takesleaf_count: u64from the challenger rather than reconstructing with hardcoded0(which only worked if the provider also signed with 0).Replica.last_syncis nowOption<ReplicaSyncRecord { mmr_root, start_seq, leaf_count, block }>populated from the bucket snapshot atconfirm_replica_synctime;challenge_replicauses the storedstart_seqinstead of0u64. Challenge deposit is nowT::ChallengeDeposit(1 token live, was100u32≈ 1e-10 of a token — effectively free spam).Pallet — observability
ChallengerStatson-chain aggregate. New storage mapChallengerStats<AccountId, ChallengerStatRecord>withValueQuery. Bumped at four sites:create_challengeincrementstotal_challenges; defended path incrementsfailed_challenges;slash_provider_for_failed_challenge(used by both timeout and invalid-response paths) incrementssuccessful_challengesand adds the 10%-of-slashed-stake reward tototal_earnings.Provider node
Wire
ChallengeResponderinto startup. The module was defined and exported but never instantiated incommand.rs— entire path was dead code in production. New--enable-challenge-responderCLI flag alongside the other coordinators.Real
poll_challenges. WasOk(vec![]). Now resolves the provider's 32-byte account, iteratesStorageProvider::Challengesvia subxt, manually decodes the SCALE-encodedVec<Challenge>, and keeps only entries matching our account (preserving the original index forChallengeIdreconstruction). Cost is bounded byChallengeTimeout— past-deadline entries are reaped inon_finalize.API surface.
/commit,/commitment,/checkpoint-signature, and/deletenow take a caller-suppliednonce(the block the caller intends to submit at) and echo it in the response. Signed payloads use the real post-mutationleaf_count(was 0).Client SDK
get_total_challenge_earningsandget_challenge_statsare real — both readChallengerStatsfrom chain.commit/get_commitment/get_checkpoint_signaturetake an explicitnonce: u64so callers control what block the signature is bound to.JS demos
examples/papi/{full-flow,bucket-with-storage,checkpoint-rewards,sc-flow,sc-coverage}.jssnapshotSystem.Numberand thread the value throughuploadChunk,fetchCheckpointSignature,challengeOffchain,submitClientCheckpoint.upload.leafCountandupload.noncefrom the/commitresponse are passed tochallenge_offchain.Tests
22 new challenge tests in the pallet (up from 0) covering baseline happy/timeout/no-snapshot cases, replay-rejection for
challenge_offchainandcheckpoint, immediate-slash for all three invalid response variants,ChallengerStatsupdates on each transition, andchallenge_replicausing the storedstart_seq. New SCALE-decoder round-trip unit test inchallenge_responder.rs. Provider-node integration tests updated for the new nonce / leaf_count surface. Client integration tests no longer hardcode "earnings = 0" — they assert the realtotal >= successful + failedinvariant.Migration
Breaking:
CommitmentPayloadv1 signatures (no nonce) no longer verify on this runtime. Acceptable for a pre-launch parachain — dev-chain state is expected to break. Extrinsic signatures have new args (nonce,leaf_count); JS clients, Rust SDK callers, and benchmarks are all updated.