Challenge flow overhaul by ilchu · Pull Request #125 · paritytech/web3-storage

ilchu · 2026-06-03T12:43:22Z

The provider was where trust was supposed to land but the chain couldn't actually punish a provider that lied — respond_to_challenge slashed only on timeout, signed commitments could be replayed forever, and several pallet placeholders made the math wrong. This PR closes all of those.

Seven independently-reviewable commits, starting from a baseline test commit so the security-critical changes land into a covered area (the pallet had zero challenge tests before this).

Pallet — security

Slash on demonstrably-false response, not just on timeout. respond_to_challenge now adjudicates each variant into Ok(()) or Err(SlashReason). Invalid Proof (chunk-Merkle or MMR mismatch), forged Deleted admin signature, and bogus Superseded claim all slash immediately. Parameter-shape errors (stale nonce, non-admin signer) still bubble as DispatchError — those are caller mistakes, not lies. New SlashReason enum carried in the ChallengeSlashed event.

Replay protection on CommitmentPayload. New nonce: u64 field; CURRENT_VERSION bumped to 0x02 so V1-shaped signatures can't verify against the V2 layout. New T::MaxNonceAge runtime constant (24h on live runtimes), recency-checked on every signature path — checkpoint, extend_checkpoint, challenge_offchain, Deleted response arm. BucketSnapshot carries commitment_nonce so late-arriving extend_checkpoint signatures verify against the same payload the original signers signed.

Three placeholder fixes. challenge_offchain now takes leaf_count: u64 from the challenger rather than reconstructing with hardcoded 0 (which only worked if the provider also signed with 0). Replica.last_sync is now Option<ReplicaSyncRecord { mmr_root, start_seq, leaf_count, block }> populated from the bucket snapshot at confirm_replica_sync time; challenge_replica uses the stored start_seq instead of 0u64. Challenge deposit is now T::ChallengeDeposit (1 token live, was 100u32 ≈ 1e-10 of a token — effectively free spam).

Pallet — observability

ChallengerStats on-chain aggregate. New storage map ChallengerStats<AccountId, ChallengerStatRecord> with ValueQuery. Bumped at four sites: create_challenge increments total_challenges; defended path increments failed_challenges; slash_provider_for_failed_challenge (used by both timeout and invalid-response paths) increments successful_challenges and adds the 10%-of-slashed-stake reward to total_earnings.

Provider node

Wire ChallengeResponder into startup. The module was defined and exported but never instantiated in command.rs — entire path was dead code in production. New --enable-challenge-responder CLI flag alongside the other coordinators.

Real poll_challenges. Was Ok(vec![]). Now resolves the provider's 32-byte account, iterates StorageProvider::Challenges via subxt, manually decodes the SCALE-encoded Vec<Challenge>, and keeps only entries matching our account (preserving the original index for ChallengeId reconstruction). Cost is bounded by ChallengeTimeout — past-deadline entries are reaped in on_finalize.

API surface. /commit, /commitment, /checkpoint-signature, and /delete now take a caller-supplied nonce (the block the caller intends to submit at) and echo it in the response. Signed payloads use the real post-mutation leaf_count (was 0).

Client SDK

get_total_challenge_earnings and get_challenge_stats are real — both read ChallengerStats from chain. commit / get_commitment / get_checkpoint_signature take an explicit nonce: u64 so callers control what block the signature is bound to.

JS demos

examples/papi/{full-flow,bucket-with-storage,checkpoint-rewards,sc-flow,sc-coverage}.js snapshot System.Number and thread the value through uploadChunk, fetchCheckpointSignature, challengeOffchain, submitClientCheckpoint. upload.leafCount and upload.nonce from the /commit response are passed to challenge_offchain.

Tests

22 new challenge tests in the pallet (up from 0) covering baseline happy/timeout/no-snapshot cases, replay-rejection for challenge_offchain and checkpoint, immediate-slash for all three invalid response variants, ChallengerStats updates on each transition, and challenge_replica using the stored start_seq. New SCALE-decoder round-trip unit test in challenge_responder.rs. Provider-node integration tests updated for the new nonce / leaf_count surface. Client integration tests no longer hardcode "earnings = 0" — they assert the real total >= successful + failed invariant.

Migration

Breaking: CommitmentPayload v1 signatures (no nonce) no longer verify on this runtime. Acceptable for a pre-launch parachain — dev-chain state is expected to break. Extrinsic signatures have new args (nonce, leaf_count); JS clients, Rust SDK callers, and benchmarks are all updated.

Until now the storage-provider pallet had zero tests covering its challenge workflow — `challenge_checkpoint`, `challenge_replica`, `respond_to_challenge`, and the on_finalize timeout-slash hook were all untested. That made the security-critical extrinsics the riskiest area to change. Add a `challenge_tests` module with 14 scenarios covering current behavior: - challenge_checkpoint creates challenge in storage / fails without snapshot / fails if provider not in primaries - respond_to_challenge with valid Proof defends and applies the response-time-based cost split - respond_to_challenge with bogus chunk or MMR proof currently errors with InvalidChallengeProof (commits to come will turn these into immediate-slash tests) - respond_to_challenge with Superseded when canonical has advanced - respond_to_challenge rejects nonexistent challenge / wrong provider / past deadline - on_finalize at deadline slashes the provider's stake to zero and pays the challenger a 10% reward - challenge_replica reads last_sync, rejects primary providers, and rejects replicas without a confirmed sync Two helpers: - `advance_to(n)` — runs both system and storage-provider hooks each block (the default `run_to_block` in mock.rs only runs system hooks) - `single_chunk_proof(data)` — builds (mmr_root, MmrProof, MerkleProof) for a single-leaf single-chunk MMR so tests can construct genuinely valid proofs without standing up real chunking infrastructure

Audit (AUDIT.md, Tier 3) flagged that the same `CommitmentPayload` signed by a provider could be replayed indefinitely — there was no nonce, timestamp, or block number bound into the signed bytes. An attacker who captured a single signature could submit `challenge_offchain` against the provider forever. Bind a `nonce: u64` field into `CommitmentPayload` (bumping `CURRENT_VERSION` from `0x01` to `0x02` so V1-shaped signatures can't verify against the V2 layout). The pallet validates the nonce via a new `T::MaxNonceAge` config constant on three sites: - `checkpoint` extrinsic — caller passes the nonce all signing providers agreed on; stored in `BucketSnapshot::commitment_nonce` so `extend_checkpoint` can verify late-arriving signatures - `challenge_offchain` extrinsic — caller passes the nonce the provider used when signing the captured commitment - `respond_to_challenge::Deleted` arm — admin signs over the new deletion commitment with a fresh nonce The pallet rejects nonces that are in the future or older than `MaxNonceAge` blocks. `provider_checkpoint` is unchanged: it already signs over `CheckpointProposal` which has its own `window` field for replay protection. Test infra: - 3 new pallet tests (`challenge_offchain_rejects_old_nonce`, `_rejects_future_nonce`, `checkpoint_rejects_old_nonce`) exercise the recency gate via the parameter check that fires before signature verification — so a dummy signature suffices. - Existing challenge_tests baseline updated for the new `BucketSnapshot` field. Plumbing through the surrounding stack: - `CommitRequest` / `CommitmentQuery` / `CommitmentResponse` / `CheckpointSignatureResponse` / `DeleteRequest` / `DeleteResponse` in provider-node carry the nonce - Provider-node API handlers thread the caller-supplied nonce into the signed `CommitmentPayload` - Rust SDK `commit` / `get_commitment` / `get_checkpoint_signature` take an explicit `nonce: u64` argument - JS demo helpers (`uploadChunk`, `fetchCheckpointSignature`, `submitClientCheckpoint`, `challengeOffchain`) snapshot the current block number from chain and pass it through - Runtime + mock configs set `MaxNonceAge` (24 h on the live runtimes, 200 blocks in tests) - Benchmarks pass `0u64` as nonce (running at low block numbers keeps them inside `MaxNonceAge`)

Before: a provider who couldn't produce a real challenge proof could submit garbage via respond_to_challenge — the extrinsic failed with `InvalidChallengeProof` but the challenge remained pending until the deadline. Slashing fired only on timeout, so a misbehaving provider got to stall instead of being penalised promptly. After: in `respond_to_challenge`, each response variant is adjudicated into one of: - Ok(()) — response defends, run the existing cost-split path - Err(SlashReason) — response is a demonstrable lie, slash now `Proof` arm slashes with `InvalidProof` if either the chunk-Merkle proof or the MMR proof fails to verify. `Deleted` arm slashes with `InvalidDeletionClaim` if the `new_start_seq` doesn't cover the challenged seq or the admin signature is forged. `Superseded` arm slashes with `InvalidSupersededClaim` if the bucket has no snapshot or the challenged seq is past the canonical end. Parameter-shape errors (stale nonce, non-admin signer) still bubble up as DispatchError — those represent caller mistakes, not adversarial responses. The provider isn't punished for typos. New primitives type `SlashReason` (Timeout | InvalidProof | InvalidDeletionClaim | InvalidSupersededClaim) is carried in the `ChallengeSlashed` event so observers can tell which failure mode fired. `on_finalize` passes `Timeout`; the new immediate-slash paths pass the matching variant. Three new tests; two existing tests renamed from `_currently_errors` to `_slashes_immediately` and rewritten to assert provider stake is zero post-response and `challenges_failed` reflects the loss. The extrinsic itself returns Ok(()) — the slash IS the valid state transition; the provider sees their tx succeed and pays the price.

Three audit-flagged correctness bugs in the challenge path, all now addressed: 1. **leaf_count=0 placeholder in challenge_offchain** The pallet rebuilt the signed CommitmentPayload with a hardcoded leaf_count=0, which only verified if the provider also signed with 0. Add a `leaf_count` parameter to the extrinsic. Provider-node /commit, /commitment, and /delete now sign with the real post-mutation leaf_count and surface it in the response so the challenger can pass it through. 2. **start_seq=0 placeholder in challenge_replica** `Replica.last_sync` was `Option<(H256, BlockNumber)>` — no way to recover the start_seq at sync time, so challenges always used 0. Replace with `Option<ReplicaSyncRecord { mmr_root, start_seq, leaf_count, block }>` populated by `confirm_replica_sync` from the bucket snapshot's actual sequence metadata. 3. **Hardcoded `100u32` challenge deposit** At 12 decimals this was ~1e-10 of a token — challenge spam was effectively free. Replace with `T::ChallengeDeposit::get()`. Live runtimes set 1 token (1e12); mocks keep 100 so existing test math doesn't shift. Test updates: - `challenge_replica_uses_last_sync` now asserts the stored start_seq (7) appears on the resulting Challenge (previously asserted 0). - `challenge_offchain_rejects_old_nonce` / `_rejects_future_nonce` updated for the new 10-arg extrinsic signature. Surface-area updates: pallet Config (`ChallengeDeposit`); mocks (pallet, drive-registry, s3-registry); both runtimes; provider-node CommitResponse adds `leaf_count`; storage_user.rs SDK CommitResponse adds `leaf_count`; JS demos thread `upload.leafCount` into `challengeOffchain`. Benchmarks updated for the extra extrinsic arg.

The Rust SDK's `get_total_challenge_earnings` and `get_challenge_stats` have been returning `Ok(0)` with comments noting "no on-chain per-challenger aggregate stored." This commit adds the aggregate so the SDK can answer truthfully in the next commit. New primitives type `ChallengerStatRecord<Balance>` (total_challenges, successful_challenges, failed_challenges, total_earnings) and new pallet storage `ChallengerStats: StorageMap<AccountId, _>` with ValueQuery (default for new challengers). Stats are bumped at four sites: - `create_challenge` — `total_challenges += 1` - Defended path in `respond_to_challenge` — `failed_challenges += 1` (the challenger lost; the provider successfully proved possession) - `slash_provider_for_failed_challenge` (used by both timeout and invalid-response paths) — `successful_challenges += 1` and the 10%-of-slashed-stake reward added to `total_earnings` Four new pallet tests cover all four state transitions.

ChallengeResponder was defined and exported from the lib but never instantiated in command.rs — the entire module was dead code in production. CI integration passes happened to work because the JS test driver orchestrates challenge responses end-to-end via HTTP proof endpoints + client SDK extrinsics, so the provider never needed to detect challenges on its own. This commit closes that gap. `poll_challenges` previously returned `Ok(vec![])` with a TODO. Now: - Resolves the provider's 32-byte account from the signer's public key (preferred) or from the SS58-encoded `provider_id` fallback - Iterates `StorageProvider::Challenges` storage via subxt dynamic queries - For each entry, manually decodes the SCALE-encoded `Vec<Challenge>` (the byte layout of `Challenge<T>` is stable for deployed runtimes) - Keeps only entries whose `provider` field matches our account, preserving the original index for `ChallengeId` reconstruction A round-trip unit test (`decode_challenge_vec_filters_by_provider`) hand-rolls the SCALE bytes for two `Challenge` entries, runs the decoder, and asserts it picks the right one with the right index. If the pallet's struct layout ever drifts from what the decoder expects, this test fires immediately. Wiring: new `ChallengeResponderParams` CLI flags (`--enable-challenge-responder`, `--challenge-poll-interval`), `start_challenge_responder` startup helper in command.rs alongside the other coordinators, instantiation in `run()`. Like the other coordinators, it requires `--keyfile` to sign and is opt-in via the enable flag. Cost is bounded by `ChallengeTimeout` — storage entries past their deadline are reaped in `on_finalize`, so iteration size is at worst the number of distinct deadlines with at least one open challenge. Also drop a redundant `storage_primitives::` qualifier on the `ChallengerStats` storage definition (the type is already imported in the prelude).

`get_total_challenge_earnings` and `get_challenge_stats` were the last of the audit's "Ok(0) with an apology comment" SDK stubs. The pallet now maintains `ChallengerStats` per account (added in the previous commit), so the SDK can answer for real. Both methods now fetch `StorageProvider::ChallengerStats` keyed by this client's challenger account. The on-chain `ChallengerStatRecord` decodes via a small `read_u128` helper that pulls each named field out of the dynamic value. ValueQuery on the pallet side means a never-bumped account decodes to a zero record — no None handling needed at the call site. The integration tests `test_get_challenge_stats` and `test_get_total_challenge_earnings` no longer assert "must be zero" — that was a hardcoded reflection of the SDK stubs. They now assert the more general invariant `total >= successful + failed` and that the earnings read succeeds without erroring.

ilchu added 7 commits June 3, 2026 21:17

ilchu self-assigned this Jun 3, 2026

ilchu requested a review from bkontur June 3, 2026 12:43

Merge branch 'dev' into ic/challenge-workflow-overhaul

a31debf

ilchu mentioned this pull request Jun 10, 2026

Comprehensive codebase audit: security, performance, code quality, and CI gaps #149

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Challenge flow overhaul#125

Challenge flow overhaul#125
ilchu wants to merge 8 commits into
devfrom
ic/challenge-workflow-overhaul

ilchu commented Jun 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ilchu commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pallet — security

Pallet — observability

Provider node

Client SDK

JS demos

Tests

Migration

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ilchu commented Jun 3, 2026 •

edited

Loading