Crawl Walrus Sites via on-chain resource enumeration (Site object + blobIds) #22

## Context
The production Worker treats `*.wal.app` targets as plain HTTP through the portal: it sets `mode: 'walrus'` and scrapes `x-walrus-*` response headers (`apps/api/src/worker.ts` ~2462/2554) but never reads the on-chain Site object. Verifiable enumeration already exists in `packages/walrus/src/resources.ts`: `listWalrusResources(site)` reads each resource's `blob_id` from the Sui Site object (~line 103) using a `WalrusSiteContext` from `resolve.ts`. That code is Node-only (CLI `materializeWalrusSite`) and is not used in the hosted crawl. As a result, Walrus-native sites are crawled by HTML link-guessing instead of from their authoritative on-chain manifest.

## Goal / user story
As a user pointing ContextMEM at a Walrus Site (a `.wal.app` URL, SuiNS name, or `0x` site object id), I want it to enumerate the site's resources and blobIds directly from the on-chain Site object, so extraction is complete and provenance is verifiable rather than guessed from anchors.
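The three accepted target forms could be told apart before resolution with something like the sketch below. It is illustrative only: `classifyWalrusTarget` and its types are hypothetical names, not the signature of `resolveWalrusTarget`, and the accepted patterns (scheme-prefixed `.wal.app` URLs, bare names with an optional `.sui` suffix) are assumptions. Decoding a portal subdomain label into a SuiNS name or object id is left to the real resolver.

```typescript
// Hypothetical helper; the real resolver in packages/walrus may classify differently.
type WalrusTargetKind = 'portal-url' | 'object-id' | 'suins-name';

interface ClassifiedTarget {
  kind: WalrusTargetKind;
  /** Portal subdomain label, lowercased 0x object id, or lowercased name. */
  value: string;
}

// Assumed shape of a Sui object id: 0x followed by 1 to 64 hex digits.
const OBJECT_ID = /^0x[0-9a-fA-F]{1,64}$/;
const PORTAL_SUFFIX = '.wal.app';

function classifyWalrusTarget(input: string): ClassifiedTarget | null {
  const raw = input.trim();
  if (raw === '') return null;

  if (OBJECT_ID.test(raw)) {
    return { kind: 'object-id', value: raw.toLowerCase() };
  }

  if (/^https?:\/\//i.test(raw)) {
    let host: string;
    try {
      host = new URL(raw).hostname.toLowerCase();
    } catch {
      return null; // not a parseable URL
    }
    if (!host.endsWith(PORTAL_SUFFIX)) return null; // not a Walrus portal host
    const label = host.slice(0, -PORTAL_SUFFIX.length);
    // Only a single subdomain label is treated as a site here (assumption).
    return label !== '' && !label.includes('.')
      ? { kind: 'portal-url', value: label }
      : null;
  }

  // Bare name such as "docs" or "docs.sui" (assumed SuiNS input forms).
  if (/^[a-z0-9-]+(\.sui)?$/i.test(raw)) {
    return { kind: 'suins-name', value: raw.toLowerCase() };
  }
  return null;
}
```

Returning `null` for anything unrecognized lets the caller keep treating the target as an ordinary website.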

## Acceptance criteria
- [ ] For a Walrus-Site target, the run resolves a `WalrusSiteContext` and enumerates resource paths + blobIds from the **on-chain Site object** (not HTML link discovery).
- [ ] Each resource is fetched via the Walrus aggregator (`quilt.ts` `blobAggregatorEndpoint`) and mapped into manifest pages/resources carrying `source.blobId` / `resourcePath`.
- [ ] `buildSiteStructure` (`packages/core/src/site-structure.ts`) populates its existing **Walrus Provenance** group from `input.walrus.resources` for these runs.
- [ ] Extracted chunks/facts carry blobId-backed `FactSourceRef` provenance so the "why" links resolve to on-chain blobs.
- [ ] If the on-chain read fails (RPC/gRPC error), the crawl **falls back** to today's HTTP portal crawl and the run still completes.
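A minimal sketch of how the enumeration and fallback criteria above might fit together. Every name here is a hypothetical stand-in: `ResourceRecord` is not the repo's `WalrusResourceRecord`, `ManifestResource` is not the real manifest type, and the `listResources` and `blobUrl` parameters stand in for `listWalrusResources` and `blobAggregatorEndpoint`, whose actual signatures may differ.

```typescript
// Hypothetical shapes for illustration; the repo's real types may differ.
interface ResourceRecord {
  path: string;
  blobId: string;
  contentType?: string;
}

interface ManifestResource {
  url: string;
  resourcePath: string;
  source: { kind: 'walrus-blob'; blobId: string };
}

type Enumeration =
  | { mode: 'on-chain'; resources: ManifestResource[] }
  | { mode: 'portal-fallback'; reason: string };

// Map one on-chain record to a manifest entry that keeps its blobId provenance.
function toManifestResource(
  record: ResourceRecord,
  blobUrl: (blobId: string) => string,
): ManifestResource {
  return {
    url: blobUrl(record.blobId),
    resourcePath: record.path,
    source: { kind: 'walrus-blob', blobId: record.blobId },
  };
}

// Try the on-chain read first; on any error, report a fallback instead of failing the run.
async function enumerateOrFallback(
  listResources: () => Promise<ResourceRecord[]>,
  blobUrl: (blobId: string) => string,
): Promise<Enumeration> {
  try {
    const records = await listResources();
    return {
      mode: 'on-chain',
      resources: records.map((r) => toManifestResource(r, blobUrl)),
    };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { mode: 'portal-fallback', reason };
  }
}
```

The caller would branch on `mode`: `'on-chain'` feeds the manifest directly, while `'portal-fallback'` hands the target to the existing HTTP portal crawl and can log `reason` for diagnostics.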

## Implementation notes
- Reuse `packages/walrus/src/resolve.ts` (`resolveWalrusTarget`) + `resources.ts` (`listWalrusResources`) from the Worker walrus branch, or port a Worker-safe variant.
- **SPIKE first:** confirm `@mysten/sui/grpc` `SuiGrpcClient.getObject` runs in the Worker under `nodejs_compat` (the roadmap notes `proof.ts`/`history.ts`/`resolve.ts` already use it for reads, but they run in Node today). `resources.ts` uses `node:buffer`; verify that this also works under `nodejs_compat`.
- Touch `apps/api/src/worker.ts` (walrus branch dispatch), wire `WalrusResourceRecord[]` into the existing `SiteStructureInput.walrus.resources` seam, and reuse `aggregatorUrl` from the resolved context.
- Gotchas: large sites need a resource cap and a concurrency limit; aggregator fetch sizes must respect the Worker's existing 1.5 MB guard; non-HTML resources (assets) should be enumerated but not chunked.
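The gotchas above could be handled with a small planning step plus a bounded fetch pool, sketched below. All names are hypothetical, the cap and byte limit are passed in rather than taken from the Worker's actual constants, and classifying HTML by content type (or by file extension when no type is recorded) is an assumption about what the Site object exposes.

```typescript
// Hypothetical record shape; not the repo's WalrusResourceRecord.
interface ResourceRecord {
  path: string;
  blobId: string;
  contentType?: string;
}

interface PlannedResource extends ResourceRecord {
  /** true: fetch and chunk; false: enumerate only (assets). */
  chunk: boolean;
}

// Only HTML is chunked; other resources are listed for provenance but not extracted.
function isChunkable(r: ResourceRecord): boolean {
  const type = (r.contentType ?? '').toLowerCase();
  if (type !== '') return type.startsWith('text/html');
  return /\.html?$/i.test(r.path);
}

// Apply the resource cap and report how many records were dropped.
function planResources(
  records: ResourceRecord[],
  maxResources: number,
): { planned: PlannedResource[]; skipped: number } {
  const kept = records.slice(0, maxResources);
  return {
    planned: kept.map((r) => ({ ...r, chunk: isChunkable(r) })),
    skipped: records.length - kept.length,
  };
}

// True when a Content-Length header value exceeds the size guard.
function exceedsSizeGuard(contentLength: string | null, maxBytes: number): boolean {
  if (contentLength === null) return false;
  const n = Number(contentLength);
  return Number.isFinite(n) && n > maxBytes;
}

// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A crawl would call `planResources` once, then pass the `chunk: true` entries to `mapWithConcurrency` with a fetch callback that checks `exceedsSizeGuard` on the response's `Content-Length` before reading the body. A missing `Content-Length` is not rejected here, so the body read itself would still need the existing guard.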

## Sui Overflow angle
This is the uniquely **Sui-native crawl**: reading content straight from on-chain Walrus Site objects and Walrus blobs gives verifiable, tamper-evident provenance that no traditional crawler can match. Paired with the `chunkGraphDigest`-keyed extract receipt, it tells a complete on-chain provenance story, which makes it the most differentiated thing to demo at a Sui hackathon.

## Dependencies
Spike: `@mysten/sui/grpc` (`SuiGrpcClient`) in the Cloudflare Worker (shared with the on-chain receipt work). Coordinates with, but is not blocked by, the on-chain attribution-receipt issue.

_Part of the ContextMEM roadmap (#4) • Sui Overflow build._
