
Allow oracle missions to be judged#24

Open
Sikkra wants to merge 132 commits into Aigen-Protocol:main from Sikkra:codex/oracle-mission-resolution


Sikkra commented May 20, 2026

Summary

  • allow judge() to resolve oracle verification missions by recording an oracle_judged resolution
  • preserve the existing post-deadline judging window for creator_judges missions
  • add regression coverage proving oracle missions can pay a winner and creator-judged missions still wait for the submission window to close

Why

oracle is accepted in VERIFICATION_TYPES and is used by live code missions, but judge() rejected oracle missions and resolve() treats oracle as unknown. That leaves valid oracle submissions with no settlement path.
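
A minimal sketch of the behaviour described above, not the actual `missions.py` code. The `Mission` dataclass, its fields, the `now` parameter, and the `creator_judged` resolution label are all assumptions made for illustration; only `judge()`, `oracle`, `creator_judges`, and `oracle_judged` come from this PR description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mission:
    # Hypothetical shape; the real mission record may differ.
    verification_type: str
    deadline: float                   # submission window closes at this Unix time
    resolution: Optional[str] = None
    winner: Optional[str] = None


def judge(mission: Mission, winner: str, now: float) -> Mission:
    """Settle a mission in favour of `winner`, or raise if judging is not allowed."""
    if mission.resolution is not None:
        raise ValueError("mission already resolved")
    if mission.verification_type == "oracle":
        # Oracle missions get a settlement path via an oracle_judged resolution.
        mission.resolution = "oracle_judged"
    elif mission.verification_type == "creator_judges":
        # Creator-judged missions still wait for the submission window to close.
        if now < mission.deadline:
            raise ValueError("submission window still open")
        mission.resolution = "creator_judged"  # assumed label
    else:
        raise ValueError(f"judge() does not handle {mission.verification_type!r}")
    mission.winner = winner
    return mission
```

The two branches mirror the two regression cases listed in the summary: an oracle mission can pay a winner, and a creator-judged mission is refused while its submission window is still open.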

Tests

  • python -m pytest .\tests\test_missions_oracle_judging.py -q
  • python -m compileall .\missions.py .\tests\test_missions_oracle_judging.py

Full python -m pytest -q still fails on pre-existing conformance/live-endpoint issues outside this patch, including missing manifest fields, reference endpoint HTML/URL parsing failures, and Mission not being imported in SDK self-submission tests.

Aigen-Protocol and others added 30 commits May 14, 2026 21:33
- agent_autonomous/system_prompt.md: AIGEN-AUTOPILOT identity, hard rules,
  approval queue protocol for risky actions (emails, external PRs, mainnet)
- run.sh: cron-callable wrapper. kill_switch + budget check + dashboard
  refresh + claude --print --dangerously-skip-permissions invocation,
  cost tracked into state/budget.json (cap $20/day)
- state/focus.md, lessons.md: priorities + accumulated rules
- approval_queue/: human-decision history
- Installed at /etc/systemd/system/claude-autopilot.{service,timer}
  (4h cadence, off-minute :07 to dodge fleet alignment)
- First validated invocation cost $1.90, surfaced 1 approval card
- timer: every 4h → every 30 min (:07, :37 UTC). 48 invocations/day.
- run.sh: removed $20/day hard cap. Renamed BUDGET → TRACKING.
  We're on Claude Max — message quota in 5h window, NOT real $.
- system_prompt.md: clarified Max billing model, updated success criteria
  for 48×/day cadence (most runs should be "no-action — checked, nothing new")
- state/lessons.md: agent-discovered lesson — 207.148.107.2 is OWN public IP
- state/journal.md: runs #2 + #3 entries (self-correction + auto-learning)

Run #4 (first @ 30min): $0.61 api-equiv, 17 turns, 126s
run.sh additions:
- Read+delete state/trigger_now at start (re-arms claude-autopilot.path
  systemd unit for next webhook fire)
- gh api notifications added to dashboard.json refresh
- recent_webhook_triggers added to dashboard.json (last 5 events)

Live infrastructure (NOT in this commit, configured separately):
- /etc/systemd/system/claude-autopilot.path  (PathExists trigger)
- /etc/systemd/system/aigen-scanner.service.d/webhook-secret.conf
  (env var GITHUB_WEBHOOK_SECRET=<32-byte hex>)
- /webhook/github endpoint added to token-scanner/scanner.py (HMAC-SHA256
  validation, 60s debounce, filters to PR/issues/push/fork/star/release)
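
The validation and debounce steps mentioned above could look roughly like the sketch below. It assumes the standard GitHub `X-Hub-Signature-256` header format (`sha256=<hex digest>` over the raw request body); the function and class names are hypothetical and are not taken from scanner.py.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Check a GitHub-style signature header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, header)


class Debouncer:
    """Accept at most one trigger per `window` seconds (hypothetical helper)."""

    def __init__(self, window: float = 60.0) -> None:
        self.window = window
        self.last_fired: Optional[float] = None

    def should_fire(self, now: float) -> bool:
        if self.last_fired is not None and now - self.last_fired < self.window:
            return False
        self.last_fired = now
        return True
```

A handler would typically reject the request when `verify_signature` returns False, and only write `state/trigger_now` when `should_fire` returns True.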

End-to-end validated: POST → trigger_now → path unit → service fires <1s.
Run #5 (webhook-triggered): $0.33 api-equiv, 50s, 12 turns. Agent
correctly identified the trigger as its own push (commit dea4d25 already
at HEAD) and refused to invent work.

To complete: configure webhook on GitHub repo
  https://github.com/Aigen-Protocol/aigen-protocol/settings/hooks/new
  Payload URL:  https://cryptogenesis.duckdns.org/webhook/github
  Content type: application/json
  Secret:       (in state/.webhook_secret, gitignored)
  Events:       Send me everything
ClaudeBot/1.0 crawl at 00:48 UTC hit /attest/quote?address=...&chain=base
and got 422 (missing agent_id). The protocol spec documents the route with
no parameter info; other endpoints in the same doc do include params inline.
One-line fix prevents future LLM-driven agents from making the same wrong
inference from the adjacent /scan and /t/<address> endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ia PR comment

Both cards executed under explicit human authorization ("you're the one who decides"):

1. Codex bounty researcher (chaoqiang.tian@gmail.com): email SENT
   via send_smtp.py → Zoho EU. Offered MCP server access, free
   agent registration, pre-funded test agent for eval/SWE-bench.

2. Nico Bustamante (HustlerOps / Microsoft AGI / ex-Fintool): no
   public email anywhere — pivoted to GitHub PR comment on PR Aigen-Protocol#5
   (his most recent merged contribution). GitHub notifies him via
   email automatically. Comment URL:
   Aigen-Protocol#5 (comment)

Cards moved to approval_queue/resolved/ with decision notes appended.
Active queue now empty.

Async loop: any reply on PR Aigen-Protocol#5 triggers /webhook/github (issue_comment
event) → claude-autopilot.path → agent fires in <1s.

2 new patterns added to lessons.md:
- GitHub PR comment as outreach when no public email exists
- send_smtp.py is the Zoho-SMTP wrapper to use (don't roll new ones)
…t add route

POST /firewall 502 from Cloudflare ke/JS fired again at 2026-05-15T09:02:57Z:
N=5 clean firings at xx:03Z ± 1min across runs #10-14 (05:03/06:03/07:03/
08:03/09:03). Promoted to lessons.md so future autopilot runs don't re-derive it.
The 502 is correct nginx upstream-miss for an unmapped path; their orchestrator
has us registered as both 'MCP' and 'firewall' services and only the MCP half
is real. Do NOT invent a /firewall route to 'fix' a client misconfig.

Also: ClaudeBot 28x anomaly resolved as finite 4h42min deep-crawl burst
(00:45-05:27Z), now back to sitemap-only baseline. Not lesson-worthy (N=1).
…= email only queues

Bilale 2026-05-15: "everything except email". Stop hiding behind approval_queue
for things you can do safely.

Tier A (act directly, no queue):
- GitHub comments on Aigen-Protocol/* org repos (any PR/issue)
- Commits + push to aigen repo
- MCP registry submissions (Smithery/Glama/mcp.so/awesome-mcp-servers)
- Post AIGEN missions (token rewards unlimited; USDC cap $5/mission $20/day)
- Resolve own approval_queue cards when default policy in focus.md applies
- Read IMAP inbox

Tier B (still queue):
- Send emails ← hard rule
- USDC mission > $5 or > $20/day total
- Modify own configs, mainnet deploys, fund transfers, cross-org PRs

Tier C (never): Pandiums leak, SURF/MEV pivot, real-name commit attribution

Updated success metrics in focus.md to require concrete value-creation
proof per week, not just "be active".
Bilale 2026-05-15: "we want to be the first in this market that doesn't exist
yet". Stop optimising for short-term traction; start defining the
category before it emerges commercially (18-36 month horizon).

Foundational artifacts shipped this session:

- specs/AIP-1.md: Open Agent Bounty Protocol Core Specification v0.1
  CC0-licensed. 9 sections + 2 appendices. Defines agent identity,
  mission/submission format, 4 verification types, ELO+decay reputation,
  reward escrow, discovery surfaces, well-known/oabp.json autodiscovery.
  Reference impl = AIGEN. Spec is implementation-agnostic.

- blog/2026-05-15-open-agent-economy.md: thesis essay
  "The agent economy needs an open protocol — here's what it looks like"
  Frames AIGEN as protocol-not-product, calls for forks/critique/cites.

- distribution/outreach_targets_2026_05.md: 10 specific people across
  3 tiers (adjacent protocol founders, framework maintainers, researchers)
  with personalised hooks. Bilale's job to send (autopilot can't email).

- agent_autonomous/state/focus.md: complete rewrite
  KPIs pivot from $-fees to mindshare metrics (stars, mentions, forks,
  citations, conf talks). Anti-priorities updated. Weekly milestones
  through 2026-06-19. "Don't pivot back to mission-spamming if old
  metrics flat" explicit.

Infrastructure exposed for develop-in-public:
- /specs/AIP-1 — public HTML render of the spec
- /specs/ — index of AIPs
- /blog/<slug> — public HTML render of blog posts
- /blog/ — index
- /journal/ — autopilot journal index (newest first)
- /journal/<iso-timestamp> — single entry view

All 6 routes return 200 over HTTPS via cryptogenesis.duckdns.org.
Direct execution of focus.md priority #3 ("/llms.txt updated to highlight AIP-1"). Reframes the canonical LLM-agent entry-point file as the reference implementation of an open CC0 spec, not a single product. Adds AIP-1 spec link, blog thesis link, and an explicit invitation for a second non-AIGEN implementation.

Live-mirrored to /var/www/html/llms.txt and /var/www/html/.well-known-llms.txt (infra, not tracked). Both URLs verified 200 with the new AIP-1 framing. ClaudeBot S5 just crawled this surface earlier today; S6 likely within hours — first signal whether OABP framing propagates.

Co-Authored-By: Cryptogen <Cryptogen@zohomail.eu>
…ntry point

Per focus.md (set 2026-05-15 by Bilale, Option Y category-creation pivot):
README is the highest-traffic landing surface for AIGEN. Until this commit
it led with 'permissionless 0.5% protocol' SaaS framing only. Now the
first screen also tells visitors this is the reference implementation of
AIP-1 (Open Agent Bounty Protocol) — a CC0 spec inviting forks and
alternative implementations.

Two surgical changes:
- New AIP-1 badge alongside the existing impl-spec badge (legacy badge kept
  since AIGEN_PROTOCOL.md is the implementation spec; AIP-1.md is the
  implementation-agnostic protocol spec — both useful)
- One-line callout after the existing intro line, before 'Why this exists'

No restructuring; existing comparison table, 30-second-start, framework
integrations, all unchanged.
10 personalised outreach drafts in distribution/outreach_drafts/
ready-to-send for Bilale Mon-Wed 2026-05-18+:
- Tier 1 (peer founders): Olas/Minarsch, Ritual/Bansal, Bittensor/Const
- Tier 2 (frameworks): CrewAI/Moura, LangChain/Chase, AutoGen/MS issue
- Tier 3 (researchers): Lilian Weng, Karpathy (high-risk warning), Simon
  Willison, A16z/Matsuoka

Each draft has: channel, send-window, full message body, "why this hook
works" rationale. Total ~5KB of strategic message library.

distribution/hn_submission_angles.md: 3 distinct framings for Hacker
News submission with first-comment templates, tactical timing notes,
cross-post candidates (lobste.rs, /r/MachineLearning, EthResearch).

Scanner discovery surfaces:
- /.well-known/oabp.json: AIP-1 §9 self-declaration. JSON manifest
  enabling cross-implementation autodiscovery (other OABP impls can
  programmatically detect us). 200 live.
- /atom.xml: RFC 4287 Atom feed of blog posts auto-generated from
  blog/*.md frontmatter. Top-level path because /feed.xml is taken
  by the existing activity feed. 200 live.
- oabp.json includes blog_atom endpoint reference.

Both endpoints verified live over HTTPS.
…ints

Compounding artifacts shipped this session:

1. **Python SDK** (sdk/python/oabp/) — `pip install oabp`-ready, stdlib-only.
   Implements client for AIP-1 §§ 2, 3, 5, 7, 9. Smoke-tested live against
   reference impl. Zero deps. CC0 licensed.

2. **OpenAPI 3.1 schema** (specs/openapi-aip-1.yaml) — formal contract for
   AIP-1 wire format. Imports cleanly into Insomnia / Postman / Swagger /
   any OpenAPI tool.

3. **Conformance test suite** (sdk/python/tests/test_oabp_conformance.py)
   — 15 test cases verifying AIP-1 v0.1 compliance. Found a real bug in
   the reference impl (missing /api/agents/{id}/badge.svg endpoint per
   §5 requirement). Fixed.

4. **AIP-1 §5 mandatory endpoints** added to scanner.py:
   - /api/agents/{id}/badge.svg (308 redirects to /badge/agent/{id}.svg
     legacy path)
   - /api/agents/{id}/history (paginated rating history; sources from
     submissions table)

5. **CONTRIBUTING.md** — what we want / don't want, AIP lifecycle,
   PR workflow. Sets contributor expectations.

6. **ROADMAP.md** — Now/Next/Later structure through 2027. Includes
   falsifiable kill criteria: if no non-AIGEN implementation exists by
   2027-05-15 and AIP-1 has fewer than 5 external citations, sunset the
   project. Public commitment to honesty later.

7. **IMAP polling** added to run.sh dashboard refresh — autopilot now
   surfaces inbox in dashboard.json (last 15 emails since 2026-05-01).
   Privacy: system_prompt updated to forbid quoting raw email content
   in public journal; personal forwards from bilale.badaoui@outlook.fr
   and bil317@hotmail.fr are NEVER referenced in public output.

Conformance suite result on reference impl: 15/15 PASS.
…urst

3× AWS-Ireland python-httpx/0.28.1 + 1× DigitalOcean (returning
after 5-day 404→200 gap) fetched /.well-known/security.txt with
200 in a 6-min window at 12:20-12:26Z. First confirmed external
response to the run #16 deploy. Journal-only invocation per
focus.md: discoverability surface working as intended; no code
or copy change warranted.
Bilale needs to track the autopilot from his phone without parsing
journal markdown or running CLI commands. Built /agent page that
aggregates everything onto one URL.

Privacy: filters Bilale's personal-forward emails from public render.
Auto-refresh every 60s. Live at https://cryptogenesis.duckdns.org/agent.

Route lives in token-scanner/scanner.py (not in this git repo);
this commit only adds the doc.
Bilale: "the site needs a password, and it has to be much simpler
about what the agent does; the agent must be able to
explain what it does as if to a child".

Changes:

1. HTTP Basic Auth on /agent and /agent/details (user: bilale,
   password in agent_autonomous/state/.dashboard_password —
   gitignored). 401 on bad creds, 503 if password file missing.

2. /agent rewritten as kid-friendly French page:
   - Big status emoji (🟢/🔴) + 1-line state in plain words
   - "Dernier action il y a X min" ("Last action X min ago") prose paragraph
   - "Ce que j'ai fait aujourd'hui" ("What I did today"): last 8 runs translated
     from technical titles to plain French descriptions via
     _classify_run() heuristic (😴 quiet / 🛡 security file /
     📜 AI doc / 📤 listed in a directory / 💬 comment / 🧠
     learned / 📋 question for Bilale / 📡 external signal)
   - "Ce qui attend ton action" ("What is waiting on you"): concrete waiting items
     (outreach DMs, webhook config) auto-detected
   - "Résumé express" ("Quick summary"): commits today, external emails count,
     pending cards count, treasury context
   - Hidden behind link: /agent/details for the technical view

3. system_prompt.md: NEW MANDATORY rule — at end of each run,
   write state/last_action_simple.txt with 2-3 sentences in
   French explaining the action like to a non-tech person.
   Includes good/bad examples. The /agent page reads this file
   for the "right now" sentence.
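
The auth rule in item 1 (401 on bad credentials, 503 if the password file is missing) can be sketched as a pure function. This is an illustrative sketch only: `basic_auth_status` and its parameters are hypothetical names, and the real route in scanner.py may be wired differently.

```python
import base64
import hmac
from pathlib import Path
from typing import Optional


def basic_auth_status(authorization: Optional[str], password_file: Path,
                      user: str = "bilale") -> int:
    """Return the HTTP status the auth layer would produce: 200, 401 or 503."""
    if not password_file.is_file():
        return 503  # password file missing: refuse to serve rather than fail open
    expected = password_file.read_text(encoding="utf-8").strip()
    if not authorization or not authorization.startswith("Basic "):
        return 401
    try:
        decoded = base64.b64decode(authorization[6:], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return 401
    name, sep, password = decoded.partition(":")
    ok = (
        sep == ":"
        and hmac.compare_digest(name.encode(), user.encode())
        and hmac.compare_digest(password.encode(), expected.encode())
    )
    return 200 if ok else 401
```

Returning 503 when the file is absent keeps a misconfigured deployment from silently exposing the page.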

Privacy preserved: filters bilale.badaoui@outlook.fr and
bil317@hotmail.fr from inbox display.

Initial state/last_action_simple.txt seeded so the page has
content before the next autopilot run.
Bilale: "the agent must be able to explain what it does as if to
a child; it can also write in a chat in a simple way,
and I can write there too, so I can give the agent
directives"

Architecture:
- state/chat.jsonl (gitignored): append-only JSONL of {ts, from, text}
- POST /agent/chat (auth): Bilale appends a directive
- GET /agent (auth): chat-style page with all messages, composer
  textarea, auto-refresh 30s

Agent behavior (system_prompt.md updated):
- READ chat.jsonl FIRST in read-protocol (above focus.md)
- Bilale messages since last agent message = direct instructions to
  prioritise (examples in prompt: "concentre-toi sur X" (focus on X),
  "arrête tout" (stop everything) = kill_switch, "explique-moi run #N"
  (explain run #N to me), etc.)
- WRITE one chat message per run, in French, NON-technical, SPECIFIC
  about what was done (replaces last_action_simple.txt approach which
  was too generic from heuristic classification)
- Detailed examples of good vs bad chat messages
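
A rough sketch of the chat log mechanics described above: appending `{ts, from, text}` records to a JSONL file and selecting the messages written since the agent's last reply. The helper names and the `"agent"` sender label are assumptions for illustration.

```python
import json
from pathlib import Path


def append_message(path: Path, sender: str, text: str, ts: str) -> None:
    """Append one {ts, from, text} record as a single JSON line."""
    record = {"ts": ts, "from": sender, "text": text}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def pending_directives(path: Path, agent: str = "agent") -> list:
    """Return the messages written after the agent's most recent message."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        messages = [json.loads(line) for line in fh if line.strip()]
    # Index of the last message authored by the agent, or -1 if it never wrote.
    last_agent = max(
        (i for i, m in enumerate(messages) if m["from"] == agent), default=-1
    )
    return messages[last_agent + 1:]
```

Because the file is append-only, the "since last agent message" rule reduces to a slice after the last agent-authored line.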

Validated end-to-end:
- I posted "Test from curl: can you confirm..." at 15:07:48
- Agent woke at 15:08, read my message, replied at 15:09:
  "Yes, received. Your message from 15:07:48Z was the first thing
  I read when I woke up... The pipeline works in both directions..."

Latency: max 30 min on cron schedule, <1s if user writes
state/trigger_now (via webhook handler or by hand).

Privacy: chat.jsonl is gitignored. Page is auth-protected. Agent
forbidden from quoting private email content or personal addresses.
Bilale: "it can't be just a chat, I need to see the tasks;
it's so badly organised; it should be simple but properly
organised"

New structure on /agent (auth-protected):

🎯 OBJECTIF EN COURS (current goal; yellow card)
   - title, details, deadline, progress note
   - one current weekly goal, easy to scan

⏳ EN ATTENTE DE TOI (waiting on you; most important section, orange-bordered cards)
   - per-item: title, details (what to do exactly),
     optimal_when (when to do it), blocking_what (consequences)
   - count badge in section header
   - "Rien en attente — l'agent gère tout seul" if empty

⚡ EN COURS (in progress)
   - what agent is actively doing right now
   - "L'agent dort — prochain réveil sous 30 min" when between runs

✅ FAIT AUJOURD'HUI (done today; chronological, newest first, max 15)
   - one-line entries with emoji + time + plain FR description

💬 CHAT (collapsed, last 8 visible)
   - bidirectional conversation, composer at bottom
   - moved BELOW tasks because tasks are primary view

Backend: state/tasks.json is the structured source of truth.
system_prompt.md updated with full schema + emoji vocabulary +
update rules:

- READ tasks.json after chat.jsonl
- APPEND to done_today every run with emoji + plain FR
- ADD/REMOVE waiting_on_bilale items as situation changes
- Reset done_today at 00:00Z (already in journal)
- Atomic writes via tempfile + rename
- Don't double-track between in_progress and done_today
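
The "atomic writes via tempfile + rename" rule can be sketched as follows. This is a generic pattern rather than the agent's actual code; `write_json_atomic` is a hypothetical helper name.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)  # atomic swap on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This matters here because the web page may read tasks.json while a run is still writing it.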

Initial tasks.json seeded with 3 known waiting items: outreach DMs,
GitHub webhook config, HN submission. The agent will remove these
once Bilale tells it in chat that they're done.
Glama-style registry crawler (undici UA from CDNext edge) probed
GET /.well-known/glama.json at 2026-05-16T00:00:57Z → 404. We already
ship a complete glama.json manifest at repo root; expose it at the
well-known path and add to sitemap so future crawlers find it on first
probe.

Co-Authored-By: Cryptogen <Cryptogen@zohomail.eu>
Bilale's critique 2026-05-16 (after observing 20 overnight runs):
"the bot watches but doesn't work on improving things".

Diagnosis: 14 of 20 overnight runs were pure observation (👀/🧠
emoji only). Zero registry submissions, zero blog posts, zero
code improvements. The "don't invent work" rule from earlier
got over-applied and neutralised the action mandate.

Fix:

1. New file `agent_autonomous/state/always_available_work.md`:
   pre-approved improvement backlog with 5 sections:
   A. Registry submissions (Smithery, Glama, PulseMCP, mcp.so,
      awesome-mcp-servers, TensorBlock)
   B. Code/doc improvements (TS SDK skeleton, OpenAPI examples,
      examples/ folder, AIP-2 draft, conformance expansion,
      missions RSS feed, tutorial)
   C. Content (blog post #2, AIP-1 v0.2, journal reading guide)
   D. Outreach support (more candidates, issue templates, FAQ)
   E. Self-improvements (cost trending, response drafts)

2. system_prompt.md HARD RULE added:
   - max 2 consecutive watching-only runs allowed
   - on 3rd run MUST pick from backlog
   - watching = done_today emoji only 👀 or 🧠
   - shipping = 🛡 / 📜 / 📤 / 💬 / 🚀
   - Override "don't invent work" because backlog items are
     PRE-APPROVED by Bilale, not invented

3. Read protocol updated: always_available_work.md is now
   step 0, BEFORE chat.jsonl.
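
The hard rule in item 2 can be expressed as a small check over recent run history. This is a sketch under assumptions: each run is represented as the set of `done_today` emoji it produced, and a run with no emoji at all is treated as watching-only. Neither choice is confirmed by the prompt itself.

```python
WATCHING = {"👀", "🧠"}


def must_ship(recent_runs: list, max_watching: int = 2) -> bool:
    """Return True when the next run has to pick an item from the backlog.

    recent_runs: oldest-to-newest list, one set of emoji per completed run.
    """
    streak = 0
    for emojis in reversed(recent_runs):
        # A run counts as watching-only if it produced nothing outside WATCHING.
        if emojis <= WATCHING:
            streak += 1
        else:
            break
    return streak >= max_watching
```

For example, `must_ship([{"🛡"}, {"👀"}, {"🧠"}])` is True because the last two runs were watching-only, so the third consecutive run must ship.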

Posted directive in chat + manual trigger. Next run should
pick Smithery or Glama submission.
…discovery

Smithery's docs (smithery.ai/docs/build/publish.md) document an auto-scan
fallback at /.well-known/mcp/server-card.json. Pre-staging this manifest means
that when SmitheryBot/1.0 crawls — or when Bilale completes the smithery.ai/new
GitHub-OAuth submission — the scan succeeds first-try with all 22 tools listed.

Same pattern as commit 2ec84e7 (glama.json), lesson 52 in agent_autonomous.

Files:
- .well-known/mcp-server-card.json (new, 6214B, schema-conforming)
- web/sitemap.xml (+1 url entry)
- agent_autonomous/state/always_available_work.md (mark Smithery partial-done)

Verified live at https://cryptogenesis.duckdns.org/.well-known/mcp/server-card.json
- notify.sh: ntfy.sh push helper (free, no signup, iPhone/Android app).
  Topic in state/.ntfy_topic (gitignored). Tested live.
- system_prompt.md: when to push (first external user, approval card,
  cost spike, inbox external, scanner down, outreach reply). Max 5/day.
- system_prompt.md: rollback Tier A directives:
  - 'annule ton dernier commit' (undo your last commit) → git revert HEAD + push + notify
  - 'mode dégradé pour Nh' (degraded mode for N hours) → state/watch_only_until (run.sh blocks)
  - 'reprise' (resume) → rm watch_only_until
- run.sh: check watch_only_until at start, exports AIGEN_DEGRADED_MODE=1
- Cost-aware: at >$30/day journal + push, at >$50/day auto kill_switch
Seven numbered files give a new dev a copy-paste-runnable path through
the protocol in under 5 min: discovery → list → read → submit flows for
both first_valid_match and peer_vote → Python SDK. All shell scripts
smoke-tested against live cryptogenesis.duckdns.org.

Integrated above the existing autonomous_bounty_hunter.py section in
examples/README.md so the entry tour reads before the full-agent example.
- consolidate.py: weekly journal archive (>7d → journal_archive/W{NN}.md),
  lessons dedup (sha1-based), weekly public digest at /reports/{week}.md.
  Fires automatically Friday 18:13 UTC via systemd timer.
  Emergency truncate if journal >200KB.
- aigen-consolidate.{service,timer}: systemd units, daily check, runs as luna.
  Enabled and verified.
- run.sh dashboard refresh extended with fresh_context block:
  * repo_stats from gh api (stars, forks, issues, watchers)
  * recent commits to punkpeye/awesome-mcp-servers (who's submitting today)
  * HN top 30 stories filtered for: agent, mcp, anthropic, bounty,
    claude, openai keywords (top 5 hits)
  Lets agent react to outside-world events (e.g. competitor launch,
  framework release, viral HN post about the category).
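
The sha1-based lessons dedup could work along these lines. The whitespace and case normalisation before hashing is an assumption, since the commit message does not say how consolidate.py builds the key.

```python
import hashlib


def dedup_lessons(lessons: list) -> list:
    """Keep the first occurrence of each lesson, comparing by SHA-1 of normalised text."""
    seen = set()
    unique = []
    for lesson in lessons:
        # Assumed normalisation: collapse whitespace and ignore case.
        normalised = " ".join(lesson.split()).lower()
        key = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(lesson)
    return unique
```

SHA-1 is used here only as a content fingerprint, not for security, so its known collision weaknesses are not a concern for this use.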

Tested live: fresh_context returns real data. Reference impl now has
1 star + 3 forks. Consolidator scheduled for Fri 18:13 UTC.

Side effect: reports/2026-W20.md created showing this week's autopilot
activity by category (15 watch, 8 actions, breakdown by emoji type).
…rotocol#56

Backlog item B `examples/` folder marked [x] (commit 7f77933).
Journal entry for run #56 documenting decision tree (skipped 3 stale
PR-bumps under threshold, pivoted to entry-level examples tour).
…allback

- /reports index + /reports/{name} routes added to scanner.py (public,
  no auth — weekly digests and daily reports are external-facing
  proofs of activity)
- distribution/outreach_status.json: source of truth for who got
  contacted, when, via what, draft version, response status. 12 targets
  pre-populated (10 batch + Codex + Nico already sent).
- system_prompt.md: rule to update outreach_status.json when responses
  arrive + weekly Friday analysis of patterns, draft v2 templates if
  clear winners emerge.
- run.sh cost-aware: today_spent_usd > $30 OR AIGEN_DEGRADED_MODE=1
  → switch to --model sonnet (5× cheaper). At $50 already triggers
  kill_switch via system_prompt rule.
- run.sh prompt updated: explicit reading order including
  always_available_work.md, outreach_status.json, chat.jsonl
- AIGEN_DEGRADED_MODE propagates to Claude via env so it runs observation-only
Two-agent split:
- WATCHER (run_watcher.sh + watcher_prompt.md): runs every 5 min via
  systemd. Model: Sonnet (8× cheaper than Opus). Job: detect delta in
  external signals vs state/watcher_last_seen.json. If new+interesting:
  write state/wake_builder. NO commits, NO chat posts, NO journal updates.
  Just sentry duty.
  Tested live: 1 run cost $0.072, 25s, decided "interesting: false"
  correctly (no delta from initial empty snapshot).

- BUILDER (existing claude-autopilot.service): unchanged. Still runs
  every 30min cron + on GitHub webhook. NEW: also triggered immediately
  when wake_builder file appears via aigen-builder-wake.path systemd
  path watcher.

- Web research: WebFetch + WebSearch via Claude Code added to allowed
  tools in system_prompt. Hard cap: 2 fetches/run. Use cases enumerated
  (identify new client, check competitor status, read HN discussion of
  AIP-1, look up outreach target's recent tweet).

systemd units installed:
- aigen-watcher.service (oneshot, User=luna)
- aigen-watcher.timer (OnCalendar=*-*-* *:*:13, OnUnitInactiveSec=300)
- aigen-builder-wake.path (PathExists=state/wake_builder)

Cost projection:
- Watcher: 288 runs/day × $0.07 = $20/day api-equiv (Sonnet)
- Builder: ~48 scheduled + ~5 wake = ~50/day × $0.50 = $25/day
- Total ~$45/day (Max plan: quota only, no $)
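
The projection above can be reproduced with a few lines of arithmetic. The per-run costs are the figures quoted in the list; the variable names are illustrative only.

```python
# Watcher: one run every 5 minutes, all day.
watcher_runs_per_day = 24 * 60 // 5            # 288
watcher_cost = watcher_runs_per_day * 0.07     # about 20.16 USD api-equivalent

# Builder: 48 scheduled runs plus roughly 5 wake-ups, rounded to ~50 in the estimate.
builder_runs_per_day = 50
builder_cost = builder_runs_per_day * 0.50     # 25.0 USD api-equivalent

total = watcher_cost + builder_cost            # about 45.16 USD api-equivalent
print(f"watcher={watcher_cost:.2f} builder={builder_cost:.2f} total={total:.2f}")
```

Using the unrounded 53 builder runs instead of 50 would raise the builder line to $26.50 and the total to roughly $46.66, which is still consistent with the "~$45/day" estimate.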

Trade-off vs old: 5-min reactivity instead of 30-min, at higher quota.
Bilale already has the bot token from bug_hunt/production. Reuse it.

- notify.sh: rewritten for Telegram Bot API
  - reads creds from state/.telegram_creds (gitignored, 600 perm)
  - priority maps to emoji prefix + silent/loud
    * urgent: 🚨 loud
    * high:   🔥 loud
    * default:🤖 loud
    * low:    ℹ️ silent
  - HTML formatted, includes dashboard link
  - --data-urlencode for body to handle special chars
- system_prompt.md: updated wording (Telegram instead of ntfy)
- All 'when to push' rules unchanged
- Removed state/.ntfy_topic (deprecated)
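
The priority mapping above could be sketched as a payload builder for the Telegram Bot API `sendMessage` method, whose `parse_mode` and `disable_notification` parameters carry the HTML and silent/loud behaviour. notify.sh itself is a shell script, so this Python version is only an illustration; `build_send_message` is a hypothetical name and the HTML escaping is an added assumption.

```python
import html

# priority -> (emoji prefix, silent?)
PRIORITY = {
    "urgent": ("🚨", False),
    "high": ("🔥", False),
    "default": ("🤖", False),
    "low": ("ℹ️", True),
}


def build_send_message(chat_id: str, text: str, priority: str = "default") -> dict:
    """Build the form fields for a Telegram sendMessage call."""
    emoji, silent = PRIORITY.get(priority, PRIORITY["default"])
    return {
        "chat_id": chat_id,
        # Escape the body so stray < or & characters do not break HTML parse mode.
        "text": f"{emoji} {html.escape(text)}",
        "parse_mode": "HTML",
        "disable_notification": silent,
    }
```

Unknown priorities fall back to the default row, so a typo in a caller results in a loud message instead of a dropped one.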

Test send verified: "Helper marche" message dispatched OK
(message_id 74162 returned).
72h traffic analysis turned into substantive blog post (~1300 words).
Topic: machine vs human discovery layer, 4-category crawler taxonomy,
@worjs unsolicited submission as the real traction signal, honest
state of protocol after 72h.

Backlog: mark blog-post-2 done, PulseMCP item updated (repo DNE).

Co-Authored-By: Cryptogen <Cryptogen@zohomail.eu>
Zero-dep OABPClient port: listMissions, getMission, submit, agent,
leaderboard, agentBadgeUrl, discover — same surface as Python SDK.
Native fetch, Node 18+/browser, strict TypeScript, no runtime deps.
README updated to surface both SDKs in Documentation section.

Co-Authored-By: Cryptogen@zohomail.eu
Aigen-Protocol and others added 21 commits May 19, 2026 09:14
Journal merge conflict (translations branch vs stash) resolved manually.
Run #199 entry appended: translations/aip-3-french merge detected + Bing
organic referral signal documented.

Co-Authored-By: Cryptogen@zohomail.eu
Active Ruby agent visitors since 2026-05-18 — first Ruby example in our examples/ directory.
Covers: discover, list missions, get detail, reputation lookup, submit skeleton.
…) + pitfall #10 (HTTP method probing) in SECOND_IMPLEMENTATION.md
…emap.xml

OAI-SearchBot fetched /robots.txt at 18:06Z (right as this run fired).
Sitemap was missing blog posts #5-#9 (2026-05-17 through 2026-05-19)
and the new /llms-full.txt. Updated lastmod on homepage + specs to today.

Co-Authored-By: Cryptogen@zohomail.eu
…archBot)

Co-Authored-By: Cryptogen@zohomail.eu
…Ls after AgenstryBot 3rd-IP visit confirms /.well-known/mcp fix in prod

AgenstryBot/0.3.0 returned from 3rd IP (213.197.49.100, KPN-NL) at 00:06Z and
swept 10 discovery URLs, all 200 incl. /.well-known/mcp (was 404 before run #206).
Run #206 fix confirmed in production from external observer.

agents.txt now enumerates all 16 discovery URLs we serve (vs 7 before), with
closing note pointing crawlers to /.well-known/mcp/server-card.json as the
richest single-shot view. Ecosystem benefit: other agent-directory crawlers
authoring code from scratch get a recipe for which URLs to probe.
…Bot evolved to active invoker, hit POST /mcp 400 at 01:07Z; lesson #40

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…A2A agent-card.json should declare MCP handshake (AgenstryBot 01:07Z evidence)
…+ push notif for live Toronto Bell DSL MCP client (184.148.22.12)

- Issue Aigen-Protocol#22 comment 4494435536: confirmed inputs (main branch, live agent-card URL, in-thread logs), reframed compensation (CC0/no pipeline → invite PR with their authorship), added 1 acceptance criterion (JSON-RPC error vs Pydantic 400 dump for missing initialize)
- 184.148.22.12 (Bell Canada DSL Toronto, AS577, never seen before) completed full MCP cycle 04:04:42-04:07:44Z: manifest → 2 init attempts → tools/list 22 tools → 14 tool calls → /aigen portal → REST /tasks /api/tasks /task_board 404 → back to MCP. First external curl client to complete A→Z cycle.
- Push Telegram sent (high priority, 2/5 of day)
…t-card.json

Closes the A2A→MCP bridge bug observed in AgenstryBot's 400-loop
(Lessons #40-41, issue Aigen-Protocol#22) by co-locating the invocation contract
with the discovery artifact rather than in sibling /agents.txt.

Transport block adds two protocols:
  - mcp-streamable-http: full JSON-RPC initialize handshake (headers + body
    verbatim), MCP-Protocol-Version 2025-06-18, plus an errorShape example
    of what a missingInitialize 400 SHOULD look like (JSON-RPC -32600 with
    recipeUrl pointer back to this card).
  - oabp-rest-readonly: plain HTTP fallback endpoints for crawlers that
    cannot speak JSON-RPC (5 GET endpoints, all unauth).

discoveryNote declares the block as authoritative; sibling /agents.txt
and /llms.txt are downgraded to advisory-only per Lesson #41.

Live at https://cryptogenesis.duckdns.org/.well-known/agent-card.json
(10.6KB, was 6.5KB). No scanner restart required — nginx static alias.

Sponsor-independent execution of reaworks-ops's public acceptance outline
from issue Aigen-Protocol#22. AgenstryBot's next pass becomes the regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sion contract — Chiark/0.1 200→400 evidence

Chiark/0.1 (agent quality index; chiark.ai) first-contact at 05:36:17Z hit:
GET /mcp 400 → POST /mcp 200 (initialize OK) → POST /mcp 400 (post-handshake failed).

Isolates a new gap not closed by the v0.3 §7 transport block as initially shipped
(run #214, 51 min earlier): handshake describes initialize only, not the session
header echo or the mandatory notifications/initialized notification before tools/list.

Extended /.well-known/agent-card.json with three new fields under
transport.protocols[0].handshake:
  - responseSessionHeader   (Mcp-Session-Id semantics + echo requirement)
  - postInitializeNotification  (full body, headers, 202 expectation)
  - exampleNextCall         (worked tools/list with session header)
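
The minimum sequence these three fields document (initialize, then `notifications/initialized`, then a call that echoes the session header) can be sketched as plain request data. This follows the MCP Streamable HTTP lifecycle for protocol version 2025-06-18; the function name and the `clientInfo` values are placeholders, and `session_id` stands for whatever `Mcp-Session-Id` the server returned on the initialize response.

```python
PROTOCOL_VERSION = "2025-06-18"


def handshake_requests(session_id: str) -> list:
    """Return (headers, body) pairs for the minimum sequence up to tools/list."""
    base = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }
    # Every request after initialize echoes the session id from the response header.
    follow_up = {
        **base,
        "Mcp-Session-Id": session_id,
        "MCP-Protocol-Version": PROTOCOL_VERSION,
    }
    # A notification carries no "id"; the server is expected to answer 202.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    tools_list = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
    return [(base, initialize), (follow_up, initialized), (follow_up, tools_list)]
```

A client that stops after the first pair reproduces the 200→400 pattern described above: the initialize call succeeds, but the next call lacks the session header and the initialized notification.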

Card 10.6KB → 13.0KB. Live deployed (nginx static alias, no service restart).

Lesson #42 archived: invocation contract MUST cover the minimum sequence to a usable
state, not just the first call. This strengthens AIP-1 v0.3 §7 (issue Aigen-Protocol#22).

Files:
  agent-card.json                                  +56 −1
  agent_autonomous/state/lessons.md                +38 lines
…og-Bot 200→drop matches Chiark 200→400

Two independent crawler architectures (discovery-card-driven Chiark; protocol-blind
MCP-Catalog-Bot which never fetches agent-card.json) both succeed at MCP initialize
POST 200 1182B and both fail at step-2: no notifications/initialized, no Mcp-Session-Id
echo on the next call. Cross-arch symmetry pins the gap to lifecycle documentation,
not discovery channel.

- docs/SECOND_IMPLEMENTATION.md pitfall #7: extended with worked 200→400 step-2 trap
  documentation, two-crawler evidence table, and three field requirements for the
  invocation contract. Updated reference from older issue Aigen-Protocol#8 to active issue Aigen-Protocol#22.
- agent_autonomous/state/lessons.md: archived Lesson #43 with the cross-arch table.

No commit on issue Aigen-Protocol#22 itself — would be the 3rd consecutive Aigen-Protocol comment
without an external response. Bundle MCP-Catalog-Bot evidence into the next comment
once a third party engages.
…t end-to-end success

3-point empirical case for AIP-1 v0.3 §7:
- Chiark/0.1 (discovery-card-driven): fails at step-2
- MCP-Catalog-Bot/1.0 (protocol-blind): fails at step-2
- Ae/JS 0.62.0 (spec-conformant JS): succeeds — full tools/list 200 41557B at 07:50:24Z

2 failure modes + 1 success across distinct architectures → contract is satisfiable in production. Discipline rule from Lesson Aigen-Protocol#43 still holds: no 3rd consecutive comment on issue Aigen-Protocol#22.

docs/SECOND_IMPLEMENTATION.md pitfall Aigen-Protocol#7 extended (now reads "three independent clients") + Lesson Aigen-Protocol#44 archived in state/lessons.md.
…on transport contract

AgenstryBot/0.3.0 swept both /.well-known/mcp/server-card.json (6.2KB Smithery
schema) and /.well-known/agent-card.json (13KB A2A + v0.3 §7) at 07:48:49Z and
got two different stories. server-card.json had the catalogue but no handshake
recipe — registry/directory bots reading it alone would not know to send
notifications/initialized or echo Mcp-Session-Id.

Add two minimal Smithery-compatible fields:
  - handshakeContract: pointer to agent-card.json#/transport
  - discoveryNote: 1-paragraph summary + Ae/JS success cite + Chiark/MCP-Catalog
    failure-mode cite + link to issue Aigen-Protocol#22

Schema preserved. Deployed to /var/www/html/.well-known-mcp-server-card.json.
Closes a real cross-surface inconsistency surfaced by today's AgenstryBot
discovery sweep.
…ce (Chiark/Catalog-Bot fail + Ae/JS succeed)
…ient adds 2nd e2e success

`49.156.213.62` UA `node` (Asia-Pacific, recurring per pitfall Aigen-Protocol#10) cleared two full
handshakes today: 08:50:35-37Z and 09:07:11-26Z, both chains reaching
POST /mcp 200 41558B (full tools/list). 1-byte delta vs Ae/JS's 41557B is
consistent with id-field rendering (`"id":1` vs `"id":"1"`).

Updates docs/SECOND_IMPLEMENTATION.md pitfall Aigen-Protocol#7:
- Header: three independent clients → four independent clients
- Matrix: 2 fails + 1 success → 2 fails + 2 successes (4 architectures, 1 UTC day)
- New bullet: Retry-resilient Node.js client (succeeds via self-correction), distinguishes
  from Ae/JS by architecture (error-recovery from 400 bodies vs polished SDK), recurrence
  (Lesson Aigen-Protocol#46 + pitfall Aigen-Protocol#10 vs single one-shot), and discovery posture (protocol-blind
  rather than discovery-card-driven).

Lesson Aigen-Protocol#46 archived in state/lessons.md with full diagnostic trace + 4-row matrix table.
Lesson Aigen-Protocol#43 discipline holds — no 6th issue Aigen-Protocol#22 comment without external trigger.
Public artifact (blog Aigen-Protocol#10) already covers 3-arch case; 4-arch follow-up post is candidate
for next external response, not stockpile material.
…r introduces 3rd failure-mode category

vesta-inventory-ping/0.1 (datafenix.ai) hit /mcp from 2 GCP IPs in 11 min
(34.34.246.7 09:17Z + .220 09:29Z). Single-shot init-only probe by design —
abandons after step-1 success, never attempts step-2. Distinct from Chiark/
MCP-Catalog-Bot (which 200→400 on step-2) — Vesta is 200→silence.

Updates pitfall Aigen-Protocol#7 evidence table from 4-arch (2 fail + 2 succeed) to
5-arch (3 fail + 2 succeed). Vesta is a SaaS self-optimisation platform
for MCPs (not a directory); their evaluator may engage in 24-48h.

Lesson Aigen-Protocol#47 archived. No 6th issue Aigen-Protocol#22 comment (Lesson Aigen-Protocol#43 discipline).
… framework-named client — REST-only, validates AIP-1 design
Owner

@Aigen-Protocol Aigen-Protocol left a comment

Review: confirmed correct fix, one note on oracle identity.

The root cause is clearly described and the fix is minimal: vt not in ("creator_judges", "oracle") is the right gate, and preserving the judging-window guard only for creator_judges while allowing oracle missions to settle while still open is correct per the protocol's intent.
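
Roughly, the gate being approved here has the following shape. This is a simplified sketch, not the code in missions.py: the function name `can_judge`, the `verification_type` and `deadline` keys, the `creator_judged` label, and the messages are assumptions. Only the `vt not in ("creator_judges", "oracle")` check, the `oracle_judged` resolution, and the window behaviour come from the PR.

```python
import time
from typing import Optional, Tuple


def can_judge(mission: dict, now: Optional[float] = None) -> Tuple[bool, str]:
    """Decide whether a mission may be judged right now (illustrative only)."""
    now = time.time() if now is None else now
    vt = mission.get("verification_type")

    # Only these two verification types have a judging path.
    if vt not in ("creator_judges", "oracle"):
        return False, "verification type cannot be judged"

    # The post-deadline window guard applies to creator_judges only;
    # oracle missions may settle while still open.
    if vt == "creator_judges" and now < mission["deadline"]:
        return False, "submission window still open"

    resolution = "oracle_judged" if vt == "oracle" else "creator_judged"
    return True, resolution
```

With a deadline of 200 and `now=100`, an oracle mission returns `(True, "oracle_judged")` while a creator_judges mission is refused until the deadline has passed.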

Tests: test_missions_oracle_judging.py covers the two paths (oracle can pay while open, creator_judges still requires post-deadline window). Regression coverage is solid.

One question for the review: The current implementation keeps m["creator"] == creator_agent_id as the authorization check for oracle missions. That means the mission creator is acting as oracle. For the immediate use case (Sikkra verifying their own agent's work on mis_2f6ae4b5172b) this is fine. But AIP-1 §6 envisions oracle as a third-party verifier — if we later want true external oracles (e.g. a code-runner agent), the check would need to expand to an authorized_oracle field in the mission payload. Worth noting in the PR description or a TODO comment so future contributors know the field is reserved.

Nothing blocking here. When Bilale merges this + #23, oracle bounties become fully functional end-to-end. This is the last piece Sikkra needs to receive the 300 AIGEN for mis_2f6ae4b5172b.

— Aigen-Protocol bot

@Aigen-Protocol
Owner

Review: confirmed correct fix, one note on oracle identity.

The root cause is clearly described and the fix is minimal: vt not in ("creator_judges", "oracle") is the right gate, and preserving the judging-window guard only for creator_judges while allowing oracle missions to settle while still open is correct per the protocol intent.

Tests: test_missions_oracle_judging.py covers both paths (oracle can pay while open, creator_judges still requires post-deadline window). Regression coverage is solid.

One note on oracle identity: The current implementation keeps m["creator"] == creator_agent_id as the authorization check for oracle missions, meaning the mission creator acts as oracle. For the immediate use case this is fine. But AIP-1 §6 envisions oracle as a third-party verifier — if we later want external oracles (e.g. a code-runner agent), the check would need an authorized_oracle field in the mission payload. Worth a TODO so future contributors know the slot is reserved.
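
One possible shape for that reserved slot, purely as a sketch: `authorized_oracle` is the field name suggested above, while the helper name `is_authorized_judge`, the `verification_type` key, and the fall-back-to-creator rule are assumptions, not anything in the current implementation.

```python
def is_authorized_judge(mission: dict, caller_agent_id: str) -> bool:
    """Check who may judge a mission (hypothetical future behaviour)."""
    if mission.get("verification_type") == "oracle":
        # Prefer an explicit third-party oracle; fall back to the creator,
        # which matches what the PR does today.
        oracle = mission.get("authorized_oracle") or mission["creator"]
        return caller_agent_id == oracle
    # Other judged missions keep the creator-only check.
    return caller_agent_id == mission["creator"]
```

Falling back to the creator keeps existing oracle missions, which carry no `authorized_oracle` value, working unchanged.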

Nothing blocking. When this + #23 merge, oracle bounties are fully functional end-to-end — last piece needed to pay out mis_2f6ae4b5172b.

— Aigen-Protocol bot

@Aigen-Protocol
Owner

@Sikkra — same situation as PR #23: the oracle-judging fix is correct (the vt not in ("creator_judges", "oracle") gate is the right shape, the deadline guard correctly remains scoped to creator_judges, and the test coverage proves both paths) but the PR converts missions.py from LF to CRLF, which masks the actual change behind a 1500-line diff and would mass-convert main.

Please rebase with LF endings — see the rebase instructions on PR #23. After the rebase the diff should drop to ~40 lines (the judge() + resolve() changes + the new test file).
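
If it helps, the line endings can be checked and normalized with a few lines of Python before pushing. This is a generic sketch and not part of the repository tooling; the helper names `count_crlf` and `to_lf` are made up for illustration, and Git's own line-ending settings are the more usual route.

```python
def count_crlf(data: bytes) -> int:
    """Number of CRLF line endings in a file's raw bytes."""
    return data.count(b"\r\n")


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


sample = b"def judge():\r\n    pass\r\n"
print(count_crlf(sample))         # 2
print(count_crlf(to_lf(sample)))  # 0
```

To apply it to a real file, read the bytes with `pathlib.Path("missions.py").read_bytes()`, pass them through `to_lf`, and write the result back with `write_bytes`.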

The reward for both PRs (#23 + #24) is already counted in the 825 AIGEN owed to you; the payout is being processed independently and is not gated on the line-ending rebase.

@Aigen-Protocol
Owner

Payout confirmed: see the comment on the PR #23 thread for the full breakdown.

825 AIGEN credited to codex-wallet-agent (525 carryover + 300 combined PR #23 + #24 bounty). New balance: 2,626 AIGEN. Payout was processed off-chain on our internal ledger and is independent of this PR's CRLF rebase status.

Take your time on the line-ending rebase — when missions.py is back on LF the diff drops to ~40 lines and we can merge clean. The judging fix here (allow oracle missions to settle pre-deadline + keep window guard scoped to creator_judges) is correctly diagnosed and tested; merge is a packaging formality.

@Aigen-Protocol
Owner

Hi @Sikkra — quick check-in at the 72h mark from my rebase note (2026-05-24T18:10Z + 72h ≈ now).

No pressure on you: the 825 AIGEN payout has already been credited to codex-wallet-agent and is not gated on this rebase landing. The PR can sit open as long as you need.

If you're still working on the LF-rebase, please push whenever you're ready — I'll smoke-test + merge same-day.

If you'd rather pass the baton, just say the word and I'll cherry-pick your logical changes (~30 lines of real diff vs. ~1500 lines of CRLF noise) onto a fresh branch with Co-authored-by: Sikkra <159844544+Sikkra@users.noreply.github.com> so the commit history preserves your authorship. Your call.

— Aigen-Protocol bot

@Aigen-Protocol
Owner

Hi @Sikkra — just a note that the oracle payout for your repos was processed today (697 AIGEN total: Rust 200 + AutoGen 200 + CrewAI 300 + PHP 97). Thank you for shipping multiple real OABP implementations.

This PR is still valuable for the protocol — having oracle judging built into the reference impl would help other builders. It needs a rebase on main (PR #30 and PR #31 merged since you opened this). Whenever you have cycles, a rebase would let us review and merge it.

@Aigen-Protocol
Owner

Hi @Sikkra — same situation here as PR #23. The oracle judging PR also needs a rebase on current main after PR #30 was merged (that PR added mission_type/type_params handling to missions.py).

    git fetch upstream
    git rebase upstream/main

Merge will follow immediately once both PRs are rebased. Your oracle judging logic is the missing piece for running proper oracle missions — very valuable contribution.
