
Create claude.yml #17

Open
wangfe wants to merge 17 commits into AlphaBrainGroup:main from Alchedata:main

Conversation

wangfe commented Apr 25, 2026

No description provided.

wangfe added 2 commits April 23, 2026 11:31
7-page React+Vite demo: Home, Overview, Failure Map, Patch Plan,
Iteration Runner, Improvement Report, Platform Memory.
LIBERO Kitchen story: ckpt_v0.7 62% → ckpt_v0.8 74% (+12%).
Dark #07090f theme, indigo-violet gradients, SVG loop diagram.
Author

wangfe commented Apr 25, 2026

approve

yaoge and others added 9 commits April 25, 2026 10:42
- Rewrite README.md: lead with Nvex orchestration layer narrative,
  two-layer architecture diagram, failure-to-fix loop, and demo
  quick-start; retain AlphaBrain technical detail as execution layer
- Add CLAUDE.md: project conventions and architecture reference
- Add demo/nvex-demo.html: standalone 7-page investor demo (all pages
  implemented: Project Hub, Overview, Failure Map, Patch Plan,
  Iteration Runner, Improvement Report, Platform Memory)
- Add prd.md: full Nvex product requirements document
- Add frontend-design.md: Nvex demo wireframe and page IA

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…agent design

- README.md: reposition demo for both investors and potential customers;
  add Who It's For section; add self-improving agent section
- IMPLEMENTATION_PLAN.md: full milestone plan (M1-M4) based on PRD and
  current codebase state; React component list; priority table for next sprint
- SELF_IMPROVEMENT_AGENT.md: brainstorm and design for autonomous
  failure-to-fix agent; three demo modes; agent architecture and tool registry
- demo/README.md: add audience mode table; link to SELF_IMPROVEMENT_AGENT.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rd component, and dual-scenario support

✅ MILESTONE 1 COMPLETION: Narrative MVP (Demo-Ready)

**Narrative Surfaces Enhanced:**
- All 7 pages (Home, ProjectOverview, FailureMap, PatchPlan, IterationRunner, ImprovementReport, PlatformMemory) upgraded with richer narrative content and UI
- Home page now features project hub with two demo scenarios (LIBERO Kitchen + RoboCasa) showcasing breadth
- Each page implements storytelling aligned with investor narrative: problem → diagnosis → solution → execution → results → memory

**New Component:**
- AssetCard.jsx: Reusable abstraction for recipes, templates, failure patterns, and reusable assets displayed across reports and memory pages
- Enables consistent asset visualization across the demo

**Data Layer Improvements:**
- Mock data enriched with a second scenario (RoboCasa_tabletop, a non-LIBERO benchmark) to demonstrate domain breadth
- Mocked artifacts now include two complete before/after improvement loops with realistic metrics
- Added narrative context and supplementary fields to support page enrichment

**Styling Enhancements:**
- Extended styles.css with additional semantic classes for asset cards, improved spacing and visual hierarchy
- Made the dark theme (var(--grad)) consistent across all components

**Build & Deployment:**
- React demo builds successfully with Vite (npm run build → dist/)
- Dev server launches without errors (npm run dev → http://localhost:5173/)
- All page routes resolve; no missing component or import errors

**Documentation:**
- Updated IMPLEMENTATION_PLAN.md to accurately reflect M1 completion
- Clarified that M1 narrative MVP is feature-complete and production-ready for investor demos
- Updated roadmap with realistic M2-M4 effort estimates and dependencies

**Testing & Validation:**
- Vite build: ✅ Succeeded (CSS + JS bundled to dist/)
- Dev server: ✅ Responsive UI renders correctly
- Page navigation: ✅ All 7 routes functional
- Component rendering: ✅ No console errors

**What This Enables:**
1. Investor-ready demo showing full end-to-end Nvex narrative (failure diagnosis → patch planning → autonomous training → improvement reporting)
2. Foundation for M2 (real evaluation artifacts) and M3 (self-improving agent loop)
3. Clear roadmap for M4 (customer-grade multi-project platform)

**Next:** M2 Executable MVP — wire real AlphaBrain eval artifacts and implement rule-based patch planning engine
… patch planner

✅ MILESTONE 2 PHASE 1: Executable MVP Backend Foundation

**New Package: nvex_server/**

Introduces the core orchestration layer for Nvex that bridges React frontend to AlphaBrain training/eval.

**Schemas (nvex_server/schemas.py):**
- EvalRun: Represents benchmark evaluation results with per-task breakdown and failure clusters
- PatchPlan: Structured patch strategy output mapping failure diagnosis to training strategy
- IterationJob: Tracks training execution state, artifacts, and results
- ImprovementReport: Before/after uplift and generated reusable assets
- Request models: PlanGenerationRequest, IterationStartRequest for HTTP API
- Type aliases: ExecutionBackend (5 training modes), JobStatus, Severity, TrainingStrategy
- Validation: All Pydantic models use ConfigDict(extra='forbid') for strict schema enforcement
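
The strict-validation bullet above relies on Pydantic's `ConfigDict(extra='forbid')`, which rejects payloads that carry undeclared fields. The behavior can be illustrated with a stdlib-only stand-in. This is a sketch, not the contents of `nvex_server/schemas.py`: it uses a dataclass instead of Pydantic, and the `EvalRun` field names and the `parse_strict` helper are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalRun:
    # Hypothetical fields; the real schema is not shown on this page.
    run_id: str
    benchmark: str
    success_rate: float


def parse_strict(cls, payload: dict):
    """Build `cls` from `payload`, rejecting unknown keys.

    Stand-in for the effect of Pydantic's extra='forbid'.
    """
    allowed = {f.name for f in fields(cls)}
    unknown = sorted(set(payload) - allowed)
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    # Missing required keys surface as TypeError from the dataclass constructor.
    return cls(**payload)
```

With a strict parser like this, a misspelled key in an imported eval artifact fails loudly at ingestion rather than being silently dropped.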

**Rule-Based Patch Planner (nvex_server/patch_plan_generator.py):**
- PatchPlanGenerator: Maps failure clusters to training strategies using keyword matching
- 6 Patch Rules hardcoded for common failure patterns:
  - occlusion → CL with lighting variants (120 episodes, 20 corrections)
  - recovery → fine-tune with teleop (80 episodes, 40 corrections)
  - language → VLM co-training with augmentation (60 episodes, 10 corrections)
  - lighting → CL with appearance shift (100 episodes, 15 corrections)
  - long-horizon → world model verification (90 episodes, 15 corrections)
  - generalization → cross-robot CL (140 episodes, 20 corrections)
- Confidence scoring based on failure severity and cluster share
- Uplift estimation: 4% baseline + 18% × share_of_failures (capped at 20%)
- Fallback handling: generates default rule if no clusters provided
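
The rule table and uplift formula above can be sketched as follows. This is an illustration under stated assumptions, not the code in `nvex_server/patch_plan_generator.py`: the `PatchRule` type, the strategy label strings, and the helper names are hypothetical, and only the keywords, episode and correction counts, and the `4% + 18% × share` formula with its 20% cap come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatchRule:
    keyword: str
    strategy: str  # label strings are paraphrased, not the project's identifiers
    episodes: int
    corrections: int


RULES = [
    PatchRule("occlusion", "continual_learning_lighting_variants", 120, 20),
    PatchRule("recovery", "fine_tune_teleop", 80, 40),
    PatchRule("language", "vlm_cotraining_augmentation", 60, 10),
    PatchRule("lighting", "continual_learning_appearance_shift", 100, 15),
    PatchRule("long-horizon", "world_model_verification", 90, 15),
    PatchRule("generalization", "cross_robot_continual_learning", 140, 20),
]


def match_rule(cluster_label: str) -> Optional[PatchRule]:
    """Return the first rule whose keyword appears in the cluster label."""
    label = cluster_label.lower()
    for rule in RULES:
        if rule.keyword in label:
            return rule
    return None  # the real planner's fallback rule is not described here


def estimate_uplift(share_of_failures: float) -> float:
    """4% baseline plus 18% times the cluster's share of failures, capped at 20%."""
    if not 0.0 <= share_of_failures <= 1.0:
        raise ValueError("share_of_failures must be in [0, 1]")
    return min(0.04 + 0.18 * share_of_failures, 0.20)
```

Under this formula the cap only binds when a single cluster accounts for more than about 89% of failures, since 0.04 + 0.18 × 1.0 = 0.22.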

**FastAPI Service Skeleton (nvex_server/app.py):**
- create_app() factory with in-memory store (InMemoryStore dataclass)
- Endpoints implemented (in-memory, not yet wired to AlphaBrain):
  - GET /health: Service health check
  - POST /api/eval/import: Ingest EvalRun artifacts
  - POST /api/plan/generate: Run PatchPlanGenerator on eval results
  - POST /api/iteration/start: Create IterationJob from patch plan
  - GET /api/iteration/{id}/status: Poll job state (simulates state transitions queued→running→completed)
  - GET /api/report/{iteration_id}: Fetch ImprovementReport
- All endpoints support the full schema contract; real job dispatch TBD in M2C
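
The simulated `queued→running→completed` transition behind the status endpoint could look roughly like the sketch below. It is framework-free and hypothetical: the commit names an `InMemoryStore` dataclass, but its fields, the job ID format, and the assumption that each poll advances the job by one state are illustrative guesses.

```python
from dataclasses import dataclass, field
from itertools import count

STATES = ("queued", "running", "completed")


@dataclass
class InMemoryStore:
    # Hypothetical layout: job_id -> {"plan_id": ..., "status": ...}
    jobs: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def start_iteration(self, plan_id: str) -> str:
        """Register a new job in the 'queued' state and return its ID."""
        job_id = f"iter-{next(self._ids)}"
        self.jobs[job_id] = {"plan_id": plan_id, "status": STATES[0]}
        return job_id

    def poll_status(self, job_id: str) -> str:
        """Return the current status, then advance one step for the next poll."""
        job = self.jobs[job_id]
        current = job["status"]
        idx = STATES.index(current)
        if idx < len(STATES) - 1:
            job["status"] = STATES[idx + 1]
        return current
```

Because the state lives only in process memory, restarting the server would discard all jobs, which is consistent with real job dispatch being deferred to M2C.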

**Dependencies Added:**
- fastapi==0.115.12
- uvicorn==0.34.2
- pydantic==2.10.6 (already present, now explicit)

**Package Isolation:**
- nvex_server/__init__.py uses __getattr__ lazy import to avoid forcing FastAPI into all consumers
- patch_plan_generator and schemas can be imported without HTTP dependency
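
The lazy-import pattern mentioned above uses module-level `__getattr__` (PEP 562): the heavy import runs only when the attribute is first accessed. The sketch below builds a throwaway module so it can run standalone. In a real package the hook would simply be a top-level `__getattr__` function in `__init__.py`. The `make_lazy_package` helper and the `json.dumps` stand-in for the FastAPI-dependent attribute are assumptions made for illustration.

```python
import importlib
import types


def make_lazy_package(name: str, lazy_attrs: dict) -> types.ModuleType:
    """Create a module whose listed attributes are imported on first access.

    `lazy_attrs` maps attribute name -> (module to import, object name in it).
    """
    pkg = types.ModuleType(name)

    def _lazy_getattr(attr: str):
        # Called only when normal attribute lookup on the module fails.
        if attr in lazy_attrs:
            module_name, obj_name = lazy_attrs[attr]
            value = getattr(importlib.import_module(module_name), obj_name)
            setattr(pkg, attr, value)  # cache so later lookups skip this hook
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    pkg.__getattr__ = _lazy_getattr
    return pkg


# Stand-in usage: `json.dumps` plays the role of an expensive optional import.
demo_pkg = make_lazy_package("nvex_server_sketch", {"dumps": ("json", "dumps")})
```

Consumers that only touch the schemas or the planner never trigger the hook, so they would not need the HTTP dependency installed.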

**Validation:**
- All files compile cleanly (python -m compileall nvex_server)
- PatchPlanGenerator tested live: occlusion cluster → continual_learning strategy confirmed
- FastAPI app instantiation verified: 5 API routes registered correctly

**Documentation:**
- Updated IMPLEMENTATION_PLAN.md to mark schemas, planner, and infrastructure as [x] Complete
- Updated priority table to reflect current backend readiness
- Clarified that real AlphaBrain job wiring is still pending M2C

**What This Enables:**
1. React frontend can now POST to /api/plan/generate with an EvalRun and receive a structured PatchPlan
2. IterationRunner page can call /api/iteration/start and poll /api/iteration/{id}/status
3. ImprovementReport page can fetch before/after metrics and reusable assets
4. Foundation for M2A (eval artifact exporter) and M2C (real job dispatcher) to extend the same backend

**What's Still Pending:**
- M2A: AlphaBrain benchmark result exporter (JSON artifact generation)
- M2C: JobDispatcher wrapping actual AlphaBrain training scripts
- React integration: consume endpoints instead of mock data
- Real training execution: currently simulated in-memory state transitions
- Update main README demo section with local uvicorn backend startup instructions
- Document /api endpoints exposed by nvex_server
- Remove stale 'all data mocked' language; clarify M2 is implemented
- Update demo README with API-backed M2 flow and startup sequence
- Remove 'in progress' and 'mock-only' descriptions for React app
- Document Vite proxy configuration and backend endpoint list
- Add notes on seeded demo artifacts (libero_kitchen_before/after_eval.json)
@guoweiyu
Collaborator

Thanks for your pull request. Since it adds a large amount of content, would it be convenient to go through it quickly in an online meeting? Maybe Tuesday?

wangfe added 6 commits May 2, 2026 01:06
- Added SelfImprovementAgent for autonomous failure-to-fix loops
- Implemented demo mode (precomputed replay) and real mode skeleton
- Added LLMNarrator with OpenAI support and template fallback
- Created M3 schemas: AgentRunState, LoopIteration, AgentStep, FailureDiagnosis
- Exposed new API endpoints: /api/agent/run, /api/agent/advance, /api/demo/agent
- Enhanced frontend with AgentReasoningPanel and MultiIterationChart
- Updated IterationRunner and ImprovementReport with autonomous loop UI
- Verified backend routes and frontend build
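
The "LLM with template fallback" idea in the narrator bullet above can be sketched as below. This is not the project's `LLMNarrator`: the class name, the template text, and the injected `llm` callable (standing in for an OpenAI call) are hypothetical, and no real client API is used.

```python
from typing import Callable, Optional

# Hypothetical template; the project's actual wording is not shown here.
TEMPLATE = "Iteration {iteration}: success rate moved from {before}% to {after}%."


class NarratorSketch:
    def __init__(self, llm: Optional[Callable[[str], str]] = None):
        # `llm` is any callable taking a prompt and returning text, or None.
        self._llm = llm

    def narrate(self, iteration: int, before: int, after: int) -> str:
        fallback = TEMPLATE.format(iteration=iteration, before=before, after=after)
        if self._llm is None:
            return fallback
        try:
            text = self._llm(f"Summarize for an operator: {fallback}")
        except Exception:
            # Any provider failure degrades to the deterministic template.
            return fallback
        return text.strip() or fallback
```

Injecting the callable keeps the demo runnable without credentials or network access, which matters for a precomputed replay mode.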
…
- Add agent event stream model for live timeline rendering
- Extend demo arc to 4 loops: 62->74->81->79 regression->rollback->85
- Emit structured run events (start/step/iteration/rollback/stop)
- Add step expected durations to drive realistic stream pacing
- Add auto stream controls (start/pause) in runtime context and runner
- Render streaming timeline and rollback callouts in reasoning panel
- Mark regression and rollback in multi-iteration chart and report
- Seed demo agent with max_iterations=4 for new investor flow
- Convert Milestone 4 plan into concrete P0/P1/P2 backlog tickets
- Update investor demo script to match autonomous stream + rollback narrative
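
The regression-and-rollback arc described in this commit (62 → 74 → 81 → 79 → rollback → 85) can be replayed with a small event generator. This is a sketch under assumptions: `RunEvent` and `replay` are hypothetical names, the per-step `step` events are omitted for brevity, and "regression" is taken to mean any score below the best seen so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunEvent:
    kind: str       # "start", "iteration", "rollback", or "stop"
    iteration: int
    score: int      # success rate in percent


def replay(baseline: int, scores: list) -> list:
    """Emit run events, rolling back to the best score on any regression."""
    events = [RunEvent("start", 0, baseline)]
    best = baseline
    for i, score in enumerate(scores, start=1):
        events.append(RunEvent("iteration", i, score))
        if score < best:
            # Regression: discard this checkpoint and restore the best one.
            events.append(RunEvent("rollback", i, best))
        else:
            best = score
    events.append(RunEvent("stop", len(scores), best))
    return events


demo_events = replay(62, [74, 81, 79, 85])
```

Here the third loop's 79% triggers a rollback to the 81% checkpoint, and the fourth loop then continues from that restored state to reach 85%.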
- Highlight M2 and M3 as complete
- Document autonomous agent capabilities (LLM narration, multi-iteration, rollback)
- Update API surface with new agent endpoints
- Add key capabilities section
- Expand demo features documentation
- Update repository map with new agent.py and llm_narrator.py
- Update roadmap with status indicators


2 participants