Date: April 27, 2026
Build duration: ~3 hours
Commits: 4 (fixes, README, docs, checklist)
Forge runs on itself. Dogfooding caught the `datetime.utcnow()` deprecation warning immediately when we ran `forge auto src/atomadic_forge ./test`. The fix was trivial but critical: users would have seen the warning on their first run.
Lesson: Always test critical paths on your own codebase first.
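For reference, the fix is the standard replacement for the deprecated call (a sketch of the pattern; the actual Forge patch may differ in context):

```python
from datetime import datetime, timezone

# Deprecated since Python 3.12: returns a naive datetime and emits a warning.
# stamp = datetime.utcnow()

# Replacement: an explicit, timezone-aware UTC datetime.
stamp = datetime.now(timezone.utc)

print(stamp.tzinfo)  # UTC
```

The replacement is also a behavioral improvement: the result carries its timezone, so downstream comparisons and serialization can't silently mix naive and aware datetimes.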
With 53 source files and 6,510 LOC, organizing by tier (a0→a4) made the codebase immediately navigable:
- Constant changes? → a0_qk_constants/
- New utility function? → a1_at_functions/
- State management? → a2_mo_composites/
- Feature orchestration? → a3_og_features/
- CLI? → a4_sy_orchestration/
No guessing. No sprawl. No circular imports.
Lesson: The architecture Forge enforces should also be the architecture of Forge itself.
We documented limits explicitly:
- "Python only (for now)" instead of vague "multi-language support"
- "Tier classification is heuristic" instead of claiming perfect accuracy
- "Conformance certificates not yet signed" instead of hiding the roadmap
This builds trust. Users know what they're getting.
Lesson: Honesty about limitations beats promising the world.
90 passing tests meant we could refactor the datetime calls with zero risk. The test suite is a safety net.
Lesson: Tests aren't optional when you're shipping architecture tools.
We couldn't create the repo due to token permissions. This forced a manual step. In a real launch, we'd need:
- A dedicated launch account with repo creation rights, OR
- Automated CI/CD for GitHub setup
For next time: Pre-stage GitHub access before launch.
The tutorial (`03-tutorial.md`) is the first place readers see a real workflow. We should have an even simpler "hello world" before that.
For next time: Add a 2-minute "hello world" example in `01-getting-started.md`.
The LLM loops guide shows `export GEMINI_API_KEY=your-key-here`, but new users might not know where to get one. We could add inline links.
For next time: Inline links to each provider's key setup page.
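A cheap companion to those links is failing fast with a pointer when the key is missing. This is a hypothetical helper, not something currently in Forge:

```python
import os
import sys

def require_api_key(name: str, signup_hint: str) -> str:
    """Exit with a helpful message if an expected API key is not set."""
    value = os.environ.get(name)
    if not value:
        sys.exit(f"{name} is not set. Get a key at {signup_hint}, then run:\n"
                 f"  export {name}=your-key-here")
    return value

# Usage (hypothetical hint text; the docs would link the real provider page):
# key = require_api_key("GEMINI_API_KEY", "your provider's key console")
```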
- Test coverage: 90 tests, all passing ✓
- Deprecation warnings: 0 (fixed during launch) ✓
- Import violations: 0 in Forge itself ✓
- Lines of code: 6,510 (reasonable for feature set)
- Total words: ~8,000 across all guides ✓
- Examples per guide: 3-5 each ✓
- Estimated reading time: 30 minutes (getting started + commands) ✓
- Time to first absorption: ~10 minutes ✓
- Time to understand errors: ~15 minutes (FAQ covers 20+ scenarios) ✓
- Time to advanced features: ~45 minutes (LLM loops guide) ✓
- Cryptographic signing — Already specced, just needs implementation. High value for enterprise.
- TypeScript support — Beta in 0.2, production in 0.3. Significant effort but multiplies audience.
- Tier customization — Let users override classifications. Medium effort, high value.
- IDE plugins — VS Code + JetBrains. Brings Forge into the user's workflow.
- Rust support — Systems programming audience is underserved by architecture tools.
- Web UI — Visual absorption + tier diagram editor. Makes Forge accessible to non-CLI users.
- Organization management — Multiple projects, shared catalogs, audit logs.
The monadic structure + tests kept us clean. We didn't accumulate shortcuts or hacks.
- `test_runner.py` should be its own integration test (currently combined with stagnation tests)
- Emergence composition discovery could be faster with cached AST walks (currently O(N²))
- `--on-conflict` strategy should be pluggable (currently hardcoded)
None of these block 0.1.0. They're refinements for 0.2.0+.
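The cached-AST refinement above could be as simple as memoizing parses so each file is read and parsed once per run. A sketch (Forge's actual discovery pass may cache differently):

```python
import ast
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def parsed(path: str) -> ast.Module:
    """Parse a file once; reuse the tree across all pairwise comparisons."""
    return ast.parse(Path(path).read_text(), filename=path)

def class_names(path: str) -> set[str]:
    """Collect class names from the cached tree instead of re-parsing."""
    return {n.name for n in ast.walk(parsed(path)) if isinstance(n, ast.ClassDef)}
```

With parses cached, an O(N²) pairwise comparison still performs N² set intersections, but only N parses, which is where most of the time goes.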
- No meetings, no coordination needed
- Fast iteration (commit → test → fix → commit)
- Full ownership of decisions
- Clear audit trail (every commit has clear message + context)
When this scales:
- Keep the monadic structure (prevents merge conflicts)
- Separate verbs into separate files (easier parallel work)
- Use feature branches per verb
- Strong commit message discipline (what we're doing here)
When Forge ships, priority questions:
- Tier classification accuracy: How often does scout get the tier right? (Hypothesis: 85% for clean code, 65% for legacy code)
- Time to fix violations: How long does it take to fix wire violations? (Hypothesis: 15 mins per violation for average code)
- LLM code quality: How much better is generated code after Forge's architecture feedback? (Hypothesis: 3x better on wire score)
- Feature priority: Which special commands (emergent, synergy, commandsmith) do users use most? (Hypothesis: commandsmith >> emergent > synergy)
- Semantic merge: Can we auto-unify two `User` classes if they have similar attributes? (Complexity: high, value: medium)
- Multi-language tiers: Can a0–a4 be language-agnostic so TypeScript/Rust use the same layer names? (Complexity: medium, value: high)
- Tier migrations: What if a symbol needs to move from a1→a2 mid-project? Can we auto-update imports? (Complexity: medium, value: high)
- Conformance plugins: Can users define custom scoring rules beyond documentation/tests/layout/imports? (Complexity: low, value: medium)
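For the tier-migration question, a first pass could be a plain-text mapping over import statements. This is a rough sketch with made-up module names; real code would use an AST or libcst to avoid touching strings and comments:

```python
import re

def retier_imports(source: str, old_pkg: str, new_pkg: str) -> str:
    """Rewrite imports of a module that moved tiers, e.g. a1_... -> a2_...

    Only touches `import X` / `from X import ...` lines, not arbitrary text.
    """
    pattern = re.compile(
        rf"^(\s*(?:from|import)\s+){re.escape(old_pkg)}\b",
        flags=re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new_pkg, source)

src = "from a1_at_functions.paths import norm\nimport a1_at_functions.paths\n"
print(retier_imports(src, "a1_at_functions.paths", "a2_mo_composites.paths"))
```

The anchoring at line start plus the `\b` boundary keeps the rewrite from matching substrings of longer module names, which is the usual failure mode of naive search-and-replace migrations.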
A tool for making AI-generated code architected. Not a code generator, not a linter, not a formatter.
It solves a real problem: AI produces 30–50% of new code, and that code is architecturally incoherent. Forge fixes that.
- Core pipeline is complete (scout → cherry → absorb → wire → certify)
- LLM loops are integrated (iterate, evolve)
- 90 tests validate the system
- Documentation is comprehensive (1.7K lines)
- README converts users (clear problem, honest limits)
- Code eats its own dogfood (Forge is monadic, Forge passes wire)
- Names limits explicitly (Python only, heuristic classification, no semantic merge)
- Ships what's ready, not what's polished (0.1.0, not 1.0)
- Routes users to next steps (STATUS.md tells them what's still needed)
- Leaves audit trail (lineage.jsonl records every artifact)
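The lineage trail is append-only JSON lines, so a minimal writer/reader is only a few lines. Field names here are illustrative, not Forge's actual schema:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def record_lineage(path: Path, artifact: str, action: str) -> None:
    """Append one audit record per artifact event; never rewrite old lines."""
    entry = {
        "artifact": artifact,
        "action": action,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def read_lineage(path: Path) -> list[dict]:
    """Replay the full history in order."""
    return [json.loads(line) for line in path.read_text().splitlines() if line]
```

Append-only JSONL is a deliberate fit for an audit trail: writes are atomic enough per line, history is never mutated, and the file stays greppable without tooling.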
In 6 months:
- 1K+ GitHub stars
- Users absorbing their own codebases
- Feedback on tier accuracy (iterate on classification)
- TypeScript beta demand
- LLM loop adoption (iterate/evolve generating production code)
Status: 🟢 READY
Next step: GitHub push (once account permissions resolved), then PyPI registry, then storefront.
Built with Atomadic UEP v20 methodology. All decisions logged in git history.