"Explore once. Exploit next."
An agentic harness where agents learn from trial-and-error execution and codify learnings into reusable skills.
Most agent memory stores facts. Most agent skills are static. Tsugi combines the best of both worlds.
- Run 1: Research + Task Execution + Skill Creation
- Run 2: Skill Lookup + Skip Research (faster, fewer tokens)
Tsugi captures procedural knowledge from successful task executions and saves it as reusable skills. When you ask an agent to integrate with a new API, it researches, experiments, handles errors, and eventually succeeds. That hard-won knowledge gets codified into a skill that makes the next execution instant.
Skills encode two types of knowledge:
- Procedural: Integration gotchas, validation rules, error patterns, multi-step workflows
- Preferences: Your taxonomies, classification rules, domain constraints
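As a rough illustration, a codified skill might bundle both kinds of knowledge into a single record. This is a hypothetical shape, not Tsugi's actual schema (which lives in `src/lib/skills/`):

```typescript
// Hypothetical shape of a codified skill. Field names are
// illustrative; the real storage format may differ.
interface Skill {
  name: string;
  description: string;  // what task this skill solves
  parameters: string[]; // values the caller supplies each run
  procedure: string[];  // ordered steps extracted from the transcript
  gotchas: string[];    // integration quirks learned the hard way
}

const stripeCharge: Skill = {
  name: "stripe-charge",
  description: "Create and confirm a PaymentIntent for a given amount",
  parameters: ["amount_usd"],
  procedure: [
    "POST /v1/payment_intents with the amount in integer cents",
    "Confirm the intent with a test payment method",
    "Verify that status is 'succeeded'",
  ],
  gotchas: ["Stripe amounts are integer cents, not dollars"],
};
```

The `procedure` array carries the procedural knowledge; preference-style rules (taxonomies, constraints) would slot into the same record as extra fields.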
- Dual-Agent System - Task Agent executes work, Skill Agent codifies learnings
- Persistent Skill Library - Skills accumulate and compound over time
- Sandbox Execution - Isolated environments with environment variable injection
- Real-time Streaming - Watch agent reasoning and tool calls as they happen
- Extended Thinking - Full visibility into agent reasoning traces
- Native Grounding - Google Search and URL analysis built-in
- Conversation History - Persistent chat history with pinned comparisons
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ User Task │ ──▶ │ Task Agent │ ──▶ │ Skill Agent │
│ │ │ │ │ │
│ "Charge $50 │ │ 1. Search skills│ │ 1. Analyze │
│ on Stripe" │ │ 2. Research API │ │ transcript │
│ │ │ 3. Execute │ │ 2. Extract │
│ │ │ 4. Verify │ │ procedure │
│ │ │ 5. Suggest │ │ 3. Save skill │
│ │ │ codification │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
First execution - Agent explores, makes mistakes, self-corrects, and succeeds.
"Codify Skill" - Skill Agent extracts the working procedure with parameters.
Next execution - Agent finds the skill, skips research, executes directly.
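The three runs above boil down to a cache-or-explore decision. A minimal sketch of that loop, with an in-memory library standing in for the persistent skill store (the function and type names here are illustrative, not Tsugi's actual API):

```typescript
// Explore-once / exploit-next, sketched with an in-memory skill library.
type Procedure = () => string;

const skillLibrary = new Map<string, Procedure>();

function runTask(task: string): { result: string; explored: boolean } {
  const cached = skillLibrary.get(task);
  if (cached) {
    // Run 2: skill hit -- skip research, execute directly.
    return { result: cached(), explored: false };
  }
  // Run 1: no skill yet -- research + trial-and-error would happen here.
  const procedure: Procedure = () => `done: ${task}`;
  const result = procedure();
  skillLibrary.set(task, procedure); // "Codify Skill"
  return { result, explored: true };
}

const first = runTask("charge $50 on Stripe");  // explores, then codifies
const second = runTask("charge $50 on Stripe"); // skill hit, no research
```

In Tsugi the lookup is semantic (the Task Agent searches the skill library by intent, not by exact string), and codification is a separate Skill Agent pass over the transcript rather than an inline cache write.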
- Framework: Next.js 16 with React 19 (App Router)
- LLM: Gemini 3 via Vercel AI SDK
- Database: SQLite/Turso for conversations, Vercel Blob for skills (prod)
- Sandbox: Local child_process (dev) or Vercel Sandbox microVM (prod)
- Observability: Braintrust for traces and token counting
- Styling: Tailwind CSS v4, Framer Motion, Lucide icons
- Node.js 18+
- pnpm
# Install dependencies
pnpm install
# Configure environment
cp .env.example .env
Add your API key to .env:
GOOGLE_GENERATIVE_AI_API_KEY=your_key_here
Get a key from Google AI Studio.
pnpm dev
Open http://localhost:3000.
src/
├── app/ # Next.js App Router
│ ├── api/agent/ # SSE streaming endpoint
│ ├── api/conversations/ # Chat history CRUD
│ ├── api/skills/ # Skills API
│ └── task/ # Task execution page
├── components/ # React components
│ └── landing/ # Landing page components
├── hooks/ # useForgeChat, useConversations, useSkills
└── lib/
├── agent/ # Task Agent, Skill Agent, tools
├── db/ # SQLite/Turso database
├── sandbox/ # Sandbox executors
├── skills/ # Skill storage (local/cloud)
└── tools/ # Command execution
playground/ # Demo tasks
MEMORY/ # Plans and changelogs
pnpm dev # Start dev server
pnpm build # Production build
pnpm test # Run tests (watch mode)
pnpm test:run # Run tests once
pnpm lint         # Lint code
Try these to see the system in action:
- "Send a hello world message to Discord" (requires webhook URL)
- "Charge $50 via Stripe API and then refund half" (requires Stripe key)
- "Summarize this YouTube video and save to Notion"
After successful execution, click "Codify Skill" to save the procedure. Run the same task again to see the speedup.
MIT
