
RLLM: Recursive Large Language Models (TypeScript)

A TypeScript implementation of Recursive Language Models for processing large contexts with LLMs.

Inspired by Cloudflare's Code Mode approach.

Key differences from the Python version: code executes in an in-process V8 isolate with direct function-call bindings, rather than a subprocess communicating over TCP sockets (see "Why V8 Isolates?" below).

Installation

pnpm add rllm
# or
npm install rllm

Demo

RLLM analyzing a node_modules directory — the LLM writes JavaScript to parse dependencies, query sub-LLMs in parallel, and synthesize a final answer:

[Demo video: RLM.final.mp4]

Built with Gemini Flash 3. See the full interactive example in examples/node-modules-viz/.

Quick Start

The LLM writes JavaScript code that runs in a secure V8 isolate:

import { createRLLM } from 'rllm';

const rlm = createRLLM({
  model: 'gpt-4o-mini',
  verbose: true,
});

// Full RLM completion - prompt first, context in options
const result = await rlm.completion(
  "What are the key findings in this research?",
  { context: hugeDocument }
);

console.log(result.answer);
console.log(`Iterations: ${result.iterations}, Sub-LLM calls: ${result.usage.subCalls}`);

Structured Context with Zod Schema

For structured data, you can provide a Zod schema. The LLM will receive type information, enabling it to write better code:

import { z } from 'zod';
import { createRLLM } from 'rllm';

// Define schema for your data
const DataSchema = z.object({
  users: z.array(z.object({
    id: z.string(),
    name: z.string(),
    role: z.enum(['admin', 'user', 'guest']),
    activity: z.array(z.object({
      date: z.string(),
      action: z.string(),
    })),
  })),
  settings: z.record(z.string(), z.boolean()),
});

const rlm = createRLLM({ model: 'gpt-4o-mini' });

const result = await rlm.completion(
  "How many admin users are there? What actions did they perform?",
  {
    context: myData,
    contextSchema: DataSchema,  // LLM sees the type structure!
  }
);

The LLM will know it can access context.users, context.settings, etc. with full type awareness.

For a large plain-string context, the LLM will instead write code like:

// LLM-generated code runs in V8 isolate
const chunks = [];
for (let i = 0; i < context.length; i += 50000) {
  chunks.push(context.slice(i, i + 50000));
}

const findings = await llm_query_batched(
  chunks.map(c => `Extract key findings from:\n${c}`)
);

const summary = await llm_query(`Combine findings:\n${findings.join('\n')}`);
print(summary);
giveFinalAnswer({ message: summary });

API Reference

createRLLM(options)

Create an RLLM instance with sensible defaults.

const rlm = createRLLM({
  model: 'gpt-4o-mini',      // Model name
  provider: 'openai',         // 'openai' | 'anthropic' | 'gemini' | 'openrouter' | 'custom'
  apiKey: process.env.KEY,    // Optional, uses env vars by default
  baseUrl: undefined,         // Optional, required for 'custom' provider
  verbose: true,              // Enable logging
});

Custom Provider (OpenAI-Compatible APIs)

Use the custom provider to connect to any OpenAI-compatible API (e.g., vLLM, Ollama, LM Studio, Azure OpenAI):

const rlm = createRLLM({
  provider: 'custom',
  model: 'llama-3.1-8b',
  baseUrl: 'http://localhost:8000/v1',  // Required for custom provider
  apiKey: 'your-api-key',               // Optional, depends on your API
  verbose: true,
});

Note: When using provider: 'custom', the baseUrl parameter is required. An error will be thrown if it's not provided.

RLLM Methods

Method                           Description
-------------------------------  ---------------------------------------
rlm.completion(prompt, options)  Full RLM completion with code execution
rlm.chat(messages)               Direct LLM chat
rlm.getClient()                  Get the underlying LLM client
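
For direct chat without the recursive code-execution loop, use rlm.chat. A minimal sketch, assuming OpenAI-style role/content messages:

// Direct chat call; the message shape is assumed to be OpenAI-style
const reply = await rlm.chat([
  { role: 'system', content: 'You are a concise assistant.' },
  { role: 'user', content: 'Summarize recursive language models in one sentence.' },
]);
console.log(reply);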

CompletionOptions

Option         Type             Description
-------------  ---------------  ----------------------------------------------------
context        string | T       The context data available to LLM-generated code
contextSchema  ZodType<T>       Optional Zod schema describing the context structure
onEvent        (event) => void  Optional callback for execution events (see Real-time Events)

Sandbox Bindings

The V8 isolate provides these bindings to LLM-generated code:

Binding                              Description
-----------------------------------  ------------------------
context                              The loaded context data
llm_query(prompt, model?)            Query a sub-LLM
llm_query_batched(prompts, model?)   Batch-query sub-LLMs
giveFinalAnswer({ message, data? })  Return the final answer
print(...)                           Console output
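
Both the optional model argument and the data field can be used from generated code. A minimal sketch (the model name shown is illustrative):

// Route a sub-query to a specific model (model name is illustrative)
const title = await llm_query('Suggest a title for:\n' + context.slice(0, 2000), 'gpt-4o-mini');

// Return structured data alongside the human-readable message
giveFinalAnswer({ message: title, data: { contextLength: context.length } });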

Real-time Events

Subscribe to execution events for visualizations, debugging, or streaming UIs:

const result = await rlm.completion("Analyze this data", {
  context: myData,
  onEvent: (event) => {
    switch (event.type) {
      case "iteration_start":
        console.log(`Starting iteration ${event.iteration}`);
        break;
      case "llm_query_start":
        console.log("LLM thinking...");
        break;
      case "code_execution_start":
        console.log(`Executing:\n${event.code}`);
        break;
      case "final_answer":
        console.log(`Answer: ${event.answer}`);
        break;
    }
  }
});

Event Type            Description
--------------------  ----------------------------------------
iteration_start       New iteration beginning
llm_query_start       Main LLM query starting
llm_query_end         Main LLM response received
code_execution_start  V8 isolate executing code
code_execution_end    Code execution finished
final_answer          giveFinalAnswer() called with the answer
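
Taken together, the payloads used above suggest an event union along these lines (a sketch inferred from the fields in the example, not the library's exact type definitions):

// Sketch of the event union, inferred from the fields used in the example
type RLMEvent =
  | { type: 'iteration_start'; iteration: number }
  | { type: 'llm_query_start' }
  | { type: 'llm_query_end' }
  | { type: 'code_execution_start'; code: string }
  | { type: 'code_execution_end' }
  | { type: 'final_answer'; answer: string };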

Architecture

┌─────────────────────────────────────────────────────────────┐
│  RLLM TypeScript                                            │
│                                                             │
│  ┌─────────────┐    ┌──────────────────────────────────┐   │
│  │   RLLM      │    │  V8 Isolate (Sandbox)            │   │
│  │   Class     │───▶│                                  │   │
│  └─────────────┘    │  • context (injected data)       │   │
│        │            │  • llm_query() ──┐               │   │
│        │            │  • llm_query_batched()           │   │
│        ▼            │  • print() / console             │   │
│  ┌─────────────┐    │  • giveFinalAnswer()             │   │
│  │  LLMClient  │◀───┼──────────────────┘               │   │
│  │  (OpenAI)   │    │                                  │   │
│  └─────────────┘    │  LLM-generated JS code runs here │   │
│                     └──────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

No TCP. No subprocess. Direct function calls via bindings.

Why V8 Isolates? (Not TCP/Containers)

The Python RLLM uses a subprocess plus TCP sockets for code execution. We use V8 isolates instead:

Python RLLM:  LLM → Python exec() → subprocess → TCP socket → LMHandler
TypeScript:   LLM → V8 isolate (same process) → direct function calls

Benefits:

  • No TCP/network - Direct function calls via bindings
  • Fast startup - Isolates spin up in milliseconds
  • Secure - V8's built-in memory isolation
  • Simple - No containers, no socket servers
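
To make the binding idea concrete, here is a minimal sketch of exposing a host function inside a V8 isolate, assuming the isolated-vm npm package (the repo's actual sandbox wiring may differ):

import ivm from 'isolated-vm';

// Create an isolate with a memory cap and a fresh context
const isolate = new ivm.Isolate({ memoryLimit: 128 });
const context = await isolate.createContext();

// Expose a host function directly into the sandbox: no TCP, no subprocess
await context.global.set('print', new ivm.Callback((msg: string) => console.log(msg)));

// Run (LLM-generated) code against that binding
const script = await isolate.compileScript(`print('hello from the isolate')`);
await script.run(context);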

Development

# Install dependencies
pnpm install

# Build
pnpm build

# Run example
pnpm example

# Run tests
pnpm test

License

MIT - Same as the original Python RLLM.

Credits

Based on the Recursive Language Models paper and Python implementation by Alex Zhang et al.

Reference: RLM Blogpost
