Skip to content

SuperInstance/pincher

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

69 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
PincherOS Logo

PincherOS

A Shell-Portable Agent State Machine

Teach an agent once. Run it on any device. Migrate without losing its mind.

CI License: MIT Rust


Current Status

What ships today: teach reflexes, run them fast (~50ms), migrate agent state between machines (pack/unpack). Veto engine for safety. Hash-based embeddings (ONNX MiniLM optional). PID resource controller for graceful degradation.

What's planned / in progress: WASM carapace sandboxing (WIP), Landlock filesystem rules (feature-gated, not default), full immunology system (WIP), multi-agent coordination (planned), Python sidecar for LLM compilation (WIP).


What Is PincherOS?

PincherOS is a post-model operating system for AI agents. It treats the LLM as a compiler, not a runtime. You teach your agent skills (called reflexes), and those reflexes execute directly at ~50ms with zero API cost. The LLM only fires when the agent encounters something genuinely new.

Think of it this way: Docker lets you build an app once and run it on any server. PincherOS lets you teach an agent once and run it on any device β€” from a Raspberry Pi to a workstation β€” without rewriting, without API bills, and without losing its memory.

What ships today: teach reflexes, run them fast, migrate state between machines. What's planned: WASM sandboxing, immunology, multi-agent.

The system is built around a hermit crab metaphor that maps directly to how it works:

Hermit Crab PincherOS What It Means for You
Shell β€” the snail house it lives in Shell β€” the hardware your agent runs on Your Pi, your laptop, your cloud VM β€” all are just shells
Crab β€” the creature itself, with personality Rigging β€” your agent's reflexes, identity, and learned behavior The agent's mind is portable. The shell is not.
Claws β€” how the crab manipulates the world Claws β€” the sandbox and capability system Every reflex runs inside a security boundary
Shell Swap β€” finding a bigger shell Migration β€” pincher pack β†’ copy β†’ pincher unpack Move your agent between devices in seconds

The crab is not the shell. The crab migrates. The crab learns. The shell is just where the crab lives right now.


How It Works: The Reflex Engine

The core insight is that most AI agent work is repetitive. You ask an agent to "list files" or "check memory" dozens of times. Each time, the LLM reasons from scratch, costs money, and takes seconds. PincherOS short-circuits that:

flowchart TD
    A["User: 'list my docker containers'"] --> B{Embed Intent<br/>384-dim vector}
    B --> C{Match Against<br/>Known Reflexes}
    C --> D{Similarity Score?}

    D -->|"> 0.80<br/>EXACT MATCH"| E["Execute Reflex Directly<br/>~50ms | $0"]
    D -->|"0.55 - 0.80<br/>PROBABLE MATCH"| F["Confirm + Execute<br/>~3s | ~$0.001"]
    D -->|"< 0.55<br/>NOVEL INTENT"| G["Route to LLM<br/>Compile β†’ New Reflex"]

    G --> H["LLM Generates<br/>Action Template"]
    H --> I["Store as New Reflex<br/>Next Time: Instant"]

    E --> J["Log Result +<br/>Update Confidence"]
    F --> J
    I --> J

    style E fill:#2d6a4f,stroke:#52b788,color:#fff
    style F fill:#e9c46a,stroke:#f4a261,color:#000
    style G fill:#e76f51,stroke:#d62828,color:#fff
    style I fill:#2d6a4f,stroke:#52b788,color:#fff
Loading

The system gets cheaper and faster the more you use it. Each successful execution increases confidence. Higher confidence means more direct hits. Direct hits skip the LLM entirely. Over time, your agent develops "muscle memory" β€” it reflexively handles what it knows and only reasons about what it doesn't.


The Self-Improving Loop

PincherOS isn't just caching LLM calls. It's building an increasingly competent agent through a closed feedback loop. Each execution produces a confidence update. Each confidence update changes how the next intent is routed. And the loop feeds itself:

flowchart LR
    subgraph Teach["TEACH Phase"]
        T1["Intent + Action"] --> T2["Embed Intent"]
        T2 --> T3["Store Reflex<br/>confidence=0.50"]
    end

    subgraph Do["DO Phase"]
        D1["New Intent"] --> D2["Match Reflexes"]
        D2 --> D3{Confidence?}
        D3 -->|"High"| D4["Direct Execute"]
        D3 -->|"Low"| D5["LLM Compile"]
    end

    subgraph Learn["LEARN Phase"]
        L1["Execution Result"] --> L2{Success?}
        L2 -->|"Yes"| L3["Confidence UP<br/>+0.05"]
        L2 -->|"No"| L4["Confidence DOWN<br/>-0.10"]
        L3 --> L5["Stronger Reflex"]
        L4 --> L6["Weaker Reflex<br/>May Re-compile"]
    end

    T3 --> D2
    D4 --> L1
    D5 --> L1
    L5 --> D2
    L6 --> D2
Loading

Notice the feedback path: a reflex that keeps succeeding gets stronger and faster. A reflex that fails gets weaker and eventually gets re-compiled by the LLM. The system self-decomposes bad habits and self-reinforces good ones β€” without you doing anything.


Shell Portability: The .nail Migration

This is what makes PincherOS different from every other agent framework. Your agent's entire learned state β€” its reflexes, confidence scores, identity, and configuration β€” can be packed into a single .nail file and moved to a completely different machine.

flowchart LR
    subgraph Shell_A["Shell A: Raspberry Pi"]
        A1["Reflexes: 47"]
        A2["Identity: 'dev-assistant'"]
        A3["Config: Pi-optimized"]
    end

    A1 & A2 & A3 --> P["pincher pack<br/>β†’ agent.nail<br/>(tar.zst + BLAKE3)"]

    P --> |"USB / SSH / Cloud"| U

    subgraph Shell_B["Shell B: Workstation"]
        B1["Hardware: Different"]
        B2["Reflexes: 47<br/>(same crab)"]
        B3["Config: Auto-adjusted"]
    end

    U["pincher unpack<br/>agent.nail"] --> B1 & B2 & B3

    style P fill:#7b2cbf,stroke:#9d4edd,color:#fff
    style U fill:#7b2cbf,stroke:#9d4edd,color:#fff
Loading

The QTR (Quiesce-Transfer-Resume) protocol ensures zero state loss during migration:

  1. Quiesce β€” The agent finishes any in-flight execution and flushes all writes to SQLite
  2. Transfer β€” The .nail file is created with BLAKE3 checksums for every component
  3. Resume β€” On the new shell, the agent unpacks, re-fingerprintes, and adapts any shell-specific reflexes (e.g., if apt isn't available, mark that reflex for re-compilation)

The .nail file contains:

  • manifest.json β€” version, source fingerprint, timestamp, checksums
  • reflexes.db β€” the full SQLite database with all reflexes and embeddings
  • identity.json β€” agent name, preferences, accumulated context
  • config.toml β€” user configuration and overrides

Resource Control: The PID Controller

Real hardware has real limits. PincherOS uses a continuous PID (Proportional-Integral-Derivative) controller to maintain homeostasis β€” just like a thermostat, but for RAM and CPU:

flowchart TD
    S["System Sensors<br/>(RAM, CPU, Disk)"] --> PID["PID Controller<br/>Kp=0.6 Ki=0.1 Kd=0.3"]
    PID --> B{Resource State?}

    B -->|"RAM < 70%<br/>CPU < 60%"| N["NORMAL<br/>Full LLM access<br/>Full context window"]
    B -->|"RAM 70-85%<br/>CPU 60-80%"| L["LIGHT<br/>Reduced context<br/>Skip LLM for confidence > 0.85"]
    B -->|"RAM > 85%<br/>CPU > 80%"| C["CRITICAL<br/>Reflex-only mode<br/>No LLM calls at all"]

    N -->|"Pressure rises"| L
    L -->|"Pressure rises"| C
    C -->|"Pressure drops"| L
    L -->|"Pressure drops"| N

    style N fill:#2d6a4f,stroke:#52b788,color:#fff
    style L fill:#e9c46a,stroke:#f4a261,color:#000
    style C fill:#e76f51,stroke:#d62828,color:#fff
Loading

This means PincherOS runs on a Raspberry Pi. When resources are tight, the agent doesn't crash β€” it degrades gracefully, falling back to reflex-only mode. The LLM sidecar unloads after 5 minutes of idle time. The whole system targets ~1GB total on a Pi 4.


Security: Claws and Capabilities

Every reflex execution passes through two security layers before any code runs:

flowchart TD
    I["Intent to Execute"] --> V["1. VETO ENGINE<br/>(Deterministic Rules)"]

    V --> V1{"Forbidden command?"}
    V1 -->|"rm -rf /, /etc writes"| DENY["DENY<br/>Hard block"]
    V1 -->|"OK"| V2{"Requires capability?"}

    V2 -->|"Network, GPU, etc."| V3{"Capability<br/>in manifest?"}
    V3 -->|"No"| DENY
    V3 -->|"Yes"| V4{"Size/resource<br/>limits OK?"}
    V4 -->|"No"| DENY
    V4 -->|"Yes"| S

    S["2. SANDBOX<br/>(bwrap + landlock)"] --> E["Execute in<br/>Isolated Namespace"]

    E --> R["Result"]

    style DENY fill:#e76f51,stroke:#d62828,color:#fff
    style S fill:#2d6a4f,stroke:#52b788,color:#fff
    style E fill:#2d6a4f,stroke:#52b788,color:#fff
Loading

Veto Engine (deterministic, MVP) β€” A rule-based safety check that blocks dangerous patterns before they ever reach execution. No ML, no heuristics, no edge cases. Pure rules:

  • No rm -rf / or equivalent
  • No writing to /etc, /boot, /sys
  • No network access without explicit network capability
  • No subprocess spawning without subprocess capability
  • File size limits enforced per-reflex

Sandbox (bwrap + landlock) (WIP) β€” After veto passes, the reflex runs inside a restricted namespace. Landlock is feature-gated and not enabled by default. The capability manifest declares exactly what the reflex needs, and the sandbox provides exactly that β€” nothing more.


Quick Start

# 1. Clone and build
git clone https://github.com/SuperInstance/pincherOS.git
cd pincherOS
cargo build --release

# 2. Check your shell (your hardware fingerprint)
./target/release/pincher status

This prints your shell info β€” hostname, RAM, CPU cores, and an ASCII hermit crab:

   πŸ¦€ PincherOS v0.1.0
  β•±β•±β•±β•±β•±β•±β•±β•±β•±β•±β•±β•±β•±
 β•±  Shell: my-pi       β•²
β•±   Reflexes: 0          β•²
β•²   State: Normal        β•±
 β•²  RAM: 34.2%         β•±
  ╰────────────────────╯
# 3. Teach your first reflex
./target/release/pincher teach --intent "list docker containers" --action "docker ps"
# β†’ βœ“ Reflex stored! (2.1ms)

# 4. Execute it β€” notice it matches instantly, no LLM needed
./target/release/pincher do "show me my containers"
# β†’ βœ“ Matched reflex: "list docker containers" (confidence 0.92, 48ms)

# 5. Pack and migrate to another machine
./target/release/pincher pack my-agent.nail
scp my-agent.nail workstation:~/

# On the workstation:
./target/release/pincher unpack my-agent.nail
# β†’ βœ“ Same crab. Bigger shell.

Practical Examples: What Can You Build?

1. Smart Home Controller on a Pi

Teach your agent to control lights, read sensors, and manage your smart home β€” all running locally on a Raspberry Pi with zero cloud dependency. The reflex engine handles routine commands ("turn off the kitchen lights") in 50ms, and only routes to the LLM for novel requests.

πŸ‘‰ Walk through the example β†’

2. Code Review Assistant

Teach PincherOS your codebase's review patterns: "check for SQL injection," "verify error handling," "enforce naming conventions." As it reviews more PRs, it learns your team's specific standards and short-circuits the common checks.

πŸ‘‰ Walk through the example β†’

3. Deploy Agent That Migrates

Train an agent on your workstation with full LLM access, then pack it into a .nail file and deploy it to a cloud VM or edge device. It carries all its learned reflexes. On the target, it runs in reflex-only mode if resources are tight.

πŸ‘‰ Walk through the example β†’

4. Multi-Device Dev Assistant

You work across three machines β€” a workstation, a laptop, and a Pi. Each has different tools and configs. Your agent lives on all three, adapting its reflexes to each shell. Teach it once on any device, pack, and unpack on the others.

πŸ‘‰ Walk through the example β†’

5. Hello Reflex (5-Minute Tutorial)

The simplest possible example: teach one reflex, execute it, watch the confidence climb. Perfect for understanding the core loop before building something real.

πŸ‘‰ Walk through the example β†’


Architecture: Two Processes, One Mind

PincherOS uses a two-process architecture that keeps the agent's state safe and the LLM sidecar disposable:

flowchart TB
    subgraph Core["pincher-core (Rust) β€” Owns ALL State"]
        direction TB
        DB["SQLite + sqlite-vec<br/>Reflexes Β· Sessions Β· Action Log"]
        EM["Embedder<br/>SHA-256 trigram β†’ 384-dim (fallback)<br/>ONNX MiniLM β†’ 384-dim (optional)"]
        RE["Reflex Engine<br/>teach Β· match Β· do Β· confidence"]
        RC["PID Resource Controller<br/>Normal Β· Light Β· Critical"]
        SE["Security Layer<br/>Veto Engine Β· Sandbox Β· Capabilities"]
        MG["Migration Layer<br/>pack Β· unpack Β· fingerprint"]
    end

    subgraph CLI["pincher-cli (Rust)"]
        CL["pincher status Β· teach Β· do Β·<br/>match Β· pack Β· unpack Β· bench Β·<br/>reflexes Β· shell-info Β· rpc"]
    end

    subgraph Infer["pincher-infer (Python) β€” Stateless (WIP)"]
        direction TB
        PY["JSON-RPC Server"]
        DI["Distiller<br/>LLM-as-Compiler"]
        EMB["Embedder Fallback<br/>sentence-transformers"]
    end

    CLI -->|"subcommand"| Core
    Infer -->|"UDS JSON-RPC"| Core
    Infer -.->|"lazy load,<br/>unload after 5min"| Infer

    style Core fill:#1a1a2e,stroke:#e94560,color:#fff
    style CLI fill:#16213e,stroke:#0f3460,color:#fff
    style Infer fill:#1a1a2e,stroke:#533483,color:#fff
Loading

Why two processes?

  • pincher-core (Rust) owns every byte of state. SQLite for durability, embeddings for matching, PID for resource control. If the LLM crashes, the agent's mind is untouched.
  • pincher-infer (Python) is completely stateless. (WIP β€” not yet wired to core) The design calls for loading when the LLM is needed, distilling an intent into a reflex template, and unloading after 5 minutes of idle. Kill it, restart it β€” the agent doesn't care.

This separation means your agent's personality and memories survive:

  • LLM API outages (reflexes still fire)
  • Python crashes (Rust core is unaffected)
  • Network disconnection (all state is local SQLite)
  • Device migration (pack β†’ unpack on a new shell)

The .nail Format

The .nail file is how a hermit crab carries its rigging to a new shell. It's a tar.zst archive with BLAKE3 checksums:

agent.nail (tar.zst)
β”œβ”€β”€ manifest.json       # Version, source fingerprint, timestamp, reflex count, checksums
β”œβ”€β”€ reflexes.db         # Full SQLite database (reflexes, embeddings, action log)
β”œβ”€β”€ identity.json       # Agent name, preferences, accumulated context hints
└── config.toml         # User configuration, resource thresholds, capability defaults

When you unpack on a new shell, PincherOS:

  1. Verifies all BLAKE3 checksums (tamper detection)
  2. Compares hardware fingerprints (compatibility scoring)
  3. Marks shell-specific reflexes for re-compilation if needed (e.g., apt β†’ brew)
  4. Merges the incoming state with any existing reflexes (dedup by embedding similarity)

See docs/nail-format.md for the full specification.


Command Reference

Command What It Does Example
pincher status Show shell fingerprint, reflex count, resource state pincher status
pincher teach Teach a new reflex (interactive or CLI args) pincher teach -i "check disk" -a "df -h"
pincher do <intent> Execute an intent through the reflex engine pincher do "how much disk is free"
pincher match <intent> Show what would match, without executing pincher match "disk space"
pincher pack Pack current state into a .nail file pincher pack agent.nail
pincher unpack <nail> Unpack a .nail file and merge state pincher unpack agent.nail
pincher bench Run performance benchmarks pincher bench
pincher shell-info Detailed hardware fingerprint pincher shell-info
pincher reflexes List all stored reflexes with confidence pincher reflexes --verbose
pincher rpc Start JSON-RPC server for sidecar pincher rpc --port 9876

Project Structure

pincherOS/
β”œβ”€β”€ pincher-core/               # Core library (Rust) β€” the crab's nervous system
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ db/                 #   SQLite + sqlite-vec (reflexes, sessions, action log)
β”‚   β”‚   β”œβ”€β”€ embed/              #   Embedding layer (hash-based + ONNX MiniLM)
β”‚   β”‚   β”œβ”€β”€ reflex/             #   Reflex engine (teach, match, do, confidence)
β”‚   β”‚   β”œβ”€β”€ resource/           #   PID resource controller (Normal/Light/Critical)
β”‚   β”‚   β”œβ”€β”€ security/           #   Veto engine + sandbox + capabilities
β”‚   β”‚   β”œβ”€β”€ migration/          #   .nail pack/unpack + hardware fingerprinting
β”‚   β”‚   β”œβ”€β”€ capability/         #   Capability manifests + tokens
β”‚   β”‚   β”œβ”€β”€ intent/             #   Intent.toml v2 contracts + schema validation
β”‚   β”‚   β”œβ”€β”€ immunology/         #   Adversarial distillation + immune memory
β”‚   β”‚   β”œβ”€β”€ carapace/           #   WASM sandbox bridge for guest code
β”‚   β”‚   β”œβ”€β”€ dynamics/           #   Command dynamics (deterministic veto)
β”‚   β”‚   β”œβ”€β”€ shell/              #   Hardware probing
β”‚   β”‚   β”œβ”€β”€ sandbox/            #   Bubblewrap + landlock execution
β”‚   β”‚   └── rpc/                #   JSON-RPC server for Python sidecar
β”‚   └── examples/
β”‚       └── teach_and_do.rs     #   Quick library usage demo
β”œβ”€β”€ pincher-cli/                # CLI binary β€” the exoskeleton
β”‚   └── src/main.rs
β”œβ”€β”€ pincher-infer/              # Python sidecar β€” the LLM compiler (WIP)
β”‚   └── pincher_infer/
β”‚       β”œβ”€β”€ server.py           #   JSON-RPC over UDS
β”‚       β”œβ”€β”€ distill.py          #   LLM-as-compiler (intent β†’ action template)
β”‚       β”œβ”€β”€ embed.py            #   sentence-transformers fallback
β”‚       └── config.py           #   Configuration from env + TOML
β”œβ”€β”€ examples/                   # Plug-and-play walkthroughs
β”‚   β”œβ”€β”€ hello-reflex/           #   5-minute tutorial
β”‚   β”œβ”€β”€ smart-home/             #   Pi-based home automation
β”‚   β”œβ”€β”€ code-review/            #   Automated PR review
β”‚   β”œβ”€β”€ deploy-agent/           #   Train then deploy
β”‚   └── migration-demo/         #   Multi-device migration
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ agent/                  # Agent-centric documentation (a2a native)
β”‚   β”œβ”€β”€ adr/                    # Architecture Decision Records
β”‚   β”œβ”€β”€ research/               # Post-MVP research RFCs & design reviews
β”‚   β”œβ”€β”€ ROADMAP.md              # 12-week MVP sprint plan
β”‚   β”œβ”€β”€ MVP_CHECKLIST.md        # MVP exit criteria
β”‚   β”œβ”€β”€ nail-format.md          # .nail binary format spec
β”‚   β”œβ”€β”€ threats.md              # Security threat model
β”‚   └── RISKS.md                # Risk register
β”œβ”€β”€ skills/                     # Starter agent skills (vectorDB-ready)
β”œβ”€β”€ tools/                      # Python dev tools (DeepInfra, model router)
└── .github/workflows/ci.yml   # GitHub Actions CI

Documentation

For Developers

Document Description
ROADMAP.md 12-week MVP sprint + 6-month ecosystem horizon
MVP_CHECKLIST.md Exit criteria for the MVP
docs/nail-format.md .nail binary format specification
docs/threats.md Security threat model
docs/RISKS.md Risk register
docs/architecture.md Architecture deep-dive
docs/developer-guide.md Contributing and development setup
docs/adr/001-sqlite-vec-over-lancedb.md ADR: Why SQLite over LanceDB

For Agents

If you're an AI agent reading this repository, there's a dedicated documentation set written in a2a-native format β€” structured for machine consumption with explicit interfaces, state machines, and integration protocols:

πŸ‘‰ docs/agent/ β€” Agent-to-agent documentation


Contributing

See CONTRIBUTING.md and docs/developer-guide.md.

git clone https://github.com/SuperInstance/pincherOS.git
cd pincherOS
cargo build
cargo test

License

PincherOS is released under the MIT License.


Hermit Crab

Same crab. Bigger shell.

For a simpler, Python-based approach to intent-driven command execution, see lever-runner.

About

Vector Database as runtime, LLM as compiler

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors