Ripperdoc Benchmark

Benchmark setup for evaluating the Ripperdoc AI coding agent with the Harbor framework.

Overview

This repository integrates Ripperdoc as a custom agent in the Harbor benchmark framework, enabling standardized evaluation on various coding benchmarks including terminal-bench-2.

Project Structure

ripperdoc-benchmark/
├── agents/
│   ├── ripperdoc.py           # Harbor agent wrapper for Ripperdoc
│   └── install-ripperdoc.sh.j2 # Installation template for containers
├── pyproject.toml              # Project configuration
├── setup.py                    # Package setup
└── README.md                   # This file

Dependencies

Ripperdoc SDK: Located at /mnt/hdd1/QuantmewRipperdoc
Harbor Framework: https://github.com/laude-institute/harbor
Python: 3.12+ (required by Harbor)

Installation

1. Install Harbor Framework

Harbor requires Python 3.12+. Install from source:

git clone https://github.com/laude-institute/harbor.git
cd harbor
pip install -e .

2. Install Ripperdoc Benchmark Package

cd /mnt/hdd1/xiahan_github/ripperdoc-benchmark
pip install -e .

3. Set Up API Keys

Configure your API keys as environment variables:

# For Anthropic Claude models
export ANTHROPIC_API_KEY="your-api-key"

# For OpenAI models
export OPENAI_API_KEY="your-api-key"

# For DeepSeek models
export DEEPSEEK_API_KEY="your-api-key"
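
Before launching a run, it can help to confirm which of these keys are visible to the current process. The following is a minimal Python sketch that assumes only the three variable names listed above; the helper name `missing_api_keys` is illustrative and is not part of Harbor or Ripperdoc.

```python
import os

# Variable names taken from the export lines above.
API_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "DEEPSEEK_API_KEY")


def missing_api_keys(env=None):
    """Return the names of API key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in API_KEY_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All listed API keys are set.")
```

A non-empty result is not necessarily an error: typically only the key for the provider behind the chosen model is needed.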

Usage

Using the Ripperdoc Conda Environment (Recommended)

The ripperdoc conda environment has Python 3.12 and Harbor pre-installed.

# Activate the environment
conda activate ripperdoc

# Run tests
python test_agent.py

# Run benchmark with Ripperdoc
./run_ripperdoc.sh

# Run with Terminus-2 for comparison
./run_terminus2.sh

Running Benchmarks with Terminus-2 (Reference)

First, test the framework with Terminus-2:

harbor run -d terminal-bench@2.0 --agent terminus-2

Running Benchmarks with Ripperdoc

Use the --agent-import-path flag to specify the Ripperdoc agent:

harbor run \
  -d terminal-bench@2.0 \
  --agent-import-path agents.ripperdoc:Ripperdoc \
  --model glm-4.7

Model Configuration

Specify different models using the --model flag:

# GLM-4.7 (default)
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model glm-4.7

# Claude Sonnet
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model claude/claude-sonnet-4-20250514

# GPT-4
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model openai/gpt-4
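
When comparing several models, the same command line can be generated programmatically. This is a small Python sketch that uses only the flags shown above (`-d`, `--agent-import-path`, `--model`); the function name `build_harbor_command` and the `MODELS` list are illustrative, not part of Harbor.

```python
import shlex


def build_harbor_command(
    model,
    dataset="terminal-bench@2.0",
    agent_import_path="agents.ripperdoc:Ripperdoc",
):
    """Build the argv list for one `harbor run` invocation."""
    return [
        "harbor", "run",
        "-d", dataset,
        "--agent-import-path", agent_import_path,
        "--model", model,
    ]


# Model identifiers copied from the examples above.
MODELS = ["glm-4.7", "claude/claude-sonnet-4-20250514", "openai/gpt-4"]

if __name__ == "__main__":
    for model in MODELS:
        # Print the commands; pass the list to subprocess.run() to execute them.
        print(shlex.join(build_harbor_command(model)))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if it is later handed to `subprocess.run`.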

Additional Options

# Set maximum thinking tokens for reasoning
export MAX_THINKING_TOKENS=20000
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model glm-4.7

# Run on specific tasks
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --task-ids task_1,task_2
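
How the agent consumes MAX_THINKING_TOKENS is not shown in this README. As a sketch only, assuming the wrapper reads the variable from the environment and treats a missing or invalid value as "not set", the parsing could look like this (the function name `read_max_thinking_tokens` is hypothetical):

```python
import os


def read_max_thinking_tokens(env=None):
    """Return MAX_THINKING_TOKENS as a positive int, or None if unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("MAX_THINKING_TOKENS", "").strip()
    if not raw:
        return None
    try:
        value = int(raw)
    except ValueError:
        # Assumption: a non-numeric value is ignored rather than raising.
        return None
    return value if value > 0 else None
```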

Supported Datasets

The benchmark currently uses terminal-bench-2: https://github.com/laude-institute/terminal-bench-2/

Future datasets may be added for custom evaluation.

Development

Ripperdoc Agent Implementation

The Ripperdoc agent (agents/ripperdoc.py) implements Harbor's BaseInstalledAgent interface:

Installation: Uses a Jinja2 template (agents/install-ripperdoc.sh.j2) to install Ripperdoc in the container
Execution: Runs Ripperdoc headlessly through the Python SDK
Trajectory: Converts the Ripperdoc history to ATIF format for analysis
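
To illustrate the trajectory step, here is a deliberately simplified Python sketch. Everything in it is an assumption: the event shape (dicts with `role` and `text`), the `Step` fields, and the function name are hypothetical and do not reproduce the actual ATIF schema or the code in agents/ripperdoc.py. It only shows the general idea of mapping raw history events to ordered, typed steps.

```python
from dataclasses import asdict, dataclass


@dataclass
class Step:
    step_id: int   # 1-based position in the trajectory
    source: str    # "user" or "agent" (hypothetical labels)
    message: str


def convert_events_to_steps(events):
    """Map raw history events to a list of step dicts, skipping unknown roles."""
    steps = []
    for event in events:
        role = event.get("role")
        if role not in ("user", "assistant"):
            continue  # event types this sketch does not model
        source = "agent" if role == "assistant" else "user"
        steps.append(
            Step(step_id=len(steps) + 1, source=source, message=event.get("text", ""))
        )
    return [asdict(step) for step in steps]
```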

Adding New Features

To extend Ripperdoc's capabilities for benchmarking:

Modify agents/ripperdoc.py to add new tool support
Update the ALLOWED_TOOLS list for tool filtering
Adjust trajectory conversion in _convert_events_to_trajectory()
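
As a rough illustration of tool filtering, the sketch below keeps only tools that appear in an allow-list. The tool names and the `filter_tools` helper are made up for this example; the real ALLOWED_TOOLS contents and the way agents/ripperdoc.py applies them may differ.

```python
# Illustrative values only; not the actual list in agents/ripperdoc.py.
ALLOWED_TOOLS = ["Bash", "Read", "Edit"]


def filter_tools(requested, allowed=ALLOWED_TOOLS):
    """Keep requested tools that are allowed, preserving order and dropping duplicates."""
    allowed_set = set(allowed)
    seen = set()
    kept = []
    for name in requested:
        if name in allowed_set and name not in seen:
            seen.add(name)
            kept.append(name)
    return kept
```

With this shape, adding support for a new tool amounts to appending its name to the allow-list.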

Troubleshooting

Harbor Installation Fails

Harbor requires Python 3.12+. If you have Python 3.9:

# With pyenv already installed, install Python 3.12 and select it for this directory
pyenv install 3.12
pyenv local 3.12
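
To confirm that the active interpreter meets the 3.12 requirement stated above, a short Python check such as the following can be used. The helper name `python_is_supported` is illustrative.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated for Harbor above


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for Harbor"
    print(f"Python {current}: {status}")
```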

Ripperdoc SDK Not Found

Ensure the SDK is accessible:

ls -la /mnt/hdd1/QuantmewRipperdoc

If the directory exists but the package is not installed in the active environment, install it from source:

pip install -e /mnt/hdd1/QuantmewRipperdoc

API Key Errors

Verify your environment variables are set:

echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY

License

Apache License 2.0. See the LICENSE file for details.

Credits

Ripperdoc - Open-source AI coding agent
Harbor Framework - Agent benchmark framework
Terminal Bench 2 - Benchmark dataset
