Ripperdoc Benchmark

A benchmark harness for evaluating the Ripperdoc AI coding agent with the Harbor framework.

Overview

This repository integrates Ripperdoc as a custom agent in the Harbor benchmark framework, enabling standardized evaluation on coding benchmarks such as terminal-bench-2.

Project Structure

ripperdoc-benchmark/
├── agents/
│   ├── ripperdoc.py           # Harbor agent wrapper for Ripperdoc
│   └── install-ripperdoc.sh.j2 # Installation template for containers
├── pyproject.toml              # Project configuration
├── setup.py                    # Package setup
└── README.md                   # This file

Dependencies

Installation

1. Install Harbor Framework

Harbor requires Python 3.12+. Install from source:

git clone https://github.com/laude-institute/harbor.git
cd harbor
pip install -e .

2. Install Ripperdoc Benchmark Package

# Run from your local clone of this repository (adjust the path)
cd /path/to/ripperdoc-benchmark
pip install -e .

3. Set Up API Keys

Configure your API keys as environment variables:

# For Anthropic Claude models
export ANTHROPIC_API_KEY="your-api-key"

# For OpenAI models
export OPENAI_API_KEY="your-api-key"

# For DeepSeek models
export DEEPSEEK_API_KEY="your-api-key"

Usage

Using the Ripperdoc Conda Environment (Recommended)

This assumes a conda environment named ripperdoc with Python 3.12 and Harbor already installed.

# Activate the environment
conda activate ripperdoc

# Run tests
python test_agent.py

# Run benchmark with Ripperdoc
./run_ripperdoc_benchmark.sh

# Run with Terminus-2 for comparison
./run_ripperdoc_benchmark.sh --terminus

Running Benchmarks with Terminus-2 (Reference)

To confirm that Harbor itself works, first run the benchmark with the reference Terminus-2 agent:

harbor run -d terminal-bench@2.0 --agent terminus-2

Running Benchmarks with Ripperdoc

Use the --agent-import-path flag to specify the Ripperdoc agent:

harbor run \
  -d terminal-bench@2.0 \
  --agent-import-path agents.ripperdoc:Ripperdoc \
  --model glm-4.7

Model Configuration

Specify different models using the --model flag:

# GLM-4.7 (default)
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model glm-4.7

# Claude Sonnet
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model claude/claude-sonnet-4-20250514

# GPT-4
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model openai/gpt-4
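
To compare several models in one pass, the three commands above can be generated from a list. The following is a minimal sketch, not part of this repository: the helper name build_harbor_command is hypothetical, and the flags and model identifiers are copied from the commands shown in this README. It prints the commands instead of running them, so the sweep can be reviewed first.

```python
import shlex

# Values copied from the commands in this README.
DATASET = "terminal-bench@2.0"
AGENT_IMPORT_PATH = "agents.ripperdoc:Ripperdoc"
MODELS = ["glm-4.7", "claude/claude-sonnet-4-20250514", "openai/gpt-4"]


def build_harbor_command(model: str) -> list[str]:
    """Return the argv list for one `harbor run` invocation (hypothetical helper)."""
    return [
        "harbor", "run",
        "-d", DATASET,
        "--agent-import-path", AGENT_IMPORT_PATH,
        "--model", model,
    ]


if __name__ == "__main__":
    for model in MODELS:
        # Print rather than execute; pass the list to subprocess.run to execute.
        print(shlex.join(build_harbor_command(model)))
```

Building an argv list rather than a single string avoids shell-quoting problems if a model identifier ever contains unusual characters.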

Additional Options

# Set maximum thinking tokens for reasoning
export MAX_THINKING_TOKENS=20000
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --model glm-4.7

# Run on specific tasks
harbor run -d terminal-bench@2.0 --agent-import-path agents.ripperdoc:Ripperdoc --task-ids task_1,task_2

Supported Datasets

The only dataset currently in use is terminal-bench-2, available at https://github.com/laude-institute/terminal-bench-2/

Additional datasets may be added later for custom evaluation.

Development

Ripperdoc Agent Implementation

The Ripperdoc agent (agents/ripperdoc.py) implements Harbor's BaseInstalledAgent interface:

  • Installation: Uses a Jinja2 template (agents/install-ripperdoc.sh.j2) to install Ripperdoc in the container
  • Execution: Runs Ripperdoc headlessly through its Python SDK
  • Trajectory: Converts the Ripperdoc history to ATIF format for analysis

Adding New Features

To extend Ripperdoc's capabilities for benchmarking:

  1. Modify agents/ripperdoc.py to add support for the new tool
  2. Update the ALLOWED_TOOLS list so the tool passes filtering
  3. Adjust the trajectory conversion in _convert_events_to_trajectory()
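
Step 2 relies on allow-list filtering. The sketch below illustrates the general idea only: the tool names and the filter_tools helper are hypothetical, and the real ALLOWED_TOOLS list and its use in agents/ripperdoc.py may differ.

```python
# Hypothetical tool names for illustration; the real list lives in agents/ripperdoc.py.
ALLOWED_TOOLS = ["Bash", "Read", "Edit"]


def filter_tools(requested, allowed=ALLOWED_TOOLS):
    """Keep only the requested tools that appear in the allow-list, preserving order."""
    allowed_set = set(allowed)
    return [name for name in requested if name in allowed_set]


if __name__ == "__main__":
    # "WebFetch" is not in the allow-list, so it is dropped.
    print(filter_tools(["Read", "WebFetch", "Bash"]))
```

Under this scheme, a new tool that is implemented but not added to the allow-list would be silently filtered out, which is why step 2 is needed.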

Troubleshooting

Harbor Installation Fails

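Harbor requires Python 3.12+. If your interpreter is older (for example, Python 3.9), use pyenv to install and select a newer one:

# Requires pyenv to be installed already; installs Python 3.12 and selects it for this directory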
pyenv install 3.12
pyenv local 3.12
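
To check which interpreter is actually active before installing Harbor, a short script like the following can help. It is a minimal sketch: the helper name meets_minimum is hypothetical, and the 3.12 threshold is the one stated in this README.

```python
import sys

MIN_VERSION = (3, 12)  # minimum Python version this README states for Harbor


def meets_minimum(version_info=None) -> bool:
    """Return True if the given (or current) interpreter is at least MIN_VERSION."""
    current = sys.version_info if version_info is None else version_info
    return tuple(current[:2]) >= MIN_VERSION


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for Harbor"
    print(f"Python {found}: {status}")
```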

Ripperdoc SDK Not Found

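Ensure the Ripperdoc SDK checkout is present. The path below comes from the original setup; substitute the location of your own checkout: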

ls -la /mnt/hdd1/QuantmewRipperdoc

If missing, install from source:

pip install -e /mnt/hdd1/QuantmewRipperdoc

API Key Errors

Verify your environment variables are set:

echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
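
The echo commands print the secrets to the terminal. As an alternative, the following sketch reports only whether each key is set. The helper name key_status is hypothetical; the variable names are the three listed under "Set Up API Keys".

```python
import os

# Variable names taken from the "Set Up API Keys" section.
KEY_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "DEEPSEEK_API_KEY")


def key_status(environ=None) -> dict[str, bool]:
    """Map each key name to True if it is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return {name: bool(env.get(name, "").strip()) for name in KEY_NAMES}


if __name__ == "__main__":
    for name, present in key_status().items():
        # Only the status is printed, never the key itself.
        print(f"{name}: {'set' if present else 'missing'}")
```

Only the key for the provider of the model you run needs to be set, so a "missing" entry is not necessarily an error.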

License

Apache License 2.0. See the LICENSE file for details.

Credits
