chumdump

chumdump is a defensive CLI for inspecting AI-readable environments, detecting chum-like content, generating controlled chumbait, and monitoring whether AI crawlers, agents, or retrieval systems consume it.

Use it to create controlled bait for crawlers, agents, models, and RAG systems, then watch whether the bait is accessed, retrieved, echoed, obeyed, or leaked.

What it does

  • Generates canaries, crawler bait, prompt traps, RAG bait, fake harmless secrets, watermarks, lore seeds, and decoy documents.
  • Builds deployable chumdump bundles with manifests and index pages.
  • Deploys bundles into owned websites, repositories, docs, or test corpora.
  • Scans paths for known bait markers.
  • Parses access logs and records bite events.
  • Generates Markdown, JSON, HTML, or SARIF-style reports.
  • Cleans up deployed bait files from a marked destination.
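The generate-then-scan idea in the list above can be illustrated with a minimal Python sketch. It is not chumdump's implementation: the `bait-<type>-<name>-<suffix>` marker shape is only inferred from the quickstart output, and `make_canary_marker`, `scan_for_markers`, `MARKER_RE`, and the 8-character hex suffix are hypothetical choices made for this example.

```python
import re
import secrets
from pathlib import Path

# Hypothetical marker shape: bait-<type>-<name>-<8 hex chars>.
# chumdump's real marker format may differ.
MARKER_RE = re.compile(r"bait-[a-z0-9-]+-[0-9a-f]{8}\b")


def make_canary_marker(name: str, bait_type: str = "canary") -> str:
    """Return a unique, greppable marker for one piece of bait."""
    return f"bait-{bait_type}-{name}-{secrets.token_hex(4)}"


def scan_for_markers(root: Path) -> dict[str, list[Path]]:
    """Map each marker found under root to the files that contain it."""
    hits: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for marker in set(MARKER_RE.findall(text)):
            hits.setdefault(marker, []).append(path)
    return hits
```

Because each marker carries a random suffix, a hit in a scan or a log can be traced back to one specific bait artifact rather than to a generic phrase.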

Status

Current release: v0.1.0-alpha.

chumdump is an early alpha. The core local workflow is usable, but the command surface and report schema may change before a stable release.

Install

Install the current alpha from GitHub:

python3 -m pip install \
  "git+https://github.com/m3lixir/chumdump.git@v0.1.0-alpha"

Install from a local checkout for development:

python3 -m pip install -e ".[dev]"

Then check the CLI:

chumdump --help

You can also run the CLI directly from a checkout:

PYTHONPATH=src python3 -m chumdump --help

Quickstart

The following commands create a local project, generate bait, deploy a small dump, simulate one crawler-style access log entry, record the bite, and print a report.

chumdump init ai-crawler-test
cd ai-crawler-test

chumdump bait create --type canary --name violet-harbor
chumdump dump build --profile website --count 1
chumdump deploy ./public

bait_file=$(basename "$(find public/bait -name '*.md' | head -n 1)")
cat > access.log <<EOF
203.0.113.10 - - [18/Jun/2026:12:00:00 +0000] "GET /bait/${bait_file} HTTP/1.1" 200 123 "-" "GPTBot/1.0"
EOF

chumdump watch --logs ./access.log
chumdump bites
chumdump report --format markdown --stdout

The GPTBot/1.0 user agent above is a local fixture. No real crawler visits the quickstart project.

Expected bite summary:

Detected bites: 1
- accessed bait-canary-violet-harbor-... via /bait/bait-canary-violet-harbor-....md (GPTBot)

TYPE       TIME                  ACTOR                 BAIT
accessed   18/Jun/2026:12:00:00 GPTBot                bait-canary-violet-harbor-...
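For readers curious how a log line like the quickstart fixture could become an `accessed` bite, here is a rough Python sketch. It assumes the Combined Log Format used in the fixture and a `/bait/` URL prefix. `parse_access_line`, `LOG_RE`, and the returned dictionary keys are illustrative names, not chumdump's parser or schema.

```python
import re
from typing import Optional

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_access_line(line: str, bait_prefix: str = "/bait/") -> Optional[dict]:
    """Return an 'accessed' event if the line requests a bait path, else None."""
    match = LOG_RE.match(line)
    if match is None or not match["path"].startswith(bait_prefix):
        return None
    return {
        "type": "accessed",
        "time": match["time"],
        "user_agent": match["agent"],
        "path": match["path"],
        "source_ip": match["host"],
        "status": int(match["status"]),
    }
```

A request log can only support the `accessed` type. The other bite types would need evidence from elsewhere, such as retrieval traces or model output.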

Evidence model

chumdump should not merely generate bait. It should preserve evidence of the bite.

Every bite should help answer:

  • What bait was touched?
  • Where was it placed?
  • When was it accessed?
  • What accessed it?
  • What evidence supports that?
  • Was it accessed, retrieved, echoed, obeyed, leaked, or unclear?
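One way to picture a record that answers these questions is a small Python dataclass. This is a hypothetical shape for illustration only, not chumdump's report schema. Every field name, `BITE_TYPES`, and the `to_json` helper are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field

# The six bite types named in this README.
BITE_TYPES = ("accessed", "retrieved", "echoed", "obeyed", "leaked", "unknown")


@dataclass(frozen=True)
class Bite:
    bait_id: str                  # What bait was touched?
    location: str                 # Where was it placed?
    observed_at: str              # When was it accessed?
    actor: str                    # What accessed it?
    bite_type: str = "unknown"    # How far did the behavior go?
    evidence: list[str] = field(default_factory=list)  # What supports that?

    def __post_init__(self) -> None:
        # Reject types outside the documented set.
        if self.bite_type not in BITE_TYPES:
            raise ValueError(f"unknown bite type: {self.bite_type!r}")

    def to_json(self) -> str:
        """Serialize the record with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Defaulting `bite_type` to `unknown` keeps a record valid when evidence exists but its interpretation is still open.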

Bite types form the escalation ladder:

Bite type   Meaning
accessed    Something requested the bait.
retrieved   A RAG or search system surfaced the bait.
echoed      A model or summarizer repeated the bait.
obeyed      An agent followed bait instructions.
leaked      Bait appeared somewhere unexpected.
unknown     Evidence exists, but the behavior is unclear.

The typical escalation path orders four of the six types: accessed < retrieved < echoed < obeyed.
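The ordering can be made concrete with a small Python sketch. It assumes the four ordered types map to increasing ranks and treats `leaked` and `unknown` as unranked, since the escalation path does not place them. `LADDER` and `highest_escalation` are hypothetical names, not part of chumdump.

```python
from typing import Iterable, Optional

# Ranks follow the stated path: accessed < retrieved < echoed < obeyed.
LADDER = {"accessed": 0, "retrieved": 1, "echoed": 2, "obeyed": 3}


def highest_escalation(bite_types: Iterable[str]) -> Optional[str]:
    """Return the furthest-escalated ranked type seen, or None if none are ranked."""
    ranked = [t for t in bite_types if t in LADDER]
    return max(ranked, key=LADDER.__getitem__) if ranked else None
```

For example, `highest_escalation(["accessed", "echoed", "retrieved"])` returns `"echoed"`, which is a quick way to summarize how far a single piece of bait travelled.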

The bite model is the heart of the tool. Without evidence, chumdump is only a bait generator. With evidence, it becomes a small forensic record for AI ingestion and agent-behavior testing.

Workflow

flowchart TD
    init["init<br/>create project"]
    bait["bait create/list<br/>make controlled artifacts"]
    dump["dump build/create<br/>bundle a corpus"]
    deploy["deploy<br/>place in owned environment"]
    observe["AI-readable surface<br/>website, docs, RAG, lab agent"]
    watch["watch<br/>parse logs and telemetry"]
    bites["bites<br/>review evidence events"]
    report["report<br/>produce Markdown, JSON, HTML, or SARIF"]
    scan["scan<br/>inspect existing corpus"]
    cleanup["cleanup<br/>remove deployed bait"]

    init --> bait --> dump --> deploy --> observe --> watch --> bites --> report
    scan --> bait
    deploy --> cleanup
    observe -. "crawler, retrieval, echo, or action" .-> watch

Create a project:

chumdump init ai-crawler-test
cd ai-crawler-test

Create canary and prompt-trap bait:

chumdump bait create --type canary --name violet-harbor
chumdump bait create --type prompt-trap --target agent

Build and deploy a website-oriented dump:

chumdump dump build --profile website --count 8
chumdump deploy ./public

Watch access logs for bites:

chumdump watch --logs ./access.log
chumdump bites

Generate a report:

chumdump report --format markdown
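As a rough illustration of what a Markdown summary involves, the Python sketch below renders a list of bite dictionaries in the same shape as the quickstart's expected output. The function name `render_markdown_summary` and the dictionary keys are assumptions for this example. chumdump's actual report code and schema are not shown in this README.

```python
def render_markdown_summary(bites: list[dict]) -> str:
    """Render bites as a count line followed by one bullet per bite."""
    lines = [f"Detected bites: {len(bites)}"]
    for bite in bites:
        # Hypothetical keys: type, bait, path, actor.
        lines.append(
            f"- {bite['type']} {bite['bait']} via {bite['path']} ({bite['actor']})"
        )
    return "\n".join(lines)


example = [
    {
        "type": "accessed",
        "bait": "bait-canary-example",
        "path": "/bait/bait-canary-example.md",
        "actor": "GPTBot",
    }
]
print(render_markdown_summary(example))
```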

Commands

The core command loop is:

  • chumdump init
  • chumdump bait create
  • chumdump bait list
  • chumdump dump build
  • chumdump deploy
  • chumdump scan
  • chumdump watch
  • chumdump bites
  • chumdump report
  • chumdump cleanup

For command details, see docs/commands.md.

Safety and scope

Use chumdump only in environments you own or are authorized to test.

Appropriate uses include:

  • Testing your own website.
  • Testing your own documentation.
  • Testing an internal RAG corpus.
  • Testing a lab agent.
  • Deploying canaries to owned infrastructure.
  • Monitoring your own logs.

Inappropriate uses include:

  • Deploying bait on third-party systems without permission.
  • Trying to poison public models.
  • Tricking agents into unsafe actions.
  • Collecting real credentials.
  • Generating deceptive content farms.
  • Bypassing access controls.

chumdump is a defensive research tool. Keep the bait clean.