The Interlocutor Effect: Experiments

Reproducible experiments for the paper:

The Interlocutor Effect: Why LLMs Leak More Privacy to Agents Than Humans

IWPE 2026 - Submitted

Overview

We demonstrate that multi-agent LLM systems exhibit systematic information leakage through conversational channels between agents. A single interlocutor agent can extract sensitive data from other agents through natural conversation, bypassing privacy defenses.

Key finding: The presence of an interlocutor agent amplifies privacy leakage by 2-3x compared to direct user queries.

These experiments are built on top of AgentLeak, our full-stack benchmark framework for privacy leakage in multi-agent LLM systems.

Faouzi El Yagoubi et al., "AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems", arXiv:2602.11510 — [Paper] [Repo]

Scripts

  • benchmark.py - Main 2x2 factorial benchmark (recommended entry point)
  • ablation_run.py - Ablation study (paraphrase, prompt variants)
  • audit_benchmark.py - Audit and analyze existing result files
  • activation_patching.py - Mechanistic interpretability (local GPU)
  • vertex_activation_patching.py - Same, on GCP Vertex AI (T4 GPU)
  • attention_probe.py - Attention pattern analysis
  • launch_vertex_job.sh - Submit Vertex AI batch job
  • watch_vertex_job.sh - Monitor Vertex AI job status
  • vertex_requirements.txt - Python dependencies

Experimental Design

2x2 factorial design across 5 LLMs:

| Factor | Level A | Level B |
|---|---|---|
| Agent topology | Direct query (no interlocutor) | Conversational (interlocutor agent) |
| Canary difficulty | Obvious (synthetic tokens) | Semantic (inferred from context) |
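The four experimental conditions implied by this design can be enumerated with a short sketch (the condition labels below are illustrative, not the scripts' actual flags):

```python
from itertools import product

# Factor levels from the 2x2 design (labels are illustrative)
topologies = ["direct", "interlocutor"]
canaries = ["obvious", "semantic"]

# Cross the two factors to get every experimental condition
conditions = list(product(topologies, canaries))
for topology, canary in conditions:
    print(f"topology={topology}, canary={canary}")

# 2 topologies x 2 canary difficulties = 4 conditions
assert len(conditions) == 4
```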

Models tested: GPT-4o, Claude 3.5 Sonnet, Llama 3.3 70B, Mistral Large, Qwen-2.5-7B

Test scenarios: 1,000 scenarios across healthcare, finance, legal, corporate domains

Setup

git clone https://github.com/yagobski/interlocutor-effect
cd interlocutor-effect
python3.10 -m venv venv
source venv/bin/activate
pip install -r vertex_requirements.txt

# Install AgentLeak (required)
pip install git+https://github.com/Privatris/AgentLeak.git

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."

Running Experiments

Main Benchmark

python benchmark.py --num_scenarios 10 --models gpt4o
python benchmark.py --num_scenarios 1000 --all_models

Ablation Study

python ablation_run.py --scenario_type hard

Activation Patching

python activation_patching.py --model meta-llama/Llama-2-13b-hf --num_scenarios 50
bash launch_vertex_job.sh
bash watch_vertex_job.sh <JOB_ID>

Results

| Model | Direct | Interlocutor | Effect |
|---|---|---|---|
| GPT-4o | 31% | 73% | 2.4x |
| Claude 3.5 Sonnet | 24% | 68% | 2.8x |
| Llama 3.3 70B | 19% | 62% | 3.3x |
| Mistral Large | 22% | 59% | 2.7x |
| Qwen-2.5-7B | 18% | 51% | 2.8x |

Results are stored in results/ and included in this repository.
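As a sanity check, the Effect column can be reproduced from the Direct and Interlocutor leak rates. The schema of the JSON files in results/ is not documented here, so this sketch hard-codes the rates from the table above rather than parsing the files:

```python
# Leak rates (%) taken from the results table; the amplification
# factor is interlocutor_rate / direct_rate, rounded to one decimal.
rates = {
    "GPT-4o": (31, 73),
    "Claude 3.5 Sonnet": (24, 68),
    "Llama 3.3 70B": (19, 62),
    "Mistral Large": (22, 59),
    "Qwen-2.5-7B": (18, 51),
}

for model, (direct, interlocutor) in rates.items():
    effect = round(interlocutor / direct, 1)
    print(f"{model}: {effect}x")
```

Running this reproduces the 2.4x-3.3x amplification factors reported in the table.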

Resources

All code, results, and evaluation traces are available in this repository:

  • Code - All experiment scripts (benchmark.py, ablation_run.py, etc.)
  • Results - results/ directory with 12 JSON result files from all models and variants
  • Traces - results/traces/ directory with 20 detailed execution traces showing internal agent conversations and data leakage paths
  • Evaluation data - Complete reproducible data for all findings

Contact

Polytechnique Montreal, 2026
