Reproducible experiments for the paper:
The Interlocutor Effect: Why LLMs Leak More Privacy to Agents Than Humans
IWPE 2026 - Submitted
We demonstrate that multi-agent LLM systems exhibit systematic information leakage through conversational channels between agents. A single interlocutor agent can extract sensitive data from other agents through natural conversation, bypassing privacy defenses.
Key finding: the presence of an interlocutor agent amplifies privacy leakage by 2-3x compared to direct user queries.
These experiments are built on top of AgentLeak, our full-stack benchmark framework for privacy leakage in multi-agent LLM systems.
Faouzi El Yagoubi et al., "AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems", arXiv:2602.11510 — [Paper] [Repo]
- `benchmark.py` - Main 2x2 factorial benchmark (recommended entry point)
- `ablation_run.py` - Ablation study (paraphrase, prompt variants)
- `audit_benchmark.py` - Audit and analyze existing result files
- `activation_patching.py` - Mechanistic interpretability (local GPU)
- `vertex_activation_patching.py` - Same, on GCP Vertex AI (T4 GPU)
- `attention_probe.py` - Attention pattern analysis
- `launch_vertex_job.sh` - Submit a Vertex AI batch job
- `watch_vertex_job.sh` - Monitor Vertex AI job status
- `vertex_requirements.txt` - Python dependencies
2x2 factorial design across 5 LLMs:
| Factor | Level A | Level B |
|---|---|---|
| Agent topology | Direct query (no interlocutor) | Conversational (interlocutor agent) |
| Canary difficulty | Obvious (synthetic tokens) | Semantic (inferred from context) |
Models tested: GPT-4o, Claude 3.5 Sonnet, Llama 3.3 70B, Mistral Large, Qwen-2.5-7B
Test scenarios: 1,000 scenarios across healthcare, finance, legal, corporate domains
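The full condition grid can be sketched as a cross product of the two factors and the five models. This is a minimal illustration of the design, not the actual code in `benchmark.py`; the condition names and model identifiers below are illustrative assumptions.

```python
from itertools import product

# Hypothetical labels for the 2x2 factorial design described above;
# benchmark.py may use different identifiers internally.
topologies = ["direct", "interlocutor"]          # agent topology factor
canaries = ["obvious", "semantic"]               # canary difficulty factor
models = ["gpt4o", "claude-3.5-sonnet", "llama-3.3-70b",
          "mistral-large", "qwen-2.5-7b"]

# Each experimental cell is one (model, topology, canary) combination.
conditions = [
    {"model": m, "topology": t, "canary": c}
    for m, t, c in product(models, topologies, canaries)
]
print(len(conditions))  # 5 models x 2 topologies x 2 canary levels = 20 cells
```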
```bash
# Clone and set up the environment
git clone https://github.com/yagobski/interlocutor-effect
cd interlocutor-effect
python3.10 -m venv venv
source venv/bin/activate
pip install -r vertex_requirements.txt

# Install AgentLeak (required)
pip install git+https://github.com/Privatris/AgentLeak.git

# Configure API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
```
```bash
# Quick smoke test (10 scenarios, GPT-4o only)
python benchmark.py --num_scenarios 10 --models gpt4o

# Full benchmark (1,000 scenarios, all 5 models)
python benchmark.py --num_scenarios 1000 --all_models

# Ablation study on hard (semantic) scenarios
python ablation_run.py --scenario_type hard

# Mechanistic interpretability (requires a local GPU)
python activation_patching.py --model meta-llama/Llama-2-13b-hf --num_scenarios 50

# Vertex AI batch job: submit, then monitor
bash launch_vertex_job.sh
bash watch_vertex_job.sh <JOB_ID>
```
| Model | Direct leak rate | Interlocutor leak rate | Amplification |
|---|---|---|---|
| GPT-4o | 31% | 73% | 2.4x |
| Claude 3.5 Sonnet | 24% | 68% | 2.8x |
| Llama 3.3 70B | 19% | 62% | 3.3x |
| Mistral Large | 22% | 59% | 2.7x |
| Qwen-2.5-7B | 18% | 51% | 2.8x |
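The amplification column is simply the ratio of the two leak rates, rounded to one decimal. A quick check against the table:

```python
# Leak rates from the results table (percent), per model.
rates = {
    "GPT-4o": (31, 73),
    "Claude 3.5 Sonnet": (24, 68),
    "Llama 3.3 70B": (19, 62),
    "Mistral Large": (22, 59),
    "Qwen-2.5-7B": (18, 51),
}

# Amplification = interlocutor rate / direct rate.
for model, (direct, interlocutor) in rates.items():
    print(f"{model}: {interlocutor / direct:.1f}x")
```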
All code, results, and evaluation traces are included in this repository:
- Code - All experiment scripts (`benchmark.py`, `ablation_run.py`, etc.)
- Results - `results/` directory with 12 JSON result files from all models and variants
- Traces - `results/traces/` directory with 20 detailed execution traces showing internal agent conversations and data leakage paths
- Evaluation data - Complete reproducible data for all findings
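The result files can be browsed with a few lines of standard-library Python. This is an illustrative sketch only: the actual JSON schema is defined by `benchmark.py` and may differ from what the summary assumes.

```python
import json
from pathlib import Path

def summarize_results(results_dir="results"):
    """List each JSON result file and the type of its top-level value.

    Illustrative helper, not part of the repository's tooling; the
    real result schema is defined by benchmark.py.
    """
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as f:
            summary[path.name] = type(json.load(f)).__name__
    return summary

if __name__ == "__main__":
    for name, kind in summarize_results().items():
        print(f"{name}: {kind}")
```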
- Faouzi El Yagoubi - faouzi.elyagoubi@polymtl.ca
- Godwin Badu-Marfo - godwin.badu-marfo@polymtl.ca
- Ranwa Al Mallah - ranwa.al-mallah@polymtl.ca
Polytechnique Montreal, 2026