The Interlocutor Effect: Experiments

Reproducible experiments for the paper:

The Interlocutor Effect: Why LLMs Leak More Privacy to Agents Than Humans

IWPE 2026 - Submitted

Overview

We demonstrate that multi-agent LLM systems exhibit systematic information leakage through conversational channels between agents. A single interlocutor agent can extract sensitive data from other agents through natural conversation, bypassing privacy defenses.

Key finding: The presence of an interlocutor agent amplifies privacy leakage by 2-3x compared to direct user queries.

These experiments are built on top of AgentLeak, our full-stack benchmark framework for privacy leakage in multi-agent LLM systems.

Faouzi El Yagoubi et al., "AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems", arXiv:2602.11510 — [Paper] [Repo]

Scripts

  • benchmark.py - Main 2x2 factorial benchmark (recommended entry point)
  • ablation_run.py - Ablation study (paraphrase, prompt variants)
  • audit_benchmark.py - Audit and analyze existing result files
  • activation_patching.py - Mechanistic interpretability (local GPU)
  • vertex_activation_patching.py - Same, on GCP Vertex AI (T4 GPU)
  • attention_probe.py - Attention pattern analysis
  • launch_vertex_job.sh - Submit Vertex AI batch job
  • watch_vertex_job.sh - Monitor Vertex AI job status
  • vertex_requirements.txt - Python dependencies

Experimental Design

2x2 factorial design across 5 LLMs:

| Factor | Level A | Level B |
|---|---|---|
| Agent topology | Direct query (no interlocutor) | Conversational (interlocutor agent) |
| Canary difficulty | Obvious (synthetic tokens) | Semantic (inferred from context) |
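The four experimental conditions implied by this design can be enumerated with a short sketch (the condition labels below are illustrative, not the scripts' actual flags):

```python
from itertools import product

# Factor levels from the 2x2 design (labels are illustrative)
topologies = ["direct", "interlocutor"]
canaries = ["obvious", "semantic"]

# Cross the two factors to get every experimental condition
conditions = list(product(topologies, canaries))
for topology, canary in conditions:
    print(f"topology={topology}, canary={canary}")

# 2 topologies x 2 canary difficulties = 4 conditions
assert len(conditions) == 4
```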

Models tested: GPT-4o, Claude 3.5 Sonnet, Llama 3.3 70B, Mistral Large, Qwen-2.5-7B

Test scenarios: 1,000 scenarios across healthcare, finance, legal, corporate domains

Setup

git clone https://github.com/yagobski/interlocutor-effect
cd interlocutor-effect
python3.10 -m venv venv
source venv/bin/activate
pip install -r vertex_requirements.txt

# Install AgentLeak (required)
pip install git+https://github.com/Privatris/AgentLeak.git

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."

Running Experiments

Main Benchmark

python benchmark.py --num_scenarios 10 --models gpt4o
python benchmark.py --num_scenarios 1000 --all_models

Ablation Study

python ablation_run.py --scenario_type hard

Activation Patching

python activation_patching.py --model meta-llama/Llama-2-13b-hf --num_scenarios 50
bash launch_vertex_job.sh
bash watch_vertex_job.sh <JOB_ID>

Results

| Model | Direct | Interlocutor | Effect |
|---|---|---|---|
| GPT-4o | 31% | 73% | 2.4x |
| Claude 3.5 Sonnet | 24% | 68% | 2.8x |
| Llama 3.3 70B | 19% | 62% | 3.3x |
| Mistral Large | 22% | 59% | 2.7x |
| Qwen-2.5-7B | 18% | 51% | 2.8x |

Results are stored in results/ and included in this repository.
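As a sanity check, the Effect column can be reproduced from the Direct and Interlocutor leak rates. The schema of the JSON files in results/ is not documented here, so this sketch hard-codes the rates from the table above rather than parsing the files:

```python
# Leak rates (%) taken from the results table; the amplification
# factor is interlocutor_rate / direct_rate, rounded to one decimal.
rates = {
    "GPT-4o": (31, 73),
    "Claude 3.5 Sonnet": (24, 68),
    "Llama 3.3 70B": (19, 62),
    "Mistral Large": (22, 59),
    "Qwen-2.5-7B": (18, 51),
}

for model, (direct, interlocutor) in rates.items():
    effect = round(interlocutor / direct, 1)
    print(f"{model}: {effect}x")
```

Running this reproduces the 2.4x-3.3x amplification factors reported in the table.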

Resources

All code, results, and evaluation traces are available in this repository:

  • Code - All experiment scripts (benchmark.py, ablation_run.py, etc.)
  • Results - results/ directory with 12 JSON result files from all models and variants
  • Traces - results/traces/ directory with 20 detailed execution traces showing internal agent conversations and data leakage paths
  • Evaluation data - Complete reproducible data for all findings

Contact

Polytechnique Montreal, 2026
