Merged
9 changes: 5 additions & 4 deletions docs/code-hallucination/configuration.md
@@ -19,7 +19,7 @@ These can also be overridden via CLI flags (`--api-key`, `--base-url`, `--model`
| Parameter | Default | Description |
|-----------|---------|-------------|
| `HALLUCINATION_RATIO` | `0.4` | Fraction of instances that get hallucination injection |
| `DOCS_RATIO` | `0.5` | Fraction of instances that get Context7 documentation |
| `DOCS_RATIO` | `0.2` | Fraction of instances that get Context7 documentation |
| `MAX_FILE_CHARS` | `12000` | Maximum characters per source file |
| `MAX_CONTEXT7_CHARS` | `4000` | Maximum characters per library doc |
| `LLM_TEMPERATURE` | `0.7` | Temperature for query rewriting |
@@ -31,9 +31,10 @@ These can also be overridden via CLI flags (`--api-key`, `--base-url`, `--model`

| Format | Weight | Description |
|--------|--------|-------------|
| `complete_function` | 0.4 | Full patched function body via AST |
| `edit_style` | 0.3 | "In file X, replace Y with Z" |
| `fragment` | 0.3 | Added/changed lines from diff |
| `code_with_explanation` | 0.40 | Natural AI assistant response with prose + code block (LLM-generated) |
| `complete_function` | 0.25 | Full patched function body via AST |
| `fragment` | 0.20 | Added/changed lines from diff |
| `edit_style` | 0.15 | "In file X, replace Y with Z" |
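
The weighted selection behind this table can be sketched with `random.choices` (a minimal illustration; `pick_format` is a hypothetical helper, and the real `format_builder.py` may differ):

```python
import random

# Names and weights mirror the table above.
FORMAT_TYPES = ["code_with_explanation", "complete_function", "fragment", "edit_style"]
FORMAT_WEIGHTS = [0.40, 0.25, 0.20, 0.15]

def pick_format(rng: random.Random) -> str:
    # Draw one format with probability proportional to its weight.
    return rng.choices(FORMAT_TYPES, weights=FORMAT_WEIGHTS, k=1)[0]
```

Over many instances the empirical distribution converges to the target weights.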

## Hallucination Types

14 changes: 7 additions & 7 deletions docs/code-hallucination/index.md
@@ -17,7 +17,7 @@ This pipeline generates samples where an LLM coding assistant answers a develope
| **Repos** | 53 unique repos, zero overlap between splits |
| **Clean/hallucinated ratio** | ~60% clean / ~40% hallucinated |
| **Hallucination types** | Structural, behavioral, semantic |
| **Answer formats** | Complete function, edit-style, code fragment |
| **Answer formats** | Code with explanation, complete function, fragment, edit-style |
| **Annotation granularity** | Character-level spans |

## Quick Start
@@ -127,7 +127,7 @@ Each sample follows the `HallucinationSample` format used by LettuceDetect:
```

- **`prompt`**: Source code files + documentation + user query
- **`answer`**: Code in one of three formats (complete function, edit-style, fragment)
- **`answer`**: Code in one of four formats (code with explanation, complete function, fragment, edit-style)
- **`labels`**: Character-level span annotations (empty for clean samples)
- **`split`**: train/dev/test (inherited from SWE-bench, zero repo overlap)
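
As a rough sketch of that shape (field values here are invented for illustration; the exact `HallucinationSample` schema lives in LettuceDetect):

```python
# Hypothetical sample; top-level field names follow the bullets above.
sample = {
    "prompt": "### File: django/http/response.py\n...\n### Query: Why is the cookie not expiring?",
    "answer": 'def set_cookie(self, key, value=""):\n    ...',
    "labels": [{"start": 4, "end": 14, "label": "hallucinated"}],
    "split": "train",
}

def is_clean(s: dict) -> bool:
    # Clean samples carry an empty label list.
    return len(s["labels"]) == 0
```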

Expand All @@ -137,8 +137,8 @@ The pipeline works with any OpenAI-compatible API. Tested with:

| Provider | Model | Notes |
|----------|-------|-------|
| [Groq](https://groq.com) | `moonshotai/kimi-k2-instruct-0905` | Fast, free tier |
| [Groq](https://groq.com) | `llama-3.3-70b-versatile` | Good quality |
| [Groq](https://groq.com) | `openai/gpt-oss-120b` | Best quality, recommended |
| [Groq](https://groq.com) | `moonshotai/kimi-k2-instruct-0905` | Fast |
| [Novita AI](https://novita.ai) | `qwen/qwen3.5-27b` | Good for bulk generation |
| Local (vLLM/Ollama) | Any model | Free, best for large runs |

Expand Down Expand Up @@ -183,10 +183,10 @@ data/code_hallucination/
Each SWE-bench instance produces exactly one sample — either clean (gold patch answer) or hallucinated (LLM-injected). No instance appears in both classes. This avoids the artificial pairing problem where models learn to distinguish the specific instance rather than the hallucination.

### JSON-based span annotations
Hallucination spans are extracted from the LLM's structured JSON response, not from difflib character-level diffs. The LLM returns `{"hallucinated_code": "...", "changes": [{"original": "...", "hallucinated": "..."}]}` and spans are found by string matching. This produces clean, meaningful spans (avg 70 chars) instead of noisy character-level artifacts (1-3 char noise from difflib).
Hallucination spans are extracted from the LLM's structured JSON response, not from difflib character-level diffs. The LLM returns `{"hallucinated_code": "...", "changes": [{"original": "...", "hallucinated": "..."}]}` and spans are found by string matching. Quality controls enforce 2-3 spans per sample (avg 2.8), minimum 15 chars per span, and total coverage under 60%.

### 50/50 documentation split
Half of instances include Context7 library documentation, half don't. This teaches models to handle both documented and undocumented scenarios.
### 20% documentation split
A subset of instances (20%) include Context7 library documentation, filtered to only the repo's primary library. Documentation is also passed to the hallucination injector, enabling semantic hallucinations that contradict documented API behavior.

### Zero repo overlap between splits
SWE-bench's train/dev/test splits naturally have zero repository overlap across 53 unique repos. This means test performance measures generalization to completely unseen codebases.
69 changes: 46 additions & 23 deletions docs/code-hallucination/phases.md
@@ -92,19 +92,15 @@ Writes results to JSONL incrementally. On restart, skips already-processed `inst

**Module:** `context7_docs.py`

Fetches library documentation via the [Context7](https://context7.com) API for **50% of instances** (configurable via `DOCS_RATIO`).
Fetches library documentation via the [Context7](https://context7.com) API for **20% of instances** (configurable via `DOCS_RATIO`).

### Library detection

Detects libraries from:
1. Import statements in the patch (`import django`, `from sklearn import ...`)
2. File paths (`django/http/response.py` → django)
Maps the instance's GitHub repo to its primary library (e.g., `django/django` → `django`, `scikit-learn/scikit-learn` → `scikit-learn`). Only fetches docs for the matching library — not for random imports like `sys` or `re`.

Maps to Context7 library names via a predefined dictionary.
### Why 20%?

### 50/50 split rationale

Half of samples include documentation context, half don't. This creates training variety — models learn to detect hallucinations both with and without documentation support.
A minority of samples include documentation context, while most don't. This creates training variety — models learn to detect hallucinations both with and without documentation support. Documentation is also passed to the hallucination injector (Phase 6), enabling SEMANTIC hallucinations that contradict documented API behavior.

Instances not selected for docs still get an entry written with empty docs (by design, not failure).

@@ -116,11 +112,26 @@ Instances not selected for docs still get an entry written with empty docs (by d

**Module:** `format_builder.py`

Each instance gets exactly one answer format, chosen by weighted random selection from available options.
Each instance gets exactly one answer format, chosen by weighted random selection from the available options; the `code_with_explanation` format additionally requires an LLM call.

### Format types

**Complete function** (weight: 0.4)
**Code with explanation** (weight: 0.40)
````
The issue is that `process_data` uses `dict.items()` instead of iterating
over the sorted keys, which causes non-deterministic output.

```python
def process_data(data):
    for key in sorted(data.keys()):
        yield key, data[key]
```

This ensures consistent ordering regardless of insertion order.
````
Natural AI assistant response with prose explanation + code block. Generated by wrapping one of the base code formats with an LLM-generated explanation. This is the most realistic format — it matches how Claude, Cursor, and other AI coding assistants actually respond.

**Complete function** (weight: 0.25)
```python
def validate_response(self, response):
    if response.status_code != 200:
@@ -129,7 +140,15 @@ def validate_response(self, response):
```
Extracted via Python AST from the patched source. Only available when changes are inside a function (~60% of patches).

**Edit-style** (weight: 0.3)
**Fragment** (weight: 0.20)
```python
if max_age is not None:
    self.cookies[key]["max-age"] = max_age
    self.cookies[key]["expires"] = http_date(time.time() + max_age)
```
Added/changed lines from the diff with surrounding context.
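
Pulling added lines out of a unified diff can be sketched as follows (a simplification; the real extractor also keeps surrounding context lines):

```python
def added_lines(patch: str) -> list[str]:
    """Collect '+' lines from a unified diff, skipping '+++' file headers."""
    return [
        line[1:]
        for line in patch.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
```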

**Edit-style** (weight: 0.15)
```
In file django/http/response.py, replace:
def set_cookie(self, key, value=""):
@@ -142,14 +161,6 @@ with:
```
Available for all patches where changed regions can be extracted.

**Fragment** (weight: 0.3)
```python
if max_age is not None:
    self.cookies[key]["max-age"] = max_age
    self.cookies[key]["expires"] = http_date(time.time() + max_age)
```
Added/changed lines from the diff with surrounding context.

**Output:** `data/code_hallucination/formats.jsonl`

---
@@ -185,16 +196,28 @@ The LLM returns structured output:
}
```

Spans are found by string-matching each `change["hallucinated"]` in `hallucinated_code`. This produces clean, meaningful spans (minimum 3 chars) with zero noise.
Spans are found by string-matching each `change["hallucinated"]` in `hallucinated_code`. This produces clean, meaningful spans (minimum 15 chars) with zero noise.
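String-matching span extraction can be sketched like this (a hypothetical helper; the span dict keys are assumptions, not taken from the codebase):

```python
def find_spans(hallucinated_code: str, changes: list[dict]) -> list[dict]:
    """Locate each hallucinated snippet in the answer via exact substring match."""
    labels = []
    for change in changes:
        snippet = change["hallucinated"]
        start = hallucinated_code.find(snippet)
        if start != -1:  # skip snippets the LLM failed to reproduce verbatim
            labels.append({"start": start, "end": start + len(snippet)})
    return labels
```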

For answers containing both code and prose (code_with_explanation format), the injector places errors in both parts — e.g., wrong API in code + misleading description in text.

### Quality controls

- Each span must be 20-150 characters (enforced by prompt)
- Total hallucinated coverage must be < 40% of the answer (enforced by prompt)
- `_validate_labels()` rejects samples with coverage > 60% or spans < 15 chars
- Failed validation triggers up to 3 retries before skipping
- No comment data leaks (prompt explicitly forbids `# wrong`, `# error`, etc.)
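
The validation step can be sketched as follows (a hypothetical re-implementation of `_validate_labels`; thresholds mirror the bullets above):

```python
def validate_labels(answer: str, labels: list[dict],
                    min_span: int = 15, max_coverage: float = 0.60) -> bool:
    """Accept only samples whose spans are long enough and whose total
    hallucinated coverage stays under the ceiling."""
    if not labels or not answer:
        return False
    covered = sum(span["end"] - span["start"] for span in labels)
    if covered / len(answer) > max_coverage:
        return False
    return all(span["end"] - span["start"] >= min_span for span in labels)
```

A sample failing this check would be retried up to three times before being skipped.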

### Quality metrics (from 100-sample test runs)

| Metric | Value |
|--------|-------|
| Noise-only samples | 0% |
| Min span length | 10 chars |
| Avg span length | 70 chars |
| Avg spans per sample | 1.2 |
| Min span length | 15 chars |
| Avg span length | 71 chars |
| Avg spans per sample | 2.8 |
| Coverage range | 2.8-43% |
| Mean coverage | 19.5% |

**Output:** `data/code_hallucination/hallucinated_samples.jsonl`

6 changes: 3 additions & 3 deletions scripts/code_hallucination/config.py
@@ -30,7 +30,7 @@
# Context7
CONTEXT7_BASE = "https://context7.com/api/v2"
CONTEXT7_API_KEY = os.environ.get("CONTEXT7_API_KEY", "")
DOCS_RATIO = 0.5 # Only fetch docs for 50% of instances
DOCS_RATIO = 0.2 # Only fetch docs for 20% of instances

# === Dataset Config ===
HALLUCINATION_RATIO = 0.4 # 40% hallucinated, 60% clean
@@ -48,8 +48,8 @@
HALLUCINATION_TYPES = ["structural", "behavioral", "semantic"]

# Answer format types
FORMAT_TYPES = ["complete_function", "edit_style", "fragment"]
FORMAT_WEIGHTS = [0.4, 0.3, 0.3] # Target distribution
FORMAT_TYPES = ["complete_function", "edit_style", "fragment", "code_with_explanation"]
FORMAT_WEIGHTS = [0.25, 0.15, 0.20, 0.40] # Target distribution

# SWE-bench datasets
SWEBENCH_FULL = "princeton-nlp/SWE-bench"
38 changes: 28 additions & 10 deletions scripts/code_hallucination/context7_docs.py
@@ -106,21 +106,39 @@ def fetch_context7_docs(
    return None


def repo_to_library(repo: str) -> str | None:
    """Map a GitHub repo name to its primary library name for Context7.

    :param repo: GitHub repo path like 'django/django' or 'scikit-learn/scikit-learn'.
    :return: Library name string, or None if unknown.
    """
    repo_lower = repo.lower()
    for key, lib in PATH_TO_LIB.items():
        if key in repo_lower:
            return lib
    return None


def get_documentation_for_instance(
    changed_files: list[str], patch: str, problem_statement: str, repo: str = ""
) -> dict[str, str]:
    """Fetch documentation for libraries referenced in an instance."""
    imported_libs = extract_imports_from_patch(patch)
    path_libs = extract_libraries_from_files(changed_files)
    all_libs = list(set(imported_libs + path_libs))
    """Fetch documentation for the primary library of the instance's repo.

    Only fetches docs for the library that matches the repo (e.g., django docs
    for django/django), not for random imports like sys or re.

    :param repo: GitHub repo path, used to determine which library to fetch docs for.
    """
    primary_lib = repo_to_library(repo)
    if not primary_lib:
        return {}

    short_query = problem_statement[:200].replace("\n", " ").strip()

    docs = {}
    for lib in all_libs[:3]:
        doc = fetch_context7_docs(lib, short_query)
        if doc:
            docs[lib] = doc
    doc = fetch_context7_docs(primary_lib, short_query)
    if doc:
        docs[primary_lib] = doc

    return docs

@@ -197,7 +215,7 @@ def run(instances: list[dict]):
        changed_files = extract_changed_files(inst["patch"])

        docs = get_documentation_for_instance(
            changed_files, inst["patch"], inst["problem_statement"]
            changed_files, inst["patch"], inst["problem_statement"], repo=inst.get("repo", "")
        )

        entry = {"instance_id": instance_id, "docs": docs}