Merged
9 changes: 5 additions & 4 deletions docs/code-hallucination/configuration.md
@@ -19,7 +19,7 @@ These can also be overridden via CLI flags (`--api-key`, `--base-url`, `--model`
| Parameter | Default | Description |
|-----------|---------|-------------|
| `HALLUCINATION_RATIO` | `0.4` | Fraction of instances that get hallucination injection |
| `DOCS_RATIO` | `0.5` | Fraction of instances that get Context7 documentation |
| `DOCS_RATIO` | `0.2` | Fraction of instances that get Context7 documentation |
| `MAX_FILE_CHARS` | `12000` | Maximum characters per source file |
| `MAX_CONTEXT7_CHARS` | `4000` | Maximum characters per library doc |
| `LLM_TEMPERATURE` | `0.7` | Temperature for query rewriting |
@@ -31,9 +31,10 @@ These can also be overridden via CLI flags (`--api-key`, `--base-url`, `--model`

| Format | Weight | Description |
|--------|--------|-------------|
| `complete_function` | 0.4 | Full patched function body via AST |
| `edit_style` | 0.3 | "In file X, replace Y with Z" |
| `fragment` | 0.3 | Added/changed lines from diff |
| `code_with_explanation` | 0.40 | Natural AI assistant response with prose + code block (LLM-generated) |
| `complete_function` | 0.25 | Full patched function body via AST |
| `fragment` | 0.20 | Added/changed lines from diff |
| `edit_style` | 0.15 | "In file X, replace Y with Z" |
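
The weighted selection behind this table can be sketched with `random.choices` (a minimal illustration; `pick_format` is a hypothetical helper, and the real `format_builder.py` may differ):

```python
import random

# Names and weights mirror the table above.
FORMAT_TYPES = ["code_with_explanation", "complete_function", "fragment", "edit_style"]
FORMAT_WEIGHTS = [0.40, 0.25, 0.20, 0.15]

def pick_format(rng: random.Random) -> str:
    # Draw one format with probability proportional to its weight.
    return rng.choices(FORMAT_TYPES, weights=FORMAT_WEIGHTS, k=1)[0]
```

Over many instances the empirical distribution converges to the target weights.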

## Hallucination Types

14 changes: 7 additions & 7 deletions docs/code-hallucination/index.md
@@ -17,7 +17,7 @@ This pipeline generates samples where an LLM coding assistant answers a develope
| **Repos** | 53 unique repos, zero overlap between splits |
| **Clean/hallucinated ratio** | ~60% clean / ~40% hallucinated |
| **Hallucination types** | Structural, behavioral, semantic |
| **Answer formats** | Complete function, edit-style, code fragment |
| **Answer formats** | Code with explanation, complete function, fragment, edit-style |
| **Annotation granularity** | Character-level spans |

## Quick Start
@@ -127,7 +127,7 @@ Each sample follows the `HallucinationSample` format used by LettuceDetect:
```

- **`prompt`**: Source code files + documentation + user query
- **`answer`**: Code in one of three formats (complete function, edit-style, fragment)
- **`answer`**: Code in one of four formats (code with explanation, complete function, fragment, edit-style)
- **`labels`**: Character-level span annotations (empty for clean samples)
- **`split`**: train/dev/test (inherited from SWE-bench, zero repo overlap)
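
As a rough sketch of that shape (field values here are invented for illustration; the exact `HallucinationSample` schema lives in LettuceDetect):

```python
# Hypothetical sample; top-level field names follow the bullets above.
sample = {
    "prompt": "### File: django/http/response.py\n...\n### Query: Why is the cookie not expiring?",
    "answer": 'def set_cookie(self, key, value=""):\n    ...',
    "labels": [{"start": 4, "end": 14, "label": "hallucinated"}],
    "split": "train",
}

def is_clean(s: dict) -> bool:
    # Clean samples carry an empty label list.
    return len(s["labels"]) == 0
```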

Expand All @@ -137,8 +137,8 @@ The pipeline works with any OpenAI-compatible API. Tested with:

| Provider | Model | Notes |
|----------|-------|-------|
| [Groq](https://groq.com) | `moonshotai/kimi-k2-instruct-0905` | Fast, free tier |
| [Groq](https://groq.com) | `llama-3.3-70b-versatile` | Good quality |
| [Groq](https://groq.com) | `openai/gpt-oss-120b` | Best quality, recommended |
| [Groq](https://groq.com) | `moonshotai/kimi-k2-instruct-0905` | Fast |
| [Novita AI](https://novita.ai) | `qwen/qwen3.5-27b` | Good for bulk generation |
| Local (vLLM/Ollama) | Any model | Free, best for large runs |

Expand Down Expand Up @@ -183,10 +183,10 @@ data/code_hallucination/
Each SWE-bench instance produces exactly one sample — either clean (gold patch answer) or hallucinated (LLM-injected). No instance appears in both classes. This avoids the artificial pairing problem where models learn to distinguish the specific instance rather than the hallucination.

### JSON-based span annotations
Hallucination spans are extracted from the LLM's structured JSON response, not from difflib character-level diffs. The LLM returns `{"hallucinated_code": "...", "changes": [{"original": "...", "hallucinated": "..."}]}` and spans are found by string matching. This produces clean, meaningful spans (avg 70 chars) instead of noisy character-level artifacts (1-3 char noise from difflib).
Hallucination spans are extracted from the LLM's structured JSON response, not from difflib character-level diffs. The LLM returns `{"hallucinated_code": "...", "changes": [{"original": "...", "hallucinated": "..."}]}` and spans are found by string matching. Quality controls enforce 2-3 spans per sample (avg 2.8), minimum 15 chars per span, and total coverage under 60%.

### 50/50 documentation split
Half of instances include Context7 library documentation, half don't. This teaches models to handle both documented and undocumented scenarios.
### 20% documentation split
A subset of instances (20%) include Context7 library documentation, filtered to only the repo's primary library. Documentation is also passed to the hallucination injector, enabling semantic hallucinations that contradict documented API behavior.

### Zero repo overlap between splits
SWE-bench's train/dev/test splits naturally have zero repository overlap across 53 unique repos. This means test performance measures generalization to completely unseen codebases.
69 changes: 46 additions & 23 deletions docs/code-hallucination/phases.md
@@ -92,19 +92,15 @@ Writes results to JSONL incrementally. On restart, skips already-processed `inst

**Module:** `context7_docs.py`

Fetches library documentation via the [Context7](https://context7.com) API for **50% of instances** (configurable via `DOCS_RATIO`).
Fetches library documentation via the [Context7](https://context7.com) API for **20% of instances** (configurable via `DOCS_RATIO`).

### Library detection

Detects libraries from:
1. Import statements in the patch (`import django`, `from sklearn import ...`)
2. File paths (`django/http/response.py` → django)
Maps the instance's GitHub repo to its primary library (e.g., `django/django` → `django`, `scikit-learn/scikit-learn` → `scikit-learn`). Only fetches docs for the matching library — not for random imports like `sys` or `re`.

Maps to Context7 library names via a predefined dictionary.
### Why 20%?

### 50/50 split rationale

Half of samples include documentation context, half don't. This creates training variety — models learn to detect hallucinations both with and without documentation support.
A minority of samples include documentation context, while most don't. This creates training variety — models learn to detect hallucinations both with and without documentation support. Documentation is also passed to the hallucination injector (Phase 6), enabling SEMANTIC hallucinations that contradict documented API behavior.

Instances not selected for docs still get an entry written with empty docs (by design, not failure).

@@ -116,11 +112,26 @@ Instances not selected for docs still get an entry written with empty docs (by d

**Module:** `format_builder.py`

Each instance gets exactly one answer format, chosen by weighted random selection from available options.
Each instance gets exactly one answer format, chosen by weighted random selection from the available options; the `code_with_explanation` format additionally requires an LLM call.

### Format types

**Complete function** (weight: 0.4)
**Code with explanation** (weight: 0.40)
````
The issue is that `process_data` uses `dict.items()` instead of iterating
over the sorted keys, which causes non-deterministic output.

```python
def process_data(data):
    for key in sorted(data.keys()):
        yield key, data[key]
```

This ensures consistent ordering regardless of insertion order.
````
Natural AI assistant response with prose explanation + code block. Generated by wrapping one of the base code formats with an LLM-generated explanation. This is the most realistic format — it matches how Claude, Cursor, and other AI coding assistants actually respond.

**Complete function** (weight: 0.25)
```python
def validate_response(self, response):
    if response.status_code != 200:
@@ -129,7 +140,15 @@ def validate_response(self, response):
```
Extracted via Python AST from the patched source. Only available when changes are inside a function (~60% of patches).

**Edit-style** (weight: 0.3)
**Fragment** (weight: 0.20)
```python
if max_age is not None:
    self.cookies[key]["max-age"] = max_age
    self.cookies[key]["expires"] = http_date(time.time() + max_age)
```
Added/changed lines from the diff with surrounding context.
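
Pulling added lines out of a unified diff can be sketched as follows (a simplification; the real extractor also keeps surrounding context lines):

```python
def added_lines(patch: str) -> list[str]:
    """Collect '+' lines from a unified diff, skipping '+++' file headers."""
    return [
        line[1:]
        for line in patch.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
```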

**Edit-style** (weight: 0.15)
```
In file django/http/response.py, replace:
def set_cookie(self, key, value=""):
@@ -142,14 +161,6 @@ with:
```
Available for all patches where changed regions can be extracted.

**Fragment** (weight: 0.3)
```python
if max_age is not None:
    self.cookies[key]["max-age"] = max_age
    self.cookies[key]["expires"] = http_date(time.time() + max_age)
```
Added/changed lines from the diff with surrounding context.

**Output:** `data/code_hallucination/formats.jsonl`

---
@@ -185,16 +196,28 @@ The LLM returns structured output:
}
```

Spans are found by string-matching each `change["hallucinated"]` in `hallucinated_code`. This produces clean, meaningful spans (minimum 3 chars) with zero noise.
Spans are found by string-matching each `change["hallucinated"]` in `hallucinated_code`. This produces clean, meaningful spans (minimum 15 chars) with zero noise.
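String-matching span extraction can be sketched like this (a hypothetical helper; the span dict keys are assumptions, not taken from the codebase):

```python
def find_spans(hallucinated_code: str, changes: list[dict]) -> list[dict]:
    """Locate each hallucinated snippet in the answer via exact substring match."""
    labels = []
    for change in changes:
        snippet = change["hallucinated"]
        start = hallucinated_code.find(snippet)
        if start != -1:  # skip snippets the LLM failed to reproduce verbatim
            labels.append({"start": start, "end": start + len(snippet)})
    return labels
```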

For answers containing both code and prose (code_with_explanation format), the injector places errors in both parts — e.g., wrong API in code + misleading description in text.

### Quality controls

- Each span must be 20-150 characters (enforced by prompt)
- Total hallucinated coverage must be < 40% of the answer (enforced by prompt)
- `_validate_labels()` rejects samples with coverage > 60% or spans < 15 chars
- Failed validation triggers up to 3 retries before skipping
- No comment data leaks (prompt explicitly forbids `# wrong`, `# error`, etc.)
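
The validation step can be sketched as follows (a hypothetical re-implementation of `_validate_labels`; thresholds mirror the bullets above):

```python
def validate_labels(answer: str, labels: list[dict],
                    min_span: int = 15, max_coverage: float = 0.60) -> bool:
    """Accept only samples whose spans are long enough and whose total
    hallucinated coverage stays under the ceiling."""
    if not labels or not answer:
        return False
    covered = sum(span["end"] - span["start"] for span in labels)
    if covered / len(answer) > max_coverage:
        return False
    return all(span["end"] - span["start"] >= min_span for span in labels)
```

A sample failing this check would be retried up to three times before being skipped.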

### Quality metrics (from 100-sample test runs)

| Metric | Value |
|--------|-------|
| Noise-only samples | 0% |
| Min span length | 10 chars |
| Avg span length | 70 chars |
| Avg spans per sample | 1.2 |
| Min span length | 15 chars |
| Avg span length | 71 chars |
| Avg spans per sample | 2.8 |
| Coverage range | 2.8-43% |
| Mean coverage | 19.5% |

**Output:** `data/code_hallucination/hallucinated_samples.jsonl`

6 changes: 3 additions & 3 deletions scripts/code_hallucination/config.py
@@ -30,7 +30,7 @@
# Context7
CONTEXT7_BASE = "https://context7.com/api/v2"
CONTEXT7_API_KEY = os.environ.get("CONTEXT7_API_KEY", "")
DOCS_RATIO = 0.5 # Only fetch docs for 50% of instances
DOCS_RATIO = 0.2 # Only fetch docs for 20% of instances

# === Dataset Config ===
HALLUCINATION_RATIO = 0.4 # 40% hallucinated, 60% clean
@@ -48,8 +48,8 @@
HALLUCINATION_TYPES = ["structural", "behavioral", "semantic"]

# Answer format types
FORMAT_TYPES = ["complete_function", "edit_style", "fragment"]
FORMAT_WEIGHTS = [0.4, 0.3, 0.3] # Target distribution
FORMAT_TYPES = ["complete_function", "edit_style", "fragment", "code_with_explanation"]
FORMAT_WEIGHTS = [0.25, 0.15, 0.20, 0.40] # Target distribution

# SWE-bench datasets
SWEBENCH_FULL = "princeton-nlp/SWE-bench"
38 changes: 28 additions & 10 deletions scripts/code_hallucination/context7_docs.py
@@ -106,21 +106,39 @@ def fetch_context7_docs(
    return None


def repo_to_library(repo: str) -> str | None:
    """Map a GitHub repo name to its primary library name for Context7.

    :param repo: GitHub repo path like 'django/django' or 'scikit-learn/scikit-learn'.
    :return: Library name string, or None if unknown.
    """
    repo_lower = repo.lower()
    for key, lib in PATH_TO_LIB.items():
        if key in repo_lower:
            return lib
    return None


def get_documentation_for_instance(
    changed_files: list[str], patch: str, problem_statement: str, repo: str = ""
) -> dict[str, str]:
    """Fetch documentation for libraries referenced in an instance."""
    imported_libs = extract_imports_from_patch(patch)
    path_libs = extract_libraries_from_files(changed_files)
    all_libs = list(set(imported_libs + path_libs))
    """Fetch documentation for the primary library of the instance's repo.

    Only fetches docs for the library that matches the repo (e.g., django docs
    for django/django), not for random imports like sys or re.

    :param repo: GitHub repo path, used to determine which library to fetch docs for.
    """
    primary_lib = repo_to_library(repo)
    if not primary_lib:
        return {}

    short_query = problem_statement[:200].replace("\n", " ").strip()

    docs = {}
    for lib in all_libs[:3]:
        doc = fetch_context7_docs(lib, short_query)
        if doc:
            docs[lib] = doc
    doc = fetch_context7_docs(primary_lib, short_query)
    if doc:
        docs[primary_lib] = doc

    return docs

@@ -197,7 +215,7 @@ def run(instances: list[dict]):
        changed_files = extract_changed_files(inst["patch"])

        docs = get_documentation_for_instance(
            changed_files, inst["patch"], inst["problem_statement"]
            changed_files, inst["patch"], inst["problem_statement"], repo=inst.get("repo", "")
        )

        entry = {"instance_id": instance_id, "docs": docs}