diff --git a/docs/code-hallucination/configuration.md b/docs/code-hallucination/configuration.md index e77e073..d87ee08 100644 --- a/docs/code-hallucination/configuration.md +++ b/docs/code-hallucination/configuration.md @@ -19,7 +19,7 @@ These can also be overridden via CLI flags (`--api-key`, `--base-url`, `--model` | Parameter | Default | Description | |-----------|---------|-------------| | `HALLUCINATION_RATIO` | `0.4` | Fraction of instances that get hallucination injection | -| `DOCS_RATIO` | `0.5` | Fraction of instances that get Context7 documentation | +| `DOCS_RATIO` | `0.2` | Fraction of instances that get Context7 documentation | | `MAX_FILE_CHARS` | `12000` | Maximum characters per source file | | `MAX_CONTEXT7_CHARS` | `4000` | Maximum characters per library doc | | `LLM_TEMPERATURE` | `0.7` | Temperature for query rewriting | @@ -31,9 +31,10 @@ These can also be overridden via CLI flags (`--api-key`, `--base-url`, `--model` | Format | Weight | Description | |--------|--------|-------------| -| `complete_function` | 0.4 | Full patched function body via AST | -| `edit_style` | 0.3 | "In file X, replace Y with Z" | -| `fragment` | 0.3 | Added/changed lines from diff | +| `code_with_explanation` | 0.40 | Natural AI assistant response with prose + code block (LLM-generated) | +| `complete_function` | 0.25 | Full patched function body via AST | +| `fragment` | 0.20 | Added/changed lines from diff | +| `edit_style` | 0.15 | "In file X, replace Y with Z" | ## Hallucination Types diff --git a/docs/code-hallucination/index.md b/docs/code-hallucination/index.md index 4a6f01f..5cd0c90 100644 --- a/docs/code-hallucination/index.md +++ b/docs/code-hallucination/index.md @@ -17,7 +17,7 @@ This pipeline generates samples where an LLM coding assistant answers a develope | **Repos** | 53 unique repos, zero overlap between splits | | **Clean/hallucinated ratio** | ~60% clean / ~40% hallucinated | | **Hallucination types** | Structural, behavioral, semantic | -| **Answer formats** | Complete function, edit-style, code fragment | +| **Answer formats** | Code with explanation, complete function, fragment, edit-style | | **Annotation granularity** | Character-level spans | ## Quick Start @@ -127,7 +127,7 @@ Each sample follows the `HallucinationSample` format used by LettuceDetect: ``` - **`prompt`**: Source code files + documentation + user query -- **`answer`**: Code in one of three formats (complete function, edit-style, fragment) +- **`answer`**: Code in one of four formats (code with explanation, complete function, fragment, edit-style) - **`labels`**: Character-level span annotations (empty for clean samples) - **`split`**: train/dev/test (inherited from SWE-bench, zero repo overlap) @@ -137,8 +137,8 @@ The pipeline works with any OpenAI-compatible API. Tested with: | Provider | Model | Notes | |----------|-------|-------| -| [Groq](https://groq.com) | `moonshotai/kimi-k2-instruct-0905` | Fast, free tier | -| [Groq](https://groq.com) | `llama-3.3-70b-versatile` | Good quality | +| [Groq](https://groq.com) | `openai/gpt-oss-120b` | Best quality, recommended | +| [Groq](https://groq.com) | `moonshotai/kimi-k2-instruct-0905` | Fast | | [Novita AI](https://novita.ai) | `qwen/qwen3.5-27b` | Good for bulk generation | | Local (vLLM/Ollama) | Any model | Free, best for large runs | @@ -183,10 +183,10 @@ data/code_hallucination/ Each SWE-bench instance produces exactly one sample — either clean (gold patch answer) or hallucinated (LLM-injected). No instance appears in both classes. 
This avoids the artificial pairing problem, where models learn to recognize the specific instance rather than the hallucination.
 
 ### JSON-based span annotations
-Hallucination spans are extracted from the LLM's structured JSON response, not from difflib character-level diffs. The LLM returns `{"hallucinated_code": "...", "changes": [{"original": "...", "hallucinated": "..."}]}` and spans are found by string matching. This produces clean, meaningful spans (avg 70 chars) instead of noisy character-level artifacts (1-3 char noise from difflib).
+Hallucination spans are extracted from the LLM's structured JSON response, not from difflib character-level diffs. The LLM returns `{"hallucinated_code": "...", "changes": [{"original": "...", "hallucinated": "..."}]}` and spans are found by string matching. Quality controls enforce 2-3 spans per sample (avg 2.8), minimum 15 chars per span, and total coverage under 60%.
 
-### 50/50 documentation split
-Half of instances include Context7 library documentation, half don't. This teaches models to handle both documented and undocumented scenarios.
+### 20% documentation split
+A subset of instances (20%) includes Context7 library documentation, filtered to only the repo's primary library. Documentation is also passed to the hallucination injector, enabling semantic hallucinations that contradict documented API behavior.
 
 ### Zero repo overlap between splits
 SWE-bench's train/dev/test splits naturally have zero repository overlap across 53 unique repos. This means test performance measures generalization to completely unseen codebases.
diff --git a/docs/code-hallucination/phases.md b/docs/code-hallucination/phases.md
index 6e93da0..40e30cc 100644
--- a/docs/code-hallucination/phases.md
+++ b/docs/code-hallucination/phases.md
@@ -92,19 +92,15 @@ Writes results to JSONL incrementally. On restart, skips already-processed `inst
 
 **Module:** `context7_docs.py`
 
-Fetches library documentation via the [Context7](https://context7.com) API for **50% of instances** (configurable via `DOCS_RATIO`).
+Fetches library documentation via the [Context7](https://context7.com) API for **20% of instances** (configurable via `DOCS_RATIO`).
 
 ### Library detection
 
-Detects libraries from:
-1. Import statements in the patch (`import django`, `from sklearn import ...`)
-2. File paths (`django/http/response.py` → django)
+Maps the instance's GitHub repo to its primary library (e.g., `django/django` → `django`, `scikit-learn/scikit-learn` → `scikit-learn`). Only fetches docs for the matching library — not for random imports like `sys` or `re`.
 
-Maps to Context7 library names via a predefined dictionary.
+### Why 20%?
 
-### 50/50 split rationale
-
-Half of samples include documentation context, half don't. This creates training variety — models learn to detect hallucinations both with and without documentation support.
+A minority of samples include documentation context, while most don't. This creates training variety — models learn to detect hallucinations both with and without documentation support. Documentation is also passed to the hallucination injector (Phase 6), enabling SEMANTIC hallucinations that contradict documented API behavior.
 
 Instances not selected for docs still get an entry written with empty docs (by design, not failure).
 
@@ -116,11 +112,26 @@ Instances not selected for docs still get an entry written with empty docs (by d
 
 **Module:** `format_builder.py`
 
-Each instance gets exactly one answer format, chosen by weighted random selection from available options.
+Each instance gets exactly one answer format, chosen by weighted random selection from the available options. The `code_with_explanation` format additionally requires LLM calls.
 
 ### Format types
 
-**Complete function** (weight: 0.4)
+**Code with explanation** (weight: 0.40)
+````
+The issue is that `process_data` uses `dict.items()` instead of iterating
+over the sorted keys, which causes non-deterministic output.
+
+```python
+def process_data(data):
+    for key in sorted(data.keys()):
+        yield key, data[key]
+```
+
+This ensures consistent ordering regardless of insertion order.
+````
+Natural AI assistant response with a prose explanation plus a code block. Generated by wrapping one of the base code formats with an LLM-generated explanation. This is the most realistic format — it matches how Claude, Cursor, and other AI coding assistants actually respond.
+
+**Complete function** (weight: 0.25)
 ```python
 def validate_response(self, response):
     if response.status_code != 200:
@@ -129,7 +140,15 @@ def validate_response(self, response):
 ```
 Extracted via Python AST from the patched source. Only available when changes are inside a function (~60% of patches).
 
-**Edit-style** (weight: 0.3)
+**Fragment** (weight: 0.20)
+```python
+if max_age is not None:
+    self.cookies[key]["max-age"] = max_age
+    self.cookies[key]["expires"] = http_date(time.time() + max_age)
+```
+Added/changed lines from the diff with surrounding context.
+
+**Edit-style** (weight: 0.15)
 ```
 In file django/http/response.py, replace:
     def set_cookie(self, key, value=""):
@@ -142,14 +161,6 @@ with:
 ```
 Available for all patches where changed regions can be extracted.
 
-**Fragment** (weight: 0.3)
-```python
-if max_age is not None:
-    self.cookies[key]["max-age"] = max_age
-    self.cookies[key]["expires"] = http_date(time.time() + max_age)
-```
-Added/changed lines from the diff with surrounding context.
-
 **Output:** `data/code_hallucination/formats.jsonl`
 
 ---
 
@@ -185,16 +196,28 @@ The LLM returns structured output:
 }
 ```
 
-Spans are found by string-matching each `change["hallucinated"]` in `hallucinated_code`. This produces clean, meaningful spans (minimum 3 chars) with zero noise.
+Spans are found by string-matching each `change["hallucinated"]` in `hallucinated_code`. This produces clean, meaningful spans (minimum 15 chars) with zero noise.
+
+For answers containing both code and prose (the code_with_explanation format), the injector places errors in both parts — e.g., a wrong API in the code plus a misleading description in the text.
+
+### Quality controls
+
+- Each span must be 20-150 characters (enforced by prompt)
+- Total hallucinated coverage must be < 40% of the answer (enforced by prompt)
+- `_validate_labels()` rejects samples with coverage > 60% or spans < 15 chars
+- Failed validation triggers up to two quality retries (three attempts total) before skipping
+- No label leakage through comments (prompt explicitly forbids `# wrong`, `# error`, etc.)
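For illustration, a minimal sketch of the span-matching step described above (simplified from `build_labels_from_changes`; the helper name is hypothetical):

```python
def spans_from_changes(hallucinated_code: str, changes: list[dict]) -> list[dict]:
    """Map each LLM-reported change to a character-level span (sketch)."""
    labels = []
    for change in changes:
        h_span = change.get("hallucinated", "")
        # Mirror the quality thresholds: drop short or unmatched spans.
        if len(h_span) < 15 or h_span not in hallucinated_code:
            continue
        start = hallucinated_code.index(h_span)
        labels.append({"start": start, "end": start + len(h_span)})
    return labels
```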
### Quality metrics (from 100-sample test runs) | Metric | Value | |--------|-------| | Noise-only samples | 0% | -| Min span length | 10 chars | -| Avg span length | 70 chars | -| Avg spans per sample | 1.2 | +| Min span length | 15 chars | +| Avg span length | 71 chars | +| Avg spans per sample | 2.8 | +| Coverage range | 2.8-43% | +| Mean coverage | 19.5% | **Output:** `data/code_hallucination/hallucinated_samples.jsonl` diff --git a/scripts/code_hallucination/config.py b/scripts/code_hallucination/config.py index 85591b9..7f71f99 100644 --- a/scripts/code_hallucination/config.py +++ b/scripts/code_hallucination/config.py @@ -30,7 +30,7 @@ # Context7 CONTEXT7_BASE = "https://context7.com/api/v2" CONTEXT7_API_KEY = os.environ.get("CONTEXT7_API_KEY", "") -DOCS_RATIO = 0.5 # Only fetch docs for 50% of instances +DOCS_RATIO = 0.2 # Only fetch docs for 20% of instances # === Dataset Config === HALLUCINATION_RATIO = 0.4 # 40% hallucinated, 60% clean @@ -48,8 +48,8 @@ HALLUCINATION_TYPES = ["structural", "behavioral", "semantic"] # Answer format types -FORMAT_TYPES = ["complete_function", "edit_style", "fragment"] -FORMAT_WEIGHTS = [0.4, 0.3, 0.3] # Target distribution +FORMAT_TYPES = ["complete_function", "edit_style", "fragment", "code_with_explanation"] +FORMAT_WEIGHTS = [0.25, 0.15, 0.20, 0.40] # Target distribution # SWE-bench datasets SWEBENCH_FULL = "princeton-nlp/SWE-bench" diff --git a/scripts/code_hallucination/context7_docs.py b/scripts/code_hallucination/context7_docs.py index 4c1df87..b287015 100644 --- a/scripts/code_hallucination/context7_docs.py +++ b/scripts/code_hallucination/context7_docs.py @@ -106,21 +106,39 @@ def fetch_context7_docs( return None +def repo_to_library(repo: str) -> str | None: + """Map a GitHub repo name to its primary library name for Context7. + + :param repo: GitHub repo path like 'django/django' or 'scikit-learn/scikit-learn'. + :return: Library name string, or None if unknown. + """ + repo_lower = repo.lower() + for key, lib in PATH_TO_LIB.items(): + if key in repo_lower: + return lib + return None + + def get_documentation_for_instance( - changed_files: list[str], patch: str, problem_statement: str + changed_files: list[str], patch: str, problem_statement: str, repo: str = "" ) -> dict[str, str]: - """Fetch documentation for libraries referenced in an instance.""" - imported_libs = extract_imports_from_patch(patch) - path_libs = extract_libraries_from_files(changed_files) - all_libs = list(set(imported_libs + path_libs)) + """Fetch documentation for the primary library of the instance's repo. + + Only fetches docs for the library that matches the repo (e.g., django docs + for django/django), not for random imports like sys or re. + + :param repo: GitHub repo path, used to determine which library to fetch docs for. 
+ """ + primary_lib = repo_to_library(repo) + if not primary_lib: + return {} short_query = problem_statement[:200].replace("\n", " ").strip() docs = {} - for lib in all_libs[:3]: - doc = fetch_context7_docs(lib, short_query) - if doc: - docs[lib] = doc + doc = fetch_context7_docs(primary_lib, short_query) + if doc: + docs[primary_lib] = doc return docs @@ -197,7 +215,7 @@ def run(instances: list[dict]): changed_files = extract_changed_files(inst["patch"]) docs = get_documentation_for_instance( - changed_files, inst["patch"], inst["problem_statement"] + changed_files, inst["patch"], inst["problem_statement"], repo=inst.get("repo", "") ) entry = {"instance_id": instance_id, "docs": docs} diff --git a/scripts/code_hallucination/format_builder.py b/scripts/code_hallucination/format_builder.py index 78531ab..9d290f7 100644 --- a/scripts/code_hallucination/format_builder.py +++ b/scripts/code_hallucination/format_builder.py @@ -2,8 +2,96 @@ import json import random - -from .config import FORMAT_TYPES, FORMAT_WEIGHTS, FORMATS_PATH, SOURCE_CACHE_DIR +import textwrap +import time + +from openai import OpenAI + +from .config import ( + API_BASE_URL, + API_KEY, + FORMAT_TYPES, + FORMAT_WEIGHTS, + FORMATS_PATH, + LLM_TEMPERATURE, + MAX_RETRIES, + MODEL, + RETRY_DELAY, + SOURCE_CACHE_DIR, +) + +EXPLANATION_SYSTEM_PROMPT = textwrap.dedent("""\ + You are a helpful AI coding assistant (like Claude or Cursor). + Given a user's coding question and the correct code fix, write a natural response + that a developer would receive from an AI assistant. + + Your response MUST: + - Start with a brief explanation (1-3 sentences) of what the issue is and how to fix it + - Include the code in a properly formatted code block (```python) + - Optionally end with a short note about what changed or why + + Your response must NOT: + - Include phrases like "Here's the fix" or "I'll help you with that" — just explain directly + - Be longer than necessary — keep it concise + - Change the code in any way — use it exactly as provided + - Add any imports or code not in the original + + Example style: + The issue is that `process_data` uses `dict.items()` instead of iterating + over the sorted keys, which causes non-deterministic output. + + ```python + def process_data(data): + for key in sorted(data.keys()): + yield key, data[key] + ``` + + This ensures consistent ordering regardless of insertion order. +""") + + +def _generate_explanation( + client: OpenAI, model: str, code: str, query: str, context: str +) -> str | None: + """Use LLM to wrap code in a natural explanation.""" + user_msg = f"""User's question: {query} + +Context (relevant source code): +{context[:3000]} + +Correct code fix: +```python +{code} +``` + +Write a natural AI assistant response that includes this exact code.""" + + for attempt in range(MAX_RETRIES): + try: + response = client.chat.completions.create( + model=model, + messages=[ + {"role": "system", "content": EXPLANATION_SYSTEM_PROMPT}, + {"role": "user", "content": user_msg}, + ], + temperature=LLM_TEMPERATURE, + max_tokens=2000, + ) + result = response.choices[0].message.content.strip() + # Verify the code is actually in the response + if code[:50] in result or "```" in result: + return result + if attempt < MAX_RETRIES - 1: + continue + return None + except Exception as e: + if attempt < MAX_RETRIES - 1: + wait = RETRY_DELAY * (attempt + 1) + print(f" Explanation error (attempt {attempt + 1}): {e}. 
Retrying in {wait}s...") + time.sleep(wait) + else: + return None + return None def assign_format(source_data: dict) -> tuple[str, str]: @@ -11,6 +99,10 @@ def assign_format(source_data: dict) -> tuple[str, str]: Returns (format_type, answer_text). Falls back if preferred format isn't available. + + Note: code_with_explanation is handled separately since it needs LLM calls. + This function returns ("code_with_explanation", base_code) and the caller + wraps it with an explanation. """ has_functions = bool(source_data.get("modified_functions")) has_edit = bool(source_data.get("edit_style")) @@ -28,9 +120,12 @@ def assign_format(source_data: dict) -> tuple[str, str]: if not available: return None, None + # code_with_explanation can use any base format + all_available = available + ["code_with_explanation"] + # Weighted random choice from available formats weights = [] - for fmt in available: + for fmt in all_available: idx = FORMAT_TYPES.index(fmt) weights.append(FORMAT_WEIGHTS[idx]) @@ -38,12 +133,22 @@ def assign_format(source_data: dict) -> tuple[str, str]: total = sum(weights) weights = [w / total for w in weights] - chosen = random.choices(available, weights=weights, k=1)[0] + chosen = random.choices(all_available, weights=weights, k=1)[0] # Build answer text - if chosen == "complete_function": + if chosen == "code_with_explanation": + # Pick the best base code to wrap with explanation + if has_functions: + funcs = source_data["modified_functions"] + func = max(funcs, key=lambda f: len(f.get("patched", ""))) + answer = func["patched"] + elif has_fragment: + answer = source_data["patch_code"] + elif has_edit: + answer = source_data["edit_style"] + return "code_with_explanation", answer + elif chosen == "complete_function": funcs = source_data["modified_functions"] - # Take the first (or longest) modified function func = max(funcs, key=lambda f: len(f.get("patched", ""))) answer = func["patched"] elif chosen == "edit_style": @@ -54,7 +159,14 @@ def assign_format(source_data: dict) -> tuple[str, str]: return chosen, answer -def run(instances: list[dict], source_cache_dir=SOURCE_CACHE_DIR): +def run( + instances: list[dict], + source_cache_dir=SOURCE_CACHE_DIR, + api_key: str = API_KEY, + base_url: str = API_BASE_URL, + model: str = MODEL, + queries: dict[str, str] | None = None, +): """Run Phase 5: Assign formats and build answers. Returns list of dicts with instance_id, format_type, answer. 
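The selection above renormalizes `FORMAT_WEIGHTS` over whichever formats are actually available for an instance. A standalone sketch of that behavior, using the weights from `config.py` (the `pick_format` helper is hypothetical):

```python
import random

# Target weights from config.FORMAT_WEIGHTS, keyed by format name.
WEIGHTS = {
    "complete_function": 0.25,
    "edit_style": 0.15,
    "fragment": 0.20,
    "code_with_explanation": 0.40,
}

def pick_format(available: list[str]) -> str:
    """Weighted random choice over the formats that survived extraction."""
    weights = [WEIGHTS[fmt] for fmt in available]
    total = sum(weights)
    return random.choices(available, weights=[w / total for w in weights], k=1)[0]

# If AST extraction found no modified function, complete_function drops out
# and the remaining 0.75 of the mass rescales: edit_style 0.20,
# fragment ~0.27, code_with_explanation ~0.53.
print(pick_format(["edit_style", "fragment", "code_with_explanation"]))
```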
@@ -65,9 +177,16 @@ def run(instances: list[dict], source_cache_dir=SOURCE_CACHE_DIR): FORMATS_PATH.parent.mkdir(parents=True, exist_ok=True) + if queries is None: + queries = {} + + # Only init LLM client if we'll need it (lazy) + client = None + results = [] format_counts = {fmt: 0 for fmt in FORMAT_TYPES} skipped = 0 + explanation_failures = 0 for inst in instances: instance_id = inst["instance_id"] @@ -86,6 +205,23 @@ def run(instances: list[dict], source_cache_dir=SOURCE_CACHE_DIR): skipped += 1 continue + # Generate explanation wrapper for code_with_explanation format + if fmt == "code_with_explanation": + if client is None: + client = OpenAI(api_key=api_key, base_url=base_url) + print(f" LLM client initialized for code_with_explanation ({base_url})") + + query = queries.get(instance_id, inst.get("problem_statement", "")[:500]) + context = source_data.get("patch_code", "") + explained = _generate_explanation(client, model, answer, query, context) + + if explained is None: + # Fallback: use raw code as fragment + fmt = "fragment" + explanation_failures += 1 + else: + answer = explained + results.append( { "instance_id": instance_id, @@ -101,6 +237,8 @@ def run(instances: list[dict], source_cache_dir=SOURCE_CACHE_DIR): f.write(json.dumps(entry) + "\n") print(f"\nAssigned formats for {len(results)} instances (skipped {skipped})") + if explanation_failures: + print(f" Explanation generation failures (fell back to fragment): {explanation_failures}") for fmt, count in format_counts.items(): pct = count * 100 // max(len(results), 1) print(f" {fmt}: {count} ({pct}%)") diff --git a/scripts/code_hallucination/hallucination_injector.py b/scripts/code_hallucination/hallucination_injector.py index 19dfeed..941315c 100644 --- a/scripts/code_hallucination/hallucination_injector.py +++ b/scripts/code_hallucination/hallucination_injector.py @@ -27,7 +27,8 @@ INJECTION_SYSTEM_PROMPT = textwrap.dedent("""\ You are a code hallucination injector for building a hallucination detection dataset. - Given correct code and context, create a hallucinated version with a specific type of error. + Given a correct answer (which may be pure code OR code with natural language explanation) + and context, create a hallucinated version with specific types of errors. Hallucination types: - STRUCTURAL: Change a function call, import, or parameter to something that @@ -37,20 +38,34 @@ off-by-one errors, swapped conditions, wrong argument values. - SEMANTIC: Code that looks like it addresses the user's request but does something subtly different or opposite. The code parses, uses real APIs, - but fails to do what was asked. + but fails to do what was asked. If library documentation is provided, + you can make the code contradict the documented API (wrong parameter names, + wrong return types, deprecated usage, etc.). + For answers with explanations, you may also make the explanation contradict + the code or describe incorrect behavior. 
Rules: - - Make changes PLAUSIBLE - something an LLM would realistically generate + - Make 2-3 DISTINCT changes spread across different parts of the answer + - Each changed span must be 20-150 characters long (not too short, not too long) + - Total hallucinated text must be LESS THAN 40% of the original answer length + - Keep most of the answer CORRECT — do NOT rewrite the entire thing + - Changes should be in different functions/blocks/paragraphs, not adjacent lines + - Make changes PLAUSIBLE — something an LLM would realistically generate - Changes must be SUBTLE, not obviously broken - - The hallucinated code must still be syntactically valid - - Make 1-3 changes, not more + - The code in the hallucinated answer must still be syntactically valid + - Do NOT add comments explaining or hinting at the hallucination (no "# wrong", + "# error", "# typo", "# nonexistent", etc.) — the errors must be invisible + to someone skimming the answer + - If the answer contains both code and explanation, inject errors in BOTH parts + (e.g. wrong API in code + misleading description in text) + - Preserve the overall structure: keep markdown formatting, code blocks, etc. Respond in this exact JSON format (no markdown, no code blocks): { - "hallucinated_code": "the full modified code with hallucinations injected", + "hallucinated_code": "the full modified answer with hallucinations injected", "changes": [ { - "original": "exact original code that was changed", + "original": "exact original text that was changed", "hallucinated": "what you changed it to", "explanation": "why this is a hallucination" } @@ -58,8 +73,10 @@ } IMPORTANT: - - "original" must be an exact substring of the correct code - - "hallucinated" must be an exact substring of your hallucinated_code + - You MUST include 2-3 changes in the "changes" array + - "original" must be an exact substring of the correct answer + - "hallucinated" must be an exact substring of your hallucinated answer + - Each "hallucinated" value must be at least 20 characters long - Return ONLY valid JSON, nothing else """) @@ -71,17 +88,26 @@ def inject_hallucination( hall_type: str, user_query: str = "", context: str = "", + documentation: dict[str, str] | None = None, ) -> dict | None: """Inject a hallucination and get back structured JSON with spans. Returns dict with 'hallucinated_code' and 'changes', or None if failed. 
""" + docs_section = "" + if documentation: + docs_parts = [f"Documentation for {lib}:\n{doc}" for lib, doc in documentation.items()] + docs_section = ( + "\n\nLibrary documentation (the hallucination could contradict this):\n" + + "\n\n".join(docs_parts) + ) + user_msg = f"""Hallucination type to inject: {hall_type.upper()} -User's original request: {user_query[:500]} +User's original request: {user_query} Context (source code): -{context[:2000]} +{context}{docs_section} Correct code to modify: {clean_answer} @@ -151,7 +177,7 @@ def build_labels_from_changes( labels = [] for change in changes: h_span = change.get("hallucinated", "") - if not h_span or len(h_span) < 3: + if not h_span or len(h_span) < 15: continue if h_span not in hallucinated_code: continue @@ -190,14 +216,23 @@ async def _inject_one_async( hall_type: str, user_query: str, context: str, + documentation: dict[str, str] | None = None, ) -> dict | None: """Async version of inject_hallucination for batch processing.""" + docs_section = "" + if documentation: + docs_parts = [f"Documentation for {lib}:\n{doc}" for lib, doc in documentation.items()] + docs_section = ( + "\n\nLibrary documentation (the hallucination could contradict this):\n" + + "\n\n".join(docs_parts) + ) + user_msg = f"""Hallucination type to inject: {hall_type.upper()} -User's original request: {user_query[:500]} +User's original request: {user_query} Context (source code): -{context[:2000]} +{context}{docs_section} Correct code to modify: {clean_answer} @@ -233,6 +268,29 @@ async def _inject_one_async( return None +def _validate_labels(hallucinated_code: str, labels: list[dict]) -> tuple[bool, str]: + """Validate that hallucination labels meet quality thresholds. + + :return: (is_valid, reason) tuple. + """ + if not labels: + return False, "no_labels" + + total_span = sum(lab["end"] - lab["start"] for lab in labels) + code_len = len(hallucinated_code) if hallucinated_code else 1 + coverage = total_span / code_len + + if coverage > 0.60: + return False, f"coverage_too_high ({coverage:.0%})" + + for lab in labels: + span_len = lab["end"] - lab["start"] + if span_len < 15: + return False, f"span_too_short ({span_len} chars)" + + return True, "" + + def _process_result(result, instance_id, hall_type, fmt_data, model): """Process a single injection result into a JSONL entry.""" if result is None: @@ -240,8 +298,11 @@ def _process_result(result, instance_id, hall_type, fmt_data, model): hallucinated_code = result["hallucinated_code"] changes = result.get("changes", []) labels = build_labels_from_changes(hallucinated_code, changes, hall_type) - if not labels: + + valid, reason = _validate_labels(hallucinated_code, labels) + if not valid: return None + return { "instance_id": instance_id, "hallucinated_answer": hallucinated_code, @@ -256,6 +317,7 @@ def run( instances_to_inject: list[dict], formats: dict[str, dict], queries: dict[str, str], + docs: dict[str, dict] | None = None, api_key: str = API_KEY, base_url: str = API_BASE_URL, model: str = MODEL, @@ -269,6 +331,9 @@ def run( print("Phase 6: Hallucination Injection") print("=" * 60) + if docs is None: + docs = {} + HALLUCINATED_PATH.parent.mkdir(parents=True, exist_ok=True) print(f"Using {base_url} with model {model}") @@ -285,9 +350,9 @@ def run( print(f"Remaining: {len(to_process)} instances to inject") if BATCH_SIZE > 1: - results = _run_batched(to_process, formats, queries, api_key, base_url, model) + results = _run_batched(to_process, formats, queries, docs, api_key, base_url, model) else: - results = 
_run_sequential(to_process, formats, queries, api_key, base_url, model) + results = _run_sequential(to_process, formats, queries, docs, api_key, base_url, model) # Stats type_counts = {} @@ -307,7 +372,7 @@ def run( return results -def _run_sequential(to_process, formats, queries, api_key, base_url, model): +def _run_sequential(to_process, formats, queries, docs, api_key, base_url, model): """Sequential processing for remote APIs (rate-limited).""" client = OpenAI(api_key=api_key, base_url=base_url) processed = 0 @@ -326,10 +391,24 @@ def _run_sequential(to_process, formats, queries, api_key, base_url, model): hall_type = HALLUCINATION_TYPES[i % len(HALLUCINATION_TYPES)] query = queries.get(instance_id, "") - context = inst.get("problem_statement", "")[:2000] - - result = inject_hallucination(client, model, clean_answer, hall_type, query, context) - entry = _process_result(result, instance_id, hall_type, fmt_data, model) + context = inst.get("problem_statement", "") + instance_docs = docs.get(instance_id, {}) + + # Try injection with up to 2 quality retries + entry = None + for attempt in range(3): + result = inject_hallucination( + client, + model, + clean_answer, + hall_type, + query, + context, + documentation=instance_docs, + ) + entry = _process_result(result, instance_id, hall_type, fmt_data, model) + if entry is not None: + break if entry is None: if result is not None: @@ -349,7 +428,7 @@ def _run_sequential(to_process, formats, queries, api_key, base_url, model): return results -def _run_batched(to_process, formats, queries, api_key, base_url, model): +def _run_batched(to_process, formats, queries, docs, api_key, base_url, model): """Async batch processing for local vLLM (no rate limiting needed).""" aclient = AsyncOpenAI(api_key=api_key, base_url=base_url) processed = 0 @@ -378,10 +457,19 @@ async def process_batches(): hall_type = HALLUCINATION_TYPES[global_idx % len(HALLUCINATION_TYPES)] query = queries.get(instance_id, "") - context = inst.get("problem_statement", "")[:2000] + context = inst.get("problem_statement", "") + instance_docs = docs.get(instance_id, {}) tasks.append( - _inject_one_async(aclient, model, clean_answer, hall_type, query, context) + _inject_one_async( + aclient, + model, + clean_answer, + hall_type, + query, + context, + documentation=instance_docs, + ) ) batch_meta.append((instance_id, hall_type, fmt_data)) diff --git a/scripts/code_hallucination/pipeline.py b/scripts/code_hallucination/pipeline.py index d4e79a9..ea7fc23 100644 --- a/scripts/code_hallucination/pipeline.py +++ b/scripts/code_hallucination/pipeline.py @@ -93,10 +93,11 @@ def run_test(n: int = 5, api_key: str = API_KEY, base_url: str = API_BASE_URL, m run_docs(selected) - # Phase 5: Assign formats + # Phase 5: Assign formats (needs LLM for code_with_explanation) from .format_builder import run as run_formats - run_formats(selected) + queries_dict = load_jsonl_dict(QUERIES_PATH, value_key="query") + run_formats(selected, api_key=api_key, base_url=base_url, model=model, queries=queries_dict) # Phase 8: Select targets (before phase 6) from .splitter import select_hallucination_targets @@ -107,16 +108,17 @@ def run_test(n: int = 5, api_key: str = API_KEY, base_url: str = API_BASE_URL, m from .hallucination_injector import run as run_inject formats = load_jsonl_dict(FORMATS_PATH) - queries = load_jsonl_dict(QUERIES_PATH, value_key="query") + docs = load_jsonl_dict(DOCS_PATH, value_key="docs") to_inject = [i for i in selected if i["instance_id"] in targets] - run_inject(to_inject, formats, 
queries, api_key=api_key, base_url=base_url, model=model) + run_inject( + to_inject, formats, queries_dict, docs=docs, api_key=api_key, base_url=base_url, model=model + ) # Phase 7: Assemble from .sample_assembler import run as run_assemble - docs = load_jsonl_dict(DOCS_PATH, value_key="docs") hallucinations = load_jsonl_dict(HALLUCINATED_PATH) - samples, metadata = run_assemble(selected, queries, docs, formats, hallucinations, targets) + samples, metadata = run_assemble(selected, queries_dict, docs, formats, hallucinations, targets) # Phase 9: Validate from .validator import run as run_validate @@ -189,7 +191,14 @@ def main(): from .format_builder import run from .swebench_loader import load_instances - run(load_instances()) + queries = load_jsonl_dict(QUERIES_PATH, value_key="query") + run( + load_instances(), + api_key=args.api_key, + base_url=args.base_url, + model=args.model, + queries=queries, + ) elif phase == 6: from .hallucination_injector import run from .splitter import select_hallucination_targets @@ -198,12 +207,14 @@ def main(): instances = load_instances() formats = load_jsonl_dict(FORMATS_PATH) queries = load_jsonl_dict(QUERIES_PATH, value_key="query") + docs = load_jsonl_dict(DOCS_PATH, value_key="docs") targets = select_hallucination_targets(instances) to_inject = [i for i in instances if i["instance_id"] in targets] run( to_inject, formats, queries, + docs=docs, api_key=args.api_key, base_url=args.base_url, model=args.model, diff --git a/scripts/code_hallucination/source_fetcher.py b/scripts/code_hallucination/source_fetcher.py index fa0c3ac..0fc25da 100644 --- a/scripts/code_hallucination/source_fetcher.py +++ b/scripts/code_hallucination/source_fetcher.py @@ -184,6 +184,93 @@ def get_functions(source: str) -> dict[str, str]: return modified +def apply_patch_in_memory(original_source: str, patch: str, filepath: str) -> str | None: + """Apply a unified diff patch to source code in memory, without git. + + Parses the unified diff to extract hunks for the target file, + then applies them line-by-line to produce the patched source. + + :param original_source: The original file content. + :param patch: The full unified diff (may contain multiple files). + :param filepath: The specific file to extract and apply hunks for. + :return: Patched source string, or None if application fails. + """ + # Split patch into per-file sections + file_patch_lines = [] + in_target_file = False + + for line in patch.split("\n"): + if line.startswith("diff --git"): + match = re.match(r"diff --git a/(.+?) b/(.+)$", line) + in_target_file = match is not None and match.group(2) == filepath + continue + if in_target_file: + file_patch_lines.append(line) + + if not file_patch_lines: + return None + + # Parse hunks from the file-specific patch lines + hunks = [] + current_hunk = None + + for line in file_patch_lines: + if line.startswith("@@"): + # Parse hunk header: @@ -old_start,old_count +new_start,new_count @@ + hunk_match = re.match(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? 
@@", line) + if hunk_match: + current_hunk = { + "old_start": int(hunk_match.group(1)), + "lines": [], + } + hunks.append(current_hunk) + elif line.startswith("---") or line.startswith("+++"): + continue + elif current_hunk is not None: + current_hunk["lines"].append(line) + + if not hunks: + return None + + # Apply hunks to original source + original_lines = original_source.split("\n") + result_lines = [] + orig_idx = 0 # 0-based index into original_lines + + try: + for hunk in hunks: + hunk_start = hunk["old_start"] - 1 # Convert to 0-based + + # Copy unchanged lines before this hunk + while orig_idx < hunk_start and orig_idx < len(original_lines): + result_lines.append(original_lines[orig_idx]) + orig_idx += 1 + + # Apply hunk lines + for line in hunk["lines"]: + if line.startswith("+"): + result_lines.append(line[1:]) + elif line.startswith("-"): + orig_idx += 1 # Skip the removed line + elif line.startswith(" "): + result_lines.append(line[1:]) + orig_idx += 1 + elif line == "": + # Empty line in diff context — treat as context + if orig_idx < len(original_lines): + result_lines.append(original_lines[orig_idx]) + orig_idx += 1 + + # Copy remaining lines after last hunk + while orig_idx < len(original_lines): + result_lines.append(original_lines[orig_idx]) + orig_idx += 1 + + return "\n".join(result_lines) + except (IndexError, ValueError): + return None + + def extract_code_from_patch(patch: str) -> str: """Extract added/changed lines from a unified diff as code fragment. @@ -368,17 +455,20 @@ def fetch_source_for_instance( # Edit-style format edit_style = build_edit_style_answer(patch, changed_files) - # Complete function format (needs patched source via git apply) + # Complete function format — extract modified functions modified_functions = [] - if repo_dir is not None: - for filepath in changed_files: - if filepath in source_files: - patched_source = apply_patch_and_get_file(repo_dir, commit, patch, filepath) - if patched_source: - funcs = extract_modified_functions(source_files[filepath], patched_source) - for func in funcs: - func["file"] = filepath - modified_functions.extend(funcs) + for filepath in changed_files: + if filepath not in source_files: + continue + if repo_dir is not None: + patched_source = apply_patch_and_get_file(repo_dir, commit, patch, filepath) + else: + patched_source = apply_patch_in_memory(source_files[filepath], patch, filepath) + if patched_source: + funcs = extract_modified_functions(source_files[filepath], patched_source) + for func in funcs: + func["file"] = filepath + modified_functions.extend(funcs) return { "instance_id": instance["instance_id"],