Summary
`extraction.py` and `memory_strategies.py` use bare `json.loads()` to parse LLM responses. Many models (especially smaller, faster ones such as llama-3.1-8b and llama-3.2-3b) wrap JSON output in markdown code fences (```json ... ```) or add commentary around the JSON. This causes `json.JSONDecodeError` and silent data loss.
Affected code
extraction.py:
- `extract_entities_llm()` at line 153: `json.loads(response.content).get("entities", [])`
- `extract_topics_llm()` at line 192: `json.loads(response.content).get("topics", [])`

memory_strategies.py:
- `DiscreteMemoryStrategy.extract_memories()` at line 175: `json.loads(response.content)`
- `SummaryMemoryStrategy.extract_memories()` at line 267
- `UserPreferencesMemoryStrategy.extract_memories()` at line 360
- `CustomMemoryStrategy.extract_memories()` at line 444

All 6 call sites use bare `json.loads()` without any pre-processing.
Reproduction
```python
import json

# What the LLM actually returns:
response = '```json\n{"entities": ["Redis", "Snowflake"]}\n```'
json.loads(response)  # ❌ raises json.JSONDecodeError

# In extract_entities_llm: all 3 retries fail, returns []
# In DiscreteMemoryStrategy: all 3 retries fail, raises RetryError
```
Actual LLM responses observed in production (Snowflake Cortex llama3.1-8b):
```json
{"entities": ["Apache Kafka", "Netflix", "PyTorch"]}
```
Here are the extracted topics:
```json
{"topics": ["data engineering", "recommendation engines"]}
```
I found these topics in the text.
A full reproduction test is available at `tests/test_upstream_issues.py::TestMarkdownFenceJsonParsing`.
Impact
- extraction.py: `extract_entities_llm` and `extract_topics_llm` silently return `[]` — entities and topics are never populated for affected memories. The catch block logs the error, but the data is lost.
- memory_strategies.py: `DiscreteMemoryStrategy.extract_memories` raises `RetryError` after 3 failed attempts, which can crash the worker task.
Suggested fix
Add a helper to strip markdown fences before `json.loads()`:

```python
import re

def _strip_markdown_fences(text: str) -> str:
    """Strip markdown code fences from LLM responses."""
    match = re.search(r'```(?:json)?\s*\n?(.*?)\n?\s*```', text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return text
```

Apply to all 6 call sites:

```python
response_data = json.loads(_strip_markdown_fences(response.content))
```
Note: `response_format={"type": "json_object"}` is already passed to the LLM, but many models (especially open-source ones) ignore this hint and still wrap output in markdown.
Environment
- AMS version: `main` at `fd73560`
- Discovered with Snowflake Cortex `llama3.1-8b` and `mistral-large2`, but affects any model that wraps JSON in markdown fences