Summary
`extraction.py` and `memory_strategies.py` use bare `json.loads()` to parse LLM responses. Many models (especially smaller, faster ones such as llama-3.1-8b and llama-3.2-3b) wrap JSON output in markdown code fences (```json ... ```) or add commentary around the JSON. This causes `json.JSONDecodeError` and silent data loss.
Affected code
extraction.py:
- `extract_entities_llm()` at line 153: `json.loads(response.content).get("entities", [])`
- `extract_topics_llm()` at line 192: `json.loads(response.content).get("topics", [])`

memory_strategies.py:
- `DiscreteMemoryStrategy.extract_memories()` at line 175: `json.loads(response.content)`
- `SummaryMemoryStrategy.extract_memories()` at line 267
- `UserPreferencesMemoryStrategy.extract_memories()` at line 360
- `CustomMemoryStrategy.extract_memories()` at line 444

All 6 call sites use bare `json.loads()` without any pre-processing.
Reproduction
```python
import json

# What the LLM actually returns:
response = '```json\n{"entities": ["Redis", "Snowflake"]}\n```'
json.loads(response)  # ❌ raises json.JSONDecodeError

# In extract_entities_llm: all 3 retries fail, returns []
# In DiscreteMemoryStrategy: all 3 retries fail, raises RetryError
```
Actual LLM responses observed in production (Snowflake Cortex llama3.1-8b):
```json
{"entities": ["Apache Kafka", "Netflix", "PyTorch"]}
```
Here are the extracted topics:
```json
{"topics": ["data engineering", "recommendation engines"]}
```
I found these topics in the text.
A full reproduction test is available at `tests/test_upstream_issues.py::TestMarkdownFenceJsonParsing`.
Impact
- extraction.py: `extract_entities_llm` and `extract_topics_llm` silently return `[]` — entities and topics are never populated for affected memories. The catch block logs the error, but the data is lost.
- memory_strategies.py: `DiscreteMemoryStrategy.extract_memories` raises `RetryError` after 3 failed attempts, which can crash the worker task.
Suggested fix
Add a helper to strip markdown fences before `json.loads()`:

```python
import re

def _strip_markdown_fences(text: str) -> str:
    """Strip markdown code fences from LLM responses."""
    match = re.search(r'```(?:json)?\s*\n?(.*?)\n?\s*```', text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return text
```

Apply to all 6 call sites:

```python
response_data = json.loads(_strip_markdown_fences(response.content))
```
Note: `response_format={"type": "json_object"}` is already passed to the LLM, but many models (especially open-source ones) ignore this hint and still wrap output in markdown.
Environment
- AMS version: `main` at `fd73560`
- Discovered with Snowflake Cortex `llama3.1-8b` and `mistral-large2`, but affects any model that wraps JSON in markdown fences