Context
The current default decoder-only model for enrichment and consolidation is mlx-community/Ministral-3-3B-Instruct-2512-4bit. Google released Gemma 4 on April 2, 2026, and the E4B variant is a strong candidate to replace it.
Why Gemma 4 E4B
- Native structured JSON output / function calling -- reduces parsing failures in the enrichment pipeline (currently needs the fallback _parse_enrichment_text parser)
- 128K context window (vs ~32K) -- could eliminate the multi-window enrichment workaround for long transcripts
- ~4B effective params at similar memory footprint (~2.5 GB 4-bit)
- Apache 2.0 license (same as Ministral)
- Configurable reasoning mode could improve consolidation quality
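The memory claim above can be sanity-checked with a back-of-envelope estimate. This is a sketch only; the 10% overhead for quantization scales and any unquantized layers is an assumption, and the real footprint depends on group size and how PLE layers end up being handled:

```python
# Rough memory estimate for a 4-bit quantized model.
# Assumes 4 bits per weight plus ~10% overhead for quantization scales
# and unquantized layers (an assumption, not a measured figure).
def approx_4bit_footprint_gb(n_params: float, overhead: float = 0.10) -> float:
    bytes_per_param = 0.5  # 4 bits = half a byte
    return n_params * bytes_per_param * (1 + overhead) / 1e9

# ~4B effective params -> roughly 2.2 GB, consistent with the ~2.5 GB cited
print(f"{approx_4bit_footprint_gb(4e9):.1f} GB")
```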
Blocker: PLE quantization bug
As of April 11, 2026, standard MLX 4-bit quants from mlx-community and unsloth produce garbage output because they incorrectly quantize Per-Layer Embedding (PLE) layers. This is a novel architecture feature in Gemma 4's edge models.
Tracking: https://huggingface.co/mlx-community/gemma-4-e2b-4bit/discussions/1
Workarounds:
- bf16 versions work but use ~10 GB (too large for background enrichment)
- PLE-safe community quant exists: FakeRockert543/gemma-4-e4b-it-MLX-4bit
- Upstream fix expected soon (model is 9 days old)
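Before any candidate quant enters the A/B run, a cheap smoke test could flag the garbage-output failure mode. Runaway repetition is one common symptom of a broken quant; the heuristic and threshold below are assumptions, not a validated detector:

```python
def looks_degenerate(text: str, min_unique_ratio: float = 0.5) -> bool:
    """Flag output where too few distinct tokens repeat too often --
    a crude proxy for the degenerate output a mis-quantized model produces."""
    tokens = text.split()
    if len(tokens) < 10:
        return False  # too short to judge
    unique_ratio = len(set(tokens)) / len(tokens)
    return unique_ratio < min_unique_ratio

print(looks_degenerate("the the the the the the the the the the"))  # True
print(looks_degenerate("Gemma 4 E4B produced a coherent summary of the meeting notes"))  # False
```

A quant that trips this check on a handful of standard prompts can be rejected without running the full evaluation suite.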
Acceptance criteria
- Wait for stable PLE-safe 4-bit MLX quant (either upstream fix or validated community quant)
- Run controlled A/B on enrichment pipeline: same transcripts, compare JSON compliance rate and extraction quality between Ministral 3.3B and Gemma 4 E4B
- Run LOCOMO conv3 gate with Gemma 4 E4B enrichment to check for regression
- If results are equal or better, update DEFAULT_DECODER_MODEL in _model_router.py and DEFAULTS["consolidation"] in config.py
- Update eval_config.json with new model references
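The JSON compliance metric for the A/B comparison could be as simple as counting strict-parse successes per model. A minimal sketch; the sample outputs and field names are illustrative, not taken from the actual pipeline:

```python
import json

def json_compliance_rate(outputs: list[str]) -> float:
    """Fraction of raw model outputs that parse as a JSON object
    without needing any fallback text parser."""
    ok = 0
    for raw in outputs:
        try:
            if isinstance(json.loads(raw), dict):
                ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs) if outputs else 0.0

# Illustrative outputs from the same transcripts through each model
ministral_outputs = ['{"entities": ["Alice"]}', 'Entities: Alice', '{"entities": []}']
gemma_outputs = ['{"entities": ["Alice"]}', '{"entities": []}', '{"entities": ["Bob"]}']
print(json_compliance_rate(ministral_outputs))  # 0.666...
print(json_compliance_rate(gemma_outputs))      # 1.0
```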
Files involved
- src/synapt/recall/_model_router.py (DEFAULT_DECODER_MODEL)
- src/synapt/recall/config.py (DEFAULTS dict)
- evaluation/eval_config.json (enrichment_model entries)
- src/synapt/recall/enrich.py (may simplify the _parse_enrichment_text fallback if JSON compliance improves)
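If Gemma 4 E4B's structured output proves reliable, enrich.py could collapse to a strict-parse-first path that keeps the text parser only as a safety net. A sketch under assumptions: the fallback below is a placeholder for the real _parse_enrichment_text, and the field names are hypothetical:

```python
import json

def parse_enrichment(raw: str) -> dict:
    """Try strict JSON first; fall back to lenient text parsing on failure."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    return _parse_enrichment_text_fallback(raw)

def _parse_enrichment_text_fallback(raw: str) -> dict:
    # Placeholder for the existing lenient parser in enrich.py (an assumption);
    # retained as a safety net even if strict parsing succeeds almost always.
    return {"raw_text": raw.strip()}

print(parse_enrichment('{"summary": "standup notes"}'))  # {'summary': 'standup notes'}
print(parse_enrichment("summary: standup notes"))        # {'raw_text': 'summary: standup notes'}
```

Logging how often the fallback branch fires would also give an ongoing compliance metric for free after the swap.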