eval: benchmark Gemma 4 E4B as enrichment model replacement for Ministral 3 3B #661

@laynepenney

Context

The current default decoder-only model for enrichment and consolidation is mlx-community/Ministral-3-3B-Instruct-2512-4bit. Google released Gemma 4 on April 2, 2026, and its E4B variant is a strong candidate to replace the Ministral model.

Why Gemma 4 E4B

  • Native structured JSON output / function calling -- reduces parsing failures in the enrichment pipeline (which currently relies on the fallback parser _parse_enrichment_text)
  • 128K context window (vs ~32K) -- could eliminate the multi-window enrichment workaround for long transcripts
  • ~4B effective parameters at a similar memory footprint (~2.5 GB at 4-bit)
  • Apache 2.0 license (same as Ministral)
  • Configurable reasoning mode could improve consolidation quality
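
To make the context-window point concrete, here is a rough sketch of how many enrichment windows a long transcript would need at each context size. The function name, the reserved-token budget, and the overlap are illustrative assumptions, not values taken from the pipeline; 32,768 and 131,072 stand in for "~32K" and "128K".

```python
import math


def windows_needed(
    transcript_tokens: int,
    context_tokens: int,
    reserved_tokens: int = 2048,  # assumed budget for prompt + generated output
    overlap_tokens: int = 512,    # assumed overlap between consecutive windows
) -> int:
    """Estimate how many windows are needed to cover a transcript."""
    usable = context_tokens - reserved_tokens
    if usable <= overlap_tokens:
        raise ValueError("context too small for the reserved and overlap budgets")
    if transcript_tokens <= usable:
        return 1
    # The first window covers `usable` tokens; each later window adds `stride` new ones.
    stride = usable - overlap_tokens
    return 1 + math.ceil((transcript_tokens - usable) / stride)


# A hypothetical 90,000-token transcript under the assumed budgets:
print(windows_needed(90_000, 32_768))   # 3 windows at ~32K
print(windows_needed(90_000, 131_072))  # 1 window at 128K
```

Under these assumptions, a transcript that needs three windows today would fit in a single pass, which is the case where the multi-window workaround could be dropped.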

Blocker: PLE quantization bug

As of April 11, 2026, the standard MLX 4-bit quants from mlx-community and unsloth produce garbage output because they incorrectly quantize the Per-Layer Embedding (PLE) layers. PLE is a novel architectural feature of Gemma 4's edge models.

Tracking: https://huggingface.co/mlx-community/gemma-4-e2b-4bit/discussions/1

Workarounds:

  • bf16 versions work but use ~10 GB (too large for background enrichment)
  • PLE-safe community quant exists: FakeRockert543/gemma-4-e4b-it-MLX-4bit
  • An upstream fix is expected soon (the model was only 9 days old as of April 11, 2026)
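
Before trusting any candidate quant, a quick smoke test can catch the garbage-output failure described above. The sketch below is one possible heuristic, not part of the existing codebase: the prompt, the function names, and the pass criterion are all assumptions. Generation is passed in as a callable so the check stays independent of the inference backend.

```python
import json
from typing import Callable

# Hypothetical probe prompt: a healthy instruct model should echo the object back.
SMOKE_PROMPT = 'Reply with only this JSON object: {"status": "ok"}'


def looks_sane(text: str) -> bool:
    """Heuristic: the output contains a parseable JSON object with status == "ok"."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return False
    try:
        obj = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and obj.get("status") == "ok"


def quant_passes_smoke_test(generate_fn: Callable[[str], str], attempts: int = 3) -> bool:
    """Return True only if every attempt produces sane output."""
    return all(looks_sane(generate_fn(SMOKE_PROMPT)) for _ in range(attempts))
```

With mlx-lm, `generate_fn` could wrap `mlx_lm.load` and `mlx_lm.generate` for the quant under test, applying the tokenizer's chat template first. A pass here only rules out the obvious failure mode; it does not replace the A/B evaluation.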

Acceptance criteria

  1. Wait for a stable PLE-safe 4-bit MLX quant (either an upstream fix or a validated community quant)
  2. Run a controlled A/B test on the enrichment pipeline: using the same transcripts, compare JSON compliance rate and extraction quality between Ministral 3 3B and Gemma 4 E4B
  3. Run LOCOMO conv3 gate with Gemma 4 E4B enrichment to check for regression
  4. If results are equal or better, update DEFAULT_DECODER_MODEL in _model_router.py and DEFAULTS["consolidation"] in config.py
  5. Update eval_config.json with new model references
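
The JSON compliance half of the A/B comparison in step 2 could be scored along these lines. This is a minimal sketch under stated assumptions: "compliant" is taken to mean the raw output parses as a JSON object with no fallback parsing, and all function names are hypothetical. Extraction quality would need a separate measure.

```python
import json
from typing import Dict, Iterable, List


def is_valid_json_object(raw: str) -> bool:
    """Strict check: the raw model output parses as a JSON object as-is."""
    try:
        return isinstance(json.loads(raw), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def json_compliance_rate(outputs: Iterable[str]) -> float:
    """Fraction of outputs that are valid JSON objects."""
    outputs = list(outputs)
    if not outputs:
        raise ValueError("no outputs to score")
    return sum(is_valid_json_object(o) for o in outputs) / len(outputs)


def compare_models(outputs_by_model: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each model's raw outputs, which must come from the same transcripts."""
    lengths = {len(outs) for outs in outputs_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("each model must be run on the same transcripts")
    return {name: json_compliance_rate(outs) for name, outs in outputs_by_model.items()}
```

Scoring the raw output before any fallback parsing keeps the comparison focused on what each model produces natively, which is the property that would justify simplifying the fallback later.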

Files involved

  • src/synapt/recall/_model_router.py (DEFAULT_DECODER_MODEL)
  • src/synapt/recall/config.py (DEFAULTS dict)
  • evaluation/eval_config.json (enrichment_model entries)
  • src/synapt/recall/enrich.py (may simplify _parse_enrichment_text fallback if JSON compliance improves)
