
Dogfood: Accumulate curated regression corpus from successful runs #366

@antonsynd

Summary

Each dogfood run regenerates all programs from scratch, re-encountering the same LLM mistakes. Successful programs (47.5% of the last run) are discarded after verification. This wastes both the generation cost and the testing value of known-good programs.

Proposal: Treat successful dogfood outputs as a growing regression corpus, so that new runs spend their generation budget on finding new bugs rather than re-covering old ground.

Design

Corpus accumulation

After each dogfood run:

  1. Programs that compiled and produced correct output are added to a permanent corpus (e.g., dogfood_corpus/)
  2. Each entry includes: .spy source files, expected output, metadata (feature tags, complexity, generation date)
  3. Deduplication: skip programs that are structurally similar to existing corpus entries (same feature combination + complexity tier)
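A minimal Python sketch of the accumulation step. It assumes a hypothetical `Program`/`RunResult` shape, since the issue does not specify one, and treats two programs as duplicates when their feature-tag set and complexity tier match:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Program:
    """One generated dogfood program (hypothetical shape)."""
    name: str
    sources: dict[str, str]        # .spy file name -> source text
    expected_output: str
    feature_tags: frozenset[str]
    complexity: int                # complexity tier
    generated: date


@dataclass
class RunResult:
    program: Program
    compiled: bool
    output: str | None             # None if the program did not compile


def dedup_key(p: Program) -> tuple[frozenset[str], int]:
    # "Structurally similar" = same feature combination + same complexity tier.
    return (p.feature_tags, p.complexity)


def accumulate(corpus: list[Program], results: list[RunResult]) -> list[Program]:
    """Return the corpus extended with passing programs that add a new key."""
    seen = {dedup_key(p) for p in corpus}
    added: list[Program] = []
    for r in results:
        p = r.program
        passed = r.compiled and r.output == p.expected_output
        key = dedup_key(p)
        if passed and key not in seen:
            seen.add(key)
            added.append(p)
    return corpus + added
```

Keying on the tag set rather than the source text means two programs that exercise the same features at the same tier count as one, even if their code differs.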

Run structure change

Current:  Generate 200 new programs → compile all → triage all
Proposed: Run corpus (N existing) + generate M new programs → compile all → triage new only
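Under the proposed structure, each result in a run lands in one of three buckets. A minimal sketch, with a hypothetical `Outcome` record standing in for whatever the dogfood tooling actually reports:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Outcome:
    name: str
    from_corpus: bool   # True if the program came from the existing corpus
    passed: bool


@dataclass
class RunReport:
    regressions: list[str]  # corpus programs that now fail: compiler regressions
    to_triage: list[str]    # newly generated programs that failed
    new_passes: list[str]   # newly generated programs eligible for the corpus


def classify(outcomes: list[Outcome]) -> RunReport:
    report = RunReport([], [], [])
    for o in outcomes:
        if o.from_corpus:
            # Corpus programs passed before, so only failures are interesting.
            if not o.passed:
                report.regressions.append(o.name)
        elif o.passed:
            report.new_passes.append(o.name)
        else:
            report.to_triage.append(o.name)
    return report
```

Only `to_triage` needs manual investigation; `regressions` goes straight to the compiler, and `new_passes` feeds corpus accumulation.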

Corpus as regression suite

The corpus doubles as a regression test suite:

  • Run the full corpus on each compiler change (or a random sample for CI)
  • If a previously passing program breaks, it indicates a compiler regression, which is a high-signal failure
  • Corpus programs can be promoted to proper test fixtures if they cover unique patterns
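For the CI case, one option is a seeded random sample so that a given CI run is reproducible. The sketch below assumes the seed is something like the commit SHA (an assumption, not part of the proposal), so re-running the same commit checks the same programs while different commits rotate through the corpus:

```python
import random


def ci_sample(corpus_names: list[str], k: int, seed: str) -> list[str]:
    """Pick a reproducible random subset of corpus programs for a CI run."""
    if k >= len(corpus_names):
        return sorted(corpus_names)
    # Seeding a private Random instance makes the sample deterministic
    # for a given seed without touching the global random state.
    rng = random.Random(seed)
    # Sort the input first so the result does not depend on listing order.
    return sorted(rng.sample(sorted(corpus_names), k))
```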

Growth model

  • Run 1: 0 corpus + 200 new → ~95 pass → corpus = 95
  • Run 2: 95 corpus + 150 new → ~70 new pass → corpus = 165
  • Run 5: ~300 corpus + 100 new → new generation focuses entirely on uncovered patterns
  • Eventually the corpus covers most feature combinations and new runs target edge cases
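The growth figures follow from a constant pass rate of roughly 47.5%. A minimal projection sketch, assuming every passing program is kept (no dedup losses) and treating the 500-program bound from the pruning rules as a simple cap:

```python
from __future__ import annotations


def project_growth(new_per_run: list[int], pass_rate: float,
                   cap: int | None = None) -> list[int]:
    """Projected corpus size after each run, assuming a constant pass rate."""
    sizes = []
    corpus = 0
    for m in new_per_run:
        corpus += round(m * pass_rate)    # new programs that pass verification
        if cap is not None:
            corpus = min(corpus, cap)     # pruning keeps the corpus bounded
        sizes.append(corpus)
    return sizes


print(project_growth([200, 150], 0.475))  # → [95, 166]
```

The projection gives 166 rather than 165 for run 2 because the issue rounds that run's new passes down to ~70; deduplication would also pull the real number slightly lower.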

Pruning

  • Remove corpus entries that duplicate existing test fixtures
  • Remove entries whose feature combinations are fully covered by newer, more complex entries
  • Keep corpus size bounded (e.g., 500 programs max)
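The three pruning rules can be sketched as successive filters. This assumes a hypothetical `Entry` record, that "duplicates an existing test fixture" means the fixture has the same feature-tag set, and that the size bound keeps the most recently generated entries; none of these policies is fixed by the issue:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    name: str
    tags: frozenset[str]
    complexity: int
    generated: date


def covered_by(old: Entry, new: Entry) -> bool:
    """True if `new` is newer, more complex, and exercises every feature of `old`."""
    return (new.generated > old.generated
            and new.complexity > old.complexity
            and old.tags <= new.tags)


def prune(entries: list[Entry], fixture_tag_sets: set[frozenset[str]],
          max_size: int = 500) -> list[Entry]:
    # 1. Drop entries whose feature combination already has a test fixture.
    kept = [e for e in entries if e.tags not in fixture_tag_sets]
    # 2. Drop entries fully covered by a newer, more complex entry.
    #    Coverage is transitive, so filtering in one pass is safe.
    kept = [e for e in kept if not any(covered_by(e, other) for other in kept)]
    # 3. Enforce the size bound, keeping the most recently generated entries.
    kept.sort(key=lambda e: e.generated, reverse=True)
    return kept[:max_size]
```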

Impact

  • Each run becomes progressively more valuable: what diminishes over time is wasted generation, not returns
  • Corpus serves as a living regression suite for compiler changes
  • Generation budget focuses on novel feature combinations
  • Reduces triage burden — only new programs need investigation

Implementation

  • Location: build_tools/ + new dogfood_corpus/ directory
  • Metadata format: extend existing metadata.json with corpus fields
  • Similarity detection: hash feature tags + complexity tier
  • CI integration: optional dotnet test target or standalone script
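For the on-disk similarity key, a content hash over a canonical form of the tags and tier keeps the key stable across runs. In the sketch below, the `feature_tags`/`complexity` keys and the `corpus` sub-object are hypothetical names, not the existing metadata.json schema:

```python
import hashlib
import json


def similarity_hash(feature_tags: list[str], complexity_tier: int) -> str:
    """Stable dedup key: identical for programs with the same tags and tier."""
    # Sort and deduplicate tags so generator output order doesn't matter.
    canonical = json.dumps(
        {"tags": sorted(set(feature_tags)), "tier": complexity_tier},
        sort_keys=True,
    )
    # sha256 rather than Python's built-in hash(), which is salted per
    # process for strings and so would not be stable across runs.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_corpus_fields(metadata: dict, run_id: str) -> dict:
    """Return a copy of a metadata dict with hypothetical corpus fields added."""
    extended = dict(metadata)
    extended["corpus"] = {
        "similarity_hash": similarity_hash(metadata["feature_tags"],
                                           metadata["complexity"]),
        "added_in_run": run_id,
    }
    return extended
```

Storing the hash in each entry's metadata lets a later run rebuild the dedup index by reading metadata files, without re-parsing any .spy sources.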

Discovered via

Dogfood analysis session 2026-03-10 — observed that 47.5% of generated programs succeed but are discarded after each run.
