Summary
Each dogfood run regenerates all programs from scratch, re-encountering the same LLM mistakes. Successful programs (47.5% of the last run) are discarded after verification. This wastes both the generation cost and the testing value of known-good programs.
Proposal: Treat successful dogfood outputs as a growing regression corpus. New runs focus generation budget on finding new bugs rather than re-covering old ground.
Design
Corpus accumulation
After each dogfood run:
- Programs that compiled and produced correct output are added to a permanent corpus (e.g., dogfood_corpus/)
- Each entry includes: .spy source files, expected output, metadata (feature tags, complexity, generation date)
- Deduplication: skip programs that are structurally similar to existing corpus entries (same feature combination + complexity tier)
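A minimal sketch of the accumulation step, assuming a dogfood_corpus/ directory with one subfolder per entry; the helper names (dedup_key, add_to_corpus), the expected_output.txt filename, and the metadata field names are illustrative, not part of the existing tooling.

```python
# Sketch only: hypothetical corpus layout and dedup rule.
import hashlib
import json
import shutil
from pathlib import Path

CORPUS_DIR = Path("dogfood_corpus")

def dedup_key(feature_tags, complexity):
    # Same feature combination + complexity tier => structurally similar.
    canonical = ",".join(sorted(feature_tags)) + "|" + complexity
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

def add_to_corpus(program_dir: Path, metadata: dict) -> bool:
    """Copy a verified-passing program into the corpus, skipping near-duplicates."""
    key = dedup_key(metadata["feature_tags"], metadata["complexity"])
    entry_dir = CORPUS_DIR / key
    if entry_dir.exists():
        return False  # an equivalent feature/complexity combination is already covered
    entry_dir.mkdir(parents=True)
    for src in program_dir.glob("*.spy"):
        shutil.copy(src, entry_dir / src.name)
    shutil.copy(program_dir / "expected_output.txt", entry_dir)
    (entry_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return True
```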
Run structure change
Current: Generate 200 new programs → compile all → triage all
Proposed: Run corpus (N existing) + generate M new programs → compile all → triage new only
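A rough sketch of the proposed run loop, assuming corpus entries are replayed first and only freshly generated programs enter triage; generate_programs, run_program, and triage are placeholders for the existing dogfood tooling, not real APIs.

```python
# Sketch only: proposed run structure (corpus replay + new generation).
def dogfood_run(corpus_entries, new_count, generate_programs, run_program, triage):
    # Replay known-good programs; any failure here is a compiler regression.
    regressions = [e for e in corpus_entries if not run_program(e).passed]

    # Spend the generation budget on new programs only.
    new_programs = generate_programs(new_count)
    results = [run_program(p) for p in new_programs]

    # Only newly generated programs need triage; corpus failures are
    # reported separately as regressions.
    triage(new_programs, results)
    return regressions, results
```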
Corpus as regression suite
The corpus doubles as a regression test suite:
- Run the full corpus on each compiler change (or a random sample for CI)
- If a previously-passing program breaks, it's a compiler regression — high signal
- Corpus programs can be promoted to proper test fixtures if they cover unique patterns
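As a sketch, a CI-friendly regression check could replay either the full corpus or a random sample and flag any previously-passing program that now fails; the compile_and_run helper, the entry fields, and the sampling knob are assumptions.

```python
# Sketch only: corpus regression check (full run locally, sampled run in CI).
import random

def corpus_regression_check(corpus_entries, compile_and_run, sample_size=None):
    entries = corpus_entries
    if sample_size is not None:
        entries = random.sample(corpus_entries, min(sample_size, len(corpus_entries)))

    broken = [e for e in entries if not compile_and_run(e).passed]
    for entry in broken:
        # A previously-passing program now fails: high-signal compiler regression.
        print(f"REGRESSION: {entry['id']} ({', '.join(entry['feature_tags'])})")
    return not broken
```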
Growth model
- Run 1: 0 corpus + 200 new → ~95 pass → corpus = 95
- Run 2: 95 corpus + 150 new → ~70 new pass → corpus = 165
- Run 5: ~300 corpus + 100 new → focus entirely on new patterns
- Eventually the corpus covers most feature combinations and new runs target edge cases
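The projection above follows a simple recurrence (corpus grows by roughly pass_rate × new programs each run), sketched here with the observed ~47.5% pass rate; the per-run generation counts are assumptions and real pass rates will drift as generation targets harder feature combinations.

```python
# Sketch only: corpus growth under a fixed pass-rate assumption.
def project_corpus(new_per_run, pass_rate=0.475):
    corpus = 0
    for n in new_per_run:
        corpus += round(n * pass_rate)
        print(f"+{n} new -> corpus = {corpus}")
    return corpus

project_corpus([200, 150, 120, 100, 100])  # corpus grows 95, 166, 223, 271, 319
```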
Pruning
- Remove corpus entries that duplicate existing test fixtures
- Remove entries whose feature combinations are fully covered by newer, more complex entries
- Keep corpus size bounded (e.g., 500 programs max)
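A sketch of how those three pruning rules could compose, assuming each entry's metadata exposes feature_tags and a dedup_key, and that overlap with existing test fixtures is tracked as a set of keys; all names here are hypothetical.

```python
# Sketch only: pruning pass, evaluated from newest entries to oldest.
def prune_corpus(entries, fixture_keys, max_size=500):
    """entries: list of metadata dicts, newest first.
    fixture_keys: dedup keys already covered by proper test fixtures."""
    kept, seen_features = [], set()
    for entry in entries:
        features = frozenset(entry["feature_tags"])
        if entry["dedup_key"] in fixture_keys:
            continue  # duplicates an existing test fixture
        if any(features <= newer for newer in seen_features):
            continue  # feature combination fully covered by a newer, more complex entry
        kept.append(entry)
        seen_features.add(features)
        if len(kept) >= max_size:
            break  # keep corpus size bounded
    return kept
```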
Impact
- Each run becomes progressively more valuable (diminishing-waste, not diminishing-returns)
- Corpus serves as a living regression suite for compiler changes
- Generation budget focuses on novel feature combinations
- Reduces triage burden — only new programs need investigation
Implementation
- Location: build_tools/ + new dogfood_corpus/ directory
- Metadata format: extend existing metadata.json with corpus fields
- Similarity detection: hash feature tags + complexity tier
- CI integration: optional dotnet test target or standalone script
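The corpus-specific additions to metadata.json might look roughly like the following; every field name below is hypothetical, since the existing schema isn't shown here.

```python
# Sketch only: hypothetical corpus fields merged into each program's metadata.json.
corpus_fields = {
    "corpus": {
        "added_in_run": "2026-03-10",   # dogfood run that produced the program
        "dedup_key": "a41f...",          # hash of feature tags + complexity tier
        "feature_tags": ["generics", "pattern-matching"],
        "complexity": "medium",
        "promoted_to_fixture": False,    # set when copied into the proper test suite
    }
}
```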
Discovered via
Dogfood analysis session 2026-03-10 — observed that 47.5% of generated programs succeed but are discarded after each run.