Summary
The dogfood pipeline currently relies entirely on LLM prompting to generate valid Sharpy code. Despite extensive prompt engineering, ~30% of failures are mechanical mistakes by the LLM (wrong decorator syntax, missing `self`, instance fields in static classes, etc.). Prompting has hit diminishing returns: the LLM "knows" the rules but applies them inconsistently.
Proposal: Add a deterministic repair pass between LLM generation and compilation that mechanically fixes known error patterns using the Sharpy parser itself.
Design
Pipeline change
Current: LLM generates → compile → run → compare output
Proposed: LLM generates → PARSE WITH SHARPY → REPAIR AST → compile → run → compare output
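As a minimal sketch of where the repair pass slots in, the orchestration might look like the following. All function names here (`run_dogfood_item`, `llm_generate`, `repair_source`, `parses`, `compile_and_run`, `skip`) are hypothetical placeholders, not the actual build_tools API.

```python
def run_dogfood_item(item):
    source = llm_generate(item.prompt)    # existing generation stage
    source = repair_source(source)        # new: deterministic repair pass
    if not parses(source):                # new: early skip (see below)
        return skip(item)                 # don't burn retry attempts
    return compile_and_run(source, item.expected_output)
```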
Repair rules (ordered by dogfood impact)
| Rule | Pattern | Fix | Dogfood items |
|---|---|---|---|
| Decorator line split | `@abstract class X:` on one line | Split to `@abstract\nclass X:` | 5+ |
| Python decorator rename | `@staticmethod` | Replace with `@static` | 1+ |
| Static class field fix | Plain field in a `@static` class | Add `@static` decorator | 4 |
| Missing `self` on property | `property get foo() -> T:` | Add `self` parameter | 1+ |
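As a sketch of what the first two rows could look like as string/regex transforms (per the Implementation notes below), something like this would work; the patterns are illustrative and would need hardening against strings and comments, and the static-field and missing-`self` rules need structural context, so they would work off the parsed AST instead.

```python
import re

REPAIRS = [
    # Decorator line split: "@abstract class X:" -> "@abstract\nclass X:"
    (re.compile(r"^([ \t]*)@abstract[ \t]+class\b", re.MULTILINE),
     r"\1@abstract\n\1class"),
    # Python decorator rename: "@staticmethod" -> "@static"
    (re.compile(r"@staticmethod\b"), "@static"),
]

def repair_source(source: str) -> str:
    for pattern, replacement in REPAIRS:
        source = pattern.sub(replacement, source)
    return source
```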
Expected output validation
For C2 failures (wrong expected output), run the generated code through Python first where semantics overlap. Use Python's stdout as the expected output instead of trusting the LLM's prediction. This eliminates float formatting mismatches (`42.0` vs `42`) and computation errors.
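A minimal sketch of the oracle, assuming the generated snippet falls in the Python/Sharpy overlap and that a 10-second timeout is acceptable:

```python
import subprocess

def python_oracle(source: str) -> str | None:
    """Run the generated code under python3 and return its stdout as the
    expected output, or None if it isn't valid Python (fall back to the
    LLM's predicted output in that case)."""
    try:
        proc = subprocess.run(
            ["python3", "-"], input=source, text=True,
            capture_output=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None
```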
Early skip
If the generated code fails to parse even after repair, skip immediately instead of wasting 3 retry attempts.
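The parse check can reuse the `sharpyc emit ast` invocation mentioned under Implementation; treating a nonzero exit code as a parse failure is an assumption about the CLI's behavior and should be verified.

```python
import subprocess

def parses(path: str) -> bool:
    # Assumes `sharpyc emit ast <file>` exits nonzero on parse errors;
    # check the actual CLI's exit-code contract before relying on this.
    proc = subprocess.run(["sharpyc", "emit", "ast", path],
                          capture_output=True)
    return proc.returncode == 0
```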
Impact
- Eliminates ~60% of C1 (prompting) failures mechanically
- Eliminates most C2 (output mismatch) failures via Python oracle
- Improves signal-to-noise: raises the share of failures that are genuine compiler bugs from ~30% to ~60%+
- Meta-dogfooding: uses the Sharpy parser as part of the pipeline
Implementation
- Location: the `build_tools/` Python pipeline
- Can call `sharpyc emit ast` to parse and detect issues
- Repair rules are simple string/regex transforms on known patterns
- Python oracle is a subprocess call to `python3`
Discovered via
Dogfood analysis session 2026-03-10 — observed that prompting has hit diminishing returns for code generation quality.