Skip to content

Commit 093ef59

Browse files
committed
DCE: Update refactor plan for per-file state and immutable results
Key design principles added: - Separate per-file input (keyed by filename) from project-wide analysis - Analysis results are immutable - returned by solver, not mutated - Enable incremental updates by replacing one file's data Updated tasks to emphasize: - Per-file state with merge functions for project-wide view - Solver returns AnalysisResult.t instead of mutating input - Task 3 marked partial (input/output mixing fixed in Task 8)
1 parent 5066e28 commit 093ef59

File tree

1 file changed

+190
-66
lines changed

1 file changed

+190
-66
lines changed

analysis/reanalyze/DEADCODE_REFACTOR_PLAN.md

Lines changed: 190 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
- Global mutable state is eliminated
66
- Side effects (logging, file I/O) live at the edges
77
- Processing files in different orders gives the same results
8+
- **Incremental analysis is possible** - can reprocess one file without redoing everything
89

910
**Why?** The current architecture makes:
1011
- Incremental/reactive analysis impossible (can't reprocess one file)
@@ -14,6 +15,50 @@
1415

1516
---
1617

18+
## Key Design Principles
19+
20+
### 1. Separate per-file input from project-wide analysis
21+
22+
**Per-file source data** (can be incrementally updated):
23+
- Source annotations (`@dead`, `@live`, `@genType` from AST)
24+
- Declarations defined in that file
25+
- References made from that file
26+
- Keyed by filename so we can replace one file's data
27+
28+
**Project-wide analysis** (computed from merged per-file data):
29+
- Deadness solver operates on merged view of all files
30+
- Results are **immutable** - returned as data, not mutated
31+
32+
### 2. Analysis results are immutable
33+
34+
The solver should:
35+
- Take source data as **read-only input**
36+
- Return results as **new immutable data**
37+
- Never mutate input state during analysis
38+
39+
```ocaml
40+
(* WRONG - current design mutates state during analysis *)
41+
let resolveRecursiveRefs ~state ... =
42+
...
43+
AnnotationState.annotate_dead state decl.pos (* mutation! *)
44+
45+
(* RIGHT - return results as data *)
46+
let solve_deadness ~source_annotations ~decls ~refs =
47+
... compute ...
48+
{ dead_positions; issues; annotations_to_write } (* return, don't mutate *)
49+
```
50+
51+
### 3. Enable incremental updates
52+
53+
When file F changes:
54+
1. Replace `per_file_data[F]` with new data from re-processing F
55+
2. Re-merge into project-wide view
56+
3. Re-run solver (returns new results)
57+
58+
This requires per-file data to be **keyed by filename**.
59+
60+
---
61+
1762
## Current Problems (What We're Fixing)
1863

1964
### P1: Global "current file" context
@@ -39,7 +84,9 @@
3984
- `DeadType.TypeDependencies.delayedItems` - deferred type deps
4085
- `ProcessDeadAnnotations.positionsAnnotated` - annotation tracking
4186

42-
**Impact**: Order-dependent. Processing files in different orders can give different results because queue processing happens at arbitrary times.
87+
**Additional problem**: `positionsAnnotated` mixes **input** (source annotations from AST) with **output** (positions the solver determines are dead). The solver mutates this during analysis, violating purity.
88+
89+
**Impact**: Order-dependent. Processing files in different orders can give different results because queue processing happens at arbitrary times. Mixing input/output prevents incremental analysis.
4390

4491
### P4: Global configuration reads
4592
**Problem**: Analysis code directly reads `!Common.Cli.debug`, `RunConfig.runConfig.transitive`, etc. scattered throughout. Can't run analysis with different configs without mutating globals.
@@ -65,63 +112,79 @@
65112
## End State
66113

67114
```ocaml
68-
(* Configuration: all inputs as immutable data *)
115+
(* Configuration: immutable *)
69116
type config = {
70-
run : RunConfig.t; (* transitive, suppress lists, etc. *)
117+
run : RunConfig.t;
71118
debug : bool;
72119
write_annotations : bool;
73120
live_names : string list;
74121
live_paths : string list;
75122
exclude_paths : string list;
76123
}
77124
78-
(* Per-file analysis state - everything needed to analyze one file *)
79-
type file_state = {
125+
(* Per-file source data - extracted from one file's AST *)
126+
type file_data = {
80127
source_path : string;
81128
module_name : Name.t;
82129
is_interface : bool;
83-
annotations : annotation_state;
84-
(* ... other per-file state *)
85-
}
86-
87-
(* Project-level analysis state - accumulated across all files *)
88-
type project_state = {
89-
decls : decl PosHash.t;
90-
value_refs : PosSet.t PosHash.t;
91-
type_refs : PosSet.t PosHash.t;
92-
file_refs : FileSet.t FileHash.t;
93-
optional_args : optional_args_state;
94-
exceptions : exception_state;
95-
(* ... *)
130+
source_annotations : annotated_as PosHash.t; (* @dead/@live/@genType in source *)
131+
decls : decl list; (* declarations defined here *)
132+
value_refs : (pos * pos) list; (* references made from here *)
133+
type_refs : (pos * pos) list;
134+
file_refs : string list; (* files this file depends on *)
96135
}
97136
98-
(* Pure analysis function *)
99-
val analyze_file : config -> file_state -> project_state -> Cmt_format.cmt_infos -> project_state
137+
(* Per-file data keyed by filename - enables incremental updates *)
138+
type per_file_state = file_data StringMap.t
100139
101-
(* Pure deadness solver *)
102-
val solve_deadness : config -> project_state -> analysis_result
140+
(* Project-wide merged view - computed from per_file_state *)
141+
type merged_state = {
142+
all_annotations : annotated_as PosHash.t; (* merged from all files *)
143+
all_decls : decl PosHash.t; (* merged from all files *)
144+
all_value_refs : PosSet.t PosHash.t; (* merged from all files *)
145+
all_type_refs : PosSet.t PosHash.t;
146+
all_file_refs : FileSet.t StringMap.t;
147+
}
103148
149+
(* Analysis results - IMMUTABLE, returned by solver *)
104150
type analysis_result = {
105-
dead_decls : decl list;
106-
issues : Common.issue list;
151+
dead_positions : PosSet.t;
152+
issues : issue list;
107153
annotations_to_write : (string * line_annotation list) list;
108154
}
109155
110-
(* Side effects at the edge *)
156+
(* Pure: extract data from one file *)
157+
val process_file : config -> Cmt_format.cmt_infos -> file_data
158+
159+
(* Pure: merge per-file data into project-wide view *)
160+
val merge_file_data : per_file_state -> merged_state
161+
162+
(* Pure: solve deadness - takes READ-ONLY input, returns IMMUTABLE result *)
163+
val solve_deadness : config -> merged_state -> analysis_result
164+
165+
(* Orchestration with side effects at edges *)
111166
let run_analysis ~config ~cmt_files =
112-
(* Pure: analyze all files *)
113-
let project_state =
167+
(* Pure: process each file independently *)
168+
let per_file =
114169
cmt_files
115-
|> List.fold_left (fun state file ->
116-
analyze_file config (file_state_for file) state (load_cmt file)
117-
) empty_project_state
170+
|> List.map (fun path -> (path, process_file config (load_cmt path)))
171+
|> StringMap.of_list
118172
in
119-
(* Pure: solve deadness *)
120-
let result = solve_deadness config project_state in
173+
(* Pure: merge into project-wide view *)
174+
let merged = merge_file_data per_file in
175+
(* Pure: solve deadness - NO MUTATION *)
176+
let result = solve_deadness config merged in
121177
(* Impure: report results *)
122178
result.issues |> List.iter report_issue;
123179
if config.write_annotations then
124-
result.annotations_to_write |> List.iter write_annotations_to_file
180+
result.annotations_to_write |> List.iter write_to_file
181+
182+
(* Incremental update when file F changes *)
183+
let update_file ~config ~per_file ~changed_file =
184+
let new_file_data = process_file config (load_cmt changed_file) in
185+
let per_file = StringMap.add changed_file new_file_data per_file in
186+
let merged = merge_file_data per_file in
187+
solve_deadness config merged
125188
```
126189

127190
---
@@ -173,36 +236,60 @@ Each task should:
173236
**Value**: Removes hidden global state. Makes annotation tracking testable.
174237

175238
**Changes**:
176-
- [ ] Change `ProcessDeadAnnotations` functions to take/return explicit `state` instead of mutating `positionsAnnotated` ref
177-
- [ ] Thread `annotation_state` through `DeadCode.processCmt`
178-
- [ ] Delete the global `positionsAnnotated`
239+
- [x] Create `AnnotationState.t` module with explicit state type and accessor functions
240+
- [x] Change `ProcessDeadAnnotations` functions to take explicit `~state:AnnotationState.t`
241+
- [x] Thread `annotation_state` through `DeadCode.processCmt` and `Reanalyze.loadCmtFile`
242+
- [x] Update `declIsDead`, `doReportDead`, `resolveRecursiveRefs`, `reportDead` to use explicit state
243+
- [x] Update `DeadOptionalArgs.check` to take explicit state
244+
- [x] Delete the global `positionsAnnotated`
245+
246+
**Status**: Partially complete ⚠️
247+
248+
**Known limitation**: Current implementation still mixes concerns:
249+
- Source annotations (from `@dead`/`@live`/`@genType` in files) - INPUT
250+
- Analysis results (positions solver determined are dead) - OUTPUT
251+
252+
The solver currently **mutates** `AnnotationState` via `annotate_dead` during `resolveRecursiveRefs`.
253+
This violates the principle that analysis results should be immutable and returned.
254+
255+
**TODO** (in later task):
256+
- [ ] Separate `SourceAnnotations.t` (per-file, read-only input) from analysis results
257+
- [ ] Make `SourceAnnotations` keyed by filename for incremental updates
258+
- [ ] Solver should return dead positions as part of `analysis_result`, not mutate state
179259

180260
**Test**: Process two files "simultaneously" (two separate state values) - should not interfere.
181261

182262
**Estimated effort**: Small (well-scoped module)
183263

184264
### Task 4: Localize analysis tables (P2) - Part 1: Declarations
185265

186-
**Value**: First step toward incremental analysis. Can analyze a subset of files with isolated state.
266+
**Value**: First step toward incremental analysis. Per-file declaration data enables replacing one file's contributions.
187267

188268
**Changes**:
189-
- [ ] Change `DeadCommon.addDeclaration_` and friends to take `decl_state : decl PosHash.t` parameter
190-
- [ ] Thread through `DeadCode.processCmt` - allocate fresh state, pass through, return updated state
191-
- [ ] Accumulate per-file states in `Reanalyze.processCmtFiles`
269+
- [ ] Create `FileDecls.t` type for per-file declarations (keyed by filename)
270+
- [ ] `process_file` returns declarations for that file only
271+
- [ ] Store as `file_decls : decl list StringMap.t` (per-file, keyed by filename)
272+
- [ ] Create `merge_decls : file_decls -> decl PosHash.t` for project-wide view
192273
- [ ] Delete global `DeadCommon.decls`
193274

275+
**Incremental benefit**: When file F changes, just replace `file_decls[F]` and re-merge.
276+
194277
**Test**: Analyze files with separate decl tables - should not interfere.
195278

196279
**Estimated effort**: Medium (core data structure, many call sites)
197280

198281
### Task 5: Localize analysis tables (P2) - Part 2: References
199282

200-
**Value**: Completes the localization of analysis state.
283+
**Value**: Completes per-file reference tracking for incremental analysis.
201284

202285
**Changes**:
203-
- [ ] Same pattern as Task 4 but for `ValueReferences.table` and `TypeReferences.table`
204-
- [ ] Thread explicit `value_refs` and `type_refs` parameters
205-
- [ ] Delete global reference tables
286+
- [ ] Create `FileRefs.t` for per-file references (keyed by filename)
287+
- [ ] `process_file` returns references made from that file
288+
- [ ] Store as `file_value_refs : (pos * pos) list StringMap.t`
289+
- [ ] Create `merge_refs` for project-wide view
290+
- [ ] Delete global `ValueReferences.table` and `TypeReferences.table`
291+
292+
**Incremental benefit**: When file F changes, replace `file_refs[F]` and re-merge.
206293

207294
**Test**: Same as Task 4.
208295

@@ -213,40 +300,63 @@ Each task should:
213300
**Value**: Removes order dependence. Makes analysis deterministic.
214301

215302
**Changes**:
216-
- [ ] `DeadOptionalArgs`: Thread explicit `state` with `delayed_items` and `function_refs`, delete global refs
217-
- [ ] `DeadException`: Thread explicit `state` with `delayed_items` and `declarations`, delete global refs
218-
- [ ] `DeadType.TypeDependencies`: Thread explicit `type_deps_state`, delete global ref
219-
- [ ] Update `forceDelayedItems` calls to operate on explicit state
303+
- [ ] `DeadOptionalArgs`: Return delayed items from file processing, merge later
304+
- [ ] `DeadException`: Return delayed items from file processing, merge later
305+
- [ ] `DeadType.TypeDependencies`: Return delayed items from file processing, merge later
306+
- [ ] `forceDelayedItems` operates on merged delayed items (pure function)
307+
- [ ] Delete global refs
308+
309+
**Key insight**: Delayed items should be **returned** from file processing, not accumulated in globals.
310+
This makes them per-file and enables incremental updates.
220311

221312
**Test**: Process files in different orders - delayed items should be processed consistently.
222313

223314
**Estimated effort**: Medium (3 modules, each similar to Task 3)
224315

225316
### Task 7: Localize file/module tracking (P2 + P3)
226317

227-
**Value**: Removes last major global state. Makes cross-file analysis explicit.
318+
**Value**: Per-file dependency tracking enables incremental dependency graph updates.
228319

229320
**Changes**:
230-
- [ ] `FileReferences`: Replace global `table` with explicit `file_refs_state` parameter
231-
- [ ] `DeadModules`: Replace global `table` with explicit `module_state` parameter
232-
- [ ] Thread both through analysis pipeline
233-
- [ ] `iterFilesFromRootsToLeaves`: take explicit state, return ordered file list (pure)
321+
- [ ] `FileReferences`: Store per-file as `file_deps : string list StringMap.t`
322+
- [ ] Create `merge_file_refs` for project-wide dependency graph
323+
- [ ] `DeadModules`: Track per-file module usage, merge for project-wide view
324+
- [ ] `iterFilesFromRootsToLeaves`: pure function on merged file refs, returns ordered list
325+
326+
**Incremental benefit**: When file F changes, update `file_deps[F]` and re-merge graph.
234327

235328
**Test**: Build file reference graph in isolation, verify topological ordering is correct.
236329

237330
**Estimated effort**: Medium (cross-file logic, but well-contained)
238331

239-
### Task 8: Separate analysis from reporting (P5)
332+
### Task 8: Separate analysis from reporting (P5) - Immutable Results
240333

241-
**Value**: Core analysis is now pure. Can get results as data. Can test without I/O.
334+
**Value**: Solver returns immutable results. No mutation during analysis. Pure function.
242335

243336
**Changes**:
244-
- [ ] `DeadCommon.reportDead`: Return `issue list` instead of calling `Log_.warning`
337+
- [ ] Create `AnalysisResult.t` type with `dead_positions`, `issues`, `annotations_to_write`
338+
- [ ] `solve_deadness`: Return `AnalysisResult.t` instead of mutating state
339+
- [ ] Remove `AnnotationState.annotate_dead` call from `resolveRecursiveRefs`
340+
- [ ] Dead positions are part of returned result, not mutated into input state
245341
- [ ] `Decl.report`: Return `issue` instead of logging
246342
- [ ] Remove all `Log_.warning`, `Log_.item`, `EmitJson` calls from `Dead*.ml` modules
247-
- [ ] `Reanalyze.runAnalysis`: Call pure analysis, then separately report issues
343+
- [ ] `Reanalyze.runAnalysis`: Call pure solver, then separately report from result
344+
345+
**Key principle**: The solver takes **read-only** merged state and returns **new immutable** results.
346+
No mutation of input state during analysis.
248347

249-
**Test**: Run analysis, capture result list, verify no I/O side effects occurred.
348+
```ocaml
349+
(* Before - WRONG *)
350+
let solve ~state =
351+
... AnnotationState.annotate_dead state pos ... (* mutates input! *)
352+
353+
(* After - RIGHT *)
354+
let solve ~merged_state =
355+
let dead_positions = ... compute ... in
356+
{ dead_positions; issues; annotations_to_write } (* return new data *)
357+
```
358+
359+
**Test**: Run analysis, capture result, verify input state unchanged.
250360

251361
**Estimated effort**: Medium (many logging call sites, but mechanical)
252362

@@ -296,17 +406,23 @@ Each task should:
296406
## Execution Strategy
297407

298408
**Completed**: Task 1 ✅, Task 2 ✅, Task 10 ✅
409+
**Partially complete**: Task 3 ⚠️ (state explicit but still mixes input/output)
299410

300-
**Remaining order**: 3 → 4 → 5 → 6 → 7 → 8 → 9 → 11 (test)
411+
**Remaining order**: 4 → 5 → 6 → 7 → 8 → 9 → 11 (test)
301412

302413
**Why this order?**
303414
- Tasks 1-2 remove implicit dependencies (file context, config) - ✅ DONE
304-
- Tasks 3-7 localize global state - can be done incrementally now that inputs are explicit
305-
- Tasks 8-9 separate pure/impure - can only do this once state is local
415+
- Task 3 makes annotation tracking explicit - ⚠️ PARTIAL (needs input/output separation in Task 8)
416+
- Tasks 4-7 make state **per-file** for incremental updates
417+
- Task 8 makes solver **pure** with immutable results (also fixes Task 3's input/output mixing)
418+
- Task 9 separates annotation computation from file writing
306419
- Task 10 verifies no global config reads remain - ✅ DONE
307-
- Task 11 validates everything
420+
- Task 11 validates everything including incremental updates
308421

309-
**Alternative**: Could do 3-7 in any order (they're mostly independent).
422+
**Key architectural milestones**:
423+
1. **After Task 7**: All state is per-file, keyed by filename
424+
2. **After Task 8**: Solver is pure, returns immutable results
425+
3. **After Task 11**: Incremental updates verified working
310426

311427
**Time estimate**:
312428
- Best case (everything goes smoothly): 2-3 days
@@ -331,12 +447,20 @@ After all tasks:
331447
**Pure analysis function**
332448
- Can call analysis and get results as data
333449
- No side effects (logging, file I/O) during analysis
450+
- **Solver returns immutable results** - no mutation of input state
451+
452+
**Per-file state enables incremental updates**
453+
- All per-file data (annotations, decls, refs) keyed by filename
454+
- Can replace one file's data: `per_file_state[F] = new_data`
455+
- Re-merge and re-solve without reprocessing other files
334456

335-
**Incremental analysis possible**
336-
- Can create empty state and analyze just one file
337-
- Can update state with new file without reanalyzing everything
457+
**Clear separation of input vs output**
458+
- Source annotations (from AST) are **read-only input**
459+
- Analysis results (dead positions, issues) are **immutable output**
460+
- Solver takes input, returns output - no mixing
338461

339462
**Testable**
340463
- Can test analysis without mocking I/O
341464
- Can test with different configs without mutating globals
342465
- Can test with isolated state
466+
- Can verify solver doesn't mutate its input

0 commit comments

Comments
 (0)