55- Global mutable state is eliminated
66- Side effects (logging, file I/O) live at the edges
77- Processing files in different orders gives the same results
8+ - ** Incremental analysis is possible** - can reprocess one file without redoing everything
89
910** Why?** The current architecture makes:
1011- Incremental/reactive analysis impossible (can't reprocess one file)
1415
1516---
1617
18+ ## Key Design Principles
19+
20+ ### 1. Separate per-file input from project-wide analysis
21+
22+ ** Per-file source data** (can be incrementally updated):
23+ - Source annotations (` @dead ` , ` @live ` , ` @genType ` from AST)
24+ - Declarations defined in that file
25+ - References made from that file
26+ - Keyed by filename so we can replace one file's data
27+
28+ ** Project-wide analysis** (computed from merged per-file data):
29+ - Deadness solver operates on merged view of all files
30+ - Results are ** immutable** - returned as data, not mutated
31+
32+ ### 2. Analysis results are immutable
33+
34+ The solver should:
35+ - Take source data as ** read-only input**
36+ - Return results as ** new immutable data**
37+ - Never mutate input state during analysis
38+
39+ ``` ocaml
40+ (* WRONG - current design mutates state during analysis *)
41+ let resolveRecursiveRefs ~state ... =
42+ ...
43+ AnnotationState.annotate_dead state decl.pos (* mutation! *)
44+
45+ (* RIGHT - return results as data *)
46+ let solve_deadness ~source_annotations ~decls ~refs =
47+ ... compute ...
48+ { dead_positions; issues; annotations_to_write } (* return, don't mutate *)
49+ ```
50+
51+ ### 3. Enable incremental updates
52+
53+ When file F changes:
54+ 1 . Replace ` per_file_data[F] ` with new data from re-processing F
55+ 2 . Re-merge into project-wide view
56+ 3 . Re-run solver (returns new results)
57+
58+ This requires per-file data to be ** keyed by filename** .
59+
60+ ---
61+
1762## Current Problems (What We're Fixing)
1863
1964### P1: Global "current file" context
3984- ` DeadType.TypeDependencies.delayedItems ` - deferred type deps
4085- ` ProcessDeadAnnotations.positionsAnnotated ` - annotation tracking
4186
42- ** Impact** : Order-dependent. Processing files in different orders can give different results because queue processing happens at arbitrary times.
87+ ** Additional problem** : ` positionsAnnotated ` mixes ** input** (source annotations from AST) with ** output** (positions the solver determines are dead). The solver mutates this during analysis, violating purity.
88+
89+ ** Impact** : Order-dependent. Processing files in different orders can give different results because queue processing happens at arbitrary times. Mixing input/output prevents incremental analysis.
4390
4491### P4: Global configuration reads
4592** Problem** : Analysis code directly reads ` !Common.Cli.debug ` , ` RunConfig.runConfig.transitive ` , etc. scattered throughout. Can't run analysis with different configs without mutating globals.
65112## End State
66113
67114``` ocaml
68- (* Configuration: all inputs as immutable data *)
115+ (* Configuration: immutable *)
69116type config = {
70- run : RunConfig.t; (* transitive, suppress lists, etc. *)
117+ run : RunConfig.t;
71118 debug : bool;
72119 write_annotations : bool;
73120 live_names : string list;
74121 live_paths : string list;
75122 exclude_paths : string list;
76123}
77124
78- (* Per-file analysis state - everything needed to analyze one file *)
79- type file_state = {
125+ (* Per-file source data - extracted from one file's AST *)
126+ type file_data = {
80127 source_path : string;
81128 module_name : Name.t;
82129 is_interface : bool;
83- annotations : annotation_state;
84- (* ... other per-file state *)
85- }
86-
87- (* Project-level analysis state - accumulated across all files *)
88- type project_state = {
89- decls : decl PosHash.t;
90- value_refs : PosSet.t PosHash.t;
91- type_refs : PosSet.t PosHash.t;
92- file_refs : FileSet.t FileHash.t;
93- optional_args : optional_args_state;
94- exceptions : exception_state;
95- (* ... *)
130+ source_annotations : annotated_as PosHash.t; (* @dead/@live/@genType in source *)
131+ decls : decl list; (* declarations defined here *)
132+ value_refs : (pos * pos) list; (* references made from here *)
133+ type_refs : (pos * pos) list;
134+ file_refs : string list; (* files this file depends on *)
96135}
97136
98- (* Pure analysis function *)
99- val analyze_file : config -> file_state -> project_state -> Cmt_format.cmt_infos -> project_state
137+ (* Per-file data keyed by filename - enables incremental updates *)
138+ type per_file_state = file_data StringMap.t
100139
101- (* Pure deadness solver *)
102- val solve_deadness : config -> project_state -> analysis_result
140+ (* Project-wide merged view - computed from per_file_state *)
141+ type merged_state = {
142+ all_annotations : annotated_as PosHash.t; (* merged from all files *)
143+ all_decls : decl PosHash.t; (* merged from all files *)
144+ all_value_refs : PosSet.t PosHash.t; (* merged from all files *)
145+ all_type_refs : PosSet.t PosHash.t;
146+ all_file_refs : FileSet.t StringMap.t;
147+ }
103148
149+ (* Analysis results - IMMUTABLE, returned by solver *)
104150type analysis_result = {
105- dead_decls : decl list ;
106- issues : Common. issue list;
151+ dead_positions : PosSet.t ;
152+ issues : issue list;
107153 annotations_to_write : (string * line_annotation list) list;
108154}
109155
110- (* Side effects at the edge *)
156+ (* Pure: extract data from one file *)
157+ val process_file : config -> Cmt_format.cmt_infos -> file_data
158+
159+ (* Pure: merge per-file data into project-wide view *)
160+ val merge_file_data : per_file_state -> merged_state
161+
162+ (* Pure: solve deadness - takes READ-ONLY input, returns IMMUTABLE result *)
163+ val solve_deadness : config -> merged_state -> analysis_result
164+
165+ (* Orchestration with side effects at edges *)
111166let run_analysis ~config ~cmt_files =
112- (* Pure: analyze all files *)
113- let project_state =
167+ (* Pure: process each file independently *)
168+ let per_file =
114169 cmt_files
115- |> List.fold_left (fun state file ->
116- analyze_file config (file_state_for file) state (load_cmt file)
117- ) empty_project_state
170+ |> List.map (fun path -> (path, process_file config (load_cmt path)))
171+ |> StringMap.of_list
118172 in
119- (* Pure: solve deadness *)
120- let result = solve_deadness config project_state in
173+ (* Pure: merge into project-wide view *)
174+ let merged = merge_file_data per_file in
175+ (* Pure: solve deadness - NO MUTATION *)
176+ let result = solve_deadness config merged in
121177 (* Impure: report results *)
122178 result.issues |> List.iter report_issue;
123179 if config.write_annotations then
124- result.annotations_to_write |> List.iter write_annotations_to_file
180+ result.annotations_to_write |> List.iter write_to_file
181+
182+ (* Incremental update when file F changes *)
183+ let update_file ~config ~per_file ~changed_file =
184+ let new_file_data = process_file config (load_cmt changed_file) in
185+ let per_file = StringMap.add changed_file new_file_data per_file in
186+ let merged = merge_file_data per_file in
187+ solve_deadness config merged
125188```
126189
127190---
@@ -173,36 +236,60 @@ Each task should:
173236** Value** : Removes hidden global state. Makes annotation tracking testable.
174237
175238** Changes** :
176- - [ ] Change ` ProcessDeadAnnotations ` functions to take/return explicit ` state ` instead of mutating ` positionsAnnotated ` ref
177- - [ ] Thread ` annotation_state ` through ` DeadCode.processCmt `
178- - [ ] Delete the global ` positionsAnnotated `
239+ - [x] Create ` AnnotationState.t ` module with explicit state type and accessor functions
240+ - [x] Change ` ProcessDeadAnnotations ` functions to take explicit ` ~state:AnnotationState.t `
241+ - [x] Thread ` annotation_state ` through ` DeadCode.processCmt ` and ` Reanalyze.loadCmtFile `
242+ - [x] Update ` declIsDead ` , ` doReportDead ` , ` resolveRecursiveRefs ` , ` reportDead ` to use explicit state
243+ - [x] Update ` DeadOptionalArgs.check ` to take explicit state
244+ - [x] Delete the global ` positionsAnnotated `
245+
246+ ** Status** : Partially complete ⚠️
247+
248+ ** Known limitation** : Current implementation still mixes concerns:
249+ - Source annotations (from ` @dead ` /` @live ` /` @genType ` in files) - INPUT
250+ - Analysis results (positions solver determined are dead) - OUTPUT
251+
252+ The solver currently ** mutates** ` AnnotationState ` via ` annotate_dead ` during ` resolveRecursiveRefs ` .
253+ This violates the principle that analysis results should be immutable and returned.
254+
255+ ** TODO** (in later task):
256+ - [ ] Separate ` SourceAnnotations.t ` (per-file, read-only input) from analysis results
257+ - [ ] Make ` SourceAnnotations ` keyed by filename for incremental updates
258+ - [ ] Solver should return dead positions as part of ` analysis_result ` , not mutate state
179259
180260** Test** : Process two files "simultaneously" (two separate state values) - should not interfere.
181261
182262** Estimated effort** : Small (well-scoped module)
183263
184264### Task 4: Localize analysis tables (P2) - Part 1: Declarations
185265
186- ** Value** : First step toward incremental analysis. Can analyze a subset of files with isolated state .
266+ ** Value** : First step toward incremental analysis. Per-file declaration data enables replacing one file's contributions .
187267
188268** Changes** :
189- - [ ] Change ` DeadCommon.addDeclaration_ ` and friends to take ` decl_state : decl PosHash.t ` parameter
190- - [ ] Thread through ` DeadCode.processCmt ` - allocate fresh state, pass through, return updated state
191- - [ ] Accumulate per-file states in ` Reanalyze.processCmtFiles `
269+ - [ ] Create ` FileDecls.t ` type for per-file declarations (keyed by filename)
270+ - [ ] ` process_file ` returns declarations for that file only
271+ - [ ] Store as ` file_decls : decl list StringMap.t ` (per-file, keyed by filename)
272+ - [ ] Create ` merge_decls : file_decls -> decl PosHash.t ` for project-wide view
192273- [ ] Delete global ` DeadCommon.decls `
193274
275+ ** Incremental benefit** : When file F changes, just replace ` file_decls[F] ` and re-merge.
276+
194277** Test** : Analyze files with separate decl tables - should not interfere.
195278
196279** Estimated effort** : Medium (core data structure, many call sites)
197280
198281### Task 5: Localize analysis tables (P2) - Part 2: References
199282
200- ** Value** : Completes the localization of analysis state .
283+ ** Value** : Completes per-file reference tracking for incremental analysis .
201284
202285** Changes** :
203- - [ ] Same pattern as Task 4 but for ` ValueReferences.table ` and ` TypeReferences.table `
204- - [ ] Thread explicit ` value_refs ` and ` type_refs ` parameters
205- - [ ] Delete global reference tables
286+ - [ ] Create ` FileRefs.t ` for per-file references (keyed by filename)
287+ - [ ] ` process_file ` returns references made from that file
288+ - [ ] Store as ` file_value_refs : (pos * pos) list StringMap.t `
289+ - [ ] Create ` merge_refs ` for project-wide view
290+ - [ ] Delete global ` ValueReferences.table ` and ` TypeReferences.table `
291+
292+ ** Incremental benefit** : When file F changes, replace ` file_refs[F] ` and re-merge.
206293
207294** Test** : Same as Task 4.
208295
@@ -213,40 +300,63 @@ Each task should:
213300** Value** : Removes order dependence. Makes analysis deterministic.
214301
215302** Changes** :
216- - [ ] ` DeadOptionalArgs ` : Thread explicit ` state ` with ` delayed_items ` and ` function_refs ` , delete global refs
217- - [ ] ` DeadException ` : Thread explicit ` state ` with ` delayed_items ` and ` declarations ` , delete global refs
218- - [ ] ` DeadType.TypeDependencies ` : Thread explicit ` type_deps_state ` , delete global ref
219- - [ ] Update ` forceDelayedItems ` calls to operate on explicit state
303+ - [ ] ` DeadOptionalArgs ` : Return delayed items from file processing, merge later
304+ - [ ] ` DeadException ` : Return delayed items from file processing, merge later
305+ - [ ] ` DeadType.TypeDependencies ` : Return delayed items from file processing, merge later
306+ - [ ] ` forceDelayedItems ` operates on merged delayed items (pure function)
307+ - [ ] Delete global refs
308+
309+ ** Key insight** : Delayed items should be ** returned** from file processing, not accumulated in globals.
310+ This makes them per-file and enables incremental updates.
220311
221312** Test** : Process files in different orders - delayed items should be processed consistently.
222313
223314** Estimated effort** : Medium (3 modules, each similar to Task 3)
224315
225316### Task 7: Localize file/module tracking (P2 + P3)
226317
227- ** Value** : Removes last major global state. Makes cross-file analysis explicit .
318+ ** Value** : Per-file dependency tracking enables incremental dependency graph updates .
228319
229320** Changes** :
230- - [ ] ` FileReferences ` : Replace global ` table ` with explicit ` file_refs_state ` parameter
231- - [ ] ` DeadModules ` : Replace global ` table ` with explicit ` module_state ` parameter
232- - [ ] Thread both through analysis pipeline
233- - [ ] ` iterFilesFromRootsToLeaves ` : take explicit state, return ordered file list (pure)
321+ - [ ] ` FileReferences ` : Store per-file as ` file_deps : string list StringMap.t `
322+ - [ ] Create ` merge_file_refs ` for project-wide dependency graph
323+ - [ ] ` DeadModules ` : Track per-file module usage, merge for project-wide view
324+ - [ ] ` iterFilesFromRootsToLeaves ` : pure function on merged file refs, returns ordered list
325+
326+ ** Incremental benefit** : When file F changes, update ` file_deps[F] ` and re-merge graph.
234327
235328** Test** : Build file reference graph in isolation, verify topological ordering is correct.
236329
237330** Estimated effort** : Medium (cross-file logic, but well-contained)
238331
239- ### Task 8: Separate analysis from reporting (P5)
332+ ### Task 8: Separate analysis from reporting (P5) - Immutable Results
240333
241- ** Value** : Core analysis is now pure. Can get results as data. Can test without I/O .
334+ ** Value** : Solver returns immutable results. No mutation during analysis. Pure function .
242335
243336** Changes** :
244- - [ ] ` DeadCommon.reportDead ` : Return ` issue list ` instead of calling ` Log_.warning `
337+ - [ ] Create ` AnalysisResult.t ` type with ` dead_positions ` , ` issues ` , ` annotations_to_write `
338+ - [ ] ` solve_deadness ` : Return ` AnalysisResult.t ` instead of mutating state
339+ - [ ] Remove ` AnnotationState.annotate_dead ` call from ` resolveRecursiveRefs `
340+ - [ ] Dead positions are part of returned result, not mutated into input state
245341- [ ] ` Decl.report ` : Return ` issue ` instead of logging
246342- [ ] Remove all ` Log_.warning ` , ` Log_.item ` , ` EmitJson ` calls from ` Dead*.ml ` modules
247- - [ ] ` Reanalyze.runAnalysis ` : Call pure analysis, then separately report issues
343+ - [ ] ` Reanalyze.runAnalysis ` : Call pure solver, then separately report from result
344+
345+ ** Key principle** : The solver takes ** read-only** merged state and returns ** new immutable** results.
346+ No mutation of input state during analysis.
248347
249- ** Test** : Run analysis, capture result list, verify no I/O side effects occurred.
348+ ``` ocaml
349+ (* Before - WRONG *)
350+ let solve ~state =
351+ ... AnnotationState.annotate_dead state pos ... (* mutates input! *)
352+
353+ (* After - RIGHT *)
354+ let solve ~merged_state =
355+ let dead_positions = ... compute ... in
356+ { dead_positions; issues; annotations_to_write } (* return new data *)
357+ ```
358+
359+ ** Test** : Run analysis, capture result, verify input state unchanged.
250360
251361** Estimated effort** : Medium (many logging call sites, but mechanical)
252362
@@ -296,17 +406,23 @@ Each task should:
296406## Execution Strategy
297407
298408** Completed** : Task 1 ✅, Task 2 ✅, Task 10 ✅
409+ ** Partially complete** : Task 3 ⚠️ (state explicit but still mixes input/output)
299410
300- ** Remaining order** : 3 → 4 → 5 → 6 → 7 → 8 → 9 → 11 (test)
411+ ** Remaining order** : 4 → 5 → 6 → 7 → 8 → 9 → 11 (test)
301412
302413** Why this order?**
303414- Tasks 1-2 remove implicit dependencies (file context, config) - ✅ DONE
304- - Tasks 3-7 localize global state - can be done incrementally now that inputs are explicit
305- - Tasks 8-9 separate pure/impure - can only do this once state is local
415+ - Task 3 makes annotation tracking explicit - ⚠️ PARTIAL (needs input/output separation in Task 8)
416+ - Tasks 4-7 make state ** per-file** for incremental updates
417+ - Task 8 makes solver ** pure** with immutable results (also fixes Task 3's input/output mixing)
418+ - Task 9 separates annotation computation from file writing
306419- Task 10 verifies no global config reads remain - ✅ DONE
307- - Task 11 validates everything
420+ - Task 11 validates everything including incremental updates
308421
309- ** Alternative** : Could do 3-7 in any order (they're mostly independent).
422+ ** Key architectural milestones** :
423+ 1 . ** After Task 7** : All state is per-file, keyed by filename
424+ 2 . ** After Task 8** : Solver is pure, returns immutable results
425+ 3 . ** After Task 11** : Incremental updates verified working
310426
311427** Time estimate** :
312428- Best case (everything goes smoothly): 2-3 days
@@ -331,12 +447,20 @@ After all tasks:
331447✅ ** Pure analysis function**
332448- Can call analysis and get results as data
333449- No side effects (logging, file I/O) during analysis
450+ - ** Solver returns immutable results** - no mutation of input state
451+
452+ ✅ ** Per-file state enables incremental updates**
453+ - All per-file data (annotations, decls, refs) keyed by filename
454+ - Can replace one file's data: ` per_file_state[F] = new_data `
455+ - Re-merge and re-solve without reprocessing other files
334456
335- ✅ ** Incremental analysis possible**
336- - Can create empty state and analyze just one file
337- - Can update state with new file without reanalyzing everything
457+ ✅ ** Clear separation of input vs output**
458+ - Source annotations (from AST) are ** read-only input**
459+ - Analysis results (dead positions, issues) are ** immutable output**
460+ - Solver takes input, returns output - no mixing
338461
339462✅ ** Testable**
340463- Can test analysis without mocking I/O
341464- Can test with different configs without mutating globals
342465- Can test with isolated state
466+ - Can verify solver doesn't mutate its input
0 commit comments