refactor: extract caching and discovery from parser.py #924

@microsasa

Problem

parser.py (1,178 lines) has 5 distinct responsibilities mixed together:

  1. Event parsing
  2. Session discovery (filesystem scanning, plan.md probing)
  3. Multi-layer caching (3 cache types, 5 mutable module globals, complex invalidation)
  4. Summary building (first pass, resume detection, active/completed paths)
  5. Orchestration (get_all_sessions — 179 lines doing 7 things)

The caching and discovery concerns are self-contained and separable.

Proposed Extraction

cache.py (~150 lines)

  • Cache dataclasses: _DiscoveryCache, _CachedSession, _CachedEvents, _SortedSessionsCache
  • Module-level state: _SESSION_CACHE, _EVENTS_CACHE, _DISCOVERY_CACHE, _sorted_sessions_cache, _config_file_id
  • Insert/evict/fingerprint helpers
  • get_cached_events()

discovery.py (~250 lines)

  • _full_scandir_discovery() — filesystem scanning
  • _discover_with_identity() — identity tracking + plan.md probing
  • discover_sessions() — public API wrapper
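
The three-layer split above might be sketched as follows. The function names are the ones proposed in this issue, but the signatures, the `DiscoveredSession` record, the use of `(st_dev, st_ino)` as the identity, and the assumption that each session is a directory directly under the root with `plan.md` at its top level are all hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DiscoveredSession:
    """Hypothetical record; the real discovery result may differ."""
    path: Path
    identity: tuple[int, int]  # assumed identity: (st_dev, st_ino)
    has_plan: bool


def _full_scandir_discovery(root: Path) -> list[Path]:
    """Filesystem scan: every directory directly under root, sorted."""
    with os.scandir(root) as entries:
        dirs = [Path(e.path) for e in entries if e.is_dir(follow_symlinks=False)]
    return sorted(dirs)


def _discover_with_identity(root: Path) -> list[DiscoveredSession]:
    """Attach a filesystem identity and probe for plan.md in each session."""
    sessions = []
    for session_dir in _full_scandir_discovery(root):
        st = session_dir.stat()
        has_plan = (session_dir / "plan.md").is_file()
        sessions.append(
            DiscoveredSession(session_dir, (st.st_dev, st.st_ino), has_plan)
        )
    return sessions


def discover_sessions(root: Path) -> list[DiscoveredSession]:
    """Public wrapper: a missing root yields no sessions instead of raising."""
    if not root.is_dir():
        return []
    return _discover_with_identity(root)
```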

parser.py stays as public façade (~700 lines)

  • Event parsing, config reading, summary building pipeline
  • build_session_summary() and get_all_sessions() re-exported
  • Public API unchanged — no import changes for consumers

Risk

get_all_sessions() orchestrates caching + discovery + parsing. It will need to call into both extracted modules — that's the trickiest seam.
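
One possible way to keep that seam thin is to have the orchestration take its collaborators as arguments, so it depends on neither module's internals. The sketch below is an assumption, not the existing code: `collect_sessions` and its parameters are hypothetical names, and the public `get_all_sessions()` would keep its current signature and simply pass in `discover_sessions`, `get_cached_events`, and the summary builder.

```python
from typing import Callable, Iterable, TypeVar

E = TypeVar("E")  # parsed events for one session
S = TypeVar("S")  # summary for one session


def collect_sessions(
    root: str,
    discover: Callable[[str], Iterable[str]],
    load_events: Callable[[str], E],
    summarize: Callable[[str, E], S],
) -> list[S]:
    """Hypothetical orchestration core: discover, load (cached), summarize."""
    return [summarize(path, load_events(path)) for path in discover(root)]
```

With this shape, existing tests exercise the real wiring through `get_all_sessions()`, while the seam itself can be checked with plain fakes.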

Testing

All existing tests must pass unchanged. No public API changes.
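
If an extra guard is wanted on top of the existing tests, a small surface check could confirm that the façade still exposes the re-exported names. This is only a sketch: `missing_public_names` and `EXPECTED_PUBLIC_NAMES` are hypothetical, and the tuple lists only the two functions this issue names as re-exported, so it would need extending to cover the full public API.

```python
from types import SimpleNamespace

# Illustrative only: the two names this issue says parser.py re-exports.
EXPECTED_PUBLIC_NAMES = ("build_session_summary", "get_all_sessions")


def missing_public_names(module, expected=EXPECTED_PUBLIC_NAMES):
    """Return the expected names that are absent or not callable on module."""
    return [n for n in expected if not callable(getattr(module, n, None))]


# Stand-in for the real parser module, so the sketch runs on its own.
fake_parser = SimpleNamespace(
    build_session_summary=lambda *a, **k: None,
    get_all_sessions=lambda *a, **k: [],
)
```

In the real test suite the call would be `missing_public_names(parser)` against the imported module, asserting that the result is an empty list.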

Note

Do not attempt via pipeline — this is a structural refactor that requires manual coordination.

Labels: aw (Created by agentic workflow), aw-dispatched (Issue has been dispatched to implementer), code-health (Code cleanup and maintenance)
