feat: extract @libscope/parsers — standalone format conversion package #490

@RobertLD

Summary

Part of #488. Extract src/core/parsers/ into a standalone @libscope/parsers package. The parsers have zero upward imports from the rest of libscope, so this is the lowest-risk extraction in the split.

Problem / Motivation

Format parsers (PDF, DOCX, EPUB, PPTX, CSV, JSON, YAML, HTML → text/markdown) are useful independently of semantic search. Today they're buried inside @libscope/core and carry all of core's dependencies along for the ride. Extracting them allows projects that only need format conversion to take a minimal dependency.

Proposed Solution

Move src/core/parsers/ into packages/parsers/src/ with its own package.json. Each parser's format-specific library (pdf-parse, mammoth, epub2, etc.) stays as an optionalDependency so consumers only install what they need.
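
To illustrate how an optionalDependency could be consumed at runtime, here is a minimal sketch of a lazy loader that turns a missing library into an actionable error. The names `loadOptional` and `MissingParserDependencyError` are hypothetical and do not exist in libscope today; the error codes checked are the ones Node.js sets for unresolved modules.

```typescript
// Hypothetical error type: raised when a format's optional library is not installed.
class MissingParserDependencyError extends Error {
  readonly format: string;
  readonly moduleName: string;

  constructor(format: string, moduleName: string) {
    super(
      `Parsing ${format} requires the optional dependency "${moduleName}". ` +
        `Install it with: npm install ${moduleName}`
    );
    this.name = "MissingParserDependencyError";
    this.format = format;
    this.moduleName = moduleName;
  }
}

// Hypothetical helper: run the given importer and translate "module not found"
// failures into MissingParserDependencyError. Other failures are rethrown as-is.
async function loadOptional<T>(
  format: string,
  moduleName: string,
  importer: () => Promise<T>
): Promise<T> {
  try {
    return await importer();
  } catch (cause) {
    const code = (cause as { code?: unknown } | null)?.code;
    if (code === "ERR_MODULE_NOT_FOUND" || code === "MODULE_NOT_FOUND") {
      throw new MissingParserDependencyError(format, moduleName);
    }
    throw cause;
  }
}

// Intended call site inside a parser (commented out because pdf-parse may not be installed):
//   const pdfParse = await loadOptional("PDF", "pdf-parse", () => import("pdf-parse"));
```

Passing the importer as a callback keeps the `import("pdf-parse")` expression literal at the call site and lets tests simulate a missing library without uninstalling anything.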

Acceptance Criteria

  • @libscope/parsers builds independently with npm run build — zero imports from @libscope/core or any other @libscope/* package
  • All existing parser tests pass when run from the @libscope/parsers package directory
  • Each format-specific library (pdf-parse, mammoth, csv-parse, js-yaml, epub2, node-html-markdown, etc.) is declared as an optionalDependency
  • @libscope/core depends on @libscope/parsers and all parser imports in core are updated to the new package path
  • Existing libscope CLI behaviour is unchanged

Out of Scope

  • Adding new parsers or changing parser output format
  • Removing the gzip auto-detection logic in pack file I/O (that lives in core/packs.ts, not parsers)

Technical Notes

  • src/core/parsers/ currently has zero upward imports — no imports from indexing, search, DB, or providers. Clean extraction.
  • The parsers directory contains: markdown.ts, pdf.ts, docx.ts, xlsx.ts, csv.ts, json.ts, yaml.ts, html.ts, epub.ts, pptx.ts
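  • All parsers share src/logger.ts and src/errors.ts. These will need to be re-exported from @libscope/core or duplicated into a shared internal package. Either route has to be reconciled with the acceptance criterion of zero imports from any other @libscope/* package.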
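
One way the shared logger and error types might be decoupled is for the parsers package to own a small error class and accept any logger that satisfies a narrow interface. This is only a sketch under that assumption: `ParserLogger`, `ParserError`, and `parseJsonToText` are hypothetical names, and the actual shapes of src/logger.ts and src/errors.ts may differ.

```typescript
// Hypothetical minimal logger contract; core's real logger would only need
// to be structurally compatible with it.
interface ParserLogger {
  debug(message: string): void;
  warn(message: string): void;
}

// Default used when the caller does not supply a logger.
const noopLogger: ParserLogger = {
  debug: () => {},
  warn: () => {},
};

// Hypothetical error type owned by the parsers package itself.
class ParserError extends Error {
  readonly format: string;

  constructor(format: string, message: string) {
    super(`[${format}] ${message}`);
    this.name = "ParserError";
    this.format = format;
  }
}

// Illustrative parser (not the real json.ts): pretty-prints JSON as text.
function parseJsonToText(
  input: string,
  logger: ParserLogger = noopLogger
): string {
  logger.debug(`parsing ${input.length} characters of JSON`);
  try {
    return JSON.stringify(JSON.parse(input), null, 2);
  } catch {
    throw new ParserError("json", "input is not valid JSON");
  }
}
```

With this shape, @libscope/core would pass its own logger into each parser call, and the parsers package would import nothing from core.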


Labels: enhancement (New feature or request), refactor (Code refactoring)
