Local-first CLI for agent and human document retrieval with provenance-grounded answers, a local vector store, and predictable machine-readable output.
- Optimized for agentic retrieval loops with fast multi-step questions and answers.
- Runs locally with a persistent Chroma-backed index.
- Ingests `.pdf`, `.docx`, `.txt`, and `.md` with provenance metadata (`doc_id`, `source`, `title`).
- Uses sentence-aware chunking for better retrieval quality.
- Supports deterministic `--json` output for automation and agents.
- Exposes stable CLI workflows for ingest, search, diagnostics, and inventory.
Use SKILL.md when you want an agent to drive docctl end-to-end.
The skill relies on `session` for fast iterative retrieval.
Requirements:
- Python 3.12 or 3.13
- pip
# 1) Install from PyPI
pip install docctl
# 2) Verify CLI
docctl --help
# 3) Ingest supported files
docctl ingest ./docs --recursive --approve-write --allow-model-download
# 4) Search indexed content
docctl search "security gateway diagnostics" --top-k 5 --allow-model-download
# 5) Show one chunk by id (replace with an id from search output)
docctl show <chunk_id_from_search> --allow-model-download

| Command | Purpose |
|---|---|
| `docctl ingest <path>` | Ingest one supported file or a directory of supported files (mutates local index state). |
| `docctl export <archive_path>` | Export current index data to one `.zip` snapshot file. |
| `docctl import <archive_path>` | Import index data from one `.zip` snapshot file (mutating). |
| `docctl search <query>` | Search indexed content with optional metadata filters. |
| `docctl show <chunk_id>` | Show one indexed chunk by exact id. |
| `docctl stats` | Show index statistics. |
| `docctl catalog` | Show index summary and per-document inventory. |
| `docctl doctor` | Run local diagnostics for index and embedding setup. |
| `docctl session` | Run a read-only NDJSON request session on stdin/stdout. |

Use --json for deterministic machine-readable output:
docctl --json search "security gateway diagnostics" --top-k 5 --allow-model-download

Use `session` for NDJSON request/response flows. For agents, this is the preferred fast path whenever one workflow needs two or more read operations:
cat <<'EOF' | docctl session --allow-model-download
{"id":"q1","op":"search","query":"security gateway diagnostics","top_k":5}
{"id":"q2","op":"catalog"}
EOF

Global options:
- `--index-path` (default: `.docctl`)
- `--collection` (default: `default`)
- `--json` (deterministic JSON payloads on stdout)
- `--verbose` (extra diagnostics)

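For an agent wrapper written in Python, the session flow and the global options above could be prepared roughly as follows. This is a minimal sketch, not part of docctl: the helper names `docctl_argv` and `encode_requests` are hypothetical, the request fields simply mirror the two example lines above, and it is an assumption that `--index-path` and `--collection` take their value as a separate argument placed before the subcommand (as `--json` is in the search example). The sketch only builds the command line and the NDJSON input; it assumes nothing about the response schema.

```python
import json


def docctl_argv(command, *args, index_path=None, collection=None, as_json=False):
    """Build an argv list with global options placed before the subcommand.

    Assumption: value-taking global options are passed as two separate tokens.
    """
    argv = ["docctl"]
    if index_path is not None:
        argv += ["--index-path", index_path]
    if collection is not None:
        argv += ["--collection", collection]
    if as_json:
        argv.append("--json")
    return argv + [command, *args]


def encode_requests(requests):
    """Serialize request dicts as NDJSON: one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in requests)


# Same two read operations as the heredoc example above.
requests = [
    {"id": "q1", "op": "search", "query": "security gateway diagnostics", "top_k": 5},
    {"id": "q2", "op": "catalog"},
]
argv = docctl_argv("session", "--allow-model-download", index_path=".docctl")
payload = encode_requests(requests)

print(argv)
print(payload, end="")
```

The resulting `argv` and `payload` can be handed to `subprocess.run(argv, input=payload, text=True, capture_output=True)`. The shape of each response line is not described here, so parse the output defensively.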
Model downloads are explicit:
- Use `--allow-model-download` when embedding artifacts are not already available.

Mutation boundaries:
- `ingest` and `import` are mutating.
- `search`, `show`, `stats`, `catalog`, `doctor`, and `session` are read-only.
- `export` is read-only.

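An agent harness may want to enforce these boundaries before it ever shells out. The following is a minimal sketch of such a guard, assuming only the command classification listed above; the function name `check_command` and its `allow_mutation` parameter are hypothetical and not part of docctl.

```python
# Classification taken from the mutation boundaries listed above.
MUTATING = {"ingest", "import"}
READ_ONLY = {"search", "show", "stats", "catalog", "doctor", "session", "export"}


def check_command(command, allow_mutation=False):
    """Return the command if permitted; raise if it mutates without opt-in."""
    if command in MUTATING:
        if not allow_mutation:
            raise PermissionError(f"{command!r} mutates the local index")
        return command
    if command in READ_ONLY:
        return command
    # Unknown names are rejected rather than guessed at.
    raise ValueError(f"unknown docctl command: {command!r}")


print(check_command("search"))
print(check_command("import", allow_mutation=True))
```

Rejecting unknown command names keeps the guard conservative if the CLI gains commands the harness has not classified yet.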
Run core quality checks:
make lint
make format-check
make typecheck
make security-lint
make import-lint
make test
make test-cov
make check-markdown-links

Apply formatting fixes:
make format

Build release artifacts locally:
make build-dist
make check-dist
make release-dry-run

- ARCHITECTURE.md
- docs/design-docs/index.md
- docs/product-specs/index.md
- docs/references/index.md
- SECURITY.md (canonical vulnerability disclosure policy)
- docs/RELIABILITY.md
- docs/SECURITY.md (internal implementation security guardrails)
- docs/PLANS.md
For implementation and validation workflow, start with:
- AGENTS.md
- ARCHITECTURE.md
- The indexed docs under `docs/` listed above.