plaude-api — Plaud → Vault ingestion pipeline

Continuously pulls Plaud voice recordings into the Obsidian vault, classifies them by topic, enriches with entity backlinks + action-item rollups + case-study candidates, and falls back to LLM classification when keyword rules miss. Runs every 30 minutes via systemd user timer.

Pipeline stages

[plaud cloud]
      │
      │ every 30 min (systemd-user timer)
      ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ L1  pull-to-vault.sh         discover + fetch transcript+summary             │
│        → writes 999 Inbox/Transcripts/_raw/<date>--<slug>--<short-id>.md     │
│        → idempotent: durable .pulled-ids ledger (discovery window --days 90) │
├──────────────────────────────────────────────────────────────────────────────┤
│ L2  screen-and-route.py      keyword classifier (4 buckets + fallback)       │
│        → moves to Personal-Health / Casual / Learning / Business             │
│        → unmatched → _unclassified/                                          │
│        → patches frontmatter: category, sync_blocked, routed_at              │
├──────────────────────────────────────────────────────────────────────────────┤
│ LLM llm-reclassify.py        Haiku 4.5 classifier for _unclassified/         │
│        → JSON-mode output, prompt-cached system rules                        │
│        → noops silently if ANTHROPIC_API_KEY unset                           │
│        → patches frontmatter: llm_reclassified, llm_confidence, llm_reason   │
├──────────────────────────────────────────────────────────────────────────────┤
│ L3  enrich-routed.py         entity backlinks + action items + VDC queue     │
│        → scans AI summary + transcript head for entity-name mentions         │
│        → appends `## Recent Mentions` lines to 300 Entities/<entity>.md      │
│        → (Business+Personal-Health) parses Plan/Next-Steps → 200 Notes/     │
│          _action-items/<date>.md                                             │
│        → (Business) if delivery-signal language + Company hit, appends to   │
│          200 Notes/Value-Delivered/_candidates.md                            │
│        → flips frontmatter: ingestion_status routed → enriched               │
└──────────────────────────────────────────────────────────────────────────────┘

Filesystem layout

~/apps/plaude-api/
├── pull-to-vault.sh                # L1
├── screen-and-route.py             # L2
├── llm-reclassify.py               # LLM fallback
├── enrich-routed.py                # L3
└── README.md

~/.config/systemd/user/
├── plaud-sync.service              # oneshot, 4 ExecStarts (L1 → L2 → LLM → L3)
└── plaud-sync.timer                # OnCalendar=*:0/30, Persistent=true

~/.config/plaud-sync/
├── env                             # ANTHROPIC_API_KEY=... (mode 600)
└── env.example

~/vault/999 Inbox/Transcripts/
├── _raw/                           # L1 staging — L2 drains this
├── _unclassified/                  # L2 fallback — LLM drains this
├── Personal-Health/                # [sync_blocked: true]
├── Casual/                         # [sync_blocked: true]
├── Learning/
├── Business/
└── .ingestion.log                  # append-only audit log

~/vault/200 Notes/_action-items/<date>.md            # L3 output (Business + Personal-Health)
~/vault/200 Notes/Value-Delivered/_candidates.md     # L3 output (Business + delivery signal)
~/vault/300 Entities/{People,Companies,Projects}/    # L3 appends `## Recent Mentions`

Operations

# Trigger manual full-cycle run
systemctl --user start plaud-sync.service

# Watch live progress
journalctl --user -u plaud-sync.service -f

# Tail the ingestion log
tail -f "$HOME/vault/999 Inbox/Transcripts/.ingestion.log"

# Timer status
systemctl --user list-timers plaud-sync.timer

# Pull a specific recording ad-hoc (bypasses timer + L2 + LLM + L3)
./pull-to-vault.sh <file_id>

# Re-classify _unclassified/ via LLM only (will no-op without API key)
./llm-reclassify.py --dry-run        # preview
./llm-reclassify.py                  # apply
./llm-reclassify.py --max 5          # cap calls

# Re-enrich (entity scan / action items / VDC) without re-pulling
./enrich-routed.py --dry-run
./enrich-routed.py

L2 keyword classifier rules (priority order)

Category	sync_blocked	Trigger keywords (case-insensitive whole-word)
`Personal-Health`	✓	clinical, doctor, medical, ultrasound, appointment, visit, patient, diagnosis, prescription, OB, obstetric, fetal, pregnancy, gynec
`Casual`	✓	casual chat, gathering, family, personal, life plan, date night, hangout, catch-up
`Learning`	–	lecture, course, training, tutorial, webinar, workshop, class, seminar
`Business`	–	strategy, project, meeting, scoping, consultation, discovery, business, client, proposal, onboarding, kickoff, standup, review, integration, deployment, automation, pipeline, product demo
`_unclassified`	–	(fallback, picked up by LLM next stage)

Edit screen-and-route.py RULES to tune. Order matters — first match wins. sync_blocked: true is a frontmatter flag for downstream vault-sync logic to honor — no enforcement here, semantic only.

LLM classifier (Haiku 4.5)

Only runs on files in _unclassified/. Never re-classifies already-routed files.
Uses prompt caching on the system prompt (rules + few-shot examples) — per-call delta is just title + 1500-char summary excerpt.
Returns JSON: {"category": "...", "confidence": 0.0–1.0, "reason": "..."} — provenance is preserved in transcript frontmatter (llm_model, llm_confidence, llm_reason, llm_reclassified_at).
Cost: ~$0.0003/file at current Haiku 4.5 pricing once cache is warm.
Silently no-ops if ANTHROPIC_API_KEY is unset — does not block the rest of the pipeline.
Key is loaded by systemd via EnvironmentFile=-/home/jgatlit/.config/plaud-sync/env (the - makes the file optional).

To rotate the key: edit ~/.config/plaud-sync/env. To disable LLM stage entirely: comment out the ExecStart=…llm-reclassify.py line in plaud-sync.service and daemon-reload.

L3 enrichment

Entity-mention backlinks — scans the AI summary + first 8KB of transcript for any vault entity name from 300 Entities/{People,Companies,Projects}/. Filename stem is the canonical wikilink target. Names ≤3 chars are skipped (too noisy). For each hit, appends a dated line under ## Recent Mentions in the entity file. Idempotent — duplicate lines aren't written.

Action-item extraction — for Business + Personal-Health categories only. Parses bullets under any heading matching Plan / Next Steps / Action Items / To-Do in the AI summary. Strips [ ] / [x] prefixes, filters sub-section labels (lines ending with :), filters @Person assignment headers, filters placeholder "insert more" lines. Writes to 200 Notes/_action-items/<recorded-date>.md under a ## From [[transcript-stem]] block — one block per source, idempotent.

VDC candidate hook — for Business category only. If the transcript mentions a Company entity AND the summary contains delivery-signal language (closed, shipped, delivered, wrapped, live, launched, signed, deployed, in production, went live), appends a candidate row to 200 Notes/Value-Delivered/_candidates.md for operator triage. Composes with the /case-study-extract skill — operator picks promising candidates and runs /case-study-extract <client-slug> <delivery-event-slug> to author the canonical case study.

After enrichment, transcript frontmatter is updated with entity_mentions: [list], action_items_rollup: [[date]], vdc_candidate: true, and ingestion_status: enriched. Files with enriched status are skipped on subsequent runs.

Authentication

Plaud CLI — @plaud-ai/cli reads tokens from ~/.plaud/tokens.json. When the token expires, run plaud login interactively (uses port 8199 — see PORT_REALLOCATION_2026-05-23.md in ~/projects/project-tracker/).

Plaud MCP (separate from CLI auth) — for ad-hoc use from Claude Code, the MCP server has its own token cache. Call mcp__plaud__login once after install.

Anthropic — ~/.config/plaud-sync/env mode 600. Currently sourced from the same key already in use by project-tracker. Rotating the project-tracker key requires updating both locations.

Idempotency

Every stage is safe to re-run:

L1 dedupes via the durable ledger 999 Inbox/Transcripts/.pulled-ids (full file_id per line, appended only on successful pull), OR'd with a legacy inbox-tree filename scan. The ledger survives files being moved/renamed downstream out of the inbox — which is why the discovery window can safely be wide (--days 90, for late-transcription resilience) without re-pulling already-filed recordings. Seed/re-seed from existing vault frontmatter: grep -rhoE '^plaud_file_id:[[:space:]]*[0-9a-f]{32}' ~/vault | awk '{print $2}' | sort -u >> "999 Inbox/Transcripts/.pulled-ids". Dry-run a window with DRY_RUN=1 ./pull-to-vault.sh --days N. Backfill specific old recordings by explicit id: ./pull-to-vault.sh <file_id> ...
L2 only processes files in _raw/ — once moved to a bucket, never touched again
LLM only processes files in _unclassified/
L3 skips files with ingestion_status: enriched
Entity backlinks, action-item blocks, and VDC candidate rows all check for existing content before appending

What's NOT included (deliberately deferred)

PII redaction enforcement — sync_blocked: true is a semantic flag, not enforcement. Any future vault-sync (to remote, GitHub, etc.) must honor it. The vault is operator-trusted local-only at present.
Pluggable provider for LLM — currently Anthropic-direct. Could swap to AI Gateway for failover/cost-tracking. Not done because direct Haiku 4.5 is already <$0.001/file.
Webhook-driven ingestion — Plaud's webhooks are on the B2B Transcription API, not the MCP/CLI surface. Would require a separate API key + provisioning. Polling at 30 min is fine for human-cadence recording.
Auto-promotion of VDC candidates to /case-study-extract — operator-curated only by design (see case-study-extract skill: "Operator-curated, never auto-fired"). The queue is a triage prompt, not an autopilot.
Cross-recording entity disambiguation — if "Maria" appears in two different recordings referring to two different people in the vault, both get the same backlink. Acceptable false-positive rate for v1.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

plaude-api — Plaud → Vault ingestion pipeline

Pipeline stages

Filesystem layout

Operations

L2 keyword classifier rules (priority order)

LLM classifier (Haiku 4.5)

L3 enrichment

Authentication

Idempotency

What's NOT included (deliberately deferred)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
enrich-routed.py		enrich-routed.py
llm-reclassify.py		llm-reclassify.py
pull-to-vault.sh		pull-to-vault.sh
screen-and-route.py		screen-and-route.py

Folders and files

Latest commit

History

Repository files navigation

plaude-api — Plaud → Vault ingestion pipeline

Pipeline stages

Filesystem layout

Operations

L2 keyword classifier rules (priority order)

LLM classifier (Haiku 4.5)

L3 enrichment

Authentication

Idempotency

What's NOT included (deliberately deferred)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages