This repository was archived by the owner on Apr 23, 2026. It is now read-only.
Preprocess already-scraped LLVM Phabricator data, stored as Markdown (MD), into document objects (content + metadata) ready for Pinecone upsert. This task involves no scraping and no Phabricator API calls.
Scope
Input: Scraped Phabricator MD (existing scrape structure).
Output: Documents with content (e.g. title + description) and metadata (e.g. object_id, type, author, status, created_at, project, url).
In scope: Parse and validate MD (files, front matter if any); normalize text; define the document schema; emit one doc per revision/task, or per chunk, as agreed.
Out of scope: Fetching from Phabricator; calling Pinecone.
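The in-scope steps (parse, validate, normalize, build a document) could be sketched roughly as follows. This is only an illustration under stated assumptions: the scrape format is not specified here, so the sketch assumes each file starts with an optional `---`-delimited front matter block of simple `key: value` lines and that the title is either a `title` key or the first `# ` heading. The names `Document`, `split_front_matter`, `normalize_text`, and `parse_document` are hypothetical, as is the choice of `object_id` and `type` as required fields.

```python
import re
from dataclasses import dataclass, field

# Metadata keys taken from the examples in the Scope text; the real schema may differ.
METADATA_FIELDS = ("object_id", "type", "author", "status", "created_at", "project", "url")
# Assumption: these two keys are treated as mandatory for validation.
REQUIRED_FIELDS = ("object_id", "type")


@dataclass
class Document:
    content: str
    metadata: dict = field(default_factory=dict)


def split_front_matter(text):
    """Return (front_matter, body), assuming '---' delimited 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so URL values keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter found: treat the whole text as body.
    return {}, text


def normalize_text(text):
    """Unify line endings, collapse runs of spaces/tabs, and squeeze blank lines."""
    text = text.replace("\r\n", "\n")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def parse_document(text):
    """Build one Document (content = title + description) from one MD file's text."""
    meta, body = split_front_matter(text)
    missing = [key for key in REQUIRED_FIELDS if key not in meta]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    title = meta.get("title", "")
    if not title:
        match = re.search(r"^#[ \t]+(.+)$", body, flags=re.MULTILINE)
        if match:
            title = match.group(1).strip()
            # Drop the heading line so the title is not repeated in the description.
            body = body[:match.start()] + body[match.end():]
    content = normalize_text(f"{title}\n\n{body}")
    metadata = {key: meta[key] for key in METADATA_FIELDS if key in meta}
    return Document(content=content, metadata=metadata)
```

Keeping validation inside `parse_document` means a malformed file fails loudly at parse time rather than producing a document with incomplete metadata; whether to raise or skip such files is a design choice the task leaves open.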
Result
Library or CLI: takes MD input (a file or directory) and returns a list of { content, metadata }. Config for field mapping and truncation. Deliverables: code, tests, and a README describing the document schema.
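A minimal sketch of what the library entry point and its config might look like, assuming the config carries a metadata key mapping and a character limit for content. `PreprocessConfig`, `apply_config`, `iter_markdown_files`, and `preprocess` are hypothetical names, the default limit of 8000 characters is an arbitrary placeholder, and the per-file parser is passed in as a callable because the scrape format is not specified here.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PreprocessConfig:
    # Hypothetical: maps source metadata keys to output keys; unmapped keys pass through.
    field_map: dict = field(default_factory=dict)
    # Hypothetical: content longer than this many characters is cut off.
    max_content_chars: int = 8000


def apply_config(doc, config):
    """Rename metadata keys and truncate content according to the config."""
    metadata = {config.field_map.get(key, key): value for key, value in doc["metadata"].items()}
    content = doc["content"][: config.max_content_chars]
    return {"content": content, "metadata": metadata}


def iter_markdown_files(path):
    """Yield the file itself, or every *.md file under a directory in sorted order."""
    if path.is_file():
        yield path
    else:
        yield from sorted(path.rglob("*.md"))


def preprocess(path, parse, config):
    """Return a list of { content, metadata } dicts for a file or directory.

    `parse` is any callable that turns raw MD text into a dict with
    "content" and "metadata" keys.
    """
    docs = []
    for md_file in iter_markdown_files(Path(path)):
        raw = md_file.read_text(encoding="utf-8")
        docs.append(apply_config(parse(raw), config))
    return docs
```

A CLI could wrap `preprocess` and write the resulting list out as JSON; the upsert itself stays out of scope, so nothing here talks to Pinecone.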
Acceptance criteria
content + metadata.