This repository was archived by the owner on Apr 23, 2026. It is now read-only.
Summary
Preprocess already-scraped LLVM Bugzilla data (JSON) into document objects (content + metadata) for Pinecone upsert. No scraping and no Bugzilla API calls.
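A minimal sketch of the document object, assuming Python and one document per bug. The `Document` class, the `make_document` helper, and the `summary`/`description` input keys are hypothetical. The metadata keys are the examples listed in this issue, not a confirmed final schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict

# Metadata keys taken from the issue's examples; the final list is an assumption.
METADATA_KEYS = ("bug_id", "product", "component", "status",
                 "priority", "reporter", "created_at", "url")


@dataclass
class Document:
    """One unit of output: text to embed plus filterable metadata."""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def make_document(bug: Dict[str, Any]) -> Document:
    """Build one per-bug Document from an already-scraped bug record.

    Hypothetically assumes the record already uses the metadata key names
    above, plus 'summary' and 'description' for the text fields.
    """
    text_parts = [bug.get("summary") or "", bug.get("description") or ""]
    content = "\n\n".join(p for p in text_parts if p)
    # Keep only known keys and drop missing or null values.
    metadata = {k: bug[k] for k in METADATA_KEYS if bug.get(k) is not None}
    return Document(content=content, metadata=metadata)
```

`asdict(make_document(record))` turns the result into a plain dict that can be serialized to JSON for a later upsert step.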
Scope
In scope: parse and validate the JSON; normalize text (strip HTML); define the document schema; emit one document per bug or one per comment, as agreed. Out of scope: fetching from Bugzilla; calling Pinecone.
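The in-scope steps could look roughly like this. It is a sketch, not the project's implementation: it assumes the scrape is a JSON array of bug objects with `bug_id`, `summary`, `description`, and a `comments` list of `{"text": ...}` objects (all hypothetical field names), and it uses the standard library's `html.parser` to strip tags. The `per_comment` flag stands in for the "per bug or per comment" choice.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, Iterator, List


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags; entities like &amp; are decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Remove HTML tags and collapse all whitespace, including line breaks."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text the parser is still buffering
    return " ".join(" ".join(parser.parts).split())


def load_bugs(raw: str) -> List[Dict[str, Any]]:
    """Parse scraped JSON and check that it is a list of objects with a bug_id."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of bug objects")
    for i, bug in enumerate(data):
        if not isinstance(bug, dict) or "bug_id" not in bug:
            raise ValueError(f"record {i}: expected an object with a bug_id")
    return data


def to_documents(bugs: List[Dict[str, Any]],
                 per_comment: bool = False) -> Iterator[Dict[str, Any]]:
    """Yield {content, metadata} dicts, one per bug or one per comment."""
    for bug in bugs:
        base_meta = {"bug_id": bug["bug_id"]}
        if per_comment:
            # Hypothetical shape: "comments" is a list of {"text": ...} objects.
            for n, comment in enumerate(bug.get("comments") or []):
                text = strip_html(comment.get("text") or "")
                if text:  # skip comments that are empty after normalization
                    yield {"content": text,
                           "metadata": {**base_meta, "comment_index": n}}
        else:
            parts = [strip_html(bug.get(k) or "") for k in ("summary", "description")]
            yield {"content": "\n\n".join(p for p in parts if p), "metadata": base_meta}
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can handle malformed JSON and failed validation with a single `except ValueError`. Collapsing whitespace keeps content compact but also flattens code snippets in bug text; a real implementation may want to preserve line breaks.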
Result
Library or CLI: JSON input (file or stream) → list of { content, metadata }, with config for field mapping and truncation. Deliverables: code, tests, and a README documenting the document schema.
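One way to make the field mapping and truncation configurable, exposed through a small CLI, is sketched below. The config keys (`content_fields`, `metadata_fields`, `max_content_chars`), the source field names, and the 8000-character default are all assumptions for illustration; they are not taken from the scraped data.

```python
import argparse
import json
import sys
from typing import Any, Dict, List, Optional

# Hypothetical defaults. metadata_fields maps output key -> source key; every
# name here is a placeholder to be matched against the real scraped JSON.
DEFAULT_CONFIG: Dict[str, Any] = {
    "content_fields": ["summary", "description"],
    "metadata_fields": {"bug_id": "id", "product": "product",
                        "component": "component", "created_at": "creation_time"},
    "max_content_chars": 8000,
}


def build_documents(bugs: List[Dict[str, Any]],
                    config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map each bug record to {content, metadata} according to config."""
    limit = config.get("max_content_chars")
    docs = []
    for bug in bugs:
        # Join the configured text fields, skipping missing or empty ones.
        content = "\n\n".join(str(bug[f]) for f in config["content_fields"] if bug.get(f))
        if limit is not None:
            content = content[:limit]  # plain character cut, not token-aware
        # Rename source keys to output keys; drop missing or null values.
        metadata = {out: bug[src] for out, src in config["metadata_fields"].items()
                    if bug.get(src) is not None}
        docs.append({"content": content, "metadata": metadata})
    return docs


def _read_json(path: str) -> Any:
    """Load JSON from a file path, or from stdin when path is '-'."""
    if path == "-":
        return json.load(sys.stdin)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def main(argv: Optional[List[str]] = None) -> int:
    """CLI entry point; for script use, call `raise SystemExit(main())`."""
    parser = argparse.ArgumentParser(
        description="Scraped Bugzilla JSON -> list of {content, metadata} documents")
    parser.add_argument("input", nargs="?", default="-",
                        help="scraped JSON file, or - for stdin (default)")
    parser.add_argument("--config",
                        help="JSON file overriding the default mapping/truncation")
    args = parser.parse_args(argv)
    config = dict(DEFAULT_CONFIG)
    if args.config:
        config.update(_read_json(args.config))  # shallow merge: top-level keys replace defaults
    json.dump(build_documents(_read_json(args.input), config), sys.stdout,
              ensure_ascii=False)
    sys.stdout.write("\n")
    return 0
```

Keeping `build_documents` separate from `main` lets the same logic serve as a library call and as a CLI, which matches the "library or CLI" option above. Truncation here is a character cut; a token-based limit would depend on whichever embedding model the later Pinecone step uses.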
Document schema
Each document has content (e.g. summary + description, optional comments) and metadata (e.g. bug_id, product, component, status, priority, reporter, created_at, url).
Acceptance criteria
content + metadata.