Batch ingestion pipeline for PDF, DOCX, and HTML documents: chunks documents with Docling, enriches each chunk with Gemini (summaries, keywords, hypothetical Q&A), and stores embeddings in NeonDB/pgvector, with token usage tracking.
Updated Apr 6, 2026 - Python
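
The chunk, enrich, and track-usage flow in the description can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `Chunk`, `TokenUsage`, `chunk_by_heading`, `fake_enrich`, and `ingest` are hypothetical names, the heading-based splitter stands in for Docling's chunking, and `fake_enrich` stands in for a Gemini call that would return enrichment fields along with token counts. Embedding and the pgvector write are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Chunk:
    text: str
    summary: str = ""
    keywords: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)


@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    output_tokens: int = 0

    def add(self, prompt: int, output: int) -> None:
        self.prompt_tokens += prompt
        self.output_tokens += output

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.output_tokens


def chunk_by_heading(markdown: str) -> list[Chunk]:
    """Stand-in for structure-aware chunking: start a new chunk at each '#' heading."""
    chunks: list[Chunk] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append(Chunk("\n".join(current).strip()))
            current = []
        current.append(line)
    if current:
        chunks.append(Chunk("\n".join(current).strip()))
    return [c for c in chunks if c.text]


# An enricher returns (fields, prompt_tokens, output_tokens).
Enricher = Callable[[str], tuple[dict, int, int]]


def fake_enrich(text: str) -> tuple[dict, int, int]:
    """Hypothetical enricher; whitespace-separated words serve as a crude token count."""
    words = text.split()
    title = text.splitlines()[0].lstrip("# ")
    fields = {
        "summary": " ".join(words[:8]),
        "keywords": sorted({w.lower() for w in words if len(w) > 6})[:5],
        "questions": [f"What does the section '{title}' cover?"],
    }
    return fields, len(words), len(fields["summary"].split())


def ingest(markdown: str, enrich: Enricher, usage: TokenUsage) -> list[Chunk]:
    """Chunk a document, enrich every chunk, and accumulate token usage."""
    chunks = chunk_by_heading(markdown)
    for chunk in chunks:
        fields, prompt, output = enrich(chunk.text)
        chunk.summary = fields["summary"]
        chunk.keywords = fields["keywords"]
        chunk.questions = fields["questions"]
        usage.add(prompt, output)
    return chunks
```

Passing the enricher in as a callable keeps the usage accounting independent of any particular model client, so a real call could be swapped in without touching `ingest`.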