This repository contains the pipeline for the DualGraph project, including dataset building, RAG indexing and querying, and evaluation.
- databuilder/ - Tools for building the SpecsQA dataset from raw scraped HTML files
- dualgraphrag/ - Main DualGraph RAG system, including indexing and querying
- evaluation/ - Evaluation tools and metrics
- questions/ - Question files used for querying the system
See individual directory READMEs for detailed usage instructions.
Related repositories:
- DualGraph_scraping: Code used to scrape the SpecsQA dataset - https://github.com/SamsungLabs/DualGraph_scraping
- DualGraph_dataset: Raw scraped data for the SpecsQA dataset - https://github.com/SamsungLabs/DualGraph_dataset
This project is licensed under the terms of the MIT license. See LICENSE for details.
Note: Portions of the code in this directory are adapted from nano-graphrag. The original license is included in LICENSE_nano-graphrag.
© 2026 Samsung Labs. All rights reserved.