Feature Request: Layered RAG Query Strategy for Wiki Page Generation #447

@jorschac

Is your feature request related to a problem?

In our team's practical scenarios, the current single-pass RAG query approach has the following limitations:

  1. Single generic query construction: Uses a one-size-fits-all query pattern like "Generate comprehensive wiki page content for {title} focusing on files: {files}", which lacks specificity and focus.

  2. Low information density: The generic query often results in overly broad retrieval results that dilute relevant information with less pertinent content.

  3. Underutilization of wiki structure metadata: The current approach fails to leverage rich structural information available in the wiki schema, including:

    • Page importance levels (high/medium/low)
    • Section hierarchies and relationships
    • Related pages and dependencies
    • Parent-child page structures

These limitations result in suboptimal context quality for LLM-based page generation, leading to less accurate and less comprehensive wiki content.

Describe the solution you'd like

Based on repeated testing and validation, we implemented a Layered RAG Query Strategy that runs several focused queries across three dimensions (core content, architecture, and wiki context), then merges and deduplicates the results to provide high-quality context for page generation.

Solution Architecture

graph TD
Start[Wiki Page Generation Request] --> Check{Check Page<br/>Importance}
Check --> Layer1[Layer 1: Core Content Query]
Layer1 --> L1Query["Query: Implementation details<br/>and core functionality<br/>in specified files"]
L1Query --> Layer2[Layer 2: Architecture Query]
Layer2 --> Importance{Page<br/>Importance?}
Importance -->|High| L2High["Query: Architecture integration,<br/>dependencies, and<br/>system interactions"]
Importance -->|Medium| L2Med["Query: Integration patterns<br/>and dependencies"]
Importance -->|Low| Skip[Skip Layer 2]
L2High --> Layer3[Layer 3: Context Query]
L2Med --> Layer3
Skip --> Layer3
Layer3 --> L3Query["Query: Context from<br/>- Section relationships<br/>- Related pages<br/>- Parent/child structure"]
L3Query --> Merge[Merge All Retrieved Documents]
Merge --> Dedup[Deduplicate by Content Hash]
Dedup --> Rank[Rank by Document Length]
Rank --> TopN[Select Top 30 Documents]
TopN --> Generate[Generate Wiki Page with<br/>High-Quality Context]
Generate --> Output[Output: Comprehensive<br/>Wiki Page Content]
style Layer1 fill:#e1f5e1
style Layer2 fill:#fff4e1
style Layer3 fill:#e1f0ff
style Merge fill:#ffe1f0
style Output fill:#f0e1ff
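
The layer selection in the flowchart can be sketched roughly as follows. This is a minimal illustration, not the code from our implementation: the `WikiPage` dataclass, its field names, the `build_layered_queries` helper, and the exact query wording are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WikiPage:
    # Hypothetical page record; the field names are assumptions, not the project's schema.
    title: str
    file_paths: List[str]
    importance: str = "medium"  # expected values: "high", "medium", "low"
    section: str = ""
    related_pages: List[str] = field(default_factory=list)
    parent: str = ""
    children: List[str] = field(default_factory=list)


def build_layered_queries(page: WikiPage) -> List[str]:
    """Return one query string per layer that applies to this page."""
    files = ", ".join(page.file_paths)

    # Layer 1: always query core content in the page's own files.
    queries = [
        f"Implementation details and core functionality of {page.title} "
        f"in files: {files}"
    ]

    # Layer 2: query depth depends on importance; anything that is not
    # "high" or "medium" (including "low") skips this layer.
    if page.importance == "high":
        queries.append(
            f"Architecture integration, dependencies, and system "
            f"interactions of {page.title}"
        )
    elif page.importance == "medium":
        queries.append(f"Integration patterns and dependencies of {page.title}")

    # Layer 3: build a context query from whatever structure metadata exists.
    parts = []
    if page.section:
        parts.append(f"section '{page.section}'")
    if page.related_pages:
        parts.append("related pages " + ", ".join(page.related_pages))
    if page.parent:
        parts.append(f"parent page '{page.parent}'")
    if page.children:
        parts.append("child pages " + ", ".join(page.children))
    context = "; ".join(parts) or "the overall wiki structure"
    queries.append(f"Context for {page.title} from {context}")

    return queries
```

With this sketch, a high- or medium-importance page yields three queries and a low-importance page yields two, matching the query counts stated in this issue.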
  1. Layer 1 - Core Content Query: Focuses on implementation details and core functionality within the page's specified file paths.

  2. Layer 2 - Architecture Query: Adaptive query based on page importance:

    • High importance: Deep dive into architecture integration, dependencies, and system interactions
    • Medium importance: Focus on integration patterns and dependencies
    • Low importance: Skip this layer to reduce overhead

  3. Layer 3 - Context Query: Leverages wiki structure metadata:

    • Section relationships and hierarchies
    • Related page connections
    • Parent-child page structures

  4. Post-Processing Pipeline:

    • Merge documents from all query layers
    • Deduplicate using a content hash of each document's first 200 characters
    • Rank by document length (longer documents typically contain more information)
    • Select the top 30 documents from the ranked list
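
The post-processing steps above might look like the sketch below. It treats each retrieved document as a plain string, and the `merge_and_rank` name, its parameters, and the choice of SHA-256 are assumptions for illustration; the real retriever likely returns richer document objects.

```python
import hashlib
from typing import Iterable, List


def merge_and_rank(
    layer_results: Iterable[List[str]],
    prefix_chars: int = 200,
    top_n: int = 30,
) -> List[str]:
    """Merge per-layer results, drop duplicates, and keep the longest documents."""
    seen = set()
    unique: List[str] = []

    # Merge: walk the layers in order so earlier layers win on duplicates.
    for docs in layer_results:
        for text in docs:
            # Deduplicate: hash only the first `prefix_chars` characters.
            key = hashlib.sha256(text[:prefix_chars].encode("utf-8")).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            unique.append(text)

    # Rank: longest first; sort() is stable, so ties keep merge order.
    unique.sort(key=len, reverse=True)

    # Select: cap the context at `top_n` documents.
    return unique[:top_n]
```

One consequence of hashing only a prefix: two different documents that share the same first 200 characters (for example, files that open with an identical license header) collapse into a single entry, so the prefix length may need tuning per repository.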

Expected Benefits

Page content accuracy has improved significantly in our AI coding workflows. The benefits we expect:

  • Higher information density: Focused queries retrieve more relevant content
  • Adaptive resource usage: Query depth scales with page importance
  • Better context quality: Multiple perspectives provide comprehensive coverage
  • Improved wiki quality: LLM generates more accurate content with better context

Describe alternatives you've considered

None.

Additional context

Performance impact: The query count rises from 1 to 2-3 per page: 2 for low-importance pages, which skip Layer 2, and 3 for medium- and high-importance pages. The output quality improves significantly, which we consider worth the extra queries for high-quality documentation generation.
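
To make the overhead concrete, here is a small estimate of total retrieval queries for a wiki. The per-level counts follow from the layer rules in this issue (Layer 2 is skipped for low-importance pages); the `total_queries` helper and the 10-page mix are hypothetical.

```python
from typing import Iterable

# Queries per page under the layered strategy: Layers 1 and 3 always run,
# Layer 2 runs only for high- and medium-importance pages.
QUERIES_PER_PAGE = {"high": 3, "medium": 3, "low": 2}


def total_queries(importances: Iterable[str]) -> int:
    """Sum the retrieval queries needed for pages with the given importance levels."""
    return sum(QUERIES_PER_PAGE[level] for level in importances)


# Hypothetical wiki: 3 high, 4 medium, and 3 low-importance pages.
pages = ["high"] * 3 + ["medium"] * 4 + ["low"] * 3
layered = total_queries(pages)   # 3*3 + 4*3 + 3*2 = 27
baseline = len(pages)            # one query per page in the single-pass approach
print(f"single-pass: {baseline} queries, layered: {layered} queries")
```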
