
Prompt API: consider local use cases for RAG and Agentic RAG #8

@zolkis


[Refers to / transferred from the Prompt API]

As a possible complex local application of prompting, Retrieval-Augmented Generation (RAG) enhances AI systems by dynamically retrieving information external to the model to improve response accuracy and relevance. Agentic RAG extends these capabilities with autonomous decision-making and iterative refinement, which is particularly valuable in local deployments for enhanced privacy and customization.
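The two core steps of RAG (retrieve relevant passages, then augment the prompt with them) can be sketched without any external dependencies. This is a minimal sketch under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt` and the sample documents are hypothetical names and data used only for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real local embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Augment the user question with the retrieved passages, tagged by id for citation."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the context below and cite sources by id.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


docs = {
    "policy-7": "Employees may work remotely up to three days per week.",
    "policy-9": "Travel expenses must be approved by a manager before booking.",
    "proto-2": "Line 4 requires a safety inspection at the start of each shift.",
}
question = "How many days per week can employees work remotely?"
hits = retrieve(question, docs)
print(build_prompt(question, hits))
```

The resulting prompt would then be passed to the (local) model; in a real deployment the toy `embed` would be replaced by an embedding model and the dictionary by a vector database.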

Some of the main RAG use cases include:

  • Enhanced search & question answering: RAG improves search engines with up-to-date featured snippets and powers domain-specific QA systems that combine proprietary data (e.g., medical literature or legal documents) with foundational LLM knowledge.
  • Healthcare: provides clinicians with real-time access to medical guidelines and research.
  • Legal: accelerates case law research and compliance checks.
  • E-commerce: delivers personalized product recommendations using customer behavior data.
  • Content generation: generates context-aware summaries, reports, and marketing copy by retrieving relevant source materials.
  • Enterprise knowledge management: enables chatbots to answer internal queries about company policies, manufacturing protocols, or regulatory updates.

Agentic RAG adds autonomous decision-making layers to traditional RAG workflows (using the LLM not only to generate answers, but also to prepare the prompts), enabling:

  • Adaptive problem solving: customer support agents offering discounts proactively.
  • Continuous learning: medical diagnosis systems updating with new research.
  • Multi-step reasoning: legal tools cross-referencing precedents and statutes.
  • Local deployment advantages: privacy-focused healthcare data analysis.
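The decision loop behind agentic RAG, where the model decides which follow-up query to issue and when the gathered context is sufficient (as in the legal cross-referencing case above), can be sketched as follows. This is a minimal sketch under stated assumptions: `plan_next_query` is a hypothetical rule-based stand-in for prompting the local LLM to plan the next step, `search` is a keyword-overlap stand-in for vector search, and the documents are toy examples.

```python
import re
from typing import Optional

# Toy corpus (hypothetical data).
DOCS = {
    "case-smith": "In Smith v. Jones the court applied statute 12-B to clinic records.",
    "statute-12b": "Statute 12-B requires clinics to keep patient data on premises.",
    "statute-40": "Statute 40 regulates pharmacy opening hours.",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, exclude: set[str]) -> Optional[str]:
    """Keyword-overlap retriever (stand-in for vector search): best unseen doc id, or None."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), doc_id) for doc_id, text in DOCS.items() if doc_id not in exclude]
    scored = [s for s in scored if s[0] > 0]
    if not scored:
        return None
    return max(scored, key=lambda s: s[0])[1]  # first best match wins ties


def plan_next_query(found: dict[str, str], asked: set[str]) -> Optional[str]:
    """Hypothetical stand-in for the LLM planning step.

    Follows up on statutes cited in retrieved text that were not looked up yet;
    returning None means 'the context is sufficient'.
    """
    for text in found.values():
        for ref in re.findall(r"statute (\w[\w-]*)", text, flags=re.IGNORECASE):
            query = f"statute {ref.upper()}"
            if query not in asked:
                return query
    return None


def agentic_retrieve(question: str, max_steps: int = 4) -> dict[str, str]:
    """Iteratively retrieve, let the planner refine the query, and stop when it is satisfied."""
    found: dict[str, str] = {}
    asked: set[str] = set()
    query: Optional[str] = question
    for _ in range(max_steps):
        if query is None:
            break
        asked.add(query)
        doc_id = search(query, exclude=set(found))
        if doc_id is not None:
            found[doc_id] = DOCS[doc_id]
        query = plan_next_query(found, asked)
    return found


context = agentic_retrieve("What did the court decide in Smith v. Jones?")
print(list(context))
```

The `max_steps` bound keeps the loop from running indefinitely if the planner keeps asking for more context, which matters on resource-constrained local hardware.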

Traditional RAG often relies on cloud-hosted models and vector databases, where customization is limited by API constraints.

Agentic RAG in a local deployment has advantages such as:

  • Data privacy: runs fully offline via tools such as LangChain/Qdrant.
  • Customization: supports domain-specific models (e.g., Gemma 3 for multilingual tasks).
  • Cost efficiency: eliminates API fees by using local LLMs.
  • Transparency: provides verifiable source citations from internal databases.
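Verifiable citations mean an application can check every source marker in a generated answer against the passages that were actually retrieved. A minimal sketch, assuming answers cite sources as `[doc-id]` markers (a hypothetical convention chosen for this example; `check_citations` and the data are illustrative):

```python
import re


def check_citations(answer: str, sources: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split the [id] citation markers in an answer into (valid, unknown) lists.

    'valid' ids refer to retrieved sources; 'unknown' ids do not and should be flagged.
    """
    cited = re.findall(r"\[([\w.-]+)\]", answer)
    valid = [c for c in cited if c in sources]
    unknown = [c for c in cited if c not in sources]
    return valid, unknown


sources = {
    "policy-7": "Employees may work remotely up to three days per week.",
    "policy-9": "Travel expenses must be approved by a manager before booking.",
}
answer = "Remote work is allowed three days a week [policy-7]. Trips need approval [policy-12]."
print(check_citations(answer, sources))
```

Here `policy-12` is reported as unknown because it was never retrieved, so the UI could hide it or mark the claim as unsupported.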

A local Agentic RAG pipeline (e.g. RAGapp deployed via Docker) typically includes:

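```python
# Simplified architecture (illustrative pseudocode: Ollama, Qdrant, FastEmbed,
# LangchainAgent and KnowledgeBase name the roles of the components, not exact library APIs)
local_llm = Ollama(model="llama3")  # Local model served by Ollama
vector_db = Qdrant(embeddings=FastEmbed())  # On-premise vector storage with local embeddings
agent = LangchainAgent(
    tools=[KnowledgeBase(retriever=vector_db)],  # Retrieval exposed to the agent as a tool
    system_prompt="Generate markdown responses with citations",
)
```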
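The same wiring can be expressed against small interfaces, so that the local model and the vector store stay swappable. This is a sketch, not any framework's API: `LLM`, `Retriever`, `KeywordStore`, `EchoLLM` and `RagAgent` are hypothetical names, `KeywordStore` stands in for Qdrant with local embeddings, and `EchoLLM` stands in for a local model by simply echoing the top passage as a cited markdown bullet.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


class Retriever(Protocol):
    def search(self, query: str, k: int) -> list[tuple[str, str]]: ...


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class KeywordStore:
    """Hypothetical in-memory store standing in for an on-premise vector DB."""
    docs: dict[str, str] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def search(self, query: str, k: int) -> list[tuple[str, str]]:
        q = words(query)
        ranked = sorted(self.docs.items(), key=lambda kv: len(q & words(kv[1])), reverse=True)
        return ranked[:k]


class EchoLLM:
    """Hypothetical stand-in for a local model: echoes the top passage as a cited bullet."""

    def generate(self, prompt: str) -> str:
        first = next(line for line in prompt.splitlines() if line.startswith("["))
        citation, text = first.split(" ", 1)
        return f"- {text} {citation}"


@dataclass
class RagAgent:
    llm: LLM
    retriever: Retriever
    system_prompt: str
    k: int = 2

    def answer(self, question: str) -> str:
        passages = self.retriever.search(question, self.k)
        context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
        prompt = f"{self.system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}"
        return self.llm.generate(prompt)


store = KeywordStore()
store.add("proto-2", "Line 4 requires a safety inspection at the start of each shift.")
store.add("reg-5", "Solvent containers must be stored in ventilated cabinets.")
agent = RagAgent(EchoLLM(), store, system_prompt="Generate markdown responses with citations")
print(agent.answer("When does line 4 need a safety inspection?"))
```

Because `RagAgent` only depends on the `generate` and `search` methods, a client for a locally served model or a real vector store could be dropped in without changing the agent.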

This approach is particularly impactful in regulated industries like healthcare and finance, where data sovereignty and low-latency responses are critical.

In the CG, please consider, discuss, and decide whether any of these use cases make it worth developing support as a new type of standardized assistance API, or whether they are better treated as example applications of prompting (if apps are a better place to manage RAG).
