GitHub - CGCL-codes/FlowRAG: FlowRAG: Continual Learning for Retrieval-Augmented Generation

FlowRAG: Continual Learning for Retrieval-Augmented Generation

Overview

FlowRAG is a lightweight continual learning framework for dynamic retriever adaptation in Retrieval-Augmented Generation (RAG).
It targets real-world RAG settings where document corpora evolve over time, requiring the retriever to adapt to new domains while mitigating catastrophic forgetting on previously learned knowledge.

Key Features

🔄 Continual Learning: Seamlessly adapt to new domains without forgetting previous knowledge
🎯 FusionPrompt: Novel prompt-tuning approach for knowledge fusion
🔧 Flexible Architecture: Modular design supporting different retrievers and generators

Highlights

Parameter-efficient: updates only ~0.64% of retriever parameters.
Robust continual learning: strong retrieval & QA performance with improved resistance to catastrophic forgetting in non-stationary settings.
Validated across domains: evaluated on four QA datasets (CovidQA, NewsQA, ConvQA, NQ) under sequential training.
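
A figure like the ~0.64% above is normally the ratio of trainable to total parameter counts. The sketch below shows one way to compute that ratio; it is not taken from the FlowRAG code. `trainable_fraction` and `FakeParam` are hypothetical names, and the function assumes each parameter exposes `numel()` and `requires_grad`, as PyTorch parameters do.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class ParamLike(Protocol):
    """Anything shaped like a PyTorch parameter (assumed interface)."""

    requires_grad: bool

    def numel(self) -> int: ...


def trainable_fraction(params: Iterable[ParamLike]) -> float:
    """Return trainable element count divided by total element count."""
    total = 0
    trainable = 0
    for p in params:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable / total if total else 0.0


@dataclass
class FakeParam:
    """Stand-in parameter so the sketch runs without PyTorch."""

    size: int
    requires_grad: bool

    def numel(self) -> int:
        return self.size


if __name__ == "__main__":
    # Illustrative sizes only; these are not FlowRAG's real parameter counts.
    params = [FakeParam(1_000_000, False), FakeParam(5_000, True)]
    print(f"trainable: {trainable_fraction(params):.2%}")
```

With a real PyTorch model, the same function could be called as `trainable_fraction(model.parameters())` after freezing the backbone.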

Installation

Prerequisites

Python 3.11+
CUDA 12.1+ (for GPU support)
Git
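
The Python and Git prerequisites can be checked from a short script before creating the environment. This is a minimal sketch with hypothetical helper names (`python_ok`, `git_available`); it does not check the CUDA version, which depends on the local driver and toolkit setup.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version listed in the prerequisites


def python_ok(version=None, minimum=MIN_PYTHON) -> bool:
    """True if (major, minor) of `version` is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def git_available() -> bool:
    """True if a `git` executable is found on PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    print("Python 3.11+:", python_ok())
    print("git on PATH:", git_available())
```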

git clone https://github.com/intellistream/FlowRAG.git
cd FlowRAG
conda env create -f environment.yml && conda activate flowrag
pip install -e .

Quick Start

1. Download the datasets from Google Drive and extract them to cl_datasets/.

2. Build Index:

python -m src.retrieval.index --dataset nq --retriever contriever

3. Run FlowRAG:

python run.py --config configs/online_flowrag.yaml
# Or: python run.py --datasets nq covidqa convqa --cl_method fp --retriever contriever
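
The README does not describe what the Build Index step does internally. As a rough mental model, a dense retriever such as Contriever embeds each passage as a vector (mean-pooled token embeddings) and ranks passages by dot product with the query embedding. The toy sketch below illustrates that idea with a brute-force search in pure Python; the function names and the tiny vectors are hypothetical, and a real index would use a vector search library rather than a dictionary.

```python
import heapq
from typing import Mapping, Sequence

Vector = Sequence[float]


def mean_pool(token_vectors: Sequence[Vector]) -> list[float]:
    """Average per-token vectors into a single passage embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def search(index: Mapping[str, Vector], query: Vector, k: int = 2) -> list[str]:
    """Return the ids of the k passages with the highest dot-product score."""
    scored = ((dot(vec, query), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    # Made-up 2-dimensional embeddings, for illustration only.
    index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    print(search(index, [1.0, 0.2], k=2))
```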

Models

Component	Model	HuggingFace Path
Retriever	Contriever	facebook/contriever
Generator	Qwen2.5-7B-Instruct	Qwen/Qwen2.5-7B-Instruct
Generator	Llama3-ChatQA-2-8B	nvidia/Llama3-ChatQA-2-8B

Experiments

# Offline (single dataset)
python run.py --datasets nq --cl_method offline --retriever contriever

# Online Baselines (replug/emdr/fid)
python run.py --datasets nq covidqa convqa newnewsqa --cl_method replug --retriever contriever

# FlowRAG (ours)
python run.py --datasets nq covidqa convqa newnewsqa --cl_method fp --retriever contriever \
    --prompt_len 150 --prompt_layer 7 --use_ilf --use_cef

See configs/ for detailed configuration options.
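
To launch several of the runs above from one script, the commands can be assembled programmatically. The sketch below only builds argument lists from the flags shown in this README (`--datasets`, `--cl_method`, `--retriever`, `--prompt_len`, `--prompt_layer`, `--use_ilf`, `--use_cef`); `build_command` is a hypothetical helper, not part of the repository.

```python
import shlex
from typing import Sequence


def build_command(
    datasets: Sequence[str],
    cl_method: str,
    retriever: str = "contriever",
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Assemble a run.py invocation; dataset order defines the task sequence."""
    if not datasets:
        raise ValueError("at least one dataset is required")
    return [
        "python", "run.py",
        "--datasets", *datasets,
        "--cl_method", cl_method,
        "--retriever", retriever,
        *extra_args,
    ]


# Values copied from the FlowRAG command shown above.
FLOWRAG_EXTRA = ["--prompt_len", "150", "--prompt_layer", "7", "--use_ilf", "--use_cef"]
SEQUENCE = ["nq", "covidqa", "convqa", "newnewsqa"]

if __name__ == "__main__":
    for method, extra in [("replug", []), ("fp", FLOWRAG_EXTRA)]:
        cmd = build_command(SEQUENCE, method, extra_args=extra)
        print(shlex.join(cmd))
        # To actually launch the run: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting issues if a dataset name or path ever contains spaces.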

Evaluation

python -m src.evaluation.evaluator --exp_name flowrag_exp1 --num_tasks 4
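
The README does not say which metrics the evaluator reports. Two metrics commonly used in continual learning are the final average score and average forgetting, both computed from a matrix whose entry `scores[i][j]` is the score on task `j` after training through task `i`. The sketch below shows those standard definitions; the function names are hypothetical and the numbers are made up, so this may differ from what `src.evaluation.evaluator` actually computes.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def final_average(scores: Matrix) -> float:
    """Mean score over all tasks after training on the last task."""
    last = scores[-1]
    return sum(last) / len(last)


def average_forgetting(scores: Matrix) -> float:
    """Mean drop from each earlier task's best past score to its final score."""
    num_tasks = len(scores)
    if num_tasks < 2:
        return 0.0
    drops = []
    for task in range(num_tasks - 1):
        # Best score on this task at any step from when it was learned
        # up to (but not including) the final step.
        best_earlier = max(scores[step][task] for step in range(task, num_tasks - 1))
        drops.append(best_earlier - scores[-1][task])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Illustrative values for three sequential tasks, not FlowRAG results.
    scores = [
        [0.60, 0.10, 0.05],
        [0.55, 0.70, 0.10],
        [0.50, 0.65, 0.80],
    ]
    print(f"final average: {final_average(scores):.3f}")
    print(f"average forgetting: {average_forgetting(scores):.3f}")
```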

License

This project is licensed under the MIT License - see the LICENSE file for details.

