Med-RwR is the first Multimodal Medical Reasoning-with-Retrieval framework, which proactively retrieves external knowledge by querying observed symptoms or domain-specific medical concepts during reasoning. This approach encourages the model to ground its diagnostic analysis in verifiable external information retrieved after analyzing both visual and textual inputs.
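At a high level, reasoning-with-retrieval interleaves generation and search. The sketch below is only an illustration of this general pattern, not the code in this repository; `model.generate`, `retriever.search`, and the `<search>` tag format are placeholder assumptions.

```python
import re


def extract_query(text):
    """Return the last <search>...</search> span, if any (placeholder tag format)."""
    matches = re.findall(r"<search>(.*?)</search>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def reason_with_retrieval(model, retriever, image, question, max_rounds=3):
    """Hypothetical loop; `model.generate` and `retriever.search` are placeholder
    interfaces, not APIs exposed by this repository."""
    context = question
    for _ in range(max_rounds):
        output = model.generate(image=image, prompt=context)
        query = extract_query(output)                # e.g., an observed symptom or medical concept
        if query is None:                            # no further retrieval requested
            return output                            # final diagnostic answer
        passages = retriever.search(query, top_k=5)  # ground the next step in external knowledge
        context += output + "\n" + "\n".join(passages) + "\n"
    return model.generate(image=image, prompt=context)
```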
- [2025/11] Demo code released.
- [2025/11] The model is released on HuggingFace.
- [2025/10] The paper is available on arXiv.
The required dependencies are as follows:

```bash
conda create -n medrwr python==3.10
conda activate medrwr
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```
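If you want to confirm the environment before proceeding, a generic sanity check (not part of this repository) looks like:

```python
# Optional sanity check: confirm the expected PyTorch build and CUDA runtime
# are visible inside the new environment.
import torch

print(torch.__version__)          # expected: 2.5.1+cu124
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine
```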
Our code is implemented based on SWIFT. Please set up the required version (v3.3.0) with the following commands:

```bash
git clone https://github.com/xmed-lab/Med-RwR.git
cd Med-RwR
pip install -e .
```

The knowledge base is available on HuggingFace. Please download it from the link and place it inside `retrieve/knowledge_base`.
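One way to fetch it is with `huggingface_hub`; the sketch below is a hypothetical helper, and the `repo_id` is a placeholder that should be replaced with the knowledge-base link above.

```python
# Hypothetical download helper using huggingface_hub; the repo_id below is a
# placeholder -- replace it with the knowledge-base repository linked above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<knowledge-base-repo-id>",   # placeholder, not a real repository id
    repo_type="dataset",                  # assumption: hosted as a HuggingFace dataset
    local_dir="retrieve/knowledge_base",  # path expected by the retriever
)
```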
Start the retriever (we use BGE-M3 as an example):

```bash
python retrieve/retrieve.py
```
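For reference, a minimal sketch of dense retrieval with BGE-M3 via the `FlagEmbedding` package is shown below; it only illustrates what the retriever does, and `retrieve/retrieve.py` may expose a different interface (e.g., a server endpoint).

```python
# Minimal sketch of dense retrieval with BGE-M3 using FlagEmbedding; the corpus
# and query are illustrative examples, not the released knowledge base.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

corpus = [
    "Ground-glass opacities are a common CT finding in viral pneumonia.",
    "Kerley B lines on chest X-ray suggest interstitial pulmonary edema.",
]
query = "chest CT ground-glass opacity differential"

doc_emb = model.encode(corpus)["dense_vecs"]     # (num_docs, dim) dense embeddings
query_emb = model.encode([query])["dense_vecs"]  # (1, dim)

scores = query_emb @ doc_emb.T                   # inner-product similarity
print(corpus[scores.argmax()])                   # best-matching passage
```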
Update the question and image path in `demo.py` with your own values, then run the demo code:

```bash
python demo.py
```
Our code builds on SWIFT, R1-Searcher, and ZeroSearch. We thank the authors for their contributions to the community.
If you find this project useful, please consider citing:
```bibtex
@article{wang2025proactive,
  title={Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models},
  author={Wang, Lehan and Qin, Yi and Yang, Honglong and Li, Xiaomeng},
  journal={arXiv preprint arXiv:2510.18303},
  year={2025}
}
```
