Thank you very much for your open-source work!
I found that UI-R1 only supports single image and single round dialogue training. If I want to support multi image and multi round chat training, how do I modify the code?
My multi round chat data sequence is roughly as follows:
[
user_text1
user_image1
assistant_response1
user_image2
assistant_response2
user_image3
assistant_response3
user_image4
...
]