We should be able to install a single PR bot in a repo, as we do now, but activate multiple versions of it.
When the bot first runs on a review, one of the active versions is selected at random.
Re-reviews use the same version.
Laminar traces should track which version was used.
This will let us quickly compare similar versions of the PR review bot in similar environments and get an accurate measure of suggestion accuracy:
suggestion_accuracy = ai_suggestions_reflected / ai_suggestions
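
A minimal sketch of the selection rule and the metric, to make the intended behavior concrete. It is not the bot's actual code: `VersionAssigner`, `version_for`, the in-memory dict, and the `review_id` string format are all hypothetical names chosen for illustration. A real deployment would presumably persist the assignment somewhere durable and attach the chosen version to the Laminar trace; that part is left out because the mechanism is not specified here.

```python
import random
from typing import Dict, Optional, Sequence


class VersionAssigner:
    """Picks a bot version at random on first use and reuses it afterwards.

    Hypothetical helper: the in-memory dict stands in for whatever
    persistent store the bot would really use.
    """

    def __init__(self, active_versions: Sequence[str],
                 rng: Optional[random.Random] = None) -> None:
        if not active_versions:
            raise ValueError("at least one active version is required")
        self._versions = list(active_versions)
        self._rng = rng or random.Random()
        self._assigned: Dict[str, str] = {}

    def version_for(self, review_id: str) -> str:
        # First run on this review: choose uniformly among active versions.
        if review_id not in self._assigned:
            self._assigned[review_id] = self._rng.choice(self._versions)
        # Re-reviews: return the version chosen the first time.
        return self._assigned[review_id]


def suggestion_accuracy(ai_suggestions_reflected: int,
                        ai_suggestions: int) -> Optional[float]:
    """suggestion_accuracy = ai_suggestions_reflected / ai_suggestions.

    Returns None when the bot made no suggestions, since the ratio is
    undefined in that case (an assumption of this sketch).
    """
    if ai_suggestions == 0:
        return None
    return ai_suggestions_reflected / ai_suggestions
```

Computing `suggestion_accuracy` separately for each version, grouped by the version recorded for each review, would give the side-by-side comparison described above.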