Hi authors, thank you for this amazing piece of work! Do you have a rough cost estimate (e.g. API costs) for running all the evals on this benchmark?