Evaluating the ability of LLM agents to conduct research. The core question: what exactly are the gaps between AI agents and real human researchers?
- AARRI-bench (Act As a Real Research Intern) (ongoing)
- AARRA-bench (Act As a Real Research Assistant) (planned)
- AARRS-bench (Act As a Real Research Scientist) (planned)