Generic Support for Meta's Agent Research Environment by cemde · Pull Request #55 · parameterlab/MASEval

cemde · 2026-03-28T07:13:28Z

Description

Coming

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code quality improvement (refactoring, formatting, etc.)

Checklist

Contribution

I have read the CONTRIBUTING.md guide.
Commits follow "How to write a good git commit message"

Documentation

Added/updated docstrings for new/modified functions as instructed in CONTRIBUTING.md
Updated relevant documentation in docs/ (if applicable)
Tagged the GitHub issue with this PR (if applicable)

Changelog

Added entry to CHANGELOG.md under [Unreleased] section
- Use Added section for new features
- Use Changed section for modifications to existing functionality
- Use Fixed section for bug fixes
- Use Removed section for deprecated/removed features
OR this is a documentation-only change (no changelog needed)

Example:
- Support for multi-agent tracing (PR:#123)

Architecture (if applicable)

Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies

Additional Notes

6-task plan covering: environments package, AREToolWrapper, AREEnvironment core + shorthand path, pyproject.toml extra, and integration tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-28T07:20:00Z

Coverage report


File	Lines missing
maseval/benchmark/gaia2/__init__.py
maseval/benchmark/gaia2/environment.py
maseval/benchmark/gaia2/gaia2.py
maseval/benchmark/gaia2/tool_wrapper.py
maseval/interface/__init__.py
maseval/interface/environments/__init__.py	13-14
maseval/interface/environments/are.py	21-22, 137, 239-240, 371, 381, 394-395, 407-408, 425
maseval/interface/environments/are_tool_wrapper.py	67, 98
Project Total

This report was generated by python-coverage-comment-action

Record simulation_time_before, simulation_time_after, and simulation_time_elapsed in invocation meta dict, matching Gaia2GenericTool behavior. Gracefully returns None when simulation time is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
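A minimal sketch of the timing behavior this commit describes, assuming a hypothetical `get_simulation_time` callable that returns `None` when no simulation clock is available. The function name `call_with_simulation_time` and its signature are illustrative and not taken from the PR; only the three meta keys come from the commit message.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def call_with_simulation_time(
    tool: Callable[..., Any],
    get_simulation_time: Callable[[], Optional[float]],  # hypothetical clock accessor
    **kwargs: Any,
) -> Tuple[Any, Dict[str, Optional[float]]]:
    """Invoke `tool` and return its result plus a meta dict of simulation timestamps."""
    before = get_simulation_time()
    result = tool(**kwargs)
    after = get_simulation_time()

    # Elapsed time is only defined when both readings are available.
    elapsed = after - before if before is not None and after is not None else None

    meta = {
        "simulation_time_before": before,
        "simulation_time_after": after,
        "simulation_time_elapsed": elapsed,
    }
    return result, meta
```

When the clock accessor returns `None`, all three meta values are `None` and the tool call itself is unaffected.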

Delegate AREToolWrapper metadata extraction to ARE's AppToolAdapter (canonical source of truth) instead of reading attributes directly. Remove getattr fallbacks in _extract_schema so missing arg_type or has_default attributes raise immediately, surfacing ARE API changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
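The fail-fast idea behind removing the `getattr` fallbacks can be sketched as follows. `FakeArg` is a hypothetical stand-in for an ARE argument descriptor, and `extract_schema` is an illustrative function rather than the PR's `_extract_schema`; only the attribute names `arg_type` and `has_default` come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeArg:
    """Hypothetical stand-in for an ARE tool argument descriptor."""

    name: str
    arg_type: str
    has_default: bool


def extract_schema(args: List[Any]) -> Dict[str, Dict[str, Any]]:
    """Build a schema dict using direct attribute access (no getattr defaults)."""
    schema: Dict[str, Dict[str, Any]] = {}
    for arg in args:
        # Direct access raises AttributeError if the upstream object no longer
        # exposes `arg_type` or `has_default`, instead of silently defaulting.
        schema[arg.name] = {
            "type": arg.arg_type,
            "required": not arg.has_default,
        }
    return schema
```

With `getattr(arg, "arg_type", "string")` a renamed upstream attribute would yield a plausible but wrong schema; with direct access the same change fails on the first call.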

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the outer try/except Exception block that silently returned ([], [], False) on any error. Exceptions now propagate so the benchmark runner can classify them via fail_on_task_error / fail_on_setup_error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
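To illustrate why the blanket handler was a problem, here is a small before/after sketch. The function names and the failing callable are hypothetical; only the `([], [], False)` fallback value is taken from the commit message.

```python
from typing import Any, Callable


def poll_silently(fetch: Callable[[], Any]) -> Any:
    """Old pattern: any error is swallowed and looks like an empty result."""
    try:
        return fetch()
    except Exception:
        return [], [], False


def poll_strictly(fetch: Callable[[], Any]) -> Any:
    """New pattern: errors propagate so the caller can classify them."""
    return fetch()


def failing_fetch() -> Any:
    """Hypothetical fetch that fails, e.g. because a backend is unavailable."""
    raise RuntimeError("simulation backend unavailable")
```

With the old pattern a caller cannot distinguish "nothing happened" from "the call failed", so a runner-level option such as `fail_on_task_error` never sees the exception.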

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Existing standalone tool wrapper tests need the same autouse fixture that mocks AppToolAdapter, since AREToolWrapper now delegates to it for metadata extraction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Gaia2Environment now inherits tool wrapping, notification polling, lifecycle control, and cleanup from AREEnvironment. Only setup_state (preprocess_scenario + judge) and GAIA2-specific gather_traces/config remain as overrides. Gaia2GenericTool is now an alias for AREToolWrapper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
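The resulting class layout might look roughly like the sketch below. These are simplified stand-ins that reuse the names from the commit message, not the real maseval classes: the method bodies, the `wrap_tools` helper, and the `"judge"` placeholder are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List


class AREToolWrapper:
    """Stand-in wrapper that calls a tool and records each invocation."""

    def __init__(self, name: str, func: Callable[..., Any]) -> None:
        self.name = name
        self.func = func
        self.history: List[Dict[str, Any]] = []

    def __call__(self, **kwargs: Any) -> Any:
        result = self.func(**kwargs)
        self.history.append({"inputs": kwargs, "output": result})
        return result


class AREEnvironment:
    """Stand-in base class that owns the shared behavior."""

    def __init__(self, environment_data: Dict[str, Any]) -> None:
        self.state = self.setup_state(environment_data)

    def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
        return {"scenario": environment_data["scenario"]}

    def wrap_tools(self, tools: Dict[str, Callable[..., Any]]) -> Dict[str, AREToolWrapper]:
        # Hypothetical helper: the real wrapping logic lives in the base class.
        return {name: AREToolWrapper(name, func) for name, func in tools.items()}

    def gather_traces(self) -> Dict[str, Any]:
        return {"environment": type(self).__name__}


class Gaia2Environment(AREEnvironment):
    """Only the benchmark-specific steps are overridden."""

    def setup_state(self, environment_data: Dict[str, Any]) -> Dict[str, Any]:
        state = super().setup_state(environment_data)
        state["judge"] = "attached"  # placeholder for preprocessing + judge setup
        return state

    def gather_traces(self) -> Dict[str, Any]:
        traces = super().gather_traces()
        traces["benchmark"] = "gaia2"
        return traces


# Backwards-compatible alias, as described in the commit message.
Gaia2GenericTool = AREToolWrapper
```

Keeping the old name as an alias means existing imports of `Gaia2GenericTool` continue to work while there is only one wrapper implementation to maintain.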

Main branch renamed task_data to environment_data throughout. Updated AREEnvironment, Gaia2Environment, and all tests to use the new parameter name. Regenerated uv.lock. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove unused variable in shorthand path test
- Add ty: ignore[unknown-argument] for ARE Scenario constructor (optional dependency not resolvable by type checker)
- Apply ruff formatting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

21 integration tests that exercise AREEnvironment against real ARE apps and scenarios, with no mocks. Covers:
- Lifecycle (scenario path, shorthand path, start/stop, pause/resume)
- Tool wrapping (metadata, calling, error tracing, history)
- AUI tool filtering
- Oracle mode with real ARE simulation
- Tracing and config with real tool calls
- Simulation time advancement via wait_for_notification
- Convenience accessors returning real ARE objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cemde and others added 9 commits March 27, 2026 22:30

design specifications

1a2b1f5

Add AREEnvironment implementation plan

96e03d1

6-task plan covering: environments package, AREToolWrapper, AREEnvironment core + shorthand path, pyproject.toml extra, and integration tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add maseval/interface/environments/ package

55f36a6

feat: add AREToolWrapper with tracing and metadata

15d1082

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add AREEnvironment with scenario path and lifecycle control

ee1ee4c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add shorthand construction path tests for AREEnvironment

0a0222a

feat: add 'are' optional dependency extra

a9b6dab

test: add ARE integration smoke tests

c239b03

added docs

fb8f8ef

cemde and others added 12 commits March 28, 2026 08:54

are issues identifies

5915415

gaia2 simplifaction and issue fixing plan

dcf7139

fix(are): remove hasattr fallbacks from oracle mode

0304e40

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(are): add AUI tool filtering and get_turn_notifications

0c95bdb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cemde wants to merge 21 commits into main from feature/are-support
