[feat][evaluation] Coze Coding Evaluation Target Support #461

Open
HearyShen wants to merge 15 commits into main from feat/coze_coding

@HearyShen
Collaborator

What type of PR is this?

Check the PR title

  • This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

@HearyShen changed the title from "get all target fields for AgentEvaluator EvaluateTargetOutputFields" to "[feat][evaluation] Coze Coding Evaluation Support" on Mar 17, 2026
@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 92.06349% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...modules/evaluation/application/eval_openapi_app.go | 90.47% | 5 Missing and 1 partial ⚠️ |
| ...api/handler/coze/loop/apis/eval_open_apiservice.go | 0.00% | 2 Missing ⚠️ |
| ...odules/evaluation/domain/service/evaluator_impl.go | 95.83% | 1 Missing and 1 partial ⚠️ |

@@            Coverage Diff             @@
##             main     #461      +/-   ##
==========================================
+ Coverage   74.45%   74.52%   +0.07%     
==========================================
  Files         629      629              
  Lines       66337    66438     +101     
==========================================
+ Hits        49389    49511     +122     
+ Misses      13663    13644      -19     
+ Partials     3285     3283       -2     
| Flag | Coverage Δ |
|---|---|
| unittests | 74.52% <92.06%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...luation/application/convertor/evaluator/openapi.go | 90.58% <100.00%> (+0.04%) ⬆️ |
| ...uation/application/convertor/experiment/openapi.go | 84.22% <100.00%> (+0.04%) ⬆️ |
| ...ules/evaluation/domain/service/expt_export_impl.go | 75.34% <100.00%> (ø) |
| ...aluation/domain/service/expt_run_item_turn_impl.go | 87.83% <100.00%> (ø) |
| .../evaluation/infra/repo/evaluator/evaluator_impl.go | 81.51% <100.00%> (+0.18%) ⬆️ |
| ...api/handler/coze/loop/apis/eval_open_apiservice.go | 0.00% <0.00%> (ø) |
| ...odules/evaluation/domain/service/evaluator_impl.go | 83.90% <95.83%> (+4.65%) ⬆️ |
| ...modules/evaluation/application/eval_openapi_app.go | 92.43% <90.47%> (+0.21%) ⬆️ |

... and 3 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77fb395...69cb5de.


@HearyShen changed the title from "[feat][evaluation] Coze Coding Evaluation Support" to "[feat][evaluation] Coze Coding Evaluation Target Support" on Mar 19, 2026
dsf86 previously approved these changes Mar 19, 2026
Add case for EvaluatorTypeCustomRPC in convertEntityEvaluatorTypeToOpenAPI function and refactor evaluator version ID retrieval to use GetEvaluatorVersionID method. Also add test case for agent evaluator in SubmitExperimentOApi test.
Add more comprehensive test cases to verify conversion of different evaluator types
Add EvaluatorTypeAgent constant and handle conversion between entity and openapi types. Also add validation to reject agent type in evaluator openapi conversion.
Skip workspace validation for builtin evaluators to allow cross-workspace execution. Add test cases for evaluator version not found and builtin success scenarios.
…idation

Add optional Extra field to ImportEvaluationSetOApiRequest and GetEvaluationSetIOJobOApiRequest thrift structs
Implement validation, serialization and deserialization for the new field in generated code
implement API to run builtin evaluators by ID or name, including:
- add new endpoint /v1/loop/evaluation/builtin_evaluators/run
- add service method to resolve visible version ID
- add repo method to get evaluator by space ID and name
- update thrift IDL and generate code
- add tests for new functionality
- Move builtin evaluator endpoint from `/builtin_evaluators/run` to `/evaluators/builtin/run`
- Add new middleware `_builtinMw` for builtin evaluator routes
- Implement `GetEvaluatorMetaBySpaceIDAndName` repo method and tests
- Add `ResolveBuiltinEvaluatorVisibleVersionID` service method and tests
Clarify that either builtin_evaluator_id or builtin_evaluator_name must be provided, and if both are provided, they must match
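
The rule in the last commit message (at least one of builtin_evaluator_id or builtin_evaluator_name, and agreement when both are given) could be checked roughly as follows. This is only a sketch: the `builtinRunRequest` struct, the `resolveBuiltinEvaluatorID` function, and the `lookupIDByName` callback are hypothetical names chosen for illustration, not the thrift-generated types or the service and repo methods added in this PR, and "match" is assumed to mean that the name resolves to the same evaluator ID.

```go
package main

import (
	"errors"
	"fmt"
)

// builtinRunRequest is a hypothetical request shape; the generated struct in
// the PR may use different field names and types.
type builtinRunRequest struct {
	BuiltinEvaluatorID   *int64
	BuiltinEvaluatorName *string
}

var errMissingIdentifier = errors.New(
	"either builtin_evaluator_id or builtin_evaluator_name must be provided")

// resolveBuiltinEvaluatorID returns the evaluator ID the request refers to.
// lookupIDByName is an assumed stand-in for a repository lookup by name.
func resolveBuiltinEvaluatorID(
	req builtinRunRequest,
	lookupIDByName func(name string) (int64, bool),
) (int64, error) {
	hasID := req.BuiltinEvaluatorID != nil
	hasName := req.BuiltinEvaluatorName != nil && *req.BuiltinEvaluatorName != ""

	// Neither identifier given: reject the request.
	if !hasID && !hasName {
		return 0, errMissingIdentifier
	}
	// Only the ID given: use it as-is.
	if !hasName {
		return *req.BuiltinEvaluatorID, nil
	}
	// A name was given: resolve it to an ID.
	idByName, ok := lookupIDByName(*req.BuiltinEvaluatorName)
	if !ok {
		return 0, fmt.Errorf("builtin evaluator %q not found", *req.BuiltinEvaluatorName)
	}
	// Both given: they must point at the same evaluator.
	if hasID && *req.BuiltinEvaluatorID != idByName {
		return 0, fmt.Errorf("builtin_evaluator_id %d does not match builtin_evaluator_name %q",
			*req.BuiltinEvaluatorID, *req.BuiltinEvaluatorName)
	}
	return idByName, nil
}

func main() {
	// Toy in-memory lookup standing in for the repository.
	byName := map[string]int64{"accuracy": 7}
	lookup := func(name string) (int64, bool) {
		id, ok := byName[name]
		return id, ok
	}

	id, name := int64(8), "accuracy"
	_, err := resolveBuiltinEvaluatorID(
		builtinRunRequest{BuiltinEvaluatorID: &id, BuiltinEvaluatorName: &name}, lookup)
	fmt.Println(err) // the ID and the name disagree, so an error is reported
}
```

Resolving the name before comparing keeps the mismatch check in one place, so a request carrying only a name and a request carrying both identifiers go through the same lookup.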
2 participants