branch-4.0: [fix](test) fix branch-4.0 P0 regression failures (build #198126) by morningman · Pull Request #64525 · apache/doris

morningman · 2026-06-15T08:51:59Z

What problem does this PR solve?

Stabilizes the branch-4.0 P0 regression pipeline by fixing test failures observed in
regression build #198126 (12 failed). This PR addresses 7 of them.

Backports of apache/master fixes

[fix](load) fix empty statistics for forwarded INSERT #64439 [fix](load) fix empty statistics for forwarded INSERT — real product fix: a
forwarded INSERT ... SELECT reported empty JobDetails (ScannedRows=0) because the
non-Nereids-distribute fallback Coordinator was built without the jobId. Fixes
load_p0/insert/test_insert_statistic.
[test](regression) Add debug point ANN index-only scan test #63859 [test](regression) Add debug point ANN index-only scan test — replaces the
fragile ScanBytes-profile-based ANN suites (an unreliable index-only proxy) with
debug-point suites. Fixes ann_index_p0/ann_index_only_scan_distance_expr and
ann_index_p0/ann_index_only_scan_metric_direction (both were new failures).
[fix](case) fix insert_group_commit_into_max_filter_ratio #64402 [fix](case) fix insert_group_commit_into_max_filter_ratio — cherry-pick of the
(currently open) master PR; aligns the case with the enable_insert_strict /
enable_strict_cast decoupling ([fix](insert) enable_insert_strict should not affect semantic of enable_strict_cast #63794). Fixes
insert_p0/insert_group_commit_into_max_filter_ratio.
Partial of [fix](be) Rename CPU time profile counter #64238: only the scanner_profile actualRows regex fix (PhysicalOlapScan[id]
is rendered with the numeric nereids id, not the table name). The BE CPU-counter rename in
[fix](be) Rename CPU time profile counter #64238 is intentionally NOT backported to a release branch. Fixes query_profile/scanner_profile.

Test robustness / isolation fixes (the same defect also exists on apache/master)

temp_table_p0/test_temp_table: the replica-count detection was broken — the regex /:(\d+)/
missed the space in the canonical tag.location.default: 3, and force_olap_table_replication_num
was never consulted, so on a 3-replica cluster SHOW TABLETS returned 9 rows while the test
expected 3. Detection is now whitespace-tolerant, sums the allocation per-tag counts (mirroring
PropertyAnalyzer.analyzeReplicaAllocation precedence) and falls back to
force_olap_table_replication_num, while still asserting the real replica count.
backup_restore/test_backup_restore_reset_index_id (+ test_backup_restore_inverted_idx and
restore_p0/test_validate_restore_inverted_idx): cross-suite race on the process-global
restore_reset_index_id FE config — a concurrent suite flipped it to false between this
suite's BACKUP and RESTORE. The three suites that mutate this global are now nonConcurrent
so they run serially. The FE reset logic itself is correct; this is purely test isolation.

Release note

None

Check List (For Committer)

Test

Check List (For Reviewer who provide ASF feature)

Performance regression

hello-stephen · 2026-06-15T08:52:05Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

morningman · 2026-06-15T08:53:51Z

run buildall

### What problem does this PR solve? When `INSERT INTO ... SELECT` is forwarded from a follower FE to the master FE, `SHOW LOAD` could show an empty `JobDetails`, such as `ScannedRows=0`, `LoadBytes=0`, `TaskNumber=0`, and empty backend lists. The root cause is that the insert load job is registered with a real `jobId`, but when coordinator creation falls back to the regular `Coordinator` / `CloudCoordinator` path, that `jobId` was not passed into the coordinator. Therefore, the coordinator kept the default `jobId=-1` and did not initialize or update the corresponding `LoadManager` progress. The load job was still recorded as `FINISHED`, but its `LoadStatistic` remained empty when `SHOW LOAD` rendered `JobDetails`. This PR preserves the insert `jobId` in the regular `Coordinator` and `CloudCoordinator` fallback paths, so `initJobProgress()` and `updateJobProgress()` update the same `InsertLoadJob` that is later recorded and displayed by `SHOW LOAD`. (cherry picked from commit a53679b)

…3859) Issue Number: None Related PR: None Problem Summary: The previous ANN index-only scan regression coverage inferred whether source vector columns were skipped by comparing ScanBytes from query profiles. That made the test hard to review and could miss cases where both query shapes still read the source column. Replace that coverage with a dedicated debug-point regression that directly fails if the embedding column is read in index-only scenarios, including a remapped reader-schema case where the source slot index differs from the storage column id. Remove the old profile-based suites and generated output. None - Test: Manual test - git diff --cached --check - Regression test not run per request; an earlier attempt was blocked by Maven writing to /Users/roanhe/.m2/repository under the sandbox - Behavior changed: No - Does this need documentation: No Issue Number: close #xxx Related PR: #xxx Problem Summary: None - Test  - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason  - Behavior changed: - [ ] No. - [ ] Yes.  - Does this need documentation? - [ ] No. - [ ] Yes.  - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label  (cherry picked from commit 6e5198b)

…pache#64238) The scanner_profile suite asserts that `actualRows` is backfilled onto the PhysicalOlapScan node, but the regex `PhysicalOlapScan\[scanner_profile\]` can never match: FE renders the node as `PhysicalOlapScan[<id>] ( table= scanner_profile, ... actualRows=N )` (the bracket holds the numeric nereids id, not the table name). matcher.find() is always false regardless of the backfilled value, so the case fails deterministically. Upstream apache/doris apache#64238 (952aede) fixes this regex to `PhysicalOlapScan[^\n]*scanner_profile[^\n]*actualRows=(\d+)`. This is a minimal port of ONLY that regex change. The rest of apache#64238 (a BE CPU-time counter rename `MaxFindRecvrTime`->`MaxFindRecvrCpuTime` plus new `TaskCpuTime`/`ScannerCpuTime` profile assertions and BE unit tests) is an unrelated cosmetic cleanup; it is intentionally NOT backported to avoid shipping a BE change and profile-naming assertions to a release branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…126) test_temp_table asserted `3 * replicaNum == show_tablets_result.size()`, deriving replicaNum by regex-parsing force_olap_table_replication_allocation with `/:(\d+)/`. That detection is wrong on clusters that force replication=3: the canonical allocation string `tag.location.default: 3` has a space after the colon so the regex never matches (replicaNum stuck at 1), and the separate force_olap_table_replication_num config is never consulted. On the 3-replica TeamCity cluster SHOW TABLETS returned 9 rows (3 tablets x 3 replicas) while the test expected 3. Keep verifying the real replica count (so a wrong/missing replica is still caught) but fix the detection: - assert there are exactly 3 distinct tablets (3 partitions x 1 bucket), and - assert total rows == 3 * replicaNum, where replicaNum is derived robustly and mirrors FE precedence (PropertyAnalyzer.analyzeReplicaAllocation): the force_olap_table_replication_allocation per-tag counts are summed first with a whitespace-tolerant regex `/:\s*(\d+)/`, falling back to force_olap_table_replication_num. Same defect exists on apache/master; forward-port there too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…dex_id (build #198126) restore_reset_index_id is a single process-global, master-only, mutable FE config read at RESTORE execution time (OlapTable.resetIdsForRestore). Three suites mutate it: - backup_restore/test_backup_restore_reset_index_id (sets true, then asserts the restored index id was reset to a new value) - backup_restore/test_backup_restore_inverted_idx (sets false / true) - restore_p0/test_validate_restore_inverted_idx (sets false / true) Because the regression framework runs NORMAL suites in parallel, one suite can flip the shared global config between another suite's BACKUP and RESTORE. In build #198126 a concurrent suite set restore_reset_index_id=false after test_backup_restore_reset_index_id had set it true, so that suite's RESTORE took the else-branch, kept the original index id, and the line-142 assertion failed (old id == new id). Add these suites to the `nonConcurrent` group so they run serially in the single-thread SINGLE phase and can no longer race on the shared global config. The FE reset logic itself is correct; this is purely a test-isolation fix. Same design exists on apache/master; forward-port there too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

(cherry picked from commit 8a076ce)

…only_scan debug_point cases These 3 cases were cherry-picked from master (apache#63859). The session variable enable_condition_cache (condition cache feature) does not exist on branch-4.0, so 'set enable_condition_cache=false' fails with 'Unknown system variable'. Since the feature is absent, the branch is already in the cache-disabled state the test expects, so removing the line is a semantic no-op and restores the cases on this branch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

morningman · 2026-06-15T12:59:54Z

run buildall

hello-stephen · 2026-06-15T13:58:23Z

FE UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉
Increment coverage report
Complete coverage report

morningman changed the title ~~[fix](test) fix branch-4.0 P0 regression failures (build #198126)~~ branch-4.0: [fix](test) fix branch-4.0 P0 regression failures (build #198126) Jun 15, 2026

morningman mentioned this pull request Jun 15, 2026

branch-4.0: [fix](case) fix insert_group_commit_into_max_filter_ratio #64402 #64527

Closed

sollhui and others added 9 commits June 15, 2026 22:59

[fix](case) fix insert_group_commit_into_max_filter_ratio

a3d325e

(cherry picked from commit 8a076ce)

ignore shape_check/tpcds_sf1000/shape/query64.groovy

bac53be

skip external_table_p0/hive/test_parquet_join_runtime_filter.groovy

c79cbfc

morningman force-pushed the tc-198126-backports branch from c2254f3 to d58d717 Compare June 15, 2026 12:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

branch-4.0: [fix](test) fix branch-4.0 P0 regression failures (build #198126)#64525

branch-4.0: [fix](test) fix branch-4.0 P0 regression failures (build #198126)#64525
morningman wants to merge 9 commits into
apache:branch-4.0from
morningman:tc-198126-backports

morningman commented Jun 15, 2026

Uh oh!

hello-stephen commented Jun 15, 2026

Uh oh!

morningman commented Jun 15, 2026

Uh oh!

morningman commented Jun 15, 2026

Uh oh!

hello-stephen commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

morningman commented Jun 15, 2026

What problem does this PR solve?

Backports of apache/master fixes

Test robustness / isolation fixes (the same defect also exists on apache/master)

Release note

Check List (For Committer)

Check List (For Reviewer who provide ASF feature)

Uh oh!

hello-stephen commented Jun 15, 2026

Uh oh!

morningman commented Jun 15, 2026

Uh oh!

morningman commented Jun 15, 2026

Uh oh!

hello-stephen commented Jun 15, 2026

FE UT Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants