
[refactor](table) Refactor table and file reader #63893

Draft
Gabriel39 wants to merge 28 commits into master from refact_reader_branch

@Gabriel39

Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39 Gabriel39 marked this pull request as draft May 29, 2026 06:39
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#63893

Problem Summary: Add focused BE unit coverage for new table reader and new parquet reader edge cases, including aggregate pushdown over split ranges, Iceberg equality/position deletes, row lineage after delete filtering, Parquet dictionary/statistics pruning, and IOContext release. Also clean up temporary delete predicate expression columns in the new Parquet reader so equality delete predicates with cast children do not alter the returned file block schema.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - Added BE UT cases in table_reader_test and parquet_reader_test.
    - Ran git diff --check.
    - Tried ./run-be-ut.sh with focused filters, but local JAVA_HOME points to JDK 11 and JDK_17 is not set; the runner requires JDK 17.
- Behavior changed: No
- Does this need documentation: No
Gabriel39 added a commit that referenced this pull request May 29, 2026
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #63893

Problem Summary: Add focused BE unit coverage for new table reader and
new parquet reader edge cases, including aggregate pushdown over split
ranges, Iceberg equality/position deletes, row lineage after delete
filtering, Parquet dictionary/statistics pruning, and IOContext release.
Also clean up temporary delete predicate expression columns in the new
Parquet reader so equality delete predicates with cast children do not
alter the returned file block schema.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - Added BE UT cases in table_reader_test and parquet_reader_test.
    - Ran git diff --check.
    - Tried ./run-be-ut.sh with focused filters, but local JAVA_HOME points
      to JDK 11 and JDK_17 is not set; the runner requires JDK 17.
- Behavior changed: No
- Does this need documentation: No
@Gabriel39 Gabriel39 force-pushed the refact_reader_branch branch 4 times, most recently from 837cc56 to 475e48a June 3, 2026 05:14
@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29107 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c7e07bf0f4367f7634e524741d26894f84d16410, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17461	4051	4019	4019
q2
q3	10688	1355	807	807
q4	4687	472	342	342
q5	7582	883	589	589
q6	187	173	139	139
q7	796	840	650	650
q8	9383	1609	1560	1560
q9	5881	4483	4507	4483
q10	6786	1831	1557	1557
q11	421	269	252	252
q12	636	422	288	288
q13	18186	3500	2734	2734
q14	264	264	253	253
q15
q16	812	779	704	704
q17	1018	1005	897	897
q18	6979	5935	5593	5593
q19	2044	1276	992	992
q20	505	386	268	268
q21	6381	2852	2672	2672
q22	475	385	308	308
Total cold run time: 101172 ms
Total hot run time: 29107 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5074	4871	4694	4694
q2
q3	4929	5301	4650	4650
q4	2133	2173	1402	1402
q5	4796	4986	4655	4655
q6	228	181	128	128
q7	1915	1752	1598	1598
q8	2429	2131	2084	2084
q9	7840	7666	7441	7441
q10	4695	4665	4246	4246
q11	530	379	352	352
q12	730	734	521	521
q13	3029	3327	2799	2799
q14	287	281	254	254
q15
q16	687	692	603	603
q17	1280	1252	1252	1252
q18	7398	6975	6771	6771
q19	1109	1094	1118	1094
q20	2214	2220	1955	1955
q21	5243	4552	4491	4491
q22	513	471	436	436
Total cold run time: 57059 ms
Total hot run time: 51426 ms
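
The totals in this report can be reproduced from the per-query rows. The sketch below assumes a column layout that the bot does not state but that the numbers support: query name, cold run, first hot run, second hot run, and the faster of the two hot runs, all in ms. `QueryTiming`, `total_cold_ms`, and `total_hot_ms` are hypothetical names used only for illustration; rows without timings are simply left out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One benchmark row. Field meanings are inferred from the report, not documented by it.
struct QueryTiming {
    std::string name;
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// Sum of the cold-run column.
inline long total_cold_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const QueryTiming& row : rows) {
        total += row.cold_ms;
    }
    return total;
}

// Sum of the faster hot run per query (the last column of each row).
inline long total_hot_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const QueryTiming& row : rows) {
        total += std::min(row.hot1_ms, row.hot2_ms);
    }
    return total;
}
```

Applied to the 20 Round 1 rows that carry timings, these functions give 101172 ms cold and 29107 ms hot, which matches the reported totals.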

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (2/3) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168765 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c7e07bf0f4367f7634e524741d26894f84d16410, data reload: false

query5	4336	620	484	484
query6	459	210	175	175
query7	4842	554	313	313
query8	375	211	208	208
query9	8801	4019	4022	4019
query10	434	321	274	274
query11	5941	2346	2210	2210
query12	166	104	101	101
query13	1287	678	408	408
query14	6372	5363	5054	5054
query14_1	4390	4379	4354	4354
query15	203	195	175	175
query16	1032	453	433	433
query17	1120	709	583	583
query18	2464	465	344	344
query19	202	180	136	136
query20	109	109	109	109
query21	211	135	116	116
query22	13661	13491	13333	13333
query23	17282	16449	16129	16129
query23_1	16201	16314	16343	16314
query24	7482	1728	1346	1346
query24_1	1335	1319	1306	1306
query25	589	467	411	411
query26	1359	317	178	178
query27	2601	555	319	319
query28	4489	2082	2027	2027
query29	1112	628	491	491
query30	312	235	197	197
query31	1113	1107	963	963
query32	106	62	62	62
query33	535	339	275	275
query34	1180	1194	651	651
query35	756	803	714	714
query36	1416	1408	1236	1236
query37	160	114	127	114
query38	3209	3114	3049	3049
query39	916	908	898	898
query39_1	884	884	870	870
query40	224	125	101	101
query41	64	62	61	61
query42	94	97	93	93
query43	315	321	280	280
query44	
query45	201	189	180	180
query46	1108	1190	719	719
query47	2370	2376	2229	2229
query48	401	421	303	303
query49	622	469	354	354
query50	982	356	259	259
query51	4282	4268	4232	4232
query52	87	87	81	81
query53	246	269	191	191
query54	270	227	204	204
query55	78	75	70	70
query56	242	224	215	215
query57	1459	1415	1332	1332
query58	252	216	199	199
query59	1591	1676	1425	1425
query60	285	246	234	234
query61	165	167	160	160
query62	692	642	576	576
query63	232	178	183	178
query64	2521	772	614	614
query65	
query66	1769	474	339	339
query67	29729	29711	28959	28959
query68	
query69	422	301	263	263
query70	944	988	988	988
query71	300	212	214	212
query72	2912	2878	2593	2593
query73	837	775	461	461
query74	5130	4933	4801	4801
query75	2695	2550	2225	2225
query76	2283	1160	784	784
query77	353	385	292	292
query78	12324	12452	11858	11858
query79	1456	996	795	795
query80	586	480	396	396
query81	461	287	244	244
query82	577	158	128	128
query83	353	270	261	261
query84	257	140	108	108
query85	868	537	442	442
query86	361	308	292	292
query87	3346	3309	3141	3141
query88	3658	2762	2763	2762
query89	419	374	323	323
query90	1969	182	185	182
query91	178	164	134	134
query92	65	64	60	60
query93	1555	1506	856	856
query94	567	370	309	309
query95	674	475	351	351
query96	1038	751	340	340
query97	2698	2679	2598	2598
query98	229	205	208	205
query99	1171	1180	1023	1023
Total cold run time: 250908 ms
Total hot run time: 168765 ms

@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-H: Total hot run time: 29068 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e9a9b14d5d681b1e807b8d06a2a8f3c4a1c6feef, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17764	4108	4077	4077
q2
q3	10894	1483	807	807
q4	4807	487	352	352
q5	8583	909	600	600
q6	362	174	141	141
q7	945	845	658	658
q8	10940	1596	1641	1596
q9	7162	4542	4565	4542
q10	6787	1952	1532	1532
q11	441	273	252	252
q12	646	428	294	294
q13	18143	3451	2794	2794
q14	267	258	247	247
q15
q16	821	788	711	711
q17	1011	860	935	860
q18	7059	5840	5519	5519
q19	1175	1231	1001	1001
q20	520	413	262	262
q21	5787	2711	2519	2519
q22	445	366	304	304
Total cold run time: 104559 ms
Total hot run time: 29068 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4380	4309	4346	4309
q2
q3	4562	4996	4393	4393
q4	2104	2217	1390	1390
q5	4466	4347	4679	4347
q6	264	212	149	149
q7	2027	1875	1665	1665
q8	2529	2181	2177	2177
q9	7982	7985	8407	7985
q10	4908	4987	4297	4297
q11	587	415	385	385
q12	773	898	558	558
q13	3321	3710	2976	2976
q14	299	313	270	270
q15
q16	722	763	650	650
q17	1407	1353	1330	1330
q18	7859	7354	6768	6768
q19	1138	1099	1113	1099
q20	2225	2231	1945	1945
q21	5268	4576	4450	4450
q22	542	462	402	402
Total cold run time: 57363 ms
Total hot run time: 51545 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 171113 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e9a9b14d5d681b1e807b8d06a2a8f3c4a1c6feef, data reload: false

query5	4346	638	492	492
query6	452	204	183	183
query7	4833	536	307	307
query8	382	235	220	220
query9	8807	4169	4108	4108
query10	461	323	274	274
query11	5972	2371	2257	2257
query12	163	114	101	101
query13	1272	629	440	440
query14	6405	5472	5096	5096
query14_1	4447	4433	4418	4418
query15	210	205	179	179
query16	1071	463	458	458
query17	1156	740	600	600
query18	2725	511	366	366
query19	210	194	153	153
query20	111	110	108	108
query21	226	150	121	121
query22	13616	13704	13406	13406
query23	17317	16568	16148	16148
query23_1	16390	16374	16396	16374
query24	7475	1826	1318	1318
query24_1	1314	1316	1306	1306
query25	593	481	421	421
query26	1302	324	173	173
query27	2613	569	333	333
query28	4417	2007	2025	2007
query29	1119	639	512	512
query30	313	240	215	215
query31	1124	1087	964	964
query32	113	64	64	64
query33	560	340	268	268
query34	1201	1140	669	669
query35	761	826	713	713
query36	1376	1410	1224	1224
query37	156	108	90	90
query38	3198	3187	3058	3058
query39	940	930	909	909
query39_1	899	872	883	872
query40	225	125	109	109
query41	67	64	61	61
query42	96	96	96	96
query43	326	321	275	275
query44	
query45	195	186	184	184
query46	1075	1210	770	770
query47	2366	2397	2272	2272
query48	384	418	297	297
query49	644	474	374	374
query50	1038	364	250	250
query51	4369	4366	4289	4289
query52	90	89	79	79
query53	252	273	198	198
query54	265	220	214	214
query55	81	77	75	75
query56	247	231	221	221
query57	1425	1404	1317	1317
query58	251	229	221	221
query59	1606	1629	1418	1418
query60	288	245	244	244
query61	161	164	163	163
query62	698	675	592	592
query63	230	182	188	182
query64	2551	818	638	638
query65	
query66	1773	462	344	344
query67	29838	29808	29748	29748
query68	
query69	432	300	271	271
query70	966	942	968	942
query71	302	225	214	214
query72	3018	2757	2384	2384
query73	826	748	446	446
query74	5123	4992	4767	4767
query75	2704	2605	2241	2241
query76	2313	1165	790	790
query77	364	380	298	298
query78	12545	12447	11919	11919
query79	1419	1053	787	787
query80	1320	487	395	395
query81	530	284	241	241
query82	603	156	124	124
query83	337	286	271	271
query84	275	139	117	117
query85	946	545	448	448
query86	442	322	285	285
query87	3397	3309	3221	3221
query88	3677	2754	2721	2721
query89	452	384	333	333
query90	1936	189	188	188
query91	182	171	140	140
query92	64	67	62	62
query93	1563	1498	874	874
query94	732	348	310	310
query95	693	381	345	345
query96	1088	777	320	320
query97	2719	2718	2566	2566
query98	219	211	213	211
query99	1155	1186	1043	1043
Total cold run time: 253223 ms
Total hot run time: 171113 ms

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-H: Total hot run time: 29390 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5decd77b25dfb7993971c59414b306ae17d37096, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17882	4102	4130	4102
q2
q3	10907	1439	837	837
q4	4763	489	350	350
q5	8430	888	590	590
q6	319	171	136	136
q7	895	868	635	635
q8	10915	1700	1634	1634
q9	7272	4595	4500	4500
q10	6775	1856	1542	1542
q11	439	272	249	249
q12	652	425	289	289
q13	18226	3410	2820	2820
q14	275	260	243	243
q15
q16	818	777	714	714
q17	1062	875	973	875
q18	6842	5790	5477	5477
q19	1393	1253	1214	1214
q20	559	430	277	277
q21	6086	2731	2596	2596
q22	479	369	310	310
Total cold run time: 104989 ms
Total hot run time: 29390 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4857	4697	4970	4697
q2
q3	4878	5285	4704	4704
q4	2161	2206	1411	1411
q5	4992	4771	4719	4719
q6	246	189	127	127
q7	1849	1712	1565	1565
q8	2301	1964	1951	1951
q9	7391	7434	7450	7434
q10	4728	4687	4233	4233
q11	544	384	352	352
q12	726	748	527	527
q13	3023	3432	2783	2783
q14	267	278	265	265
q15
q16	676	704	598	598
q17	1271	1258	1254	1254
q18	7357	6776	6861	6776
q19	1092	1079	1114	1079
q20	2226	2233	1936	1936
q21	5290	4562	4473	4473
q22	534	466	408	408
Total cold run time: 56409 ms
Total hot run time: 51292 ms

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168294 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5decd77b25dfb7993971c59414b306ae17d37096, data reload: false

query5	4326	641	483	483
query6	454	202	179	179
query7	4880	580	281	281
query8	368	230	209	209
query9	8785	3994	3990	3990
query10	453	314	271	271
query11	5732	2353	2175	2175
query12	157	102	97	97
query13	1259	562	452	452
query14	6567	5393	5066	5066
query14_1	4397	4400	4372	4372
query15	209	200	178	178
query16	1028	490	434	434
query17	1131	716	596	596
query18	2724	490	385	385
query19	204	180	140	140
query20	118	109	111	109
query21	213	138	116	116
query22	13611	13597	13356	13356
query23	17395	16545	16200	16200
query23_1	16260	16236	16287	16236
query24	7500	1683	1307	1307
query24_1	1319	1288	1327	1288
query25	552	450	378	378
query26	1295	326	171	171
query27	2652	557	343	343
query28	4418	2004	2008	2004
query29	1065	628	478	478
query30	315	239	200	200
query31	1130	1071	961	961
query32	120	61	59	59
query33	524	315	246	246
query34	1189	1117	647	647
query35	744	780	680	680
query36	1390	1387	1257	1257
query37	155	103	91	91
query38	3215	3136	3068	3068
query39	944	915	897	897
query39_1	882	886	859	859
query40	218	121	102	102
query41	64	65	61	61
query42	94	93	93	93
query43	318	319	275	275
query44	
query45	199	188	180	180
query46	1123	1250	750	750
query47	2341	2377	2306	2306
query48	398	423	280	280
query49	635	466	360	360
query50	1040	343	261	261
query51	4416	4269	4246	4246
query52	86	87	76	76
query53	252	255	190	190
query54	264	217	200	200
query55	76	75	69	69
query56	243	220	220	220
query57	1430	1387	1354	1354
query58	241	214	210	210
query59	1583	1612	1423	1423
query60	278	245	228	228
query61	149	150	156	150
query62	716	650	586	586
query63	234	185	183	183
query64	2511	770	615	615
query65	
query66	1768	468	338	338
query67	29734	28977	28885	28885
query68	
query69	420	298	265	265
query70	999	954	927	927
query71	298	226	209	209
query72	2952	2663	2407	2407
query73	834	784	426	426
query74	5161	4946	4803	4803
query75	2660	2553	2249	2249
query76	2314	1184	801	801
query77	360	366	289	289
query78	12505	12347	11807	11807
query79	1266	1070	792	792
query80	525	461	392	392
query81	446	283	244	244
query82	248	163	129	129
query83	269	279	251	251
query84	267	147	114	114
query85	900	532	449	449
query86	329	300	291	291
query87	3344	3302	3195	3195
query88	3618	2724	2704	2704
query89	416	380	324	324
query90	2176	177	179	177
query91	177	168	133	133
query92	63	63	57	57
query93	1412	1411	927	927
query94	542	344	299	299
query95	687	457	355	355
query96	1090	840	388	388
query97	2703	2699	2586	2586
query98	211	202	210	202
query99	1138	1176	1028	1028
Total cold run time: 250720 ms
Total hot run time: 168294 ms

@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29426 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1be063b5d93f73378ae33a40d6692c8f75681079, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17658	4085	4062	4062
q2
q3	10917	1418	827	827
q4	4746	508	365	365
q5	8253	892	595	595
q6	332	179	136	136
q7	923	856	634	634
q8	10856	1689	1575	1575
q9	7228	4584	4539	4539
q10	6810	1834	1536	1536
q11	437	285	257	257
q12	648	436	296	296
q13	18089	3978	2770	2770
q14	271	261	235	235
q15
q16	826	776	713	713
q17	946	928	1034	928
q18	7058	5761	5477	5477
q19	1186	1237	1200	1200
q20	577	455	292	292
q21	5834	2815	2654	2654
q22	458	460	335	335
Total cold run time: 104053 ms
Total hot run time: 29426 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4923	4842	4754	4754
q2
q3	4971	5267	4645	4645
q4	2113	2184	1390	1390
q5	4980	4730	4669	4669
q6	236	200	147	147
q7	1921	1728	1526	1526
q8	2422	2124	2128	2124
q9	7990	7416	7525	7416
q10	4754	4677	4222	4222
q11	528	384	357	357
q12	737	737	523	523
q13	3057	3417	2819	2819
q14	277	287	258	258
q15
q16	685	709	605	605
q17	1288	1273	1263	1263
q18	7351	6967	6754	6754
q19	1101	1108	1135	1108
q20	2231	2215	1953	1953
q21	5287	4595	4472	4472
q22	528	449	402	402
Total cold run time: 57380 ms
Total hot run time: 51407 ms

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.34% (1906/2433)
Line Coverage 64.79% (33994/52468)
Region Coverage 65.32% (17527/26833)
Branch Coverage 53.99% (9303/17230)

@hello-stephen

Contributor
TPC-H: Total hot run time: 29178 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 04a4ae61ede8403ffa0a92b822d59113727d5491, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17808	4148	4164	4148
q2
q3	10985	1470	801	801
q4	4819	484	345	345
q5	8515	883	578	578
q6	333	172	138	138
q7	920	855	628	628
q8	10835	1735	1577	1577
q9	7167	4520	4542	4520
q10	6838	1912	1542	1542
q11	435	267	254	254
q12	654	442	290	290
q13	18215	3790	2831	2831
q14	265	257	234	234
q15
q16	826	770	718	718
q17	966	896	965	896
q18	6821	5776	5568	5568
q19	1160	1237	1090	1090
q20	512	408	262	262
q21	5592	2632	2459	2459
q22	427	348	299	299
Total cold run time: 104093 ms
Total hot run time: 29178 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4342	4278	4280	4278
q2
q3	4552	4988	4340	4340
q4	2119	2197	1401	1401
q5	4543	4344	4595	4344
q6	257	196	144	144
q7	2075	1779	1578	1578
q8	2538	2155	2129	2129
q9	8029	7990	7968	7968
q10	4876	5039	4294	4294
q11	561	424	401	401
q12	747	786	539	539
q13	3212	3661	2971	2971
q14	306	319	279	279
q15
q16	736	733	643	643
q17	1370	1349	1320	1320
q18	8086	7261	6842	6842
q19	1143	1103	1138	1103
q20	2227	2233	1960	1960
q21	5297	4623	4525	4525
q22	517	462	412	412
Total cold run time: 57533 ms
Total hot run time: 51471 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168644 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 04a4ae61ede8403ffa0a92b822d59113727d5491, data reload: false

query5	4336	621	491	491
query6	463	193	184	184
query7	4865	543	303	303
query8	361	215	203	203
query9	8758	4044	4028	4028
query10	445	317	269	269
query11	5932	2370	2138	2138
query12	165	101	96	96
query13	1274	603	426	426
query14	6328	5398	5061	5061
query14_1	4423	4395	4401	4395
query15	208	198	179	179
query16	999	459	460	459
query17	1123	704	578	578
query18	2512	487	343	343
query19	201	187	171	171
query20	111	111	107	107
query21	212	142	117	117
query22	13674	13728	13347	13347
query23	17479	16479	16109	16109
query23_1	16359	16266	16308	16266
query24	7488	1763	1323	1323
query24_1	1311	1311	1294	1294
query25	567	468	390	390
query26	1317	314	166	166
query27	2741	553	323	323
query28	4488	2034	2051	2034
query29	1101	637	488	488
query30	310	237	200	200
query31	1148	1080	955	955
query32	107	65	60	60
query33	528	326	253	253
query34	1180	1102	657	657
query35	804	781	693	693
query36	1395	1367	1227	1227
query37	151	98	89	89
query38	3203	3149	3007	3007
query39	930	923	889	889
query39_1	872	877	873	873
query40	217	120	95	95
query41	63	63	60	60
query42	94	91	91	91
query43	315	325	281	281
query44	
query45	197	179	180	179
query46	1063	1175	717	717
query47	2387	2410	2253	2253
query48	395	418	296	296
query49	616	473	335	335
query50	977	353	265	265
query51	4318	4334	4255	4255
query52	87	87	76	76
query53	249	263	191	191
query54	264	213	191	191
query55	79	72	68	68
query56	245	218	234	218
query57	1432	1404	1329	1329
query58	245	207	210	207
query59	1564	1670	1421	1421
query60	283	254	227	227
query61	154	144	149	144
query62	706	645	562	562
query63	231	182	186	182
query64	2554	737	587	587
query65	
query66	1772	460	337	337
query67	29832	29740	29579	29579
query68	
query69	427	306	252	252
query70	964	959	951	951
query71	292	267	212	212
query72	2822	2587	2372	2372
query73	839	754	447	447
query74	5137	4935	4805	4805
query75	2653	2574	2219	2219
query76	2322	1134	814	814
query77	350	375	287	287
query78	12529	12541	11895	11895
query79	1489	1053	804	804
query80	1301	475	377	377
query81	509	281	243	243
query82	572	160	126	126
query83	347	270	247	247
query84	
query85	909	516	416	416
query86	427	310	277	277
query87	3417	3341	3191	3191
query88	3634	2751	2701	2701
query89	427	385	348	348
query90	1904	181	173	173
query91	174	155	135	135
query92	66	59	56	56
query93	1528	1407	903	903
query94	723	352	308	308
query95	688	485	346	346
query96	1094	841	337	337
query97	2722	2730	2552	2552
query98	214	203	217	203
query99	1138	1166	1019	1019
Total cold run time: 252175 ms
Total hot run time: 168644 ms

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 7.69% (2/26) 🎉
Increment coverage report
Complete coverage report

@Gabriel39

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

I found a few blocking rollout and operational issues.

Critical checkpoint conclusions:

  • Goal and scope: this PR adds the FileScannerV2/format_v2 reader stack with broad tests, but it also changes default behavior for supported external query scans and Iceberg batch planning.
  • Data correctness: in the reviewed row/delete/aggregate paths, delete predicates and aggregate pushdown have the expected guards; I did not find an additional concrete data-loss issue there.
  • Version and compatibility: thrift option/enum additions are append-only, but the FE default for enable_file_scanner_v2 conflicts with its documented and thrift defaults.
  • FE/BE variables: enable_file_scanner_v2 is forwarded to BE and currently enables V2 for all supported non-load scan ranges by default.
  • Parallel paths and rollback: the old FileScanner remains available only when the session variable is false or the split is unsupported; default true removes rollback-by-default for supported scans.
  • Transactions and persistence: no transaction or storage write-path change was identified.
  • Concurrency and lifecycle: no new shared mutable-state or scanner lifetime issue was identified in the reviewed paths.
  • Performance and observability: the per-file warning-level debug log and the batch-mode default flip are problematic.
  • Tests: substantial BE unit and regression coverage was added; I did not run full Doris test suites in this runner. git diff --check reports trailing whitespace in one generated .out file.
  • Security: no security review was requested, and I am not making a security finding.
  • User focus: no additional user-provided focus was present.

@@ -1112,6 +1113,11 @@ public static double getHotValueThreshold() {
"FileScanNode 扫描数据的最大并发,默认为 16", "The max threads to read data of FileScanNode, default 16"})
public int maxFileScannersConcurrency = 16;

@VarAttrDef.VarAttr(name = ENABLE_FILE_SCANNER_V2, needForward = true, description = {
"开启后 FileScanNode 会在支持的查询场景使用 FileScannerV2,默认关闭",
"When enabled, FileScanNode uses FileScannerV2 for supported query scans. Disabled by default."})

Contributor

The variable is documented as disabled by default and the thrift field default is also false, but this initializes every FE session to true. toThrift() always forwards that value, and FileScanLocalState::_init_scanners selects FileScannerV2 whenever it is true and all splits are supported, so this makes the new reader stack the default for supported query scans. For a refactor of this size, that removes the intended rollback-by-default path. Please default this to false, or update the release/compatibility plan if default enablement is intentional.

Suggested change
"When enabled, FileScanNode uses FileScannerV2 for supported query scans. Disabled by default."})
public boolean enableFileScannerV2 = false;
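
The selection rule described in this comment can be sketched as follows. This is a minimal, self-contained illustration and not the Doris implementation: `SplitInfo`, `ScannerKind`, and `choose_scanner` are hypothetical names, and only the rule stated above is modeled (V2 is chosen when the forwarded flag is true and every split is supported).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a scan range; only the "supported by V2" bit is modeled.
struct SplitInfo {
    bool supported_by_v2;
};

enum class ScannerKind { kLegacy, kV2 };

// Picks V2 only when the session flag is on AND every split is supported.
// Note: in this sketch an empty split list counts as "all supported".
inline ScannerKind choose_scanner(bool enable_file_scanner_v2,
                                  const std::vector<SplitInfo>& splits) {
    if (!enable_file_scanner_v2) {
        return ScannerKind::kLegacy;  // Flag off: always the old scanner.
    }
    const bool all_supported =
            std::all_of(splits.begin(), splits.end(),
                        [](const SplitInfo& split) { return split.supported_by_v2; });
    return all_supported ? ScannerKind::kV2 : ScannerKind::kLegacy;
}
```

Under this rule the flag's default decides which reader a supported scan uses when the user sets nothing, which is why a `true` default changes behavior for every session.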

@@ -2873,10 +2879,9 @@ public static boolean isEagerAggregationOnJoin() {
public static final String ENABLE_MC_LIMIT_SPLIT_OPTIMIZATION = "enable_mc_limit_split_optimization";
@VarAttrDef.VarAttr(
name = ENABLE_EXTERNAL_TABLE_BATCH_MODE,
fuzzy = true,
description = {"使能外表的 batch mode 功能", "Enable the batch mode function of the external table."},
needForward = true)

Contributor

This silently flips enable_external_table_batch_mode from true to false and removes it from fuzzy testing. IcebergScanNode.isBatchMode() returns false as soon as this session variable is false, so existing Iceberg scans stop using batch mode by default. That is a user-visible planner/performance change while the PR release note says None; please either keep the previous default and fuzzy coverage, or document the behavior change with dedicated tests.

_data_reader.block_template.insert(
{column.type->create_column(), column.type, column.name});
}
LOG(WARNING) << "TableReader debug: " << debug_string();

Contributor

This warning is emitted every time a file reader is opened, and debug_string() includes the projected schema, table filters, column predicates, conjunct debug strings, and full column mapping state. With V2 selected for supported scans, a normal external query over many files will write one warning per file and can flood BE logs. Please remove this or gate it behind debug/VLOG-level logging.

Gabriel39 and others added 3 commits June 16, 2026 11:34
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #63893

Problem Summary: TeamCity external regression build 970191 still had several expected output files using old timestamp values. The new parquet timestamp semantics return the corrected values for the affected external table cases, including Hive, Iceberg, Paimon, and TVF parquet result files. This commit refreshes the corresponding regression expected outputs from the observed CI results and keeps unrelated non-timestamp failures untouched.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Compared TeamCity build 970191 failure details with the updated expected output files. Full regression test was not rerun locally.
- Behavior changed: No
- Does this need documentation: No
@suxiaogang223

Member

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29516 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 95900c7aa27f6798ab1e710ac8e895889970b81f, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17659	4030	4027	4027
q2	2004	316	193	193
q3	10847	1440	819	819
q4	4821	476	342	342
q5	8501	885	589	589
q6	361	173	138	138
q7	890	854	629	629
q8	10633	1676	1662	1662
q9	6087	4559	4527	4527
q10	6832	1827	1518	1518
q11	441	289	243	243
q12	650	426	296	296
q13	18152	3391	2816	2816
q14	270	270	242	242
q15	q16	800	788	706	706
q17	1010	957	1024	957
q18	6892	5897	5696	5696
q19	1165	1366	1111	1111
q20	502	407	269	269
q21	5564	2683	2441	2441
q22	434	358	295	295
Total cold run time: 104515 ms
Total hot run time: 29516 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4386	4319	4331	4319
q2	325	351	223	223
q3	4621	4993	4379	4379
q4	2096	2166	1374	1374
q5	4483	4298	4284	4284
q6	240	259	240	240
q7	2127	1856	1579	1579
q8	2534	2186	2148	2148
q9	8122	7857	7988	7857
q10	4832	4725	4439	4439
q11	609	431	386	386
q12	752	741	556	556
q13	3310	3646	3020	3020
q14	330	299	280	280
q15	q16	698	739	644	644
q17	1354	1357	1288	1288
q18	8063	7361	6862	6862
q19	1106	1116	1101	1101
q20	2236	2205	1954	1954
q21	5310	4629	4493	4493
q22	542	466	425	425
Total cold run time: 58076 ms
Total hot run time: 51851 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 175874 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 95900c7aa27f6798ab1e710ac8e895889970b81f, data reload: false

query5	4351	649	495	495
query6	449	191	174	174
query7	4832	570	314	314
query8	368	210	209	209
query9	8738	4114	4116	4114
query10	460	318	256	256
query11	5941	2364	2153	2153
query12	156	103	101	101
query13	1288	615	417	417
query14	6402	5380	5080	5080
query14_1	4436	4381	4410	4381
query15	210	197	179	179
query16	1004	464	468	464
query17	1127	709	591	591
query18	2509	488	351	351
query19	202	192	151	151
query20	122	110	104	104
query21	219	148	125	125
query22	13733	13707	13512	13512
query23	17578	16597	16227	16227
query23_1	16359	16329	16376	16329
query24	7728	1771	1330	1330
query24_1	1341	1348	1345	1345
query25	561	478	408	408
query26	1298	327	168	168
query27	2658	601	339	339
query28	4523	2063	2082	2063
query29	1089	659	489	489
query30	323	227	200	200
query31	1135	1081	966	966
query32	111	62	61	61
query33	532	313	256	256
query34	1189	1151	677	677
query35	761	779	692	692
query36	1380	1413	1257	1257
query37	149	107	93	93
query38	3193	3132	3055	3055
query39	926	909	883	883
query39_1	886	861	866	861
query40	214	122	100	100
query41	64	64	60	60
query42	96	94	95	94
query43	327	324	289	289
query44	1488	791	773	773
query45	200	182	179	179
query46	1125	1246	758	758
query47	2389	2378	2229	2229
query48	414	391	282	282
query49	610	466	351	351
query50	1016	346	255	255
query51	4321	4357	4282	4282
query52	90	90	75	75
query53	252	266	194	194
query54	263	234	191	191
query55	78	76	75	75
query56	237	218	209	209
query57	1470	1417	1299	1299
query58	248	216	207	207
query59	1654	1683	1451	1451
query60	277	240	231	231
query61	156	152	143	143
query62	700	657	594	594
query63	255	183	192	183
query64	2525	750	603	603
query65	4880	4754	4795	4754
query66	1766	469	339	339
query67	30061	29700	29589	29589
query68	3095	1677	944	944
query69	407	302	270	270
query70	1104	964	969	964
query71	295	242	212	212
query72	2979	2832	2369	2369
query73	878	776	434	434
query74	5129	4983	4795	4795
query75	2636	2590	2240	2240
query76	2317	1198	789	789
query77	356	382	285	285
query78	12447	12499	11879	11879
query79	1401	1231	770	770
query80	1284	478	392	392
query81	524	277	238	238
query82	594	160	128	128
query83	320	285	247	247
query84	265	154	117	117
query85	926	531	438	438
query86	430	310	277	277
query87	3377	3335	3188	3188
query88	3759	2814	2813	2813
query89	434	391	348	348
query90	1875	196	182	182
query91	177	161	134	134
query92	60	61	57	57
query93	1613	1540	872	872
query94	714	354	295	295
query95	689	462	347	347
query96	1098	805	336	336
query97	2707	2690	2550	2550
query98	216	203	203	203
query99	1184	1193	1041	1041
Total cold run time: 263216 ms
Total hot run time: 175874 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 95900c7aa27f6798ab1e710ac8e895889970b81f, data reload: false

query1	0.00	0.00	0.00
query2	0.14	0.09	0.08
query3	0.37	0.24	0.24
query4	1.62	0.24	0.25
query5	0.33	0.31	0.32
query6	1.14	0.67	0.68
query7	0.03	0.00	0.00
query8	0.10	0.07	0.07
query9	0.50	0.38	0.38
query10	0.58	0.58	0.60
query11	0.33	0.19	0.18
query12	0.32	0.19	0.19
query13	0.54	0.55	0.54
query14	0.93	0.93	0.92
query15	0.67	0.58	0.59
query16	0.38	0.40	0.40
query17	1.00	1.01	1.00
query18	0.31	0.29	0.29
query19	1.93	1.85	1.82
query20	0.02	0.01	0.02
query21	15.39	0.38	0.31
query22	4.81	0.14	0.13
query23	15.73	0.50	0.30
query24	2.50	0.62	0.42
query25	0.15	0.10	0.10
query26	0.74	0.27	0.21
query27	0.10	0.10	0.10
query28	3.47	0.91	0.52
query29	12.52	4.45	3.51
query30	0.36	0.26	0.25
query31	2.76	0.62	0.34
query32	3.23	0.59	0.48
query33	2.95	3.08	2.95
query34	15.71	4.06	3.37
query35	3.28	3.27	3.31
query36	0.66	0.53	0.52
query37	0.12	0.09	0.10
query38	0.08	0.07	0.07
query39	0.08	0.07	0.06
query40	0.20	0.18	0.16
query41	0.13	0.08	0.08
query42	0.09	0.06	0.06
query43	0.07	0.06	0.07
Total cold run time: 96.37 s
Total hot run time: 25.86 s

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/113) 🎉
Increment coverage report
Complete coverage report

zhangstar333 and others added 5 commits June 16, 2026 17:35
struct type should use name mode with nested type
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: New parquet profile definitions and wiring were split across ParquetReader, ParquetScan, and column reader headers. This made ParquetReader own counter initialization, pruning counter updates, and scheduler sub-profile assembly directly even though parquet_profile.h already existed for profile-related types. This change centralizes the new parquet RuntimeProfile counter ownership in parquet_profile.h/.cpp and keeps ParquetReader responsible only for invoking the profile helper methods.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Ran build-support/clang-format.sh for touched files.
    - Ran git diff --check.
    - Tried ./run-be-ut.sh --run '--filter=NewParquetReaderTest.*', but local CMake compiler detection failed before building Doris because /opt/homebrew/opt/llvm@16/bin/clang++ could not link a simple program: ld: library 'c++' not found.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Align format_v2 implementation namespaces with the format_v2 ownership boundary. Parquet, Hive, Paimon, Iceberg, and JDBC implementations now live under doris::format subnamespaces, while shared format_v2 expression helpers live under doris::format. Call sites and tests were updated to use the new namespace layout.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Ran build-support/clang-format.sh on the modified BE files
    - Ran git diff --check
    - Ran namespace residue scans for old doris::parquet/hive/paimon/jdbc/iceberg namespaces and duplicate format::format references
    - Attempted targeted BE UT with ./run-be-ut.sh --run '--filter=NewParquetReaderTest.*:ParquetColumnReaderTest.*:TableReaderTest.*:CastTest.*:DeletePredicateTest.*:EqualityDeletePredicateTest.*', but local CMake compiler detection failed before Doris code compiled because /opt/homebrew/opt/llvm@16/bin/clang++ could not link libc++: ld: library 'c++' not found
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #63893

Problem Summary: The new parquet reader reports timestamp values with the updated INT96 timestamp interpretation for existing external parquet coverage. This commit updates the affected regression expected outputs from the latest TeamCity P0 and external regression real outputs. Doris parquet export/write cases with suspicious timestamp offsets are intentionally excluded because those require separate writer-side analysis.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Validated modified expected rows against TeamCity builds 970619 and 970620 failure logs, and ran `git diff --check`.
- Behavior changed: No
- Does this need documentation: No
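
The commit message does not spell out what changed in the INT96 interpretation. For orientation only, the sketch below decodes the conventional INT96 timestamp layout written by older Hive and Impala writers: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number, where Julian day 2440588 is 1970-01-01. The function name and constants are illustrative assumptions, not code from this PR.

```cpp
#include <cstdint>

// Julian day number of 1970-01-01, the Unix epoch.
constexpr int64_t kJulianDayOfUnixEpoch = 2440588;
constexpr int64_t kMicrosPerDay = 86400LL * 1000 * 1000;

// Decodes a 12-byte INT96 timestamp into microseconds since the Unix epoch.
// Bytes are assembled explicitly so the result does not depend on host
// endianness. Sub-microsecond digits are dropped.
int64_t int96_to_unix_micros(const uint8_t* bytes) {
    uint64_t nanos_of_day = 0;
    for (int i = 7; i >= 0; --i) {
        nanos_of_day = (nanos_of_day << 8) | bytes[i];
    }
    uint32_t julian_day_bits = 0;
    for (int i = 11; i >= 8; --i) {
        julian_day_bits = (julian_day_bits << 8) | bytes[i];
    }
    const int64_t days_since_epoch =
            static_cast<int64_t>(static_cast<int32_t>(julian_day_bits)) - kJulianDayOfUnixEpoch;
    return days_since_epoch * kMicrosPerDay + static_cast<int64_t>(nanos_of_day / 1000);
}
```

Any time zone adjustment applied on top of this raw value is a separate decision and is deliberately left out of the sketch.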
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #63893

Problem Summary: The new parquet reader did not map TIMESTAMP(NANOS) logical columns to a supported Doris timestamp type, and DATETIMEV2 decoded INT64 timestamp values only handled millis and micros. As a result Hive parquet timestamp nanos data was materialized as NULL instead of the expected timestamp values. This change maps parquet timestamp nanos to DATETIMEV2(6), decodes nanos by truncating to microseconds, and adds decoded-value coverage for DATETIMEV2 nanos. It also refreshes the external TVF group4 expected output for a parquet file containing BC timestamp values that Doris cannot represent, where the new reader correctly returns NULL for those rows.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Ran `git diff --check`.
    - Verified the relevant parquet files with DuckDB to confirm timestamp nanos and BC timestamp source values.
    - Attempted `./run-be-ut.sh --run '--filter=DataTypeSerDeDecodedValuesTest.*'`, but local CMake failed before compiling tests because the macOS toolchain cannot link a simple C++ program: `ld: library 'c++' not found`.
- Behavior changed: Yes. The new parquet reader now reads TIMESTAMP(NANOS) values as DATETIMEV2(6) instead of producing NULL through unsupported conversion.
- Does this need documentation: No
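
The nanos-to-micros step described in this commit can be pictured with a minimal sketch, assuming the INT64 value counts nanoseconds since the Unix epoch and that DATETIMEV2(6) keeps microsecond precision. The function name is hypothetical, and how the PR treats negative (pre-epoch) values is not stated in the message; the sketch simply uses C++ integer division, which truncates toward zero.

```cpp
#include <cstdint>

// Converts nanoseconds since the Unix epoch to microseconds by dropping the
// sub-microsecond digits. Integer division truncates toward zero, so a
// negative input moves toward the epoch rather than away from it.
constexpr int64_t nanos_to_micros_truncating(int64_t nanos) {
    return nanos / 1000;
}
```

For example, `1700000000123456789` nanoseconds becomes `1700000000123456` microseconds; the trailing `789` nanoseconds are discarded rather than rounded.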
@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 77.32% (1889/2443)
Line Coverage 64.42% (33963/52725)
Region Coverage 64.80% (17462/26948)
Branch Coverage 53.96% (9344/17316)

@hello-stephen

Contributor
TPC-H: Total hot run time: 29030 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a66cad53410d4a9d394019a9bb5d6452129b8af6, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17713	3983	3947	3947
q2	2004	321	187	187
q3	10405	1506	808	808
q4	4759	462	338	338
q5	8262	868	565	565
q6	296	167	136	136
q7	820	858	625	625
q8	10620	1669	1585	1585
q9	6376	4512	4529	4512
q10	6842	1781	1513	1513
q11	441	275	240	240
q12	632	425	301	301
q13	18111	3587	2837	2837
q14	278	261	236	236
q15	q16	791	774	711	711
q17	968	886	993	886
q18	7037	5610	5548	5548
q19	1165	1235	1173	1173
q20	510	413	259	259
q21	5540	2627	2323	2323
q22	436	355	300	300
Total cold run time: 104006 ms
Total hot run time: 29030 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4466	4229	4252	4229
q2	328	352	228	228
q3	4564	4971	4372	4372
q4	2050	2164	1401	1401
q5	4445	4279	4287	4279
q6	231	175	127	127
q7	1951	2025	1640	1640
q8	2504	2182	2165	2165
q9	8110	8269	7993	7993
q10	4829	4757	4298	4298
q11	552	403	382	382
q12	758	761	560	560
q13	3294	3663	2993	2993
q14	301	300	281	281
q15	q16	733	738	642	642
q17	1365	1330	1426	1330
q18	7769	7310	7176	7176
q19	1139	1108	1119	1108
q20	2230	2209	1950	1950
q21	5333	4643	4496	4496
q22	507	442	406	406
Total cold run time: 57459 ms
Total hot run time: 52056 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 175389 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a66cad53410d4a9d394019a9bb5d6452129b8af6, data reload: false

query5	4332	648	476	476
query6	451	200	173	173
query7	4864	563	314	314
query8	358	222	201	201
query9	8756	4086	4068	4068
query10	468	303	265	265
query11	5976	2334	2109	2109
query12	182	97	94	94
query13	1240	635	421	421
query14	6348	5395	5037	5037
query14_1	4358	4409	4332	4332
query15	202	198	179	179
query16	1045	464	391	391
query17	1093	687	553	553
query18	2481	463	334	334
query19	200	195	140	140
query20	106	104	101	101
query21	215	135	120	120
query22	13581	13594	13384	13384
query23	17392	16547	16129	16129
query23_1	16254	16383	16256	16256
query24	7438	1768	1301	1301
query24_1	1299	1340	1321	1321
query25	576	461	391	391
query26	1305	333	174	174
query27	2674	524	344	344
query28	4466	2038	2002	2002
query29	1066	646	501	501
query30	306	239	199	199
query31	1106	1097	966	966
query32	98	64	59	59
query33	547	333	269	269
query34	1197	1181	634	634
query35	753	776	684	684
query36	1381	1370	1235	1235
query37	155	107	99	99
query38	3216	3126	3067	3067
query39	921	927	905	905
query39_1	860	875	876	875
query40	222	129	106	106
query41	71	66	67	66
query42	95	94	95	94
query43	317	319	284	284
query44	1444	779	788	779
query45	199	189	177	177
query46	1093	1236	757	757
query47	2373	2326	2216	2216
query48	423	436	315	315
query49	642	482	387	387
query50	979	357	277	277
query51	4347	4264	4208	4208
query52	90	93	78	78
query53	256	263	206	206
query54	283	235	248	235
query55	80	79	71	71
query56	260	236	241	236
query57	1444	1419	1307	1307
query58	260	218	208	208
query59	1577	1646	1403	1403
query60	291	241	217	217
query61	158	152	152	152
query62	696	656	588	588
query63	219	196	194	194
query64	2537	768	604	604
query65	4899	4820	4752	4752
query66	1765	464	325	325
query67	29780	29672	29559	29559
query68	3341	1557	979	979
query69	408	302	269	269
query70	1041	960	947	947
query71	281	229	215	215
query72	2940	2641	2391	2391
query73	854	867	441	441
query74	5102	4961	4788	4788
query75	2639	2588	2231	2231
query76	2326	1188	789	789
query77	358	354	285	285
query78	12436	12557	11879	11879
query79	1431	1176	792	792
query80	1274	461	388	388
query81	525	280	240	240
query82	601	157	116	116
query83	351	280	250	250
query84	263	141	115	115
query85	919	501	438	438
query86	419	296	276	276
query87	3376	3325	3147	3147
query88	3712	2797	2800	2797
query89	440	368	329	329
query90	1885	184	180	180
query91	180	158	130	130
query92	63	60	54	54
query93	1614	1403	884	884
query94	713	352	304	304
query95	676	400	345	345
query96	1089	823	360	360
query97	2695	2714	2554	2554
query98	215	206	199	199
query99	1173	1137	1028	1028
Total cold run time: 261788 ms
Total hot run time: 175389 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a66cad53410d4a9d394019a9bb5d6452129b8af6, data reload: false

query1	0.00	0.00	0.01
query2	0.09	0.05	0.06
query3	0.25	0.14	0.13
query4	1.61	0.13	0.13
query5	0.25	0.23	0.23
query6	1.25	1.06	1.07
query7	0.05	0.01	0.01
query8	0.08	0.04	0.03
query9	0.37	0.31	0.32
query10	0.55	0.55	0.55
query11	0.21	0.15	0.14
query12	0.18	0.14	0.14
query13	0.48	0.49	0.48
query14	1.00	1.01	0.98
query15	0.63	0.59	0.60
query16	0.31	0.33	0.33
query17	1.08	1.10	1.10
query18	0.22	0.21	0.20
query19	1.99	2.02	1.98
query20	0.02	0.01	0.01
query21	15.43	0.22	0.15
query22	4.80	0.05	0.05
query23	16.14	0.32	0.13
query24	2.89	0.40	0.33
query25	0.12	0.05	0.04
query26	0.72	0.21	0.16
query27	0.04	0.04	0.04
query28	3.53	0.93	0.56
query29	12.50	4.23	3.44
query30	0.27	0.15	0.16
query31	2.76	0.61	0.32
query32	3.23	0.59	0.50
query33	3.19	3.21	3.31
query34	15.56	4.28	3.50
query35	3.51	3.50	3.52
query36	0.57	0.44	0.42
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.17	0.16
query41	0.08	0.04	0.04
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 96.41 s
Total hot run time: 25.33 s

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Refine the new Parquet reader row group pruning flow so scan range filtering is applied before more expensive statistics, dictionary, bloom filter, and page index pruning. Also document the Parquet reader, scan scheduler, statistics pruning, and nested column reader APIs, and update affected namespace references in BE tests.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Ran build-support/clang-format.sh on touched BE C++ files and git diff --check locally.
    - Started BE UT validation on Fedora with NewParquetReaderTest.* and ParquetBloomFilterPruningTest.*; fixed compile issues found during validation. Full rerun was interrupted before completion by follow-up history cleanup request.
- Behavior changed: No
- Does this need documentation: No
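
The ordering described in this commit, a cheap scan range check before the costlier pruning stages, can be sketched as follows. All types and names here are hypothetical, and assigning a row group to the range that contains its midpoint is an assumed convention, not something the commit message states.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical byte extent of a row group within the file.
struct RowGroup {
    int64_t start_offset;
    int64_t length;
};

// Hypothetical scan range assigned to this reader.
struct ScanRange {
    int64_t start_offset;
    int64_t length;
};

// Assumed convention: a row group belongs to the range containing its midpoint.
bool in_scan_range(const RowGroup& rg, const ScanRange& range) {
    const int64_t mid = rg.start_offset + rg.length / 2;
    return mid >= range.start_offset && mid < range.start_offset + range.length;
}

// Returns the indices of row groups that must be read. Each entry of
// `expensive_pruners` stands for a stage such as statistics, dictionary,
// bloom filter, or page index pruning and returns true when the row group
// can be skipped. They only run for row groups inside the scan range.
std::vector<size_t> select_row_groups(
        const std::vector<RowGroup>& row_groups, const ScanRange& range,
        const std::vector<std::function<bool(const RowGroup&)>>& expensive_pruners) {
    std::vector<size_t> selected;
    for (size_t i = 0; i < row_groups.size(); ++i) {
        if (!in_scan_range(row_groups[i], range)) {
            continue; // Cheap check first: no metadata or I/O needed.
        }
        bool pruned = false;
        for (const auto& can_skip : expensive_pruners) {
            if (can_skip(row_groups[i])) {
                pruned = true;
                break;
            }
        }
        if (!pruned) {
            selected.push_back(i);
        }
    }
    return selected;
}
```

The benefit is that a reader handling one split of a large file never evaluates statistics or bloom filters for row groups that another split owns.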
@suxiaogang223 force-pushed the refact_reader_branch branch from d4159b1 to e3379ea on June 16, 2026 23:55
Rewrite comments for the entry-point and foundational modules:

parquet_reader.h:
- Class-level doc: role boundary, lifecycle (init→get_schema→open→get_block→close)
- TableReader calling relationship explained
- Each method and field annotated

parquet_type.h:
- ParquetExtraTypeInfo: each variant documented
- ParquetTypeDescriptor: full field-by-field descriptions
- Three-level resolution priority (logical→converted→physical) explained
- resolve_parquet_type / supports_record_reader / decoded_value_kind docs
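
The three-level priority mentioned above (logical, then converted, then physical) can be illustrated with a minimal sketch. `ColumnTypeInfo`, `ResolvedFrom`, and `resolution_source` are hypothetical names invented for this example; the real `ParquetTypeDescriptor` and `resolve_parquet_type` in the PR are not reproduced here.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified view of the type annotations on a Parquet column.
struct ColumnTypeInfo {
    std::optional<std::string> logical_type;   // e.g. "TIMESTAMP(MICROS)"
    std::optional<std::string> converted_type; // legacy annotation, e.g. "TIMESTAMP_MICROS"
    std::string physical_type;                 // always present, e.g. "INT64"
};

// Which annotation decided the resolved type.
enum class ResolvedFrom { kLogical, kConverted, kPhysical };

// Prefer the logical type, fall back to the legacy converted type, and use
// the physical type only when neither annotation is present.
ResolvedFrom resolution_source(const ColumnTypeInfo& info) {
    if (info.logical_type) {
        return ResolvedFrom::kLogical;
    }
    if (info.converted_type) {
        return ResolvedFrom::kConverted;
    }
    return ResolvedFrom::kPhysical;
}
```

A file from an older writer that carries only the converted annotation resolves through the second branch, while a bare `INT96` column with no annotations falls through to the physical type.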

parquet_column_schema.h:
- Class-level doc: design decisions (wrapper folding, nullable, Dremel levels)
- All fields grouped into sections (identifier / type / levels / children)
- Each field annotated with its role and valid domain (PRIMITIVE vs complex)

parquet_column_schema.cpp:
- SchemaBuildContext fields annotated

parquet_file_context.cpp:
- DorisRandomAccessFile adapter class documented

parquet_profile.h:
- All Profile structs with section-based Chinese comments
- Counter groups organized (RG pruning / page skip / batch read / column reader /
  file ops / decompress & cache / decode / others)

Co-Authored-By: Claude <noreply@anthropic.com>
@suxiaogang223 force-pushed the refact_reader_branch branch from 5291051 to 974f56b on June 17, 2026 00:20
@Gabriel39

Contributor Author

run buildall

4 participants