GH-3511: Add JMH encoding benchmarks and fix parquet-benchmarks shaded jar #3512
Open
iemejia wants to merge 1 commit into apache:master from
This was referenced Apr 19, 2026
… shaded jar

The parquet-benchmarks pom is missing the JMH annotation-processor configuration and the AppendingTransformer entries for BenchmarkList / CompilerHints. As a result, the shaded jar built from master fails at runtime with "Unable to find the resource: /META-INF/BenchmarkList".

This commit:

- Fixes parquet-benchmarks/pom.xml so the shaded jar is runnable: adds jmh-generator-annprocess to maven-compiler-plugin's annotation processor paths, and adds AppendingTransformer entries for META-INF/BenchmarkList and META-INF/CompilerHints to the shade plugin.
- Adds 11 JMH benchmarks covering the encode/decode paths used by the pending performance optimization PRs (apache#3494, apache#3496, apache#3500, apache#3504, apache#3506, apache#3510), so reviewers can reproduce the reported numbers and detect regressions: IntEncodingBenchmark, BinaryEncodingBenchmark, ByteStreamSplitEncodingBenchmark, ByteStreamSplitDecodingBenchmark, FixedLenByteArrayEncodingBenchmark, FileReadBenchmark, FileWriteBenchmark, RowGroupFlushBenchmark, ConcurrentReadWriteBenchmark, BlackHoleOutputFile, TestDataFactory.

After this change the shaded jar registers 87 benchmarks (was 0 from a working build, or unrunnable at all from a default build).
668caf7 to
2404a29
Summary
Resolves #3511.
The parquet-benchmarks shaded jar built from current master is non-functional: it fails at runtime with "RuntimeException: Unable to find the resource: /META-INF/BenchmarkList". This PR fixes that and adds 11 JMH benchmarks covering the encode/decode paths exercised by the open performance PRs, so reviewers can reproduce the reported numbers.

What's broken on master
parquet-benchmarks/pom.xml is missing two pieces of configuration:

- maven-compiler-plugin lacks the annotationProcessorPaths / annotationProcessors config for jmh-generator-annprocess, so the JMH annotation processor never runs and META-INF/BenchmarkList / META-INF/CompilerHints are never generated.
- maven-shade-plugin lacks AppendingTransformer entries for those two resources, so even if generated they would be dropped during shading.

Both problems are fixed in this PR.
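
To confirm locally whether a given build is affected, one option is to look inside the jar for the two resources named above. The following is a minimal sketch using only the JDK; the class name JmhResourceCheck and its methods are hypothetical helpers for illustration and are not part of this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper (not part of this PR): reports which JMH metadata resources a jar lacks. */
final class JmhResourceCheck {

    // The two resources that the description above says must survive compilation and shading.
    static final List<String> REQUIRED =
            List.of("META-INF/BenchmarkList", "META-INF/CompilerHints");

    /** Returns the required entries that are absent from the given jar, in REQUIRED order. */
    static List<String> missingResources(Path jar) {
        List<String> missing = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            for (String name : REQUIRED) {
                if (jarFile.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return missing;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java JmhResourceCheck.java <path-to-shaded-jar>");
            System.exit(2);
        }
        List<String> missing = missingResources(Path.of(args[0]));
        System.out.println(missing.isEmpty()
                ? "OK: JMH metadata present"
                : "Missing: " + missing);
    }
}
```

An empty "missing" list only shows that the resources exist in the jar; it does not by itself prove that every benchmark class was registered.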
Benchmarks added
11 new files in parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/:

- IntEncodingBenchmark
- BinaryEncodingBenchmark
- ByteStreamSplitEncodingBenchmark / ByteStreamSplitDecodingBenchmark
- FixedLenByteArrayEncodingBenchmark
- FileReadBenchmark / FileWriteBenchmark
- RowGroupFlushBenchmark
- ConcurrentReadWriteBenchmark
- BlackHoleOutputFile: OutputFile that discards bytes; isolates CPU from I/O
- TestDataFactory

Validation
After this PR, the shaded jar is runnable and registers 87 benchmarks:
Sanity check: IntEncodingBenchmark.decodePlain reproduces the master baseline cited in #3493/#3494 (~91M ops/s on JDK 21, JMH 1.37, 3 warmup + 5 measurement iterations):

Out of scope (deferred)
Modernization of the existing ReadBenchmarks / WriteBenchmarks / NestedNullWritingBenchmarks (Hadoop-free LocalInputFile, parameterization, JMH-idiomatic state setup) is a separate concern and will be proposed in a follow-up PR.

Follow-up
Once this lands, each open perf PR (#3494, #3496, #3500, #3504, #3506, #3510) will be updated with a one-line "How to reproduce" snippet referencing the relevant *Benchmark class.
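
As a note on the BlackHoleOutputFile entry under "Benchmarks added": the idea of discarding bytes so that a write benchmark measures encoding CPU rather than disk I/O can be sketched with a plain java.io stream. The class below is a stand-in written for illustration, not the PR's implementation, and it does not implement Parquet's OutputFile interface; the name DiscardingPositionStream is hypothetical. It tracks the write position because a writer typically needs to know offsets even when the bytes go nowhere.

```java
import java.io.OutputStream;
import java.util.Objects;

/** Stand-in sketch: drops every byte written but still tracks the write position. */
final class DiscardingPositionStream extends OutputStream {

    private long pos = 0;

    @Override
    public void write(int b) {
        // Discard the byte; only the position advances.
        pos++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        // Validate the range as a real stream would, then discard it in one step.
        Objects.checkFromIndexSize(off, len, buf.length);
        pos += len;
    }

    /** Number of bytes "written" so far. */
    long getPos() {
        return pos;
    }

    public static void main(String[] args) {
        DiscardingPositionStream sink = new DiscardingPositionStream();
        byte[] page = new byte[1024];
        sink.write(page, 0, page.length);
        sink.write(7);
        System.out.println("position = " + sink.getPos());
    }
}
```

Overriding the bulk write matters for benchmarking: the inherited default would loop over write(int) once per byte and add overhead that has nothing to do with the code under test.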