perf: vectorize topN native engine #19353
    throw new ISE("Aggregator state exceeds 2 GB; cardinality too high for HeapVectorGrouper");
  }
  int newCapacity = aggStateBuffer.capacity();
  while (newCapacity < neededCapacity) {
private void growBuffer(final long neededCapacity)
{
  if (neededCapacity > Integer.MAX_VALUE) {
We probably want to make this limit configurable.
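As one way to picture that suggestion, here is a minimal sketch of a doubling buffer whose ceiling is passed in rather than hard-coded. It is not the PR's HeapVectorGrouper: the class name GrowableAggBuffer, the maxCapacityBytes parameter, and the use of IllegalStateException in place of Druid's ISE are all assumptions made for illustration. The sketch also doubles in long arithmetic, on the assumption that doubling an int capacity near 2^30 could otherwise overflow.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the aggregator state buffer; not code from the PR.
final class GrowableAggBuffer
{
  // Hypothetical configurable ceiling, clamped because a ByteBuffer
  // cannot hold more than Integer.MAX_VALUE bytes.
  private final long maxCapacityBytes;
  private ByteBuffer aggStateBuffer;

  GrowableAggBuffer(final int initialCapacity, final long maxCapacityBytes)
  {
    if (initialCapacity <= 0) {
      throw new IllegalArgumentException("initialCapacity must be positive");
    }
    this.maxCapacityBytes = Math.min(maxCapacityBytes, Integer.MAX_VALUE);
    this.aggStateBuffer = ByteBuffer.allocate(initialCapacity);
  }

  void growBuffer(final long neededCapacity)
  {
    if (neededCapacity <= aggStateBuffer.capacity()) {
      return; // already large enough
    }
    if (neededCapacity > maxCapacityBytes) {
      throw new IllegalStateException(
          "Aggregator state needs " + neededCapacity + " bytes; limit is " + maxCapacityBytes
      );
    }
    // Double in long arithmetic so the loop cannot overflow an int.
    long newCapacity = aggStateBuffer.capacity();
    while (newCapacity < neededCapacity) {
      newCapacity *= 2;
    }
    newCapacity = Math.min(newCapacity, maxCapacityBytes);

    // Copy the existing state into the larger buffer.
    final ByteBuffer newBuffer = ByteBuffer.allocate((int) newCapacity);
    final ByteBuffer src = aggStateBuffer.duplicate();
    src.clear(); // position = 0, limit = capacity
    newBuffer.put(src);
    newBuffer.clear();
    aggStateBuffer = newBuffer;
  }

  int capacity()
  {
    return aggStateBuffer.capacity();
  }

  ByteBuffer buffer()
  {
    return aggStateBuffer;
  }
}
```

In this sketch the limit would come from wherever the engine reads its configuration; how that is wired into the query context or runtime properties is left open here.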
return Sequences.filter(
    VectorTopNEngine.process(query, timeBoundaryInspector, cursorHolder, bufHolder.get()),
    Predicates.notNull()
).withBaggage(resourceCloser);
[P2] Vectorized TopN bypasses existing query metrics reporting.
The new early return into VectorTopNEngine.process skips the row-path bookkeeping that reports TopN metrics today. In the non-vector path, this method records queryMetrics.cursor(...), then getMapFn records dimensionCardinality(...) and the algorithm selection, and TopNMapFn records selector and pass-size metrics. None of that runs when shouldVectorize is true, so enabling vectorization changes the emitted TopN metrics and removes operational visibility into algorithm choice and cardinality. If that loss is intended, it should be called out explicitly; otherwise this is a regression and the metrics should be wired back in.
We should discuss which algorithm and cursor metrics we want to expose in the vectorized path.
@gianm any thoughts here?
Description
Vectorizes the TopN native query engine. More work is needed to make TopN spill the way GroupBy does, so that it can back aggregation memory with a more efficient, fixed-size, off-heap buffer. For now, I'm using a heap-backed grouper similar to HashVectorGrouper in the GroupBy engine. I'm also not sure about the state of things in Dart/MSQe. Speedups are roughly 20-50% across the board.
Benchmarks
Run on my Apple M3 Pro (12 physical cores, 18 GB memory).
Release note
Vectorize TopN native query engine.
This PR has: