Query Performance Degradation with Null Handling

### Affected Version

If SQL Null Handling feature is turned on:
- Default since v28
- Since v17 with feature flag enabled

### Description
I am currently running v27 image on my clusters. As part of process to upgrade to later versions, I tried to test out the SQL Null-Compatible feature. 

For the case where my table does not have any null values, I come across a performance degradation.

The following benchmark is conducted for a cluster with the same resource. It is found that the Historical + Peon querying latency is a contributor for the difference in querying speed.

#### Cache Disabled

| Description             | V27 (JDK17, Null Disabled) | V27 (JDK17, Null Enabled) | V37 (JDK17) | V37 (JDK21) |
| ----------------------- | -------------------------: | ------------------------: | ----------: | ----------: |
| Select 1 Random Row     |                      5.455 |                     5.142 |       5.416 |       5.414 |
| Select Count (1h)       |                      5.446 |                     5.154 |       5.428 |       5.469 |
| Cardinality (1h Unique) |                      6.600 |                     6.056 |       6.003 |       5.808 |
| Time-Series (12h PT5M)  |                      5.592 |                     5.623 |       7.769 |       5.552 |
| Heavy Scan (24h Agg)    |                     14.957 |                    16.325 |      19.847 |      11.830 |
| TopN Ranking (24h)      |                     12.159 |                    17.245 |      18.903 |      16.707 |

#### Cache Enabled

| Description             | V27 (JDK17, Null Disabled) | V27 (JDK17, Null Enabled) | V37 (JDK17) | V37 (JDK21) |
| ----------------------- | -------------------------: | ------------------------: | ----------: | ----------: |
| Select 1 Random Row     |                      5.318 |                     5.198 |       5.288 |       5.433 |
| Select Count (1h)       |                      5.347 |                     5.215 |       5.317 |       5.504 |
| Cardinality (1h Unique) |                      7.795 |                     6.119 |       6.086 |       5.844 |
| Time-Series (12h PT5M)  |                      5.643 |                     5.362 |       5.392 |       5.550 |
| Heavy Scan (24h Agg)    |                      5.441 |                     5.686 |       5.412 |       5.581 |
| TopN Ranking (24h)      |                      5.628 |                     6.170 |       6.076 |       6.237 |


<details>
<summary>SQL Statements</summary>

#### Select 1 Random Row

```sql
SELECT __time, id
FROM "<TABLE_NAME>"
ORDER BY __time DESC
LIMIT 1;
```

#### Select Count (1h)

```sql
SELECT COUNT(*)
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;
```

#### Cardinality (1h Unique)

```sql
SELECT COUNT(DISTINCT id)
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;
```

#### Time-Series (12h PT5M)

```sql
SELECT TIME_FLOOR(__time, 'PT5M') AS bucket_5m, idc, SUM("count") AS total_count
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '12' HOUR
GROUP BY 1, 2;
```

#### Heavy Scan (24h Agg)

```sql
SELECT idc, SUM("count") AS total_count, SUM(sum_rtt) / SUM("count") AS avg_rtt
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1;
```

#### TopN Ranking (24h)

```sql
SELECT id, SUM(sum_rtt) AS total_sum_rtt
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100;
```
</details>

### Cause Analysis
When enabling null handling, we will need an extra check to deduce whether the current row in the column is null. While the `isNull` method is cheap, it is called for every single row.


```java
// NullableNumericTopNColumnAggregatesProcessor.java

if (hasNulls && selector.isNull()) { // V27
if (selector.isNull()) { // V37
  // REDACTED for brevity
} else {
  Aggregator[] valueAggregates = getValueAggregators(query, selector, cursor);
  for (Aggregator aggregator : valueAggregates) {
    aggregator.aggregate();
  }
}
```

Asking AI to estimate the latency for each `isNull` method, for a long value, the execution time is 1~3ns. This may be fast, but since the check is run for every row, this resulted in substantial increase for querying time. 

For example, the total number of rows in 1 day for the TopN / Heavy Scan query is 2,147,995,191. If we estimate each operation to take 1-3ns, this matches expectations of the query being 2-6s faster. 

### Suggestions
Regarding suggestions for the fix, please verify if these are feasible:
1. Store something in the metadata of segments to indicate which columns will not have any null values? We can then provide non-null implementations that fast-track to processing.
2. Some work on the compaction side, prioritize compacting segments with null values together if possible?
3. Other suggestions / feedback?

Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query Performance Degradation with Null Handling #19344

Affected Version

Description

Cache Disabled

Cache Enabled

Select 1 Random Row

Select Count (1h)

Cardinality (1h Unique)

Time-Series (12h PT5M)

Heavy Scan (24h Agg)

TopN Ranking (24h)

Cause Analysis

Suggestions

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Description	V27 (JDK17, Null Disabled)	V27 (JDK17, Null Enabled)	V37 (JDK17)	V37 (JDK21)
Select 1 Random Row	5.455	5.142	5.416	5.414
Select Count (1h)	5.446	5.154	5.428	5.469
Cardinality (1h Unique)	6.600	6.056	6.003	5.808
Time-Series (12h PT5M)	5.592	5.623	7.769	5.552
Heavy Scan (24h Agg)	14.957	16.325	19.847	11.830
TopN Ranking (24h)	12.159	17.245	18.903	16.707

Description	V27 (JDK17, Null Disabled)	V27 (JDK17, Null Enabled)	V37 (JDK17)	V37 (JDK21)
Select 1 Random Row	5.318	5.198	5.288	5.433
Select Count (1h)	5.347	5.215	5.317	5.504
Cardinality (1h Unique)	7.795	6.119	6.086	5.844
Time-Series (12h PT5M)	5.643	5.362	5.392	5.550
Heavy Scan (24h Agg)	5.441	5.686	5.412	5.581
TopN Ranking (24h)	5.628	6.170	6.076	6.237

Query Performance Degradation with Null Handling #19344

Description

Affected Version

Description

Cache Disabled

Cache Enabled

Select 1 Random Row

Select Count (1h)

Cardinality (1h Unique)

Time-Series (12h PT5M)

Heavy Scan (24h Agg)

TopN Ranking (24h)

Cause Analysis

Suggestions

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions