From daa59fa09d859b515ee513933a5f41a03155285c Mon Sep 17 00:00:00 2001 From: javier Date: Tue, 24 Feb 2026 18:20:38 +0100 Subject: [PATCH 1/4] Add array_agg() aggregate function documentation --- documentation/query/functions/aggregation.md | 131 +++++++++++++++++++ documentation/query/functions/array.md | 6 +- 2 files changed, 135 insertions(+), 2 deletions(-) diff --git a/documentation/query/functions/aggregation.md b/documentation/query/functions/aggregation.md index 383a646a3..745170c6b 100644 --- a/documentation/query/functions/aggregation.md +++ b/documentation/query/functions/aggregation.md @@ -74,6 +74,12 @@ calculations. Functions are organized by category below. | [bit_or](#bit_or) | Bitwise OR of all non-NULL values | | [bit_xor](#bit_xor) | Bitwise XOR of all non-NULL values | +### Array aggregates + +| Function | Description | +| :------- | :---------- | +| [array_agg](#array_agg) | Collect values or concatenate arrays into a DOUBLE[] | + ### Specialized aggregates | Function | Description | @@ -408,6 +414,131 @@ FROM weather_data; - [min](#min) - Returns the minimum value itself - [first](#first) - Returns the first value by timestamp order +## array_agg + +`array_agg(value)` or `array_agg(array)` collects row values into a single +`DOUBLE[]` array per group. Use it to build per-group vectors for downstream +array operations such as +[array_avg](/docs/query/functions/array/#array_avg), +[dot_product](/docs/query/functions/array/#dot_product), or time-bucketed +snapshots via SAMPLE BY. + +When called with a scalar argument, each row's value becomes one element of the +output array. When called with a `DOUBLE[]` argument, non-null input arrays are +concatenated into a single flat `DOUBLE[]`. + +#### Parameters + +| Parameter | Type | Description | +| :-------- | :--- | :---------- | +| `value` | `DOUBLE` (or castable) | Scalar to collect. Each row's value becomes one element. NULL values are included as NaN. | +| `array` | `DOUBLE[]` | Array to concatenate. Non-null arrays are appended into a flat `DOUBLE[]`. NULL and empty arrays are skipped. | +| `ordered` | `BOOLEAN` (optional, default `true`) | `true` preserves input order. `false` enables parallel execution, but output order is non-deterministic. | + +Only one of `value` or `array` is used per call. The `ordered` parameter can be +added as a second argument to either form. + +#### Return type + +Always `DOUBLE[]`. Returns NULL when no rows match or all inputs are NULL. + +#### SAMPLE BY support + +`array_agg` works with SAMPLE BY and the following FILL options: + +- `FILL(NULL)` - empty buckets produce NULL +- `FILL(PREV)` - empty buckets repeat the previous bucket's array +- `FILL(NONE)` - empty buckets are omitted from the result + +Not supported: + +- `FILL(LINEAR)` - interpolation is meaningless for arrays +- `FILL(constant)` - a scalar cannot fill an array column; use `FILL(NULL)` instead + +#### Examples + +**Collect prices per symbol:** + +```questdb-sql demo title="array_agg - collect prices per symbol" +SELECT symbol, array_agg(price) AS prices +FROM fx_trades +WHERE symbol = 'EURUSD' + AND timestamp IN '$now - 3s..$now' +GROUP BY symbol; +``` + +Each row's `price` value is collected into a single `DOUBLE[]` per symbol. + +**Time-bucketed arrays with SAMPLE BY:** + +```questdb-sql demo title="array_agg - time-bucketed price arrays" +SELECT timestamp, array_agg(price) AS prices +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp IN '$now - 5s..$now' +SAMPLE BY 1s; +``` + +Produces one array per 1-second bucket containing all trade prices in that +interval. + +**Parallel execution for large datasets:** + +```questdb-sql title="array_agg - parallel (unordered) execution" +SELECT timestamp, array_agg(price, false) AS prices +FROM trades +WHERE symbol = 'BTC-USDT' + AND timestamp IN '$now - 5s..$now' +SAMPLE BY 1s; +``` + +Passing `false` as the second argument drops the ordering guarantee, allowing +QuestDB to execute the aggregation in parallel across threads. The array +elements are the same but their order within each bucket is non-deterministic. +Use this when element order does not matter and throughput is the priority. + +**Concatenate arrays within a time bucket:** + +```questdb-sql demo title="array_agg - concatenate bid price arrays" +SELECT timestamp, array_agg(bids[1]) AS all_bids +FROM market_data +WHERE symbol = 'EURUSD' + AND timestamp IN '$now - 1s..$now' +SAMPLE BY 100ms; +``` + +Here `bids[1]` is a `DOUBLE[]` column (bid prices at each depth level). The +function concatenates all non-null bid-price arrays in each 100 ms bucket into a +single flat array. + +**Compose with array functions:** + +```questdb-sql demo title="array_agg - cumulative price sums per symbol" +SELECT symbol, + array_cum_sum(array_agg(price)) AS cumulative_prices +FROM fx_trades +WHERE symbol = 'EURUSD' + AND timestamp IN '$now - 3s..$now' +GROUP BY symbol; +``` + +[array_cum_sum](/docs/query/functions/array/#array_cum_sum) computes running +totals over the collected prices in timestamp order. This has no scalar aggregate +equivalent and is useful for intra-group cumulative analysis. + +#### Constraints and edge cases + +- NULL scalar values become NaN elements in the output array. +- NULL and empty input arrays are skipped during concatenation, but NULL + elements within a non-null array are preserved. +- Returns NULL (not an empty array) when no rows match the filter or all inputs + are NULL. + +#### See also + +- [Array functions](/docs/query/functions/array/) - Functions that operate on arrays +- [SAMPLE BY](/docs/query/sql/sample-by/) - Time-series aggregation + ## avg `avg(value)` calculates simple average of values ignoring missing data (e.g diff --git a/documentation/query/functions/array.md b/documentation/query/functions/array.md index 4de2b6580..c0304539b 100644 --- a/documentation/query/functions/array.md +++ b/documentation/query/functions/array.md @@ -6,8 +6,10 @@ description: Array functions reference documentation. This page documents functions for n-dimensional arrays. This isn't an exhaustive list of all functions that may take an array parameter. For example, financial -functions are listed in [their own section](/docs/query/functions/finance/), whether or -not they can take an array parameter. +functions are listed in [their own section](/docs/query/functions/finance/), +whether or not they can take an array parameter. To collect row values into +arrays during GROUP BY or SAMPLE BY, see +[array aggregates](/docs/query/functions/aggregation/#array-aggregates). ## array_avg From 9ebdd26187a5311710d12e7211238039ee7afd29 Mon Sep 17 00:00:00 2001 From: javier Date: Tue, 24 Feb 2026 18:22:09 +0100 Subject: [PATCH 2/4] Show prices alongside cumulative_prices in compose example --- documentation/query/functions/aggregation.md | 1 + 1 file changed, 1 insertion(+) diff --git a/documentation/query/functions/aggregation.md b/documentation/query/functions/aggregation.md index 745170c6b..5ed859de6 100644 --- a/documentation/query/functions/aggregation.md +++ b/documentation/query/functions/aggregation.md @@ -515,6 +515,7 @@ single flat array. ```questdb-sql demo title="array_agg - cumulative price sums per symbol" SELECT symbol, + array_agg(price) AS prices, array_cum_sum(array_agg(price)) AS cumulative_prices FROM fx_trades WHERE symbol = 'EURUSD' From f1606efba410acee6370be66d15433fbf683310f Mon Sep 17 00:00:00 2001 From: javier Date: Tue, 24 Feb 2026 18:22:48 +0100 Subject: [PATCH 3/4] Remove incorrect claim about no scalar equivalent --- documentation/query/functions/aggregation.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/documentation/query/functions/aggregation.md b/documentation/query/functions/aggregation.md index 5ed859de6..162b1477f 100644 --- a/documentation/query/functions/aggregation.md +++ b/documentation/query/functions/aggregation.md @@ -524,8 +524,7 @@ GROUP BY symbol; ``` [array_cum_sum](/docs/query/functions/array/#array_cum_sum) computes running -totals over the collected prices in timestamp order. This has no scalar aggregate -equivalent and is useful for intra-group cumulative analysis. +totals over the collected prices in timestamp order. #### Constraints and edge cases From 11e987a890c9558b02adb7d4f9b3cd67d5a740a0 Mon Sep 17 00:00:00 2001 From: javier Date: Tue, 24 Feb 2026 18:55:17 +0100 Subject: [PATCH 4/4] Improve array_agg docs for accuracy and consistency Ref: https://github.com/questdb/questdb/pull/6812 - Add explicit syntax signatures for both scalar and array forms - Fix return value semantics: scalar form always produces an array when rows exist, array form returns NULL when all inputs are NULL/empty - Rename "Return type" to "Return value" to match page conventions - Split NULL handling into its own section with per-form behavior - Add result table with real output to first example - Remove internal NaN implementation details from user-facing docs - Add backticks to SAMPLE BY references --- documentation/query/functions/aggregation.md | 39 +++++++++++++------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/documentation/query/functions/aggregation.md b/documentation/query/functions/aggregation.md index 162b1477f..a1a3700ea 100644 --- a/documentation/query/functions/aggregation.md +++ b/documentation/query/functions/aggregation.md @@ -416,12 +416,18 @@ FROM weather_data; ## array_agg -`array_agg(value)` or `array_agg(array)` collects row values into a single -`DOUBLE[]` array per group. Use it to build per-group vectors for downstream -array operations such as +**Syntax:** + +```questdb-sql +array_agg(value [, ordered]) -> DOUBLE[] +array_agg(array [, ordered]) -> DOUBLE[] +``` + +Collects row values into a single `DOUBLE[]` array per group. Use it to build +per-group vectors for downstream array operations such as [array_avg](/docs/query/functions/array/#array_avg), [dot_product](/docs/query/functions/array/#dot_product), or time-bucketed -snapshots via SAMPLE BY. +snapshots via `SAMPLE BY`. When called with a scalar argument, each row's value becomes one element of the output array. When called with a `DOUBLE[]` argument, non-null input arrays are @@ -431,20 +437,25 @@ concatenated into a single flat `DOUBLE[]`. | Parameter | Type | Description | | :-------- | :--- | :---------- | -| `value` | `DOUBLE` (or castable) | Scalar to collect. Each row's value becomes one element. NULL values are included as NaN. | +| `value` | `DOUBLE` (or castable) | Scalar to collect. Each row's value becomes one element. NULL values are preserved. | | `array` | `DOUBLE[]` | Array to concatenate. Non-null arrays are appended into a flat `DOUBLE[]`. NULL and empty arrays are skipped. | | `ordered` | `BOOLEAN` (optional, default `true`) | `true` preserves input order. `false` enables parallel execution, but output order is non-deterministic. | Only one of `value` or `array` is used per call. The `ordered` parameter can be added as a second argument to either form. -#### Return type +#### Return value -Always `DOUBLE[]`. Returns NULL when no rows match or all inputs are NULL. +- **Scalar form** - `DOUBLE[]`. Always produces an array when at least one row + exists, even if every value is NULL. Empty `SAMPLE BY` buckets produce NULL via + `FILL(NULL)`. +- **Array form** - `DOUBLE[]`. Returns NULL (not an empty array) when every + input array is NULL or empty. Empty `SAMPLE BY` buckets produce NULL via + `FILL(NULL)`. #### SAMPLE BY support -`array_agg` works with SAMPLE BY and the following FILL options: +`array_agg` works with `SAMPLE BY` and the following FILL options: - `FILL(NULL)` - empty buckets produce NULL - `FILL(PREV)` - empty buckets repeat the previous bucket's array @@ -467,6 +478,10 @@ WHERE symbol = 'EURUSD' GROUP BY symbol; ``` +| symbol | prices | +| :----- | :----- | +| EURUSD | [1.1922, 1.1928, 1.1925, 1.1927, 1.1927, 1.1926, 1.1924, 1.1926, 1.1932, 1.1934, 1.1933, 1.1928, 1.1929, 1.1934, 1.1935, 1.1934, 1.1935, 1.1929, 1.193, ...] | + Each row's `price` value is collected into a single `DOUBLE[]` per symbol. **Time-bucketed arrays with SAMPLE BY:** @@ -526,13 +541,11 @@ GROUP BY symbol; [array_cum_sum](/docs/query/functions/array/#array_cum_sum) computes running totals over the collected prices in timestamp order. -#### Constraints and edge cases +#### NULL handling -- NULL scalar values become NaN elements in the output array. -- NULL and empty input arrays are skipped during concatenation, but NULL +- NULL scalar inputs appear as null elements in the output array. +- NULL and empty input arrays are skipped during concatenation, but null elements within a non-null array are preserved. -- Returns NULL (not an empty array) when no rows match the filter or all inputs - are NULL. #### See also