Skip to content

Bigquery: Support DISTINCT / ORDER BY / LIMIT inside ARRAY_AGG and STRING_AGG#2660

Open
hbridge wants to merge 1 commit into
taozhi8833998:masterfrom
hbridge:bigquery-array-agg-modifiers
Open

Bigquery: Support DISTINCT / ORDER BY / LIMIT inside ARRAY_AGG and STRING_AGG#2660
hbridge wants to merge 1 commit into
taozhi8833998:masterfrom
hbridge:bigquery-array-agg-modifiers

Conversation

@hbridge
Copy link
Copy Markdown

@hbridge hbridge commented Jun 2, 2026

Problem

The BigQuery grammar rejected all three GoogleSQL aggregate modifiers for ARRAY_AGG and STRING_AGG:

const { Parser } = require('node-sql-parser');
const p = new Parser();
const opt = { database: 'BigQuery' };

p.astify('SELECT ARRAY_AGG(DISTINCT x) AS a FROM t', opt);           // threw
p.astify('SELECT ARRAY_AGG(x LIMIT 5) AS a FROM t', opt);            // threw
p.astify('SELECT ARRAY_AGG(x ORDER BY y) AS a FROM t', opt);         // threw
p.astify('SELECT ARRAY_AGG(DISTINCT x ORDER BY x LIMIT 5) AS a FROM t', opt); // threw

These are all valid per the GoogleSQL ARRAY_AGG docs:

ARRAY_AGG([DISTINCT] expr [ORDER BY key [ASC|DESC] [, ...]] [LIMIT n])

Root cause

aggr_array_agg in pegjs/bigquery.pegjs matched the argument as a bare expr. Any modifier before the closing ) left unconsumed tokens and caused a parse error.

Fix

One rule change in pegjs/bigquery.pegjs: added an array_agg_arg rule that accepts the full modifier syntax, then wired aggr_array_agg to use it.

The new rule:

  • Reuses existing building blocks — count_arg's KW_DISTINCT? / or_and_expr / order_by_clause? pattern is already tested and correct
  • Mirrors the PostgreSQL distinct_args precedent already in pegjs/postgresql.pegjs
  • Adds limit_clause? as the BigQuery-specific piece (LIMIT inside an aggregate is GoogleSQL-only)
  • Returns { distinct, expr, orderby, limit } — same shape as count_arg, so aggrToSQL needs only one new line

src/aggregation.js gains one line: if (args.limit) str = [str, limitToSQL(args.limit)].join(' ').

Tests

Added to test/bigquery.spec.js:

  • 6 round-trip tests (one per modifier combination for both ARRAY_AGG and STRING_AGG)
  • 6 AST-shape tests asserting args.distinct, args.orderby, and args.limit are populated correctly
  • Updated 1 pre-existing "struct expr" test expectation from ARRAY_AGG(undefined) (args were unreachable before) to the now-correctly-parsed output

Full suite: 1584 passing, 1 pending, 0 regressions.

…nd STRING_AGG

Adds `array_agg_arg` rule to the BigQuery grammar that accepts the full
GoogleSQL aggregate-modifier syntax:
  [DISTINCT] expr [ORDER BY …] [LIMIT n]

Previously `aggr_array_agg` used a bare `expr` match, so all three modifier
forms threw a parse error.  The new rule mirrors the PostgreSQL `distinct_args`
pattern already present in this repo, reuses the existing `count_arg` structure
(DISTINCT + or_and_expr + order_by_clause?), and adds an optional `limit_clause?`
as the BigQuery-specific piece.

`aggrToSQL` gains LIMIT serialisation (`args.limit → limitToSQL`) so the AST
round-trips correctly.  The pre-existing "struct expr" test expectation is
updated from the broken `ARRAY_AGG(undefined)` to the now-correctly-parsed
`ARRAY_AGG(STRUCT(…)LIMIT 3)` output.  Twelve new tests (round-trip + AST
shape) cover DISTINCT, LIMIT, ORDER BY, and all three combined for both
ARRAY_AGG and STRING_AGG.
@hbridge hbridge changed the title fix(bigquery): support DISTINCT / ORDER BY / LIMIT inside ARRAY_AGG and STRING_AGG Bigquery: Support DISTINCT / ORDER BY / LIMIT inside ARRAY_AGG and STRING_AGG Jun 2, 2026
@hbridge hbridge marked this pull request as ready for review June 2, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant