Skip to content

Quote bind parameter names with backticks#60

Open
msrathore-db wants to merge 7 commits intomainfrom
fix-hyphenated-column-bind-params
Open

Quote bind parameter names with backticks#60
msrathore-db wants to merge 7 commits intomainfrom
fix-hyphenated-column-bind-params

Conversation

@msrathore-db
Copy link
Copy Markdown
Collaborator

@msrathore-db msrathore-db commented Apr 21, 2026

Summary

Bug. INSERTs (and SELECT/UPDATE/DELETE, and IN clauses) involving column names containing characters outside [A-Za-z0-9_] — most commonly hyphens from DataFrame-origin schemas — fail with UNBOUND_SQL_PARAMETER. SQLAlchemy uses the column name as the default bind parameter name, so compiled SQL like VALUES (:col-with-hyphen) has the Databricks parser split on - and report unbound parameter: col. Same root cause as databricks-sql-python#368, filed against databricks-sqlalchemy.

Fix. Override DatabricksStatementCompiler.bindtemplate and compilation_bindtemplate so every bind marker is rendered wrapped in backticks (:`col-with-hyphen`). The Spark/Databricks SQL grammar (per SqlBaseParser.g4) accepts this as a quoted parameter identifier — simpleIdentifier → quotedIdentifier → BACKQUOTED_IDENTIFIER. Mirrors Oracle's :"name" approach to the same grammar constraint.

The templates are fixed via property descriptors with a no-op setter, because SQLAlchemy's SQLCompiler.__init__ assigns them from BIND_TEMPLATES[paramstyle] inline with statement compilation — any subclass override in __init__ runs too late, and a class-level attribute gets shadowed. A descriptor intercepts both the read (forcing our value) and the write (no-op), so the template is fixed regardless of ordering.

Why template-based, not bindparam_string-based. An earlier revision of this PR overrode bindparam_string directly. That missed the post-compile expansion path: _literal_execute_expanding_parameter builds expanded markers for IN clauses (:col-name_1, :col-name_2, …) directly from self.bindtemplate, bypassing bindparam_string entirely. Changing the template covers every render path uniformly.

Why backticks (not an escape map). First draft added bindname_escape_characters["-"] = "_". That silently corrupts data for a table with both col-name and col_name (both collapse to :col_name, one value overwrites the other in the params dict). A multi-char token like _x2d_ dodges the collision but is noisy and can theoretically still collide. Backticks sidestep the whole issue: the parameter's logical name is identical to the column's, so no mapping is needed and sibling columns keep distinct markers.

Why no opt-out flag. An earlier revision added ?quote_bind_params=false as a kill switch. Dropped because: (a) the dialect has no precedent for behavioral URL flags (only routing: http_path, catalog, schema), and (b) empirical evidence across a wide matrix is strong — see test plan below.

Test plan

Unit tests (tests/test_local/test_ddl.py::TestBindParamQuoting, 11 cases):

  • Hyphenated column → backticked bind marker, dict key preserved unquoted
  • col-name + col_name siblings → distinct markers, both values round-trip
  • Plain identifiers (id, name) → also backticked (unconditional)
  • SQLAlchemy default escape map chars (., [, ], :, %) still work via pre-translation + backtick wrap
  • Leading-digit column names
  • Unicode (prénom, 姓名, Straße)
  • Reserved words (select, from) as column names
  • Hyphen surviving into anonymized WHERE-clause bind
  • Multi-row INSERT (row-level suffixed bind names)
  • IN clause compiles to a valid POSTCOMPILE marker
  • Many special chars (hyphen, /, ?, #, +, *, @, $, &, |, <>) render correctly

Local suite: poetry run pytest tests/test_local/ --ignore=tests/test_local/e2e -q256 passed.

End-to-end against a live SQL warehouse — every case creates a temp table, INSERTs via SQLAlchemy, reads back, asserts exact values, drops the table. 29/29 passing:

  • 22 column-name variants (hyphen, dot, bracket, percent, slash, ?, #, +, *, @, $, &, |, <>, :, leading digit, 3 Unicode scripts, 2 reserved words, long name)
  • Collision case: col-name + col_name siblings
  • SELECT WHERE on hyphenated column
  • UPDATE with hyphenated column in SET + WHERE
  • DELETE with hyphenated column in WHERE
  • IN clause — post-compile expansion path (this is the case the earlier bindparam_string-only revision broke)
  • Multi-row INSERT
  • NULL value bound to hyphenated column

Chars Delta rejects at CREATE TABLE time (space, parens, comma, =, backtick) are intentionally excluded — they can never land in a real table and thus can't reach the bind-name path.

Co-authored-by: Isaac

Column names sourced from DataFrames frequently contain hyphens
(e.g. `col-with-hyphen`). SQLAlchemy uses the column name as the
default bind parameter name, and Databricks named-parameter markers
(`:name`) only accept bare identifiers ([A-Za-z_][A-Za-z0-9_]*).
The hyphen was being emitted verbatim, producing invalid SQL like
`:col-with-hyphen` which the server rejects with
UNBOUND_SQL_PARAMETER because it parses `-with-hyphen` as stray
tokens.

Override `DatabricksStatementCompiler.bindparam_string` to wrap
non-bare-identifier names in backticks (`:`col-with-hyphen``),
which the Spark/Databricks SQL grammar accepts as a quoted
parameter identifier (`simpleIdentifier -> quotedIdentifier` in
`SqlBaseParser.g4`). This mirrors Oracle's `:"name"` approach to
the same problem.

The backticks are quoting syntax only — the parameter's logical
name is still the text between them, so the params dict sent to
the driver keeps the original unquoted key. `escaped_bind_names`
is intentionally left empty so `construct_params` passes keys
through unchanged.

This covers hyphens, spaces, dots, brackets, leading digits, and
any other character outside [A-Za-z0-9_], with no risk of
collisions between sibling columns like `col-name` and `col_name`
(a concern with single-character escape-map approaches).

Co-authored-by: Isaac
Following review feedback: the conditional check based on whether the
name matches a bare-identifier pattern is unnecessary. The Spark/
Databricks SQL grammar accepts :`name` for every valid identifier
(verified empirically against a live SQL warehouse), so wrapping
unconditionally keeps the compiler simpler and removes a class of
edge-case bugs that the condition could miss.

Add a ``quote_bind_params`` flag on DatabricksDialect (default True)
that can be turned off via the URL query parameter
``?quote_bind_params=false`` as an escape hatch if the quoting ever
introduces an unexpected regression. When disabled, we fall through
entirely to stock SQLAlchemy bind-name rendering.

Co-authored-by: Isaac
The earlier bindparam_string override only intercepted the primary
render path. It missed post-compile expansion used by IN clauses:
SQLAlchemy's _literal_execute_expanding_parameter builds expanded
markers (e.g. :col-name_1, :col-name_2) directly from
self.bindtemplate, bypassing bindparam_string entirely. Comprehensive
warehouse testing caught this — SELECT WHERE col IN (...) with a
hyphenated column still failed with UNBOUND_SQL_PARAMETER.

Switch to overriding bindtemplate and compilation_bindtemplate
themselves so every render path (normal bindparam_string, post-compile
expansion, render_bind_cast wrappers) gets backticked uniformly.
Using property descriptors with a no-op setter forces our template to
stick regardless of when super's __init__ assigns from BIND_TEMPLATES.

Also drop the quote_bind_params / ?quote_bind_params=false opt-out
flag — the dialect has no precedent for behavioral URL flags (only
routing: http_path, catalog, schema), and we have strong empirical
evidence the fix is safe on current platforms.

Expand unit and integration coverage: hyphen, dot, bracket, colon,
percent, slash, ?, #, +, *, @, $, &, |, <>, unicode (prénom, 姓名,
Straße), reserved words, leading digits, long names, col-name +
col_name collision, SELECT WHERE, UPDATE, DELETE, IN, multi-row
INSERT, NULL values — all 29 verified end-to-end against a live
warehouse.

Co-authored-by: Isaac
@msrathore-db msrathore-db changed the title Quote bind parameter names containing non-identifier characters Quote bind parameter names with backticks Apr 22, 2026
The previous revision used property descriptors (with a no-op setter)
to force bindtemplate / compilation_bindtemplate. That pattern has
zero precedent in SQLAlchemy's built-in dialects (none of MySQL,
PostgreSQL, SQLite, MSSQL, or Oracle override these templates), and
making the instance attribute un-settable is subtle enough to slow
down a future reader.

Swap to the conventional shape Oracle uses (cx_oracle.py:781):
override bindparam_string for the compile-time render path. For the
execute-time IN-clause expansion path — which bypasses
bindparam_string and reads self.bindtemplate directly from
_literal_execute_expanding_parameter — plain attribute assignment in
__init__ after super() is sufficient, because super() sets
self.bindtemplate near the end of its __init__ (line 1466 in
sqlalchemy 2.0.43) after compilation has already run with
compilation_bindtemplate.

Result: two well-understood extension points, no descriptors, same
end-to-end behavior verified in the comprehensive empirical suite
(29/29 passing against the live warehouse, including the IN-clause
post-compile expansion case that motivated the two-path coverage).

Co-authored-by: Isaac
The __init__ bindtemplate swap covered execute-time IN expansion but
missed two adjacent paths:

  1. compile_kwargs={'render_postcompile': True} — fires inside super's
     __init__, before a post-super subclass override can take effect.
  2. construct_expanded_state() called directly on a compiled stmt.

Both paths funnel through SQLCompiler._literal_execute_expanding_parameter,
which reads self.bindtemplate (or compilation_bindtemplate for numeric
paramstyles) once into a local variable and uses it to render every
expanded marker. Override that single method to swap both templates
to the backticked form for the duration of the super call, then
restore.

This removes the __init__ template swap entirely — the override on
_literal_execute_expanding_parameter is the single point that covers
all three expansion call sites (execute-time, render_postcompile=True
compile-time, construct_expanded_state).

Adds a regression test exercising both the render_postcompile=True
and construct_expanded_state paths.

Co-authored-by: Isaac
The previous revision had two method overrides — bindparam_string for
the compile-time path and _literal_execute_expanding_parameter for the
IN-expansion path — with the latter doing a try/finally state swap of
the template attributes. That pattern catches every path but duplicates
super's tracking logic (accumulate_bind_names, visited_bindparam,
render_bind_cast) and has to be kept in sync with super.

Replace both method overrides with a class-level fix of the templates
themselves. Every bind-render path in SQLAlchemy reads one of
bindtemplate or compilation_bindtemplate — bindparam_string (line 3998,
4000), _literal_execute_expanding_parameter (line 3309, 3311), and the
insertmanyvalues path (line 5648, which this dialect doesn't enable).
Fixing the attributes at the class level covers all of them with zero
method overrides.

Use property descriptors with no-op setters because
SQLCompiler.__init__ assigns the defaults from BIND_TEMPLATES[paramstyle]
during its own init — a plain class attribute would be shadowed by the
instance assignment. The no-op setter silently discards super's
assignment so our class-level value is always what gets read.

Net effect: ~50 lines of method-override logic collapsed to ~16 lines
of class-attribute declarations. Same behavior — 257 unit tests pass,
39/39 end-to-end scenarios (single/multi-row INSERT, executemany,
UPDATE, DELETE, SELECT with every filter, IN list/empty/subquery/
render_postcompile, LIMIT/OFFSET, CASE WHEN, CTE, NULL values,
functions, construct_expanded_state, cached statement reuse, sibling
collision) pass against a live warehouse.

Co-authored-by: Isaac
Same semantics — property with getter returning the fixed template and
a no-op setter so super's assignment is silently discarded. 13 lines
of @Property blocks reduced to 2 inline declarations, with a single
comment explaining why the no-op setter exists.

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant