Quote bind parameter names with backticks by msrathore-db · Pull Request #60 · databricks/databricks-sqlalchemy

msrathore-db · 2026-04-21T18:40:10Z

Summary

Bug. INSERTs (and SELECT/UPDATE/DELETE, and IN clauses) involving column names containing characters outside [A-Za-z0-9_] — most commonly hyphens from DataFrame-origin schemas — fail with UNBOUND_SQL_PARAMETER. SQLAlchemy uses the column name as the default bind parameter name, so compiled SQL like VALUES (:col-with-hyphen) has the Databricks parser split on - and report unbound parameter: col. Same root cause as databricks-sql-python#368, filed against databricks-sqlalchemy.

Fix. Override DatabricksStatementCompiler.bindtemplate and compilation_bindtemplate so every bind marker is rendered wrapped in backticks (:`col-with-hyphen`). The Spark/Databricks SQL grammar (per SqlBaseParser.g4) accepts this as a quoted parameter identifier — simpleIdentifier → quotedIdentifier → BACKQUOTED_IDENTIFIER. Mirrors Oracle's :"name" approach to the same grammar constraint.

The templates are fixed via property descriptors with a no-op setter, because SQLAlchemy's SQLCompiler.__init__ assigns them from BIND_TEMPLATES[paramstyle] inline with statement compilation — any subclass override in __init__ runs too late, and a class-level attribute gets shadowed. A descriptor intercepts both the read (forcing our value) and the write (no-op), so the template is fixed regardless of ordering.

Why template-based, not bindparam_string-based. An earlier revision of this PR overrode bindparam_string directly. That missed the post-compile expansion path: _literal_execute_expanding_parameter builds expanded markers for IN clauses (:col-name_1, :col-name_2, …) directly from self.bindtemplate, bypassing bindparam_string entirely. Changing the template covers every render path uniformly.

Why backticks (not an escape map). First draft added bindname_escape_characters["-"] = "_". That silently corrupts data for a table with both col-name and col_name (both collapse to :col_name, one value overwrites the other in the params dict). A multi-char token like _x2d_ dodges the collision but is noisy and can theoretically still collide. Backticks sidestep the whole issue: the parameter's logical name is identical to the column's, so no mapping is needed and sibling columns keep distinct markers.

Why no opt-out flag. An earlier revision added ?quote_bind_params=false as a kill switch. Dropped because: (a) the dialect has no precedent for behavioral URL flags (only routing: http_path, catalog, schema), and (b) empirical evidence across a wide matrix is strong — see test plan below.

Test plan

Unit tests (tests/test_local/test_ddl.py::TestBindParamQuoting, 11 cases):

Local suite: poetry run pytest tests/test_local/ --ignore=tests/test_local/e2e -q → 256 passed.

End-to-end against a live SQL warehouse — every case creates a temp table, INSERTs via SQLAlchemy, reads back, asserts exact values, drops the table. 29/29 passing:

22 column-name variants (hyphen, dot, bracket, percent, slash, ?, #, +, *, @, $, &, |, <>, :, leading digit, 3 Unicode scripts, 2 reserved words, long name)
Collision case: col-name + col_name siblings
SELECT WHERE on hyphenated column
UPDATE with hyphenated column in SET + WHERE
DELETE with hyphenated column in WHERE
IN clause — post-compile expansion path (this is the case the earlier bindparam_string-only revision broke)
Multi-row INSERT
NULL value bound to hyphenated column

Chars Delta rejects at CREATE TABLE time (space, parens, comma, =, backtick) are intentionally excluded — they can never land in a real table and thus can't reach the bind-name path.

Co-authored-by: Isaac

Column names sourced from DataFrames frequently contain hyphens (e.g. `col-with-hyphen`). SQLAlchemy uses the column name as the default bind parameter name, and Databricks named-parameter markers (`:name`) only accept bare identifiers ([A-Za-z_][A-Za-z0-9_]*). The hyphen was being emitted verbatim, producing invalid SQL like `:col-with-hyphen` which the server rejects with UNBOUND_SQL_PARAMETER because it parses `-with-hyphen` as stray tokens. Override `DatabricksStatementCompiler.bindparam_string` to wrap non-bare-identifier names in backticks (`:`col-with-hyphen``), which the Spark/Databricks SQL grammar accepts as a quoted parameter identifier (`simpleIdentifier -> quotedIdentifier` in `SqlBaseParser.g4`). This mirrors Oracle's `:"name"` approach to the same problem. The backticks are quoting syntax only — the parameter's logical name is still the text between them, so the params dict sent to the driver keeps the original unquoted key. `escaped_bind_names` is intentionally left empty so `construct_params` passes keys through unchanged. This covers hyphens, spaces, dots, brackets, leading digits, and any other character outside [A-Za-z0-9_], with no risk of collisions between sibling columns like `col-name` and `col_name` (a concern with single-character escape-map approaches). Co-authored-by: Isaac

Following review feedback: the conditional check based on whether the name matches a bare-identifier pattern is unnecessary. The Spark/ Databricks SQL grammar accepts :`name` for every valid identifier (verified empirically against a live SQL warehouse), so wrapping unconditionally keeps the compiler simpler and removes a class of edge-case bugs that the condition could miss. Add a ``quote_bind_params`` flag on DatabricksDialect (default True) that can be turned off via the URL query parameter ``?quote_bind_params=false`` as an escape hatch if the quoting ever introduces an unexpected regression. When disabled, we fall through entirely to stock SQLAlchemy bind-name rendering. Co-authored-by: Isaac

The earlier bindparam_string override only intercepted the primary render path. It missed post-compile expansion used by IN clauses: SQLAlchemy's _literal_execute_expanding_parameter builds expanded markers (e.g. :col-name_1, :col-name_2) directly from self.bindtemplate, bypassing bindparam_string entirely. Comprehensive warehouse testing caught this — SELECT WHERE col IN (...) with a hyphenated column still failed with UNBOUND_SQL_PARAMETER. Switch to overriding bindtemplate and compilation_bindtemplate themselves so every render path (normal bindparam_string, post-compile expansion, render_bind_cast wrappers) gets backticked uniformly. Using property descriptors with a no-op setter forces our template to stick regardless of when super's __init__ assigns from BIND_TEMPLATES. Also drop the quote_bind_params / ?quote_bind_params=false opt-out flag — the dialect has no precedent for behavioral URL flags (only routing: http_path, catalog, schema), and we have strong empirical evidence the fix is safe on current platforms. Expand unit and integration coverage: hyphen, dot, bracket, colon, percent, slash, ?, #, +, *, @, $, &, |, <>, unicode (prénom, 姓名, Straße), reserved words, leading digits, long names, col-name + col_name collision, SELECT WHERE, UPDATE, DELETE, IN, multi-row INSERT, NULL values — all 29 verified end-to-end against a live warehouse. Co-authored-by: Isaac

The previous revision used property descriptors (with a no-op setter) to force bindtemplate / compilation_bindtemplate. That pattern has zero precedent in SQLAlchemy's built-in dialects (none of MySQL, PostgreSQL, SQLite, MSSQL, or Oracle override these templates), and making the instance attribute un-settable is subtle enough to slow down a future reader. Swap to the conventional shape Oracle uses (cx_oracle.py:781): override bindparam_string for the compile-time render path. For the execute-time IN-clause expansion path — which bypasses bindparam_string and reads self.bindtemplate directly from _literal_execute_expanding_parameter — plain attribute assignment in __init__ after super() is sufficient, because super() sets self.bindtemplate near the end of its __init__ (line 1466 in sqlalchemy 2.0.43) after compilation has already run with compilation_bindtemplate. Result: two well-understood extension points, no descriptors, same end-to-end behavior verified in the comprehensive empirical suite (29/29 passing against the live warehouse, including the IN-clause post-compile expansion case that motivated the two-path coverage). Co-authored-by: Isaac

The __init__ bindtemplate swap covered execute-time IN expansion but missed two adjacent paths: 1. compile_kwargs={'render_postcompile': True} — fires inside super's __init__, before a post-super subclass override can take effect. 2. construct_expanded_state() called directly on a compiled stmt. Both paths funnel through SQLCompiler._literal_execute_expanding_parameter, which reads self.bindtemplate (or compilation_bindtemplate for numeric paramstyles) once into a local variable and uses it to render every expanded marker. Override that single method to swap both templates to the backticked form for the duration of the super call, then restore. This removes the __init__ template swap entirely — the override on _literal_execute_expanding_parameter is the single point that covers all three expansion call sites (execute-time, render_postcompile=True compile-time, construct_expanded_state). Adds a regression test exercising both the render_postcompile=True and construct_expanded_state paths. Co-authored-by: Isaac

The previous revision had two method overrides — bindparam_string for the compile-time path and _literal_execute_expanding_parameter for the IN-expansion path — with the latter doing a try/finally state swap of the template attributes. That pattern catches every path but duplicates super's tracking logic (accumulate_bind_names, visited_bindparam, render_bind_cast) and has to be kept in sync with super. Replace both method overrides with a class-level fix of the templates themselves. Every bind-render path in SQLAlchemy reads one of bindtemplate or compilation_bindtemplate — bindparam_string (line 3998, 4000), _literal_execute_expanding_parameter (line 3309, 3311), and the insertmanyvalues path (line 5648, which this dialect doesn't enable). Fixing the attributes at the class level covers all of them with zero method overrides. Use property descriptors with no-op setters because SQLCompiler.__init__ assigns the defaults from BIND_TEMPLATES[paramstyle] during its own init — a plain class attribute would be shadowed by the instance assignment. The no-op setter silently discards super's assignment so our class-level value is always what gets read. Net effect: ~50 lines of method-override logic collapsed to ~16 lines of class-attribute declarations. Same behavior — 257 unit tests pass, 39/39 end-to-end scenarios (single/multi-row INSERT, executemany, UPDATE, DELETE, SELECT with every filter, IN list/empty/subquery/ render_postcompile, LIMIT/OFFSET, CASE WHEN, CTE, NULL values, functions, construct_expanded_state, cached statement reuse, sibling collision) pass against a live warehouse. Co-authored-by: Isaac

Same semantics — property with getter returning the fixed template and a no-op setter so super's assignment is silently discarded. 13 lines of @Property blocks reduced to 2 inline declarations, with a single comment explaining why the no-op setter exists. Co-authored-by: Isaac

msrathore-db requested review from deeksha-db, gopalldb, jackyhu-db, jayantsing-db, jprakash-db, madhav-db, samikshya-db, shivam2680 and vikrantpuppala as code owners April 21, 2026 18:40

msrathore-db added 2 commits April 22, 2026 12:13

msrathore-db changed the title ~~Quote bind parameter names containing non-identifier characters~~ Quote bind parameter names with backticks Apr 22, 2026

msrathore-db added 4 commits April 22, 2026 14:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quote bind parameter names with backticks#60

Quote bind parameter names with backticks#60
msrathore-db wants to merge 7 commits intomainfrom
fix-hyphenated-column-bind-params

msrathore-db commented Apr 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

msrathore-db commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

msrathore-db commented Apr 21, 2026 •

edited

Loading