Open
Conversation
Column names sourced from DataFrames frequently contain hyphens (e.g. `col-with-hyphen`). SQLAlchemy uses the column name as the default bind parameter name, and Databricks named-parameter markers (`:name`) only accept bare identifiers ([A-Za-z_][A-Za-z0-9_]*). The hyphen was being emitted verbatim, producing invalid SQL like `:col-with-hyphen` which the server rejects with UNBOUND_SQL_PARAMETER because it parses `-with-hyphen` as stray tokens. Override `DatabricksStatementCompiler.bindparam_string` to wrap non-bare-identifier names in backticks (`:`col-with-hyphen``), which the Spark/Databricks SQL grammar accepts as a quoted parameter identifier (`simpleIdentifier -> quotedIdentifier` in `SqlBaseParser.g4`). This mirrors Oracle's `:"name"` approach to the same problem. The backticks are quoting syntax only — the parameter's logical name is still the text between them, so the params dict sent to the driver keeps the original unquoted key. `escaped_bind_names` is intentionally left empty so `construct_params` passes keys through unchanged. This covers hyphens, spaces, dots, brackets, leading digits, and any other character outside [A-Za-z0-9_], with no risk of collisions between sibling columns like `col-name` and `col_name` (a concern with single-character escape-map approaches). Co-authored-by: Isaac
Following review feedback: the conditional check based on whether the name matches a bare-identifier pattern is unnecessary. The Spark/ Databricks SQL grammar accepts :`name` for every valid identifier (verified empirically against a live SQL warehouse), so wrapping unconditionally keeps the compiler simpler and removes a class of edge-case bugs that the condition could miss. Add a ``quote_bind_params`` flag on DatabricksDialect (default True) that can be turned off via the URL query parameter ``?quote_bind_params=false`` as an escape hatch if the quoting ever introduces an unexpected regression. When disabled, we fall through entirely to stock SQLAlchemy bind-name rendering. Co-authored-by: Isaac
The earlier bindparam_string override only intercepted the primary render path. It missed post-compile expansion used by IN clauses: SQLAlchemy's _literal_execute_expanding_parameter builds expanded markers (e.g. :col-name_1, :col-name_2) directly from self.bindtemplate, bypassing bindparam_string entirely. Comprehensive warehouse testing caught this — SELECT WHERE col IN (...) with a hyphenated column still failed with UNBOUND_SQL_PARAMETER. Switch to overriding bindtemplate and compilation_bindtemplate themselves so every render path (normal bindparam_string, post-compile expansion, render_bind_cast wrappers) gets backticked uniformly. Using property descriptors with a no-op setter forces our template to stick regardless of when super's __init__ assigns from BIND_TEMPLATES. Also drop the quote_bind_params / ?quote_bind_params=false opt-out flag — the dialect has no precedent for behavioral URL flags (only routing: http_path, catalog, schema), and we have strong empirical evidence the fix is safe on current platforms. Expand unit and integration coverage: hyphen, dot, bracket, colon, percent, slash, ?, #, +, *, @, $, &, |, <>, unicode (prénom, 姓名, Straße), reserved words, leading digits, long names, col-name + col_name collision, SELECT WHERE, UPDATE, DELETE, IN, multi-row INSERT, NULL values — all 29 verified end-to-end against a live warehouse. Co-authored-by: Isaac
The previous revision used property descriptors (with a no-op setter) to force bindtemplate / compilation_bindtemplate. That pattern has zero precedent in SQLAlchemy's built-in dialects (none of MySQL, PostgreSQL, SQLite, MSSQL, or Oracle override these templates), and making the instance attribute un-settable is subtle enough to slow down a future reader. Swap to the conventional shape Oracle uses (cx_oracle.py:781): override bindparam_string for the compile-time render path. For the execute-time IN-clause expansion path — which bypasses bindparam_string and reads self.bindtemplate directly from _literal_execute_expanding_parameter — plain attribute assignment in __init__ after super() is sufficient, because super() sets self.bindtemplate near the end of its __init__ (line 1466 in sqlalchemy 2.0.43) after compilation has already run with compilation_bindtemplate. Result: two well-understood extension points, no descriptors, same end-to-end behavior verified in the comprehensive empirical suite (29/29 passing against the live warehouse, including the IN-clause post-compile expansion case that motivated the two-path coverage). Co-authored-by: Isaac
The __init__ bindtemplate swap covered execute-time IN expansion but
missed two adjacent paths:
1. compile_kwargs={'render_postcompile': True} — fires inside super's
__init__, before a post-super subclass override can take effect.
2. construct_expanded_state() called directly on a compiled stmt.
Both paths funnel through SQLCompiler._literal_execute_expanding_parameter,
which reads self.bindtemplate (or compilation_bindtemplate for numeric
paramstyles) once into a local variable and uses it to render every
expanded marker. Override that single method to swap both templates
to the backticked form for the duration of the super call, then
restore.
This removes the __init__ template swap entirely — the override on
_literal_execute_expanding_parameter is the single point that covers
all three expansion call sites (execute-time, render_postcompile=True
compile-time, construct_expanded_state).
Adds a regression test exercising both the render_postcompile=True
and construct_expanded_state paths.
Co-authored-by: Isaac
The previous revision had two method overrides — bindparam_string for the compile-time path and _literal_execute_expanding_parameter for the IN-expansion path — with the latter doing a try/finally state swap of the template attributes. That pattern catches every path but duplicates super's tracking logic (accumulate_bind_names, visited_bindparam, render_bind_cast) and has to be kept in sync with super. Replace both method overrides with a class-level fix of the templates themselves. Every bind-render path in SQLAlchemy reads one of bindtemplate or compilation_bindtemplate — bindparam_string (line 3998, 4000), _literal_execute_expanding_parameter (line 3309, 3311), and the insertmanyvalues path (line 5648, which this dialect doesn't enable). Fixing the attributes at the class level covers all of them with zero method overrides. Use property descriptors with no-op setters because SQLCompiler.__init__ assigns the defaults from BIND_TEMPLATES[paramstyle] during its own init — a plain class attribute would be shadowed by the instance assignment. The no-op setter silently discards super's assignment so our class-level value is always what gets read. Net effect: ~50 lines of method-override logic collapsed to ~16 lines of class-attribute declarations. Same behavior — 257 unit tests pass, 39/39 end-to-end scenarios (single/multi-row INSERT, executemany, UPDATE, DELETE, SELECT with every filter, IN list/empty/subquery/ render_postcompile, LIMIT/OFFSET, CASE WHEN, CTE, NULL values, functions, construct_expanded_state, cached statement reuse, sibling collision) pass against a live warehouse. Co-authored-by: Isaac
Same semantics — property with getter returning the fixed template and a no-op setter so super's assignment is silently discarded. 13 lines of @Property blocks reduced to 2 inline declarations, with a single comment explaining why the no-op setter exists. Co-authored-by: Isaac
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bug. INSERTs (and SELECT/UPDATE/DELETE, and IN clauses) involving column names containing characters outside
[A-Za-z0-9_]— most commonly hyphens from DataFrame-origin schemas — fail withUNBOUND_SQL_PARAMETER. SQLAlchemy uses the column name as the default bind parameter name, so compiled SQL likeVALUES (:col-with-hyphen)has the Databricks parser split on-and reportunbound parameter: col. Same root cause as databricks-sql-python#368, filed againstdatabricks-sqlalchemy.Fix. Override
DatabricksStatementCompiler.bindtemplateandcompilation_bindtemplateso every bind marker is rendered wrapped in backticks (:`col-with-hyphen`). The Spark/Databricks SQL grammar (perSqlBaseParser.g4) accepts this as a quoted parameter identifier —simpleIdentifier → quotedIdentifier → BACKQUOTED_IDENTIFIER. Mirrors Oracle's:"name"approach to the same grammar constraint.The templates are fixed via property descriptors with a no-op setter, because SQLAlchemy's
SQLCompiler.__init__assigns them fromBIND_TEMPLATES[paramstyle]inline with statement compilation — any subclass override in__init__runs too late, and a class-level attribute gets shadowed. A descriptor intercepts both the read (forcing our value) and the write (no-op), so the template is fixed regardless of ordering.Why template-based, not
bindparam_string-based. An earlier revision of this PR overrodebindparam_stringdirectly. That missed the post-compile expansion path:_literal_execute_expanding_parameterbuilds expanded markers for IN clauses (:col-name_1,:col-name_2, …) directly fromself.bindtemplate, bypassingbindparam_stringentirely. Changing the template covers every render path uniformly.Why backticks (not an escape map). First draft added
bindname_escape_characters["-"] = "_". That silently corrupts data for a table with bothcol-nameandcol_name(both collapse to:col_name, one value overwrites the other in the params dict). A multi-char token like_x2d_dodges the collision but is noisy and can theoretically still collide. Backticks sidestep the whole issue: the parameter's logical name is identical to the column's, so no mapping is needed and sibling columns keep distinct markers.Why no opt-out flag. An earlier revision added
?quote_bind_params=falseas a kill switch. Dropped because: (a) the dialect has no precedent for behavioral URL flags (only routing:http_path,catalog,schema), and (b) empirical evidence across a wide matrix is strong — see test plan below.Test plan
Unit tests (
tests/test_local/test_ddl.py::TestBindParamQuoting, 11 cases):col-name+col_namesiblings → distinct markers, both values round-tripid,name) → also backticked (unconditional).,[,],:,%) still work via pre-translation + backtick wrapprénom,姓名,Straße)select,from) as column names/,?,#,+,*,@,$,&,|,<>) render correctlyLocal suite:
poetry run pytest tests/test_local/ --ignore=tests/test_local/e2e -q→ 256 passed.End-to-end against a live SQL warehouse — every case creates a temp table, INSERTs via SQLAlchemy, reads back, asserts exact values, drops the table. 29/29 passing:
?,#,+,*,@,$,&,|,<>,:, leading digit, 3 Unicode scripts, 2 reserved words, long name)col-name+col_namesiblingsbindparam_string-only revision broke)Chars Delta rejects at CREATE TABLE time (space, parens, comma,
=, backtick) are intentionally excluded — they can never land in a real table and thus can't reach the bind-name path.Co-authored-by: Isaac