Skip to content

Commit e3d1019

Browse files
fix(kernel): don't build connector OAuth provider on use_kernel path + add TLS e2e
Live mitmproxy TLS e2e (this commit) surfaced a real bug: on use_kernel=True the connector still built its own auth provider in Session.__init__ via get_python_sql_connector_auth_provider, BEFORE use_kernel was consulted. For OAuth that constructor eagerly runs the flow (DatabricksOAuthProvider calls _initial_get_token in __init__), so passing oauth_client_id for kernel-managed M2M fired the U2M *browser* flow at connect() time — opening a browser and using the M2M service principal id as a U2M OAuth app client_id (which the account rejects). Fix: on use_kernel=True, do NOT build the connector's provider. Hand the kernel auth bridge a minimal AccessTokenAuthProvider when an access_token is present, else None; OAuth M2M/U2M resolve purely from the raw connect() kwargs the bridge already reads, and the kernel owns the entire token lifecycle. Thrift/SEA backends are unchanged. - session.py: branch the provider build on use_kernel; import AccessTokenAuthProvider; update the kernel-branch comment. - auth_bridge.py: kernel_auth_kwargs / _is_pat / _extract_bearer_token accept Optional[AuthProvider] (None when no token on the kernel path); clearer "no credentials" error; refreshed module docstring. - tests/unit/test_session.py: regression guards — use_kernel must NOT call get_python_sql_connector_auth_provider, forwards raw M2M creds via auth_options, and uses a minimal AccessTokenAuthProvider for PAT. TLS e2e (the connector-side counterpart to the kernel's v0_tls_e2e.rs): - tests/e2e/test_kernel_tls.py: 3 mitmproxy-backed tests driving sql.connect(use_kernel=True, _tls_*=...) — strict default rejects the re-signed cert, _tls_trusted_ca_file trusts it, _tls_no_verify bypasses. Skips cleanly without the kernel wheel / creds / MITMPROXY_CA_CERT. Verified live against the dogfood workspace on the OAuth M2M path (all 3 green) — this is what caught the bug above. - kernel-e2e.yml: run test_kernel_tls.py in the existing (label/merge- queue-gated) run-kernel-e2e job behind a mitmproxy service. The kernel wheel is built before the proxy is in scope so cargo's JFrog fetch isn't routed through it; HTTPS_PROXY is scoped to the TLS test step. Path filter now also triggers on session.py and test_kernel_tls.py. 184 unit tests pass; black + mypy clean. Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
1 parent eed76ff commit e3d1019

5 files changed

Lines changed: 319 additions & 29 deletions

File tree

.github/workflows/kernel-e2e.yml

Lines changed: 54 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -178,7 +178,7 @@ jobs:
178178
echo "$CHANGED"
179179
# Run when the connector kernel backend, kernel e2e tests,
180180
# this workflow, the kernel revision pin, or core deps move.
181-
if echo "$CHANGED" | grep -qE "^(src/databricks/sql/backend/kernel/|tests/e2e/test_kernel_backend\.py|tests/unit/test_kernel_|\.github/workflows/kernel-e2e\.yml|KERNEL_REV|pyproject\.toml|poetry\.lock)"; then
181+
if echo "$CHANGED" | grep -qE "^(src/databricks/sql/backend/kernel/|src/databricks/sql/session\.py|tests/e2e/test_kernel_backend\.py|tests/e2e/test_kernel_tls\.py|tests/unit/test_kernel_|\.github/workflows/kernel-e2e\.yml|KERNEL_REV|pyproject\.toml|poetry\.lock)"; then
182182
echo "run_tests=true" >> "$GITHUB_OUTPUT"
183183
else
184184
echo "run_tests=false" >> "$GITHUB_OUTPUT"
@@ -319,6 +319,58 @@ jobs:
319319
- name: Run kernel e2e tests
320320
run: poetry run pytest tests/e2e/test_kernel_backend.py -v
321321

322+
# ── TLS e2e through an intercepting proxy ───────────────────────
323+
# mitmproxy re-signs every TLS connection with its own CA. The
324+
# connector tests (tests/e2e/test_kernel_tls.py) prove the whole
325+
# stack — sql.connect(use_kernel=True, _tls_*=...) → SSLOptions →
326+
# kernel handshake — trusts the CA / accepts no-verify / rejects
327+
# by default. The kernel wheel is already built above, BEFORE any
328+
# proxy is in scope, so cargo's JFrog registry fetch was never
329+
# routed through (and broken by) mitmproxy.
330+
- name: Start mitmproxy
331+
run: |
332+
docker run -d --name mitmproxy \
333+
-p 8080:8080 \
334+
mitmproxy/mitmproxy:12.2.1@sha256:743b6cdc817211d64bc269f5defacca8d14e76e647fc474e5c7244dbcb645141 \
335+
mitmdump --set stream_large_bodies=0
336+
for i in $(seq 1 15); do
337+
if docker exec mitmproxy test -f /home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem 2>/dev/null; then
338+
echo "mitmproxy CA cert is ready"
339+
break
340+
fi
341+
echo "Waiting for mitmproxy CA cert... ($i/15)"
342+
sleep 1
343+
done
344+
345+
- name: Extract mitmproxy CA certificate
346+
run: |
347+
docker cp mitmproxy:/home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem /tmp/mitmproxy-ca-cert.pem
348+
echo "MITMPROXY_CA_CERT=/tmp/mitmproxy-ca-cert.pem" >> "$GITHUB_ENV"
349+
echo "Extracted mitmproxy CA cert:"
350+
openssl x509 -in /tmp/mitmproxy-ca-cert.pem -noout -subject -issuer
351+
352+
- name: Run kernel TLS e2e tests
353+
# HTTPS_PROXY is scoped to THIS step so only the test's own HTTP
354+
# traffic flows through mitmproxy — not cargo / pip / anything
355+
# the earlier build steps needed from the JFrog registry.
356+
env:
357+
HTTPS_PROXY: http://localhost:8080
358+
run: poetry run pytest tests/e2e/test_kernel_tls.py -v
359+
360+
- name: Verify traffic flowed through mitmproxy
361+
if: always()
362+
run: |
363+
echo "=== mitmproxy logs (tail) ==="
364+
docker logs mitmproxy 2>&1 | tail -50
365+
echo "=== checking for proxied-connection entries ==="
366+
docker logs mitmproxy 2>&1 \
367+
| grep -i "CONNECT\|clientconnect\|<<" \
368+
|| (echo "FAIL: no traffic in mitmproxy logs — proxy was not used" && exit 1)
369+
370+
- name: Cleanup mitmproxy
371+
if: always()
372+
run: docker rm -f mitmproxy 2>/dev/null || true
373+
322374
# Post a Kernel E2E check on both the labeled-PR and merge-queue
323375
# paths so the named check on the PR reflects the latest real
324376
# run (overwriting the synthetic-success check that
@@ -341,7 +393,7 @@ jobs:
341393
completed_at: new Date().toISOString(),
342394
output: {
343395
title: 'Kernel E2E passed',
344-
summary: 'tests/e2e/test_kernel_backend.py ran green against the pinned kernel SHA.'
396+
summary: 'test_kernel_backend.py and the mitmproxy-backed test_kernel_tls.py ran green against the pinned kernel SHA.'
345397
}
346398
});
347399

src/databricks/sql/backend/kernel/auth_bridge.py

Lines changed: 25 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -30,9 +30,12 @@
3030
with a clear message rather than deep inside the kernel.
3131
3232
The M2M / U2M decisions are driven by the *raw* connect() kwargs
33-
(``auth_options``), not the built ``AuthProvider`` — the secret is
34-
consumed and discarded during provider construction, so it can only be
35-
read from the original kwargs.
33+
(``auth_options``), not a built ``AuthProvider``. On the kernel path
34+
the connector deliberately does **not** build its own OAuth provider
35+
(that would eagerly run the U2M browser flow / M2M token exchange at
36+
connect() time, before the kernel is consulted), so ``auth_provider``
37+
is either a minimal PAT provider or ``None`` and the OAuth credentials
38+
are available only from the raw kwargs.
3639
"""
3740

3841
from __future__ import annotations
@@ -65,7 +68,7 @@
6568
_TOKEN_REJECT_RE = re.compile(r"[\x00-\x20\x7f]")
6669

6770

68-
def _is_pat(auth_provider: AuthProvider) -> bool:
71+
def _is_pat(auth_provider: Optional[AuthProvider]) -> bool:
6972
"""Return True iff this provider ultimately wraps an
7073
``AccessTokenAuthProvider``.
7174
@@ -84,18 +87,20 @@ def _is_pat(auth_provider: AuthProvider) -> bool:
8487
return False
8588

8689

87-
def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
90+
def _extract_bearer_token(auth_provider: Optional[AuthProvider]) -> Optional[str]:
8891
"""Pull the current bearer token out of an ``AuthProvider``.
8992
9093
The connector's ``AuthProvider.add_headers`` mutates a header
9194
dict and writes the ``Authorization: Bearer <token>`` value.
9295
Going through that public surface keeps us insulated from
9396
provider-specific internals.
9497
95-
Returns ``None`` if the provider did not write an Authorization
96-
header or wrote a non-Bearer scheme — neither is representable
97-
in the kernel's PAT auth surface.
98+
Returns ``None`` if there is no provider, the provider did not
99+
write an Authorization header, or it wrote a non-Bearer scheme —
100+
none of which is representable in the kernel's PAT auth surface.
98101
"""
102+
if auth_provider is None:
103+
return None
99104
headers: Dict[str, str] = {}
100105
auth_provider.add_headers(headers)
101106
auth = headers.get("Authorization")
@@ -113,7 +118,7 @@ def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
113118

114119

115120
def kernel_auth_kwargs(
116-
auth_provider: AuthProvider,
121+
auth_provider: Optional[AuthProvider],
117122
auth_options: Optional[Dict[str, Any]] = None,
118123
) -> Dict[str, Any]:
119124
"""Build the kwargs passed to ``databricks_sql_kernel.Session(...)``.
@@ -225,13 +230,18 @@ def kernel_auth_kwargs(
225230
"credentials_provider."
226231
)
227232

228-
# 5. Everything else.
233+
# 5. Everything else (including no usable credentials at all —
234+
# ``auth_provider`` is None on the kernel path when no access
235+
# token was supplied and no OAuth kwargs resolved above).
236+
provider_desc = (
237+
type(auth_provider).__name__ if auth_provider is not None else "no credentials"
238+
)
229239
raise NotSupportedError(
230-
f"use_kernel=True supports PAT, OAuth M2M (oauth_client_id + "
231-
f"oauth_client_secret), and OAuth U2M (auth_type='databricks-oauth' / "
232-
f"'azure-oauth'), but got {type(auth_provider).__name__} with "
233-
f"auth_type={auth_type!r}. Use the Thrift backend (default) for other "
234-
"auth flows."
240+
f"use_kernel=True requires PAT (access_token), OAuth M2M "
241+
f"(oauth_client_id + oauth_client_secret), or OAuth U2M "
242+
f"(auth_type='databricks-oauth' / 'azure-oauth'), but got "
243+
f"{provider_desc} with auth_type={auth_type!r}. Use the Thrift "
244+
"backend (default) for other auth flows."
235245
)
236246

237247

src/databricks/sql/session.py

Lines changed: 30 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
from databricks.sql.thrift_api.TCLIService import ttypes
66
from databricks.sql.types import SSLOptions
77
from databricks.sql.auth.auth import get_python_sql_connector_auth_provider
8+
from databricks.sql.auth.authenticators import AccessTokenAuthProvider
89
from databricks.sql.auth.common import ClientContext
910
from databricks.sql.exc import SessionAlreadyClosedError, DatabaseError, RequestError
1011
from databricks.sql import __version__
@@ -96,10 +97,30 @@ def __init__(
9697
# Use the provided HTTP client (created in Connection)
9798
self.http_client = http_client
9899

99-
# Create auth provider with HTTP client context
100-
self.auth_provider = get_python_sql_connector_auth_provider(
101-
server_hostname, http_client=self.http_client, **kwargs
102-
)
100+
# Create auth provider with HTTP client context.
101+
#
102+
# On the kernel path the kernel owns the entire auth lifecycle
103+
# (it acquires/refreshes OAuth tokens itself from the raw
104+
# credentials — see the kernel auth bridge). We must NOT build
105+
# the connector's own provider here: for OAuth it would eagerly
106+
# run the U2M browser flow / M2M token exchange at connect()
107+
# time (``get_auth_provider`` invokes ``_initial_get_token`` in
108+
# the provider constructor), racing — and conflicting with — the
109+
# kernel's auth before ``use_kernel`` is even consulted.
110+
#
111+
# So for ``use_kernel`` we hand the bridge only a minimal PAT
112+
# provider when an ``access_token`` is present, and ``None``
113+
# otherwise (OAuth M2M/U2M resolve purely from the raw kwargs
114+
# the bridge reads). The Thrift / SEA backends are unchanged.
115+
if kwargs.get("use_kernel", False):
116+
access_token = kwargs.get("access_token")
117+
self.auth_provider = (
118+
AccessTokenAuthProvider(access_token) if access_token else None
119+
)
120+
else:
121+
self.auth_provider = get_python_sql_connector_auth_provider(
122+
server_hostname, http_client=self.http_client, **kwargs
123+
)
103124

104125
self.backend = self._create_backend(
105126
server_hostname,
@@ -140,11 +161,11 @@ def _create_backend(
140161
logger.debug("Creating kernel-backed client for use_kernel=True")
141162
# Forward the raw auth-relevant connect() kwargs so the
142163
# kernel auth bridge can build OAuth kwargs from the
143-
# original credentials. The OAuth client secret is consumed
144-
# and discarded during ``auth_provider`` construction, so it
145-
# can only be read from these raw kwargs — not the built
146-
# provider. These are kernel-only; the Thrift / SEA backends
147-
# are unaffected.
164+
# original credentials. On this path we intentionally did
165+
# NOT build the connector's own OAuth provider (see __init__
166+
# above), so these raw kwargs are the only source of the
167+
# OAuth client id/secret. These are kernel-only; the Thrift
168+
# / SEA backends are unaffected.
148169
kernel_auth_options = {
149170
"auth_type": kwargs.get("auth_type"),
150171
"oauth_client_id": kwargs.get("oauth_client_id"),

tests/e2e/test_kernel_tls.py

Lines changed: 140 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,140 @@
1+
"""E2E TLS tests for ``use_kernel=True``, exercised through an
2+
intercepting HTTPS proxy (mitmproxy) against a live Databricks
3+
workspace.
4+
5+
This is the connector-level counterpart to the kernel repo's
6+
``v0_tls_e2e.rs``. The kernel test proves the raw TLS handshake against
7+
the kernel's ``TlsConfig`` directly; this one proves the *whole stack*:
8+
``sql.connect(use_kernel=True, _tls_*=...)`` → ``SSLOptions`` →
9+
``_kernel_tls_kwargs`` → pyo3 → kernel handshake.
10+
11+
mitmproxy sits in front of the workspace and re-signs every TLS
12+
connection with its own CA. A client trusting only the system roots
13+
sees an untrusted cert and fails — unless we add mitmproxy's CA via
14+
``_tls_trusted_ca_file`` or disable validation via ``_tls_no_verify``.
15+
The three tests cover exactly those outcomes (reject / CA-trusted /
16+
no-verify).
17+
18+
Traffic is routed through mitmproxy via the ``HTTPS_PROXY`` env var
19+
(reqwest honours it by default), since the connector's proxy surface
20+
isn't yet wired to the kernel. The CI job sets
21+
``HTTPS_PROXY=http://localhost:8080``.
22+
23+
Skipped automatically when the kernel wheel, the live creds, or
24+
``MITMPROXY_CA_CERT`` are absent — so it's a no-op in the default test
25+
run and only fires in the ``kernel-e2e`` TLS CI job (or locally with
26+
mitmproxy up; see that workflow / the kernel repo's ``v0_tls_e2e.rs``
27+
header for the docker incantation).
28+
"""
29+
30+
from __future__ import annotations
31+
32+
import os
33+
34+
import pytest
35+
36+
import databricks.sql as sql
37+
from databricks.sql.exc import Error as DatabricksSqlError
38+
39+
# Same real-wheel guard as test_kernel_backend.py: a fake
40+
# ``databricks_sql_kernel`` ModuleType injected by the unit tests has no
41+
# ``__file__``; only a compiled wheel does.
42+
_kernel_mod = pytest.importorskip(
43+
"databricks_sql_kernel",
44+
reason="use_kernel=True requires the databricks-sql-kernel package",
45+
)
46+
if not getattr(_kernel_mod, "__file__", None):
47+
pytest.skip(
48+
"databricks_sql_kernel is a test stub (no __file__); "
49+
"install the real wheel to run kernel TLS e2e tests",
50+
allow_module_level=True,
51+
)
52+
53+
_MITM_CA = os.getenv("MITMPROXY_CA_CERT")
54+
if not _MITM_CA:
55+
pytest.skip(
56+
"MITMPROXY_CA_CERT not set — TLS e2e runs only behind the "
57+
"mitmproxy CI job (or a local mitmproxy)",
58+
allow_module_level=True,
59+
)
60+
61+
62+
def _env(key):
63+
v = os.getenv(key)
64+
return v if v else None
65+
66+
67+
@pytest.fixture(scope="module")
68+
def base_params(connection_details):
69+
"""Live workspace connection params for use_kernel=True. Prefers
70+
OAuth M2M (so the TLS suite doubles as M2M-through-proxy coverage)
71+
and falls back to PAT."""
72+
host = connection_details.get("host")
73+
http_path = connection_details.get("http_path")
74+
if not (host and http_path):
75+
pytest.skip("DATABRICKS_SERVER_HOSTNAME / DATABRICKS_HTTP_PATH not set")
76+
77+
params = {
78+
"server_hostname": host,
79+
"http_path": http_path,
80+
"use_kernel": True,
81+
}
82+
83+
client_id = _env("DATABRICKS_TEST_CLIENT_ID")
84+
client_secret = _env("DATABRICKS_TEST_CLIENT_SECRET")
85+
token = connection_details.get("access_token")
86+
if client_id and client_secret:
87+
params["oauth_client_id"] = client_id
88+
params["oauth_client_secret"] = client_secret
89+
elif token:
90+
params["access_token"] = token
91+
else:
92+
pytest.skip(
93+
"need DATABRICKS_TEST_CLIENT_ID/SECRET (OAuth M2M) or "
94+
"DATABRICKS_TOKEN (PAT)"
95+
)
96+
return params
97+
98+
99+
def _select_one(conn):
100+
with conn.cursor() as cur:
101+
cur.execute("SELECT 1 AS n")
102+
rows = cur.fetchall()
103+
assert len(rows) == 1 and rows[0][0] == 1
104+
105+
106+
def test_tls_fails_without_config_behind_intercepting_proxy(base_params):
107+
"""Strict default (no CA, verify on) must reject mitmproxy's
108+
re-signed certificate. Guards that the positive tests below are
109+
actually validating the chain rather than bypassing the proxy."""
110+
with pytest.raises(DatabricksSqlError):
111+
conn = sql.connect(**base_params)
112+
try:
113+
# Some auth flows defer the first network call until a query.
114+
_select_one(conn)
115+
finally:
116+
try:
117+
conn.close()
118+
except Exception:
119+
pass
120+
121+
122+
def test_tls_with_trusted_custom_ca_succeeds(base_params):
123+
"""Adding mitmproxy's CA via ``_tls_trusted_ca_file`` makes the
124+
re-signed cert trusted, so the full round-trip succeeds. Proves
125+
``_tls_trusted_ca_file`` is wired through to the kernel handshake."""
126+
conn = sql.connect(_tls_trusted_ca_file=_MITM_CA, **base_params)
127+
try:
128+
_select_one(conn)
129+
finally:
130+
conn.close()
131+
132+
133+
def test_tls_with_no_verify_succeeds(base_params):
134+
"""``_tls_no_verify=True`` disables chain validation, so the
135+
round-trip succeeds without supplying the CA."""
136+
conn = sql.connect(_tls_no_verify=True, **base_params)
137+
try:
138+
_select_one(conn)
139+
finally:
140+
conn.close()

0 commit comments

Comments
 (0)