This repository was archived by the owner on Mar 31, 2026. It is now read-only.
Test#1
Open
evb123 wants to merge 294 commits into
Open
Conversation
Signed-off-by: Jim Fulton <jim.fulton@unsupervised.com> Co-Authored-By: Jesse Whitehouse <jesse.whitehouse@databricks.com> Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
* Initial commit Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Added tsparkparam handling Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Added basic test Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Addressed comments Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Addressed missed comments Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Resolved comments --------- Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com>
…es (#228) Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
…#229) * Updated thrift files and added check for protocol version Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Made error message more clear Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Changed name of fn Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Ran linter Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Update src/databricks/sql/client.py Co-authored-by: Jesse <jwhitehouse@airpost.net> --------- Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> Co-authored-by: Jesse <jwhitehouse@airpost.net>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
…233) Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
…ng behaviour (#236) Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
…to follow (#244) Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
* Stop skipping TableDDLTest and permanent skip HasIndexTest We're now in the territory of features that aren't required for sqla2 compat as of pysql==3.0.0 but we may consider adding this in the future. In this case, table comment reflection needs to be manually implemented. Index reflection would require hooking into the compiler to reflect the partition strategy. test_suite.py::HasIndexTest_databricks+databricks::test_has_index[dialect] SKIPPED (Databricks does not support indexes.) test_suite.py::HasIndexTest_databricks+databricks::test_has_index[inspector] SKIPPED (Databricks does not support indexes.) test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[dialect] SKIPPED (Databricks does not support indexes.) test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[inspector] SKIPPED (Databricks does not support indexes.) test_suite.py::TableDDLTest_databricks+databricks::test_add_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.) test_suite.py::TableDDLTest_databricks+databricks::test_create_index_if_not_exists SKIPPED (Databricks does not support indexes.) test_suite.py::TableDDLTest_databricks+databricks::test_create_table PASSED test_suite.py::TableDDLTest_databricks+databricks::test_create_table_if_not_exists PASSED test_suite.py::TableDDLTest_databricks+databricks::test_create_table_schema PASSED test_suite.py::TableDDLTest_databricks+databricks::test_drop_index_if_exists SKIPPED (Databricks does not support indexes.) test_suite.py::TableDDLTest_databricks+databricks::test_drop_table PASSED test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.) test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_if_exists PASSED test_suite.py::TableDDLTest_databricks+databricks::test_underscore_names PASSED Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Permanently skip QuotedNameArgumentTest with comments The fixes to DESCRIBE TABLE and visit_xxx were necessary to get to the point where I could even determine that these tests wouldn't pass. But those changes are not currently tested in the dialect. If, in the course of reviewing the remaining tests in the compliance suite, I find that these visit_xxxx methods are not tested anywhere else then we should extend test_suite.py with our own tests to confirm the behaviour for ourselves. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Move files from base.py to _ddl.py The presence of this pytest.ini file is _required_ to establish pytest's root_path https://docs.pytest.org/en/7.1.x/reference/customize.html#finding-the-rootdir Without it, the custom pytest plugin from SQLAlchemy can't read the contents of setup.cfg which makes none of the tests runnable. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Emit a warning for certain constructs Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Stop skipping RowFetchTest Date type work fixed this test failure Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Revise infer_types logic to never infer a TINYINT This allows these SQLAlchemy tests to pass: test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit_offset PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_expr_limit_simple_offset PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_expr_offset PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases0] PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases1] PASSED test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases2] PASSED This partially reverts the change introduced in #246 Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Stop skipping FetchLimitOffsetTest I implemented our custom DatabricksStatementCompiler so we can override the default rendering of unbounded LIMIT clauses from `LIMIT -1` to `LIMIT ALL` We also explicitly skip the FETCH clause tests since Databricks doesn't support this syntax. Blacked all source code here too. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Stop skipping FutureTableDDLTest Add meaningful skip markers for table comment reflection and indexes Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Stop skipping Identity column tests This closes #175 Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Stop skipping HasTableTest Adding the @reflection.cache decorator to has_table is necessary to pass test_has_table_cache Caching calls to has_table improves the efficiency of the connector Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Permanently skip LongNameBlowoutTest Databricks constraint names are limited to 255 characters Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Stop skipping ExceptionTest Black test_suite.py Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Permanently skip LastrowidTest Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Implement PRIMARY KEY and FOREIGN KEY reflection and enable tests Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> * Skip all IdentityColumnTest tests Turns out that none of these can pass for the same reason that the first two seemed un-runnable in db6f52b Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com> --------- Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jacky Hu <jacky.hu@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
* Put in some unit tests, will add e2e Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Added e2e test Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Linted Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * re-bumped thrift files Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Changed structure to store protocol version as feature of connection Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Fixed parameters test Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Fixed comments Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Update src/databricks/sql/client.py Co-authored-by: Jesse <jwhitehouse@airpost.net> Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Fixed comments Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> * Removed extra indent Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> --------- Signed-off-by: nithinkdb <nithin.krishnamurthi@databricks.com> Co-authored-by: Jesse <jwhitehouse@airpost.net>
…ent (#264) Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Closes #123 Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
* Add OAuth M2M example Signed-off-by: Jacky Hu <jacky.hu@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Martin Rueckl <enigma@nbubu.de>
* Allow pandas 3.x in dependency constraints Relax the pandas version upper bound from <2.4.0 to <4.0.0 to allow pandas 3.x. The pandas APIs used in this project (nullable extension dtypes, DataFrame.to_numpy, PyArrow-to-pandas conversion via types_mapper) are all compatible with pandas 3.0. Add unit tests for _convert_arrow_table covering all mapped data types (int8-64, uint8-64, float32/64, bool, string), null handling, mixed types, duplicate column names, and the disable_pandas code path. Closes #732 * tests: address review feedback on test_pandas_compatibility - Refactor _make_result_set to use ThriftResultSet via normal constructor with mocked args (matches test_client.py pattern), instead of bypassing __init__ via object.__new__ - Add coverage for additional Arrow types: decimal128, date32/date64, timestamp, binary, large_string, list_, struct, map_
…nce (#756) * Add _retry_server_directed_only mode for Retry-After header compliance When enabled, the connector only retries on 429/503 if the server includes a Retry-After header in the response. This prevents duplicate side effects for non-idempotent ExecuteStatement operations where the server has not explicitly signaled that retry is safe. The new opt-in parameter `_retry_server_directed_only` threads through ClientContext, all three DatabricksRetryPolicy construction sites (Thrift, SEA, UnifiedHttpClient), and the retry policy's should_retry/is_retry methods. Default behavior (retry without requiring the header) is unchanged. Signed-off-by: Shubham Dhal <shubham.dhal@databricks.com> * Remove unnecessary _retry_server_directed_only instance variables Inline kwargs.get() at the single point of use in ThriftDatabricksClient and SeaHttpClient instead of storing as dead instance state. Signed-off-by: Shubham Dhal <shubham.dhal@databricks.com> * Address PR feedback: rename and clean up retry-after parameter - Rename server_directed_only to respect_server_retry_after_header throughout for clarity - Store _respect_server_retry_after_header as instance variable in Thrift/SEA backends to match existing kwargs extraction pattern - Replace duplicate test fixture with _make_retry_policy(**overrides) helper for flexible policy construction in tests Signed-off-by: Shubham Dhal <shubham.dhal@databricks.com> * Fix Black formatting for retry.py and utils.py Signed-off-by: Shubham Dhal <shubham.dhal@databricks.com> --------- Signed-off-by: Shubham Dhal <shubham.dhal@databricks.com>
…el's complex_types_as_json post-processor (#795) feat(backend/kernel): plumb _use_arrow_native_complex_types to kernel's complex_types_as_json The connector's `_use_arrow_native_complex_types` toggle is honoured by the Thrift backend (forwarded server-side as `complexTypesAsArrow`) but was silently ignored by the kernel backend — the kernel always returned native Arrow `List` / `Map` / `Struct` regardless. This was the root cause of the 5 `THRIFT_VS_KERNEL_COMPLEX_DISABLED` diffs in the comparator's COMPLEX_TYPES suite. The kernel side gained an opt-in `complex_types_as_json` post- processor (kernel PR #36) that rewrites complex columns to `Utf8` columns of compact JSON text, matching the Thrift wire format byte-for-byte. This change wires the connector's existing kwarg through to that flag: - `session.py`: pass `_use_arrow_native_complex_types` to the kernel client (it was being dropped on the floor for the kernel branch). - `backend/kernel/client.py`: read it from kwargs (default `True`, matching the connector-wide default), invert at the boundary, and set `complex_types_as_json=not _use_arrow_native_complex_types` on the kernel `Session()` constructor. - `backend/kernel/type_mapping.py`: extend `_databricks_type_for_field` to honour `databricks.type_name` for `ARRAY` / `MAP` / `STRUCT` (it already did this for `VARIANT`). When the kernel JSON path is on, the columns arrive as `Utf8` but the kernel preserves the original SQL type name in metadata; `description` should report `array` / `map` / `struct`, matching what the Thrift backend reports under `complexTypesAsArrow=False`. Verified end-to-end against the pecotesting comparator workspace: the `THRIFT_VS_KERNEL_COMPLEX_DISABLED` suite drops from 5 type-shape diffs + 1 row diff to 1 row diff. The remaining row diff is a Thrift server-side bug — Thrift emits invalid JSON for map values containing embedded `"` characters (`{"k":"val with "quote""}` — unescaped inner quote), while the kernel emits the correctly-escaped form (`{"k":"val with \"quote\""}`). The kernel is right here; matching Thrift would mean deliberately producing un-parseable output. Unit tests: - Parametrised test of `_use_arrow_native_complex_types` (default / True / False) → kernel `Session(complex_types_as_json=…)`. - Parametrised test of `description_from_arrow_schema` recovering `array` / `map` / `struct` from metadata, case-insensitively. - Negative test that an unknown `databricks.type_name` defers to the Arrow type rather than corrupting the description. 85 → 94 kernel unit tests; full suite green; black-formatted. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Widen the thrift runtime dependency to >=0.22.0,<0.24.0 so downstream consumers can resolve thrift 0.23.0. Fixes: #783 Signed-off-by: Leo Romanovsky <leo.romanovsky@datadoghq.com>
ci: add cross-repo IT dispatch + merge-queue gate
Wires this repo into the proxy-based integration test infrastructure
in databricks/databricks-driver-test. Mirrors the canonical pattern
established in adbc-drivers/databricks (5-job workflow, not the
simpler ODBC variant).
How it works:
- On a normal PR event (open / push / reopen / non-IT label) we
post a `success` Python Proxy Tests check immediately so the
required check doesn't block the PR. The real tests run in the
merge queue.
- When a maintainer adds the `integration-test` label we dispatch
the suite as a preview — useful for catching regressions before
merge queue time. Includes path-filter detection: if only
docs / tests/unit / etc. changed, auto-pass instead of dispatching.
- Pushing new commits auto-removes the label so a subsequent
labelled run requires a fresh maintainer review.
- On the `merge_group` event the suite runs as the real required
gate. Only PRs whose tests dispatch (or auto-pass when no driver
files changed) can proceed to `main`.
Security:
- Label-gated PR preview: only maintainers can apply
`integration-test`. Fork PRs are not auto-trusted.
- Auto-removed on synchronize: pushing new commits clears the
label, forcing maintainer re-review before tests re-run.
- PR titles are sanitized before interpolation into the dispatch
JSON payload (prevents quote/newline-based JSON injection).
- Dispatch failures post a `failure` check-run via a separate
public-repo App token so the gate never silently turns green
when the dispatcher itself broke.
Operational follow-ups (outside this PR):
1. Create the `integration-test` label in this repo (already done).
2. Install `INTEGRATION_TEST_APP_ID` / `INTEGRATION_TEST_PRIVATE_KEY`
secrets for the GitHub App with write access to
databricks/databricks-driver-test.
3. Enable merge queue on `main` branch protection AND mark
`Python Proxy Tests` a required status check. Until this is done
the merge-queue job is dead code and ITs run only on explicit
label.
Addresses review comments on databricks-sql-python#799 (JSON
injection + sparse merge-queue payload) and adopts the
adbc-drivers/databricks canonical pattern wholesale.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* ci(integration-tests): use github.token for check-run posters Follow-up to #799. The dispatch failure handlers and auto-pass steps were posting check-runs with `steps.public-token.outputs.token`, which is itself an App-token-generating step. That created a silent-failure trap: if the App secrets are missing or rotated, the App-token step fails, then the failure handler also fails (no token to authenticate with), and the gate sits green from the earlier `skip-integration-tests-pr` job's synthetic-success check — the exact silent-pass anti-pattern the failure handler exists to prevent. Discovered by exercising the dispatch end-to-end on a draft PR before the App secrets were installed (#800 closed). The canonical adbc-drivers/databricks workflow has the same latent bug — fix not yet upstreamed there. The fix is to use the default workflow `${{ github.token }}` for all check-posting steps. The default token already has `checks: write` because each job declares the permission. `steps.public-token` is no longer referenced anywhere; the generation step is removed to keep the workflow tidy. The App token is still used (correctly) for the actual dispatch call into databricks-driver-test, where cross-repo write access is required. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(retry): isolate retry tests from shared telemetry executor The e2e retry tests in retry_test_mixins.py assert exact urllib3 call counts (e.g. `mock_validate_conn.call_count == 6`) while mocking urllib3 globally via `_validate_conn` / `_get_conn`. The shared `TelemetryClientFactory` executor — populated by *prior* tests in the same worker — keeps firing background telemetry pushes that land on the same mocked urllib3 layer, inflating the counts and tripping the assertions. Drain the factory and route any new `initialize_telemetry_client` call to `NoopTelemetryClient` for the duration of the three tests that assert exact counts: - test_oserror_retries - test_retry_max_count_not_exceeded - test_retry_exponential_backoff Factory state is fully restored on exit so subsequent tests see the real telemetry pipeline. Co-authored-by: Isaac --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…as_string + precision/scale + named params + utils.exc fix (#803) Five small comparator-parity fixes surfaced by the same audit run. All are independently small but ship together because they share the kernel-backend client surface and the audit baseline. ### 1. Async Statement retention `KernelDatabricksClient.execute_command` closed the parent `Statement` in `finally` regardless of `async_op`. The kernel's `Statement.close()` invalidates child handles (see databricks-sql-kernel `src/statement/validity.rs`), so the async handle died before the user could poll it — breaking `execute_async` → `is_query_pending` → `get_async_execution_result`. Fix: when `async_op=True`, retain the parent Statement in a new `_async_statements` dict alongside `_async_handles`, and close it from `close_command`, `close_session`, and `get_execution_result` after the executed handle is done. Comparator outcome: STATEMENT_ASYNC suite 3/3 match (was 0/3). ### 2. `intervals_as_string` wire-through pyarrow's Python bindings cannot decode Arrow's `month_interval` type (id 21 — `KeyError` from `.as_py`, `to_pylist`, `cast(string)`, `to_pandas`). Every kernel-backend `SELECT *` over a table with an `INTERVAL YEAR TO MONTH` column raised `ArrowNotImplementedError` — 32 / 88 audit diffs. Fix: pass `intervals_as_string=True` to the kernel `Session(...)` constructor unconditionally. The kernel post-processor stringifies `Interval` / `Duration` columns server-side to `Utf8` (kernel PR #64). Comparator outcome: bucket A (ArrowNotImplementedError) 32 → 0. ### 3. Decimal precision/scale extraction `description_from_arrow_schema` hard-coded `None` for PEP 249 description-tuple slots 4 and 5. For DECIMAL columns the Arrow schema carries `precision` / `scale` on `Decimal128Type`, but they were silently dropped — 88 cell diffs (44 precision + 44 scale). Fix: factor out `_precision_scale_for_arrow_type(arrow_type)` and call it from the description builder. Today it only handles decimals; future extensions slot in here. Comparator outcome: 88 precision+scale diffs → 0. ### 4. Named-parameter binding `bind_tspark_params` raised `NotSupportedError` for any `TSparkParameter` with `name` set. The canonical SEA proto marks `StatementParameter.name` as `openapi_required=true` (named is the spec-required public form; `ordinal` is `PUBLIC_UNDOCUMENTED`). Kernel PR #65 added a `Statement.bind_named_param` PyO3 API. Fix: route named bindings via the new API. Positional ordinals self-increment so a named entry in the middle of the list doesn't claim a positional slot. Comparator outcome: PREPARED_STATEMENT_NAMED 1/1 match (was 0/1). Full thrift-vs-kernel run: 97/132 match (was 96/132). ### 5. `utils.ParamEscaper` `exc.ProgrammingError` import fix `ParamEscaper.escape_args` and `escape_item` both raised `exc.ProgrammingError(...)`, but `exc` was never imported. On any unsupported parameter shape the user saw `NameError: name 'exc' is not defined` instead of a clean PEP 249 `ProgrammingError`. Surfaced via the same audit harness running INLINE_PARAMS: both backends raised `NameError`, which the comparator's class+message match treated as parity — a false-positive that hid both the driver bug and the underlying caller-side type mismatch. Fix: import `ProgrammingError` directly from `databricks.sql.exc` (matching the pattern used in `session.py`, `client.py`, `result_set.py`, etc.) and replace the two `exc.ProgrammingError(...)` sites with bare `ProgrammingError(...)`. ## Dependencies Items #2, #3, and #4 require the matching databricks-sql-kernel changes: PR #64 (`intervals_as_string` + empty-result schema fix) and PR #65 (named-param binding). For local testing the comparator harness uses `KERNEL_FREEZE=1` against a kernel checkout of the feature branches. ## Headline comparator results (full thrift-vs-kernel run) | | Before | After | |------------------------------------|--------|-------| | match / diff / skipped | 60 / 88 / 0 | **97 / 34 / 1** | | Bucket A (ArrowNotImplementedError)| 32 | 0 | | `decimal_column` precision/scale | 88 | 0 | | Bucket B1 (named params) | 1 | 0 | | Suites fully clean | 12 / 30 | **17 / 30** | Remaining 34 diffs cluster into documented causes (complex types in `fetchall_arrow`, timestamp tz on Arrow path, VOID surface, METADATA pattern semantics) — tracked in `~/docs/python-kernel/comparator-diff-tasklist.md`. ## Test plan - [x] Manual repro of async path: `cur.execute_async("SELECT 1")` → `is_query_pending` → `get_async_execution_result` → `fetchall` succeeds on `use_kernel=True` - [x] Manual repro of interval path: `cur.execute("SELECT ym_interval_column FROM ...")` returns string-shaped rows matching the Thrift surface - [x] Manual repro of decimal path: `cur.description` for a `DECIMAL(10,2)` column now reports `precision=10, scale=2` - [x] Live e2e: `test_parameterized_query_named_params` / `test_parameterized_query_named_param_with_null` (added) - [x] Unit tests: `bind_tspark_params` named/positional/mixed routing (`tests/unit/test_kernel_type_mapping.py`) - [x] Manual repro of utils fix: `ParamEscaper().escape_args(object())` raises `ProgrammingError`, not `NameError` - [x] Full thrift-vs-kernel comparator: 97 / 34 / 1 of 132 - [x] Existing connector unit suite: 752 passed Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…s concurrent CI jobs (#805) * [PECOBLR-2461] fix(tests/e2e): de-collide MST table names across concurrent CI jobs `_unique_table_name` derived the table name purely from the test node id, so two CI runs racing on the same warehouse + catalog (e.g. PR + push to main landing within seconds, as happened in run 26410038645) would both target `peco.default.mst_pysql_<test_name>` and step on each other's CREATE / DROP / DML. This caused the three "flaky" failures we kept seeing in `test_transactions.py`: - test_multi_table_commit → TRANSACTION_ROLLBACK_REQUIRED_AFTER_ABORT - test_executemany_rollback_in_txn → TABLE_OR_VIEW_NOT_FOUND - test_write_conflict_single_table → TABLE_OR_VIEW_ALREADY_EXISTS The companion helper `_unique_table_name_raw` already appended a uuid4 suffix for exactly this reason; the fixture helper was an oversight from #775. Add the same suffix here, reserving room so the result still fits in the 80-char cap. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * also de-collide UC Volume tests and add PR-branch concurrency cancellation While verifying the transactions fix, a second flake surfaced — the same cross-CI race pattern, this time on a UC Volume file path: tests/e2e/test_driver.py::TestPySQLCoreSuite:: test_uc_volume_put_fails_if_file_exists_and_overwrite_not_set → Failed: DID NOT RAISE <class 'databricks.sql.exc.ServerOperationError'> Two tests in `tests/e2e/common/uc_volume_tests.py` PUT/GET/REMOVE against the hardcoded path `/Volumes/{catalog}/{schema}/e2etests/file1.csv`. When two CI runs overlap (as happened on this PR after the force-push for DCO sign-off), one run's REMOVE deletes the other run's file between its two PUTs, so the expected `FILE_IN_STAGING_PATH_ALREADY_EXISTS` never fires. Suffix the volume path with `uuid4().hex[:8]` in the two affected tests (test_uc_volume_life_cycle and test_uc_volume_put_fails_if_file_exists_ and_overwrite_not_set). The other UC Volume tests that reference the same path are exercising client-side / server-parse failure paths that never touch the file — leaving those alone keeps the diff focused. Also add a `concurrency` block to the E2E workflow: - On pull_request, cancel-in-progress: a new push / force-push on a PR cancels the previous run on that ref. Eliminates the entire class of "force-pushed during CI → two runs racing on shared warehouse state" failures. - On push to main, runs are NOT cancelled — each merge commit keeps its own clean CI signal so a regression on commit N can't be hidden by commit N+1 landing seconds later. Concurrent main runs can still collide on shared state, but the uuid-suffix conventions in the e2e tests are what keep that isolated. Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…rage (#808) * ci: add kernel-e2e workflow + KERNEL_REV pin Wires up CI coverage for use_kernel=True. The kernel is a private repo with no published wheel, so we pin a kernel SHA in KERNEL_REV and build the wheel inline via maturin develop using the existing INTEGRATION_TEST_APP GitHub App (extended to include databricks/databricks-sql-kernel in its repo allowlist). Gate semantics mirror trigger-integration-tests.yml: - Plain PR events post a synthetic-success Kernel E2E check so the required check doesn't block PRs that don't touch kernel code. - The kernel-e2e label triggers a preview run on the PR and is auto-removed on synchronize for the same security reason as the integration-test label. - merge_group is the real gate — runs when kernel-relevant files change (src/databricks/sql/backend/kernel/, test_kernel_backend.py, KERNEL_REV, etc.), auto-passes otherwise. Unit tests are unchanged: tests/unit/test_kernel_*.py already runs in every code-quality-checks.yml matrix combo against a fake databricks_sql_kernel module injected at sys.modules import time. Required follow-up before this merges: 1. Extend the INTEGRATION_TEST_APP allowlist to include databricks/databricks-sql-kernel. 2. Create the kernel-e2e label in this repo. 3. Add Kernel E2E as a required check on main once a green run lands. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci(kernel-e2e): add id-token: write for JFrog OIDC exchange setup-poetry runs setup-jfrog, which exchanges a GitHub OIDC token for a JFrog access token to reach the internal PyPI mirror. That needs id-token: write on the job, which was missing — the labelled preview run failed at setup-poetry with "ACTIONS_ID_TOKEN_REQUEST_TOKEN: unbound variable". Declared at both workflow scope and on run-kernel-e2e directly: a job-level permissions block fully overrides workflow scope, so the redundancy is intentional. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci(kernel-e2e): build kernel wheel into connector venv, not a new one `poetry run maturin develop` from inside databricks-sql-kernel/pyo3/ makes poetry create a fresh, empty .venv next to the kernel source (it discovers pyo3/pyproject.toml first and treats it as the project root). That venv has no maturin → "Command not found: maturin". Resolve the connector venv's python path explicitly before changing working directory, then call maturin from that python via `-m maturin`. `--interpreter <path>` pins the produced wheel to the connector venv so the resulting extension is installed where pytest will look for it. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci(kernel-e2e): drop --interpreter from maturin develop (not a valid flag) maturin develop installs into whichever python invoked it; the flag exists on `maturin build` only. The previous commit's extra `--interpreter $CONNECTOR_VENV_PY` was redundant — we're already calling maturin via `$CONNECTOR_VENV_PY -m maturin`, so the venv python is the one doing the build and install. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci(kernel-e2e): route cargo through JFrog + audit cleanups databricks-protected-runner-group blocks direct egress to index.crates.io, so the maturin build was failing with SSL EOF on the cargo metadata step. Extend setup-jfrog with an opt-in `configure-cargo` input that writes ~/.cargo/config.toml + credentials.toml against the JFrog db-cargo-remote proxy (recipe borrowed verbatim from databricks-odbc's setup-jfrog action) and forward it through setup-poetry so the kernel-e2e workflow can enable it without bypassing the wrapper. Bundled cleanups from a workflow audit: - Drop the redundant `Set up Python 3.10` step — setup-poetry runs actions/setup-python internally at the matching version. - Smoke-check now uses `$CONNECTOR_VENV_PY` (same interpreter we built the wheel with), so a wheel installed into the wrong venv would surface here rather than be masked by `poetry run python` re-resolving. - Post `Kernel E2E` check on the labelled-PR path as well as the merge-queue path; previously the PR would still show the synthetic-success check forever even after a real labelled run failed. - Add a comment to fetch-depth: 0 explaining why we keep it. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci(kernel-e2e): bump KERNEL_REV to current kernel main The original pin (aed2efb) predates kernel PR #36 which added `complex_types_as_json` to Session.__new__. Connector main already passes that kwarg (added in PR #795), so every e2e test was failing with: TypeError: Session.__new__() got an unexpected keyword argument 'complex_types_as_json' Bump to current kernel main (3aa25b21) which has the kwarg plus the rest of the comparator-parity changes the connector code already expects. This is a good demonstration of why the bisectable KERNEL_REV pin matters: the connector and kernel evolved in lockstep on `main` before this CI existed, so the very first thing the workflow does once it can actually build the wheel is catch that we'd been shipping a stale pin. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci(kernel-e2e): disable bundled rust-cache in setup-rust-toolchain actions-rust-lang/setup-rust-toolchain invokes Swatinem/rust-cache internally, which runs `cargo metadata` from the workflow's working directory. Our job's CWD is the connector repo root (no Cargo.toml there — the kernel checkout is in a subdir), so the bundled cache attempt fails with exit 101 and dumps a Node stack trace into the log. It's cosmetic — the action handles its own errors — but reads as a failure on first glance, and the bundled cache races with the explicit rust-cache step we already configure with the correct `workspaces: databricks-sql-kernel` path. Disabling the bundled cache leaves a single, correctly-keyed rust-cache invocation and cleans up the log. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…ive (#809) The "Required external setup" block read like a TODO list, but every item on it is in place after #808 landed and the main ruleset was updated with required_status_checks + merge_queue. Reword as a "things this depends on (debugging crib)" and add the ruleset note so future readers know why the merge_group path matters. Also serves as a low-risk pilot PR for exercising the new merge queue end-to-end — comments-only diff, no behavioral change. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Running the coverage suite on push:main is observational — by the time it fires, the merge has already happened and a failure can't block anything. Moving the same trigger to merge_group runs the suite against the queue's transient branch (current main + the queued PR diff, freshly merged), which is the same state push:main would have tested, but where a failure can actually eject the PR from the queue before it damages main. pull_request stays as-is so the PR author still gets fast feedback while iterating; the queue run is the gate. Not making this a required status check in the same PR — the suite takes ~17 min and would add that to every queue merge, even for PRs that don't touch driver code. Promote to required later if main breakages slip past the merge_group run without it being mandatory. The coverage override (SKIP_COVERAGE_CHECK in PR body) intentionally doesn't apply to merge_group runs — the override is an author-time escape hatch, not a queue-time bypass. github.event.pull_request.body resolves to empty under merge_group, which naturally degrades to "no override" without code changes. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
The proxy-test suite in databricks-driver-test is now a
`mode: [thrift, kernel]` matrix that posts two named check runs per
dispatch: `Python Proxy Tests / thrift` and `Python Proxy Tests /
kernel` (see databricks-driver-test#369). The single unsuffixed
`Python Proxy Tests` check is no longer posted by the matrix legs,
so the synthetic-success / auto-pass / dispatch-failure stanzas in
this workflow must fan out to both names — otherwise the
required-status-check entry `Python Proxy Tests` would still expect
the unsuffixed name but no workflow would post it.
The `MODES` constant at the top of each script is the single source
of truth for the matrix axis; keep it in sync with the matrix in
databricks-driver-test/.github/workflows/python-proxy-tests.yml.
Rollout order:
1. Land databricks-driver-test#369. Matrix legs start posting the
suffixed names. The legacy `Python Proxy Tests` required check
is still satisfied by this workflow's existing single
synthetic-success path (this PR is not merged yet), so PRs
continue to pass.
2. Land this PR AND update branch protection on `main` to swap the
required-status entry `Python Proxy Tests` for both
`Python Proxy Tests / thrift` and `Python Proxy Tests / kernel`.
Order between these two is fine either way — the broken window
is just the time between them. Do them together to minimise it.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…try tests under merge-queue load (#812) * test(telemetry/e2e): make TestTelemetryE2E deterministic The previous tests asserted "telemetry round-tripped to the server" by intercepting TelemetryClient._telemetry_request_callback and counting completed futures. That recording lags the actual work — the callback fires asynchronously after the HTTP request completes, and on the *last* connection close TelemetryClientFactory.close() shuts the shared executor down with wait=False (intentional, for connection-close latency in production). Two consequences: 1. A `wait(captured_futures, timeout=10)` call right after `with conn:` can return before any callbacks have fired — so the wait is "waiting on" an empty list, returns immediately, and the assertion `assert len(done) == expected_count` fails non- deterministically with `assert 0 == 2` or `assert 1 == 2`. 2. The shared-executor shutdown(wait=False) can drop in-flight submissions that haven't started running yet, so even if we drained correctly we'd be testing whether the server happened to receive the request in time, not whether the connector correctly dispatched it. Switch interception from `_telemetry_request_callback` to `_send_telemetry`. That captures the connector's *intent to submit* synchronously, which is what we actually want to test — the connector either decided to send a batch or it didn't, regardless of what happens to the future afterward. No sleep needed, no timeout-based wait needed, no race against the executor shutdown. 5 consecutive local runs pass deterministically in ~20s each (down from ~17 min when the flake hit). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * ci/test: deflake retry tests under merge-queue load Two compounding fixes that surfaced on PR #812's first merge_group run, where test_oserror_retries failed with `assert mock_validate_conn.call_count == 6` — unexpected `/telemetry-ext` requests had been counted alongside the intended session-endpoint retries. 1. tests/e2e/common/retry_test_mixins.py — strengthen `_isolated_from_telemetry()` with two additional defensive patches: - TelemetryClient._send_telemetry → no-op - TelemetryClient._export_event → no-op The existing factory swap installs NoopTelemetryClient for new connections, but doesn't cover real TelemetryClient instances that slip in via other paths (stale module-global, code that bypasses initialize_telemetry_client, anything created before the context entered). Patching at the class level catches all of them. 2. .github/workflows/code-coverage.yml — serialise merge_group runs. Previous concurrency group keyed on github.ref, which is per-PR in the queue (gh-readonly-queue/main/pr-N-…). That allowed multiple queue entries to hammer the same warehouse in parallel, stressing telemetry / retry paths that single-PR runs don't exercise. Group merge_group + workflow_dispatch under a single fixed name (e2e-mq-serial) so they run one at a time. PR-event runs keep per-ref grouping + cancel-in-progress for fast author feedback. Trade-off: queue throughput drops to one ~17-min run at a time. Folded into PR #812 so the telemetry-test rewrite and the retry-test deflake ship as a single unit. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* Removed the unncecessary chunk * Added tests
) * Extract SPOG org-id from cluster http_path for non-Thrift requests For all-purpose-compute Thrift connections on SPOG (custom-URL) hosts the http_path is /sql/protocolv1/o/<workspace-id>/<cluster-id> and the workspace ID is encoded in the path itself. PoPP routes the Thrift request correctly off the /o/<wsid>/ segment, so the connection succeeds without an explicit ?o= query parameter. Other requests on the same connection (telemetry uploads to /telemetry-ext, feature-flag fetches, SEA REST calls) hit different paths that don't carry the workspace ID. Previously _extract_spog_headers only looked at ?o= in the http_path, so the x-databricks-org-id header was never set for cluster URLs without ?o=. On SPOG hosts PoPP then had no workspace context for these requests and redirected them to /login, silently dropping telemetry. Extend _extract_spog_headers to also extract the workspace ID from the cluster path segment as a fallback when ?o= is absent. Priority order: explicit caller header > ?o= query param > /o/<wsid>/ path segment. Adds five unit tests covering the new cluster-path extraction, leading slash, query-param-wins priority, explicit-header-wins priority, and a warehouse-path regression guard. Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Respect mixed-case SPOG org-id headers Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Validate SPOG org id extraction Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…nel path (#819) * feat(kernel): wire OAuth (M2M/U2M), TLS/mTLS, and Geometry on use_kernel path Consume the OAuth + TLS surface added to the Rust kernel's pyo3 binding so the use_kernel backend reaches feature parity with the Thrift/SEA backends for these flows. Bumps KERNEL_REV to the kernel commit that ships the new Session kwargs. OAuth (kernel-owned lifecycle): - auth_bridge.kernel_auth_kwargs now takes the raw connect() auth options (the OAuth secret is consumed during auth_provider construction and isn't recoverable from the built provider, so we read it from the original kwargs): - OAuth M2M: oauth_client_id + oauth_client_secret are forwarded to the kernel's auth_type='oauth-m2m'; the kernel acquires and refreshes tokens itself via workspace OIDC client-credentials. - OAuth U2M: auth_type 'databricks-oauth' / 'azure-oauth' route to the kernel's auth_type='oauth-u2m' (optional oauth_client_id / oauth_redirect_port forwarded); the kernel runs the browser flow. - A custom credentials_provider is rejected with a clear NotSupportedError — it's an opaque token source with no extractable raw creds, so the kernel can't own the lifecycle. Callers are pointed at oauth_client_id / oauth_client_secret. - oauth_client_secret is a new connect() kwarg, honored only on the use_kernel path; Thrift/SEA are unaffected. - session.py forwards the raw auth kwargs into KernelDatabricksClient. TLS / mTLS: - client._kernel_tls_kwargs translates the connector's SSLOptions into the kernel's tls_* Session kwargs: tls_trusted_ca_file -> tls_ca_cert (PEM bytes), client cert/key files -> tls_client_cert / tls_client_key (mTLS), and inverts tls_verify -> tls_skip_verify / tls_verify_hostname -> tls_skip_hostname_verify. A password-protected client key is rejected (the kernel has no surface for it yet). ssl_options is no longer accept-and-ignored on this path. Geometry: - type_mapping recovers GEOGRAPHY / GEOMETRY result columns from the databricks.type_name metadata (they arrive as Arrow Utf8, like VARIANT) so PEP-249 description reports the precise type instead of collapsing to string. Tests: 140 kernel unit tests pass (new M2M/U2M routing, custom-provider rejection, SSLOptions->tls_* translation incl. mTLS + encrypted-key rejection, and geospatial type recovery). black + mypy clean. Note: KERNEL_REV points at the kernel's feat/pyo3-oauth-tls branch HEAD until that PR merges; re-pin to the squash-merge SHA before this lands. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore(kernel): re-pin KERNEL_REV to updated feat/pyo3-oauth-tls HEAD The companion kernel PR (#107) was amended with review fixes (doc backend-inversion fix, U2M degenerate-value validation), changing its branch HEAD SHA. Re-pin so this PR's kernel-e2e builds against the current kernel. Still the branch HEAD — re-pin to the squash-merge SHA once #107 lands. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): address review — re-pin KERNEL_REV, secret hygiene, auth ambiguity, TLS superset Addresses @gopalldb's review on #819. P0: - Re-pin KERNEL_REV to the squash-merge SHA of the now-merged kernel PR (ec2288742cbac0cd9fab50da353e8405972eefe9 on kernel main), replacing the orphaned branch-HEAD SHA. P1: - Scrub oauth_client_secret from the long-lived self._auth_options in open_session's finally (even on the failure path). It outlives the method, so a retained secret was exposed to vars()/pickle/debugger far longer than needed; the kernel now owns it. - tls_verify=False now also emits tls_skip_hostname_verify=True — no-verify subsumes hostname verification, matching SSLOptions' create_ssl_context (check_hostname=False when tls_verify is False). - Reject ambiguous auth combos before resolving, instead of silently picking M2M: (a) credentials_provider + oauth_client_secret, and (b) a U2M auth_type (databricks-oauth/azure-oauth) + oauth_client_secret. Both raise NotSupportedError naming the conflict, so the failure is at session-open rather than a confusing first-call 401 against the wrong principal. - Secret-leak tests: oauth_client_secret is forwarded to the kernel but scrubbed from _auth_options, and absent from repr(conn) / vars(conn); scrub runs even when open_session raises. P2: - _read_pem_bytes rejects an empty/whitespace CA/cert file with a clear ProgrammingError instead of passing empty PEM to the kernel. - _normalize_scopes raises ProgrammingError on a non-str/list/tuple oauth_scopes rather than silently dropping to default scopes. - Added U2M oauth_scopes forwarding test. 150 kernel unit tests pass; black + mypy clean. Note: KERNEL_REV now points at the merged kernel main SHA; no further re-pin needed before merge. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): don't build connector OAuth provider on use_kernel path + add TLS e2e Live mitmproxy TLS e2e (this commit) surfaced a real bug: on use_kernel=True the connector still built its own auth provider in Session.__init__ via get_python_sql_connector_auth_provider, BEFORE use_kernel was consulted. For OAuth that constructor eagerly runs the flow (DatabricksOAuthProvider calls _initial_get_token in __init__), so passing oauth_client_id for kernel-managed M2M fired the U2M *browser* flow at connect() time — opening a browser and using the M2M service principal id as a U2M OAuth app client_id (which the account rejects). Fix: on use_kernel=True, do NOT build the connector's provider. Hand the kernel auth bridge a minimal AccessTokenAuthProvider when an access_token is present, else None; OAuth M2M/U2M resolve purely from the raw connect() kwargs the bridge already reads, and the kernel owns the entire token lifecycle. Thrift/SEA backends are unchanged. - session.py: branch the provider build on use_kernel; import AccessTokenAuthProvider; update the kernel-branch comment. - auth_bridge.py: kernel_auth_kwargs / _is_pat / _extract_bearer_token accept Optional[AuthProvider] (None when no token on the kernel path); clearer "no credentials" error; refreshed module docstring. - tests/unit/test_session.py: regression guards — use_kernel must NOT call get_python_sql_connector_auth_provider, forwards raw M2M creds via auth_options, and uses a minimal AccessTokenAuthProvider for PAT. TLS e2e (the connector-side counterpart to the kernel's v0_tls_e2e.rs): - tests/e2e/test_kernel_tls.py: 3 mitmproxy-backed tests driving sql.connect(use_kernel=True, _tls_*=...) — strict default rejects the re-signed cert, _tls_trusted_ca_file trusts it, _tls_no_verify bypasses. Skips cleanly without the kernel wheel / creds / MITMPROXY_CA_CERT. Verified live against the dogfood workspace on the OAuth M2M path (all 3 green) — this is what caught the bug above. - kernel-e2e.yml: run test_kernel_tls.py in the existing (label/merge- queue-gated) run-kernel-e2e job behind a mitmproxy service. The kernel wheel is built before the proxy is in scope so cargo's JFrog fetch isn't routed through it; HTTPS_PROXY is scoped to the TLS test step. Path filter now also triggers on session.py and test_kernel_tls.py. 184 unit tests pass; black + mypy clean. Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(kernel): make use_kernel auth-provider guards wheel-independent The two TestKernelAuthProviderBypass tests went through databricks.sql.connect(use_kernel=True), which triggers the lazy `import databricks_sql_kernel` in the kernel client module. That import fails in the unit-test job (no Rust wheel installed), so the tests errored with ImportError before their patches applied — green locally (wheel present) but red in CI. Rewrite them to exercise Session.__init__'s provider-selection logic directly with _create_backend stubbed, so the kernel client (and its import) is never touched. Assert on session.auth_provider and that get_python_sql_connector_auth_provider is not called. Verified passing both with the wheel present and with `import databricks_sql_kernel` forcibly blocked (reproducing the CI environment). Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…ernel (#820) * feat(kernel): wire retry/backoff params on use_kernel + verify metric-view passthrough Consume the kernel's newly-configurable retry policy (kernel PR #117) from the use_kernel path, and confirm UC Metric View metadata works end-to-end now that the kernel allowlists its session conf. Bumps KERNEL_REV to the merged kernel main. Retry: - session.py forwards the connector's _retry_* kwargs into the kernel client (new retry_options, kernel-only; Thrift/SEA unaffected). - client._kernel_retry_kwargs maps them to the kernel Session retry kwargs: _retry_delay_min -> retry_min_wait_secs, _retry_delay_max -> retry_max_wait_secs, _retry_stop_after_attempts_count -> retry_max_attempts (1:1 total-attempts; the kernel does the retries-after-first conversion), _retry_stop_after_attempts_duration -> retry_overall_timeout_secs. Connector delays are float seconds; the kernel takes whole seconds, so we round, flooring any positive sub-second value to 1s (never collapse a configured backoff to 0). _retry_delay_default has no kernel equivalent (the kernel's no-Retry-After backoff is exponential from retry_min_wait). Metric View: - No connector wiring needed: enable_metric_view_metadata already injects spark.sql.thriftserver.metadata.metricview.enabled into session_configuration (backend-agnostic), which the kernel client passes through to session_conf. Kernel #117 now allowlists that conf verbatim, so it reaches the server on the use_kernel path. Added an e2e test confirming the round-trip. KERNEL_REV -> b4d88220cdfad8dba1cfa89892269342ae26feeb (kernel main with retry config + metric-view allowlist). Tests: unit tests for _kernel_retry_kwargs (rounding, sub-second floor, count 1:1, only-set-keys) and retry-options threading through session.py (wheel-independent, verified with the kernel import blocked); live e2e (retry params accepted + metric-view passthrough) green against dogfood. black + mypy clean. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test: skip retry-threading test when pyarrow absent test_retry_kwargs_threaded_into_kernel_client patches KernelDatabricksClient, which requires importing databricks.sql.backend.kernel.client — that module imports pyarrow at load. In the no-pyarrow CI tier the patch target can't be resolved (AttributeError: module '...kernel' has no attribute 'client'), failing the Unit Tests (non-PyArrow) jobs while the PyArrow jobs passed. Add pytest.importorskip("pyarrow") to the test (matching the other kernel tests) so it skips without pyarrow and runs where it's present. Corrected the stale class docstring (it patches the client directly, not via a _create_backend stub). Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
The retry tests in PySQLRetryTestsMixin patch urllib3 globally
(HTTPSConnectionPool._get_conn / _validate_conn) and assert the
connection's own request was retried exactly N times. With telemetry on
(the default), the same connection makes two kinds of side-channel HTTP
calls through the same mocked pool that inflate the count and flake the
assertion (e.g. `assert call_count == 6` -> AssertionError under
merge-queue load):
1. A synchronous feature-flag GET during Connection.__init__
(is_telemetry_enabled -> FeatureFlagsContext._refresh_flags), and
2. Async telemetry POSTs to /telemetry-ext from the background executor
and the periodic flush thread.
The prior mitigation (_isolated_from_telemetry, #812) only attacked
source #2 and imperfectly, so the tests kept flaking.
Fix (test-only):
- Add enable_telemetry=False to the shared _retry_policy dict. It
short-circuits is_telemetry_enabled before the feature-flag fetch
(kills #1) and installs a NoopTelemetryClient (kills #2), removing
both sources at their origin without changing the retry behavior
under test. Every test in the mixin merges _retry_policy, so this
one line covers them all.
- Harden _isolated_from_telemetry with a backstop that no-ops the
actual HTTP boundary (_send_with_unified_client), closing the
already-submitted-future gap for any other caller of the helper.
- Add a version_string forward-compat shim to the SimpleHttpResponse
mock so the suite also runs on urllib3 >= 2.3 (harmless on the
2.2.x pinned in CI).
Verified against dogfood: test_oserror_retries and
test_retry_max_count_not_exceeded pass 3/3 deterministically for both
Thrift and SEA; the full PySQLRetryTestsMixin (36 tests) passes.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
… use_kernel (#823) * feat(kernel): wire statement query tags + http_headers/user_agent on use_kernel Consume the kernel surface from kernel PR (query tags + custom HTTP headers) on the use_kernel path. Bumps KERNEL_REV. Query tags (statement-level): - execute_command no longer raises NotSupportedError for query_tags; it calls stmt.set_query_tags(query_tags) after set_sql. The connector already passes Dict[str, Optional[str]], which the kernel accepts (None value -> bare key in the SEA query_tags conf). http_headers + user_agent_entry: - The kernel client now forwards http_headers to the kernel Session as the `http_headers` kwarg (was accept-and-ignore). session.py already passes all_headers, which carries the connector's composed User-Agent (PyDatabricksSqlConnector/x (entry)) + caller headers + SPOG org-id. - The kernel applies them per request: its own Authorization / org-id win; a caller User-Agent is APPENDED to the kernel base UA (the base carries the DatabricksJDBCDriverOSS token that gates the SEA result disposition, so it's never replaced). So user_agent_entry is honored end-to-end via the existing http_headers forwarding — no separate kwarg needed. Tests: - unit: query_tags forwarded to set_query_tags (was: rejection test); http_headers forwarded to the kernel Session (and omitted when empty). - e2e (test_kernel_backend.py): a query_tagged query and a connection with user_agent_entry + a custom http_header both round-trip green against a dogfood warehouse. The UA case specifically guards the append behavior (replacing the base UA would 400 on the result disposition). KERNEL_REV -> c2053f68b75fef4a29425096dc6bbafb774d8b83 (kernel PR #119 branch HEAD; re-pin to the squash-merge SHA once #119 lands). 194 kernel unit tests pass; black + mypy clean; 3 e2e pass live. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): address review — secret-scrub ordering, drop reserved headers Addresses @gopalldb's review on #823. P1.1 — secret scrub now covers a kwarg-build failure: - open_session built auth/tls/retry/http_headers kwargs OUTSIDE the try whose finally scrubs oauth_client_secret. A raise in kernel_auth_kwargs (e.g. OAuth token-exchange failure with the M2M secret in hand) would skip the scrub, leaking the secret onto the long-lived connector. Moved all kwarg builds inside the try (auth_kwargs / tls_kwargs pre-declared empty so the finally can reference them on an early raise). P1.3 / P2 — don't forward kernel-managed headers: - The connector now drops Authorization and x-databricks-org-id from http_headers before forwarding to the kernel (new _KERNEL_MANAGED_HEADERS set). The kernel manages both (auth from the provider; org-id re-derived from ?o= in http_path), so forwarding is redundant — and the connector always injects the SPOG org-id, which the kernel skips-and-warns per request, so this also kills the WARN spam. Double-walls the kernel's own reserved-name skip. Tests: - unit: Authorization / x-databricks-org-id dropped before forwarding; only-reserved-headers omits the kwarg entirely (test_kernel_client). - P1.2: user_agent_entry reaches the kernel client's http_headers on use_kernel=True (test_session) — guards a regression where session.py stops folding the entry into the composed User-Agent. CHANGELOG: deferred to release-cut per repo convention (entries are added at version bump, not per-PR; #819/#820 followed the same). 197 kernel unit tests pass; black + mypy clean; 3 e2e pass live. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore: re-pin KERNEL_REV to merged kernel main (101aa46) #119 (statement query tags + custom HTTP headers) is now merged to kernel main. Re-pin from the orphaned branch HEAD (c2053f6) to the current merged main HEAD (101aa46), which contains #119's merge commit (df8302f) and is reachable from main — no orphan-SHA risk. Verified end-to-end against a wheel built from 101aa46: kernel e2e (query_tags_round_trip, user_agent_entry + http_headers round_trip, select_one) all pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* Bump to version 4.2.7 Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Changelog: note kernel retry/backoff params (#820) after main merge Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> * Changelog: drop experimental kernel bullet (not usable yet) Per review feedback on #821 — the use_kernel path isn't usable in 4.2.7. Co-authored-by: Isaac Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com> --------- Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
…sync cancel (#825) * feat(kernel): wire CUJ-gap fixes — staging fail-loud, error context, sync cancel Connector half of the API+CUJ audit fixes (kernel half: databricks-sql-kernel PR #121). Bumps KERNEL_REV to pick up the kernel surface. Staging fail-loud (kernel/client.py): - Volume/staging PUT/GET/REMOVE silently no-op'd on the kernel path (KernelResultSet.is_staging_operation is always False, so the connector's _handle_staging_operation never fired and no file was transferred). Detect the leading verb in execute_command and raise NotSupportedError so ETL fails loud instead of ingesting stale data. Error context (_errors.py): - Forward display_message / diagnostic_info / error_details_json (now exposed across the pyo3 boundary in #121) onto the re-raised PEP-249 exception, and populate ServerOperationError.context with "diagnostic-info" (Spark stack trace) + "operation-id" — matching the Thrift backend so callers reading err.context work identically. Sync cancel wiring (client.py, kernel/client.py): - cursor.cancel() was a silent no-op for the default blocking execute() (active_command_id is None until execute returns). The kernel backend now registers a detached StatementCanceller (keyed by the cursor) before the blocking execute and exposes cancel_running_cursor(cursor). Cursor.cancel() routes to that hook via getattr when there's no command id yet — opt-in, so Thrift/SEA backends are unaffected. Tests: unit (staging fail-loud, _is_staging_statement, cancel registry + routing, error context/diagnostic-info forwarding) and e2e (tz-aware TIMESTAMP, scientific DECIMAL, staging NotSupportedError, diagnostic-info context, cross-thread sync cancel interrupts a running query). All e2e verified live against dogfood with use_kernel=True. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): address review — comment-prefixed staging + tolerant sync cancel Review fixes on PR #825 (+ KERNEL_REV bump to the amended kernel #121 which now folds the original message on attach failure and bounds error_details_json): P1 #1 — staging fail-loud missed comment-prefixed statements: _is_staging_statement took the first whitespace token without stripping SQL comments, so "-- upload\nPUT ..." / "/* c */ PUT ..." (common in ETL) classified as non-staging and slipped into the silent-no-op bug. Added _strip_leading_sql_comments (handles leading -- line and /* */ block comments, multiple/mixed) before extracting the verb. Tests for both comment forms, mixed, and verb-only-in-comment (must NOT match). P1 #2 — sync cancel could raise out of cursor.cancel(): cursor.cancel() is best-effort per PEP-249, but cancel_running_cursor re-raised a canceller failure (e.g. an early cancel before the server statement id is observed, or a transport hiccup) via the public cancel(). Now swallow+log and still return True (a canceller was present and attempted) so Cursor doesn't emit the misleading "no executing command" warning. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore: re-pin KERNEL_REV to merged kernel #121 (cbeaf44) #121 (tz-aware/scientific param binds, error context, sync cancel + Ctrl-C) is merged to kernel main. Re-pin from the orphaned branch HEAD (f62d941) to the merged squash SHA (cbeaf44) — content-identical, but reachable from main so no orphan-SHA risk. Verified against a wheel built from cbeaf44: connector unit + kernel e2e (tz-aware TIMESTAMP, scientific DECIMAL, staging fail-loud incl. comment-prefixed, diagnostic-info context) all pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
test(e2e/mst): DESCRIBE QUERY now allowed in transactions The server's MSTCheckRule allowlist has broadened to include DESCRIBE QUERY (DescribeQueryCommand), mirroring the earlier SHOW COLUMNS change. It no longer throws inside an active transaction, so the prior test_describe_query_blocked assertion (DID NOT RAISE) was stale. Flip it to test_describe_query_not_blocked using _assert_not_blocked (verifies it succeeds and returns >0 rows) and move DESCRIBE QUERY from the Blocked to the Allowed list in the class docstring. Verified against a live DBSQL warehouse: the full TestMstBlockedSql class (9 tests) passes. Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
#824) * feat(kernel): surface kernel logs through Python logging on use_kernel When the kernel backend loads, auto-initialize the kernel's tracing -> Python logging bridge so `use_kernel=True` users see kernel logs with no extra setup. Kernel logs land under the `databricks.sql.kernel` logger (a child of the connector's `databricks.sql.*` namespace), so an existing `logging.getLogger("databricks.sql").setLevel(...)` cascades to them. - `_errors.py` calls `databricks_sql_kernel.init_logging()` once at extension load (it's the canonical kernel-import site). The call is `getattr`-guarded so an older kernel wheel without the function still works — just without kernel logs. - e2e tests assert kernel records reach the `databricks.sql.kernel` logger (and the pyo3 boundary under `databricks.sql.kernel.pyo3`) and that the level set on the logger is respected. Creds-gated per the existing kernel e2e convention. Requires the companion kernel/pyo3 change (databricks-sql-kernel#120) that exposes `init_logging()`. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(kernel): harden log-bridge init + address review feedback Review feedback from PR #824 (gopalldb): - P1: guard the init_logging() call against throwing, not just a missing function. A panic across the PyO3 boundary surfaces as pyo3_runtime.PanicException (a BaseException, not Exception), so a bare call could escape module import and fail every use_kernel=True connection over a non-essential logging feature. Wrap in try/except BaseException, log a debug breadcrumb, continue. This also neutralizes the idempotency concern regardless of the Rust impl. - P2: soften the level-control test docstring to make clear it asserts the effective customer-visible outcome (sub-threshold records don't surface), not source-side suppression — caplog filters after the FFI. - P2: downgrade the databricks.sql.kernel.pyo3 assertion to a soft warning so a benign kernel change to the boundary breadcrumb target doesn't break the connector e2e suite. The core databricks.sql.kernel contract is still hard-asserted. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore: bump KERNEL_REV to merged kernel main (f4ee6fe) for the logging bridge The kernel tracing -> Python logging bridge (init_logging / pyo3-log, routing under databricks.sql.kernel) landed in kernel #120 — AFTER the previously-pinned 101aa46 (#118). The kernel-e2e built a wheel without the bridge, so test_kernel_logs_reach_python_logging failed with 'no records delivered' (assert []). Bump to f4ee6fe (current kernel main: includes #118, #120, #123, #125). Verified live against a wheel built from f4ee6fe: the bridge delivers records to the databricks.sql.kernel logger, and both test_kernel_logs_reach_python_logging and test_kernel_log_level_is_respected pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…V to batch 2 (#830) * fix(kernel): stop premature sync statement close (H4); bump KERNEL_REV to batch 2 Connector half of the kernel batch-2 fixes (kernel PR #123). Bumps KERNEL_REV to pick up the batch-2 kernel surface. H4 — don't close the kernel Statement at sync execute-return: execute_command's finally used to `stmt.close()` immediately after `stmt.execute()` succeeded. For a large CloudFetch result with paginated chunk links (all_fetched=false), the kernel fetches later links lazily (get_result_chunks against the LIVE statement) during consumption, so a premature CloseStatement broke those fetches. The kernel now auto-closes the server statement when its result stream drains (ExecutedStatement::next_batch end-of-stream), with the executed-handle Drop as the backstop for partial/abandoned reads. So the connector now flips close_stmt=False on a successful execute and only closes on the error path (no executed handle was produced). The other batch-2 fixes (cancelled-class -> OperationalError, U2M refresh fail-fast, metadata statement close-on-drop, per-binding OAuth client_id) are entirely kernel-side and need no connector code beyond the KERNEL_REV bump. Tests: unit (sync execute does-not-close on success / does-close on failure) + e2e (large multi-chunk result drains without premature close + cursor reuse; server cancel maps to OperationalError not ProgrammingError). All e2e verified live against dogfood. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore: re-pin KERNEL_REV to #123 HEAD after cancelled-test fix #123 picked up a follow-up commit fixing the wiremock cancelled-state assertions (ErrorCode::Cancelled). Bump the placeholder pin so the connector CI builds the corrected kernel. Still to be re-pinned to the squash-merge SHA before #830 merges. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * chore: re-pin KERNEL_REV to merged kernel #123 (f4ee6fe) Kernel batch-2 (#123) is merged to kernel main (f4ee6fe, current main HEAD). Re-pin from the orphaned branch HEAD (4f7fbe7) to the merged SHA — reachable from main, no orphan-SHA risk. Verified against a wheel built from f4ee6fe: connector unit (102) + kernel e2e (H4 large-result drain + reuse, server-cancel -> OperationalError, staging fail-loud, diagnostic-info) all pass against the real merged kernel. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * docs: drop internal 'H4' audit shorthand from comments/docstrings Address review: 'H4' is internal audit shorthand, meaningless in the public connector codebase. Reword the affected comment + two test docstrings to describe the behavior directly (premature sync CloseStatement broke lazy CloudFetch chunk-link fetches). No code change. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…y for use_kernel (#838) fix(kernel): cursor-state tracking + metadata parity for use_kernel Brings the use_kernel=True path closer to Thrift parity on the cursor-state-tracking and metadata clusters from the API+CUJ gap audit, surfaces DML rowcount, and makes async command state server-sourced. Async command state — server is the source of truth (no local state): - get_query_state / get_execution_result now re-attach to the statement by id (Session.attach_async_statement, kernel side) and read state / await results from the server, instead of consulting a connector-side handle map. SEA keys GetStatementStatus/Result purely on the id, so this works whether or not the connector still holds the submitting handle — and get_execution_result is inherently re-callable (each call re-attaches), matching the Thrift backend's re-fetchable operation handle. - Deleted _closed_commands (and its bounded-FIFO machinery _record_closed / _CLOSED_COMMANDS_MAX): the connector no longer invents a CLOSED-vs-SUCCEEDED answer. The server reports CLOSED natively (verified: GET after close returns 200 state=CLOSED until TTL, then 404 NOT_FOUND, which maps to a terminal state). This is the local-state inconsistency removed. - _async_handles / _async_statements are retained but DEMOTED to a keep-alive-only registry: the submitting ExecutedAsyncStatement's Drop fires close_statement, which would kill the live async query, so we hold the handle until close_command / close_session. They are no longer consulted for state or results. Cursor-state tracking (T7): - Set cursor.active_command_id in the _make_result_set chokepoint so sync execute, async fetch, AND metadata all leave the cursor pointing at the command that produced the current result set (matches Thrift's unconditional set in _handle_execute_response). - On a failed sync execute, publish the server-issued statement id (read from the canceller's inflight slot) onto active_command_id before re-raising. Best-effort; never masks the original failure. Metadata: - Normalize wildcard/blank catalog ('%' / '*' / '' / None) to None (all-catalogs) across get_schemas/get_tables/get_columns (_catalog_or_none), making the three symmetric (fixes columns). - Normalize empty/whitespace-only pattern args to None (match-all) via _none_if_blank. % / * stay as real LIKE wildcards on patterns. - Drop the connector-side table_types drain + client-side refilter in get_tables; the kernel filters table_types itself (case-insensitively as of the batch-3 kernel change). Removed the dead _drain_kernel_handle / _StaticArrowHandle helpers + unused imports. - Fix the Cursor.columns() docstring: catalog_name=None is accepted on all backends. DML rowcount: - Surface the kernel's num_modified_rows as cursor.rowcount for DML instead of the hardcoded -1. None (SELECT / warehouses that don't report it) leaves rowcount at -1, matching Thrift; getattr-guarded. Bumps KERNEL_REV to the batch-3 kernel commit (case-insensitive table_types, HTTP error-envelope parsing, num_modified_rows getter, Session::attach_async_statement). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* build(kernel): add optional [kernel] extra for use_kernel=True
databricks-sql-kernel is now published to PyPI, so the kernel backend
can ship as an optional dependency instead of a local-dev-only build.
- pyproject: declare databricks-sql-kernel as an optional dependency
gated to python>=3.10 (the wheel is cp310-abi3, Requires-Python
>=3.10), and add the `[kernel]` extra. The extra also lists pyarrow:
the kernel result path (backend/kernel/result_set.py) imports it
unconditionally to wrap the Arrow batches the kernel returns. pyarrow
is already pulled transitively via the kernel wheel's
pyarrow>=23.0.1,<24, but naming it makes the connector-side
requirement explicit and lets pip co-resolve both constraints at
install time.
- backend/kernel/_errors.py: update the use_kernel=True ImportError to
point at `pip install "databricks-sql-connector[kernel]"` and note the
python>=3.10 requirement (was the obsolete "not yet published, build
locally" hint).
- README: document the [kernel] extra, use_kernel=True usage, and the
python>=3.10 / pyarrow notes.
On python<3.10 the `[kernel]` extra resolves to nothing and
use_kernel=True raises the friendly ImportError at runtime; the
connector's own python floor (3.8) is unchanged.
Verified locally (kernel served from a locally-built cp310-abi3 wheel,
since the published package isn't yet mirrored on the dev proxy):
- pip install "databricks-sql-connector[kernel]" -> connector + kernel
+ pyarrow all install; use_kernel=True runs a live query end-to-end
(backend KernelDatabricksClient).
- plain install -> use_kernel=True raises the friendly ImportError.
NOTE: `poetry lock` still needs to be run to refresh poetry.lock with
the databricks-sql-kernel entry; it is intentionally NOT included here
because it requires the kernel to be resolvable on the index poetry/CI
use (the JFrog db-pypi proxy). Confirm the package resolves there before
merging.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* fix(kernel): drop pyarrow from the [kernel] extra to unbreak poetry lock
Listing bare `pyarrow` in the [kernel] extra forced poetry to co-resolve
an unconstrained pyarrow against the kernel's transitive
`pyarrow>=23.0.1,<24` across the connector's full 3.8–3.14 matrix.
pyarrow 23.x requires Python >=3.10, so the constraint is unsatisfiable
on 3.8/3.9 — `poetry lock` failed every CI job with "version solving
failed ... pyarrow is forbidden".
The kernel wheel already declares `pyarrow>=23.0.1,<24` as a hard
runtime dependency, so `pip install databricks-sql-connector[kernel]`
still pulls a compatible pyarrow transitively. The databricks-sql-kernel
dep stays gated to python>=3.10, which now correctly excludes the whole
kernel+pyarrow subtree from the 3.8/3.9 resolution. The kernel's own
metadata is the single source of truth for the pyarrow floor.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* fix(kernel): cap pyarrow <23 on the sub-3.10 band so poetry lock resolves
The kernel's transitive pyarrow>=23.0.1,<24 conflicts with the
connector's own pyarrow>=14.0.1 (declared across 3.8–3.13) during
`poetry lock`: pyarrow>=23 dropped Python 3.9, so for the 3.8–3.10
slice poetry can't find a pyarrow satisfying both and version solving
fails ("pyarrow is forbidden" -> "databricks-sql-kernel is forbidden").
The kernel's python>=3.10 marker doesn't help because poetry unifies
the pyarrow constraint across the connector's declared pyarrow band,
not the kernel's.
Split the connector's pyarrow entry at 3.10 and cap the <3.10 band at
<23. This removes no installable version — the newest pyarrow with a
Python 3.9 wheel is 21.x — it just makes that physical fact explicit to
the solver, so the <3.10 band (capped, kernel absent) and the >=3.10
band (where the kernel can pull pyarrow up to <24) no longer overlap.
Verified `poetry lock` resolves the full dependency set with this
change.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* ci(kernel): add "Unit Tests + Kernel" matrix exercising the real wheel
Mirrors the "Unit Tests + PyArrow" matrix but for the [kernel] extra.
Until now no CI job exercised the published kernel wheel: the base
unit-test matrix installs no extras, and the kernel unit tests use a
fake databricks_sql_kernel module injected into sys.modules, so the
real wheel was never loaded in CI.
The new job (Python 3.10–3.14; the wheel is cp310-abi3 so 3.9 is
omitted) installs the [kernel] extra via --all-extras, then:
- asserts databricks_sql_kernel imports and has a real __file__ (i.e.
the published wheel actually installed, not the test fake), and
- imports the use_kernel backend path (KernelDatabricksClient /
KernelResultSet) against the real wheel,
before running the unit suite. This is the only CI signal that the
published [kernel] extra installs and loads end to end on every PR
(the live use_kernel=True e2e remains in kernel-e2e.yml, merge-queue
gated).
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* test(kernel): exercise use_kernel=True through the real wheel; no silent skips
Ensure every CI job that's meant to cover the kernel actually drives
the use_kernel=True path through the REAL databricks-sql-kernel wheel,
and fails loudly if it can't (rather than silently skipping / passing
on the Thrift path).
Problem this fixes:
- The kernel unit tests inject a fake databricks_sql_kernel into
sys.modules. In a shared `pytest tests/unit tests/e2e` session (the
coverage job, which installs --all-extras so the real wheel IS
present) that fake shadowed the real wheel, so the kernel e2e tests
silently skipped — the coverage job looked like it exercised the
kernel but didn't.
Changes:
- tests/e2e/test_kernel_backend.py + test_kernel_tls.py: replace the
silent `__file__`-based skip with a three-state guard keyed on
importlib.metadata (the on-disk dist DB, which a sys.modules stub
can't fake): skip only when the wheel is genuinely absent; FAIL
LOUDLY when it's installed-but-shadowed. The `conn` fixture now also
asserts conn.session.backend is KernelDatabricksClient, so a
use_kernel=True connection that fell back to Thrift fails the test.
- tests/unit/test_session.py: add TestUseKernelRoutesThroughRealWheel
(marked `realkernel`) — a no-network proof that
sql.connect(use_kernel=True) instantiates the REAL
KernelDatabricksClient (mocks only open_session; does not fake the
wheel). Skips if the wheel is absent; fails if it's shadowed.
- pyproject.toml: register the `realkernel` marker. Tests so marked
need an unpolluted sys.modules and must run in a separate pytest
invocation from the fake-injecting unit tests.
- tests/unit/test_kernel_client.py: document that its session-global
fake mandates the separate-invocation rule for real-wheel tests.
- code-quality-checks.yml: the Unit Tests + Kernel matrix now asserts
the real wheel, runs `tests/unit -m "not realkernel"`, then runs the
real-wheel routing test as its own invocation
(`pytest tests/unit/test_session.py -m realkernel`). All three unit
matrices gained `-m "not realkernel"`.
- code-coverage.yml: --ignore the kernel e2e files and add
`-m "not realkernel"` so the shared --all-extras session doesn't trip
the new loud guards; the real live kernel e2e stays in kernel-e2e.yml
(isolated session, real wheel, live warehouse).
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* ci(kernel): install per-tier extras explicitly, not --all-extras
The "Unit Tests + PyArrow" job used --all-extras, which predates the
[kernel] extra. Now that [kernel] exists, --all-extras silently also
installs the kernel wheel — so that tier no longer isolated the
"pyarrow present, kernel absent" configuration and overlapped the new
"Unit Tests + Kernel" job.
- Unit Tests + PyArrow: --extras pyarrow (pyarrow only; no kernel).
- Unit Tests + Kernel: --extras kernel (resolves the published
databricks-sql-kernel wheel via the [kernel] extra — the exact edge
`pip install databricks-sql-connector[kernel]` uses — which
transitively brings pyarrow).
Each tier now targets its configuration precisely. The kernel install
path here (published wheel via the extra) is intentionally distinct
from kernel-e2e.yml, which maturin-builds tip-of-tree at KERNEL_REV.
Verified against the proxy: --extras pyarrow installs pyarrow and NOT
the kernel; --extras kernel installs databricks-sql-kernel 0.1.2.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
* build(kernel): require databricks-sql-kernel >=0.2.0
Bump the [kernel] extra's floor from ^0.1.0 to ^0.2.0 (>=0.2.0,<0.3.0)
now that 0.2.0 is published. The <0.3.0 cap is deliberate: the kernel is
pre-1.0, so each 0.x minor may be breaking — we bump this when the
kernel ships 0.3.0 rather than auto-adopting a potentially-breaking
minor.
0.2.0 keeps the same Requires-Python (>=3.10) and pyarrow (>=23.0.1,<24)
pin as 0.1.x, so the python>=3.10 marker and the pyarrow <23 sub-3.10
cap are unchanged. Verified `poetry lock` resolves and locks
databricks-sql-kernel 0.2.0.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
---------
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
…843) * ci: add DBR LTS install check + fix pyarrow-compat it caught (ES-1960554) The thrift 0.23.0 bump (PR #796, shipped in 4.2.7) broke `pip install` on DBR LTS: thrift ships sdist-only and 0.23.0's setup.py calls sys.exit(0) on the build-success path, killing the PEP 517 backend before pip writes output.json. On the old setuptools shipped by DBR 14.3/15.4 LTS this is a hard install failure (SEV0 ES-1960554); 4.2.7 was yanked and reverted (#840). Our CI never caught it because every job installs via `poetry install` on a modern runner -- it never does a fresh `pip install` of the built wheel on an LTS toolchain, the real customer path that failed. CI check -------- Adds a PR check (gated to dependency changes) that builds the wheel and installs it INSIDE real DBR LTS clusters via the PECO workspace Jobs API (no PyPI publish) then runs a SELECT 1 smoke test. Matrix = supported LTS {13.3, 14.3, 15.4, 16.4, 17.3} x install target {base, pyarrow, kernel}. Auth is OAuth M2M as the PECO service principal throughout (driver -> workspace API and the notebook's connector -> warehouse smoke query); a PAT is warehouse-scoped and rejected by the workspace REST API. Older LTS ship an SDK too old for auth_type=oauth-m2m, so the smoke harness upgrades databricks-sdk. Per-run artifacts are cleaned up in a finally block. Connector fix (caught by the check) ----------------------------------- The check surfaced a real latent bug: a base install (no [pyarrow] extra) runs against a runtime's bundled pyarrow, and on DBR 13.3/14.3 that pyarrow predates the `promote_options` kwarg, so concat_table_chunks raised `TypeError: concat_tables() got an unexpected keyword argument 'promote_options'` on the Arrow result path. utils.py now falls back to the legacy `promote=True` (equivalent to promote_options="default") when the kwarg is unsupported, with a regression test. Validated end-to-end against the PECO workspace: green on thrift 0.22.0, and re-widening the pin to <0.24.0 fails on 14.3+15.4 with the exact output.json error -- a true guard, not a check that always passes. Also adds an incident-linked comment on the thrift pin so nobody re-widens it before the upstream fix (THRIFT-6067 / apache/thrift#3584) ships. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(e2e): authenticate e2e via OAuth M2M so staging tests match DATABRICKS_USER The e2e suite connected via a PAT (DATABRICKS_TOKEN). The Personal Staging Location tests PUT/GET/REMOVE against stage://tmp/<DATABRICKS_USER>/..., where DATABRICKS_USER is the PECO service principal (TEST_PECO_SP_ID). A personal stage is identity-scoped by design (there is even a test asserting you cannot touch another user's stage), so the connecting identity MUST equal DATABRICKS_USER. When DATABRICKS_TOKEN authenticates as a different identity, those tests fail with `PERMISSION_DENIED: <user> does not have access to Personal Stage`. Switch the e2e connection to OAuth M2M as the service principal via credentials_provider (conftest.auth_connect_kwargs), so the connecting identity IS the SP == DATABRICKS_USER. Falls back to the PAT when SP OAuth creds aren't set, so local PAT runs are unaffected. Wires DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET (TEST_PECO_SP_ID / TEST_PECO_SP_OAUTH_SECRET, already in azure-prod) into code-coverage.yml. Verified locally against the PECO workspace: all 9 staging_ingestion e2e tests pass via the real M2M path (including fails_to_modify_another_staging_user, which validates the identity scoping). Kernel e2e files are unchanged (they run in kernel-e2e.yml, ignored by code-coverage.yml). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(e2e): add databricks-sdk dev dependency for OAuth M2M auth The e2e M2M auth (conftest.auth_connect_kwargs) imports databricks.sdk.core.oauth_service_principal, but databricks-sdk was not a project dependency, so `poetry install` in code-coverage.yml didn't provide it -- every e2e connection failed with `ModuleNotFoundError: No module named 'databricks.sdk'`. Add it to the dev group (test-only; not a runtime dep of the connector). CI's setup-poetry runs `poetry lock` before install, so the lockfile is regenerated on the runner. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * Revert e2e OAuth M2M; use SP PAT instead (keep DBR LTS check on M2M) The azure-prod DATABRICKS_TOKEN is now a personal access token owned by the PECO service principal, so its identity matches DATABRICKS_USER. That fixes the Personal Staging Location tests (stage://tmp/<SP>/...) with a plain PAT, without the OAuth M2M machinery -- which also broke the retry/HTTP tests, since M2M makes a live token-endpoint call that those tests' urllib3 mocking intercepts. Reverts the e2e auth changes (conftest.auth_connect_kwargs + the consumer call sites + the databricks-sdk dev dep + the code-coverage.yml SP env) back to the plain access_token path. The DBR LTS install check keeps OAuth M2M: it hits the workspace Jobs/SCIM API (which rejects a warehouse-scoped PAT), is proven 15/15 green on M2M, and installs databricks-sdk itself in its own workflow. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TEst