Introduce PySpark Session support ( enables the adapter usage for job clusters) #862

dkruh1 · 2024-12-03T14:57:04Z

Resolves #

Resolves: [dbt-spark Issue #272], [dbt-databricks Issue #575]

Description

Pull Request Description

Summary
This PR introduces support for defining a PySpark-based connection when using the adapter. This enhancement allows dbt to run as part of a running Databricks job cluster, expanding its usage beyond SQL warehouses or all-purpose clusters.

Background
The Spark session functionality referenced here was first discussed in [dbt-spark Issue #272]. For Databricks specifically, the issue was raised in [dbt-databricks Issue #575].

Key Features

PySpark-Based Connection:
A new environment variable, DBT_DATABRICKS_SESSION_CONNECTION, has been introduced.
- When this variable is set to True, a new DatabricksSessionConnectionManager is initialized.
- This manager assumes that the dbt code is being executed in the context of an existing Spark session, making it possible to integrate with running Databricks job clusters.
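A minimal sketch of how an environment variable could select the connection manager, as described above. Only the variable name DBT_DATABRICKS_SESSION_CONNECTION comes from this PR; the two classes are hypothetical stand-ins, and the case-insensitive "true" parsing is an assumption, not necessarily what the PR implements.

```python
import os

# Variable name taken from the PR description.
SESSION_ENV_VAR = "DBT_DATABRICKS_SESSION_CONNECTION"


class DefaultConnectionManager:
    """Hypothetical stand-in for the adapter's regular connection manager."""


class SessionConnectionManager:
    """Hypothetical stand-in for DatabricksSessionConnectionManager."""


def session_connection_enabled(environ=None):
    """Return True when the flag is set to "true" (assumed case-insensitive)."""
    env = os.environ if environ is None else environ
    return env.get(SESSION_ENV_VAR, "").strip().lower() == "true"


def pick_connection_manager(environ=None):
    """Choose the manager class based on the flag."""
    if session_connection_enabled(environ):
        return SessionConnectionManager
    return DefaultConnectionManager
```

Passing a mapping instead of reading os.environ directly keeps the selection logic easy to exercise in unit tests.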
Testing:
- A new pytest matrix feature called session_support was introduced into the unit tests. When session support is enabled, the DBT_DATABRICKS_SESSION_CONNECTION env var is set to true and the unit tests run against the new DatabricksSessionConnectionManager.
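One way to run the same checks with and without the flag can be sketched with the standard library alone. This is an illustration of the idea, not the matrix configuration used in this PR; the helper names session_support and run_in_both_modes are hypothetical.

```python
import os
from contextlib import contextmanager
from unittest import mock

# Variable name taken from the PR description.
SESSION_ENV_VAR = "DBT_DATABRICKS_SESSION_CONNECTION"


@contextmanager
def session_support(enabled):
    """Temporarily set or clear the flag; os.environ is restored on exit."""
    with mock.patch.dict(os.environ):
        if enabled:
            os.environ[SESSION_ENV_VAR] = "true"
        else:
            os.environ.pop(SESSION_ENV_VAR, None)
        yield


def run_in_both_modes(test_fn):
    """Call test_fn once per mode and collect its return values by mode."""
    results = {}
    for enabled in (False, True):
        with session_support(enabled):
            results[enabled] = test_fn()
    return results
```

Restoring the environment after each run keeps one mode from leaking into the next.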
Functional Testing:
Functional tests were conducted using a Databricks notebook.
- The notebook programmatically triggered dbt while ensuring the DBT_DATABRICKS_SESSION_CONNECTION variable was set to True.
- These tests confirmed that dbt works seamlessly within a running Spark session.
- Example notebook code:
  `import os
  from dbt.cli.main import dbtRunner

  os.environ["DBT_DATABRICKS_SESSION_CONNECTION"] = "True"
  
  res = dbtRunner().invoke(["run", "--profiles-dir", "/Workspace/Users/doron.kruh@yotpo.com/dbt-dbx-session-test/", "--project-dir", "/Workspace/Users/user@databrciks.com/dbt-models", "--target", "prod", "--select", "model_to_execute"])`

Why This Matters

Enables running dbt within existing Spark sessions, providing more flexibility for advanced Databricks workflows.
Expands the range of cluster types supported by dbt.
Supports integration with Databricks job clusters, ensuring compatibility with real-world use cases.

Next Steps

Document this feature for users who may need it.
Verify compatibility with additional Databricks environments as needed.

Checklist

[x] I have run this code in development and it appears to resolve the stated issue
[x] This PR includes tests, or tests are not required/relevant for this PR
[x] I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

benc-db · 2024-12-03T17:23:40Z

@dkruh1, we need to discuss internally if we want to take this feature. I appreciate the effort, and I understand why this feature would be valuable to users, but we need to decide whether we want to take on the maintenance burden of an additional connection mechanism. Will get back to you shortly.

benc-db · 2024-12-05T18:30:46Z

@dkruh1 after discussion, we will not be taking this feature at this time. We are focused on ensuring that dbt-databricks provides the best experience for interacting with SQL Warehouses and serverless compute. As this is OSS, you are free to fork our repo and use your implementation that way.

alexeyegorov · 2026-01-22T12:50:27Z

@dkruh1 did you try building this as a fork and using the forked package to run it as a Databricks job? @leo-schick have you made any progress on this topic, or would you be interested in supporting such a forked package?

leo-schick · 2026-01-24T15:58:01Z

@alexeyegorov I am currently not using Databricks in my projects, but I am strongly in favor of getting this implemented!
I would prefer getting it merged there first instead of starting a separate fork.

@dkruh1 Is there a way to get this implemented into dbt-spark without much extra effort?

alexeyegorov · 2026-01-25T11:54:27Z

@leo-schick I was reading your post like last year. Searched now for possible solutions and mentions on dbt-databricks. Pretty silly it is not supported. I stumbled upon this description:
https://gist.github.com/NodeJSmith/d2fc2e9a289360180ebaa9d7e452e285#gistcomment-5951230
I will search for that fork and maybe it is a working solution? Otherwise I will ask Claude to make a plan to implement it into dbt-spark. :D

btw, I worked briefly with Mara during my time at Lampenwelt a few years ago. :P

alexeyegorov · 2026-01-25T13:41:30Z

I have chatted with Claude and "we" worked out a plan to implement session mode for the execution of SQL and Python models via dbt-databricks. I will check how far I can get with it and maybe give it a try as a standalone forked package on our Databricks setup.

1.11.x: [WIP] Add session mode for dbt-databricks adapter for 1.11.x version #1310
1.10.x: [WIP] Add session mode for dbt-databricks adapter for 1.10.x version #1311

dkruh36 and others added 11 commits November 27, 2024 12:20

support session method

947adf6

configure tests

9ff849a

reformat files

891e916

fix pr comments

91d0cfc

Merge pull request #1 from YotpoLtd/KOALA-1423-SUPPORT-SESSION-CONNEC…

d18a3d8

…TION Koala 1423 support session connection

Merge remote-tracking branch 'upstream/main' into merge-from-fork

e3081c8

merge from upstream

987016c

updste precommit

c2a3aac

revert wrong changes

f30c727

Merge pull request #3 from YotpoLtd/merge-from-fork

7dea025

Merge from fork

test

190364d

dkruh1 requested review from andrefurlan-db, benc-db and rcypher-databricks as code owners December 3, 2024 14:57

update changelog

38b7e0d

benc-db closed this Dec 5, 2024
