@alexeyegorov alexeyegorov commented Jan 26, 2026

Execute DBT in session mode (e.g. on a job cluster)

This PR was created with help from Claude/Cursor to implement session mode for the dbt-databricks adapter.

It should allow:

  • executing SQL and Python models using a session, as in dbt-spark
  • running the whole dbt pipeline on a job cluster to save costs
  • ...

Asset Bundle dbt task

The native dbt_task on Databricks does not provide a Spark session, and it is not possible to retrieve one. It is up to Databricks to allow retrieving a Spark session within dbt_task, which would keep the Asset Bundle deployment as simple as it is right now.

The workaround that still allows session mode is to define the dbt tasks as Python scripts or notebooks.
This approach could be shipped as a template via Databricks Asset Bundles.
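A minimal sketch of what one such notebook task could look like. This is not code from the PR: `PROJECT_DIR`, the repo URL placeholder, and the helper names are illustrative, and the sketch assumes dbt is installed on the job cluster.

```python
# Sketch of a notebook task wrapping the dbt CLI (placeholder paths;
# assumes dbt is installed on the job cluster).

import subprocess

PROJECT_DIR = "/tmp/dbt-project"  # placeholder: where the dbt repo is cloned


def dbt_command(args):
    """Build a dbt CLI argument list pointed at the cloned project."""
    return ["dbt", *args, "--project-dir", PROJECT_DIR]


def run_dbt(args):
    """Execute a dbt command; a non-zero exit fails the task."""
    subprocess.run(dbt_command(args), check=True)


def prepare():
    # "prepare" task: clone the dbt repo, resolve packages, load seeds
    subprocess.run(["git", "clone", "<dbt-repo-url>", PROJECT_DIR], check=True)
    run_dbt(["deps"])
    run_dbt(["seed"])
```

Each workflow task (prepare, run, test, ...) would then be a thin notebook calling one of these functions.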

In our job example, this looks as below:

[screenshot: example job with the dbt tasks]

With the job cluster selected, it now executes as expected:

[screenshot: job run executing on the job cluster]

  • prepare:
    • clone the repository with DBT code
    • run dbt deps and dbt seed
  • run: execute dbt run with an optional --full-refresh and a passed --select (e.g. a state selector, a specific model, or empty for the full selection)
  • test: execute dbt test
  • docs: generate dbt docs
  • cleanup: remove cloned repository files

Example of the dbt CLI for run execution:

[screenshot: dbt run CLI invocation]
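The run invocation can be sketched as a small argument builder (a hypothetical helper; only `--full-refresh` and `--select` are standard dbt CLI flags, the function itself is not from the PR):

```python
def build_run_args(select="", full_refresh=False):
    """Assemble `dbt run` arguments with optional --full-refresh / --select."""
    args = ["dbt", "run"]
    if full_refresh:
        args.append("--full-refresh")
    if select:  # e.g. "state:modified+", a model name, or empty for everything
        args += ["--select", select]
    return args
```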

Pros/Cons

Pros:

  • keep profiles.yml and requirements.txt in the asset bundle repository
    • currently, profiles.yml needs to be uploaded manually to an available path, e.g. on Databricks; any change requires updating and re-uploading the file
    • this strategy allows keeping the file in the asset bundle repository or in the original dbt repository and linking the job to use it directly after the repository is cloned
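For illustration, a profiles.yml kept in the repository might look as follows. This assumes the session method mirrors dbt-spark's `method: session`; the project name, catalog, schema, and the exact keys the PR's session mode expects are assumptions, not the adapter's confirmed schema.

```yaml
# Illustrative profiles.yml kept in the asset bundle repository.
# `method: session` mirrors dbt-spark; the exact keys this PR's
# dbt-databricks session mode expects may differ.
my_project:
  target: session
  outputs:
    session:
      type: databricks
      method: session        # assumption: session-mode selector, as in dbt-spark
      catalog: main
      schema: analytics
      threads: 4
```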
  • execute the complete dbt pipeline with SQL and Python models on both all-purpose and job clusters, as well as on SQL warehouses
  • use a cluster with an init script to install third-party libraries (e.g. Apache Sedona for geospatial functions)

Cons:

  • not able to use the native dbt_task provided by Databricks -> could be addressed by Databricks
  • slightly more setup to define the tasks (Python notebooks using the dbt CLI) -> could be added as a template to the Asset Bundles repository

Description

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

This commit updates the Databricks adapter to version 1.10.15+session, introducing support for session-mode execution. Key changes include:

- Added `DatabricksSessionHandle` and `SessionCursorWrapper` for handling SparkSession-based execution.
- Enhanced `DatabricksCredentials` to manage connection methods and validate session mode configurations.
- Updated connection management to support session mode, including automatic selection of submission methods for Python models.
- Improved SparkSession retrieval in the Databricks adapter: the `DatabricksSessionHandle` and `SessionPythonJobHelper` classes now try multiple methods to obtain the existing SparkSession, ensuring compatibility with various Databricks environments. Method signatures were also refactored for consistency and readability.
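The multi-step retrieval can be sketched as a fallback chain. This is an assumption about the shape of the logic, not the PR's actual code in `DatabricksSessionHandle`; the particular fallbacks shown here are illustrative.

```python
# Hedged sketch of a multi-step SparkSession lookup (illustrative fallbacks,
# not the PR's exact implementation in DatabricksSessionHandle).

def get_spark_session():
    """Try several ways to obtain an existing SparkSession; None if unavailable."""
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        return None  # not running in a Spark-capable environment

    # 1. A session already active in this process (typical on a job cluster)
    spark = SparkSession.getActiveSession()
    if spark is not None:
        return spark

    # 2. The Databricks runtime global, if exposed via the SDK
    try:
        from databricks.sdk.runtime import spark as runtime_spark
        if runtime_spark is not None:
            return runtime_spark
    except ImportError:
        pass

    # 3. Last resort: create or fetch a session via the builder
    return SparkSession.builder.getOrCreate()
```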
@alexeyegorov alexeyegorov marked this pull request as ready for review January 26, 2026 12:45