
Azure Databricks Python Model SparkContext bug introduced in 1.11.0 #1252

@kranstrom

Description

Describe the bug

In an Azure Databricks environment when executing a Python Model using dbt-databricks 1.11.0, a task will fail with the following:

03:00:59  Failure in model <python-model-path>.py)
03:00:59    [CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] Remote client cannot create a SparkContext. Create SparkSession instead.

Steps To Reproduce

  1. Create a Python wheel that depends on dbt-databricks==1.11.0
  2. Create a Databricks Python Model file that invokes the Wheel, for example:
from my.wheel.package import execute

def model(dbt, session):
    return execute(dbt, session, is_incremental=dbt.is_incremental)
  3. Create a Databricks job that runs the Python Model and execute it.
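For reference, a minimal stand-in for the wheel's `execute` entry point might look like the following. This is illustrative only; the real package, signature, and query logic belong to the reporter's wheel:

```python
# Hypothetical wheel entry point -- an illustrative stand-in, not the
# reporter's actual code.
def execute(dbt, session, is_incremental=False):
    # Only use the SparkSession handed in by dbt; creating a SparkContext
    # is what fails for remote (Spark Connect) clients.
    query = "SELECT 2 AS id" if is_incremental else "SELECT 1 AS id"
    return session.sql(query)
```

The key point is that the model code itself never touches a SparkContext; the failure comes from the adapter's plumbing, not from user code.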

Important! This issue does not occur when executing locally, which likely has to do with the SPARK_CONNECT_MODE_ENABLED environment variable only being set for remote (Spark Connect) clients.

Expected behavior

The Python model runs successfully; the adapter should not attempt to create a SparkContext for a remote client.

Screenshots and log output


+ dbt run --select my_python_model
03:00:34  Running with dbt=1.10.13
03:00:36  Registered adapter: databricks=1.11.0
...
03:00:52  Concurrency: 4 threads (target='test')
03:00:52  
03:00:57  1 of 1 START python incremental model my_python_model ................ [RUN]
Fri Nov  7 03:00:58 2025 Connection to spark from PID  125165
Fri Nov  7 03:00:58 2025 Initialized gateway on port 34717
03:00:58  Unhandled error while executing target/run/src/models/my_python_model.py
[CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] Remote client cannot create a SparkContext. Create SparkSession instead.

System information

The output of dbt --version: 1.10.13

The operating system you're using: N/A

The output of python --version: 3.12.3

Additional context

Looking over the changes, the failure seems to be caused by the api_client.py changes in 1.11.0, specifically CommandContextApi.create(..), where it calls the new _create_execution_context(..).

PySpark raises this error from the SparkContext class here: https://github.com/apache/spark/blob/fc49dbd868e08e8607fc188b326b0d8d31294781/python/pyspark/core/context.py#L191

        if "SPARK_CONNECT_MODE_ENABLED" in os.environ and "SPARK_LOCAL_REMOTE" not in os.environ:
            raise PySparkRuntimeError(
                errorClass="CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT",
                messageParameters={},
            )
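The guard above can be mirrored in plain Python to show exactly when the error fires (the environment is passed in explicitly here purely for illustration):

```python
def can_create_spark_context(env):
    # Mirrors the PySpark guard quoted above: a Spark Connect ("remote")
    # client may not create a SparkContext unless SPARK_LOCAL_REMOTE is set.
    return not ("SPARK_CONNECT_MODE_ENABLED" in env
                and "SPARK_LOCAL_REMOTE" not in env)
```

On a Databricks cluster running in Spark Connect mode, SPARK_CONNECT_MODE_ENABLED is present and SPARK_LOCAL_REMOTE is not, so creation is rejected; locally neither variable is set, which matches the observation that the issue does not reproduce locally.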

As a result, the issue appears to be caused by attempting to create a new SparkContext, which is ultimately not supported for a remote client.

Others, such as MLflow, have already encountered and worked around this issue. At a bare minimum, I think the fix should leverage something like pyspark.sql.utils.is_remote(), but perhaps a more encompassing check for any "Databricks Connect" usage, as MLflow does: https://github.com/WeichenXu123/mlflow/blob/224334a6ebf0f02bef7ce9946467d2a5f21d7228/mlflow/utils/databricks_utils.py#L280

If the remote check is met then, at least for Azure Databricks, the functionality should work by simply not creating a SparkContext.
