Describe the bug
In an Azure Databricks environment, when executing a Python model using dbt-databricks 1.11.0, the task fails with the following:

```
03:00:59 Failure in model <python-model-path>.py)
03:00:59 [CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] Remote client cannot create a SparkContext. Create SparkSession instead.
```
Steps To Reproduce
- Create a Python Wheel with `dbt-databricks='1.11.0'`
- Create a Databricks Python Model file that invokes the Wheel, for example:

  ```python
  from my.wheel.package import execute

  def model(dbt, session):
      return execute(dbt, session, is_incremental=dbt.is_incremental)
  ```
- Create a Databricks job that runs the Python Model and execute it.
Important! This issue does not occur when executing locally, which likely has to do with the `SPARK_CONNECT_MODE_ENABLED` environment variable only being set in the remote Databricks environment.
Expected behavior
The Python model executes successfully on Azure Databricks, as it does when running locally, without PySpark attempting to create a SparkContext.
Screenshots and log output
```
+ dbt run --select my_python_model
03:00:34 Running with dbt=1.10.13
03:00:36 Registered adapter: databricks=1.11.0
...
03:00:52 Concurrency: 4 threads (target='test')
03:00:52
03:00:57 1 of 1 START python incremental model my_python_model ................ [RUN]
Fri Nov 7 03:00:58 2025 Connection to spark from PID 125165
Fri Nov 7 03:00:58 2025 Initialized gateway on port 34717
03:00:58 Unhandled error while executing target/run/src/models/my_python_model.py
[CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] Remote client cannot create a SparkContext. Create SparkSession instead.
```
System information
The output of `dbt --version`: 1.10.13
The operating system you're using: N/A
The output of `python --version`: 3.12.3
Additional context
Looking over the changes, this seems to be caused by the api_client.py changes in 1.11.0, specifically `CommandContextApi.create(..)`, which calls the new `_create_execution_context(..)`.
PySpark raises this error in the SparkContext class here: https://github.com/apache/spark/blob/fc49dbd868e08e8607fc188b326b0d8d31294781/python/pyspark/core/context.py#L191
if "SPARK_CONNECT_MODE_ENABLED" in os.environ and "SPARK_LOCAL_REMOTE" not in os.environ:
raise PySparkRuntimeError(
errorClass="CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT",
messageParameters={},
)
As a result, the issue appears to be caused by attempting to create a new SparkContext when doing so is ultimately not supported for a remote client.
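The condition can be illustrated outside of dbt. A minimal sketch, assuming PySpark 4.x where the check above sits at the top of `SparkContext.__init__`; setting the environment variable by hand simulates what the remote Databricks environment already provides:

```python
import os

from pyspark import SparkContext
from pyspark.errors import PySparkRuntimeError

# Simulate the remote Databricks environment (assumption: these env vars
# mirror what the cluster sets; see the PySpark check quoted above).
os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"
os.environ.pop("SPARK_LOCAL_REMOTE", None)

try:
    SparkContext()  # what dbt-databricks 1.11.0 effectively ends up doing
except PySparkRuntimeError as e:
    print(e)  # [CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] ...
```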
It seems others, like MLflow, have already encountered and worked around this issue. In that case, I think the bare minimum is to leverage something like `pyspark.sql.utils.is_remote()`, but perhaps a more encompassing check for any "Databricks Connect" environment, as MLflow does: https://github.com/WeichenXu123/mlflow/blob/224334a6ebf0f02bef7ce9946467d2a5f21d7228/mlflow/utils/databricks_utils.py#L280
When that remote condition is detected then, at least for Azure Databricks, the adapter should be able to skip creating a SparkContext and the existing functionality should work.
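As a rough sketch of what such a guard could look like (only `pyspark.sql.utils.is_remote()` is a real PySpark helper; the wrapper function and where dbt-databricks would call it are hypothetical):

```python
from pyspark.sql.utils import is_remote


def should_create_spark_context() -> bool:
    # is_remote() returns True under Spark Connect (e.g. when
    # SPARK_CONNECT_MODE_ENABLED is set), where a SparkContext cannot be
    # created and the existing SparkSession should be used instead.
    return not is_remote()
```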