
Commit e2d6686

Feature/ Skip resampling in TFT to provide speed-up (#2898)
* Allow skipped resampling in TFT for faster inference

  Resampling in TFT's VariableSelectionNetwork introduced training overhead due to the slow `interpolate()` implementation in PyTorch. I've added an option `skip_resampling` to skip such operations in TFT, while accuracy is largely unaffected.

  - `skip_resampling` defaults to `False`, so TFT retains the old behaviour of applying interpolation to feature embeddings.
  - When set to `True`, all interpolation operations are skipped (in `_GatedResidualNetwork`) or replaced by projection (`_ResampleNorm`).
  - Quite a few typing errors in TFT are also fixed.

* Add back `forward()` to `_ResampleNorm`

* Fix a TFT kernel error on MPS devices

* Add TFT tests for MPS devices and the `skip_resampling` option

* Update CHANGELOG for TFT skip resampling & MPS bug

* Fix `test_on_mps` with a deep copy of `tfm_kwargs`

  The previous shallow copy of `tfm_kwargs` modified `"pl_trainer_kwargs"` for other tests and led to many test failures. We now modify a deep copy in the MPS test to fix this.

* Remove TFT test on MPS devices

  MPS memory is not available on GitHub runners, so the TFT test on MPS is removed.

* Fix a bug in `_VariableSelectionNetwork`

* Expand TFT static covariate test with `skip_resampling`

  Test static covariate support with and without `skip_resampling`.

* Update CHANGELOG.md

  Co-authored-by: Dennis Bader <[email protected]>

* Update CHANGELOG.md

* Replace interpolation with linear projection

  - Rename the `skip_resampling` option to `skip_interpolation`.
  - When set to `True`, all interpolation is replaced by linear projection during the feature embedding sampling operations.

* Update CHANGELOG for the renamed `skip_interpolation` option

---------

Co-authored-by: Dennis Bader <[email protected]>
1 parent 9ba3258 · commit e2d6686
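For context, here is a minimal usage sketch of the new option. Only `skip_interpolation` comes from this PR; the dataset and the remaining hyperparameters are illustrative:

```python
from darts.datasets import AirPassengersDataset
from darts.models import TFTModel

series = AirPassengersDataset().load()

# `skip_interpolation=True` replaces the 1D interpolation on feature
# embeddings with a learned linear projection for faster training/inference.
model = TFTModel(
    input_chunk_length=24,
    output_chunk_length=12,
    add_relative_index=True,  # lets TFT run without explicit future covariates
    skip_interpolation=True,
    n_epochs=5,
)
model.fit(series)
forecast = model.predict(n=12)
```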

File tree: 4 files changed (+110 −49 lines)

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -11,6 +11,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
 
 **Improved**
 
+- Added hyperparameter `skip_interpolation` to `TFTModel` that will replace 1D interpolation on feature embeddings with linear projection. When `True`, it can greatly increase training and inference efficiency while predictive accuracy remains largely unaffected. [#2898](https://github.com/unit8co/darts/pull/2898) by [Zhihao Dai](https://github.com/daidahao).
 - Added mixed precision and 16-bit precision support to `TorchForecastingModel`. Simply specify `{"precision": "bf16-mixed" }` for `pl_trainer_kwargs` to enable mixed precision training. Alternatively, declare a custom `pytorch_lightning.Trainer` with a `"precision"` parameter and pass the trainer to `fit()`. Other precision options such as `"64-true"` and `"16-mixed"` supported by `pytorch_lightning` are also allowed. [#2883](https://github.com/unit8co/darts/pull/2883) by [Zhihao Dai](https://github.com/daidahao).
 - 🔴 Added future and static covariates support to `BlockRNNModel`. This improvement required changes to the underlying model architecture which means that saved model instances from older Darts versions cannot be loaded any longer. [#2845](https://github.com/unit8co/darts/pull/2845) by [Gabriel Margaria](https://github.com/Jaco-Pastorius).
 - `from_group_dataframe()` now supports creating `TimeSeries` from **additional DataFrame backends** (Polars, PyArrow, ...). We leverage `narwhals` as the compatibility layer between DataFrame libraries. See their [documentation](https://narwhals-dev.github.io/narwhals/) for all supported backends. [#2766](https://github.com/unit8co/darts/pull/2766) by [He Weilin](https://github.com/cnhwl).
```
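The mixed-precision entry above amounts to a one-line change on any `TorchForecastingModel`; a minimal sketch (model choice and hyperparameters are illustrative, and bfloat16 support depends on the hardware):

```python
from darts.datasets import AirPassengersDataset
from darts.models import NBEATSModel

series = AirPassengersDataset().load().astype("float32")

# Mixed precision is enabled purely through the PyTorch Lightning trainer kwargs.
model = NBEATSModel(
    input_chunk_length=24,
    output_chunk_length=12,
    pl_trainer_kwargs={"precision": "bf16-mixed"},
    n_epochs=5,
)
model.fit(series)
```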
```diff
@@ -23,6 +24,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
 
 **Fixed**
 
+- Fixed a bug causing crashes when running `TFTModel` on MPS devices (macOS with GPUs). [#2898](https://github.com/unit8co/darts/pull/2898) by [Zhihao Dai](https://github.com/daidahao).
 - Fixed a bug when saving a `GlobalNaiveModel` directly after fitting it (without performing prediction). [#2895](https://github.com/unit8co/darts/pull/2895), by [Alain Gysi](https://github.com/Kurokabe)
 - Fixed a bug when using an `EnsembleModel` with `train_forecasting_models=False` and at least one torch model in `forecasting_models`, where calling `historical_forecasts()` with `retrain=True` raised an exception due to the torch models being unintentionally reset. [#2894](https://github.com/unit8co/darts/pull/2894) by [Dennis Bader](https://github.com/dennisbader).
 
```
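And a sketch of the new multi-backend `from_group_dataframe()` mentioned under **Improved**, here with a Polars frame (column names and data are illustrative):

```python
from datetime import date

import polars as pl
from darts import TimeSeries

# A long-format Polars DataFrame holding one daily series per store.
df = pl.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "date": [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)] * 2,
    "sales": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
})

# narwhals translates between DataFrame backends, so no conversion to pandas is needed.
series_list = TimeSeries.from_group_dataframe(
    df, group_cols="store", time_col="date", value_cols="sales"
)
assert len(series_list) == 2  # one TimeSeries per store
```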

darts/models/forecasting/tft_model.py

Lines changed: 28 additions & 15 deletions
```diff
@@ -42,7 +42,7 @@ def __init__(
         output_dim: tuple[int, int],
         variables_meta: dict[str, dict[str, list[str]]],
         num_static_components: int,
-        hidden_size: Union[int, list[int]],
+        hidden_size: int,
         lstm_layers: int,
         num_attention_heads: int,
         full_attention: bool,
@@ -51,7 +51,8 @@ def __init__(
         categorical_embedding_sizes: dict[str, tuple[int, int]],
         dropout: float,
         add_relative_index: bool,
-        norm_type: Union[str, nn.Module],
+        norm_type: Union[str, type[nn.Module]],
+        skip_interpolation: bool = False,
         **kwargs,
     ):
         """PyTorch module implementing the TFT architecture from `this paper <https://arxiv.org/pdf/1912.09363.pdf>`_
@@ -98,8 +99,12 @@ def __init__(
         likelihood
             The likelihood model to be used for probabilistic forecasts. By default, the TFT uses
             a ``QuantileRegression`` likelihood.
-        norm_type: str | nn.Module
+        norm_type: str | type[nn.Module]
             The type of LayerNorm variant to use.
+        skip_interpolation: bool
+            Whether to skip interpolation and replace with linear projection on feature embeddings in
+            VariableSelectionNetwork. Setting this to `True` could increase training and inference speed.
+            Defaults to `False` to preserve the permutation in the feature embedding space.
         **kwargs
             all parameters required for :class:`darts.models.forecasting.pl_forecasting_module.PLForecastingModule`
             base class.
@@ -119,6 +124,7 @@ def __init__(
         self.feed_forward = feed_forward
         self.dropout = dropout
         self.add_relative_index = add_relative_index
+        self.skip_interpolation = skip_interpolation
 
         if isinstance(norm_type, str):
             try:
@@ -182,6 +188,7 @@ def __init__(
             single_variable_grns={},
             context_size=None,  # no context for static variables
             layer_norm=self.layer_norm,
+            skip_interpolation=self.skip_interpolation,
         )
 
         # variable selection for encoder and decoder
@@ -202,6 +209,7 @@ def __init__(
             prescalers=self.prescalers_linear,
             single_variable_grns={},
             layer_norm=self.layer_norm,
+            skip_interpolation=self.skip_interpolation,
         )
 
         self.decoder_vsn = _VariableSelectionNetwork(
@@ -213,6+221,7 @@ def __init__(
             prescalers=self.prescalers_linear,
             single_variable_grns={},
             layer_norm=self.layer_norm,
+            skip_interpolation=self.skip_interpolation,
         )
 
         # static encoders
@@ -368,11 +377,11 @@ def decoder_variables(self) -> list[str]:
         return self.variables_meta["model_config"]["time_varying_decoder_input"]
 
     @staticmethod
-    def expand_static_context(context: torch.Tensor, time_steps: int) -> torch.Tensor:
+    def expand_static_context(context: torch.Tensor) -> torch.Tensor:
         """
         add time dimension to static context
         """
-        return context[:, None].expand(-1, time_steps, -1)
+        return context.unsqueeze(1).contiguous()
 
     @staticmethod
     def get_relative_index(
@@ -409,7 +418,7 @@ def get_attention_mask_future(
         encoder_length: int,
         decoder_length: int,
         batch_size: int,
-        device: str,
+        device: torch.device,
         full_attention: bool,
     ) -> torch.Tensor:
         """
@@ -466,7 +475,6 @@ def forward(self, x_in: PLModuleInput) -> torch.Tensor:
         batch_size = x_cont_past.shape[dim_samples]
         encoder_length = self.input_chunk_length
         decoder_length = self.output_chunk_length
-        time_steps = encoder_length + decoder_length
 
         # avoid unnecessary regeneration of attention mask
         if batch_size != self.batch_size_last:
@@ -549,23 +557,23 @@ def forward(self, x_in: PLModuleInput) -> torch.Tensor:
             static_covariate_var = None
 
         static_context_expanded = self.expand_static_context(
-            context=self.static_context_grn(static_embedding), time_steps=time_steps
+            self.static_context_grn(static_embedding)
         )
 
         embeddings_varying_encoder = {
             name: input_vectors_past[name] for name in self.encoder_variables
         }
         embeddings_varying_encoder, encoder_sparse_weights = self.encoder_vsn(
             x=embeddings_varying_encoder,
-            context=static_context_expanded[:, :encoder_length],
+            context=static_context_expanded,
         )
 
         embeddings_varying_decoder = {
             name: input_vectors_future[name] for name in self.decoder_variables
         }
         embeddings_varying_decoder, decoder_sparse_weights = self.decoder_vsn(
             x=embeddings_varying_decoder,
-            context=static_context_expanded[:, encoder_length:],
+            context=static_context_expanded,
         )
 
         # LSTM
@@ -603,9 +611,7 @@ def forward(self, x_in: PLModuleInput) -> torch.Tensor:
         static_context_enriched = self.static_context_enrichment(static_embedding)
         attn_input = self.static_enrichment_grn(
             x=lstm_out,
-            context=self.expand_static_context(
-                context=static_context_enriched, time_steps=time_steps
-            ),
+            context=self.expand_static_context(static_context_enriched),
         )
 
         # multi-head attention
```
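The hunks above drop both the explicit `expand(-1, time_steps, -1)` and the encoder/decoder slicing because a `(batch, 1, hidden)` context broadcasts over the time dimension wherever it is combined with a `(batch, time, hidden)` tensor. A standalone sketch of the equivalence (plain addition stands in for whatever combination the GRNs actually perform):

```python
import torch

batch, time_steps, hidden = 4, 30, 16
context = torch.randn(batch, hidden)
x = torch.randn(batch, time_steps, hidden)

# Old behaviour: materialize the static context at every time step.
old = context[:, None].expand(-1, time_steps, -1)

# New behaviour: keep a singleton time dimension; broadcasting does the rest.
new = context.unsqueeze(1).contiguous()  # shape (batch, 1, hidden)

# Combining with a time-major tensor yields identical results either way.
assert torch.allclose(x + old, x + new)
```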
```diff
@@ -660,6 +666,7 @@ def __init__(
             dict[str, Union[int, tuple[int, int]]]
         ] = None,
         add_relative_index: bool = False,
+        skip_interpolation: bool = False,
         loss_fn: Optional[nn.Module] = None,
         likelihood: Optional[TorchLikelihood] = None,
         norm_type: Union[str, nn.Module] = "LayerNorm",
@@ -742,6 +749,10 @@ def __init__(
             This allows to use the TFTModel without having to pass future_covariates to :func:`fit()` and
             :func:`train()`. It gives a value to the position of each step from input and output chunk relative
             to the prediction point. The values are normalized with ``input_chunk_length``.
+        skip_interpolation
+            Whether to skip interpolation and replace with linear projection on feature embeddings in
+            VariableSelectionNetwork. Setting this to ``True`` could increase training and inference speed.
+            Defaults to ``False`` to preserve the permutation in the feature embedding space.
         loss_fn: nn.Module
             PyTorch loss function used for training. By default, the TFT model is probabilistic and uses a
             ``likelihood`` instead (``QuantileRegression``). To make the model deterministic, you can set the `
@@ -949,11 +960,12 @@ def encode_year(idx):
             else {}
         )
         self.add_relative_index = add_relative_index
+        self.skip_interpolation = skip_interpolation
         self.output_dim: Optional[tuple[int, int]] = None
         self.norm_type = norm_type
         self._considers_static_covariates = use_static_covariates
 
-    def _create_model(self, train_sample: TorchTrainingSample) -> nn.Module:
+    def _create_model(self, train_sample: TorchTrainingSample) -> PLForecastingModule:
         """
         `train_sample` contains the following tensors:
         (past_target, past_covariates, historic_future_covariates, future_covariates, static_covariates,
@@ -1140,6 +1152,7 @@ def _create_model(self, train_sample: TorchTrainingSample) -> nn.Module:
             hidden_continuous_size=self.hidden_continuous_size,
             categorical_embedding_sizes=self.categorical_embedding_sizes,
             add_relative_index=self.add_relative_index,
+            skip_interpolation=self.skip_interpolation,
             norm_type=self.norm_type,
             **self.pl_module_params,
         )
@@ -1149,7 +1162,7 @@ def _build_train_dataset(
         series: Sequence[TimeSeries],
         past_covariates: Optional[Sequence[TimeSeries]],
         future_covariates: Optional[Sequence[TimeSeries]],
-        sample_weight: Optional[Sequence[TimeSeries]],
+        sample_weight: Optional[Union[Sequence[TimeSeries], str]],
         max_samples_per_ts: Optional[int],
         stride: int = 1,
     ) -> TorchTrainingDataset:
```
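Finally, the core idea of the PR, resizing feature embeddings with a learned projection instead of 1D interpolation, in isolation. This is an illustration of the technique only; the actual `_ResampleNorm`/`_GatedResidualNetwork` changes live in a file not shown in this diff:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, h_in, h_out = 4, 7, 16
x = torch.randn(batch, h_in)

# Before: resize the feature dimension via 1D linear interpolation,
# which is slow in PyTorch and the source of the reported overhead.
resampled = F.interpolate(
    x.unsqueeze(1), size=h_out, mode="linear", align_corners=True
).squeeze(1)

# After: a learned linear projection performs the resizing in one matmul.
proj = nn.Linear(h_in, h_out)
projected = proj(x)

assert resampled.shape == projected.shape == (batch, h_out)
```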
