
Conversation


@FBumann FBumann commented Dec 18, 2025

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

Release Notes (v5.1.0 Upcoming)

  • New Features

    • Added time-series clustering workflow with transform.cluster() and transform.expand_solution() methods for timestep reduction.
    • Introduced unified weighting system via TimeSeriesWeights with temporal, period, and scenario support.
    • Added storage cluster modes (independent, cyclic, intercluster, intercluster_cyclic) for improved seasonal dynamics.
  • Breaking Changes

    • Removed legacy clustering API; use new transform.cluster() instead.
    • Renamed time metadata from hours_per_step to timestep_duration.
  • Documentation

    • Added four new clustering example notebooks demonstrating typical workflows, multi-period systems, and storage modes.
    • Enhanced API documentation with clustering best practices and safety guidelines.


This reverts commit e49ed1b.
  Changes Made

  1. flixopt/components.py - Storage class

  - Added cluster_storage_mode parameter with default 'intercluster_cyclic'
  - Updated docstring with full documentation
  - Updated StorageModel._do_modeling to use self.element.cluster_storage_mode

  2. flixopt/clustering/base.py - Clustering dataclass

  - Removed storage_mode attribute
  - Updated __repr__ to exclude storage mode

  3. flixopt/transform_accessor.py - cluster() method

  - Removed storage_mode parameter
  - Updated docstrings

  4. flixopt/flow_system.py - _add_inter_cluster_linking()

  - Now filters storages by their cluster_storage_mode
  - Only links storages with 'intercluster' or 'intercluster_cyclic'

  5. flixopt/clustering/storage_linking.py - InterClusterLinking

  - Changed __init__ to accept storages list instead of storage_cyclic flag
  - Each storage's cyclic constraint is determined by its own cluster_storage_mode
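The per-storage filtering described in points 4 and 5 can be sketched in isolation (hypothetical `Storage` stand-in; the real class lives in flixopt/components.py and carries many more attributes):

```python
from dataclasses import dataclass


@dataclass
class Storage:
    """Minimal stand-in: only the attribute relevant to inter-cluster linking."""
    label: str
    cluster_storage_mode: str = 'intercluster_cyclic'


def storages_to_link(storages):
    """Keep only storages whose mode requires inter-cluster SOC linking."""
    return [s for s in storages if s.cluster_storage_mode in ('intercluster', 'intercluster_cyclic')]


storages = [
    Storage('Battery', cluster_storage_mode='cyclic'),
    Storage('H2_Store', cluster_storage_mode='intercluster_cyclic'),
    Storage('Pit', cluster_storage_mode='intercluster'),
]
print([s.label for s in storages_to_link(storages)])  # ['H2_Store', 'Pit']
```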

  New Usage

  # Per-storage modes - different storages can have different behaviors
  battery = Storage('Battery', ..., cluster_storage_mode='cyclic')  # short-term
  hydrogen = Storage('H2_Store', ..., cluster_storage_mode='intercluster_cyclic')  # seasonal

  # Cluster without storage_mode parameter
  fs_clustered = fs.transform.cluster(n_clusters=4, cluster_duration='1D')

  # Re-optimize with different storage modes by changing the attribute
  battery.cluster_storage_mode = 'independent'
  fs_clustered.optimize(solver)  # Now battery uses 'independent' mode

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
flixopt/transform_accessor.py (1)

1087-1090: Use the property setter instead of direct assignment to _solution.

The code directly assigns to expanded_fs._solution instead of using the solution property setter. This bypasses cache invalidation for statistics.
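The distinction matters because the setter is where cache invalidation lives. A minimal sketch of the pattern (class and attribute names hypothetical, not the actual flixopt implementation):

```python
class ResultHolder:
    """Sketch: a solution property whose setter invalidates a cached statistic."""

    def __init__(self):
        self._solution = None
        self._stats_cache = None

    @property
    def solution(self):
        return self._solution

    @solution.setter
    def solution(self, value):
        self._solution = value
        self._stats_cache = None  # invalidate derived statistics

    @property
    def statistics(self):
        if self._stats_cache is None:
            self._stats_cache = sum(self._solution)  # placeholder computation
        return self._stats_cache


holder = ResultHolder()
holder.solution = [1, 2, 3]
print(holder.statistics)     # 6
holder._solution = [10, 20]  # direct assignment: cache is now stale
print(holder.statistics)     # still 6 - the bug this comment flags
holder.solution = [10, 20]   # setter: cache invalidated
print(holder.statistics)     # 30
```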

🧹 Nitpick comments (6)
mkdocs.yml (1)

72-72: Navigation label doesn't match notebook filename.

The navigation label is "Clustering" but the target notebook is notebooks/08a-aggregation.ipynb. Consider renaming the notebook to 08a-clustering.ipynb for consistency, or keeping both label and filename aligned to avoid confusion.

flixopt/clustering/__init__.py (1)

35-57: Lazy import pattern may cause unexpected None for direct imports.

InterClusterLinking is set to None at module level (Line 37) and included in __all__ (Line 55). Users who do from flixopt.clustering import InterClusterLinking will get None instead of the actual class. They must call _get_inter_cluster_linking() to get the real class, but this is a private function.

Consider using __getattr__ at module level for lazy loading:

🔎 Proposed fix using __getattr__
-# Lazy import for InterClusterLinking to avoid circular imports
-# It depends on structure.Submodel which has complex import dependencies
-InterClusterLinking = None
-
-
-def _get_inter_cluster_linking():
-    """Get InterClusterLinking class with lazy import."""
-    global InterClusterLinking
-    if InterClusterLinking is None:
-        from .storage_linking import InterClusterLinking as _InterClusterLinking
-
-        InterClusterLinking = _InterClusterLinking
-    return InterClusterLinking
+_InterClusterLinking = None
+
+
+def __getattr__(name: str):
+    """Lazy import for InterClusterLinking to avoid circular imports."""
+    global _InterClusterLinking
+    if name == 'InterClusterLinking':
+        if _InterClusterLinking is None:
+            from .storage_linking import InterClusterLinking
+            _InterClusterLinking = InterClusterLinking
+        return _InterClusterLinking
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
flixopt/clustering/base.py (1)

1028-1039: Consider using np.bincount for counting cluster occurrences.

The current implementation uses a list comprehension with sum(1 for x in cluster_order if x == c) for each unique cluster, resulting in O(n*k) time complexity. Using np.bincount would be O(n).

🔎 Proposed optimization
-    # Count occurrences of each cluster
-    unique_clusters = np.unique(cluster_order)
-    occurrences = {}
-    for c in unique_clusters:
-        occurrences[int(c)] = sum(1 for x in cluster_order if x == c)
-
-    n_clusters = len(unique_clusters)
+    # Count occurrences of each cluster using bincount for efficiency
+    cluster_order_array = np.asarray(cluster_order)
+    n_clusters = int(cluster_order_array.max()) + 1
+    occurrence_counts = np.bincount(cluster_order_array, minlength=n_clusters)
+
     cluster_occurrences_da = xr.DataArray(
-        [occurrences.get(c, 0) for c in range(n_clusters)],
+        occurrence_counts,
         dims=['cluster'],
         name='cluster_occurrences',
     )
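The equivalence and the complexity difference can be checked in a standalone sketch (not the flixopt code):

```python
import numpy as np

cluster_order = np.array([0, 2, 0, 1, 2, 2, 0])

# O(n*k): one full pass over cluster_order per unique cluster
unique_clusters = np.unique(cluster_order)
slow = {int(c): sum(1 for x in cluster_order if x == c) for c in unique_clusters}

# O(n): a single pass via bincount
fast = np.bincount(cluster_order, minlength=int(cluster_order.max()) + 1)

print(slow)           # {0: 3, 1: 1, 2: 3}
print(fast.tolist())  # [3, 1, 3]
```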
tests/test_clustering/test_base.py (1)

142-159: Test uses None for original_flow_system - consider documenting expectation.

Line 155 uses original_flow_system=None which works for basic dataclass creation but may cause issues if any code path accesses properties on the FlowSystem. This is acceptable for unit testing but consider adding a comment.

🔎 Suggested documentation
         info = Clustering(
             result=result,
-            original_flow_system=None,  # Would be FlowSystem in practice
+            original_flow_system=None,  # type: ignore[arg-type]  # OK for unit test; FlowSystem required in production
             backend_name='tsam',
         )
docs/notebooks/08c-clustering.ipynb (1)

221-221: Remove redundant strict=False parameter from zip() call.

Line 221 uses zip(..., strict=False), which is unnecessary since strict=False is the default value. With the project requiring Python 3.11+, this parameter adds no functional value and can be safely removed to simplify the code. If you want to ensure the iterables have equal length, consider using strict=True instead.

flixopt/clustering/storage_linking.py (1)

126-132: Consider edge case when capacity bounds are unavailable.

When capacity.fixed_size and capacity.maximum are both None, the fallback to 1e9 may be too permissive for some use cases. This could lead to numerical issues in solvers or unexpected behavior.

🔎 Consider adding a warning
         else:
-            cap_value = 1e9  # Large default
+            cap_value = 1e9  # Large default
+            logger.warning(
+                f'Storage {storage.label} has no capacity bounds - using 1e9 as upper bound for SOC_boundary'
+            )
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f1ceac5 and a6e5bd4.

📒 Files selected for processing (15)
  • docs/notebooks/08c-clustering.ipynb (1 hunks)
  • docs/notebooks/08c2-clustering-storage-modes.ipynb (1 hunks)
  • docs/notebooks/data/generate_example_systems.py (5 hunks)
  • flixopt/__init__.py (3 hunks)
  • flixopt/clustering/__init__.py (1 hunks)
  • flixopt/clustering/base.py (1 hunks)
  • flixopt/clustering/storage_linking.py (1 hunks)
  • flixopt/components.py (8 hunks)
  • flixopt/flow_system.py (13 hunks)
  • flixopt/plot_result.py (1 hunks)
  • flixopt/transform_accessor.py (3 hunks)
  • mkdocs.yml (1 hunks)
  • tests/test_clustering/__init__.py (1 hunks)
  • tests/test_clustering/test_base.py (1 hunks)
  • tests/test_clustering/test_integration.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • flixopt/plot_result.py
  • tests/test_clustering/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flixopt/__init__.py
🧰 Additional context used
🧬 Code graph analysis (4)
flixopt/clustering/storage_linking.py (4)
flixopt/structure.py (3)
  • Submodel (1642-1821)
  • add_variables (1666-1675)
  • add_constraints (1677-1686)
flixopt/flow_system.py (2)
  • FlowSystem (51-2303)
  • storages (1865-1876)
flixopt/clustering/base.py (7)
  • ClusterStructure (34-232)
  • n_clusters (951-956)
  • n_original_periods (102-104)
  • n_original_periods (959-963)
  • has_multi_dims (107-109)
  • cluster_order (937-941)
  • get_cluster_order_for_slice (111-126)
flixopt/components.py (1)
  • charge_state (1074-1076)
flixopt/components.py (2)
flixopt/structure.py (3)
  • timestep_duration (298-300)
  • timestep_duration (1811-1812)
  • flow_system (499-517)
flixopt/clustering/base.py (1)
  • cluster_start_positions (978-996)
tests/test_clustering/test_base.py (1)
flixopt/clustering/base.py (13)
  • Clustering (882-996)
  • ClusterResult (236-459)
  • ClusterStructure (34-232)
  • create_cluster_structure_from_mapping (999-1046)
  • cluster_order (937-941)
  • n_clusters (951-956)
  • n_original_periods (102-104)
  • n_original_periods (959-963)
  • get_cluster_weight_per_timestep (147-173)
  • timestep_mapping (973-975)
  • n_original_timesteps (308-310)
  • validate (425-459)
  • get_expansion_mapping (312-321)
tests/test_clustering/test_integration.py (2)
flixopt/flow_system.py (4)
  • FlowSystem (51-2303)
  • weights (1926-1951)
  • coords (1879-1885)
  • aggregation_weight (1954-1964)
flixopt/structure.py (4)
  • TimeSeriesWeights (46-128)
  • sum_over_time (101-112)
  • effective_objective (92-94)
  • cluster_weight (307-313)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.14)
  • GitHub Check: Build documentation
🔇 Additional comments (35)
flixopt/clustering/base.py (1)

340-423: LGTM - expand_data handles multi-dimensional cases correctly.

The method properly handles simple (no period/scenario) and multi-dimensional cases with appropriate slicing and recombination using xr.concat.
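The core expansion step can be illustrated with plain fancy indexing (a simplified 1D sketch; the real `expand_data` additionally handles period/scenario slices and recombines them with `xr.concat`):

```python
import numpy as np

timesteps_per_cluster = 3
cluster_order = np.array([1, 0, 1])  # original period -> representative cluster

# Clustered data: one row of values per typical cluster
clustered = np.array([
    [10.0, 11.0, 12.0],  # cluster 0
    [20.0, 21.0, 22.0],  # cluster 1
])

# Expand back to the original timeline by repeating each period's cluster row
expanded = clustered[cluster_order].reshape(-1)
print(expanded.tolist())
# [20.0, 21.0, 22.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
```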

tests/test_clustering/test_integration.py (1)

99-130: Good integration test for clustering workflow.

The test properly uses pytest.importorskip('tsam') for conditional testing, creates a realistic FlowSystem with time series data, and validates the expected reduction from 168 to 48 timesteps.

tests/test_clustering/test_base.py (1)

14-63: Good test coverage for ClusterStructure.

Tests cover basic creation, numpy array conversion, and weight calculation per timestep. The weight calculation test at lines 47-62 validates the expected behavior clearly.

docs/notebooks/08c-clustering.ipynb (3)

1-41: Well-structured introduction and setup.

The notebook provides clear learning objectives and properly sets up dependencies with the tsam requirement note.


142-169: Good demonstration of peak forcing usage.

The warning about time_series_for_high_peaks is critical for users, and the code example clearly shows how to specify the peak series.


448-454: The referenced notebook 08c2-clustering-storage-modes.ipynb exists in the repository at docs/notebooks/08c2-clustering-storage-modes.ipynb. The link in line 454 is correct and requires no changes.

Likely an incorrect or invalid review comment.

docs/notebooks/data/generate_example_systems.py (4)

237-351: LGTM! Well-structured district heating system generator.

The function correctly loads CSV data, slices to one month, and constructs a complete FlowSystem with CHP, boiler, storage, and grid connections. Good use of InvestParameters for sizing optimization.


354-451: LGTM! Clean operational system setup.

Good separation from the district heating system by removing investments and adding startup costs, making it suitable for rolling horizon optimization. The fixed storage capacity and startup constraints are appropriate for operational planning.


454-590: LGTM! Excellent seasonal storage demonstration system.

The solar irradiance and heat demand profiles are well-designed to show seasonal mismatch. Using initial_charge_state='equals_final' ensures yearly cyclic behavior. This is ideal for demonstrating inter-cluster storage linking value.


600-632: LGTM! Improved multiperiod system with longer horizon.

The extension to 336 hours (2 weeks) provides better clustering demonstration capability. The updated comment about transform.sisel() is helpful.

docs/notebooks/08c2-clustering-storage-modes.ipynb (1)

1-425: LGTM! Excellent educational notebook for storage clustering modes.

The notebook effectively demonstrates:

  • Clear comparison of all four storage modes with measurable results
  • Seasonal storage patterns visualization
  • Practical guidance on mode selection
  • Proper use of expand_solution() for full-resolution analysis

The documentation table and interpretation sections provide valuable guidance for users.

flixopt/clustering/storage_linking.py (3)

55-83: LGTM! Clean initialization with proper extraction of cluster structure values.

The constructor properly extracts commonly used values and handles both scalar and DataArray types for n_clusters.


214-282: LGTM! Robust multi-dimensional constraint generation.

The method correctly handles period/scenario dimensions by:

  1. Checking which dimensions exist in each variable
  2. Building appropriate selectors
  3. Generating descriptive constraint names with slice suffixes

This ensures proper constraint generation regardless of dimension structure.


181-188: Indexing is correct and properly leverages the extra timestep.

The code correctly accesses end-of-cluster states. Since charge_state is created with extra_timestep=True, it has n_clusters * timesteps_per_cluster + 1 elements in its time coordinate. The formula end_idx = (c + 1) * timesteps_per_cluster yields valid indices for all clusters (0 to n_clusters * timesteps_per_cluster), allowing the last cluster to access the final extra timestep. The comment about the extra timestep is accurate and documents this design correctly.
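The index arithmetic this comment verifies, reduced to plain numbers:

```python
n_clusters = 3
timesteps_per_cluster = 4
# With extra_timestep=True, charge_state has one extra entry:
# valid indices run 0 .. n_clusters * timesteps_per_cluster
n_charge_states = n_clusters * timesteps_per_cluster + 1

end_indices = [(c + 1) * timesteps_per_cluster for c in range(n_clusters)]
print(end_indices)  # [4, 8, 12]

# Every end index is a valid position in the extended time coordinate,
# and the last cluster reads the final (extra) timestep.
assert all(i < n_charge_states for i in end_indices)
assert end_indices[-1] == n_charge_states - 1
```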

flixopt/components.py (5)

273-300: LGTM! Well-designed cluster_mode parameter with comprehensive documentation.

The new cluster_mode parameter provides clear options for handling storage in clustering scenarios. The docstring explains each mode's behavior and use cases effectively. The default 'intercluster_cyclic' is appropriate for most yearly optimization scenarios.

Also applies to: 406-446


472-475: LGTM! Proper handling of None initial_charge_state.

The updated condition correctly skips transformation when initial_charge_state is None (free) or a string ('equals_final'), only transforming numeric values.


551-562: LGTM! Updated plausibility check for None initial_charge_state.

The condition now correctly handles the case where initial_charge_state is None, preventing invalid numeric comparisons.


919-959: Mask indexing correctly handles cluster boundary skipping.

The mask construction using clustering.cluster_start_positions[1:] - 1 properly identifies and masks the last timestep of each cluster (except the first), which effectively skips inter-cluster boundaries as intended. The indexing is mathematically sound with no out-of-bounds risk.


947-959: The off-by-one concern is unfounded; indexing is correct, but clarify the boundary semantics.

The code correctly uses len(self._model.flow_system.timesteps) as the final end_pos. Since charge_state is indexed by timesteps_extra (which has n+1 elements with valid indices 0 through n), and len(timesteps) equals n, this index is valid and correctly points to the extra timestep.

However, clarify the comment on line 951. It states "End of cluster i is at (start of cluster i+1) - 1" but the code uses starts[i + 1] directly without the -1. This distinction matters: the code intentionally links at cluster boundaries (where cluster i+1 begins), not at the final data point of cluster i. Update the comment to reflect that the constraint enforces cyclic linkage at cluster transitions.

flixopt/transform_accessor.py (5)

577-644: LGTM! Comprehensive cluster() method documentation.

The docstring provides clear explanations of:

  • The clustering workflow steps
  • All parameters with examples
  • Two-stage optimization workflow
  • Important notes about peak margins and storage modes

This will help users understand and correctly apply the clustering feature.


658-668: LGTM! Proper validation of timestep uniformity and cluster duration.

The checks ensure:

  1. Uniform timestep sizes (required for tsam clustering)
  2. Cluster duration is a multiple of timestep size

Good use of np.isclose for floating-point comparisons.
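The multiple-of-timestep check can be sketched standalone with `np.isclose` (illustrative helper, not the actual code in transform_accessor.py):

```python
import numpy as np


def is_multiple(cluster_duration_h: float, timestep_h: float) -> bool:
    """True if cluster_duration is an integer multiple of the timestep size."""
    ratio = cluster_duration_h / timestep_h
    return bool(np.isclose(ratio, round(ratio)))


print(is_multiple(24.0, 1.0))   # True  (24 steps per cluster)
print(is_multiple(24.0, 0.25))  # True  (96 quarter-hour steps)
print(is_multiple(24.0, 7.0))   # False (24/7 is not an integer)
```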


788-793: LGTM! Correct handling of 'equals_final' for reduced timesteps.

Setting initial_charge_state = None for storages with 'equals_final' is correct since the cyclic constraint doesn't apply meaningfully to reduced typical periods. The storage_mode constraints handle boundaries appropriately.


1061-1064: LGTM! Clean expansion logic using ClusterResult.expand_data().

The expand function correctly delegates to info.result.expand_data() which handles the multi-dimensional mapping internally. This keeps the expansion logic centralized.


882-926: LGTM! Well-structured helper for combining slices.

The _combine_slices_to_dataarray method correctly handles:

  • Simple case (no period/scenario)
  • Period-only case
  • Scenario-only case
  • Full multi-dimensional case

Using xr.concat with proper dimension ordering ensures consistent output.

flixopt/flow_system.py (11)

41-43: LGTM! Correct use of TYPE_CHECKING imports.

The new type imports for Clustering and TimeSeriesWeights are correctly placed in the TYPE_CHECKING block, avoiding circular import issues while providing proper type hints.


179-206: LGTM! Well-designed cluster weight initialization.

The cluster_weight parameter is properly documented and initialized with sensible defaults (1.0 for all timesteps when not clustered). The use of fit_to_model_coords with multiple dimensions is appropriate, as it gracefully handles only the dimensions that exist.


321-326: LGTM! Consistent renaming to timestep_duration.

The method renaming from calculate_hours_per_timestep to calculate_timestep_duration is consistent with the broader architectural changes. The implementation correctly returns a DataArray with the new name 'timestep_duration'.


377-411: LGTM! Proper time metadata computation.

The _compute_time_metadata method is correctly updated to compute and return timestep_duration instead of the old naming. The extraction of hours_of_last_timestep from the DataArray using .isel(time=-1).item() is appropriate.


447-488: LGTM! Proper time metadata synchronization.

The _update_time_metadata method correctly recomputes and updates timestep_duration in the dataset. The check for variable existence before updating is good defensive programming.


696-698: LGTM! Proper deserialization of cluster_weight.

The cluster_weight parameter is correctly deserialized from the dataset using the same pattern as other optional parameters. The fallback to None ensures backward compatibility with datasets that don't have cluster weights.


235-236: LGTM! Cleaner clustering state management.

The renaming from _clustering_info to clustering improves clarity, and the type hint Clustering | None provides proper type safety.


1267-1300: LGTM! Architectural shift to inter-cluster linking.

The method now uses _add_inter_cluster_linking() instead of constraint-based clustering. This aligns with the PR's goal of moving from constraint-based clustering to an aggregation abstraction layer. The conditional check for self.clustering properly guards the linking logic.


1925-1951: LGTM! Excellent unified weighting interface.

The new weights property provides a clean, PyPSA-inspired interface for accessing all weight types. The temporal weight correctly combines timestep_duration (physical duration) and cluster_weight (representation count), which is the proper weight for time aggregation in clustered models.
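Numerically, the temporal weight is just the elementwise product of the two factors; a minimal sketch of how it enters a clustered objective (array values hypothetical):

```python
import numpy as np

timestep_duration = np.array([1.0, 1.0, 1.0, 1.0])  # hours per clustered timestep
cluster_weight = np.array([3.0, 3.0, 2.0, 2.0])     # original periods each cluster represents
cost_rate = np.array([10.0, 12.0, 8.0, 9.0])        # cost per hour in each clustered timestep

temporal = timestep_duration * cluster_weight  # combined time-aggregation weight
total_cost = float((cost_rate * temporal).sum())
print(total_cost)  # 10*3 + 12*3 + 8*2 + 9*2 = 100.0
```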


1953-1964: LGTM! Convenient aggregation weight property.

The aggregation_weight property provides direct access to the combined time aggregation weight. The documentation correctly notes its equivalence to weights.temporal and recommends the unified interface for new code.


1302-1334: All interfaces verified and correctly implemented.

The Storage.cluster_mode attribute is properly defined in components.py (line 416) as a Literal['independent', 'cyclic', 'intercluster', 'intercluster_cyclic'] with default 'intercluster_cyclic'. The InterClusterLinking class in clustering/storage_linking.py correctly accepts all required parameters: model, flow_system, cluster_structure, and storages. The do_modeling() method exists and is properly invoked. The reviewed code correctly filters storages and instantiates the linking model.

Comment on lines +88 to +99
def __repr__(self) -> str:
n_clusters = (
int(self.n_clusters) if isinstance(self.n_clusters, (int, np.integer)) else int(self.n_clusters.values)
)
occ = [int(self.cluster_occurrences.sel(cluster=c).values) for c in range(n_clusters)]
return (
f'ClusterStructure(\n'
f' {self.n_original_periods} original periods → {n_clusters} clusters\n'
f' timesteps_per_cluster={self.timesteps_per_cluster}\n'
f' occurrences={occ}\n'
f')'
)

⚠️ Potential issue | 🟡 Minor

__repr__ may fail for multi-dimensional n_clusters.

Line 90 uses int(self.n_clusters.values) which will raise TypeError if n_clusters is a multi-dimensional DataArray (since .values would return an array). This matches the documented multi-scenario case where n_clusters could vary per slice.

🔎 Proposed fix
     def __repr__(self) -> str:
-        n_clusters = (
-            int(self.n_clusters) if isinstance(self.n_clusters, (int, np.integer)) else int(self.n_clusters.values)
-        )
+        if isinstance(self.n_clusters, (int, np.integer)):
+            n_clusters = int(self.n_clusters)
+        elif isinstance(self.n_clusters, xr.DataArray):
+            n_clusters = int(self.n_clusters.values) if self.n_clusters.ndim == 0 else int(self.n_clusters.max().values)
+        else:
+            n_clusters = int(self.n_clusters)

Comment on lines +540 to +556
if kind == 'duration_curve':
sorted_vars = {}
for var in ds.data_vars:
for rep in ds.coords['representation'].values:
values = np.sort(ds[var].sel(representation=rep).values.flatten())[::-1]
sorted_vars[(var, rep)] = values
n = len(values)
ds = xr.Dataset(
{
var: xr.DataArray(
[sorted_vars[(var, r)] for r in ['Original', 'Clustered']],
dims=['representation', 'rank'],
coords={'representation': ['Original', 'Clustered'], 'rank': range(n)},
)
for var in resolved_variables
}
)

⚠️ Potential issue | 🟡 Minor

Variable values used outside loop may be undefined in edge cases.

Line 546 uses n = len(values) where values is assigned inside the inner loop (Line 544). While the earlier check at Line 524 prevents empty resolved_variables, if ds.data_vars is empty (which could happen if selection removes all variables), this would raise NameError.

🔎 Proposed fix
         # For duration curve: flatten and sort values
         if kind == 'duration_curve':
             sorted_vars = {}
+            n = 0
             for var in ds.data_vars:
                 for rep in ds.coords['representation'].values:
                     values = np.sort(ds[var].sel(representation=rep).values.flatten())[::-1]
                     sorted_vars[(var, rep)] = values
-            n = len(values)
+                    n = len(values)
             ds = xr.Dataset(
🤖 Prompt for AI Agents
In flixopt/clustering/base.py around lines 540 to 556, the variable `values` is
only assigned inside the inner loop and then used outside to compute `n =
len(values)`, which can be undefined if `ds.data_vars` is empty; fix by deriving
`n` from the populated `sorted_vars` collection (e.g., after the loops set n =
len(next(iter(sorted_vars.values()))) ) and add a guard for the case when
`sorted_vars` is empty (raise a clear error or skip building the Dataset) so you
never rely on a loop-local variable outside its scope.

Comment on lines +830 to +845
# Build long-form DataFrame with cluster labels including occurrence counts
rows = []
data_vars = {}
for var in resolved_variables:
data = aggregated_data[var].values
data_by_cluster = data.reshape(n_clusters, timesteps_per_cluster)
data_vars[var] = xr.DataArray(
data_by_cluster,
dims=['cluster', 'timestep'],
coords={'cluster': range(n_clusters), 'timestep': range(timesteps_per_cluster)},
)
for c in range(n_clusters):
occurrence = int(cs.cluster_occurrences.sel(cluster=c).values)
label = f'Cluster {c} (×{occurrence})'
for t in range(timesteps_per_cluster):
rows.append({'cluster': label, 'timestep': t, 'value': data_by_cluster[c, t], 'variable': var})

⚠️ Potential issue | 🟡 Minor

clusters() assumes 1D aggregated data for reshape.

Line 834-835 reshapes data with data.reshape(n_clusters, timesteps_per_cluster), which assumes aggregated_data[var] is 1D. For multi-period/scenario data, this reshape would fail. Consider adding dimension validation or handling multi-dim data.

🔎 Proposed fix to add validation
         for var in resolved_variables:
             data = aggregated_data[var].values
+            if data.ndim != 1:
+                raise ValueError(
+                    f"clusters() currently only supports 1D aggregated data. "
+                    f"Variable '{var}' has shape {data.shape}. "
+                    f"Use select={{...}} to reduce to a single period/scenario."
+                )
             data_by_cluster = data.reshape(n_clusters, timesteps_per_cluster)
🤖 Prompt for AI Agents
In flixopt/clustering/base.py around lines 830 to 845, the code unconditionally
reshapes aggregated_data[var].values with data.reshape(n_clusters,
timesteps_per_cluster) which assumes a 1D array; add validation to handle
multi-dimensional inputs by checking data.ndim and/or data.size: if data.ndim ==
1 assert data.size == n_clusters * timesteps_per_cluster and reshape as before;
if data.ndim > 1 either flatten via data.ravel() (after verifying size matches)
or explicitly stack/concatenate the time-like dimensions into a single 1D axis
(using xarray.stack or numpy.reshape) before reshaping, and if sizes don’t match
raise a clear ValueError indicating expected length and actual shape so callers
can correct input. Ensure the choice (flatten vs. explicit stack) is consistent
with how multi-period/scenario data should be interpreted in the clustering
context.

@@ -0,0 +1,148 @@
"""Integration tests for flixopt.aggregation module with FlowSystem."""

⚠️ Potential issue | 🟡 Minor

Docstring references "aggregation" but module is for "clustering".

The docstring says "flixopt.aggregation module" but the file is test_clustering/test_integration.py. This appears to be a copy-paste oversight from the refactoring.

🔎 Proposed fix
-"""Integration tests for flixopt.aggregation module with FlowSystem."""
+"""Integration tests for flixopt.clustering module with FlowSystem."""
🤖 Prompt for AI Agents
In tests/test_clustering/test_integration.py around line 1, the module-level
docstring incorrectly references "flixopt.aggregation" due to copy-paste; update
the docstring to correctly describe this file as integration tests for the
flixopt.clustering module (and mention FlowSystem if desired) so it accurately
reflects the file's purpose and location.

  Summary

  New Files Created

  1. flixopt/clustering/intercluster_helpers.py - New helper module with:
    - SliceContext - dataclass for iteration context with period/scenario/cluster_order
    - MultiDimIterator - unified iterator over (period, scenario) combinations
    - CapacityBounds - dataclass for capacity bounds
    - extract_capacity_bounds() - extract bounds from various capacity parameter types
    - build_boundary_coords() - build coordinates/dims for SOC_boundary variables

  Refactored Methods in StorageModel (components.py:1025-1211)

  The monolithic 200+ line method was split into focused helper methods:
  - _add_intercluster_linking() - Main orchestrator (now ~70 lines)
  - _add_cluster_start_constraints() - Adds ΔE_0 = 0 constraints
  - _compute_delta_soc() - Pre-computes delta_SOC for each cluster
  - _add_linking_constraints() - Links SOC boundaries using MultiDimIterator
  - _add_combined_bound_constraints() - Adds combined bounds, now correctly handles multi-dim cases

  Removed Files

  - flixopt/clustering/storage_linking.py - The old InterClusterLinking class is no longer needed

  Updated Exports

  - flixopt/clustering/__init__.py - Removed InterClusterLinking from exports

  Key Bug Fix

  The _add_combined_bound_constraints() method now uses MultiDimIterator to correctly iterate over (period, scenario) combinations, fixing the bug where it always used the default cluster_order slice regardless of dimensions.

  All 125 clustering and storage tests pass.
…cleaner and more vectorized. Here's what changed:

  Before (overly complex)

  - 365+ individual constraints for linking and bounds (one per original period)
  - Dictionary-based delta_soc storage with integer keys
  - MultiDimIterator machinery with SliceContext and build_selector methods
  - Nested loops over periods, original_periods, and sample points

  After (simplified)

  - ~10 constraint groups total using vectorized operations
  - DataArray-based delta_soc with proper cluster dimension
  - No MultiDimIterator needed - xarray broadcasting handles dimensions naturally
  - Vectorized constraints for cluster starts, linking, and bounds

  Key changes in components.py:

  | Method                          | Before                                        | After                                                      |
  |---------------------------------|-----------------------------------------------|------------------------------------------------------------|
  | _add_cluster_start_constraints  | Loop over n_clusters                          | Single vectorized isel(time=cluster_starts)                |
  | _compute_delta_soc              | Returns dict[int, DataArray]                  | Returns DataArray with cluster dim                         |
  | _add_linking_constraints        | Loop over n_original_periods with selectors   | Single vectorized constraint using advanced indexing       |
  | _add_combined_bound_constraints | Nested loops (periods × original_periods × 3) | Loop only over 3 sample points, vectorize original_periods |
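
The loop-to-vectorized change can be illustrated with a small sketch. The names here (`charge_state`, `cluster_starts`) are stand-ins mirroring the description above, not the actual flixopt model objects:

```python
import numpy as np
import xarray as xr

# 3 clusters of 4 timesteps each (12 timesteps total); values are dummy data.
charge_state = xr.DataArray(np.arange(12.0), dims=['time'])
cluster_starts = np.array([0, 4, 8])  # first timestep of each cluster

# Before: one constraint per cluster via a Python loop
loop_values = [charge_state.isel(time=int(s)).item() for s in cluster_starts]

# After: a single vectorized selection covering all clusters at once;
# a constraint like "charge_state at cluster starts == 0" then needs
# one expression instead of n_clusters separate ones.
vectorized = charge_state.isel(time=cluster_starts)

assert loop_values == list(vectorized.values)
```

The same pattern generalizes to the linking and bound constraints: passing an integer array to `isel()` selects all relevant positions in one call, and xarray broadcasting handles any extra `period`/`scenario` dimensions.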

  Removed from intercluster_helpers.py:

  - SliceContext dataclass
  - MultiDimIterator class (~70 lines of code)

  The code now follows the same pattern as the regular storage constraints (lines 928-945), using isel() with slices/arrays instead of Python loops.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
flixopt/clustering/intercluster_helpers.py (1)

54-64: Consider broadening the investment detection.

The current check has_investment = hasattr(capacity_param, 'maximum_size') might not detect InvestParameters objects that use fixed_size without maximum_size. Consider:

has_investment = isinstance(capacity_param, InvestParameters)

This would ensure the investment constraint at lines 1087-1091 in components.py is applied for all investment cases, not just continuous sizing.

However, the current implementation appears safe since the bounds are correctly computed even if has_investment is False.
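
The difference between the two checks can be shown with a simplified stand-in class (this is not flixopt's real `InvestParameters`, just an illustration of the failure mode the reviewer describes, where the attribute is only present when set):

```python
# Simplified stand-in for InvestParameters; the real class lives in
# flixopt/interface.py and may always define both attributes.
class InvestParameters:
    def __init__(self, fixed_size=None, maximum_size=None):
        if fixed_size is not None:
            self.fixed_size = fixed_size
        if maximum_size is not None:
            self.maximum_size = maximum_size

fixed = InvestParameters(fixed_size=100)

# hasattr-based detection: misses objects that only carry fixed_size
has_investment_attr = hasattr(fixed, 'maximum_size')       # False

# isinstance-based detection: catches every InvestParameters instance
has_investment_type = isinstance(fixed, InvestParameters)  # True
```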

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6e5bd4 and b334699.

📒 Files selected for processing (4)
  • flixopt/clustering/__init__.py (1 hunks)
  • flixopt/clustering/intercluster_helpers.py (1 hunks)
  • flixopt/components.py (10 hunks)
  • flixopt/flow_system.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
flixopt/clustering/__init__.py (1)
flixopt/clustering/base.py (3)
  • ClusterResult (236-459)
  • ClusterStructure (34-232)
  • create_cluster_structure_from_mapping (999-1046)
flixopt/clustering/intercluster_helpers.py (2)
flixopt/flow_system.py (1)
  • FlowSystem (51-2265)
flixopt/interface.py (1)
  • InvestParameters (956-1279)
flixopt/components.py (6)
flixopt/structure.py (5)
  • timestep_duration (298-300)
  • timestep_duration (1811-1812)
  • flow_system (499-517)
  • values (1589-1594)
  • values (1876-1877)
flixopt/results.py (3)
  • flow_rate (2014-2015)
  • flow_system (356-375)
  • size (2022-2030)
flixopt/flow_system.py (1)
  • coords (1841-1847)
flixopt/clustering/base.py (5)
  • cluster_start_positions (978-996)
  • n_clusters (951-956)
  • n_original_periods (102-104)
  • n_original_periods (959-963)
  • cluster_order (937-941)
flixopt/clustering/intercluster_helpers.py (2)
  • build_boundary_coords (81-106)
  • extract_capacity_bounds (29-78)
flixopt/interface.py (1)
  • InvestParameters (956-1279)
flixopt/flow_system.py (1)
flixopt/structure.py (5)
  • TimeSeriesWeights (46-128)
  • cluster_weight (307-313)
  • timestep_duration (298-300)
  • timestep_duration (1811-1812)
  • aggregation_weight (316-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.14)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Build documentation
🔇 Additional comments (14)
flixopt/clustering/__init__.py (1)

1-42: LGTM! Clean public API surface.

The module provides a well-structured public API for clustering functionality with clear documentation and usage examples.

flixopt/components.py (7)

273-446: LGTM! Well-documented cluster-mode API.

The new cluster_mode parameter provides clear semantics for how storage behaves during clustering, with detailed documentation for each mode. The initial_charge_state=None option is properly typed and documented.


472-562: LGTM! Proper handling of initial_charge_state=None.

The transform and validation logic correctly handles the three cases (None, 'equals_final', numeric) for initial charge state. The conditional checks properly skip numeric comparisons when initial_charge_state is not numeric.


921-933: LGTM! Correct charge balance formulation.

The balance equation properly uses timestep_duration for variable-duration timesteps. The exponential loss term ((1 - rel_loss) ** timestep_duration) correctly models decay over variable durations.


935-959: LGTM! Proper inter-cluster boundary handling.

The mask correctly skips balance constraints at inter-cluster boundaries, preventing naive links that would violate temporal structure. The cyclic mode constraints properly enforce that each cluster returns to its initial state.


961-1000: LGTM! Correct dual-mode investment bounding.

The logic properly distinguishes between intercluster and non-intercluster modes:

  • Intercluster: symmetric bounds [-size, +size] because charge_state represents delta from boundary (can be negative)
  • Non-intercluster: standard bounds [0, size] because charge_state is absolute SOC

This is essential for the S-N model formulation.


1041-1213: LGTM! Solid inter-cluster linking implementation.

The implementation correctly follows the S-N model from Blanke et al. (2022):

  1. Cluster start constraints (ΔE_0 = 0)
  2. SOC_boundary variables for absolute SOC tracking
  3. Vectorized linking constraints using cluster_order
  4. Combined bound constraints with sampling approach

The vectorized operations are efficient, and the sampling strategy (start, mid, end) is a reasonable approximation for enforcing bounds throughout each cluster.
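
The linking step (3) reduces to a recursion over original periods, where `cluster_order[d]` maps original period `d` to its representative cluster. A minimal numpy sketch with made-up data:

```python
import numpy as np

# 4 original periods represented by 2 clusters (hypothetical example data).
cluster_order = np.array([0, 1, 1, 0])
delta_soc = np.array([5.0, -3.0])  # net SOC change of each cluster

# Recursion: SOC_boundary[d+1] = SOC_boundary[d] + delta_soc[cluster_order[d]]
# Vectorized via advanced indexing plus a cumulative sum instead of a loop:
soc_boundary = np.concatenate([[0.0], np.cumsum(delta_soc[cluster_order])])
# → [0, 5, 2, -1, 4]
```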


1215-1254: LGTM! Comprehensive bounds handling for both modes.

The bounds computation properly handles all cases:

  • Unbounded storage: -inf/0 lower bound for intercluster/non-intercluster
  • Investment/fixed capacity: symmetric [-cap, +cap] for intercluster, scaled [rel_min*cap, rel_max*cap] for non-intercluster

The use of xr.ones_like() ensures proper time dimension alignment for the bounds arrays.

flixopt/flow_system.py (3)

179-206: LGTM! Proper cluster_weight initialization.

The cluster_weight parameter is correctly initialized to 1.0 for all timesteps when None (non-clustered case). The fit_to_model_coords call with dims=['time', 'period', 'scenario'] gracefully handles optional dimensions.


191-479: LGTM! Consistent renaming to timestep_duration.

The renaming from hours_per_step to timestep_duration is consistently applied throughout:

  • Variable names
  • Function names (calculate_timestep_duration)
  • DataArray names
  • All references in metadata methods

The terminology is clearer and aligns with the broader refactoring in the PR.


1887-1926: LGTM! Clean unified weighting interface.

The new weights property provides a PyPSA-inspired unified interface for all weight types. The temporal weight computation (timestep_duration * cluster_weight) correctly combines physical duration with cluster representation count.

The aggregation_weight convenience property provides backward compatibility while directing users to the unified interface.
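
The temporal weight combination mentioned above can be sketched directly; the data values are made up, only the product `timestep_duration * cluster_weight` reflects the described formula:

```python
import xarray as xr

# Physical duration of each clustered step (hours) and how many original
# periods that step represents.
timestep_duration = xr.DataArray([1.0, 1.0, 2.0], dims=['time'])
cluster_weight = xr.DataArray([3.0, 3.0, 2.0], dims=['time'])

# e.g. a 2-hour step standing in for 2 original periods weighs 4 hours
temporal_weight = timestep_duration * cluster_weight  # → [3, 3, 4]
```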

flixopt/clustering/intercluster_helpers.py (3)

20-27: LGTM! Clean dataclass for capacity bounds.

The CapacityBounds dataclass provides a clear structure for holding lower/upper bounds and investment flag, making the return type of extract_capacity_bounds explicit.


66-77: LGTM! Proper handling of DataArray and scalar bounds.

The logic correctly handles both cases:

  • DataArray: expands dimensions and transposes to maintain consistent dimension order
  • Scalar: converts to float (handling numpy scalars) and creates a filled array

The fallback to 1e9 at line 64 appears to be defensive coding, as unbounded storage is handled separately in Storage._absolute_charge_state_bounds.
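
The two-branch normalization can be sketched as follows; this is a simplified illustration of the pattern, not the actual `extract_capacity_bounds` helper:

```python
import numpy as np
import xarray as xr

# Target shape the bounds must end up in (dummy dims/sizes for illustration).
template = xr.DataArray(np.zeros((2, 3)), dims=['period', 'scenario'])

def normalize_bound(bound):
    if isinstance(bound, xr.DataArray):
        # Broadcast to the template's dims and fix the dimension order.
        return bound.broadcast_like(template).transpose('period', 'scenario')
    # Scalars (including numpy scalars) become a filled array.
    return xr.full_like(template, float(bound))

upper = normalize_bound(100.0)
per_period = normalize_bound(xr.DataArray([50.0, 80.0], dims=['period']))
```

Both branches return arrays with identical dims and ordering, so downstream constraint code never has to special-case scalar versus per-dimension capacities.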


81-106: LGTM! Clean coordinate construction.

The function properly builds coordinates for SOC_boundary variables, starting with cluster_boundary and optionally adding period and scenario dimensions based on FlowSystem attributes.

