Feature/aggregate rework #537
base: main
Conversation
…egration

# Conflicts:
#	flixopt/flow_system.py
This reverts commit e49ed1b.
Changes Made
1. flixopt/components.py - Storage class
- Added cluster_storage_mode parameter with default 'intercluster_cyclic'
- Updated docstring with full documentation
- Updated StorageModel._do_modeling to use self.element.cluster_storage_mode
2. flixopt/clustering/base.py - Clustering dataclass
- Removed storage_mode attribute
- Updated __repr__ to exclude storage mode
3. flixopt/transform_accessor.py - cluster() method
- Removed storage_mode parameter
- Updated docstrings
4. flixopt/flow_system.py - _add_inter_cluster_linking()
- Now filters storages by their cluster_storage_mode
- Only links storages with 'intercluster' or 'intercluster_cyclic'
5. flixopt/clustering/storage_linking.py - InterClusterLinking
- Changed __init__ to accept storages list instead of storage_cyclic flag
- Each storage's cyclic constraint is determined by its own cluster_storage_mode
New Usage

```python
# Per-storage modes - different storages can have different behaviors
battery = Storage('Battery', ..., cluster_storage_mode='cyclic')  # short-term
hydrogen = Storage('H2_Store', ..., cluster_storage_mode='intercluster_cyclic')  # seasonal

# Cluster without storage_mode parameter
fs_clustered = fs.transform.cluster(n_clusters=4, cluster_duration='1D')

# Re-optimize with different storage modes by changing the attribute
battery.cluster_storage_mode = 'independent'
fs_clustered.optimize(solver)  # Now battery uses 'independent' mode
```
Actionable comments posted: 4
♻️ Duplicate comments (1)
flixopt/transform_accessor.py (1)
1087-1090: Use the property setter instead of direct assignment to `_solution`.
The code directly assigns to `expanded_fs._solution` instead of using the `solution` property setter. This bypasses cache invalidation for statistics.
🧹 Nitpick comments (6)
mkdocs.yml (1)
72-72: Navigation label doesn't match notebook filename.
The navigation label is "Clustering" but the target notebook is `notebooks/08a-aggregation.ipynb`. Consider renaming the notebook to `08a-clustering.ipynb` for consistency, or keeping both label and filename aligned to avoid confusion.

flixopt/clustering/__init__.py (1)
35-57: Lazy import pattern may cause unexpected `None` for direct imports.
`InterClusterLinking` is set to `None` at module level (Line 37) and included in `__all__` (Line 55). Users who do `from flixopt.clustering import InterClusterLinking` will get `None` instead of the actual class. They must call `_get_inter_cluster_linking()` to get the real class, but this is a private function.

Consider using `__getattr__` at module level for lazy loading:

🔎 Proposed fix using __getattr__

```diff
-# Lazy import for InterClusterLinking to avoid circular imports
-# It depends on structure.Submodel which has complex import dependencies
-InterClusterLinking = None
-
-
-def _get_inter_cluster_linking():
-    """Get InterClusterLinking class with lazy import."""
-    global InterClusterLinking
-    if InterClusterLinking is None:
-        from .storage_linking import InterClusterLinking as _InterClusterLinking
-
-        InterClusterLinking = _InterClusterLinking
-    return InterClusterLinking
+_InterClusterLinking = None
+
+
+def __getattr__(name: str):
+    """Lazy import for InterClusterLinking to avoid circular imports."""
+    global _InterClusterLinking
+    if name == 'InterClusterLinking':
+        if _InterClusterLinking is None:
+            from .storage_linking import InterClusterLinking
+            _InterClusterLinking = InterClusterLinking
+        return _InterClusterLinking
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

flixopt/clustering/base.py (1)
1028-1039: Consider using `np.bincount` for counting cluster occurrences.
The current implementation uses a list comprehension with `sum(1 for x in cluster_order if x == c)` for each unique cluster, resulting in O(n*k) time complexity. Using `np.bincount` would be O(n).

🔎 Proposed optimization

```diff
-    # Count occurrences of each cluster
-    unique_clusters = np.unique(cluster_order)
-    occurrences = {}
-    for c in unique_clusters:
-        occurrences[int(c)] = sum(1 for x in cluster_order if x == c)
-
-    n_clusters = len(unique_clusters)
+    # Count occurrences of each cluster using bincount for efficiency
+    cluster_order_array = np.asarray(cluster_order)
+    n_clusters = int(cluster_order_array.max()) + 1
+    occurrence_counts = np.bincount(cluster_order_array, minlength=n_clusters)
+
     cluster_occurrences_da = xr.DataArray(
-        [occurrences.get(c, 0) for c in range(n_clusters)],
+        occurrence_counts,
         dims=['cluster'],
         name='cluster_occurrences',
     )
```

tests/test_clustering/test_base.py (1)
142-159: Test uses `None` for `original_flow_system` - consider documenting expectation.
Line 155 uses `original_flow_system=None`, which works for basic dataclass creation but may cause issues if any code path accesses properties on the FlowSystem. This is acceptable for unit testing but consider adding a comment.

🔎 Suggested documentation

```diff
     info = Clustering(
         result=result,
-        original_flow_system=None,  # Would be FlowSystem in practice
+        original_flow_system=None,  # type: ignore[arg-type]  # OK for unit test; FlowSystem required in production
         backend_name='tsam',
     )
```

docs/notebooks/08c-clustering.ipynb (1)
221-221: Remove redundant `strict=False` parameter from `zip()` call.
Line 221 uses `zip(..., strict=False)`, which is unnecessary since `strict=False` is the default value. With the project requiring Python 3.11+, this parameter adds no functional value and can be safely removed to simplify the code. If you want to ensure the iterables have equal length, consider using `strict=True` instead.
flixopt/clustering/storage_linking.py (1)
126-132: Consider edge case when capacity bounds are unavailable.
When `capacity.fixed_size` and `capacity.maximum` are both None, the fallback to `1e9` may be too permissive for some use cases. This could lead to numerical issues in solvers or unexpected behavior.

🔎 Consider adding a warning

```diff
     else:
-        cap_value = 1e9  # Large default
+        cap_value = 1e9  # Large default
+        logger.warning(
+            f'Storage {storage.label} has no capacity bounds - using 1e9 as upper bound for SOC_boundary'
+        )
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- docs/notebooks/08c-clustering.ipynb (1 hunks)
- docs/notebooks/08c2-clustering-storage-modes.ipynb (1 hunks)
- docs/notebooks/data/generate_example_systems.py (5 hunks)
- flixopt/__init__.py (3 hunks)
- flixopt/clustering/__init__.py (1 hunks)
- flixopt/clustering/base.py (1 hunks)
- flixopt/clustering/storage_linking.py (1 hunks)
- flixopt/components.py (8 hunks)
- flixopt/flow_system.py (13 hunks)
- flixopt/plot_result.py (1 hunks)
- flixopt/transform_accessor.py (3 hunks)
- mkdocs.yml (1 hunks)
- tests/test_clustering/__init__.py (1 hunks)
- tests/test_clustering/test_base.py (1 hunks)
- tests/test_clustering/test_integration.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- flixopt/plot_result.py
- tests/test_clustering/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
- flixopt/__init__.py
🧰 Additional context used
🧬 Code graph analysis (4)
flixopt/clustering/storage_linking.py (4)
- flixopt/structure.py (3): Submodel (1642-1821), add_variables (1666-1675), add_constraints (1677-1686)
- flixopt/flow_system.py (2): FlowSystem (51-2303), storages (1865-1876)
- flixopt/clustering/base.py (7): ClusterStructure (34-232), n_clusters (951-956), n_original_periods (102-104), n_original_periods (959-963), has_multi_dims (107-109), cluster_order (937-941), get_cluster_order_for_slice (111-126)
- flixopt/components.py (1): charge_state (1074-1076)

flixopt/components.py (2)
- flixopt/structure.py (3): timestep_duration (298-300), timestep_duration (1811-1812), flow_system (499-517)
- flixopt/clustering/base.py (1): cluster_start_positions (978-996)

tests/test_clustering/test_base.py (1)
- flixopt/clustering/base.py (13): Clustering (882-996), ClusterResult (236-459), ClusterStructure (34-232), create_cluster_structure_from_mapping (999-1046), cluster_order (937-941), n_clusters (951-956), n_original_periods (102-104), n_original_periods (959-963), get_cluster_weight_per_timestep (147-173), timestep_mapping (973-975), n_original_timesteps (308-310), validate (425-459), get_expansion_mapping (312-321)

tests/test_clustering/test_integration.py (2)
- flixopt/flow_system.py (4): FlowSystem (51-2303), weights (1926-1951), coords (1879-1885), aggregation_weight (1954-1964)
- flixopt/structure.py (4): TimeSeriesWeights (46-128), sum_over_time (101-112), effective_objective (92-94), cluster_weight (307-313)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.14)
- GitHub Check: Build documentation
🔇 Additional comments (35)
flixopt/clustering/base.py (1)
340-423: LGTM - `expand_data` handles multi-dimensional cases correctly.
The method properly handles simple (no period/scenario) and multi-dimensional cases with appropriate slicing and recombination using `xr.concat`.

tests/test_clustering/test_integration.py (1)
99-130: Good integration test for clustering workflow.
The test properly uses `pytest.importorskip('tsam')` for conditional testing, creates a realistic FlowSystem with time series data, and validates the expected reduction from 168 to 48 timesteps.

tests/test_clustering/test_base.py (1)
14-63: Good test coverage for ClusterStructure.
Tests cover basic creation, numpy array conversion, and weight calculation per timestep. The weight calculation test at lines 47-62 validates the expected behavior clearly.
docs/notebooks/08c-clustering.ipynb (3)
1-41: Well-structured introduction and setup.
The notebook provides clear learning objectives and properly sets up dependencies with the tsam requirement note.

142-169: Good demonstration of peak forcing usage.
The warning about `time_series_for_high_peaks` is critical for users, and the code example clearly shows how to specify the peak series.

448-454: The referenced notebook `08c2-clustering-storage-modes.ipynb` exists in the repository at `docs/notebooks/08c2-clustering-storage-modes.ipynb`. The link in line 454 is correct and requires no changes.

Likely an incorrect or invalid review comment.
docs/notebooks/data/generate_example_systems.py (4)
237-351: LGTM! Well-structured district heating system generator.
The function correctly loads CSV data, slices to one month, and constructs a complete FlowSystem with CHP, boiler, storage, and grid connections. Good use of InvestParameters for sizing optimization.

354-451: LGTM! Clean operational system setup.
Good separation from the district heating system by removing investments and adding startup costs, making it suitable for rolling horizon optimization. The fixed storage capacity and startup constraints are appropriate for operational planning.

454-590: LGTM! Excellent seasonal storage demonstration system.
The solar irradiance and heat demand profiles are well-designed to show seasonal mismatch. Using `initial_charge_state='equals_final'` ensures yearly cyclic behavior. This is ideal for demonstrating inter-cluster storage linking value.

600-632: LGTM! Improved multiperiod system with longer horizon.
The extension to 336 hours (2 weeks) provides better clustering demonstration capability. The updated comment about `transform.sisel()` is helpful.

docs/notebooks/08c2-clustering-storage-modes.ipynb (1)
1-425: LGTM! Excellent educational notebook for storage clustering modes.
The notebook effectively demonstrates:
- Clear comparison of all four storage modes with measurable results
- Seasonal storage patterns visualization
- Practical guidance on mode selection
- Proper use of `expand_solution()` for full-resolution analysis

The documentation table and interpretation sections provide valuable guidance for users.
flixopt/clustering/storage_linking.py (3)
55-83: LGTM! Clean initialization with proper extraction of cluster structure values.
The constructor properly extracts commonly used values and handles both scalar and DataArray types for `n_clusters`.

214-282: LGTM! Robust multi-dimensional constraint generation.
The method correctly handles period/scenario dimensions by:
- Checking which dimensions exist in each variable
- Building appropriate selectors
- Generating descriptive constraint names with slice suffixes
This ensures proper constraint generation regardless of dimension structure.
181-188: Indexing is correct and properly leverages the extra timestep.
The code correctly accesses end-of-cluster states. Since `charge_state` is created with `extra_timestep=True`, it has `n_clusters * timesteps_per_cluster + 1` elements in its time coordinate. The formula `end_idx = (c + 1) * timesteps_per_cluster` yields valid indices for all clusters (0 to `n_clusters * timesteps_per_cluster`), allowing the last cluster to access the final extra timestep. The comment about the extra timestep is accurate and documents this design correctly.

flixopt/components.py (5)
273-300: LGTM! Well-designed cluster_mode parameter with comprehensive documentation.
The new `cluster_mode` parameter provides clear options for handling storage in clustering scenarios. The docstring explains each mode's behavior and use cases effectively. The default `'intercluster_cyclic'` is appropriate for most yearly optimization scenarios.

Also applies to: 406-446

472-475: LGTM! Proper handling of None initial_charge_state.
The updated condition correctly skips transformation when `initial_charge_state` is None (free) or a string ('equals_final'), only transforming numeric values.

551-562: LGTM! Updated plausibility check for None initial_charge_state.
The condition now correctly handles the case where `initial_charge_state` is None, preventing invalid numeric comparisons.

919-959: Mask indexing correctly handles cluster boundary skipping.
The mask construction using `clustering.cluster_start_positions[1:] - 1` properly identifies and masks the last timestep of each cluster (except the first), which effectively skips inter-cluster boundaries as intended. The indexing is mathematically sound with no out-of-bounds risk.

947-959: The off-by-one concern is unfounded; indexing is correct, but clarify the boundary semantics.
The code correctly uses `len(self._model.flow_system.timesteps)` as the final `end_pos`. Since `charge_state` is indexed by `timesteps_extra` (which has n+1 elements with valid indices 0 through n), and `len(timesteps)` equals n, this index is valid and correctly points to the extra timestep.

However, clarify the comment on line 951. It states "End of cluster i is at (start of cluster i+1) - 1" but the code uses `starts[i + 1]` directly without the `-1`. This distinction matters: the code intentionally links at cluster boundaries (where cluster i+1 begins), not at the final data point of cluster i. Update the comment to reflect that the constraint enforces cyclic linkage at cluster transitions.

flixopt/transform_accessor.py (5)
577-644: LGTM! Comprehensive cluster() method documentation.
The docstring provides clear explanations of:
- The clustering workflow steps
- All parameters with examples
- Two-stage optimization workflow
- Important notes about peak margins and storage modes
This will help users understand and correctly apply the clustering feature.
658-668: LGTM! Proper validation of timestep uniformity and cluster duration.
The checks ensure:
- Uniform timestep sizes (required for tsam clustering)
- Cluster duration is a multiple of timestep size
Good use of `np.isclose` for floating-point comparisons.

788-793: LGTM! Correct handling of 'equals_final' for reduced timesteps.
Setting `initial_charge_state = None` for storages with 'equals_final' is correct since the cyclic constraint doesn't apply meaningfully to reduced typical periods. The storage_mode constraints handle boundaries appropriately.

1061-1064: LGTM! Clean expansion logic using ClusterResult.expand_data().
The expand function correctly delegates to `info.result.expand_data()` which handles the multi-dimensional mapping internally. This keeps the expansion logic centralized.
882-926: LGTM! Well-structured helper for combining slices.
The `_combine_slices_to_dataarray` method correctly handles:
- Simple case (no period/scenario)
- Period-only case
- Scenario-only case
- Full multi-dimensional case
Using `xr.concat` with proper dimension ordering ensures consistent output.

flixopt/flow_system.py (11)
41-43: LGTM! Correct use of TYPE_CHECKING imports.
The new type imports for `Clustering` and `TimeSeriesWeights` are correctly placed in the TYPE_CHECKING block, avoiding circular import issues while providing proper type hints.

179-206: LGTM! Well-designed cluster weight initialization.
The `cluster_weight` parameter is properly documented and initialized with sensible defaults (1.0 for all timesteps when not clustered). The use of `fit_to_model_coords` with multiple dimensions is appropriate, as it gracefully handles only the dimensions that exist.

321-326: LGTM! Consistent renaming to timestep_duration.
The method renaming from `calculate_hours_per_timestep` to `calculate_timestep_duration` is consistent with the broader architectural changes. The implementation correctly returns a DataArray with the new name 'timestep_duration'.

377-411: LGTM! Proper time metadata computation.
The `_compute_time_metadata` method is correctly updated to compute and return `timestep_duration` instead of the old naming. The extraction of `hours_of_last_timestep` from the DataArray using `.isel(time=-1).item()` is appropriate.

447-488: LGTM! Proper time metadata synchronization.
The `_update_time_metadata` method correctly recomputes and updates `timestep_duration` in the dataset. The check for variable existence before updating is good defensive programming.

696-698: LGTM! Proper deserialization of cluster_weight.
The `cluster_weight` parameter is correctly deserialized from the dataset using the same pattern as other optional parameters. The fallback to `None` ensures backward compatibility with datasets that don't have cluster weights.

235-236: LGTM! Cleaner clustering state management.
The renaming from `_clustering_info` to `clustering` improves clarity, and the type hint `Clustering | None` provides proper type safety.

1267-1300: LGTM! Architectural shift to inter-cluster linking.
The method now uses `_add_inter_cluster_linking()` instead of constraint-based clustering. This aligns with the PR's goal of moving from constraint-based clustering to an aggregation abstraction layer. The conditional check for `self.clustering` properly guards the linking logic.

1925-1951: LGTM! Excellent unified weighting interface.
The new `weights` property provides a clean, PyPSA-inspired interface for accessing all weight types. The temporal weight correctly combines `timestep_duration` (physical duration) and `cluster_weight` (representation count), which is the proper weight for time aggregation in clustered models.

1953-1964: LGTM! Convenient aggregation weight property.
The `aggregation_weight` property provides direct access to the combined time aggregation weight. The documentation correctly notes its equivalence to `weights.temporal` and recommends the unified interface for new code.

1302-1334: All interfaces verified and correctly implemented.
The `Storage.cluster_mode` attribute is properly defined in `components.py` (line 416) as a `Literal['independent', 'cyclic', 'intercluster', 'intercluster_cyclic']` with default `'intercluster_cyclic'`. The `InterClusterLinking` class in `clustering/storage_linking.py` correctly accepts all required parameters: `model`, `flow_system`, `cluster_structure`, and `storages`. The `do_modeling()` method exists and is properly invoked. The reviewed code correctly filters storages and instantiates the linking model.
```python
def __repr__(self) -> str:
    n_clusters = (
        int(self.n_clusters) if isinstance(self.n_clusters, (int, np.integer)) else int(self.n_clusters.values)
    )
    occ = [int(self.cluster_occurrences.sel(cluster=c).values) for c in range(n_clusters)]
    return (
        f'ClusterStructure(\n'
        f' {self.n_original_periods} original periods → {n_clusters} clusters\n'
        f' timesteps_per_cluster={self.timesteps_per_cluster}\n'
        f' occurrences={occ}\n'
        f')'
    )
```
`__repr__` may fail for multi-dimensional `n_clusters`.
Line 90 uses `int(self.n_clusters.values)`, which will raise `TypeError` if `n_clusters` is a multi-dimensional DataArray (since `.values` would return an array). This matches the documented multi-scenario case where `n_clusters` could vary per slice.

🔎 Proposed fix
```diff
 def __repr__(self) -> str:
-    n_clusters = (
-        int(self.n_clusters) if isinstance(self.n_clusters, (int, np.integer)) else int(self.n_clusters.values)
-    )
+    if isinstance(self.n_clusters, (int, np.integer)):
+        n_clusters = int(self.n_clusters)
+    elif isinstance(self.n_clusters, xr.DataArray):
+        n_clusters = int(self.n_clusters.values) if self.n_clusters.ndim == 0 else int(self.n_clusters.max().values)
+    else:
+        n_clusters = int(self.n_clusters)
```
+ n_clusters = int(self.n_clusters)| if kind == 'duration_curve': | ||
| sorted_vars = {} | ||
| for var in ds.data_vars: | ||
| for rep in ds.coords['representation'].values: | ||
| values = np.sort(ds[var].sel(representation=rep).values.flatten())[::-1] | ||
| sorted_vars[(var, rep)] = values | ||
| n = len(values) | ||
| ds = xr.Dataset( | ||
| { | ||
| var: xr.DataArray( | ||
| [sorted_vars[(var, r)] for r in ['Original', 'Clustered']], | ||
| dims=['representation', 'rank'], | ||
| coords={'representation': ['Original', 'Clustered'], 'rank': range(n)}, | ||
| ) | ||
| for var in resolved_variables | ||
| } | ||
| ) |
Variable `values` used outside loop may be undefined in edge cases.
Line 546 uses `n = len(values)` where `values` is assigned inside the inner loop (Line 544). While the earlier check at Line 524 prevents empty `resolved_variables`, if `ds.data_vars` is empty (which could happen if selection removes all variables), this would raise `NameError`.
🔎 Proposed fix
```diff
 # For duration curve: flatten and sort values
 if kind == 'duration_curve':
     sorted_vars = {}
+    n = 0
     for var in ds.data_vars:
         for rep in ds.coords['representation'].values:
             values = np.sort(ds[var].sel(representation=rep).values.flatten())[::-1]
             sorted_vars[(var, rep)] = values
-    n = len(values)
+            n = len(values)
 ds = xr.Dataset(
```
In flixopt/clustering/base.py around lines 540 to 556, the variable `values` is
only assigned inside the inner loop and then used outside to compute `n =
len(values)`, which can be undefined if `ds.data_vars` is empty; fix by deriving
`n` from the populated `sorted_vars` collection (e.g., after the loops set n =
len(next(iter(sorted_vars.values()))) ) and add a guard for the case when
`sorted_vars` is empty (raise a clear error or skip building the Dataset) so you
never rely on a loop-local variable outside its scope.
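The guard the prompt describes can be sketched in isolation; `sorted_vars` below is a made-up stand-in for the `(variable, representation) -> sorted values` mapping the loop builds:

```python
import numpy as np

sorted_vars = {
    ('flow', 'Original'): np.sort(np.array([1.0, 3.0, 2.0]))[::-1],
    ('flow', 'Clustered'): np.sort(np.array([2.5, 1.5, 2.0]))[::-1],
}

if not sorted_vars:
    # Nothing survived selection: fail loudly instead of hitting a NameError later
    raise ValueError('No variables to build a duration curve from')

# Derive the rank-axis length from the collected data, not from a loop-local variable
n = len(next(iter(sorted_vars.values())))
assert n == 3
```

This keeps `n` well-defined regardless of whether the loops executed, which is exactly the property the original code lacks.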
```python
# Build long-form DataFrame with cluster labels including occurrence counts
rows = []
data_vars = {}
for var in resolved_variables:
    data = aggregated_data[var].values
    data_by_cluster = data.reshape(n_clusters, timesteps_per_cluster)
    data_vars[var] = xr.DataArray(
        data_by_cluster,
        dims=['cluster', 'timestep'],
        coords={'cluster': range(n_clusters), 'timestep': range(timesteps_per_cluster)},
    )
    for c in range(n_clusters):
        occurrence = int(cs.cluster_occurrences.sel(cluster=c).values)
        label = f'Cluster {c} (×{occurrence})'
        for t in range(timesteps_per_cluster):
            rows.append({'cluster': label, 'timestep': t, 'value': data_by_cluster[c, t], 'variable': var})
```
`clusters()` assumes 1D aggregated data for reshape.
Lines 834-835 reshape data with `data.reshape(n_clusters, timesteps_per_cluster)`, which assumes `aggregated_data[var]` is 1D. For multi-period/scenario data, this reshape would fail. Consider adding dimension validation or handling multi-dim data.
🔎 Proposed fix to add validation

```diff
     for var in resolved_variables:
         data = aggregated_data[var].values
+        if data.ndim != 1:
+            raise ValueError(
+                f"clusters() currently only supports 1D aggregated data. "
+                f"Variable '{var}' has shape {data.shape}. "
+                f"Use select={{...}} to reduce to a single period/scenario."
+            )
         data_by_cluster = data.reshape(n_clusters, timesteps_per_cluster)
```
In flixopt/clustering/base.py around lines 830 to 845, the code unconditionally
reshapes aggregated_data[var].values with data.reshape(n_clusters,
timesteps_per_cluster) which assumes a 1D array; add validation to handle
multi-dimensional inputs by checking data.ndim and/or data.size: if data.ndim ==
1 assert data.size == n_clusters * timesteps_per_cluster and reshape as before;
if data.ndim > 1 either flatten via data.ravel() (after verifying size matches)
or explicitly stack/concatenate the time-like dimensions into a single 1D axis
(using xarray.stack or numpy.reshape) before reshaping, and if sizes don’t match
raise a clear ValueError indicating expected length and actual shape so callers
can correct input. Ensure the choice (flatten vs. explicit stack) is consistent
with how multi-period/scenario data should be interpreted in the clustering
context.
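The shape assumption is easy to demonstrate with NumPy alone (sizes made up; `flat` stands in for `aggregated_data[var].values`):

```python
import numpy as np

n_clusters, timesteps_per_cluster = 4, 24

# 1-D data of the expected length reshapes cleanly
flat = np.arange(n_clusters * timesteps_per_cluster)
assert flat.reshape(n_clusters, timesteps_per_cluster).shape == (4, 24)

# Data with an extra period dimension has the wrong total size (192 vs. 96)
multi = np.zeros((2, n_clusters * timesteps_per_cluster))
reshape_failed = False
try:
    multi.reshape(n_clusters, timesteps_per_cluster)
except ValueError:
    reshape_failed = True
assert reshape_failed

# The proposed guard catches this before reshape, with a clearer message
if multi.ndim != 1:
    pass  # raise the descriptive ValueError from the review here
```

Validating `ndim` up front turns an opaque reshape error into an actionable message telling the caller to reduce to a single period/scenario first.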
```diff
@@ -0,0 +1,148 @@
+"""Integration tests for flixopt.aggregation module with FlowSystem."""
```
Docstring references "aggregation" but module is for "clustering".
The docstring says "flixopt.aggregation module" but the file is test_clustering/test_integration.py. This appears to be a copy-paste oversight from the refactoring.
🔎 Proposed fix

```diff
-"""Integration tests for flixopt.aggregation module with FlowSystem."""
+"""Integration tests for flixopt.clustering module with FlowSystem."""
```
🤖 Prompt for AI Agents
In tests/test_clustering/test_integration.py around line 1, the module-level
docstring incorrectly references "flixopt.aggregation" due to copy-paste; update
the docstring to correctly describe this file as integration tests for the
flixopt.clustering module (and mention FlowSystem if desired) so it accurately
reflects the file's purpose and location.
Summary
New Files Created
1. flixopt/clustering/intercluster_helpers.py - New helper module with:
- SliceContext - dataclass for iteration context with period/scenario/cluster_order
- MultiDimIterator - unified iterator over (period, scenario) combinations
- CapacityBounds - dataclass for capacity bounds
- extract_capacity_bounds() - extract bounds from various capacity parameter types
- build_boundary_coords() - build coordinates/dims for SOC_boundary variables
Refactored Methods in StorageModel (components.py:1025-1211)
The monolithic 200+ line method was split into focused helper methods:
- _add_intercluster_linking() - Main orchestrator (now ~70 lines)
- _add_cluster_start_constraints() - Adds ΔE_0 = 0 constraints
- _compute_delta_soc() - Pre-computes delta_SOC for each cluster
- _add_linking_constraints() - Links SOC boundaries using MultiDimIterator
- _add_combined_bound_constraints() - Adds combined bounds, now correctly handles multi-dim cases
Removed Files
- flixopt/clustering/storage_linking.py - The old InterClusterLinking class is no longer needed
Updated Exports
- flixopt/clustering/__init__.py - Removed InterClusterLinking from exports
Key Bug Fix
The _add_combined_bound_constraints() method now uses MultiDimIterator to correctly iterate over (period, scenario) combinations, fixing the bug where it always used the default cluster_order slice regardless of dimensions.
All 125 clustering and storage tests pass.
…cleaner and more vectorized. Here's what changed:

Before (overly complex)
- 365+ individual constraints for linking and bounds (one per original period)
- Dictionary-based delta_soc storage with integer keys
- MultiDimIterator machinery with SliceContext and build_selector methods
- Nested loops over periods, original_periods, and sample points

After (simplified)
- ~10 constraint groups total using vectorized operations
- DataArray-based delta_soc with proper cluster dimension
- No MultiDimIterator needed - xarray broadcasting handles dimensions naturally
- Vectorized constraints for cluster starts, linking, and bounds

Key changes in components.py:

| Method | Before | After |
|---|---|---|
| _add_cluster_start_constraints | Loop over n_clusters | Single vectorized isel(time=cluster_starts) |
| _compute_delta_soc | Returns dict[int, DataArray] | Returns DataArray with cluster dim |
| _add_linking_constraints | Loop over n_original_periods with selectors | Single vectorized constraint using advanced indexing |
| _add_combined_bound_constraints | Nested loops (periods × original_periods × 3) | Loop only over 3 sample points, vectorize original_periods |

Removed from intercluster_helpers.py:
- SliceContext dataclass
- MultiDimIterator class (~70 lines of code)

The code now follows the same pattern as the regular storage constraints (lines 928-945), using isel() with slices/arrays instead of Python loops.
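The loop-to-vectorized move described for the cluster-start constraints can be illustrated with a NumPy analogue of `isel(time=cluster_starts)` (all names and sizes here are illustrative, not the actual flixopt code):

```python
import numpy as np

n_clusters, timesteps_per_cluster = 3, 4

# charge_state over n_clusters * timesteps_per_cluster + 1 points (extra timestep)
charge_state = np.linspace(0.0, 1.2, n_clusters * timesteps_per_cluster + 1)

# Index of the first timestep of each cluster: [0, 4, 8]
cluster_starts = np.arange(n_clusters) * timesteps_per_cluster

# Before: one Python-level selection per cluster
looped = np.array([charge_state[start] for start in cluster_starts])

# After: a single advanced-indexing selection, the analogue of isel(time=cluster_starts)
vectorized = charge_state[cluster_starts]

assert np.array_equal(looped, vectorized)
```

The vectorized form produces one array (and hence one constraint group) instead of `n_clusters` separate constraints, which is where the "365+ constraints down to ~10 groups" reduction comes from.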
Actionable comments posted: 0
🧹 Nitpick comments (1)
flixopt/clustering/intercluster_helpers.py (1)
54-64: Consider broadening the investment detection.
The current check `has_investment = hasattr(capacity_param, 'maximum_size')` might not detect InvestParameters objects that use `fixed_size` without `maximum_size`. Consider:

```python
has_investment = isinstance(capacity_param, InvestParameters)
```

This would ensure the investment constraint at line 1087-1091 in `components.py` is applied for all investment cases, not just continuous sizing.

However, the current implementation appears safe since the bounds are correctly computed even if `has_investment` is False.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- flixopt/clustering/__init__.py (1 hunks)
- flixopt/clustering/intercluster_helpers.py (1 hunks)
- flixopt/components.py (10 hunks)
- flixopt/flow_system.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
flixopt/clustering/__init__.py (1)
- flixopt/clustering/base.py (3): ClusterResult (236-459), ClusterStructure (34-232), create_cluster_structure_from_mapping (999-1046)

flixopt/clustering/intercluster_helpers.py (2)
- flixopt/flow_system.py (1): FlowSystem (51-2265)
- flixopt/interface.py (1): InvestParameters (956-1279)

flixopt/components.py (6)
- flixopt/structure.py (5): timestep_duration (298-300), timestep_duration (1811-1812), flow_system (499-517), values (1589-1594), values (1876-1877)
- flixopt/results.py (3): flow_rate (2014-2015), flow_system (356-375), size (2022-2030)
- flixopt/flow_system.py (1): coords (1841-1847)
- flixopt/clustering/base.py (5): cluster_start_positions (978-996), n_clusters (951-956), n_original_periods (102-104), n_original_periods (959-963), cluster_order (937-941)
- flixopt/clustering/intercluster_helpers.py (2): build_boundary_coords (81-106), extract_capacity_bounds (29-78)
- flixopt/interface.py (1): InvestParameters (956-1279)

flixopt/flow_system.py (1)
- flixopt/structure.py (5): TimeSeriesWeights (46-128), cluster_weight (307-313), timestep_duration (298-300), timestep_duration (1811-1812), aggregation_weight (316-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Build documentation
🔇 Additional comments (14)
flixopt/clustering/__init__.py (1)
**1-42: LGTM! Clean public API surface.** The module provides a well-structured public API for clustering functionality with clear documentation and usage examples.
flixopt/components.py (7)
**273-446: LGTM! Well-documented cluster-mode API.** The new `cluster_mode` parameter provides clear semantics for how storage behaves during clustering, with detailed documentation for each mode. The `initial_charge_state=None` option is properly typed and documented.
**472-562: LGTM! Proper handling of `initial_charge_state=None`.** The transform and validation logic correctly handles the three cases (None, 'equals_final', numeric) for initial charge state. The conditional checks properly skip numeric comparisons when `initial_charge_state` is not numeric.
**921-933: LGTM! Correct charge balance formulation.** The balance equation properly uses `timestep_duration` for variable-duration timesteps. The exponential loss term `((1 - rel_loss) ** timestep_duration)` correctly models decay over variable durations.
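The exponential-decay formulation can be illustrated in a few lines; function and variable names here are illustrative, not the flixopt internals. The key property is that decay is consistent regardless of how a duration is split into timesteps:

```python
def next_charge(charge, rel_loss, dt, inflow=0.0, outflow=0.0):
    """One balance step: decay stored energy over a (possibly variable)
    timestep duration dt (hours), then apply net flows.
    rel_loss is the relative loss per hour."""
    return charge * (1 - rel_loss) ** dt + (inflow - outflow) * dt

# Two 1 h steps lose exactly as much energy as one 2 h step:
two_steps = next_charge(next_charge(100.0, 0.01, 1.0), 0.01, 1.0)
one_step = next_charge(100.0, 0.01, 2.0)
print(abs(two_steps - one_step) < 1e-9)  # True
```

A linear loss term (`charge * rel_loss * dt`) would not have this splitting-invariance, which is why the exponential form is the right choice for variable-duration timesteps.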
**935-959: LGTM! Proper inter-cluster boundary handling.** The mask correctly skips balance constraints at inter-cluster boundaries, preventing naive links that would violate temporal structure. The cyclic mode constraints properly enforce that each cluster returns to its initial state.
**961-1000: LGTM! Correct dual-mode investment bounding.** The logic properly distinguishes between intercluster and non-intercluster modes:

- Intercluster: symmetric bounds `[-size, +size]` because `charge_state` represents delta from boundary (can be negative)
- Non-intercluster: standard bounds `[0, size]` because `charge_state` is absolute SOC

This is essential for the S-N model formulation.
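The mode-dependent bound selection described above amounts to a simple branch; this sketch uses hypothetical names, not the actual flixopt method:

```python
def charge_state_bounds(size: float, intercluster: bool) -> tuple[float, float]:
    """Return (lower, upper) bounds for the charge_state variable.

    Intercluster mode: charge_state is a delta relative to the cluster
    boundary SOC, so it may legitimately go negative -> symmetric bounds.
    Non-intercluster mode: charge_state is the absolute SOC -> [0, size].
    """
    if intercluster:
        return (-size, size)
    return (0.0, size)

print(charge_state_bounds(50.0, intercluster=True))   # (-50.0, 50.0)
print(charge_state_bounds(50.0, intercluster=False))  # (0.0, 50.0)
```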
**1041-1213: LGTM! Solid inter-cluster linking implementation.** The implementation correctly follows the S-N model from Blanke et al. (2022):
- Cluster start constraints (ΔE_0 = 0)
- SOC_boundary variables for absolute SOC tracking
- Vectorized linking constraints using cluster_order
- Combined bound constraints with sampling approach
The vectorized operations are efficient, and the sampling strategy (start, mid, end) is a reasonable approximation for enforcing bounds throughout each cluster.
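The vectorized linking via `cluster_order` can be sketched with numpy advanced indexing; the data and names here are hypothetical, not the flixopt variables. Each original period's SOC boundary advances by the net SOC change of its assigned representative cluster:

```python
import numpy as np

# Hypothetical data: 3 representative clusters, 6 original periods.
delta_soc = np.array([2.0, -1.0, 0.5])        # net SOC change per cluster
cluster_order = np.array([0, 1, 0, 2, 2, 1])  # cluster index per original period

# Advanced indexing selects each period's delta in one shot; a cumulative
# sum then chains the boundary SOC values -- no loop over original periods.
initial_soc = 10.0
soc_boundary = np.concatenate(
    [[initial_soc], initial_soc + np.cumsum(delta_soc[cluster_order])]
)
print(soc_boundary)  # [10.  12.  11.  13.  13.5 14.  13. ]
```

In the actual model the chaining is expressed as equality constraints between consecutive `SOC_boundary` variables rather than a cumulative sum, but the indexing pattern is the same.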
**1215-1254: LGTM! Comprehensive bounds handling for both modes.** The bounds computation properly handles all cases:

- Unbounded storage: `-inf`/`0` lower bound for intercluster/non-intercluster
- Investment/fixed capacity: symmetric `[-cap, +cap]` for intercluster, scaled `[rel_min*cap, rel_max*cap]` for non-intercluster

The use of `xr.ones_like()` ensures proper time dimension alignment for the bounds arrays.

flixopt/flow_system.py (3)
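The `xr.ones_like()` broadcasting trick mentioned above is easy to demonstrate in isolation (variable names are illustrative): a scalar capacity is expanded into a bounds array that shares the variable's dimensions and coordinates.

```python
import numpy as np
import xarray as xr

charge_state = xr.DataArray(np.zeros(4), dims=['time'])

cap = 80.0
# Multiplying ones_like by a scalar yields bound arrays aligned with
# the variable's time dimension (symmetric bounds, intercluster case).
lower = xr.ones_like(charge_state) * -cap
upper = xr.ones_like(charge_state) * cap
print(lower.dims, float(lower[0]), float(upper[0]))  # ('time',) -80.0 80.0
```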
**179-206: LGTM! Proper cluster_weight initialization.** The `cluster_weight` parameter is correctly initialized to 1.0 for all timesteps when None (non-clustered case). The `fit_to_model_coords` call with `dims=['time', 'period', 'scenario']` gracefully handles optional dimensions.
**191-479: LGTM! Consistent renaming to timestep_duration.** The renaming from `hours_per_step` to `timestep_duration` is consistently applied throughout:

- Variable names
- Function names (`calculate_timestep_duration`)
- DataArray names
- All references in metadata methods

The terminology is clearer and aligns with the broader refactoring in the PR.
**1887-1926: LGTM! Clean unified weighting interface.** The new `weights` property provides a PyPSA-inspired unified interface for all weight types. The temporal weight computation (`timestep_duration * cluster_weight`) correctly combines physical duration with cluster representation count.

The `aggregation_weight` convenience property provides backward compatibility while directing users to the unified interface.
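The temporal weight computation described above can be sketched as follows; the `Weights` class and its fields are illustrative stand-ins, not the actual `TimeSeriesWeights` API:

```python
import xarray as xr

class Weights:
    """Minimal sketch of a unified weights container."""

    def __init__(self, timestep_duration, cluster_weight=None):
        self.timestep_duration = timestep_duration
        # Non-clustered systems default to a weight of 1 per timestep.
        self.cluster_weight = (
            cluster_weight if cluster_weight is not None
            else xr.ones_like(timestep_duration)
        )

    @property
    def temporal(self):
        # Physical duration times the number of original periods each
        # representative timestep stands for.
        return self.timestep_duration * self.cluster_weight

dur = xr.DataArray([1.0, 1.0, 2.0], dims=['time'])
w = Weights(dur, cluster_weight=xr.DataArray([3.0, 3.0, 2.0], dims=['time']))
print(w.temporal.values)  # [3. 3. 4.]
```

Objective terms (e.g. operating costs) would then be weighted by `temporal` so that a representative timestep counts for all the original timesteps it stands in for.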
**20-27: LGTM! Clean dataclass for capacity bounds.** The `CapacityBounds` dataclass provides a clear structure for holding lower/upper bounds and investment flag, making the return type of `extract_capacity_bounds` explicit.
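Based on the review's summary, the dataclass plausibly looks like the sketch below; the exact field names are assumptions, not the actual flixopt definition:

```python
from dataclasses import dataclass

import numpy as np
import xarray as xr

@dataclass
class CapacityBounds:
    """Sketch: lower/upper bound arrays plus an investment flag,
    as returned by a helper like extract_capacity_bounds."""
    lower: xr.DataArray
    upper: xr.DataArray
    has_investment: bool

bounds = CapacityBounds(
    lower=xr.DataArray(np.zeros(3), dims=['time']),
    upper=xr.DataArray(np.full(3, 100.0), dims=['time']),
    has_investment=False,
)
print(bounds.has_investment, float(bounds.upper.max()))  # False 100.0
```

A dataclass like this makes the three-part return value self-documenting compared to returning a bare tuple.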
**66-77: LGTM! Proper handling of DataArray and scalar bounds.** The logic correctly handles both cases:
- DataArray: expands dimensions and transposes to maintain consistent dimension order
- Scalar: converts to float (handling numpy scalars) and creates a filled array
The fallback to 1e9 at line 64 appears to be defensive coding, as unbounded storage is handled separately in `Storage._absolute_charge_state_bounds`.
**81-106: LGTM! Clean coordinate construction.** The function properly builds coordinates for SOC_boundary variables, starting with cluster_boundary and optionally adding period and scenario dimensions based on FlowSystem attributes.
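The described coordinate construction can be sketched as below; the signature and dict layout are illustrative assumptions, not the actual `build_boundary_coords` helper:

```python
import numpy as np

def build_boundary_coords(n_boundaries, periods=None, scenarios=None):
    """Build coordinates for SOC_boundary variables: always a
    cluster_boundary axis, plus optional period/scenario axes."""
    coords = {'cluster_boundary': np.arange(n_boundaries)}
    if periods is not None:
        coords['period'] = periods
    if scenarios is not None:
        coords['scenario'] = scenarios
    return coords

coords = build_boundary_coords(7, periods=[2030, 2040])
print(sorted(coords))  # ['cluster_boundary', 'period']
```

Note there is one more boundary than there are original periods, since each period contributes a start boundary and the final period also needs an end boundary.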
Summary by CodeRabbit
Release Notes (v5.1.0 Upcoming)
New Features
- `transform.cluster()` and `transform.expand_solution()` methods for timestep reduction.
- `TimeSeriesWeights` with temporal, period, and scenario support.

Breaking Changes
- Use `transform.cluster()` instead.
- Renamed `hours_per_step` to `timestep_duration`.

Documentation