Releases: neo4j/graph-data-science
2.3.4
Graph Data Science 2.3.3
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.7.0. Please see our compatibility matrix above.
- Added includeGraphs parameter to gds.alpha.backup to allow backups without graphs.
Bug fixes
- Multiclass node classification compatible with non-consecutive class ids
- RandomWalk stable on multiple runs (user contribution by github user hindog)
Improvements
- Make gds.alpha.restore more failsafe
  - Continue to restore graphs and models also after the first failure for a user.
  - Improve logging around failures
Full Changelog: 2.3.2...2.3.3
Graph Data Science 2.3.2
GDS 2.3.2 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9).
For GDS compatibility with previous releases, please use GDS Compatibility Table.
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.6.0. Please see our compatibility matrix above.
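The bounds stated for GDS 2.3.2 (Neo4j 4.4 from 4.4.9 onwards, and Neo4j 5.x up to 5.6.0) can be written as a small version check. This is a minimal sketch and not part of GDS; the function names and the tuple-based comparison are assumptions made for illustration.

```python
def parse_version(text):
    """Turn a version string such as '5.6.0' into a tuple of ints (5, 6, 0)."""
    return tuple(int(part) for part in text.split("."))


def is_compatible_with_gds_2_3_2(neo4j_version):
    """Hypothetical helper encoding the bounds stated in the 2.3.2 notes."""
    version = parse_version(neo4j_version)
    if version[:2] == (4, 4):
        # Neo4j 4.4 is supported from 4.4.9 onwards.
        return version >= (4, 4, 9)
    if version[0] == 5:
        # Neo4j 5.x is supported up to and including 5.6.0.
        return version <= (5, 6, 0)
    return False
```

The compatibility matrix remains the authoritative source; the sketch only restates the two bounds quoted in this entry.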
Bug fixes
- Graphs imported via Arrow no longer cause invalid node mappings that produced ArrayIndexOutOfBoundsExceptions
- Correct memory estimation of Leiden for very small graphs
- KNN no longer results in an AIOOB exception if the array node properties did not exist for some nodes
- CELF no longer returns negative gains for some nodes
- GraphSage will no longer return NaN values because of incorrect neighbor sampling
Improvements
- More accurate memory estimation on Node Similarity and filtered Node Similarity algorithms for high topN or topK values.
- The gds.alpha.modularity procedures for computing modularity no longer require each community to be smaller than the size of the graph.
- Improve the progress logging of gds.graph.project.cypher to be more accurate. In particular, this avoids underestimating when the relationship query is more complex.
Graph Data Science 2.3.1
GDS 2.3.1 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9) & 4.3 versions (≥ 4.3.15) Database.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.5.0. Please see our compatibility matrix above.
Log Progress
- New optional configuration parameter logProgress allows you to specify whether percentage logging for that procedure call is on or off.
Bug fixes
- Louvain no longer reports an incorrect modularity
- Leiden: communities on weighted graphs are now reported correctly
- Persisted Models no longer cause false positive error logs when loaded into the Model Catalog
- Yens on graphs without parallel relationships would cause issues
Improvements
- Filtered Node Similarity progress logging has been improved
Graph Data Science 2.3.0
GDS 2.3.0 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9) & 4.3 versions (≥ 4.3.15) Database.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Breaking changes
- Leiden was promoted to the beta tier. It is now called via the gds.beta.leiden command instead of the gds.alpha.leiden command.
- K-means was promoted to the beta tier. It is now called via the gds.beta.kmeans command instead of the gds.alpha.kmeans command.
- Minimum weighted spanning tree algorithm was promoted to the beta tier. It is now called via the gds.beta.spanningTree command instead of gds.alpha.spanningTree
  - The procedures gds.alpha.spanningTree.minimum and gds.alpha.spanningTree.maximum have been removed. You can get the same behaviour by specifying the new parameter objective in gds.beta.spanningTree.
  - The weightWriteProperty has been removed as a configuration parameter. To supply the Relationship Type and Property for the produced relationship, use:
    - mutateRelationshipType
    - mutateProperty
  - gds.alpha.spanningTree.kmin and gds.alpha.spanningTree.kmax have been removed as the K-Spanning Tree algorithm has been moved into its own space gds.alpha.kSpanningTree
  - The parameter startNodeId in all Spanning Tree algorithms has been replaced with sourceNode.
- Arrow: when projecting graphs, null will be translated to NaN for floating point values. This enables users of either the GDS Python Client or PyArrow to load NaN properties stored in Pandas DataFrames
- Cypher Aggregations will become the primary surface for creating projections with Cypher. They offer a more intuitive and expressive interface than Cypher Projections and can also be used in Fabric or Composite Database setups.
- The algorithm gds.alpha.influenceMaximization.greedy has been removed. Its replacement is the gds.beta.influenceMaximization.celf algorithm which has the same configuration parameters and offers better performance.
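For callers of the removed spanning tree procedures, the renames listed in these breaking changes amount to a mechanical rewrite of procedure name and configuration. The sketch below is illustrative only: migrate_spanning_tree_call is a hypothetical helper, the mode suffix (such as .write) is just an example, and the objective values 'minimum' and 'maximum' are assumed to mirror the removed procedure names, so the accepted values should be checked against the GDS documentation.

```python
# Removed alpha procedure prefixes mapped to the assumed `objective` value.
REMOVED_PREFIXES = {
    "gds.alpha.spanningTree.minimum": "minimum",
    "gds.alpha.spanningTree.maximum": "maximum",
}


def migrate_spanning_tree_call(procedure, config):
    """Rewrite a pre-2.3.0 spanning tree call to the gds.beta.spanningTree form.

    Returns a (procedure, config) pair; the input config is not modified.
    """
    new_config = dict(config)
    # startNodeId was replaced with sourceNode in all Spanning Tree algorithms.
    if "startNodeId" in new_config:
        new_config["sourceNode"] = new_config.pop("startNodeId")
    for prefix, objective in REMOVED_PREFIXES.items():
        if procedure == prefix or procedure.startswith(prefix + "."):
            suffix = procedure[len(prefix):]  # e.g. ".write", or "" if absent
            new_config["objective"] = objective
            return "gds.beta.spanningTree" + suffix, new_config
    # Not one of the removed procedures: keep the name unchanged.
    return procedure, new_config
```

The helper deliberately leaves weightWriteProperty and the kmin/kmax procedures alone, since their replacements depend on the execution mode and are not a one-to-one rename.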
New features
Neo4j Database Compatibility
- This release is compatible with all Neo4j 5.x database versions <= 5.4.0. Please see the compatibility matrix above.
Minimum Directed Steiner Tree
- Added heuristic for minimum directed Steiner Tree under the gds.beta.steinerTree domain.
  - Added stats mode with gds.beta.steinerTree.stats
  - Added stream mode with gds.beta.steinerTree.stream
  - Added mutate mode with gds.beta.steinerTree.mutate
  - Added write mode with gds.beta.steinerTree.write
  - Now available in progress tracking - gds.list.progress()
Leiden
- New parameter consecutiveIds that assigns consecutive ids for the discovered communities.
- New parameter seedProperty to seed initial communities for nodes.
- New parameter tolerance to enable convergence criteria based on differences in modularity from one iteration to another.
- Now available in progress tracking - gds.list.progress()
- Added memory estimation mode:
  - gds.beta.leiden.mutate.estimate
  - gds.beta.leiden.stats.estimate
  - gds.beta.leiden.stream.estimate
  - gds.beta.leiden.write.estimate
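To make the effect of consecutiveIds concrete, the sketch below relabels arbitrary community ids as 0, 1, 2, and so on. It is a plain Python illustration of the idea, not the GDS implementation; the function name and the choice to number communities in order of first appearance are assumptions.

```python
def to_consecutive_ids(community_ids):
    """Relabel community ids as 0..k-1, numbered in order of first appearance."""
    mapping = {}
    relabeled = []
    for community in community_ids:
        if community not in mapping:
            # First time this community is seen: give it the next free id.
            mapping[community] = len(mapping)
        relabeled.append(mapping[community])
    return relabeled
```

Nodes that shared a community before relabeling still share one afterwards; only the id values change.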
Logistic Regression & MLP
- New configuration parameters classWeights and focusWeight for training methods, supported by procedures:
  - gds.beta.pipeline.nodeClassification.addLogisticRegression
  - gds.beta.pipeline.nodeClassification.addMLP
  - gds.beta.pipeline.linkPrediction.addLogisticRegression
  - gds.beta.pipeline.linkPrediction.addMLP
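As a rough illustration of how class weights and a focus exponent typically enter a classification loss, the sketch below computes a class-weighted focal loss for a single example. It assumes that classWeights holds one weight per class and that focusWeight acts as the focal-loss exponent, with 0.0 reducing to weighted cross entropy; these notes do not specify how GDS applies the parameters internally, so treat this as a sketch under those assumptions.

```python
import math


def weighted_focal_loss(probabilities, true_class, class_weights, focus_weight=0.0):
    """Loss for one example: -w_c * (1 - p_c) ** focus_weight * log(p_c).

    probabilities[c] is the predicted probability of class c and must be > 0
    for the true class.
    """
    p = probabilities[true_class]
    weight = class_weights[true_class]
    return -weight * (1.0 - p) ** focus_weight * math.log(p)
```

With a positive focus exponent, examples the model already classifies confidently contribute less to the loss, which shifts training effort towards hard examples.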
HashGNN
- New algorithm gds.alpha.hashgnn.{mutate,stream} to create HashGNN node embeddings
- New estimation procedures gds.alpha.hashgnn.{mutate,stream}.estimate to estimate the memory required to run HashGNN
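The brace notation in these entries is shorthand for several procedure names. A minimal sketch of expanding it, assuming at most one {a,b,...} group per name; expand_modes is a hypothetical helper and not part of GDS:

```python
import re


def expand_modes(pattern):
    """Expand one '{a,b,...}' group in a procedure name into the full names."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group: the pattern already is a single procedure name.
        return [pattern]
    options = [option.strip() for option in match.group(1).split(",")]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [prefix + option + suffix for option in options]
```

For example, expand_modes("gds.alpha.hashgnn.{mutate,stream}.estimate") yields the two estimation procedure names, one per mode.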
Link Prediction
- Added new optional configuration parameter negativeRelationshipType to gds.beta.pipeline.linkPrediction.configureSplit
Spanning Tree
- New modes supported: gds.beta.spanningTree.{stats, stream, mutate}
- New yield outputs for gds.beta.spanningTree:
  - the sum of weights in the discovered spanning tree.
  - the number of relationships written or added for write and mutate mode respectively.
- Added memory estimation mode:
  - gds.beta.spanningTree.stream.estimate
  - gds.beta.spanningTree.mutate.estimate
  - gds.beta.spanningTree.stats.estimate
  - gds.beta.spanningTree.write.estimate
Write Labels
- gds.alpha.graph.nodeLabel.mutate allows for the Graph Projection to be mutated with new labels
- gds.alpha.graph.nodeLabel.write allows for Node Labels to be written back from projections to a Neo4j Database
Graph Projections
- Arrow now supports specifying undirected relationship types using the undirected_relationship_types configuration argument
- Cypher Aggregations (gds.alpha.graph.project) now support specifying undirected relationship types using the undirectedRelationshipTypes configuration option
- New procedure to turn directed relationships into undirected relationships: gds.beta.graph.relationships.toUndirected
- Projections created using any of the Native, Arrow and Cypher Aggregation APIs can now be "inverse indexed"; this will enable more efficient algorithm implementations
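Conceptually, turning directed relationships into undirected ones means that each relationship becomes traversable from both endpoints. The sketch below shows this on a plain adjacency map. It illustrates the idea only and makes no claim about how gds.beta.graph.relationships.toUndirected is implemented; for instance, it collapses parallel relationships, which GDS may treat differently.

```python
from collections import defaultdict


def to_undirected(directed_relationships):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in directed_relationships:
        # Record the relationship in both directions.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return {node: sorted(neighbors) for node, neighbors in adjacency.items()}
```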
Administration
- Added the jobId and username to the ongoingGdsProcedures return field of gds.alpha.systemMonitor.
- Added username as a new return field to gds.beta.listProgress.
- Added a new return field to gds.graph.list called schemaWithOrientation which also includes the orientation.
- Administrators can now see all running tasks from all users with gds.beta.listProgress
Bug fixes
- Minimum Weighted Spanning Tree: Graphs with parallel edges could make the discovered tree have wrong weights on relationships
- Cypher Aggregations: When using gds.alpha.graph.project:
  - The projected graph would list relationship types with zero relationships
  - AIOOB exceptions could surface due to sizing errors
- Arrow: CREATE_DATABASE action would throw a NullPointerException if ID fields were missing in the Arrow record. A more descriptive exception is now provided
- gds.graph.list could cause issues on some JDKs when calculating the memory usage of Projections
- Export relationship progress logging (gds.beta.graph.export.csv) reports the correct progress
- Graphs constructed with Cypher Aggregation using arbitrary IDs are now blocked from write procedures
- The k-Spanning Tree algorithm no longer returns disconnected partitions
- Multi-threading bug when creating projections via Cypher Aggregation or Arrow could lead to lost labels
- Node label filtering could lead to streamed node properties being null when filters are applied
- Cypher projections and Cypher aggregation would throw the wrong error message when loading an invalid relationship
- Node label filtering could lead to wrong results. This also affected: gds.beta.graphSage and gds.beta.graph.relationships.stream
Improvements
Arrow
- graph import now fully supports external node ids in the 64 Bit space.
- graph import now supports 16, 32 or 64 Bit node identifiers.
- Arrow server will now check user RBAC permissions for creating and accessing databases
- Database import now creates a Relationship Type index
Leiden
- Better parallelization and improved overall performance
WCC
- Now supports a new and faster sampling strategy (undirected and directed graphs) by using the new inverse index.
Machine Learning
- Inner components of the pipeline field returned by the gds.pipeline.{ linkPrediction | nodeClassification | nodeRegression }.train procedures are now present directly as part of modelInfo. The pipeline field is now deprecated for removal in a future version.
Other Improvements
- Speed improvements for Dijkstra, Astar, Yens, CELF, weighted Betweenness Centrality, and the Spanning Tree algorithms. These improvements come with a slight increase in the memory consumption of these algorithms.
- Improved error message for invalid node labels and relationship types
- Pregel now supports bidirectional computations (allows for messages to be sent along incoming relationships) using the new inverse index.
- The procedure gds.graph.export now creates a Relationship Type index
- Extended node property validation to reject projection configuration mappings with the same property keys, but different default values.
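One way to picture the extended node property validation: if two node projections declare the same property key, their default values have to agree. The sketch below is a hypothetical stand-alone check over a label-to-properties mapping; the data layout and the function name are assumptions and do not reflect GDS internals.

```python
def validate_property_defaults(node_projections):
    """Raise ValueError if a property key is declared with conflicting defaults.

    node_projections maps a node label to {property_key: default_value}.
    """
    seen = {}  # property_key -> (label, default_value) of the first declaration
    for label, properties in node_projections.items():
        for key, default in properties.items():
            if key in seen and seen[key][1] != default:
                first_label, first_default = seen[key]
                raise ValueError(
                    f"Property '{key}' has default {first_default!r} on "
                    f"{first_label} but {default!r} on {label}"
                )
            seen.setdefault(key, (label, default))
```

A mapping such as {"Person": {"age": 0}, "Robot": {"age": -1}} would be rejected, while identical defaults for the shared key pass.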
Other changes
- Histograms returned such as degreeDistribution in gds.graph.list can have slightly different values for specific percentiles due to changes in floating point operations.
- Progress tracking in the Spanning Tree algorithm has been reworked. Progress reporting may differ from earlier versions.
- Mark the yielded field schema as deprecated in gds.graph.list and gds.graph.drop. In the next major release, the schema field will use the semantics of schemaWithOrientation
- In gds.alpha.model.store, the positional argument failIfUnsupportedType is renamed to failIfUnsupported. Both will be supported until it is promoted to the beta tier.
- Progress tracking for Betweenness Centrality has been reworked. Progress reporting may differ from earlier versions.
Pre-release changes
- The Steiner Tree procedures in gds.beta.SteinerTree were originally introduced as gds.alpha.SteinerTree. The update in naming occurred in 2.3.0-alpha04.
Graph Data Science 2.2.7
GDS 2.2.7 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9) & 4.3 versions (≥ 4.3.15) Database.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
New features
- Added compatibility for Neo4j database 5.4.0.
Bug fixes
- Missing id fields in the Arrow records for the CREATE_DATABASE action would throw a NullPointerException. It now throws a more descriptive exception instead.
- Graphs with long node or relationship property names would fail during the restore process.
- Yens algorithm would ignore edges in multigraphs and yield incorrect results.
- Multi-threading bug when creating projections via Cypher Aggregation or Arrow could lead to lost labels.
- Node label filtering could lead to streamed node properties being null when filters are applied.
- Cypher projections and Cypher aggregation would throw the wrong error message when loading an invalid relationship.
- Node label filtering could lead to wrong results. This also affected: gds.beta.graphSage and gds.beta.graph.relationships.stream.
Graph Data Science 2.3.0-alpha04
GDS 2.3.0-alpha04 is compatible with Neo4j 5 & 4.4 versions (≥ 4.4.9) & 4.3 versions (≥ 4.3.15) Database.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Breaking changes
- Leiden was promoted to the beta tier. It is now called via the gds.beta.leiden command instead of the gds.alpha.leiden command.
- K-means was promoted to the beta tier. It is now called via the gds.beta.kmeans command instead of the gds.alpha.kmeans command.
- Minimum weighted spanning tree algorithm was promoted to the beta tier. It is now called via the gds.beta.spanningTree command instead of gds.alpha.spanningTree
  - The procedures gds.alpha.spanningTree.minimum and gds.alpha.spanningTree.maximum have been removed. You can get the same behaviour by specifying the new parameter objective in gds.beta.spanningTree.
  - The weightWriteProperty has been removed as a configuration parameter. To supply the Relationship Type and Property for the produced relationship, use:
    - mutateRelationshipType
    - mutateProperty
  - gds.alpha.spanningTree.kmin and gds.alpha.spanningTree.kmax have been removed as the K-Spanning Tree algorithm has been moved into its own space gds.alpha.kSpanningTree
  - The parameter startNodeId in all Spanning Tree algorithms has been replaced with sourceNode.
- Arrow: when projecting graphs, null will be translated to NaN for floating point values. This enables users of either the GDS Python Client or PyArrow to load NaN properties stored in Pandas DataFrames
- Cypher Aggregations will become the primary surface for creating projections with Cypher. They offer a more intuitive and expressive interface than Cypher Projections and can also be used in Fabric or Composite Database setups.
- The algorithm gds.alpha.influenceMaximization.greedy has been removed. Its replacement is the already existing gds.beta.influenceMaximization.celf algorithm which has the same configuration parameters and offers better performance.
New features
Minimum Directed Steiner Tree
- Added heuristic for minimum directed Steiner Tree under the gds.beta.steinerTree domain.
  - Added stats mode with gds.beta.steinerTree.stats
  - Added stream mode with gds.beta.steinerTree.stream
  - Added mutate mode with gds.beta.steinerTree.mutate
  - Added write mode with gds.beta.steinerTree.write
  - Now available in progress tracking - gds.list.progress()
Leiden
- New parameter consecutiveIds that assigns consecutive ids for the discovered communities.
- New parameter seedProperty to seed initial communities for nodes.
- New parameter tolerance to enable convergence criteria based on difference in modularity from one iteration to another.
- Now available in progress tracking - gds.list.progress()
- Added memory estimation mode:
  - gds.beta.leiden.mutate.estimate
  - gds.beta.leiden.stats.estimate
  - gds.beta.leiden.stream.estimate
  - gds.beta.leiden.write.estimate
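A tolerance-based stopping rule of the kind described for the tolerance parameter can be sketched as follows: iteration stops once the gain in modularity between two consecutive iterations no longer exceeds the tolerance. The code only illustrates the criterion on a precomputed list of modularity values; it is not the Leiden implementation in GDS, and the function name is made up for this example.

```python
def iterations_until_converged(modularities, tolerance):
    """Return how many iterations run before the modularity gain is <= tolerance.

    modularities[i] is the modularity measured after iteration i + 1.
    """
    for i in range(1, len(modularities)):
        gain = modularities[i] - modularities[i - 1]
        if gain <= tolerance:
            # Iteration i + 1 did not improve enough, so stop after it.
            return i + 1
    # The gain never dropped to the tolerance: all iterations were used.
    return len(modularities)
```

A larger tolerance stops earlier at the cost of a possibly lower final modularity; a very small one lets the algorithm run until its iteration limit.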
Logistic Regression & MLP
- New configuration parameters classWeights and focusWeight for training methods, supported by procedures:
  - gds.beta.pipeline.nodeClassification.addLogisticRegression
  - gds.beta.pipeline.nodeClassification.addMLP
  - gds.beta.pipeline.linkPrediction.addLogisticRegression
  - gds.beta.pipeline.linkPrediction.addMLP
HashGNN
- New algorithm gds.alpha.hashgnn.{mutate,stream} to create HashGNN node embeddings
- New procedures gds.alpha.hashgnn.{mutate,stream}.estimate to estimate the memory required to run HashGNN
Link Prediction
- Added new optional configuration parameter negativeRelationshipType to gds.beta.pipeline.linkPrediction.configureSplit
Spanning Tree
- New modes supported: gds.beta.spanningTree.(stats, stream, mutate)
- New yield output for gds.beta.spanningTree that outputs the sum of weights in the discovered spanning tree.
- New yield output for gds.beta.spanningTree that outputs the number of relationships written or added for write and mutate mode respectively.
- Added memory estimation mode:
  - gds.beta.spanningTree.stream.estimate
  - gds.beta.spanningTree.mutate.estimate
  - gds.beta.spanningTree.stats.estimate
  - gds.beta.spanningTree.write.estimate
Write Labels
- Added gds.alpha.graph.nodeLabel.write to allow for Node Labels to be written back from projections to a Neo4j Database
Graph Projections
- Arrow now supports specifying undirected relationship types using the undirected_relationship_types configuration argument
- Cypher Aggregations (gds.alpha.graph.project) now support specifying undirected relationship types using the undirectedRelationshipTypes configuration option
- New procedure to turn directed relationships into undirected relationships: gds.beta.graph.relationships.toUndirect
Administration
- Added the jobId and username to the ongoingGdsProcedures return field of gds.alpha.systemMonitor.
- Added username as a new return field to gds.beta.listProgress.
- Added a new return field to gds.graph.list called schemaWithOrientation which also includes the orientation.
- Administrators can now see all running tasks from all users with gds.beta.listProgress
Bug fixes
- Minimum Weighted Spanning Tree: Graphs with parallel edges could make the discovered tree have wrong weights on relationships
- Cypher Aggregations: When using gds.alpha.graph.project:
  - The projected graph would list relationship types with zero relationships
  - AIOOB exceptions could surface due to sizing errors
- Arrow: CREATE_DATABASE action would throw an NPE if id fields were missing in the Arrow record. A more descriptive exception is now provided
Improvements
Arrow
- graph import now fully supports external node ids in the 64 Bit space.
- graph import now supports 16, 32 or 64 Bit node identifiers.
Leiden
- Better parallelization and improved overall performance
Other Improvements
- Speed improvements for Dijkstra, Astar, Yens, CELF, weighted Betweenness Centrality, and the Spanning Tree algorithms. These improvements come with a slight increase in the memory consumption of these algorithms.
- Improved error message for invalid node labels and relationship types
Other changes
- Histograms returned such as degreeDistribution in gds.graph.list can have slightly different values for specific percentiles due to changes in floating point operations.
- Progress tracking in the Spanning Tree algorithm has been reworked. Progress reporting may differ from earlier versions.
- Mark the yielded field schema as deprecated in gds.graph.list and gds.graph.drop. In the next major release, the schema field will use the semantics of schemaWithOrientation
- In gds.alpha.model.store, the positional argument failIfUnsupportedType is renamed to failIfUnsupported. Both will be supported until it is promoted to the beta tier.
- Progress tracking for Betweenness Centrality has been reworked. Progress reporting may differ from earlier versions.
Graph Data Science 2.2.6
Neo4j Graph Data Science 2.2.6 is compatible with Neo4j Database 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For Neo4j Graph Data Science compatibility, please use the Neo4j Compatibility Matrix.
Improvements
- Added support for Neo4j Database 5.3
Graph Data Science 2.3.0-alpha03
GDS 2.3.0-alpha03 is compatible with Neo4j 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Breaking changes
- Leiden was promoted to the beta tier. It is now called via the gds.beta.leiden command instead of the gds.alpha.leiden command.
- K-means was promoted to the beta tier. It is now called via the gds.beta.kmeans command instead of the gds.alpha.kmeans command.
- The parameter startNodeId in Spanning Tree algorithms has been replaced with sourceNode.
- The minimum weighted spanning tree algorithm was moved to beta. It is now called via the gds.beta.spanningTree command instead of gds.alpha.spanningTree
  - The procedures gds.alpha.spanningTree.minimum and gds.alpha.spanningTree.maximum have been removed. You can get the same behavior by specifying the new parameter objective in gds.beta.spanningTree.
New features
Minimum Directed Steiner Tree
- Added heuristic for minimum directed Steiner Tree under the gds.alpha.steinerTree domain.
  - Added stats mode with gds.alpha.steinerTree.stats
  - Added stream mode with gds.alpha.steinerTree.stream
  - Added mutate mode with gds.alpha.steinerTree.mutate
  - Added write mode with gds.alpha.steinerTree.write
Leiden
- New parameter consecutiveIds that assigns consecutive ids for the discovered communities.
- New parameter seedProperty to seed initial communities for nodes.
- New parameter tolerance to enable convergence criteria based on difference in modularity from one iteration to another.
- Now available in progress tracking - gds.list.progress()
- Added memory estimation mode:
  - gds.beta.leiden.mutate.estimate
  - gds.beta.leiden.stats.estimate
  - gds.beta.leiden.stream.estimate
  - gds.beta.leiden.write.estimate
Logistic Regression & MLP
- New configuration parameters classWeights and focusWeight for training methods, supported by procedures:
  - gds.beta.pipeline.nodeClassification.addLogisticRegression
  - gds.beta.pipeline.nodeClassification.addMLP
  - gds.beta.pipeline.linkPrediction.addLogisticRegression
  - gds.beta.pipeline.linkPrediction.addMLP
HashGNN
- New algorithm gds.alpha.hashgnn.{mutate,stream} to create HashGNN node embeddings
- New procedures gds.alpha.hashgnn.{mutate,stream}.estimate to estimate the memory required to run HashGNN
Link Prediction
- Added new optional configuration parameter negativeRelationshipType to gds.beta.pipeline.linkPrediction.configureSplit
Spanning Tree
- New modes supported: gds.alpha.spanningTree.(stats, stream, mutate)
- New yield output for gds.alpha.spanningTree that outputs the sum of weights in the discovered spanning tree.
- New yield output for gds.alpha.spanningTree that outputs the number of relationships written or added for write and mutate mode respectively.
- Added memory estimation mode:
  - gds.alpha.spanningTree.stream.estimate
  - gds.alpha.spanningTree.mutate.estimate
  - gds.alpha.spanningTree.stats.estimate
  - gds.alpha.spanningTree.write.estimate
Write Labels
- Added gds.alpha.graph.nodeLabel.write to allow for Node Labels to be written back from projections to a Neo4j Database
Administration
- Added the jobId and username to the ongoingGdsProcedures return field of gds.alpha.systemMonitor.
- Added username as a new return field to gds.beta.listProgress.
- Added a new return field to gds.graph.list called schemaWithOrientation which also includes the orientation.
Bug fixes
- Fixed a bug in Minimum Weighted Spanning Tree on graphs with parallel edges where the discovered tree could have wrong weights.
Improvements
Arrow
- graph import now fully supports external node ids in the 64 Bit space.
- graph import now supports 16, 32 or 64 Bit node identifiers.
Leiden
- Better parallelization and improved overall performance
Other Algorithms
- Speed improvements for Dijkstra, Astar, Yens, CELF, weighted Betweenness Centrality, and the Spanning Tree algorithms. These improvements come with a slight increase in the memory consumption of these algorithms.
Other changes
- Histograms returned such as degreeDistribution in gds.graph.list can have slightly different values for specific percentiles due to changes in floating point operations.
- Progress tracking in the Spanning Tree algorithm has been reworked. Progress reporting may differ from earlier versions.
- Mark the yielded field schema as deprecated in gds.graph.list and gds.graph.drop. In the next major release, the schema field will use the semantics of schemaWithOrientation
Graph Data Science 2.2.5
Neo4j Graph Data Science 2.2.5 is compatible with Neo4j Database 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For Neo4j Graph Data Science compatibility, please use the Neo4j Compatibility Matrix.
Bug Fixes
- Some functions would not work as expected with Neo4j 5.x versions:
  - gds.alpha.linkprediction.adamicAdar
  - gds.alpha.linkprediction.commonNeighbors
  - gds.alpha.linkprediction.resourceAllocation
  - gds.alpha.linkprediction.totalNeighbors