Releases: neo4j/graph-data-science
Graph Data Science 2.5.1
New features
- Add support for Neo4j 5.13
Graph Data Science 2.5.0
Breaking changes
- Dropped support for earlier versions of Neo4j 5: versions 5.1, 5.2, 5.3, 5.4, and 5.5 are no longer supported, and GDS is no longer compatible with them.
New features
Major
- Added new algorithms for directed acyclic graphs:
  - `gds.dag.topologicalSort.stream`
  - `gds.dag.longestPath.stream`
- Deprecating the `alpha` and `beta` namespaces for procedures and algorithms, and promoting many of them to production grade; see the full list of promoted procedures below.
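For readers unfamiliar with the two DAG algorithms named above, here is a minimal, library-independent Python sketch of what a topological sort and a longest-path computation produce on a small directed acyclic graph. It does not call GDS and is not a description of how GDS implements these procedures; the function names and the `(source, target, weight)` edge-list format are assumptions made for illustration.

```python
from collections import defaultdict, deque


def topological_sort(edges):
    """Order nodes so that every edge (u, v, w) has u before v (Kahn's algorithm)."""
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        out_edges[u].append((v, w))
        in_degree[v] += 1
        nodes.update((u, v))

    # Start from nodes without incoming edges; sorted only to make the output deterministic.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in out_edges[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def longest_path_lengths(edges):
    """Longest weighted distance to each node from any source node of the DAG."""
    order = topological_sort(edges)
    out_edges = defaultdict(list)
    for u, v, w in edges:
        out_edges[u].append((v, w))
    dist = {n: 0.0 for n in order}
    # Relax edges in topological order so each node is final before it is expanded.
    for u in order:
        for v, w in out_edges[u]:
            dist[v] = max(dist[v], dist[u] + w)
    return dist


# Hypothetical example graph.
edges = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "d", 4.0), ("c", "d", 1.0)]
order = topological_sort(edges)
dist = longest_path_lengths(edges)
```

The longest-path pass reuses the topological order, which is why the two algorithms are commonly offered together for DAGs.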
Minor
- Added a procedure to retrieve the version of the installed GDS: `CALL gds.version`
- Added a new procedure, `gds.license.state`, to verify the license state of the Graph Data Science library. Analogously, added a new function, `gds.isLicensed()`.
- Added memory estimation for modularity calculation via procedures `gds.modularity.[stream|stats].estimate`
- Added memory estimation for filtered KNN via procedures `gds.knn.filtered.[mutate|stream|stats|write].estimate`
- Added Stats and Write modes for Harmonic Closeness Centrality
- Added new procedures for SCC:
  - `gds.scc.mutate`
  - `gds.scc.stats`
- Added memory estimation to SCC:
  - `gds.scc.stream.estimate`
  - `gds.scc.stats.estimate`
  - `gds.scc.mutate.estimate`
  - `gds.scc.write.estimate`
- Added `consecutiveIds` parameter to `gds.scc` procedures to output the components in a consecutive id space.
- Added memory estimation for Steiner Tree via procedures `gds.steinerTree.[mutate|stream|stats|write].estimate`
- Added stats mode for `gds.modularityOptimization`
Bug fixes
- Fixed a bug in logging the progress of `Prepare Batches` in GraphSAGE training.
- Fixed a bug where KNN would compute incorrect `EUCLIDEAN` similarity.
- Fixed a bug where limits validation might not be triggered for configuration settings passed in from specified defaults.
- Fixed a bug where `gds.graph.filter` would list a relationship type of `__ALL__` even if all relationships were filtered out.
- Fixed a bug where Triangle Count could compute an incorrect number of triangles when the `maxDegree` parameter was specified.
- Fixed a bug where Triangle Count could compute an incorrect number of triangles when multiple relationship types were specified.
Improvements
- The random graph generation procedure now returns a different graph each time `gds.beta.graph.generate` is called without specifying a random seed. Furthermore, when the seed is specified, the resulting graph will always have the same topology.
- It is now possible to specify common node labels when importing nodes via Arrow.
- A better error message is thrown when encountering null values in the `nodeLabels` column when importing nodes via Arrow.
- Added the configuration option `listNodeLabels` for the node property stream procedures that will trigger listing all node labels for the respective node.
- Added the configuration option `list_node_labels` for the node property stream Arrow endpoints that will trigger listing all node labels for the respective node.
- The Cypher projection now returns the executing query as part of the projection result as well as part of the `gds.graph.list` output.
- Support passing `startNodes` to `gds.graph.sample.cnarw` as node objects instead of only node ids.
- Support passing `nodeId` to `gds.util.nodeProperty` as a node object instead of only a node id.
- Improved validation for relationship projections: If a global `SUM`, `MIN`, `MAX` or `COUNT` aggregation is defined, there needs to be at least one property mapping.
- HITS algorithm procedures have a default `hitsIterations` value of 20
- More accurate progress tracking for the `gds.scc` algorithm.
- The `componentDistribution` and `communityDistribution` parameters now also include the `p1`, `p5`, `p10`, `p25` percentiles. This affects algorithms in the `Community Detection` category.
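To illustrate what the additional low percentiles in `componentDistribution` and `communityDistribution` describe, here is a small Python sketch that derives them from a node-to-community assignment. It is an illustration only: the nearest-rank percentile definition, the helper names, and the `min`/`max` keys are assumptions, and GDS may compute its percentiles differently.

```python
from collections import Counter


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def community_distribution(community_ids):
    """Summarise community sizes, including the low percentiles p1, p5, p10 and p25."""
    sizes = sorted(Counter(community_ids).values())
    dist = {"min": sizes[0], "max": sizes[-1]}
    for p in (1, 5, 10, 25):
        dist[f"p{p}"] = nearest_rank_percentile(sizes, p)
    return dist


# Hypothetical assignment: community c (1..20) contains exactly c nodes.
assignment = [c for c in range(1, 21) for _ in range(c)]
distribution = community_distribution(assignment)
```

Low percentiles such as `p1` and `p5` are useful for spotting how many very small communities or components a result contains, which the median and upper percentiles hide.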
Full list of procedures being promoted
- Promoting Model Catalog procedures:
  - `gds.beta.model.drop`, deprecated by `gds.model.drop`
    - Return column `shared` renamed to `published`
    - `modelName`, `modelType` extracted to separate return columns
  - `gds.beta.model.exists`, deprecated by `gds.model.exists`
  - `gds.beta.model.list`, deprecated by `gds.model.list`
    - Return column `shared` renamed to `published`
    - `modelName`, `modelType` extracted to separate return columns
  - `gds.alpha.model.delete`, deprecated by `gds.model.delete`
  - `gds.alpha.model.load`, deprecated by `gds.model.load`
  - `gds.alpha.model.publish`, deprecated by `gds.model.publish`
    - Return column `shared` renamed to `published`
    - `modelName`, `modelType` extracted to separate return columns
  - `gds.alpha.model.store`, deprecated by `gds.model.store`
- Promoting Pipeline Catalog procedures:
  - `gds.beta.pipeline.drop`, deprecated by `gds.pipeline.drop`
  - `gds.beta.pipeline.exists`, deprecated by `gds.pipeline.exists`
  - `gds.beta.pipeline.list`, deprecated by `gds.pipeline.list`
- Procedure `gds.alpha.systemMonitor` is deprecated by `gds.systemMonitor`
- Procedure `gds.beta.listProgress` is deprecated by `gds.listProgress`
- Procedure `gds.alpha.triangles` is deprecated by `gds.triangles`
- Deprecating `gds.beta.steinerTree` procedures
  - `gds.beta.steinerTree.mutate`, deprecated by `gds.steinerTree.mutate`
  - `gds.beta.steinerTree.stats`, deprecated by `gds.steinerTree.stats`
  - `gds.beta.steinerTree.stream`, deprecated by `gds.steinerTree.stream`
  - `gds.beta.steinerTree.write`, deprecated by `gds.steinerTree.write`
- Deprecating `gds.beta.spanningTree` procedures
  - `gds.beta.spanningTree.mutate[.estimate]`, deprecated by `gds.spanningTree.mutate[.estimate]`
  - `gds.beta.spanningTree.stats[.estimate]`, deprecated by `gds.spanningTree.stats[.estimate]`
  - `gds.beta.spanningTree.stream[.estimate]`, deprecated by `gds.spanningTree.stream[.estimate]`
  - `gds.beta.spanningTree.write[.estimate]`, deprecated by `gds.spanningTree.write[.estimate]`
- Deprecating `gds.alpha.maxkcut` procedures
  - `gds.alpha.maxkcut.mutate[.estimate]`, deprecated by `gds.maxkcut.mutate[.estimate]`
  - `gds.alpha.maxkcut.stream[.estimate]`, deprecated by `gds.maxkcut.stream[.estimate]`
- Deprecating `gds.beta.closeness` procedures
  - `gds.beta.closeness.mutate`, deprecated by `gds.closeness.mutate`
    - The `mutateProperty` field has been removed; it can be accessed via the `configuration`.
  - `gds.beta.closeness.stats`, deprecated by `gds.closeness.stats`
  - `gds.beta.closeness.stream`, deprecated by `gds.closeness.stream`
  - `gds.beta.closeness.write`, deprecated by `gds.closeness.write`
    - The `writeProperty` field has been removed; it can be accessed via the `configuration`.
- Deprecating `gds.beta.leiden` procedures
  - `gds.beta.leiden.mutate[.estimate]`, deprecated by `gds.leiden.mutate[.estimate]`
  - `gds.beta.leiden.stats[.estimate]`, deprecated by `gds.leiden.stats[.estimate]`
  - `gds.beta.leiden.stream[.estimate]`, deprecated by `gds.leiden.stream[.estimate]`
  - `gds.beta.leiden.write[.estimate]`, deprecated by `gds.leiden.write[.estimate]`
- Deprecating `gds.alpha.conductance` procedures
  - `gds.alpha.conductance.stream`, deprecated by `gds.conductance.stream`
- Deprecating `gds.alpha.modularity` procedures
  - `gds.alpha.modularity.stream`, deprecated by `gds.modularity.stream`
  - `gds.alpha.modularity.stats`, deprecated by `gds.modularity.stats`
- Deprecating `gds.beta.modularityOptimization` procedures
  - `gds.beta.modularityOptimization.stream[.estimate]`, deprecated by `gds.modularityOptimization.stream[.estimate]`
  - `gds.beta.modularityOptimization.stats[.estimate]`, deprecated by `gds.modularityOptimization.stats[.estimate]`
- Deprecating `gds.beta.influenceMaximization.celf` procedures
  - `gds.beta.influenceMaximization.celf.mutate[.estimate]`, deprecated by `gds.influenceMaximization.celf.mutate[.estimate]`
  - `gds.beta.influenceMaximization.celf.stats[.estimate]`, deprecated by `gds.influenceMaximization.celf.stats[.estimate]`
  - `gds.beta.influenceMaximization.celf.stream[.estimate]`, deprecated by `gds.influenceMaximization.celf.stream[.estimate]`
  - `gds.beta.influenceMaximization.celf.write[.estimate]`, deprecated by `gds.influenceMaximization.celf.write[.estimate]`
- Deprecating `gds.alpha.knn.filtered` procedures
  - `gds.alpha.knn.filtered.mutate`, deprecated by `gds.knn.filtered.mutate`
  - `gds.alpha.knn.filtered.stats`, deprecated by `gds.knn.filtered.stats`
  - `gds.alpha.knn.filtered.stream`, deprecated by `gds.knn.filtered.stream`
  - `gds.alpha.knn.filtered.write`, deprecated by `gds.knn.filtered.write`
- Deprecating `gds.alpha.nodeSimilarity.filtered` procedures
  - `gds.alpha.nodeSimilarity.filtered.mutate[.estimate]`, deprecated by `gds.nodeSimilarity.filtered.mutate[.estimate]`
  - `gds.alpha.nodeSimilarity.filtered.stats[.estimate]`, deprecated by `gds.nodeSimilarity.filtered.stats[.estimate]`
  - `gds.alpha.nodeSimilarity.filtered.stream[.estimate]`, deprecated by `gds.nodeSimilarity.filtered.stream[.estimate]`
  - `gds.alpha.nodeSimilarity.filtered.write[.estimate]`, deprecated by `gds.nodeSimilarity.filtered.write[.estimate]`
- Deprecating `gds.alpha.closeness.harmonic` procedures
  - `gds.alpha.closeness.harmonic.stream`, deprecated by `gds.closeness.harmonic.stream`
  - `gds.alpha.closeness.harmonic.write`, deprecated by `gds.closeness.harmonic.write`
- Deprecating `gds.beta.graph.relationships` procedures
  - `gds.beta.graph.relati...
Graph Data Science 2.4.6
neo4j-graph-data-science-2.4.6
New features
- Added compatibility with Neo4j database 5.12.0.
Bug fixes
- Fix a bug where HITS `write` and `mutate` procedures failed to parse configuration.
Graph Data Science 2.4.5
neo4j-graph-data-science-2.4.5
Bug fixes
- Fix a bug in the triangle-related procedures on graphs with multiple relationship types, where triangles could be computed incorrectly. The following procedures are affected:
  - `gds.triangleCount.[stream|mutate|write|stats]`
  - `gds.localClusteringCoefficient.[stream|mutate|write|stats]`
  - `gds.alpha.triangles`
Graph Data Science 2.4.4
Bug fixes
- Fixed a bug where Arrow processes that are automatically removed after being aborted were not properly cleaned up.
Graph Data Science 2.4.3
Improvements
- Added COSINE as an available similarityMetric for the gds.nodeSimilarity procedure
- When exporting graphs to CSV or using backup and restore, a more diverse node label naming is now possible by using label mapping
Bug fixes
- Fixed a bug where array default values would not be serialized or deserialized to csv correctly
- Fixed an issue where Speaker-Listener LabelPropagation and other Pregel procedures wouldn’t stream or mutate on graphs that are not persisted in a Neo4j database
- Fixed a bug in graph restore on AuraDS, which was failing after shutdown when node label name contained special characters or underscores
Graph Data Science 2.4.1
Bug fixes
- Fix a bug in K-Core decomposition that can return invalid values if core values are not consecutive.
- Fix a bug when using `mutateProperty` where using the same name as an existing node property could fail. Affected procedures include:
  - `gds.alpha.knn.filtered.mutate`
  - `gds.alpha.nodeSimilarity.filtered.mutate`
  - `gds.beta.pipeline.linkPrediction.predict.mutate`
  - `gds.beta.steinerTree.mutate`
  - `gds.beta.spanningTree.mutate`
  - `gds.knn.mutate`
  - `gds.nodeSimilarity.mutate`
Improvements
- Improved error handling when negative node ids are used as input in the `sourceNode`, `targetNode`, `sourceNodes`, and `targetNodes` fields.
- Improved performance when projecting larger in-memory graphs.
Graph Data Science 2.4.0
Breaking changes
- The `concurrency` setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of `4` unless overridden. This affects:
  - `gds.beta.pipeline.linkPrediction.train`
  - `gds.beta.pipeline.nodeClassification.train`
  - `gds.alpha.pipeline.nodeClassification.train`
New features
Major
- Added Bellman-Ford algorithm
- Added K-Core Decomposition algorithm
- Added new Common Neighbour Aware Random Walk graph sampling algorithm
- Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable
Minor
- You can rename node properties when writing them back to the Neo4j database using `gds.nodeProperties.write` by placing them inside a map in the form `nodeProperty: 'renamedProperty'`.
- Added `minCommunitySize|minComponentSize` parameter to more procedures to allow filtering the result. (Contributed by @airtyon)
- Added new procedure `gds.alpha.drop.cypherdb` to drop created in-memory databases
- Added `upperDegreeCutoff` parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.
- Added `aggregation` to `gds.beta.toUndirected` to allow the aggregation of the new undirected relationships.
- Added new optional parameter `storeModelToDisk` that automatically saves serializable models after training for licensed users. This affects `gds.beta.pipeline.[linkPrediction|nodeClassification].train` and `gds.beta.graphSage.train`.
- Added procedure `gds.graph.relationshipProperties.write` that allows writing relationships with multiple properties to Neo4j.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (`gds.graph.project.cypher`) is now called "Legacy Cypher projection"
  - The procedure name is losing the `alpha` qualifier and is now called `gds.graph.project`.
  - The old name `gds.alpha.graph.project` is deprecated and usages will forward to the new name while also adapting to the new API.
  - The 4th and 5th parameters `nodeConfig` and `relationshipConfig` have been merged into a single `dataConfig` parameter.
  - The `properties` configuration key in this merged `dataConfig` parameter has been renamed to `relationshipProperties`.
  - The overall projection configuration (e.g. `readConcurrency`) has moved from the 6th parameter to the 5th parameter.
- Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the `FlightInfo` endpoint.
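The parameter changes listed for the graduated Cypher projection can be sketched as a small argument-migration helper. This is a sketch under assumptions, not GDS code: `migrate_projection_args` is a hypothetical name, the arguments are modelled as plain Python values rather than Cypher expressions, and the example map keys are only illustrative.

```python
def migrate_projection_args(old_args):
    """Map the old 6-argument form onto the new 5-argument form.

    Old: [arg1, arg2, arg3, nodeConfig, relationshipConfig, configuration]
    New: [arg1, arg2, arg3, dataConfig, configuration]
    """
    if len(old_args) != 6:
        raise ValueError("expected 6 arguments for the old signature")
    first, second, third, node_config, relationship_config, configuration = old_args

    # The 4th and 5th parameters are merged into a single dataConfig map.
    data_config = dict(node_config or {})
    for key, value in (relationship_config or {}).items():
        # The `properties` key is renamed to `relationshipProperties`.
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value

    # The overall configuration moves from the 6th to the 5th position.
    return [first, second, third, data_config, configuration]


# Illustrative values only.
old_call = [
    "myGraph",
    "source",
    "target",
    {"sourceNodeLabels": ["Person"]},
    {"relationshipType": "KNOWS", "properties": {"weight": 1.0}},
    {"readConcurrency": 4},
]
new_call = migrate_projection_args(old_call)
```

According to the notes above, calls to the deprecated name are forwarded and adapted automatically, so a helper like this is only relevant when rewriting queries by hand.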
Bug fixes
- Fixed: the Arrow server no longer allows projecting graphs with blank names
- Fixed: Arrow validates dangling relationships when creating an in-memory graph
- Fixed: if an arrow process is aborted, creating a new process with the same name is now possible
- Fixed a bug where `gds.graph.export` could fail when exporting larger graphs
- Fixed a bug where `gds.alpha.kSpanningTree` returned incorrect results when called with the `nodeLabels` parameter.
- Fixed a bug where `gds.triangleCount` would throw an ArrayIndexOutOfBoundsException when called with the `nodeLabels` parameter.
- Fixed a bug where link prediction mutate results could fail when predicted probability is extremely close to zero.
Improvements
Major
- Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approxmaxkcut
  - Conductance
  - LinkPrediction training
  - ToUndirected
- Improved partitioning. This affects the parallel runtime of `gds.alpha.hits`, `gds.beta.graph.project.subgraph` and `gds.beta.pipeline.linkPrediction.predict` if `sampleRate = 0`
Minor
- Improve progress tracking for `gds.beta.graphSage.train`. This will enable progress bars on the Python client.
- Improve error message for invalid `nodeLabels` and `relationshipTypes` for procedures supporting memory estimation.
- Allow running `gds.debug.sysInfo` and `gds.debug.arrow` against the system database.
- Improve automatic conversion of array property values during graph projection.
- The Yens algorithm can now be run in parallel.
- Node regression now verifies upfront that all `targetProperty` values provided are valid when calling `gds.alpha.pipeline.nodeRegression.train`.
- The scale properties algorithm has been promoted:
  - Added new procedures `gds.scaleProperties.[stream,mutate]` which replace `gds.alpha.scaleProperties.[stream,mutate]` that are now deprecated
    - The scalers `L1Norm` and `L2Norm` are not supported in the new procedures.
  - Added new procedures `gds.scaleProperties.[stats,write]` to return statistics from a scale properties computation and write scaled properties back to a database respectively
  - Procedures `gds.scaleProperties.[mutate,stats,stream,write]` support progress tracking with volumes. This will enable progress bars on the Python client
  - Procedures `gds.scaleProperties.[mutate,stats,write]` return statistics from the performed scale computation
  - Added new parameter `offset` to the `log` scaler. This also affects procedures:
    - `gds.pageRank`
    - `gds.eigenvector`
    - `gds.articleRank`
  - Added new procedures `gds.scaleProperties.[mutate|stats|stream|write].estimate` for estimating the memory requirements of running the scale properties algorithm
  - Nodes with missing properties (`null` or `NaN`) are now omitted in the scale computation. Their scale value is set to `NaN` in the output.
- Reduce the memory footprint of the binary embeddings saved by `gds.beta.hashgnn.mutate`.
- Promote random forest classifier to beta tier. Added `gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest` which replace `gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest` that are now deprecated.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve memory usage when projecting very large graphs with very high degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
- The import of nodes with negative id via arrow into a database is now forbidden.
- Graph restore now attempts to use the same id map implementation that has been used for the original graph.
- Setting the `useBadCollector` option to true for the Arrow database import will now actually trigger errors if the collector encountered a problem.
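The scale properties notes in this release mention an `offset` parameter for the `log` scaler and state that nodes with missing properties receive `NaN` in the output. The Python sketch below illustrates both points. It assumes that the scaler computes the natural logarithm of `value + offset`; that formula, like the function name, is an assumption and is not stated in these notes.

```python
import math


def log_scale(values, offset=0.0):
    """Scale each value with log(value + offset); missing values yield NaN.

    Assumption: the offset is added before taking the natural logarithm.
    """
    scaled = []
    for value in values:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing:
            # Missing properties are left out of the computation and reported as NaN.
            scaled.append(math.nan)
        else:
            scaled.append(math.log(value + offset))
    return scaled


# Hypothetical property values; None and NaN stand for missing properties.
raw = [0.0, None, math.e - 1.0, float("nan")]
scaled = log_scale(raw, offset=1.0)
```

Under this assumption, a positive offset makes the scaler usable on properties that contain zeros, since `log(0)` is undefined.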
Graph Data Science 2.4.0 PREVIEW
Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.
Breaking changes
- The `concurrency` setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of `4` unless overridden. This affects:
  - `gds.beta.pipeline.linkPrediction.train`
  - `gds.beta.pipeline.nodeClassification.train`
  - `gds.alpha.pipeline.nodeClassification.train`
New features
- You can rename node properties when writing them back to the Neo4j database using `gds.nodeProperties.write` by placing them inside a map in the form `nodeProperty: 'renamedProperty'`.
- Added `minCommunitySize|minComponentSize` parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
  - `gds.wcc.stream`
  - `gds.louvain.stream`
  - `gds.labelPropagation.stream`
  - `gds.beta.k1coloring.[stream|write]`
  - `gds.beta.leiden.[stream|write]`
  - `gds.beta.modularityOptimization.[stream|write]`
  - `gds.alpha.maxkcut.stream`
- Added new procedure `gds.alpha.drop.cypherdb` to drop created in-memory databases
- Added Bellman-Ford algorithm:
  - `gds.bellmanFord.stream`
  - `gds.bellmanFord.stream.estimate`
  - `gds.bellmanFord.stats`
  - `gds.bellmanFord.stats.estimate`
  - `gds.bellmanFord.mutate`
  - `gds.bellmanFord.mutate.estimate`
  - `gds.bellmanFord.write`
  - `gds.bellmanFord.write.estimate`
- Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects `gds.alpha.model.store` and `gds.alpha.model.load`.
- Added `upperDegreeCutoff` parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.
- Added `aggregation` to `gds.beta.toUndirected` to allow the aggregation of the new undirected relationships.
- Added new optional parameter `storeModelToDisk` that automatically saves serializable models after training for licensed users. This affects `gds.beta.pipeline.[linkPrediction|nodeClassification].train` and `gds.beta.graphSage.train`.
- Added K-Core Decomposition algorithm:
  - `gds.kcore.stats`
  - `gds.kcore.stats.estimate`
  - `gds.kcore.stream`
  - `gds.kcore.stream.estimate`
  - `gds.kcore.mutate`
  - `gds.kcore.mutate.estimate`
  - `gds.kcore.write`
  - `gds.kcore.write.estimate`
- Added procedure `gds.graph.relationshipProperties.write` that allows writing relationships with multiple properties to Neo4j.
- Added new Common Neighbour Aware Random Walk graph sampling algorithm `gds.graph.sample.cnarw`. Available under `beta` tier.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (`gds.graph.project.cypher`) is now called "Legacy Cypher projection"
  - The procedure name is losing the `alpha` qualifier and is now called `gds.graph.project`.
  - The old name `gds.alpha.graph.project` is deprecated and usages will forward to the new name while also adapting to the new API.
  - The 4th and 5th parameters `nodeConfig` and `relationshipConfig` have been merged into a single `dataConfig` parameter.
  - The `properties` configuration key in this merged `dataConfig` parameter has been renamed to `relationshipProperties`.
  - The overall projection configuration (e.g. `readConcurrency`) has moved from the 6th parameter to the 5th parameter.
Bug fixes
- Fixed: the Arrow server no longer allows projecting graphs with blank names
- Fixed: Arrow validates dangling relationships when creating an in-memory graph.
Improvements
- Improve progress tracking for `gds.beta.graphSage.train`. This will enable progress bars on the Python client.
- Improve error message for invalid `nodeLabels` and `relationshipTypes` for procedures supporting memory estimation.
- Allow running `gds.debug.sysInfo` and `gds.debug.arrow` against the system database.
- Improve automatic conversion of array property values during graph projection.
- The Yens algorithm can now be run in parallel.
- Node regression now verifies upfront that all `targetProperty` values provided are valid when calling `gds.alpha.pipeline.nodeRegression.train`.
- The scale properties algorithm has been promoted:
  - Added new procedures `gds.scaleProperties.[stream,mutate]` which replace `gds.alpha.scaleProperties.[stream,mutate]` that are now deprecated
    - The scalers `L1Norm` and `L2Norm` are not supported in the new procedures.
  - Added new procedures `gds.scaleProperties.[stats,write]` to return statistics from a scale properties computation and write scaled properties back to a database respectively
  - Procedures `gds.scaleProperties.[mutate,stats,stream,write]` support progress tracking with volumes. This will enable progress bars on the Python client
  - Procedures `gds.scaleProperties.[mutate,stats,write]` return statistics from the performed scale computation
  - Added new parameter `offset` to the `log` scaler. This also affects procedures:
    - `gds.pageRank`
    - `gds.eigenvector`
    - `gds.articleRank`
  - Added new procedures `gds.scaleProperties.[mutate|stats|stream|write].estimate` for estimating the memory requirements of running the scale properties algorithm
  - Nodes with missing properties (`null` or `NaN`) are now omitted in the scale computation. Their scale value is set to `NaN` in the output.
- Reduce the memory footprint of the binary embeddings saved by `gds.beta.hashgnn.mutate`.
- Promote random forest classifier to beta tier. Added `gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest` which replace `gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest` that are now deprecated.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve runtime of `gds.alpha.hits` for concurrency > 1 due to a better partitioning.
- Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approxmaxkcut
  - Conductance
  - LinkPrediction training
  - ToUndirected
- Improve parallel runtime of `gds.beta.graph.project.subgraph` when filtering relationships due to a better partitioning.
- Improve parallel runtime of `gds.beta.pipeline.linkPrediction.predict` if `sampleRate = 0` due to a better partitioning.
- Improve memory usage when projecting very large graphs with very high degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.