Graph Data Science 2.4.0
Breaking changes
- The concurrency configured when training a pipeline is now passed to the node property steps. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects gds.beta.pipeline.linkPrediction.train, gds.beta.pipeline.nodeClassification.train and gds.alpha.pipeline.nodeClassification.train.
New features
Major
- Added Bellman-Ford algorithm
- Added K-Core Decomposition algorithm
- Added the new Common Neighbour Aware Random Walk graph sampling algorithm
- Added Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable.
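Bellman-Ford computes single-source shortest paths and, unlike Dijkstra, tolerates negative relationship weights and can report negative cycles. The following is a minimal textbook sketch in plain Python to show the idea; it is not the GDS implementation, and the edge-list input format and the function name bellman_ford are assumptions for illustration.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Textbook Bellman-Ford (illustrative sketch, not GDS code).

    edges is a list of (u, v, weight) tuples over node ids 0..num_nodes-1.
    Returns (distances, has_negative_cycle); unreachable nodes keep math.inf.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    # Relax every edge up to |V| - 1 times; stop early once nothing changes.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # One extra pass: any further improvement means a reachable negative cycle.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle


if __name__ == "__main__":
    # The negative edge 2 -> 1 makes the path 0 -> 2 -> 1 cheaper than 0 -> 1.
    print(bellman_ford(4, [(0, 1, 4.0), (0, 2, 5.0), (2, 1, -3.0), (1, 3, 2.0)], 0))
```

Stopping early when a full pass changes nothing keeps the sketch cheap on graphs whose shortest paths have few hops; the final pass is what distinguishes Bellman-Ford's negative-cycle detection from a plain relaxation loop.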
Minor
- You can now rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map of the form nodeProperty: 'renamedProperty'.
- Added the minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)
- Added the new procedure gds.alpha.drop.cypherdb to drop previously created in-memory databases.
- Added the upperDegreeCutoff parameter to the Node Similarity and filtered Node Similarity algorithms, which allows skipping nodes whose degree is higher than the provided value.
- Added aggregation to gds.beta.toUndirected to allow aggregating the new undirected relationships.
- Added the new optional parameter storeModelToDisk, which automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
- Added the procedure gds.graph.relationshipProperties.write, which allows writing relationships with multiple properties to Neo4j.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection".
  - The procedure loses the alpha qualifier and is now called gds.graph.project.
  - The old name gds.alpha.graph.project is deprecated; calls to it are forwarded to the new name and adapted to the new API.
  - The 4th and 5th parameters, nodeConfig and relationshipConfig, have been merged into a single dataConfig parameter.
  - The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
  - The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
- Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.
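To illustrate what upperDegreeCutoff does, here is a simplified plain-Python sketch of Jaccard-based node similarity in which nodes whose degree is below a lower cutoff or above an upper cutoff are skipped before any pair is compared. It is not the GDS implementation: it ignores topK, topN and similarity cutoffs, and the function and parameter names (node_similarity, degree_cutoff, upper_degree_cutoff) are hypothetical.

```python
from itertools import combinations


def node_similarity(neighbours, degree_cutoff=1, upper_degree_cutoff=None):
    """Jaccard similarity between every pair of eligible nodes (sketch only).

    neighbours maps a node id to the set of nodes it is connected to.
    Nodes with degree below degree_cutoff, or above upper_degree_cutoff
    when one is given, are skipped entirely.
    """
    eligible = [
        node
        for node, targets in neighbours.items()
        if len(targets) >= degree_cutoff
        and (upper_degree_cutoff is None or len(targets) <= upper_degree_cutoff)
    ]
    results = {}
    for a, b in combinations(sorted(eligible), 2):
        shared = neighbours[a] & neighbours[b]
        # Only report pairs that share at least one neighbour.
        if shared:
            union = neighbours[a] | neighbours[b]
            results[(a, b)] = len(shared) / len(union)
    return results


if __name__ == "__main__":
    graph = {"a": {1, 2}, "b": {2, 3}, "hub": {1, 2, 3, 4, 5}}
    print(node_similarity(graph))
    # With an upper cutoff of 2, the high-degree "hub" node is never compared.
    print(node_similarity(graph, upper_degree_cutoff=2))
```

Skipping very high-degree nodes up front avoids comparing "hub" nodes that overlap with almost everything, which is typically where most of the pairwise work goes.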
Bug fixes
- Fixed: the Arrow server no longer allows projecting graphs with blank names.
- Fixed: Arrow now validates dangling relationships when creating an in-memory graph.
- Fixed: if an Arrow process is aborted, a new process with the same name can now be created.
- Fixed a bug where gds.graph.export could fail when exporting larger graphs.
- Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
- Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
- Fixed a bug where link prediction mutate results could fail when the predicted probability is extremely close to zero.
Improvements
Major
- Improved the parallel runtime of several algorithms through improvements to our degree-based partitioning. Note that this is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approximate Maximum k-cut
  - Conductance
  - Link Prediction training
  - ToUndirected
- Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0.
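In general terms, degree-based partitioning cuts the node id space into contiguous ranges that each hold roughly the same number of relationships, rather than the same number of nodes, so that a few high-degree nodes do not leave one thread with most of the work. The sketch below illustrates that idea under this assumption; it is not the GDS partitioning code, and the function name degree_partition is hypothetical.

```python
def degree_partition(degrees, num_partitions):
    """Split node ids 0..n-1 into contiguous ranges with similar degree sums.

    degrees[i] is the degree of node i. Returns half-open (start, end) ranges.
    This is an illustrative sketch, not the GDS implementation.
    """
    total = sum(degrees)
    # Target number of relationships per partition (ceiling division, at least 1).
    target = max(1, -(-total // num_partitions))
    partitions = []
    start = 0
    running = 0
    for node, degree in enumerate(degrees):
        running += degree
        if running >= target:
            partitions.append((start, node + 1))
            start = node + 1
            running = 0
    # Any leftover nodes form the final, lighter partition.
    if start < len(degrees):
        partitions.append((start, len(degrees)))
    return partitions


if __name__ == "__main__":
    # Node 0 is a hub with degree 10; ten other nodes have degree 1.
    print(degree_partition([10] + [1] * 10, 2))
```

Splitting the same 11 nodes evenly by count would give ranges with degree sums of 15 and 5; splitting by degree gives two ranges of 10 each, which is the kind of balance that can shorten parallel runtime on skewed graphs.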
Minor
- Improved progress tracking for gds.beta.graphSage.train. This enables progress bars in the Python client.
- Improved the error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
- gds.debug.sysInfo and gds.debug.arrow can now be run against the system database.
- Improved automatic conversion of array property values during graph projection.
- Yen's algorithm can now be run in parallel.
- Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
- The scale properties algorithm has been promoted:
  - Added the new procedures gds.scaleProperties.[stream,mutate], which replace the now deprecated gds.alpha.scaleProperties.[stream,mutate].
    - The scalers L1Norm and L2Norm are not supported in the new procedures.
  - Added the new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and to write scaled properties back to the database, respectively.
  - The procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This enables progress bars in the Python client.
  - The procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation.
  - Added the new parameter offset to the log scaler. This also affects the procedures gds.pageRank, gds.eigenvector and gds.articleRank.
  - Added the new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm.
  - Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scaled value is set to NaN in the output.
- Reduced the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
- Promoted the random forest classifier to the beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest, which replaces the now deprecated gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve memory usage when projecting very large graphs with very high degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
- Importing nodes with negative ids via Arrow into a database is now forbidden.
- Graph restore now attempts to use the same id map implementation that has been used for the original graph.
- Setting the useBadCollector option to true for the Arrow database import now actually triggers errors if the collector encountered a problem.
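The two scale properties changes listed above (the new offset for the log scaler, and omitting null or NaN values from the computation) can be illustrated with a small plain-Python sketch. It assumes the log scaler computes the natural logarithm of value + offset and uses min-max scaling as an example of a scaler that depends on all present values; the function names are hypothetical and this is not the GDS implementation.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing property values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def min_max_scale(values):
    """Min-max scale node property values, leaving out missing ones.

    Missing values are excluded from the min/max computation and come back
    as NaN. When all present values are equal, this sketch returns 0.0
    (an assumption made for the illustration).
    """
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return [math.nan] * len(values)
    lo, hi = min(present), max(present)
    spread = hi - lo
    return [
        math.nan if _is_missing(v) else (0.0 if spread == 0 else (v - lo) / spread)
        for v in values
    ]


def log_scale(values, offset=0.0):
    """Assumed log scaler: natural log of (value + offset); missing -> NaN."""
    return [math.nan if _is_missing(v) else math.log(v + offset) for v in values]


if __name__ == "__main__":
    print(min_max_scale([1.0, None, 3.0, float("nan"), 2.0]))
    # Without an offset, log(0) would fail; offset=1.0 maps 0 to 0.0.
    print(log_scale([0.0, 9.0], offset=1.0))
```

The offset matters for properties that can be zero, where a plain logarithm is undefined; excluding missing values keeps a single null or NaN from distorting the min, max or other statistics for every other node.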