Release Graph Data Science 2.4.0 · neo4j/graph-data-science

Breaking changes

The concurrency passed when training a pipeline is now also applied to the node property steps. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
- gds.beta.pipeline.linkPrediction.train
- gds.beta.pipeline.nodeClassification.train
- gds.alpha.pipeline.nodeClassification.train

New features

Major

Added Bellman-Ford algorithm
Added K-Core Decomposition algorithm
Added new Common Neighbour Aware Random Walk graph sampling algorithm
Added Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable.
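
For readers unfamiliar with the first of these additions: Bellman-Ford computes single-source shortest paths on graphs that may contain negative relationship weights, and it can detect negative cycles. The following is a minimal textbook-style sketch in Python, not the GDS implementation. The function name `bellman_ford` and the edge-list input format are assumptions made purely for illustration.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Textbook Bellman-Ford over (u, v, weight) tuples.

    Returns (distances, has_negative_cycle). Illustrative only, not GDS code.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    # Relax every edge at most num_nodes - 1 times.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances are stable, no need for further passes
    # If any edge can still be relaxed, a negative cycle is reachable.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle


# Hypothetical four-node graph with one negative weight.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, -2.0), (1, 3, 3.0)]
distances, negative_cycle = bellman_ford(4, edges, source=0)
```

In this toy graph the detour through node 2 is cheaper than the direct relationship from node 0 to node 1, which is the kind of case a Dijkstra-style algorithm cannot handle when weights are negative.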

Minor

You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map of the form nodeProperty: 'renamedProperty'.
Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)
Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases
Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes whose degree is higher than the provided value.
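
To illustrate the effect of such a cutoff, here is a small Python sketch that computes Jaccard similarity between nodes while skipping any node whose degree exceeds the cutoff. It is a conceptual illustration only, not the GDS implementation; the function name `jaccard_pairs` and the dictionary input are hypothetical.

```python
from itertools import combinations


def jaccard_pairs(neighbors, upper_degree_cutoff):
    """Jaccard similarity for all pairs of nodes that pass the degree cutoff.

    neighbors maps a node to an iterable of its neighbors. Illustrative only.
    """
    # Skip nodes whose degree is higher than the cutoff.
    eligible = {
        node: set(nbrs)
        for node, nbrs in neighbors.items()
        if len(set(nbrs)) <= upper_degree_cutoff
    }
    scores = {}
    for a, b in combinations(sorted(eligible), 2):
        union = eligible[a] | eligible[b]
        if union:
            scores[(a, b)] = len(eligible[a] & eligible[b]) / len(union)
    return scores


# Hypothetical data: "hub" has degree 4 and is skipped with a cutoff of 3.
neighbors = {
    "alice": ["x", "y"],
    "bob": ["y", "z"],
    "hub": ["w", "x", "y", "z"],
}
scores = jaccard_pairs(neighbors, upper_degree_cutoff=3)
```

Excluding very high degree nodes reduces the number of comparisons, since such nodes tend to overlap with almost every other node.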
Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
Cypher Aggregation has graduated, which comes with a new name and API changes:
- The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
  - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
- The procedure name is losing the alpha qualifier and is now called gds.graph.project.
- The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
- The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
- The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
- The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
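
The parameter changes above can be sketched as a small Python helper that turns the two old maps into one merged map. This is only an illustration of the described migration, not part of GDS or its client. The helper name `merge_projection_configs` is hypothetical, and the example keys other than `properties` and `relationshipProperties` are assumptions about what such maps typically contain.

```python
def merge_projection_configs(node_config, relationship_config):
    """Merge the old nodeConfig and relationshipConfig maps into one map.

    Mirrors the described API change: both maps become a single dataConfig,
    and the `properties` key is renamed to `relationshipProperties`.
    """
    data_config = dict(node_config)
    for key, value in relationship_config.items():
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value
    return data_config


# Hypothetical old-style arguments (4th and 5th parameters).
old_node_config = {"sourceNodeLabels": ["Person"], "targetNodeLabels": ["Person"]}
old_relationship_config = {"relationshipType": "KNOWS", "properties": {"weight": 1.0}}

# New-style single dataConfig map (4th parameter).
data_config = merge_projection_configs(old_node_config, old_relationship_config)
```

The overall projection configuration, such as readConcurrency, would then be passed as the parameter following this merged map.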
Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.

Bug fixes

Fixed: the Arrow server no longer allows projecting graphs with blank names
Fixed: Arrow now validates dangling relationships when creating an in-memory graph
Fixed: if an Arrow process is aborted, creating a new process with the same name is now possible
Fixed a bug where gds.graph.export could fail when exporting larger graphs
Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
Fixed a bug where link prediction mutate results could fail when predicted probability is extremely close to zero.

Improvements

Major

Improved the parallel runtime of several algorithms through improvements to our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
- FastRP
- HashGNN
- Leiden
- Approxmaxkcut
- Conductance
- LinkPrediction training
- ToUndirected
Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0
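
As background, degree-based partitioning splits the node range so that each worker receives a similar number of relationships rather than a similar number of nodes. The Python sketch below shows one simple way to do this; it is an assumption-laden illustration, not the partitioning code used in GDS, and the function name `degree_partitions` is hypothetical.

```python
def degree_partitions(degrees, concurrency):
    """Split node ids 0..n-1 into contiguous ranges of similar total degree.

    degrees[i] is the degree of node i. Returns a list of (start, end) ranges
    with an exclusive end. Illustrative only.
    """
    total = sum(degrees)
    # Target degree sum per partition, rounded up, at least 1.
    target = max(1, -(-total // concurrency))
    partitions = []
    start = 0
    accumulated = 0
    for node, degree in enumerate(degrees):
        accumulated += degree
        if accumulated >= target:
            partitions.append((start, node + 1))
            start = node + 1
            accumulated = 0
    if start < len(degrees):
        partitions.append((start, len(degrees)))  # remaining nodes
    return partitions


# Hypothetical degree sequence with one high-degree node.
degrees = [8, 1, 1, 1, 1, 2, 2]
parts = degree_partitions(degrees, concurrency=2)
```

Here the single high-degree node gets a partition of its own, while the six low-degree nodes share the other one. A split by node count would instead leave one worker with most of the relationships, which is why the benefit depends strongly on the degree distribution of the dataset.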

Minor

Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the Python client.
Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
Allow gds.debug.sysInfo and gds.debug.arrow to run against the system database.
Improve automatic conversion of array property values during graph projection.
The Yens algorithm can now be run in parallel.
Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
The scale properties algorithm has been promoted:
- Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
  - The scalers L1Norm and L2Norm are not supported in the new procedures.
- Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
- Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the Python client.
- Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
- Added new parameter offset to the log scaler. This also affects procedures:
  - gds.pageRank
  - gds.eigenvector
  - gds.articleRank
- Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
- Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
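
The last two points can be sketched together in Python. The sketch assumes that the log scaler applies the natural logarithm to the value plus the offset; that formula, the function name `log_scale`, and the list-based input are assumptions for illustration, not the GDS implementation.

```python
import math


def log_scale(values, offset=0.0):
    """Log-scale a list of property values, assuming log(value + offset).

    Missing values (None or NaN) are not scaled; their output is NaN.
    Illustrative only.
    """
    scaled = []
    for value in values:
        if value is None or math.isnan(value):
            scaled.append(math.nan)
        else:
            scaled.append(math.log(value + offset))
    return scaled


# Hypothetical property values, including a zero and two missing entries.
scaled = log_scale([0.0, math.e - 1.0, None, float("nan")], offset=1.0)
```

With an offset of 1, a property value of 0 maps to 0 instead of being undefined, which is a common reason to add an offset before taking a logarithm.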
Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
Reduced memory allocation for the Spanning Tree algorithm.
A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
Improve memory usage when projecting very large graphs with very high degree nodes.
Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
Importing nodes with negative ids via Arrow into a database is now forbidden.
Graph restore now attempts to use the same id map implementation that was used for the original graph.
Setting the useBadCollector option to true for the Arrow database import will now actually trigger errors if the collector encountered a problem.
