Graph Data Science 2.4.0

@gminneci gminneci released this 14 Jun 15:39

Breaking changes

  • Concurrency is now passed to the node property steps when training a pipeline. Previously, these steps ran with the default concurrency of 4 unless overridden. This affects:
    • gds.beta.pipeline.linkPrediction.train
    • gds.beta.pipeline.nodeClassification.train
    • gds.alpha.pipeline.nodeClassification.train
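
The effect of this change can be sketched as a small resolution rule. This is an illustrative model, not GDS code: the function name and the `pass_through` flag are hypothetical, and it assumes that an explicit concurrency set on a step still takes precedence in 2.4.0.

```python
from typing import Optional

DEFAULT_CONCURRENCY = 4  # the default named in the note above


def step_concurrency(train_concurrency: Optional[int],
                     step_override: Optional[int],
                     pass_through: bool) -> int:
    """Return the concurrency a node property step would run with.

    pass_through=False models the previous behaviour,
    pass_through=True models the 2.4.0 behaviour.
    """
    if step_override is not None:
        # Assumption: a value set on the step itself wins in both versions.
        return step_override
    if pass_through and train_concurrency is not None:
        # 2.4.0: the concurrency given to train is handed down to the step.
        return train_concurrency
    return DEFAULT_CONCURRENCY


# Training with concurrency 8 and no override on the step:
before = step_concurrency(8, None, pass_through=False)  # 4
after = step_concurrency(8, None, pass_through=True)    # 8
```

If your pipelines relied on the node property steps running with 4 threads, set the concurrency on those steps explicitly.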

New features

Major

  • Added Bellman-Ford algorithm
  • Added K-Core Decomposition algorithm
  • Added new Common Neighbour Aware Random Walk graph sampling algorithm
  • Added serialization support for Random Forest and MLP classifiers. This makes all node classification and link prediction models serializable
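
For readers unfamiliar with K-Core Decomposition, here is a minimal sketch of what it computes: the core value of a node is the largest k such that the node belongs to a subgraph in which every node has at least k neighbours. The code uses the textbook peeling approach on a plain dictionary. It is not the GDS implementation, and it assumes an undirected graph given as a symmetric adjacency map without self-loops.

```python
from typing import Dict, Hashable, Set


def core_numbers(adjacency: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Return the core value of every node by repeatedly peeling a minimum-degree node."""
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        # Remove a node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        # The core value never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# A triangle (a, b, c) with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(graph)  # triangle nodes get 2, the pendant node gets 1
```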

Minor

  • You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.

  • Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)

  • Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases

  • Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes whose degree is higher than the provided value.

  • Added aggregation parameter to gds.beta.toUndirected to control how the new undirected relationships are aggregated.

  • Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.

  • Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.

  • Cypher Aggregation has graduated, which comes with a new name and API changes:

    • The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
      • The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
    • The procedure name is losing the alpha qualifier and is now called gds.graph.project.
    • The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
    • The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
    • The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
    • The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
  • Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.
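
The parameter changes for the graduated Cypher projection listed above can be sketched as a small argument-migration helper. This is a hypothetical illustration, not part of GDS: the helper name is made up, the first three positional arguments are assumed to be the graph name, source node and target node, and the configuration keys other than `properties` are used only as sample data.

```python
from typing import Any, Dict, List


def migrate_projection_args(args: List[Any]) -> List[Any]:
    """Convert the old 6-argument call shape into the new 5-argument shape.

    Old: [graphName, sourceNode, targetNode, nodeConfig, relationshipConfig, configuration]
    New: [graphName, sourceNode, targetNode, dataConfig, configuration]
    """
    if len(args) != 6:
        raise ValueError("expected the six positional arguments of the old call shape")
    graph_name, source, target, node_config, rel_config, configuration = args

    # nodeConfig and relationshipConfig are merged into a single dataConfig map.
    data_config: Dict[str, Any] = dict(node_config or {})
    for key, value in (rel_config or {}).items():
        # The only renamed key: `properties` becomes `relationshipProperties`.
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value

    # The overall configuration moves from the 6th to the 5th position.
    return [graph_name, source, target, data_config, configuration or {}]


old_args = [
    "myGraph", "source", "target",
    {"sourceNodeLabels": "Person"},                              # sample nodeConfig
    {"relationshipType": "KNOWS", "properties": {"weight": 1}},  # sample relationshipConfig
    {"readConcurrency": 4},
]
new_args = migrate_projection_args(old_args)
```

Calls through the deprecated gds.alpha.graph.project name are forwarded and adapted automatically, so a rewrite like this only matters when you switch to gds.graph.project yourself.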

Bug fixes

  • Fixed: The Arrow server no longer allows projecting graphs with blank names
  • Fixed: Arrow now validates dangling relationships when creating an in-memory graph
  • Fixed: If an Arrow process is aborted, a new process with the same name can now be created
  • Fixed a bug where gds.graph.export could fail when exporting larger graphs
  • Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
  • Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
  • Fixed a bug where link prediction mutate could fail when the predicted probability is extremely close to zero.

Improvements

Major

  • Improved the parallel runtime of several algorithms through improvements to our degree-based partitioning. Note that this is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
    • FastRP
    • HashGNN
    • Leiden
    • Approximate Maximum k-cut
    • Conductance
    • LinkPrediction training
    • ToUndirected
  • Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0
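
To illustrate why degree-based partitioning matters for parallel runtime, the sketch below splits node ids into contiguous ranges of similar total degree instead of similar node count. It is a simplified illustration under our own assumptions (greedy splitting, in-memory degree list, hypothetical function name) and does not reproduce the partitioning code in GDS.

```python
from typing import List, Tuple


def degree_partitions(degrees: List[int], concurrency: int) -> List[Tuple[int, int]]:
    """Split node ids 0..n-1 into contiguous [start, end) ranges of similar total degree."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total = sum(degrees)
    # Target amount of work (sum of degrees) per range, using ceiling division.
    target = max(1, -(-total // concurrency))
    partitions: List[Tuple[int, int]] = []
    start, accumulated = 0, 0
    for node, degree in enumerate(degrees):
        accumulated += degree
        if accumulated >= target:
            partitions.append((start, node + 1))
            start, accumulated = node + 1, 0
    if start < len(degrees):
        if len(partitions) >= concurrency:
            # Fold leftover nodes into the last range instead of adding another one.
            last_start, _ = partitions[-1]
            partitions[-1] = (last_start, len(degrees))
        else:
            partitions.append((start, len(degrees)))
    return partitions


# One hub node followed by low-degree nodes, split for two threads.
ranges = degree_partitions([10, 1, 1, 1, 1, 1, 1, 4], concurrency=2)
# ranges == [(0, 1), (1, 8)]: each range carries a degree sum of 10.
```

Splitting the same eight nodes by count would give ranges with degree sums of 13 and 7, so one thread would finish later than the other. On graphs with evenly distributed degrees the two strategies produce similar ranges, which is why the gain depends on the dataset.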

Minor

  • Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
  • Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
  • Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
  • Improve automatic conversion of array property values during graph projection.
  • Yen's algorithm can now be run in parallel.
  • Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
  • The scale properties algorithm has been promoted:
    • Added new procedures gds.scaleProperties.[stream,mutate], which replace the now-deprecated gds.alpha.scaleProperties.[stream,mutate]
      • The scalers L1Norm and L2Norm are not supported in the new procedures.
    • Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and to write scaled properties back to a database, respectively
    • Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
    • Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
    • Added new parameter offset to the log scaler. This also affects procedures:
      • gds.pageRank
      • gds.eigenvector
      • gds.articleRank
    • Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
    • Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
  • Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
  • Promoted the random forest classifier to the beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest, which replace the now-deprecated gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest.
  • Reduced memory allocation for the Spanning Tree algorithm.
  • A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
  • Improve memory usage when projecting very large graphs with very high degree nodes.
  • Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
  • Importing nodes with negative IDs via Arrow into a database is now forbidden.
  • Graph restore now attempts to use the same id map implementation that has been used for the original graph.
  • Setting the useBadCollector option to true for the Arrow database import will now actually trigger errors if the collector encountered a problem.
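
Two of the scale properties changes above, the new offset parameter of the log scaler and the handling of missing values, can be illustrated with a short sketch. It assumes that the offset is added to each value before the natural logarithm is taken and that a constant property scales to 0.0 under min-max. The function names are hypothetical and the code is not the GDS implementation.

```python
import math
from typing import List, Optional


def _is_missing(value: Optional[float]) -> bool:
    """A property counts as missing when it is null or NaN."""
    return value is None or math.isnan(value)


def log_scale(values: List[Optional[float]], offset: float = 0.0) -> List[float]:
    """Scale each value as log(value + offset); missing values become NaN."""
    return [
        math.nan if _is_missing(v) else math.log(v + offset)
        for v in values
    ]


def min_max_scale(values: List[Optional[float]]) -> List[float]:
    """Min-max scaling where missing values are left out of the statistics."""
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return [math.nan] * len(values)
    low, high = min(present), max(present)
    span = high - low
    scaled = []
    for v in values:
        if _is_missing(v):
            scaled.append(math.nan)   # omitted from the computation, NaN in the output
        elif span == 0:
            scaled.append(0.0)        # assumption: constant properties scale to 0.0
        else:
            scaled.append((v - low) / span)
    return scaled


# An offset of 1 lets the log scaler accept zero-valued properties.
logged = log_scale([0.0, 9.0, None], offset=1.0)
ranged = min_max_scale([1.0, None, 3.0, float("nan"), 2.0])
```

Here `logged` starts with 0.0 because log(0 + 1) = 0, and the missing entries in both results are NaN while the remaining values are scaled as if the missing ones were not present.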