// DO NOT EDIT - AsciiDoc file generated automatically

= Graph Analytics Serverless for any data source


https://colab.research.google.com/github/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless-standalone.ipynb[image:https://colab.research.google.com/assets/colab-badge.svg[Open
In Colab]]


This Jupyter notebook is hosted
https://github.com/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless-standalone.ipynb[here]
in the Neo4j Graph Data Science Client GitHub repository.

The notebook shows how to use the `graphdatascience` Python library to
create, manage, and use a GDS Session.

We consider a graph of people and fruits, which we’re using as a simple
example to show how to load data from Pandas DataFrames to a GDS
Session, run algorithms, and inspect the results. We will cover all
management operations: creation, listing, and deletion.

If you are using AuraDB, follow link:../graph-analytics-serverless[this
example]. If you are using a self-managed Neo4j instance, follow
link:../graph-analytics-serverless-self-managed[this example].

== Prerequisites

This notebook requires that the Graph Analytics Serverless
https://neo4j.com/docs/aura/graph-analytics/#aura-gds-serverless[feature]
is enabled for your Neo4j Aura project.

We also need to have the `graphdatascience` Python library installed,
version `1.15` or later.
| 35 | + |
| 36 | +[source, python, role=no-test] |
| 37 | +---- |
| 38 | +%pip install "graphdatascience>=1.15" |
| 39 | +---- |
| 40 | + |
| 41 | +== Aura API credentials |
| 42 | + |
| 43 | +A GDS Session is managed via the Aura API. In order to use the Aura API, |
| 44 | +we need to have |
| 45 | +https://neo4j.com/docs/aura/platform/api/authentication/#_creating_credentials[Aura |
| 46 | +API credentials]. |
| 47 | + |
| 48 | +Using these credentials, we can create our `GdsSessions` object, which |
| 49 | +is the main entry point for managing GDS Sessions. |
| 50 | + |
[source, python, role=no-test]
----
import os

from graphdatascience.session import AuraAPICredentials, GdsSessions

client_id = os.environ["AURA_API_CLIENT_ID"]
client_secret = os.environ["AURA_API_CLIENT_SECRET"]

# If your account is a member of several projects, you must also specify the project ID to use
project_id = os.environ.get("AURA_API_PROJECT_ID", None)

sessions = GdsSessions(api_credentials=AuraAPICredentials(client_id, client_secret, project_id=project_id))
----

== Creating a new session

A new session is created by calling `sessions.get++_++or++_++create()`.
Since this is a standalone session, it is not attached to a Neo4j
database; we will load the data directly from pandas DataFrames.

We also need to specify the session size. Please refer to the API
reference documentation or the manual for a full list of supported
sizes.

Finally, we need to give our session a name. We will call ours
`people-and-fruits-standalone`. It is possible to reconnect to an
existing session by calling `get++_++or++_++create()` with the same
session name and configuration.

We can also set a time-to-live (TTL) for the session. This ensures that
the session is automatically deleted after being unused for a given
period, for example 30 minutes. This is a good practice to avoid
incurring costs should we forget to delete the session ourselves.
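
As a sketch of how such a TTL could be specified (the `ttl` parameter of
`get++_++or++_++create()` is an assumption here; check the API reference
for your client version):

[source, python, role=no-test]
----
from datetime import timedelta

# Delete the session automatically after 30 minutes of inactivity
ttl = timedelta(minutes=30)

# Passed alongside the other arguments (hypothetical call, for illustration):
# gds = sessions.get_or_create("people-and-fruits-standalone", memory=memory,
#                              cloud_location=cloud_location, ttl=ttl)
----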

[source, python, role=no-test]
----
from graphdatascience.session import AlgorithmCategory, SessionMemory

# Explicitly define the size of the session
memory = SessionMemory.m_4GB

# Estimate the memory needed for the GDS session
memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)

print(f"Estimated memory: {memory}")

# Find out and specify where to create the GDS session
cloud_locations = sessions.available_cloud_locations()
print(f"Available locations: {cloud_locations}")
cloud_location = cloud_locations[0]
----

[source, python, role=no-test]
----
# Create a GDS session!
gds = sessions.get_or_create(
    # we give it a representative name
    session_name="people-and-fruits-standalone",
    memory=memory,
    cloud_location=cloud_location,
)
----

== Listing sessions

Now that we have created a session, let’s list all our sessions to see
what that looks like.

[source, python, role=no-test]
----
from pandas import DataFrame

gds_sessions = sessions.list()

# for better visualization
DataFrame(gds_sessions)
----
== Adding a dataset

We will define our dataset directly in Python, as pandas DataFrames:
two node DataFrames (people and fruits) and two relationship DataFrames
(`LIKES` and `KNOWS`).

In a more realistic scenario, this data would come from an existing
source, such as files or a database.
[source, python, role=no-test]
----
import pandas as pd

people_df = pd.DataFrame(
    [
        {"nodeId": 0, "name": "Dan", "age": 18, "experience": 63, "hipster": 0},
        {"nodeId": 1, "name": "Annie", "age": 12, "experience": 5, "hipster": 0},
        {"nodeId": 2, "name": "Matt", "age": 22, "experience": 42, "hipster": 0},
        {"nodeId": 3, "name": "Jeff", "age": 51, "experience": 12, "hipster": 0},
        {"nodeId": 4, "name": "Brie", "age": 31, "experience": 6, "hipster": 0},
        {"nodeId": 5, "name": "Elsa", "age": 65, "experience": 23, "hipster": 0},
        {"nodeId": 6, "name": "Bobby", "age": 38, "experience": 4, "hipster": 1},
        {"nodeId": 7, "name": "John", "age": 4, "experience": 100, "hipster": 0},
    ]
)
people_df["labels"] = "Person"

fruits_df = pd.DataFrame(
    [
        {"nodeId": 8, "name": "Apple", "tropical": 0, "sourness": 0.3, "sweetness": 0.6},
        {"nodeId": 9, "name": "Banana", "tropical": 1, "sourness": 0.1, "sweetness": 0.9},
        {"nodeId": 10, "name": "Mango", "tropical": 1, "sourness": 0.3, "sweetness": 1.0},
        {"nodeId": 11, "name": "Plum", "tropical": 0, "sourness": 0.5, "sweetness": 0.8},
    ]
)
fruits_df["labels"] = "Fruit"

like_relationships = [(0, 8), (1, 9), (2, 10), (3, 10), (4, 9), (5, 11), (7, 11)]
likes_df = pd.DataFrame([{"sourceNodeId": src, "targetNodeId": trg} for (src, trg) in like_relationships])
likes_df["relationshipType"] = "LIKES"

knows_relationships = [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (2, 5), (7, 3)]
knows_df = pd.DataFrame([{"sourceNodeId": src, "targetNodeId": trg} for (src, trg) in knows_relationships])
knows_df["relationshipType"] = "KNOWS"
----
| 177 | + |
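Before constructing a graph from these frames, it can be worth
sanity-checking them: node IDs must be unique across all node
DataFrames, and every relationship endpoint should refer to an existing
node. A minimal sketch in plain pandas, using small stand-ins that
mirror the frames above:

[source, python, role=no-test]
----
import pandas as pd

# Stand-ins mirroring the frames above: people use IDs 0-7, fruits 8-11
people = pd.DataFrame({"nodeId": range(0, 8)})
fruits = pd.DataFrame({"nodeId": range(8, 12)})
likes = pd.DataFrame({"sourceNodeId": [0, 1, 2, 3, 4, 5, 7], "targetNodeId": [8, 9, 10, 10, 9, 11, 11]})

node_ids = pd.concat([people, fruits])["nodeId"]
assert node_ids.is_unique  # IDs must not collide across node DataFrames

endpoints = pd.concat([likes["sourceNodeId"], likes["targetNodeId"]])
assert endpoints.isin(node_ids.values).all()  # every endpoint must be a known node
----
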
== Construct Graph from DataFrames

Now that we have prepared the data, we can create a graph directly from
the pandas `DataFrame` objects. We do that by using the
`gds.graph.construct()` endpoint.
[source, python, role=no-test]
----
nodes = [people_df.drop(columns="name"), fruits_df.drop(columns="name")]  # GDS does not support string properties
relationships = [likes_df, knows_df]

G = gds.graph.construct("people-fruits", nodes, relationships)

str(G)
----

== Running Algorithms

We can now run algorithms on the constructed graph. This is done using
the standard GDS Python Client API. There are many other tutorials
covering some interesting things we can do at this step, so we will keep
it rather brief here.

We will simply run PageRank and FastRP on the graph.
[source, python, role=no-test]
----
print("Running PageRank ...")
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
print(f"Compute millis: {pr_result['computeMillis']}")
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
print(f"Centrality distribution: {pr_result['centralityDistribution']}")

print("Running FastRP ...")
frp_result = gds.fastRP.mutate(
    G,
    mutateProperty="fastRP",
    embeddingDimension=8,
    featureProperties=["pagerank"],
    propertyRatio=0.2,
    nodeSelfInfluence=0.2,
)
print(f"Compute millis: {frp_result['computeMillis']}")

# Stream back the results
result = gds.graph.nodeProperties.stream(G, ["pagerank", "fastRP"], separate_property_columns=True)

print(result)
----

To resolve the node IDs to names, we can merge the results back with the
source DataFrames.

[source, python, role=no-test]
----
names = pd.concat([people_df, fruits_df])[["nodeId", "name"]]

result.merge(names, how="left")
----
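
The behaviour of this join can be checked on stand-in data (plain
pandas, no session required): a left merge keeps every result row and
fills in the matching name by `nodeId`.

[source, python, role=no-test]
----
import pandas as pd

# Stand-ins for the streamed results and the name lookup table
result = pd.DataFrame({"nodeId": [0, 8], "pagerank": [0.35, 0.52]})
names = pd.DataFrame({"nodeId": [0, 8], "name": ["Dan", "Apple"]})

merged = result.merge(names, how="left", on="nodeId")
print(merged["name"].tolist())  # → ['Dan', 'Apple']
----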

== Deleting the session

Now that we have finished our analysis, we can delete the session. The
results that we streamed back are kept locally and will not be lost,
but anything that only exists in the session's in-memory graph will be
discarded along with it.

Deleting the session will release all resources associated with it, and
stop incurring costs.
[source, python, role=no-test]
----
# or gds.delete()
sessions.delete(session_name="people-and-fruits-standalone")
----

[source, python, role=no-test]
----
# let's also make sure the deleted session is truly gone:
sessions.list()
----

== Conclusion

And we’re done! We have created a GDS Session, constructed a graph, run
some algorithms, inspected the results, and deleted the session. This is
a simple example, but it shows the main steps of using GDS Sessions.