Commit f57dfd8

Add example for standalone sessions
Co-authored-by: Rafal Skolasinski <[email protected]>
1 parent fc3ad15 commit f57dfd8

4 files changed: +640 -1 lines changed

doc/modules/ROOT/pages/graph-analytics-serverless.adoc

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ The process of populating the session with data is called _remote projection_.
1717
Once populated, a GDS Session can run GDS workloads, such as algorithms and machine learning models.
1818
Results from these computations can be written back to the original source, using _remote write-back_ in the Attached and Self-managed types.
1919

20-
TIP: For ready-to-run notebooks, see our tutorials on GDS Sessions for xref:tutorials/graph-analytics-serverless.adoc[AuraDB] and xref:tutorials/graph-analytics-serverless-self-managed[self-managed databases].
20+
TIP: For ready-to-run notebooks, see our tutorials on GDS Sessions for xref:tutorials/graph-analytics-serverless.adoc[AuraDB], xref:tutorials/graph-analytics-serverless-self-managed.adoc[self-managed databases], and xref:tutorials/graph-analytics-serverless-standalone.adoc[any other data source].
2121

2222

2323
== GDS Session management
Lines changed: 264 additions & 0 deletions
@@ -0,0 +1,264 @@
1+
// DO NOT EDIT - AsciiDoc file generated automatically
2+
3+
= Graph Analytics Serverless for any data source
4+
5+
6+
https://colab.research.google.com/github/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless-standalone.ipynb[image:https://colab.research.google.com/assets/colab-badge.svg[Open
7+
In Colab]]
8+
9+
10+
This Jupyter notebook is hosted
11+
https://github.com/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless-standalone.ipynb[here]
12+
in the Neo4j Graph Data Science Client GitHub repository.
13+
14+
The notebook shows how to use the `graphdatascience` Python library to
15+
create, manage, and use a GDS Session.
16+
17+
We consider a graph of people and fruits, which we’re using as a simple
18+
example to show how to load data from Pandas DataFrames to a GDS
19+
Session, run algorithms, and inspect the results. We will cover all
20+
management operations: creation, listing, and deletion.
21+
22+
If you are using AuraDB, follow link:../graph-analytics-serverless[this
23+
example]. If you are using a self-managed Neo4j instance, follow
24+
link:../graph-analytics-serverless-self-managed[this example].
25+
26+
== Prerequisites
27+
28+
This notebook requires having a Neo4j instance available and
29+
that the Graph Analytics Serverless
30+
https://neo4j.com/docs/aura/graph-analytics/#aura-gds-serverless[feature]
31+
is enabled for your Neo4j Aura project.
32+
33+
We also need to have the `graphdatascience` Python library installed,
34+
version `1.15` or later.
35+
36+
[source, python, role=no-test]
37+
----
38+
%pip install "graphdatascience>=1.15"
39+
----
40+
41+
== Aura API credentials
42+
43+
A GDS Session is managed via the Aura API. In order to use the Aura API,
44+
we need to have
45+
https://neo4j.com/docs/aura/platform/api/authentication/#_creating_credentials[Aura
46+
API credentials].
47+
48+
Using these credentials, we can create our `GdsSessions` object, which
49+
is the main entry point for managing GDS Sessions.
50+
51+
[source, python, role=no-test]
52+
----
53+
import os
54+
55+
from graphdatascience.session import AuraAPICredentials, GdsSessions
56+
57+
client_id = os.environ["AURA_API_CLIENT_ID"]
58+
client_secret = os.environ["AURA_API_CLIENT_SECRET"]
59+
60+
# If your account is a member of several projects, you must also specify the project ID to use
61+
project_id = os.environ.get("AURA_API_PROJECT_ID", None)
62+
63+
sessions = GdsSessions(api_credentials=AuraAPICredentials(client_id, client_secret, project_id=project_id))
64+
----
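Reading the credentials with `os.environ[...]` raises a bare `KeyError` when a variable is missing. As a minimal sketch, a small helper can turn that into a clearer message. The helper name `require_env` and the stand-in mapping `fake_env` are hypothetical and not part of the `graphdatascience` library.

```python
import os


def require_env(name, env=os.environ):
    """Return the value of an environment variable, or raise a clear error."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a stand-in mapping instead of the real environment.
fake_env = {"AURA_API_CLIENT_ID": "my-client-id"}
print(require_env("AURA_API_CLIENT_ID", env=fake_env))
```

In the notebook, `require_env("AURA_API_CLIENT_ID")` could replace the direct `os.environ` lookups.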
65+
66+
== Creating a new session
67+
68+
A new session is created by calling `sessions.get++_++or++_++create()`.
69+
Because this is a standalone session, it is not attached to a database:
70+
the code below passes no connection details, and the data is loaded
71+
from Pandas DataFrames later on.
72+
73+
We also need to specify the session size. Please refer to the API
74+
reference documentation or the manual for a full list of available sizes.
75+
76+
Finally, we need to give our session a name. We will call ours
77+
`people-and-fruits-standalone`. It is possible to reconnect to an existing session by calling `get++_++or++_++create`
78+
with the same session name and configuration.
79+
80+
We will also set a time-to-live (TTL) for the session. This ensures that
81+
our session is automatically deleted after being unused for 30 minutes.
82+
This is a good practice to avoid incurring costs should we forget to
83+
delete the session ourselves.
84+
85+
[source, python, role=no-test]
86+
----
87+
from graphdatascience.session import AlgorithmCategory, SessionMemory
88+
89+
# Option 1: explicitly define the size of the session
90+
memory = SessionMemory.m_4GB
91+
92+
# Option 2: estimate the memory needed for the GDS session (this overrides the value above)
93+
memory = sessions.estimate(
94+
    node_count=20,
95+
    relationship_count=50,
96+
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
97+
)
98+
99+
print(f"Estimated memory: {memory}")
100+
101+
# Find out and specify where to create the GDS session
102+
cloud_locations = sessions.available_cloud_locations()
103+
print(f"Available locations: {cloud_locations}")
104+
cloud_location = cloud_locations[0]
105+
----
106+
107+
[source, python, role=no-test]
108+
----
109+
# Create a GDS session!
110+
gds = sessions.get_or_create(
111+
    # we give it a representative name
112+
    session_name="people-and-fruits-standalone",
113+
    memory=memory,
114+
    cloud_location=cloud_location,
115+
)
116+
----
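The text above mentions a 30-minute TTL, but the `get_or_create` call shown does not pass one. The sketch below shows how the TTL could be expressed, assuming `get_or_create` accepts a `ttl` keyword argument holding a `datetime.timedelta`. The `session_kwargs` dictionary is only an illustration, and the actual call is left commented out because it needs valid Aura API credentials.

```python
from datetime import timedelta

# Keyword arguments for sessions.get_or_create().
# Assumption: `ttl` is accepted as a datetime.timedelta.
session_kwargs = {
    "session_name": "people-and-fruits-standalone",
    "ttl": timedelta(minutes=30),
}

# With the `sessions`, `memory`, and `cloud_location` objects defined earlier:
# gds = sessions.get_or_create(memory=memory, cloud_location=cloud_location, **session_kwargs)

print(session_kwargs["ttl"].total_seconds())
```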
117+
118+
== Listing sessions
119+
120+
Now that we have created a session, let’s list all our sessions to see
121+
what that looks like.
122+
123+
[source, python, role=no-test]
124+
----
125+
from pandas import DataFrame
126+
127+
gds_sessions = sessions.list()
128+
129+
# for better visualization
130+
DataFrame(gds_sessions)
131+
----
132+
133+
== Adding a dataset
134+
135+
We define our dataset as Pandas DataFrames: one for the `Person` nodes,
136+
one for the `Fruit` nodes, and one for each relationship type.
137+
138+
In a more realistic scenario, the data would instead be loaded from an
139+
existing data source.
140+
141+
[source, python, role=no-test]
142+
----
143+
import pandas as pd
144+
145+
people_df = pd.DataFrame(
146+
    [
147+
        {"nodeId": 0, "name": "Dan", "age": 18, "experience": 63, "hipster": 0},
148+
        {"nodeId": 1, "name": "Annie", "age": 12, "experience": 5, "hipster": 0},
149+
        {"nodeId": 2, "name": "Matt", "age": 22, "experience": 42, "hipster": 0},
150+
        {"nodeId": 3, "name": "Jeff", "age": 51, "experience": 12, "hipster": 0},
151+
        {"nodeId": 4, "name": "Brie", "age": 31, "experience": 6, "hipster": 0},
152+
        {"nodeId": 5, "name": "Elsa", "age": 65, "experience": 23, "hipster": 0},
153+
        {"nodeId": 6, "name": "Bobby", "age": 38, "experience": 4, "hipster": 1},
154+
        {"nodeId": 7, "name": "John", "age": 4, "experience": 100, "hipster": 0},
155+
    ]
156+
)
157+
people_df["labels"] = "Person"
158+
159+
fruits_df = pd.DataFrame(
160+
    [
161+
        {"nodeId": 8, "name": "Apple", "tropical": 0, "sourness": 0.3, "sweetness": 0.6},
162+
        {"nodeId": 9, "name": "Banana", "tropical": 1, "sourness": 0.1, "sweetness": 0.9},
163+
        {"nodeId": 10, "name": "Mango", "tropical": 1, "sourness": 0.3, "sweetness": 1.0},
164+
        {"nodeId": 11, "name": "Plum", "tropical": 0, "sourness": 0.5, "sweetness": 0.8},
165+
    ]
166+
)
167+
fruits_df["labels"] = "Fruit"
168+
169+
like_relationships = [(0, 8), (1, 9), (2, 10), (3, 10), (4, 9), (5, 11), (7, 11)]
170+
likes_df = pd.DataFrame([{"sourceNodeId": src, "targetNodeId": trg} for (src, trg) in like_relationships])
171+
likes_df["relationshipType"] = "LIKES"
172+
173+
knows_relationship = [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (2, 5), (7, 3)]
174+
knows_df = pd.DataFrame([{"sourceNodeId": src, "targetNodeId": trg} for (src, trg) in knows_relationship])
175+
knows_df["relationshipType"] = "KNOWS"
176+
----
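Before constructing a graph from hand-written tables like these, it can help to check that every relationship endpoint refers to a node that exists. The following is a minimal sketch using plain Python sets. The helper name `dangling_endpoints` is hypothetical, and the ID ranges simply mirror the `nodeId` values used above.

```python
# Node IDs as used in the DataFrames above: people are 0-7, fruits are 8-11.
person_ids = set(range(0, 8))
fruit_ids = set(range(8, 12))

like_relationships = [(0, 8), (1, 9), (2, 10), (3, 10), (4, 9), (5, 11), (7, 11)]
knows_relationship = [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (2, 5), (7, 3)]


def dangling_endpoints(pairs, valid_sources, valid_targets):
    """Return the (source, target) pairs that reference an unknown node ID."""
    return [(s, t) for (s, t) in pairs if s not in valid_sources or t not in valid_targets]


# LIKES goes from a person to a fruit, KNOWS goes from a person to a person.
bad_likes = dangling_endpoints(like_relationships, person_ids, fruit_ids)
bad_knows = dangling_endpoints(knows_relationship, person_ids, person_ids)

print(f"Dangling LIKES: {bad_likes}")
print(f"Dangling KNOWS: {bad_knows}")
```

Both lists are empty for this dataset, so every relationship connects two existing nodes.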
177+
178+
== Construct Graph from DataFrames
179+
180+
Now that we have defined our dataset, we create a graph
181+
directly from the Pandas `DataFrame` objects. We do that by using the
182+
`gds.graph.construct()` endpoint.
183+
184+
[source, python, role=no-test]
185+
----
186+
nodes = [people_df.drop(columns="name"), fruits_df.drop(columns="name")] # GDS does not support string properties
187+
relationships = [likes_df, knows_df]
188+
189+
G = gds.graph.construct("people-fruits", nodes, relationships)
190+
191+
192+
str(G)
193+
----
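The code above drops the `name` column by hand because string properties are not supported. For wider tables, the same idea can be sketched generically by keeping only numeric columns plus the columns that `gds.graph.construct()` needs as strings. The helper name `gds_ready` is hypothetical, and the list of columns to keep is an assumption based on the DataFrames in this example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# A shortened copy of the people table from above.
people_df = pd.DataFrame(
    [
        {"nodeId": 0, "name": "Dan", "age": 18},
        {"nodeId": 1, "name": "Annie", "age": 12},
    ]
)
people_df["labels"] = "Person"


def gds_ready(df, keep=("labels", "relationshipType")):
    """Keep numeric columns, plus the string columns listed in `keep`."""
    cols = [c for c in df.columns if c in keep or is_numeric_dtype(df[c])]
    return df[cols]


ready = gds_ready(people_df)
print(list(ready.columns))
```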
194+
195+
== Running Algorithms
196+
197+
We can now run algorithms on the constructed graph. This is done using
198+
the standard GDS Python Client API. There are many other tutorials
199+
covering some interesting things we can do at this step, so we will keep
200+
it rather brief here.
201+
202+
We will simply run PageRank and FastRP on the graph.
203+
204+
[source, python, role=no-test]
205+
----
206+
print("Running PageRank ...")
207+
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
208+
print(f"Compute millis: {pr_result['computeMillis']}")
209+
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
210+
print(f"Centrality distribution: {pr_result['centralityDistribution']}")
211+
212+
print("Running FastRP ...")
213+
frp_result = gds.fastRP.mutate(
214+
    G,
215+
    mutateProperty="fastRP",
216+
    embeddingDimension=8,
217+
    featureProperties=["pagerank"],
218+
    propertyRatio=0.2,
219+
    nodeSelfInfluence=0.2,
220+
)
221+
print(f"Compute millis: {frp_result['computeMillis']}")
222+
# stream back the results
223+
result = gds.graph.nodeProperties.stream(G, ["pagerank", "fastRP"], separate_property_columns=True)
224+
225+
print(result)
226+
----
227+
228+
To resolve the node IDs to names, we can merge the result back with the source
229+
data frames.
230+
231+
[source, python, role=no-test]
232+
----
233+
names = pd.concat([people_df, fruits_df])[["nodeId", "name"]]
234+
235+
result.merge(names, how="left")
236+
----
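The merge above relies on `nodeId` being the only column the two DataFrames share. As a sketch, the join key can be named explicitly and the merge validated, so that an unexpected duplicate or an unmatched ID is easy to spot. The `result` values below are made up for illustration and are not actual algorithm output; node `99` is deliberately unknown.

```python
import pandas as pd

# Stand-in data: two known nodes and one ID with no matching name.
names = pd.DataFrame({"nodeId": [0, 8], "name": ["Dan", "Apple"]})
result = pd.DataFrame({"nodeId": [0, 8, 99], "pagerank": [0.15, 0.28, 0.15]})

# Join on nodeId explicitly; validate that each nodeId has at most one name.
merged = result.merge(names, on="nodeId", how="left", validate="many_to_one")

# Rows without a name indicate node IDs missing from the source DataFrames.
unmatched = merged[merged["name"].isna()]

print(merged)
print(f"Unmatched node IDs: {unmatched['nodeId'].tolist()}")
```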
237+
238+
== Deleting the session
239+
240+
Now that we have finished our analysis, we can delete the session. The
241+
results that we streamed back are held in local DataFrames and
242+
will not be lost. Anything that exists only in the session, such as the
243+
mutated graph properties, will be lost.
244+
245+
Deleting the session will release all resources associated with it, and
246+
stop incurring costs.
247+
248+
[source, python, role=no-test]
249+
----
250+
# or gds.delete()
251+
sessions.delete(session_name="people-and-fruits-standalone")
252+
----
253+
254+
[source, python, role=no-test]
255+
----
256+
# let's also make sure the deleted session is truly gone:
257+
sessions.list()
258+
----
259+
260+
== Conclusion
261+
262+
And we’re done! We have created a GDS Session, constructed a graph, run
263+
some algorithms, inspected the results, and deleted the session. This is a
264+
simple example, but it shows the main steps of using GDS Sessions.

doc/modules/ROOT/partials/tutorial-list.adoc

Lines changed: 1 addition & 0 deletions
@@ -8,4 +8,5 @@
88
* xref:tutorials/heterogeneous-node-classification-with-hashgnn.adoc[]
99
* xref:tutorials/kge-predict-transe-pyg-train.adoc[]
1010
* xref:tutorials/graph-analytics-serverless.adoc[]
11+
* xref:tutorials/graph-analytics-serverless-standalone.adoc[]
1112
* xref:tutorials/graph-analytics-serverless-self-managed.adoc[]
