// DO NOT EDIT - AsciiDoc file generated automatically

= Graph Analytics Serverless for any data source


https://colab.research.google.com/github/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless-standalone.ipynb[image:https://colab.research.google.com/assets/colab-badge.svg[Open
In Colab]]


This Jupyter notebook is hosted
https://github.com/neo4j/graph-data-science-client/blob/main/examples/graph-analytics-serverless-standalone.ipynb[here]
in the Neo4j Graph Data Science Client GitHub repository.

The notebook shows how to use the `graphdatascience` Python library to
create, manage, and use a GDS Session.

We consider a graph of people and fruits, which we’re using as a simple
example to show how to load data from Pandas DataFrames to a GDS
Session, run algorithms, and inspect the results. We will cover all
management operations: creation, listing, and deletion.

If you are using AuraDB, follow link:../graph-analytics-serverless[this
example]. If you are using a self-managed Neo4j instance, follow
link:../graph-analytics-serverless-self-managed[this example].

== Prerequisites

This notebook requires that the Graph Analytics Serverless
https://neo4j.com/docs/aura/graph-analytics/#aura-gds-serverless[feature]
is enabled for your Neo4j Aura project.

We also need to have the `graphdatascience` Python library installed,
version `1.15` or later.
| 35 | + |
| 36 | +[source, python, role=no-test] |
| 37 | +---- |
| 38 | +%pip install "graphdatascience>=1.15" |
| 39 | +---- |
| 40 | + |
| 41 | +== Aura API credentials |
| 42 | + |
| 43 | +A GDS Session is managed via the Aura API. In order to use the Aura API, |
| 44 | +we need to have |
| 45 | +https://neo4j.com/docs/aura/platform/api/authentication/#_creating_credentials[Aura |
| 46 | +API credentials]. |
| 47 | + |
| 48 | +Using these credentials, we can create our `GdsSessions` object, which |
| 49 | +is the main entry point for managing GDS Sessions. |
| 50 | + |
[source, python, role=no-test]
----
import os

from graphdatascience.session import AuraAPICredentials, GdsSessions

client_id = os.environ["AURA_API_CLIENT_ID"]
client_secret = os.environ["AURA_API_CLIENT_SECRET"]

# If your account is a member of several projects, you must also specify the project ID to use
project_id = os.environ.get("AURA_API_PROJECT_ID", None)

sessions = GdsSessions(api_credentials=AuraAPICredentials(client_id, client_secret, project_id=project_id))
----

== Creating a new session

A new session is created by calling `sessions.get++_++or++_++create()`.
Since this is a standalone session, it is not attached to a Neo4j
database; we will load the data directly from pandas DataFrames.

We also need to specify the session size. Please refer to the API
reference documentation or the manual for a full list of supported
sizes.

Finally, we need to give our session a name. We will call ours
`people-and-fruits-standalone`. It is possible to reconnect to an
existing session by calling `get++_++or++_++create()` with the same
session name and configuration.

We can also set a time-to-live (TTL) for the session. This ensures that
the session is automatically deleted after being unused for a given
period, for example 30 minutes. This is a good practice to avoid
incurring costs should we forget to delete the session ourselves.
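
As a sketch of how such a TTL could be specified (the `ttl` parameter of
`get++_++or++_++create()` is an assumption here; check the API reference
for your client version):

[source, python, role=no-test]
----
from datetime import timedelta

# Delete the session automatically after 30 minutes of inactivity
ttl = timedelta(minutes=30)

# Passed alongside the other arguments (hypothetical call, for illustration):
# gds = sessions.get_or_create("people-and-fruits-standalone", memory=memory,
#                              cloud_location=cloud_location, ttl=ttl)
----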

[source, python, role=no-test]
----
from graphdatascience.session import AlgorithmCategory, SessionMemory

# Explicitly define the size of the session
memory = SessionMemory.m_4GB

# Estimate the memory needed for the GDS session
memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)

print(f"Estimated memory: {memory}")

# Find out and specify where to create the GDS session
cloud_locations = sessions.available_cloud_locations()
print(f"Available locations: {cloud_locations}")
cloud_location = cloud_locations[0]
----

[source, python, role=no-test]
----
# Create a GDS session!
gds = sessions.get_or_create(
    # we give it a representative name
    session_name="people-and-fruits-standalone",
    memory=memory,
    cloud_location=cloud_location,
)
----

== Listing sessions

Now that we have created a session, let’s list all our sessions to see
what that looks like.

[source, python, role=no-test]
----
from pandas import DataFrame

gds_sessions = sessions.list()

# for better visualization
DataFrame(gds_sessions)
----
== Adding a dataset

We will define our dataset directly in Python, as pandas DataFrames:
two node DataFrames (people and fruits) and two relationship DataFrames
(`LIKES` and `KNOWS`).

In a more realistic scenario, this data would come from an existing
source, such as files or a database.
[source, python, role=no-test]
----
import pandas as pd

people_df = pd.DataFrame(
    [
        {"nodeId": 0, "name": "Dan", "age": 18, "experience": 63, "hipster": 0},
        {"nodeId": 1, "name": "Annie", "age": 12, "experience": 5, "hipster": 0},
        {"nodeId": 2, "name": "Matt", "age": 22, "experience": 42, "hipster": 0},
        {"nodeId": 3, "name": "Jeff", "age": 51, "experience": 12, "hipster": 0},
        {"nodeId": 4, "name": "Brie", "age": 31, "experience": 6, "hipster": 0},
        {"nodeId": 5, "name": "Elsa", "age": 65, "experience": 23, "hipster": 0},
        {"nodeId": 6, "name": "Bobby", "age": 38, "experience": 4, "hipster": 1},
        {"nodeId": 7, "name": "John", "age": 4, "experience": 100, "hipster": 0},
    ]
)
people_df["labels"] = "Person"

fruits_df = pd.DataFrame(
    [
        {"nodeId": 8, "name": "Apple", "tropical": 0, "sourness": 0.3, "sweetness": 0.6},
        {"nodeId": 9, "name": "Banana", "tropical": 1, "sourness": 0.1, "sweetness": 0.9},
        {"nodeId": 10, "name": "Mango", "tropical": 1, "sourness": 0.3, "sweetness": 1.0},
        {"nodeId": 11, "name": "Plum", "tropical": 0, "sourness": 0.5, "sweetness": 0.8},
    ]
)
fruits_df["labels"] = "Fruit"

like_relationships = [(0, 8), (1, 9), (2, 10), (3, 10), (4, 9), (5, 11), (7, 11)]
likes_df = pd.DataFrame([{"sourceNodeId": src, "targetNodeId": trg} for (src, trg) in like_relationships])
likes_df["relationshipType"] = "LIKES"

knows_relationships = [(0, 1), (0, 2), (1, 2), (1, 3), (1, 4), (2, 5), (7, 3)]
knows_df = pd.DataFrame([{"sourceNodeId": src, "targetNodeId": trg} for (src, trg) in knows_relationships])
knows_df["relationshipType"] = "KNOWS"
----
| 177 | + |
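Before constructing a graph from these frames, it can be worth
sanity-checking them: node IDs must be unique across all node
DataFrames, and every relationship endpoint should refer to an existing
node. A minimal sketch in plain pandas, using small stand-ins that
mirror the frames above:

[source, python, role=no-test]
----
import pandas as pd

# Stand-ins mirroring the frames above: people use IDs 0-7, fruits 8-11
people = pd.DataFrame({"nodeId": range(0, 8)})
fruits = pd.DataFrame({"nodeId": range(8, 12)})
likes = pd.DataFrame({"sourceNodeId": [0, 1, 2, 3, 4, 5, 7], "targetNodeId": [8, 9, 10, 10, 9, 11, 11]})

node_ids = pd.concat([people, fruits])["nodeId"]
assert node_ids.is_unique  # IDs must not collide across node DataFrames

endpoints = pd.concat([likes["sourceNodeId"], likes["targetNodeId"]])
assert endpoints.isin(node_ids.values).all()  # every endpoint must be a known node
----
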
== Construct Graph from DataFrames

Now that we have prepared the data, we can create a graph directly from
the pandas `DataFrame` objects. We do that by using the
`gds.graph.construct()` endpoint.
[source, python, role=no-test]
----
nodes = [people_df.drop(columns="name"), fruits_df.drop(columns="name")]  # GDS does not support string properties
relationships = [likes_df, knows_df]

G = gds.graph.construct("people-fruits", nodes, relationships)

str(G)
----

== Running Algorithms

We can now run algorithms on the constructed graph. This is done using
the standard GDS Python Client API. There are many other tutorials
covering some interesting things we can do at this step, so we will keep
it rather brief here.

We will simply run PageRank and FastRP on the graph.
[source, python, role=no-test]
----
print("Running PageRank ...")
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
print(f"Compute millis: {pr_result['computeMillis']}")
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
print(f"Centrality distribution: {pr_result['centralityDistribution']}")

print("Running FastRP ...")
frp_result = gds.fastRP.mutate(
    G,
    mutateProperty="fastRP",
    embeddingDimension=8,
    featureProperties=["pagerank"],
    propertyRatio=0.2,
    nodeSelfInfluence=0.2,
)
print(f"Compute millis: {frp_result['computeMillis']}")

# Stream back the results
result = gds.graph.nodeProperties.stream(G, ["pagerank", "fastRP"], separate_property_columns=True)

print(result)
----

To resolve the node IDs to names, we can merge the results back with the
source DataFrames.

[source, python, role=no-test]
----
names = pd.concat([people_df, fruits_df])[["nodeId", "name"]]

result.merge(names, how="left")
----
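
The behaviour of this join can be checked on stand-in data (plain
pandas, no session required): a left merge keeps every result row and
fills in the matching name by `nodeId`.

[source, python, role=no-test]
----
import pandas as pd

# Stand-ins for the streamed results and the name lookup table
result = pd.DataFrame({"nodeId": [0, 8], "pagerank": [0.35, 0.52]})
names = pd.DataFrame({"nodeId": [0, 8], "name": ["Dan", "Apple"]})

merged = result.merge(names, how="left", on="nodeId")
print(merged["name"].tolist())  # → ['Dan', 'Apple']
----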

== Deleting the session

Now that we have finished our analysis, we can delete the session. The
results that we streamed back are kept locally and will not be lost,
but anything that only exists in the session's in-memory graph will be
discarded along with it.

Deleting the session will release all resources associated with it, and
stop incurring costs.
[source, python, role=no-test]
----
# or gds.delete()
sessions.delete(session_name="people-and-fruits-standalone")
----

[source, python, role=no-test]
----
# let's also make sure the deleted session is truly gone:
sessions.list()
----

== Conclusion

And we’re done! We have created a GDS Session, constructed a graph, run
some algorithms, inspected the results, and deleted the session. This is
a simple example, but it shows the main steps of using GDS Sessions.