Commit bc9a321

New course - genai-graphrag-python (#420)
* new course structure
* llm-knowledge-graph-construction updates
* in progress updates
* in progress updates
* in progress updates
* in progress updates
* in progress updats
* updates after walkthrough
* set image alts
* minor caption change
* course summary
* QA review updates
* minor update
* lesson summary
* updates post review
* update branch
* make course active
1 parent 2d7d4c5 commit bc9a321

115 files changed

+7956
-1510
lines changed

Lines changed: 53 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,55 @@
1-
= Constructing Knowledge Graphs with Neo4j GraphRAG Python
2-
:categories: llms:99
1+
= Constructing Knowledge Graphs with Neo4j GraphRAG for Python
2+
:categories: llms:10, advanced:7, processing:5, generative-ai:4
3+
:status: active
4+
:duration: 2 hours
5+
:caption: Learn how to use Python and LLMs to convert unstructured data into knowledge graphs.
6+
:usecase: blank-sandbox
7+
:key-points: Create a knowledge graph using Neo4j GraphRAG for Python, Model a knowledge graph of structured and unstructured data, Query a knowledge graph using retrievers, Customize the knowledge graph build process
8+
:repository: neo4j-graphacademy/genai-graphrag-python
9+
:banner-style: light
310

4-
In this course, you will learn how to:
11+
== Course Description
512

6-
* Use the `neo4j_graphrag` Python package to build graph retrieval agumented generation (GraphRAG) applications.
7-
* Build pipelines to construct knowledge graphs from unstructured text.
8-
* Combine semantic search and relationships to improve the quality of LLM generated responses.
13+
In this hands-on course, you will learn how to create knowledge graphs using link:https://neo4j.com/docs/neo4j-graphrag-python/current/[Neo4j GraphRAG for Python^].
14+
15+
You will:
16+
17+
* Use the `neo4j_graphrag` Python package to build knowledge graphs from unstructured data.
18+
* Add structured data to the knowledge graph to improve LLM responses.
19+
* Create retrievers to search the knowledge graph.
20+
* Learn how you can customize the build process to suit your data and use case.
21+
22+
Finally, you will use what you have learned to build a knowledge graph from your data.
23+
24+
=== Prerequisites
25+
26+
This is an advanced course and you should:
27+
28+
* Understand graph and Neo4j fundamental concepts - link:/courses/neo4j-fundamentals[Neo4j and Graph Fundamentals^].
29+
* Have an understanding of how Generative AI, LLMs, and vector indexes are related to Neo4j - link:/courses/genai-fundamentals[Neo4j & GenerativeAI Fundamentals^].
30+
* Be able to read and write simple Cypher queries - link:/courses/cypher-fundamentals[Cypher Fundamentals^].
31+
* Understand how you can use an LLM to generate a knowledge graph - link:/courses/llm-knowledge-graph-construction[https://graphacademy.neo4j.com/courses/llm-knowledge-graph-construction/^].
32+
* Have experience with programming in Python.
33+
34+
=== Duration
35+
36+
{duration}
37+
38+
=== What you will learn
39+
40+
How to:
41+
42+
* Use the Neo4j GraphRAG for Python package to create a knowledge graph from unstructured data.
43+
* Enhance a knowledge graph by adding structured data.
44+
* Create Retrievers to search a knowledge graph.
45+
* Customize the knowledge graph build process to suit your data and use case.
46+
* Model a knowledge graph of both structured and unstructured data.
47+
48+
49+
50+
[.includes]
51+
== This course includes
52+
53+
* [lessons]#16 lessons#
54+
* [challenges]#7 hands-on challenges#
55+
* [quizes]#8 simple quizzes to support your learning#

asciidoc/courses/genai-graphrag-python/illustration.svg

Lines changed: 159 additions & 0 deletions
Lines changed: 169 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,169 @@
1+
= Constructing Knowledge Graphs
2+
:type: lesson
3+
:order: 1
4+
5+
In this lesson you will review the process of constructing knowledge graphs from unstructured text using an LLM.
6+
7+
== The construction process
8+
9+
Typically, you would follow these steps:
10+
11+
. Gather the data
12+
. Chunk the data
13+
. _Vectorize_ the data
14+
. Pass the data to an LLM to extract nodes and relationships
15+
. Use the output to generate the graph
16+
17+
=== Gather your data sources
18+
19+
The first step is to gather your unstructured data.
20+
The data can be in the form of text documents, PDFs, publicly available data, or any other source of information.
21+
22+
Depending on the source, you may need to convert the data into a format (typically plain text) that the LLM can process.
23+
24+
The data sources should contain the information you want to include in your knowledge graph.
25+
26+
=== Chunk the data
27+
28+
The next step is to break down the data into _right-sized_ parts.
29+
This process is known as _chunking_.
30+
31+
The size of the chunks depends on the LLM you are using, the complexity of the data, and what you want to extract from the data.
32+
33+
You may not need to chunk the data if the LLM can process the entire document at once and it fits your requirements.
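As a rough illustration, fixed-size chunking with a small overlap could be sketched as follows. This is a minimal sketch that assumes chunk size is measured in characters; the function name `chunk_text` and its parameters are hypothetical and are not part of the `neo4j_graphrag` package.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text cut at one
    chunk boundary is repeated at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Sample text invented for illustration.
sample = "Neo4j is a graph database management system developed by Neo4j Inc. " * 5
chunks = chunk_text(sample, chunk_size=100, overlap=10)
print(len(chunks), [len(c) for c in chunks])
```

Splitting on characters is the simplest option. Splitting on sentences, paragraphs, or tokens may suit your data better, which is why the right chunk size depends on the LLM and on what you want to extract.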
34+
35+
=== Vectorize the data
36+
37+
Depending on your requirements for querying and searching the data, you may need to create *vector embeddings*.
38+
You can use any embedding model to create embeddings for each data chunk, but the same model must be used for all embeddings.
39+
40+
Placing these vectors into a link:https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/[Vector index^] allows you to perform semantic searches, similarity searches, and clustering on the data.
41+
42+
[TIP]
43+
.Chunking, Vectors, and Similarity Search
44+
You can learn more about how to chunk documents, vectors, similarity search, and embeddings in the GraphAcademy course link:https://graphacademy.neo4j.com/courses/llm-vectors-unstructured/1-introduction/2-semantic-search/[Introduction to Vector Indexes and Unstructured Data^].
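To make similarity search more concrete, the sketch below compares a query vector with two chunk vectors using cosine similarity. The three-dimensional vectors are invented for illustration; a real embedding model returns far more dimensions, and a vector index performs this kind of comparison for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for illustration only.
chunk_vectors = {
    "Neo4j is a graph database": [0.9, 0.1, 0.0],
    "Java is a programming language": [0.1, 0.9, 0.1],
}
query_vector = [0.8, 0.2, 0.0]

# Pick the chunk whose vector points in the direction closest to the query.
best_match = max(
    chunk_vectors,
    key=lambda text: cosine_similarity(chunk_vectors[text], query_vector),
)
print(best_match)
```

The dimension check hints at why one model must be used throughout: vectors from different models may differ in length and, even when they do not, are not comparable with each other.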
45+
46+
=== Extract nodes and relationships
47+
48+
The next step is to pass the unstructured text data to the LLM to extract the nodes and relationships.
49+
50+
You should provide a suitable prompt that will instruct the LLM to:
51+
52+
- Identify the entities in the text.
53+
- Extract the relationships between the entities.
54+
- Format the output so you can use it to generate the graph, for example, as JSON or another structured format.
55+
56+
Optionally, you may also provide additional context or constraints for the extraction, such as the type of entities or relationships you are interested in extracting.
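The sketch below shows one way the prompt and the parsing step might fit together. It makes several assumptions: the prompt wording, the `entities` and `relationships` output keys, and the function names are all hypothetical, and a hard-coded string stands in for the LLM response so that the example runs without an API key.

```python
import json
from typing import Sequence

# Hypothetical prompt; the wording and output keys are assumptions.
PROMPT_TEMPLATE = """Your task is to identify the entities and relationships in the text below.
Return only a JSON object with two keys:
"entities": a list of objects with "name" and "type"
"relationships": a list of [source, type, target] triples
{constraints}
Text:
{text}"""


def build_prompt(text: str, allowed_types: Sequence[str] = ()) -> str:
    """Fill the template, optionally constraining the entity types."""
    constraints = ""
    if allowed_types:
        constraints = "Only extract entities of these types: " + ", ".join(allowed_types) + "\n"
    return PROMPT_TEMPLATE.format(constraints=constraints, text=text)


def parse_extraction(raw: str):
    """Parse the JSON response and drop relationships with unknown endpoints."""
    data = json.loads(raw)
    entities = data.get("entities", [])
    names = {entity["name"] for entity in entities}
    relationships = [
        tuple(triple)
        for triple in data.get("relationships", [])
        if len(triple) == 3 and triple[0] in names and triple[2] in names
    ]
    return entities, relationships


prompt = build_prompt("Neo4j is developed by Neo4j Inc.", allowed_types=["GraphDatabase", "Company"])

# Stand-in for an LLM response, written by hand for this example.
raw_response = (
    '{"entities": [{"name": "Neo4j", "type": "GraphDatabase"},'
    ' {"name": "Neo4j Inc", "type": "Company"}],'
    ' "relationships": [["Neo4j", "DEVELOPED_BY", "Neo4j Inc"],'
    ' ["Neo4j", "IMPLEMENTED_IN", "Java"]]}'
)
entities, relationships = parse_extraction(raw_response)
print(relationships)
```

Validating the response before using it is a design choice worth keeping: an LLM may return relationships that refer to entities it never listed, as the `Java` triple does here.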
57+
58+
59+
=== Generate the graph
60+
61+
Finally, you can use the output from the LLM to generate the graph by creating the nodes and relationships within Neo4j.
62+
63+
The entity and relationship types would become labels and relationship types in the graph.
64+
The _names_ would become the node identifiers, stored, for example, in an `id` property.
65+
66+
== Example
67+
68+
If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would:
69+
70+
. **Gather** the text from the page. +
71+
+
72+
image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page"]
73+
. Split the text into **chunks**.
74+
+
75+
Neo4j is a graph database management system (GDBMS) developed
76+
by Neo4j Inc.
77+
+
78+
{sp}
79+
+
80+
The data elements Neo4j stores are nodes, edges connecting them,
81+
and attributes of nodes and edges...
82+
83+
. Generate vector **embeddings** for each chunk.
84+
+
85+
[0.21972137987, 0.12345678901, 0.98765432109, ...]
86+
87+
. **Extract** the entities and relationships using an **LLM**.
88+
+
89+
Send the text to the LLM with an appropriate prompt, for example:
90+
+
91+
Your task is to identify the entities and relations requested
92+
with the user prompt from a given text. You must generate the
93+
output in a JSON format containing a list with JSON objects.
94+
95+
Text:
96+
{text}
97+
+
98+
Parse the entities and relationships output by the LLM.
99+
+
100+
[source, json]
----
{
  "node_types": [
    {
      "label": "GraphDatabase",
      "properties": [
        { "name": "Neo4j", "type": "STRING" }
      ]
    },
    {
      "label": "Company",
      "properties": [
        { "name": "Neo4j Inc", "type": "STRING" }
      ]
    },
    {
      "label": "ProgrammingLanguage",
      "properties": [
        { "name": "Java", "type": "STRING" }
      ]
    }
  ],
  "relationship_types": [
    { "label": "DEVELOPED_BY" },
    { "label": "IMPLEMENTED_IN" }
  ],
  "patterns": [
    ["Neo4j", "DEVELOPED_BY", "Neo4j Inc"],
    ["Neo4j", "IMPLEMENTED_IN", "Java"]
  ]
}
----
143+
. **Generate** the graph.
144+
+
145+
Use the data to construct the graph in Neo4j by creating nodes and relationships based on the entities and relationships extracted by the LLM.
146+
+
147+
[source, cypher, role=noplay nocopy]
148+
.Generate the graph
149+
----
150+
MERGE (neo4jInc:Company {id: 'Neo4j Inc'})
151+
MERGE (neo4j:GraphDatabase {id: 'Neo4j'})
152+
MERGE (java:ProgrammingLanguage {id: 'Java'})
153+
MERGE (neo4j)-[:DEVELOPED_BY]->(neo4jInc)
154+
MERGE (neo4j)-[:IMPLEMENTED_IN]->(java)
155+
----
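The last step can also be scripted. The sketch below turns parsed LLM output, shaped like the JSON example above, into Cypher statements with parameters. It is only an illustration: the helper names are hypothetical, and it assumes each entry in `properties` carries the entity name, as in the example. Labels and relationship types are validated because this sketch interpolates them into the query text rather than passing them as parameters.

```python
import re

# Parsed LLM output, shaped like the JSON example in this lesson.
extraction = {
    "node_types": [
        {"label": "GraphDatabase", "properties": [{"name": "Neo4j", "type": "STRING"}]},
        {"label": "Company", "properties": [{"name": "Neo4j Inc", "type": "STRING"}]},
        {"label": "ProgrammingLanguage", "properties": [{"name": "Java", "type": "STRING"}]},
    ],
    "patterns": [
        ["Neo4j", "DEVELOPED_BY", "Neo4j Inc"],
        ["Neo4j", "IMPLEMENTED_IN", "Java"],
    ],
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_identifier(value: str) -> str:
    """Reject labels or relationship types that are not plain identifiers."""
    if not IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def build_statements(data: dict) -> list[tuple[str, dict]]:
    """Return one (query, parameters) pair per extracted pattern."""
    label_by_name = {
        prop["name"]: safe_identifier(node["label"])
        for node in data["node_types"]
        for prop in node["properties"]
    }
    statements = []
    for source, rel_type, target in data["patterns"]:
        query = (
            f"MERGE (a:{label_by_name[source]} {{id: $source}}) "
            f"MERGE (b:{label_by_name[target]} {{id: $target}}) "
            f"MERGE (a)-[:{safe_identifier(rel_type)}]->(b)"
        )
        statements.append((query, {"source": source, "target": target}))
    return statements


for query, parameters in build_statements(extraction):
    print(query, parameters)
```

Each pair could then be executed against the database, for example with `session.run(query, parameters)` from the Neo4j Python driver. Using `MERGE` means that re-running the statements does not create duplicate nodes or relationships.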
156+
157+
158+
159+
[.quiz]
160+
== Check your understanding
161+
162+
include::questions/1-steps.adoc[leveloffset=+1]
163+
164+
[.summary]
165+
== Lesson Summary
166+
167+
In this lesson, you learned how to construct a knowledge graph from unstructured text using an LLM.
168+
169+
In the next lesson, you will set up your development environment to build knowledge graphs using Python and Neo4j.
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
[.question]
2+
= 1. Knowledge graph construction steps
3+
4+
Which of the following steps could be considered **optional**?
5+
6+
* [ ] Gather your data sources
7+
* [x] Chunk the data
8+
* [x] _Vectorize_ the data
9+
* [ ] Pass the data to an LLM to extract nodes and relationships
10+
* [ ] Use the output to generate the graph
11+
12+
[TIP,role=hint]
13+
.Hint
14+
====
15+
The essential parts of the process are obtaining the data to pass to the LLM and using the output to generate the graph.
16+
====
17+
18+
[TIP,role=solution]
19+
.Solution
20+
====
21+
The optional steps are:
22+
23+
* Chunk the data
24+
* _Vectorize_ the data
25+
26+
It may not be necessary to chunk the data or vectorize it depending on the LLM you are using, the complexity of the data, and your requirements.
27+
====
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,14 @@
11
= Set up your development environment
2-
:order: 0
32
:type: lesson
4-
:lab: {repository-link}
5-
:disable-cache: true
3+
:order: 2
64
:branch: main
75

8-
In this module, you will use Python, LangChain, and OpenAI to create a knowledge graph from unstructured data.
6+
7+
During this course, you will:
8+
9+
* Use the Neo4j link:https://neo4j.com/docs/neo4j-graphrag-python/current/[GraphRAG for Python^] (`neo4j_graphrag`) package to create a knowledge graph from unstructured and structured data.
10+
* Create vector and text-to-Cypher retrievers that use the knowledge graph to provide context to an LLM.
11+
912
You must set up a development environment to run the code examples and exercises.
1013

1114
include::../../../../../../shared/courses/codespace/get-started.adoc[]
@@ -17,18 +20,18 @@ You will need link:https://python.org[Python] installed and the ability to insta
1720

1821
You may want to set up a virtual environment using link:https://docs.python.org/3/library/venv.html[`venv`^] or link:https://virtualenv.pypa.io/en/latest/[`virtualenv`^] to keep your dependencies separate from other projects.
1922

20-
Clone the link:{repository-link}[github.com/neo4j-graphacademy/llm-knowledge-graph-construction] repository:
23+
Clone the link:{repository-link}[github.com/neo4j-graphacademy/genai-graphrag-python] repository:
2124

2225
[source,bash]
2326
----
24-
git clone https://github.com/neo4j-graphacademy/llm-knowledge-graph-construction
27+
git clone https://github.com/neo4j-graphacademy/genai-graphrag-python
2528
----
2629

27-
Install the required packages using `pip` and download the required data:
30+
Install the required packages using `pip`:
2831

2932
[source,bash]
3033
----
31-
cd llm-knowledge-graph
34+
cd genai-graphrag-python
3235
pip install -r requirements.txt
3336
----
3437

@@ -45,20 +48,34 @@ Fill in the required values.
4548
[source]
4649
.Create a .env file
4750
----
48-
include::{repository-raw}/{branch}/.env.example[]
51+
# Create a copy of this file and name it .env
52+
OPENAI_API_KEY="sk-..."
53+
NEO4J_URI="{instance-scheme}://{instance-ip}:{instance-boltPort}"
54+
NEO4J_USERNAME="{instance-username}"
55+
NEO4J_PASSWORD="{instance-password}"
56+
NEO4J_DATABASE="{instance-database}"
4957
----
58+
// include::{repository-raw}/{branch}/.env.example[]
5059

5160
Add your OpenAI API key (`OPENAI_API_KEY`), which you can get from link:https://platform.openai.com[platform.openai.com^].
5261

53-
Update the Neo4j sandbox connection details:
62+
ifeval::[{course-completed}==true]
63+
64+
.Course completed
65+
[IMPORTANT]
66+
====
67+
You have completed this course.
68+
69+
The Neo4j sandbox instance is no longer available. You can create a Neo4j cloud instance using link:https://console.neo4j.io[Neo4j AuraDB^].
70+
====
71+
72+
endif::[]
73+
5474

55-
NEO4J_URI:: [copy]#bolt://{instance-ip}:{instance-boltPort}#
56-
NEO4J_USERNAME:: [copy]#{instance-username}#
57-
NEO4J_PASSWORD:: [copy]#{instance-password}#
5875

5976
== Test your setup
6077

61-
You can test your setup by running `llm-knowledge_graph/test_environment.py` - this will attempt to connect to the Neo4j sandbox and the OpenAI API.
78+
You can test your setup by running `genai-graphrag-python/test_environment.py` - this will attempt to connect to the Neo4j sandbox and the OpenAI API.
6279

6380
You will see an `OK` message if you have set up your environment correctly. If any tests fail, check the contents of the `.env` file.
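If a test fails, a quick first check is whether every setting from the `.env` file is actually visible to Python. The sketch below is not the contents of `test_environment.py`; it is a hypothetical helper that assumes the values have already been loaded into the process environment (for example by exporting them in your shell or by using a package such as `python-dotenv`).

```python
import os

# Setting names taken from the .env example in this lesson.
REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
)


def missing_settings(env) -> list[str]:
    """Return the names of required settings that are absent or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("OK")
```

This only confirms that the values exist. It does not verify that the credentials are valid or that the Neo4j instance and the OpenAI API are reachable.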
6481

@@ -68,9 +85,13 @@ When you are ready, you can move on to the next task.
6885

6986
read::Success - let's get started![]
7087

88+
89+
90+
read::Continue[]
91+
7192
[.summary]
72-
== Summary
93+
== Lesson Summary
7394

74-
You have setup your environment and are ready to start this module.
95+
In this lesson, you set up your development environment to build a knowledge graph.
7596

76-
In the next lesson, you will explore a strategy for storing unstructured data in a graph.
97+
In the next module, you will create a knowledge graph from unstructured and structured data using an LLM.
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
= Introduction
2+
:order: 1
3+
4+
Welcome to Constructing Knowledge Graphs with Neo4j GraphRAG for Python.
5+
6+
== Module Overview
7+
8+
In this module, you will:
9+
10+
* Review the process of creating knowledge graphs from unstructured text.
11+
* Set up a development environment to build your own knowledge graph.
12+
13+
If you are ready, let's get going!
14+
15+
link:./1-knowledge-graph-construction/[Ready? Let's go →, role=btn]
