@azurewtl the current tree index has not been touched/maintained in quite some time 😓 It is in need of quite a lot of refactoring tbh. If you are ambitious enough to tackle it, it would be appreciated 🙏🏻 tbh I haven't even gone through all the code in there lol
Generalized from the original topic

The goal here is to build a hierarchical knowledge-base structure that helps the retriever find the MOST relevant chunks, given a set of documents in folders whose folder structure itself carries a lot of useful hierarchical metadata. My current approach would be:

Existing approaches I have researched

While evaluating this idea (and being surprised by how comprehensive the existing features already are), I found 4 existing modules that may construct such a knowledge-base hierarchy.
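To make the goal concrete, here is a minimal, framework-independent sketch of turning document file paths into a nested folder hierarchy, so that each folder level can later act as a parent in the knowledge-base tree. The function name build_folder_tree, the "__files__" sentinel key and the sample paths are all hypothetical; this is not a LlamaIndex API.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List

FILES_KEY = "__files__"  # assumption: no real folder uses this name


def build_folder_tree(paths: List[str]) -> Dict[str, Any]:
    """Nest file paths into dicts, one level per folder; file names go under FILES_KEY."""
    tree: Dict[str, Any] = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        node = tree
        # Walk (creating as needed) one dict level per folder in the path.
        for folder in parts[:-1]:
            node = node.setdefault(folder, {})
        # The last path component is the file itself.
        node.setdefault(FILES_KEY, []).append(parts[-1])
    return tree


# Hypothetical sample paths.
paths = [
    "handbook/hr/leave.md",
    "handbook/hr/benefits.md",
    "handbook/it/vpn.md",
    "faq.md",
]
tree = build_folder_tree(paths)
print(tree)
# {'handbook': {'hr': {'__files__': ['leave.md', 'benefits.md']}, 'it': {'__files__': ['vpn.md']}}, '__files__': ['faq.md']}
```

Files at the top level (like faq.md) land directly under the root, so every file ends up attached to exactly one folder node.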
From skimming through the TreeIndex code, I think the idea is brilliant. However, I am a bit confused by the current implementation.

To keep track of the tree structure, TreeIndex uses a dict, which seems to serve the same purpose as node relationships; in my opinion, the latter is more intuitive.

Additionally, the current implementation of GPTTreeIndexBuilder simply merges the input nodes/documents until their count reaches the num_children parameter, regardless of their metadata. I think nodes should be merged primarily based on the raw document's file path, so that nodes that actually come from the same file sit under the same parent.

I am thinking about reimplementing the tree index behavior using raw_documents. The splitter should take each document as a whole and split it based on its path hierarchy and paragraphs. The tree structure should be generated while the documents are being split, to preserve the natural knowledge structure of the folders.
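A minimal sketch of the proposed idea, assuming paragraphs are separated by blank lines and each node stores its source path under a "file_path" metadata key. Node, split_document, group_by_count and group_by_file_path are hypothetical names, not LlamaIndex classes; group_by_count only approximates the fixed-size grouping described above and is not GPTTreeIndexBuilder's actual code. Parent/child links are stored on the nodes themselves (parent_id/child_ids), in the spirit of node relationships rather than a separate dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node; parent_id/child_ids play the role of node relationships."""
    node_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def split_document(doc_id: str, file_path: str, text: str) -> Tuple[Node, List[Node]]:
    """Split one document into paragraph leaves under a single file-level parent."""
    parent = Node(node_id=doc_id, text="", metadata={"file_path": file_path})
    leaves: List[Node] = []
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        leaf = Node(
            node_id=f"{doc_id}-p{i}",
            text=para,
            metadata={"file_path": file_path},
            parent_id=parent.node_id,  # the tree is recorded on the node itself
        )
        parent.child_ids.append(leaf.node_id)
        leaves.append(leaf)
    return parent, leaves


def group_by_count(nodes: List[Node], num_children: int) -> List[List[Node]]:
    """Fixed-size groups that ignore metadata (an approximation, not the real builder)."""
    return [nodes[i:i + num_children] for i in range(0, len(nodes), num_children)]


def group_by_file_path(nodes: List[Node]) -> Dict[str, List[Node]]:
    """One group per source file, so siblings always come from the same file."""
    groups: Dict[str, List[Node]] = {}
    for node in nodes:
        groups.setdefault(node.metadata["file_path"], []).append(node)
    return groups


# Usage: two small documents with three and two paragraphs.
parent_a, leaves_a = split_document("a", "docs/a.md", "A1\n\nA2\n\nA3")
parent_b, leaves_b = split_document("b", "docs/b.md", "B1\n\nB2")
leaves = leaves_a + leaves_b

by_count = [[n.node_id for n in g] for g in group_by_count(leaves, 2)]
by_path = {path: [n.node_id for n in g] for path, g in group_by_file_path(leaves).items()}
print(by_count)  # [['a-p0', 'a-p1'], ['a-p2', 'b-p0'], ['b-p1']]
print(by_path)   # {'docs/a.md': ['a-p0', 'a-p1', 'a-p2'], 'docs/b.md': ['b-p0', 'b-p1']}
```

The second fixed-size group mixes a paragraph from docs/a.md with one from docs/b.md, which is the situation the path-based grouping avoids. A fuller version would also add the folder levels above each file and generate a summary for each parent, both of which this sketch leaves out.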