HARD -- Efficient way to use enterprise dataset without uploading all files? #10388

LiorZadah · 2025-09-30T08:31:37Z

LiorZadah
Sep 30, 2025

Hello,

I’m working with RAGFlow on an Azure VM and want to create a dataset for enterprise use.
I have a large local folder (~90GB) containing many PDF and DOCX files.

My questions are:

Do I need to upload all files into the system in order to use them as a dataset, or is there a way for RAGFlow to reference a folder/directory directly?

Is there a method to “see” or register existing files in the dataset without uploading them again (for example, by mounting a folder, indexing, or using external storage like S3)?

What is the recommended workflow for handling very large datasets (tens of GBs, thousands of files) in RAGFlow for analytics and Q&A use cases?

I’m mainly looking for an efficient approach that avoids manual upload of thousands of files one by one.

Thanks in advance!

LiorZadah · 2025-10-08T15:20:31Z

LiorZadah
Oct 8, 2025
Author

Anyone? :)

0 replies

yingfeng · 2025-10-09T09:19:58Z

yingfeng
Oct 9, 2025
Maintainer

Hi,
You have to upload the data from Azure to RAGFlow. And currently you can do that through API.
From 0.22 which is going to be launched in this Nov, we will provide some data sources and you can ingest data by just click several buttons. And more data sources could be easily added.

2 replies

jesuslop Oct 21, 2025

I would want to automate the uploads (personal use scenario), could I run rclone bisync directly against the MinIO storage and then call the PARSE api method only for uploaded files? (that is, do the upload API method just upload, or does uploading with the API have state side-effects changing the internal DBs of RagFlow?)

yingfeng Oct 21, 2025
Maintainer

No, just sync data from MinIO is not enough because each data has corresponding meta information within MySQL or Meta service.
We are implementing the S3 data source right now, which will be available in weeks, and you can take a look on the implementation and adapt it to your personal data source.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

InfiniFlow

HARD -- Efficient way to use enterprise dataset without uploading all files? #10388

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

InfiniFlow

HARD -- Efficient way to use enterprise dataset without uploading all files? #10388

Uh oh!

LiorZadah Sep 30, 2025

Replies: 2 comments · 2 replies

Uh oh!

LiorZadah Oct 8, 2025 Author

Uh oh!

yingfeng Oct 9, 2025 Maintainer

Uh oh!

jesuslop Oct 21, 2025

Uh oh!

yingfeng Oct 21, 2025 Maintainer

LiorZadah
Sep 30, 2025

Replies: 2 comments 2 replies

LiorZadah
Oct 8, 2025
Author

yingfeng
Oct 9, 2025
Maintainer

yingfeng Oct 21, 2025
Maintainer