
[Question]: How to chunk a Markdown file only by a custom delimiter (e.g., <hr>) without further splitting? #10890

@VVX94

Description


Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

My goal

I have some Markdown (.md) files that I have pre-processed. Each entry/record in these files is explicitly separated by `\n\n<hr>\n\n`.

My goal is to have RAGFlow treat each section separated by `<hr>` as a single, complete chunk, regardless of how long that section is.

Example data structure in my_file.md:

```markdown
# Entry 1 Title
This is the content for entry 1.
It might be very long, over 1000 characters.
## SubTitle
...
...

<hr>

# Entry 2 Title
This is the content for entry 2.
It might be short.

<hr>

# Entry 3 Title
This is another long entry.
...
```

My desired chunks would then be: Chunk 1 (all content for Entry 1), Chunk 2 (all content for Entry 2), and so on.
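In other words, the behaviour I am after is equivalent to a plain split on the delimiter. A minimal Python sketch, for illustration only (this is not RAGFlow code):

```python
# Illustration only: the chunking behaviour I expect, expressed as a plain
# Python split on the custom delimiter. This is not RAGFlow code.
from pathlib import Path

text = Path("my_file.md").read_text(encoding="utf-8")

# One chunk per entry, however long; empty pieces (e.g. trailing whitespace) are dropped.
chunks = [piece.strip() for piece in text.split("\n\n<hr>\n\n") if piece.strip()]

for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}: {len(chunk)} characters")
```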

What I have tried

Using the "General" Knowledge Base Import

When I upload the file directly to a Knowledge Base using the "General" chunking method, RAGFlow still splits my entries. For example, even though I set the "Text segmentation identifier" to `\n\n<hr>\n\n`, an entry like "Entry 1" that is 1500 characters long gets split into 3–4 smaller chunks when chunk_size is 512. This breaks the context of my data. (In this case I have 1000 structured entries, and with chunk_size = 512 the file ends up split into 2127 chunks.)
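To make the effect concrete, what I observe behaves roughly like the following sketch (this is only my reading of the behaviour, not RAGFlow's actual code):

```python
# Rough illustration of the behaviour I observe (not RAGFlow's actual code):
# an entry longer than chunk_size seems to be re-split into chunk_size pieces,
# even though the custom segmentation identifier is set.
chunk_size = 512
entry = "x" * 1500          # stands in for "Entry 1" (1500 characters)
pieces = [entry[i:i + chunk_size] for i in range(0, len(entry), chunk_size)]
print(len(pieces))          # 3 -- which is how 1000 entries end up as 2127 chunks
```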


Using an Ingestion Pipeline (Agent):

I tried to build a custom processing pipeline to control this. My pipeline (based on the template) looks like this: File -> Parser -> Chunker -> Tokenizer (Indexer)


I hoped the Chunker could split the Markdown file by H1 headings, but I ran into an error: `[ERROR] Input error: ... Input should be 'json' or 'chunks' [type=literal_error, input_value='text']`
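If pre-splitting outside RAGFlow is an acceptable workaround, I could write the entries (split as in the earlier sketch) to a JSON file before upload. Note that the output layout and the "text" field name below are my assumptions, not a documented RAGFlow format:

```python
# Hypothetical workaround: split the Markdown on the custom delimiter and write
# one JSON object per entry. The "text" field name is an assumption on my part,
# not a documented RAGFlow schema.
import json
from pathlib import Path

entries = [
    piece.strip()
    for piece in Path("my_file.md").read_text(encoding="utf-8").split("\n\n<hr>\n\n")
    if piece.strip()
]

Path("my_file_chunks.json").write_text(
    json.dumps([{"text": entry} for entry in entries], ensure_ascii=False, indent=2),
    encoding="utf-8",
)
```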

My Question

How can I correctly configure RAGFlow to chunk only on my custom `<hr>` separator and prevent any further splitting (like by chunk_size or paragraphs) within those sections?
A) Is there a setting in the standard Knowledge Base import that I missed (e.g., "split by custom regex" or "disable size limit")?

B) If this must be done with an Ingestion Pipeline, what is the correct node setup and configuration to achieve this? Is there anything I missed?

Thanks for your time : )
