Ollama truncating input prompt context size #10500
Replies: 1 comment
Hi @joelam33-coder! I'm Dosu and I'm helping the ragflow team.

RagFlow's Max Token configuration does not reliably pass the context size to Ollama. Even if you set 8192 in RagFlow, Ollama will use its default (4096) unless the underlying model in Ollama is configured for a larger context window. This is why you see prompt truncation and hallucination issues. The workaround of creating a derived model in Ollama with the desired context size is necessary because RagFlow's settings alone do not update Ollama's context window. This gap is a known issue and may be addressed in future updates, but for now, changing RagFlow's settings is not enough without also updating the Ollama model itself. More details are discussed here and here.
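To make the gap concrete: Ollama only honors a larger context when the request itself carries a `num_ctx` option (or when the model was created with one). A minimal sketch of such a request against Ollama's `/api/chat` endpoint, with a placeholder prompt, looks like this:

```sh
# Hedged sketch: request an 8192-token context for a single chat call by
# passing num_ctx in the per-request options. If the calling application
# (RagFlow here) never sends this, Ollama falls back to the model's default,
# which is the 4096 seen in the truncation warning below.
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3:4b",
  "messages": [{ "role": "user", "content": "placeholder question" }],
  "options": { "num_ctx": 8192 }
}'
```

Whether RagFlow forwards its Max Token value as `num_ctx` is exactly the question raised in the report; the sketch only shows where that value would have to end up.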
Ollama version is 0.12.5
RagFlow version is 0.20.5
In the RagFlow Model Provider, Max Token for Ollama gemma3:4b is set to 8192.
In the Chat Setting, Max Token is set to 8192.
When I run `ollama ps`, it still shows the default context size of 4096. It seems RagFlow is not calculating the context size and passing it to Ollama before running the model.
The truncated prompt causes the model to hallucinate, because the instructions are being cut off.
My question is: does the Max Token configuration in the RagFlow Model Provider pass the context size to Ollama?
Error I received:
```
time=2025-10-11T15:34:53.835Z level=WARN source=runner.go:159 msg="truncating input prompt" limit=4096 prompt=4339 keep=4 new=4096
```
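For anyone hitting the same warning, a quick way to check which context window Ollama actually loaded the model with, using standard Ollama CLI commands (the 4096 figure is the one from the warning above):

```sh
# List running models; per the report above, this shows the default
# 4096-token context when RagFlow's Max Token setting is not applied.
ollama ps

# Inspect the model's metadata, including its architecture context length
# and any PARAMETER overrides (such as num_ctx) baked into a derived model.
ollama show gemma3:4b
```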
People with a similar problem:
ollama/ollama#10368
My temporary solution: I have to create a model derived from gemma3:4b with its context length set to 8192 in Ollama, which I would prefer not to do.
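For reference, a minimal sketch of that derived-model workaround, assuming the standard Ollama Modelfile workflow (the name `gemma3-4b-8k` is just an example):

```sh
# Write a Modelfile that pins num_ctx, so Ollama always loads this model
# with an 8192-token context regardless of what the client sends.
cat > Modelfile <<'EOF'
FROM gemma3:4b
PARAMETER num_ctx 8192
EOF

# Build the derived model under the example name.
ollama create gemma3-4b-8k -f Modelfile
```

RagFlow's Ollama model provider then has to point at `gemma3-4b-8k` instead of `gemma3:4b`, which is exactly the extra step the poster would rather avoid.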