Description
Up until recently, the Ollama Python library worked with every model I downloaded, but I now run into issues with some of the newer models.
For reference, here is how I call Ollama:

    import ollama

    # llm_engine, query, response_model (a Pydantic model), num_ctx, and
    # images are set elsewhere in my code.
    response = ollama.generate(
        model=llm_engine,
        prompt=query,
        stream=False,
        format=response_model.model_json_schema(),
        options={"temperature": 0.0, "top_k": 1, "num_ctx": num_ctx},
        images=images,
    )
(1) Minor -- qwen3:32b responds only in "thinking" instead of "response":
Instead of just using response["response"], I now need to use

    response_text = (response["response"] or response["thinking"] or "").strip()

since qwen3:32b consistently responds in response["thinking"] and leaves response["response"] blank.
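For completeness, here is that fallback wrapped in a small helper. This is just a sketch of my workaround; the think parameter the client exposes (added around 0.5.x, I believe) might also suppress the thinking output, but I have not verified that it changes this behavior, and not every model accepts it:

    import ollama

    def generate_text(model: str, prompt: str, **kwargs) -> str:
        """Call ollama.generate and return the answer, wherever the model put it."""
        response = ollama.generate(model=model, prompt=prompt, stream=False, **kwargs)
        # qwen3:32b fills "thinking" and leaves "response" empty, so fall back.
        return (response["response"] or response["thinking"] or "").strip()

    # Usage:
    # text = generate_text("qwen3:32b", "What is the capital of France?")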
(2) Major -- llama4:16x17b and gpt-oss:20b don't seem to provide any response:
Here is the response I got from invoking gpt-oss:20b:
    model='gpt-oss:20b' created_at='2025-11-03T09:07:14.833064Z' done=True done_reason='stop' total_duration=11868059417 load_duration=3789110375 prompt_eval_count=2640 prompt_eval_duration=7464908083 eval_count=7 eval_duration=371895000 response='' thinking=None context=[...]
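To try to narrow this down, I compared calls with and without the format argument. This is only a sketch: the schema below is a hypothetical stand-in for my real response_model.model_json_schema(), and the idea that format= is the culprit is just my guess:

    import ollama

    # Hypothetical minimal schema standing in for my real
    # response_model.model_json_schema().
    SCHEMA = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }

    for label, fmt in (("no format", None), ("json schema", SCHEMA)):
        response = ollama.generate(
            model="gpt-oss:20b",
            prompt="Reply with one short sentence.",  # placeholder prompt
            stream=False,
            format=fmt,
            options={"temperature": 0.0},
        )
        # Compare the fields that came back empty in my report above.
        print(label, repr(response["response"]), repr(response["thinking"]))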
Apologies in advance if I'm doing something wrong.
Many thanks
Wolfram
I am running Ollama's Python client version 0.6.0 and Ollama version 0.12.9.