No option to set API timeout! #6445
Replies: 9 comments 2 replies
-
Would also really like to see this! Roo Code keeps timing out on my slow, local models.
-
Same here, please let us set the timeout. It's timing out on 80k context with a local model.
-
Agreed. Universal timeout control would be beautiful. Cline implemented one for Ollama, but neglected LM Studio, which is a bummer since Ollama doesn't support MLX. Throw a bone to those of us living life in the local slow lane ;)
-
Are you guys not doing this on purpose? It's been months. How hard is it to add a timeout setting to the GUI? This is fundamental for local LLM users.
-
I am indexing a very large codebase, which is taking hours. My successful Ollama log lines show request durations like 58.715907463s; I'm not sure I'm reading it right, but if that is how long each request takes to process, I'm near that 1-minute limit. The numbers range from 39s up to 1 min. Those "successful" runs sit between thousands of error lines, so I'm assuming this isn't a bug and it's just the timeout. Being able to set this higher would be a huge help.
-
Not for LM Studio or Ollama as a provider; there is no such setting.
…On 06/11/2025 13:49, Hannes Rudolph wrote:
I'm confused.. this was solved months ago. Look under the provider settings.
-
Try this to prove the issue is actually in RooCode, and not in LM Studio or MLX:
```sh
curl -X POST http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"***@***.***\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": $(jq -Rs . < long_file.txt)
    }]
  }"
```
Replace the model name and file name with your own model and a file long enough to take more than 300s to process.
It will work just fine, which means the issue is in RooCode, most likely in a library that has a hardcoded 300s timeout.
…On 06/11/2025 14:59, cybrah wrote:
Not solved for local models. I have to use a proxy server in between that sends blank deltas to keep the connection open.
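For reference, the proxy workaround cybrah describes above can be approximated with a small pass-through server. This is only a rough sketch under assumptions not stated in the thread (Node 18+, a streaming LM Studio endpoint on port 1234, and an SSE parser that ignores comment lines per the spec), and it uses SSE heartbeat comments rather than literal blank deltas:

```ts
// keepalive-proxy.ts — hypothetical sketch, not Roo Code's actual workaround.
import http from "node:http";

const UPSTREAM = { host: "127.0.0.1", port: 1234 }; // assumed LM Studio address

http.createServer((clientReq, clientRes) => {
  // Answer the client immediately with SSE headers so its headers timeout never fires.
  clientRes.writeHead(200, { "Content-Type": "text/event-stream" });

  // Heartbeat while the model is still crunching the prompt; SSE parsers ignore ": ..." lines.
  const heartbeat = setInterval(() => clientRes.write(": keep-alive\n\n"), 15_000);

  const upstreamReq = http.request(
    { ...UPSTREAM, path: clientReq.url, method: clientReq.method, headers: clientReq.headers },
    (upstreamRes) => {
      clearInterval(heartbeat);    // real tokens are flowing now
      upstreamRes.pipe(clientRes); // headers were already sent; just forward the stream
    }
  );

  clientReq.pipe(upstreamReq);
  upstreamReq.on("error", () => { clearInterval(heartbeat); clientRes.end(); });
  clientRes.on("close", () => { clearInterval(heartbeat); upstreamReq.destroy(); });
}).listen(8080, () => console.log("keep-alive proxy listening on http://localhost:8080"));
```

Pointing the client's base URL at `http://localhost:8080/v1` instead of the model server (again, an assumption about how such a proxy would be wired in) means the headers arrive immediately and the heartbeats keep the idle body timeout from tripping while the prompt is processed.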
-
I *located* the issue in the *undici* library used by the OpenAI SDK.
I *replaced* undici with *Axios* and there were *no more timeouts*. It is a working short-term solution.
But long term, someone needs to fix the undici lib, and the OpenAI SDK, RooCode, KiloCode, and everything else built on them need to become aware of the issue.
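To illustrate why swapping the transport sidesteps the problem, here is a minimal sketch (the model name, file name, and LM Studio port are placeholders, not values from this thread): Axios ships with `timeout: 0`, i.e. no client-side timeout at all, so the same long-running request that trips the 300s limit completes.

```ts
import axios from "axios";
import { readFileSync } from "node:fs";

async function main() {
  const res = await axios.post(
    "http://localhost:1234/v1/chat/completions", // assumed local LM Studio endpoint
    {
      model: "your-local-model", // hypothetical placeholder
      messages: [{ role: "user", content: readFileSync("long_file.txt", "utf8") }],
    },
    { timeout: 0 } // Axios' default, stated explicitly: never abort on the client side
  );
  console.log(res.data.choices[0].message.content);
}

main().catch(console.error);
```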
…On 06/11/2025 14:36, ***@***.*** wrote:
Not for LM Studio or Ollama as a provider, there is no such setting.
-
Ok, so here are my research results:
The OpenAI SDK uses native *fetch()*, and native fetch() uses Node.js's internal undici.
It looks like the Node developers hardcoded the *5 minute* timeout in undici on purpose (base timeout?).
There is also another *10 minute* hardcoded timeout in there (keepAlive timeout?).
I think this needs to be fixed in the OpenAI SDK and the other projects *by not using native fetch*.
In the meantime, the only way things worked for me was to replace it with Axios, but the developers should really look at this and find a solution, because even the cloud services will hit a 5-minute and/or 10-minute timeout at some point.
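For anyone who would rather keep the OpenAI SDK than rewrite calls with Axios, a minimal sketch of an alternative is below. The specifics are my assumptions, not confirmed by this thread: the npm `undici` package with its `headersTimeout`/`bodyTimeout` agent options (5 minutes by default), and openai-node v4's `fetch` and `timeout` client options (the SDK's own request timeout defaults to 10 minutes).

```ts
import OpenAI from "openai";
import { Agent, fetch as undiciFetch } from "undici";

// Relax undici's 5-minute defaults for slow local prompt processing.
const dispatcher = new Agent({
  headersTimeout: 60 * 60 * 1000, // ms to wait for response headers
  bodyTimeout: 60 * 60 * 1000,    // ms allowed between body chunks
});

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1", // assumed LM Studio endpoint
  apiKey: "not-needed-locally",
  timeout: 60 * 60 * 1000,             // the SDK's own per-request timeout (10 min by default)
  // Route the SDK's requests through undici's fetch with the relaxed dispatcher.
  fetch: (url, init) => undiciFetch(url as any, { ...init, dispatcher } as any) as any,
});

// Example call against a hypothetical local model.
const completion = await client.chat.completions.create({
  model: "your-local-model",
  messages: [{ role: "user", content: "hello" }],
});
console.log(completion.choices[0].message.content);
```

Whether Roo Code exposes something like this as a provider setting is a separate question, but it suggests the limits are tunable at the undici layer without abandoning fetch entirely.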
-
I can configure everything else about the API, but the default API timeout is just too aggressive for large, slow local models, where processing the gigantic system prompt takes longer than the default timeout. I'm trying the new Qwen Coder 480B, which actually gets fantastic tokens per second because it's MoE, but I simply can't use it because Roo doesn't have the patience to wait for it to process its system prompt. If there's already a way to do this from the GUI, please let me know.
In the meantime, how can I hack the hard-coded timeout setting? Where does it live in the codebase? Is there a guide to forking Roo and using the fork with VS Code? Thanks.