Description
Name and Version
$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
version: 6951 (9aa6337)
built with cc (GCC) 15.2.1 20251022 (Red Hat 15.2.1-3) for x86_64-redhat-linux
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 5 9600X, 128 GB RAM, 2 x RTX 5060 Ti 16 GB
Models
unsloth/gpt-oss-120b-GGUF:F16
Problem description & steps to reproduce
In recent versions of llama.cpp (unfortunately, I can't tell exactly which version introduced it), unsloth/gpt-oss-120b-GGUF:F16 produces output unrelated to the user's prompt: sometimes it replies to something it wasn't asked, sometimes the output is completely incoherent. It may not happen with short prompts, but with longer prompts (800+ tokens) it happens every time. Models of the Qwen3 family (including VL) process the same prompts without issues, just as gpt-oss-120b did about a month ago.
Running it like this:
./llama-server -dev CUDA0,CUDA1 -ngl 99 -ts 9,10 -c 90000 --no-webui -hf unsloth/gpt-oss-120b-GGUF:F16 --prio 3 -ot ".ffn_(up|down)_exps.=CPU" -fa on --swa-full --jinja
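For reference, the kind of request that triggers it can be sent directly to the server's OpenAI-compatible endpoint. This is only a sketch: the port is the llama-server default (8080) and the placeholder stands in for the real 800+ token prompt.

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "<long prompt, 800+ tokens>"}
        ],
        "temperature": 1.0,
        "top_p": 1.0
      }'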
Adding the parameters recommended by Unsloth.ai (--temp 1.0 --min-p 0.0 --top-p 1.0 --top-k 0.0) doesn't change anything.
First Bad Commit
n/a
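I haven't been able to bisect yet. If it helps narrow things down, this is roughly what I would run (the build flags and the "good" commit are assumptions; the good commit would be one from about a month ago when the model still worked):

git bisect start
git bisect bad HEAD
git bisect good <commit from ~1 month ago that worked>
# at each step, rebuild with CUDA and retest:
cmake -B build -DGGML_CUDA=ON && cmake --build build -j
# re-run the llama-server command above with a long prompt, then mark:
git bisect good   # or: git bisect bad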
Relevant log output
n/a