Issue description
Qwen3-Embedding-8B-Q4_K_M.gguf (downloaded from Hugging Face) works with `gpu: false` (CPU) but crashes when run on Vulkan.
Expected Behavior
`await embedContext.getEmbeddingFor(...)` returns successfully, or throws an error that can be caught and understood.
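For illustration, this is roughly the behavior I'd expect (hypothetical handling code, not part of the repro below; `longText` stands for the 2000+ character input):

```javascript
// Hypothetical expectation: a failure should surface as a catchable JS error
// rather than a native crash that kills the process.
try {
    const embedding = await embedContext.getEmbeddingFor(longText);
    console.log("vector length:", embedding.vector.length);
} catch (err) {
    console.error("embedding failed:", err); // currently never reached; the process dies instead
}
```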
Actual Behavior
Logs:
```
Embedding model loaded, priming with large data.
[node-llama-cpp] state_write_data: writing state
[node-llama-cpp] state_write_data: - writing model info
[node-llama-cpp] state_write_data: - writing output ids
[node-llama-cpp] state_write_data: - writing logits
[node-llama-cpp] state_write_data: - writing embeddings
[node-llama-cpp] state_write_data: - writing memory module
[node-llama-cpp] init: embeddings required but some input tokens were not marked as outputs -> overriding
[node-llama-cpp] output_reserve: reallocating output buffer from size 0.59 MiB to 152.11 MiB
[node-llama-cpp] init: embeddings required but some input tokens were not marked as outputs -> overriding
[node-llama-cpp] init: embeddings required but some input tokens were not marked as outputs -> overriding
D:/a/node-llama-cpp/node-llama-cpp/llama/llama.cpp/src/llama-context.cpp:622: fatal error
```
The process then exits with code -1073740791 (0xC0000409).
I get the impression that the `init: embeddings required but some input tokens were not marked as outputs -> overriding` warnings can be safely ignored; I ignore the same warning with the chat version of the Qwen model and it hasn't caused issues there (and despite my best efforts I haven't found a way to get rid of it).
Steps to reproduce
I've omitted the text I was using to test; I just copied text from my internal wiki, and text copied from any public wiki should work as long as it's between 2000 and 3000 characters. Running the embedder on a sample of ~100 tokens works, so I suspect the crash is related to the output buffer reallocation.
I did manage to get a similar setup working reliably on the CPU (but very slowly) on a forked thread in my development program, but I haven't been able to replicate that setup in my sample test. The only differences I can think of are that `{gpu: false}` was passed to `getLlama`, the context size was 8192, and the batch size wasn't set.
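A minimal sketch of that working CPU-only variant, assuming the same model file (the forked-thread plumbing from my development program is omitted and the input text is a placeholder):

```javascript
import { getLlama } from "node-llama-cpp";

(async function main() {
    // CPU only: gpu disabled, contextSize 8192, batchSize left at its default
    const llama = await getLlama({gpu: false});
    const model = await llama.loadModel({
        modelPath: "../Models/Qwen3-Embedding-8B-Q4_K_M.gguf",
        useMlock: true,
        useMmap: true
    });
    const context = await model.createEmbeddingContext({contextSize: 8192});

    const embedding = await context.getEmbeddingFor(`
A really long text of 2000+ characters.
    `.trim());
    console.log("Embedding vector length:", embedding.vector.length);
})();
```

The Vulkan repro below (`vulcan.js`) is the same idea with the default GPU selection, `contextSize: 4096`, and `batchSize: 256`.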
vulcan.js

```javascript
import { getLlama, Llama, LlamaLogLevel } from "node-llama-cpp";

(async function main() {
    let llamacpp = await getLlama();
    llamacpp.logLevel = LlamaLogLevel.debug;

    let embedModel = await llamacpp.loadModel({
        modelPath: "../Models/Qwen3-Embedding-8B-Q4_K_M.gguf",
        useMlock: true,
        useMmap: true,
        gpuLayers: "auto",
        metadataOverrides: {
            general: {}
        }
    });

    let embedContext = await embedModel.createEmbeddingContext({contextSize: 4096, batchSize: 256 /*, threads: 4*/});

    console.log("\nEmbedding model loaded, priming with large data.");
    await embedContext.getEmbeddingFor(`
A really long text of 2000+ characters.
    `.trim()); // pre-reserve big outputs/KV
    console.log("Embedding model primed.");
})();
```

package.json

```json
{
    "type": "module",
    "scripts": {
        "start": "node vulcan.js",
        "test": "echo \"Error: no test specified\" && exit 1",
        "build": "tsc --build",
        "rebuild": "tsc --build --force"
    },
    "dependencies": {
        "@types/node": "^18.0.0",
        "node-llama-cpp": "^3.14.2",
        "typescript": "^5.4.5"
    }
}
```

tsconfig

```json
{
    "compileOnSave": true,
    "compilerOptions": {
        "module": "ESNext",                        /* Specify what module code is generated. */
        "target": "ES2020",                        /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */
        "moduleResolution": "node",
        "types": ["node"],
        "sourceMap": true,
        "esModuleInterop": true,                   /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
        "forceConsistentCasingInFileNames": true,  /* Ensure that casing is correct in imports. */
        "strict": true,                            /* Enable all strict type-checking options. */
        "skipLibCheck": true,                      /* Skip type checking all .d.ts files. */
        "useDefineForClassFields": false,
        "experimentalDecorators": true,
        "emitDecoratorMetadata": false
    }
}
```
My Environment
| Dependency | Version |
|---|---|
| Operating System | Windows 11 Pro (10.0.22621) |
| CPU | Intel i7-11700 |
| Node.js version | 20.11.1 |
| Typescript version | 5.4.5 |
| node-llama-cpp version | 3.14.2 |
`npx --yes node-llama-cpp inspect gpu` output:
```
OS: Windows 10.0.22621 (x64)
Node: 20.11.1 (x64)
node-llama-cpp: 3.14.2
Prebuilt binaries: b6845
Vulkan: available
Vulkan device: AMD Radeon RX 6800 XT
Vulkan used VRAM: 4.88% (786.42MB/15.73GB)
Vulkan free VRAM: 95.11% (14.97GB/15.73GB)
CPU model: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
Math cores: 0
Used RAM: 70.61% (22.5GB/31.86GB)
Free RAM: 29.38% (9.36GB/31.86GB)
Used swap: 51.28% (33.78GB/65.86GB)
Max swap size: 65.86GB
mmap: supported
```
Additional Context
A slightly different setup results in the program crashing without throwing the error.
Relevant Features Used
- Metal support
- CUDA support
- Vulkan support
- Grammar
- Function calling
Are you willing to resolve this issue by submitting a Pull Request?
No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.