Improving input token throughput?

So, I ran some benchmarks with vLLM (I modified parts of the code so it would work with vLLM on TPU), and the logs show something like this:
```
Processed prompts:  16%|▏| 1456/9263 [37:28<11:01:04,  5.08s/it, est. speed input: 86.99 toks/s, outp
Processed prompts:  64%|████████████████████████████████████████████████████████| 5971/9263 [2:16:18<1:17:49,  1.42s/it, est. speed input: 72.62 toks/s, output: 2683.22 toks/s]
```
The input throughput (tok/s) is much lower than the output throughput. Do you think this could be a bottleneck?
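One way to read these numbers, assuming vLLM's "est. speed" figures are cumulative prompt (or generated) tokens of *finished* requests divided by total elapsed wall-clock time, is to back out the average tokens per request from the second log line. The sketch below does that; `parse_elapsed`, `tokens_per_prompt`, and `PROGRESS_RE` are hypothetical helpers, not part of vLLM.

```python
import re

# Hypothetical helpers for reading a vLLM tqdm progress line.
# ASSUMPTION: both "est. speed" values are cumulative tokens of finished
# requests divided by total elapsed wall-clock time.
PROGRESS_RE = re.compile(
    r"(\d+)/(\d+) \[([\d:]+)<.*?"
    r"input: ([\d.]+) toks/s, output: ([\d.]+) toks/s"
)


def parse_elapsed(text: str) -> int:
    """Convert a tqdm elapsed field such as '37:28' or '2:16:18' to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def tokens_per_prompt(line: str) -> dict:
    """Back out average prompt/output tokens per finished request."""
    m = PROGRESS_RE.search(line)
    if m is None:
        # e.g. the 16% line, whose output rate was cut off in the log
        raise ValueError("line does not contain both input and output rates")
    done = int(m.group(1))
    elapsed = parse_elapsed(m.group(3))
    in_rate, out_rate = float(m.group(4)), float(m.group(5))
    return {
        "prompts_done": done,
        "elapsed_s": elapsed,
        "avg_input_toks": in_rate * elapsed / done,
        "avg_output_toks": out_rate * elapsed / done,
    }


line = ("| 5971/9263 [2:16:18<1:17:49,  1.42s/it, "
        "est. speed input: 72.62 toks/s, output: 2683.22 toks/s]")
stats = tokens_per_prompt(line)
print(f"{stats['avg_input_toks']:.1f} {stats['avg_output_toks']:.1f}")  # 99.5 3675.0
```

If that assumption holds, the run so far averages roughly 99 prompt tokens and 3,675 generated tokens per request, so the ~37× gap between the two rates would mostly reflect short prompts and long completions rather than, by itself, a slow prefill path.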

Issue #1063