Performance issue on the embeddings step #1955

### Tested versions

pyannote.audio-4.0.1
pyannote.audio-4.0.2


### System information

Description:    Ubuntu 24.04.3 LTS
Release:        24.04

### Issue description

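I ran pyannote on a five-minute audio file. The segmentation and speaker_counting steps together take about 2 seconds, but the embeddings step takes a very long time: in the progress output below it is at 50% with 0:08:35 on the timer. My GPU is an RTX 4070.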


My versions: 
```
pyannote.audio-4.0.2
torchcodec-0.7.0
torch-2.8.0+cu128
numpy-2.3.5
```

```
segmentation     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
speaker_counting ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
embeddings       ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━  50% 0:08:35
```
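To narrow down where the time goes, a per-step timer could be passed to the pipeline in place of the progress bar. This is only a minimal sketch: `StepTimer` is a hypothetical helper, not part of pyannote.audio, and it assumes the pipeline calls its hook as `hook(step_name, step_artifact, file=None, total=None, completed=None)`.

```python
import time


class StepTimer:
    """Record wall-clock time spent in each pipeline step.

    Assumption: the pipeline invokes the hook as
    hook(step_name, step_artifact, file=None, total=None, completed=None).
    """

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        self.durations = {}  # step name -> seconds attributed to that step

    def __call__(self, step_name, step_artifact, file=None, total=None, completed=None):
        now = self._clock()
        # Attribute the time elapsed since the previous hook call
        # to the step that is reporting now.
        elapsed = now - self._last
        self.durations[step_name] = self.durations.get(step_name, 0.0) + elapsed
        self._last = now

    def report(self):
        """Return (step, seconds) pairs, slowest step first."""
        return sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)
```

Assuming the pipeline accepts a `hook=` argument, usage would look like `timer = StepTimer()` followed by `pipeline(audio, hook=timer)` and `print(timer.report())`. The timer starts counting when it is constructed, so it should be created right before the pipeline call.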

### Minimal reproduction example (MRE)

https://gist.github.com/kenenbek/22c66fb788487f64ce8ecacfe1626535
