Description
System Info
- `transformers` version: 4.57.3
- Platform: Windows-11-10.0.26200-SP0
- Python version: 3.12.9
- Huggingface_hub version: 0.36.0
- Safetensors version: 0.7.0
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.9.1+cpu (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- torch: 2.9.1
- torchcodec: 0.8.1
- ffmpeg: version 8.0.1-full_build-www.gyan.dev
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- `uv init`
- `uv add torch torchcodec transformers`
- Run the following example from https://huggingface.co/docs/transformers/main/en/task_summary#automatic-speech-recognition:

```python
from transformers import pipeline

transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-small"
)
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
```

Then I get the following error:
$ uv run ./main.py
Device set to use cpu
Traceback (most recent call last):
File "C:\Users\user\transformers-debug\main.py", line 6, in <module>
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 275, in __call__
return super().__call__(inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1459, in __call__
return next(
^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 126, in __next__
item = next(self.iterator)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 271, in __next__
processed = self.infer(next(self.iterator), **self.params)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 732, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torch\utils\data\dataloader.py", line 788, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 33, in fetch
data.append(next(self.dataset_iter))
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 188, in __next__
processed = next(self.subiterator)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 381, in preprocess
import torchcodec
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\__init__.py", line 10, in <module>
from . import decoders, samplers # noqa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\decoders\__init__.py", line 7, in <module>
from .._core import AudioStreamMetadata, VideoStreamMetadata
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\_core\__init__.py", line 8, in <module>
from ._metadata import (
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\_core\_metadata.py", line 16, in <module>
from torchcodec._core.ops import (
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\_core\ops.py", line 84, in <module>
load_torchcodec_shared_libraries()
File "C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\_core\ops.py", line 69, in load_torchcodec_shared_libraries
raise RuntimeError(
RuntimeError: Could not load libtorchcodec. Likely causes:
1. FFmpeg is not properly installed in your environment. We support
versions 4, 5, 6, and 7 on all platforms, and 8 on Mac and Linux.
2. The PyTorch version (2.9.1+cpu) is not compatible with
this version of TorchCodec. Refer to the version compatibility
table:
https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec.
3. Another runtime dependency; see exceptions below.
The following exceptions were raised as we tried to load libtorchcodec:
[start of libtorchcodec loading traceback]
FFmpeg version 8: Could not load this library: C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\libtorchcodec_core8.dll
FFmpeg version 7: Could not load this library: C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\libtorchcodec_core7.dll
FFmpeg version 6: Could not load this library: C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\libtorchcodec_core6.dll
FFmpeg version 5: Could not load this library: C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\libtorchcodec_core5.dll
FFmpeg version 4: Could not load this library: C:\Users\user\transformers-debug\.venv\Lib\site-packages\torchcodec\libtorchcodec_core4.dll
[end of libtorchcodec loading traceback].

I think the cause is that on Windows, even when ffmpeg and torchcodec are installed, a plain `import torchcodec` raises an error unless you use conda or manually call `os.add_dll_directory("path/to/ffmpeg/dll/dir")` for shared builds of ffmpeg.
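For reference, this is roughly the manual workaround I mean (a minimal sketch; `C:\ffmpeg\bin` is a placeholder for wherever the shared FFmpeg DLLs actually live):

```python
import os

# On Windows, Python (3.8+) no longer searches PATH for dependent DLLs,
# so the directory containing the FFmpeg DLLs has to be registered explicitly
# before torchcodec tries to load its libtorchcodec_core*.dll libraries.
os.add_dll_directory(r"C:\ffmpeg\bin")  # placeholder path, adjust to your install

import torchcodec  # succeeds only once the FFmpeg DLLs can be resolved
```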
So the following part raises an error (because `is_torchcodec_available()` reports torchcodec as available, but the import itself fails):

transformers/src/transformers/pipelines/automatic_speech_recognition.py, lines 375 to 376 at cac0a28:

```python
if is_torchcodec_available():
    import torchcodec
```
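One way to make this robust would be something along these lines (just a sketch of the idea, not the actual transformers code; it assumes `is_torchcodec_available` is importable from `transformers.utils`, and the final flag stands in for whatever the pipeline already does when torchcodec is not installed):

```python
from transformers.utils import is_torchcodec_available

# Hypothetical defensive variant: only treat torchcodec as usable if
# importing it actually succeeds, otherwise behave as if it were absent.
torchcodec = None
if is_torchcodec_available():
    try:
        import torchcodec
    except (ImportError, OSError, RuntimeError):
        # e.g. the libtorchcodec_core*.dll libraries cannot be loaded on Windows
        torchcodec = None

use_torchcodec = torchcodec is not None
```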
Expected behavior
No error should be raised. For comparison, without torchcodec installed, I get:
$ uv run ./main.py
Device set to use cpu
`return_token_timestamps` is deprecated for WhisperFeatureExtractor and will be removed in Transformers v5. Use `return_attention_mask` instead, as the number of frames can be inferred from it.
Using custom `forced_decoder_ids` from the (generation) config. This is deprecated in favor of the `task` and `language` flags/config options.
Transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`. See https://github.com/huggingface/transformers/pull/28687 for more details.

This output is fine.