Add video encoding tutorial doc #1063
Changes from 4 commits
ba3cbbf
d5be152
fd59e4c
3eaee28
9bbeb1f
49a6614
1bcb9ce
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,287 @@ | ||||||
| # Copyright (c) Meta Platforms, Inc. and affiliates. | ||||||
| # All rights reserved. | ||||||
| # | ||||||
| # This source code is licensed under the BSD-style license found in the | ||||||
| # LICENSE file in the root directory of this source tree. | ||||||
|
|
||||||
| """ | ||||||
| ======================================= | ||||||
| Encoding video frames with VideoEncoder | ||||||
| ======================================= | ||||||
| In this example, we'll learn how to encode video frames to a file or to raw | ||||||
| bytes using the :class:`~torchcodec.encoders.VideoEncoder` class. | ||||||
| """ | ||||||
|
|
||||||
| # %% | ||||||
| # First, we'll download a video and decode some frames to tensors. | ||||||
| # These will be the input to the :class:`~torchcodec.encoders.VideoEncoder`. For more details on decoding, | ||||||
| # see :ref:`sphx_glr_generated_examples_decoding_basic_example.py`. | ||||||
| # Otherwise, skip ahead to :ref:`creating_encoder`. | ||||||
|
|
||||||
| import requests | ||||||
| from torchcodec.decoders import VideoDecoder | ||||||
| from IPython.display import Video | ||||||
|
|
||||||
|
|
||||||
| def play_video(encoded_bytes): | ||||||
| return Video( | ||||||
| data=encoded_bytes.numpy().tobytes(), | ||||||
| embed=True, | ||||||
| width=640, | ||||||
| height=360, | ||||||
| mimetype="video/mp4", | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| # Video source: https://www.pexels.com/video/adorable-cats-on-the-lawn-4977395/ | ||||||
| # Author: Altaf Shah. | ||||||
| url = "https://videos.pexels.com/video-files/4977395/4977395-hd_1920_1080_24fps.mp4" | ||||||
|
|
||||||
| response = requests.get(url, headers={"User-Agent": ""}) | ||||||
| if response.status_code != 200: | ||||||
| raise RuntimeError(f"Failed to download video. {response.status_code = }.") | ||||||
|
|
||||||
| raw_video_bytes = response.content | ||||||
|
|
||||||
| decoder = VideoDecoder(raw_video_bytes) | ||||||
| frames = decoder.get_frames_in_range(0, 60).data # Get first 60 frames | ||||||
| # TODO: use float once other PR lands | ||||||
| frame_rate = int(decoder.metadata.average_fps) | ||||||
|
|
||||||
| # %% | ||||||
| # .. _creating_encoder: | ||||||
| # | ||||||
| # Creating an encoder | ||||||
| # ------------------- | ||||||
| # | ||||||
| # Let's instantiate a :class:`~torchcodec.encoders.VideoEncoder`. We will need to provide | ||||||
| # the frames to be encoded as a 4D tensor of shape | ||||||
| # ``(num_frames, num_channels, height, width)`` with values in the ``[0, 255]`` | ||||||
| # range and ``torch.uint8`` dtype. We will also need to provide the frame rate of the input | ||||||
| # video. | ||||||
| # | ||||||
| # .. note:: | ||||||
| # | ||||||
| # The ``frame_rate`` parameter corresponds to the frame rate of the | ||||||
| # *input* video. It will also be used for the frame rate of the *output* encoded video. | ||||||
| from torchcodec.encoders import VideoEncoder | ||||||
|
|
||||||
| print(f"{frames.shape = }, {frames.dtype = }") | ||||||
| print(f"{frame_rate = } fps") | ||||||
|
|
||||||
| encoder = VideoEncoder(frames=frames, frame_rate=frame_rate) | ||||||
|
|
||||||
| # %% | ||||||
| # Encoding to file, bytes, or file-like | ||||||
| # ------------------------------------- | ||||||
| # | ||||||
| # :class:`~torchcodec.encoders.VideoEncoder` supports encoding frames into a | ||||||
| # file via the :meth:`~torchcodec.encoders.VideoEncoder.to_file` method, to | ||||||
| # file-like objects via the :meth:`~torchcodec.encoders.VideoEncoder.to_file_like` | ||||||
| # method, or to raw bytes via :meth:`~torchcodec.encoders.VideoEncoder.to_tensor`. | ||||||
| # For now we will use :meth:`~torchcodec.encoders.VideoEncoder.to_tensor`, so we | ||||||
| # can easily inspect and display the encoded video. | ||||||
|
|
||||||
| encoded_frames = encoder.to_tensor(format="mp4") | ||||||
| play_video(encoded_frames) | ||||||
|
|
||||||
| # %% | ||||||
| # | ||||||
| # Now that we have encoded data, we can decode it back to verify the | ||||||
| # round-trip encode/decode process works as expected: | ||||||
|
|
||||||
| decoder_verify = VideoDecoder(encoded_frames) | ||||||
| decoded_frames = decoder_verify.get_frames_in_range(0, 60).data | ||||||
|
|
||||||
| print(f"Re-decoded video: {decoded_frames.shape = }") | ||||||
| print(f"Original frames: {frames.shape = }") | ||||||
|
|
||||||
| # %% | ||||||
| # .. _codec_selection: | ||||||
| # | ||||||
| # Codec Selection | ||||||
| # --------------- | ||||||
| # | ||||||
| # By default, the codec used is selected automatically using the file extension provided | ||||||
| # in the ``dest`` parameter for the :meth:`~torchcodec.encoders.VideoEncoder.to_file` method, | ||||||
| # or using the ``format`` parameter for the | ||||||
| # :meth:`~torchcodec.encoders.VideoEncoder.to_file_like` and | ||||||
| # :meth:`~torchcodec.encoders.VideoEncoder.to_tensor` methods. | ||||||
| # | ||||||
| # For example, when encoding to MP4 format, the default codec is typically ``H.264``. | ||||||
| # | ||||||
| # To use a codec other than the default, use the ``codec`` parameter. | ||||||
| # You can specify either a specific codec implementation (e.g., ``"libx264"``) | ||||||
| # or a codec specification (e.g., ``"h264"``). Different codecs offer | ||||||
| # different tradeoffs between quality, file size, and encoding speed. | ||||||
| # | ||||||
| # .. note:: | ||||||
| # | ||||||
| # To see available encoders on your system, run ``ffmpeg -encoders``. | ||||||
| # | ||||||
| # Let's encode the same frames using different codecs: | ||||||
|
|
||||||
| import tempfile | ||||||
| from pathlib import Path | ||||||
|
|
||||||
| # H.264 encoding | ||||||
| h264_output = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name | ||||||
| encoder.to_file(h264_output, codec="libx264") | ||||||
|
|
||||||
| # H.265 encoding | ||||||
| hevc_output = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name | ||||||
| encoder.to_file(hevc_output, codec="hevc") | ||||||
|
|
||||||
| # Now let's use ffprobe to verify the codec used in the output files | ||||||
| import subprocess | ||||||
|
|
||||||
| for output, name in [(h264_output, "h264_output"), (hevc_output, "hevc_output")]: | ||||||
| result = subprocess.run( | ||||||
| [ | ||||||
| "ffprobe", | ||||||
| "-v", | ||||||
| "error", | ||||||
| "-select_streams", | ||||||
| "v:0", | ||||||
| "-show_entries", | ||||||
| "stream=codec_name", | ||||||
| "-of", | ||||||
| "default=noprint_wrappers=1:nokey=1", | ||||||
| output, | ||||||
| ], | ||||||
| capture_output=True, | ||||||
| text=True, | ||||||
| ) | ||||||
| print(f"Codec used in {name}: {result.stdout.strip()}") | ||||||
|
|
||||||
| # %% | ||||||
| # In most cases, you can simply specify the ``format`` parameter and let FFmpeg select the default codec. | ||||||
| # However, specifying the ``codec`` parameter is useful to select a particular codec implementation | ||||||
| # (``libx264`` vs ``libx265``) or to have more control over the encoding behavior. | ||||||
|
||||||
|
|
||||||
| # %% | ||||||
| # .. _pixel_format: | ||||||
| # | ||||||
| # Pixel Format | ||||||
| # ------------ | ||||||
| # | ||||||
| # The ``pixel_format`` parameter controls the color sampling (chroma subsampling) | ||||||
| # of the output video. This affects both quality and file size. | ||||||
| # | ||||||
| # Common pixel formats: | ||||||
| # | ||||||
| # - ``"yuv420p"`` - 4:2:0 chroma subsampling (standard quality, smaller file size, widely compatible) | ||||||
| # - ``"yuv444p"`` - 4:4:4 chroma subsampling (full chroma resolution, higher quality, larger file size) | ||||||
| # | ||||||
| # Most playback devices and platforms support ``yuv420p``, making it the most | ||||||
| # common choice for video encoding. | ||||||
| # | ||||||
| # .. note:: | ||||||
| # | ||||||
| # Pixel format support depends on the codec used. Use ``ffmpeg -h encoder=<codec_name>`` | ||||||
| # to check available options for your selected codec. | ||||||
|
|
||||||
| # Standard pixel format | ||||||
| yuv420_encoded_frames = encoder.to_tensor( | ||||||
| format="mp4", codec="libx264", pixel_format="yuv420p" | ||||||
| ) | ||||||
| play_video(yuv420_encoded_frames) | ||||||
|
|
||||||
| # %% | ||||||
| # .. _crf: | ||||||
| # | ||||||
| # CRF (Constant Rate Factor) | ||||||
| # -------------------------- | ||||||
| # | ||||||
| # The ``crf`` parameter controls video quality, where lower values produce higher quality output. | ||||||
| # | ||||||
| # For example, with the commonly used H.264 codec, ``libx264``: | ||||||
| # | ||||||
| # - Values range from 0 (lossless) to 51 (worst quality) | ||||||
| # - Values 17 or 18 are conisdered visually lossless, and the default is 23. | ||||||
|
||||||
| # - Values 17 or 18 are conisdered visually lossless, and the default is 23. | |
| # - Values 17 or 18 are considered visually lossless, and the default is 23. |
well done, it's really cool to visually see the effect it has on quality!
Agreed, I came here to say the same thing. :)
NicolasHug marked this conversation as resolved.
Definitely a nitpick, but I think we don't need bullet points here.
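To make the effect of `crf` on output size concrete, here is a minimal sketch that encodes the same frames once per CRF value and records the size of each result. It assumes that `to_tensor` accepts `crf` in the same way the `to_file` docstring describes and that it returns a 1-D sequence of bytes. `encoded_sizes_by_crf` is a hypothetical helper name, not part of torchcodec.

```python
def encoded_sizes_by_crf(encoder, crf_values, format="mp4", codec="libx264"):
    """Encode once per CRF value and return {crf: size_in_bytes}.

    `encoder` is expected to behave like torchcodec.encoders.VideoEncoder,
    i.e. expose to_tensor(format=..., codec=..., crf=...). Passing `crf`
    to to_tensor is an assumption based on the to_file docstring.
    """
    sizes = {}
    for crf in crf_values:
        encoded = encoder.to_tensor(format=format, codec=codec, crf=crf)
        # The encoded result is assumed to be a 1-D sequence of bytes,
        # so len() gives the output size in bytes.
        sizes[crf] = len(encoded)
    return sizes
```

With the `encoder` from the tutorial, a call such as `encoded_sizes_by_crf(encoder, [0, 23, 51])` would typically show smaller outputs as CRF grows, though the exact numbers depend on the video and on the FFmpeg build.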
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -51,18 +51,23 @@ def to_file( | |||||
| codec (str, optional): The codec to use for encoding (e.g., "libx264", | ||||||
| "h264"). If not specified, the default codec | ||||||
| for the container format will be used. | ||||||
| See :ref:`codec_selection` for details. | ||||||
| pixel_format (str, optional): The pixel format for encoding (e.g., | ||||||
| "yuv420p", "yuv444p"). If not specified, uses codec's default format. | ||||||
| See :ref:`pixel_format` for details. | ||||||
| crf (int or float, optional): Constant Rate Factor for encoding quality. Lower values | ||||||
| mean better quality. Valid range depends on the encoder (commonly 0-51). | ||||||
|
||||||
| mean better quality. Valid range depends on the encoder (commonly 0-51). | |
| mean better quality. Valid range depends on the encoder (e.g. 0-51 for libx264). |
Here and in the other methods. I suggest this because it wasn't immediately obvious to me what "compression" meant in this case.
| encoding speed and compression. Valid values depend on the encoder (commonly | |
| encoding speed and compression (output size). Valid values depend on the encoder (commonly |
Let's remove this, I think this is an implementation detail.
I did not edit the docstrings, as I expect they can be useful as a quick reference to valid values. I am open to suggestions on this, though.
Docstrings look good, I left minor suggestions above. I agree it is super valuable to have short descriptions of valid values in there.
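As an illustration of how a documented range could be used as a quick reference, the sketch below checks a `crf` value before encoding. Only the libx264 range (0-51) comes from the docstring suggestion above. The names `DOCUMENTED_CRF_RANGES` and `check_crf` are hypothetical and not part of torchcodec, and encoders without a documented range are skipped rather than guessed.

```python
# Only the libx264 range is taken from the docstring; other encoders are
# deliberately left out rather than guessed.
DOCUMENTED_CRF_RANGES = {"libx264": (0, 51)}


def check_crf(codec, crf):
    """Validate `crf` against the documented range for `codec`.

    Returns True if the value was checked and is in range, False if no
    range is documented for this codec. Raises ValueError if out of range.
    """
    bounds = DOCUMENTED_CRF_RANGES.get(codec)
    if bounds is None:
        return False
    low, high = bounds
    if not low <= crf <= high:
        raise ValueError(
            f"crf={crf} is outside the documented range {low}-{high} for {codec}"
        )
    return True
```

Returning `False` for unknown encoders keeps the check from rejecting values that FFmpeg itself might accept.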
I think this is an excellent place to explain that the `format` parameter selects the default codec. We can also briefly explain the difference between, say, an mp4 video file and the actual codec used to decode and encode the video streams in that file. If this is well explained in any external FFmpeg docs, we can link to those as well.

That then sets us up for the next section, as the natural next question a reader may have is: what if I don't want the default codec?

At the end of the "Codec Selection" section, we should give some guidance on when to just use `format` and when to specify `codec` as well. Nothing elaborate, just a sentence or two. I think that will go a long way to informing our users about the relationship between these two options.
Thanks for the suggestions, I added some brief guidance on `codec` vs `format` at the end.

I drafted text to explain the difference between container format and codec, but I am worried it dilutes the "Codec Selection" section with text that is not specific to the API. I would be happy to add a link, but I was not able to find useful FFmpeg docs on this subject.
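One way to show the container/codec distinction without much prose is to ask ffprobe for both at once. The sketch below is an illustration under stated assumptions, not text from the PR: `build_probe_command` and `summarize_probe_output` are hypothetical helper names, and it relies on ffprobe's JSON writer reporting a `format` object and a `streams` list.

```python
import json


def build_probe_command(path):
    """Return an ffprobe command reporting the container and stream codecs as JSON."""
    return [
        "ffprobe",
        "-v", "error",
        # One "format" entry (the container) and two entries per stream.
        "-show_entries", "format=format_name:stream=codec_name,codec_type",
        "-of", "json",
        str(path),
    ]


def summarize_probe_output(json_text):
    """Extract the container name and (stream type, codec name) pairs."""
    info = json.loads(json_text)
    container = info.get("format", {}).get("format_name")
    codecs = [
        (stream.get("codec_type"), stream.get("codec_name"))
        for stream in info.get("streams", [])
    ]
    return container, codecs
```

Running the command with `subprocess.run(..., capture_output=True, text=True)` on the two files from the "Codec Selection" section and passing `stdout` to `summarize_probe_output` should report the same container for both but different video codecs, which is the distinction between `format` and `codec`.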