@@ -429,30 +429,32 @@ class VideoDecoder {
429429// MaybePermuteHWC2CHW().
430430//
431431// Also, importantly, the way we figure out the the height and width of the
432- // output frame varies and depends on the decoding entry-point:
433- // - In all cases, if the user requested specific height and width from the
434- // options, we honor that. Otherwise we fall into one of the categories below.
435- // - In Batch decoding APIs (e.g. getFramesAtIndices), we get height and width
436- // from the stream metadata, which itself got its value from the CodecContext,
437- // when the stream was added.
438- // - In single frames APIs:
439- // - On CPU we get height and width from the AVFrame.
440- // - On GPU, we get height and width from the metadata (same as batch APIs)
432+ // output frame tensor varies, and depends on the decoding entry-point. In
433+ // *decreasing order of accuracy*, we use the following sources for determining
434+ // height and width:
435+ // - getHeightAndWidthFromResizedAVFrame(). This is the height and width of the
436+ // AVframe, *post*-resizing. This is only used for single-frame decoding APIs,
437+ // on CPU, with filtergraph.
438+ // - getHeightAndWidthFromOptionsOrAVFrame(). This is the height and width from
439+ // the user-specified options if they exist, or the height and width of the
440+ // AVFrame *before* it is resized. In theory, i.e. if there are no bugs within
441+ // our code or within FFmpeg code, this should be exactly the same as
442+ // getHeightAndWidthFromResizedAVFrame(). This is used by single-frame
443+ // decoding APIs, on CPU, with swscale.
444+ // - getHeightAndWidthFromOptionsOrMetadata(). This is the height and width from
445+ // the user-specified options if they exist, or the height and width form the
446+ // stream metadata, which itself got its value from the CodecContext, when the
447+ // stream was added. This is used by batch decoding APIs, or by GPU-APIs (both
448+ // batch and single-frames).
441449//
442- // These 2 strategies are encapsulated within
443- // getHeightAndWidthFromOptionsOrMetadata() and
444- // getHeightAndWidthFromOptionsOrAVFrame(). The reason they exist is to make it
445- // very obvious which logic is used in which place, and they allow for `git
446- // grep`ing.
447- //
448- // The source of truth for height and width really is the AVFrame: it's the
449- // decoded ouptut from FFmpeg. The info from the metadata (i.e. from the
450- // CodecContext) may not be as accurate. However, the AVFrame is only available
451- // late in the call stack, when the frame is decoded, while the CodecContext is
452- // available early when a stream is added. This is why we use the CodecContext
453- // for pre-allocating batched output tensors (we could pre-allocate those only
454- // once we decode the first frame to get the info frame the AVFrame, but that's
455- // a more complex logic).
450+ // The source of truth for height and width really is the (resized) AVFrame:
451+ // it's the decoded ouptut from FFmpeg. The info from the metadata (i.e. from
452+ // the CodecContext) may not be as accurate. However, the AVFrame is only
453+ // available late in the call stack, when the frame is decoded, while the
454+ // CodecContext is available early when a stream is added. This is why we use
455+ // the CodecContext for pre-allocating batched output tensors (we could
456+ // pre-allocate those only once we decode the first frame to get the info frame
457+ // the AVFrame, but that's a more complex logic).
456458//
457459// Because the sources for height and width may disagree, we may end up with
458460// conflicts: e.g. if we pre-allocate a batch output tensor based on the
@@ -466,6 +468,10 @@ struct FrameDims {
466468 FrameDims (int h, int w) : height(h), width(w) {}
467469};
468470
471+ // There's nothing preventing you from calling this on a non-resized frame, but
472+ // please don't.
473+ FrameDims getHeightAndWidthFromResizedAVFrame (const AVFrame& resizedAVFrame);
474+
469475FrameDims getHeightAndWidthFromOptionsOrMetadata (
470476 const VideoDecoder::VideoStreamDecoderOptions& options,
471477 const VideoDecoder::StreamMetadata& metadata);
0 commit comments