@scotts commented Nov 5, 2024

This PR makes the following changes:

  1. Renames the existing Decord decoder kind to DecordAccurate. This is because the API calls use accurate seeking.
  2. Adds a new Decord decoder kind, DecordAccurateBatch. This uses the batch APIs. We believe this is an accurate API.
  3. Adds a Decord benchmark kind to the README graph.
  4. Renames the existing TorchCodecCore decoder kind to TorchCodecCoreNonBatch.
  5. Adds the decoder kind TorchCodecCore - while it has the same name as a previous decoder kind, it's using the best core API for each scenario. We can directly compare it to TorchCodecPublic. Any systematic difference is likely caused by the logic in VideoDecoder itself.
  6. Removes all of the fine-grained calls to timeit inside the experiments. If we want that data, we should create separate experiments for it. In general, if we run something for N iterations and time how long all N iterations take, we can't also time each individual iteration, because the cost of the fine-grained timers would add to the overall time. We can time the batch or time each iteration, but not both in the same experiment.
  7. Refactors benchmark_decoders.py so that we have a registry of decoder kinds, and we access that registry to know what to run. This eliminates a lot of the bespoke logic. Adding new decoder kinds is now easy: just make a new entry to the registry, and the rest of the code works. As a bonus, this unifies specifying and adding options for decoder kinds.
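
As a rough illustration of points 6 and 7, here is a minimal sketch of a decoder-kind registry driven by a single outer timer. It is not the code in benchmark_decoders.py: `DecoderKind`, `REGISTRY`, `make_fake_decoder`, `parse_options`, the `key=value+key=value` option syntax, and the fake decoder itself are all hypothetical names made up for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DecoderKind:
    """Hypothetical registry entry; not the real benchmark_decoders.py type."""
    display_name: str
    factory: Callable[..., Callable[[], int]]
    default_options: Dict[str, str] = field(default_factory=dict)


def make_fake_decoder(frames: str = "10") -> Callable[[], int]:
    # Stand-in for a real decoder: "decodes" the requested number of frames.
    count = int(frames)

    def decode() -> int:
        return sum(1 for _ in range(count))

    return decode


# Adding a decoder kind means adding one entry here; the runner is unchanged.
REGISTRY: Dict[str, DecoderKind] = {
    "fake": DecoderKind("FakeDecoder", make_fake_decoder),
    "fake_100": DecoderKind("FakeDecoder100", make_fake_decoder, {"frames": "100"}),
}


def parse_options(spec: str) -> Dict[str, str]:
    # "frames=5+other=1" -> {"frames": "5", "other": "1"} (hypothetical syntax).
    if not spec:
        return {}
    return dict(item.split("=", 1) for item in spec.split("+"))


def time_batch(decode: Callable[[], int], iterations: int) -> float:
    # One timer around the whole loop, so no per-iteration timer overhead
    # is folded into the measurement.
    start = time.perf_counter()
    for _ in range(iterations):
        decode()
    return time.perf_counter() - start


def run(selected: List[str], overrides: Dict[str, str], iterations: int = 1000) -> Dict[str, float]:
    results = {}
    for key in selected:
        kind = REGISTRY[key]
        # Per-kind overrides are merged over the registered defaults.
        options = {**kind.default_options, **parse_options(overrides.get(key, ""))}
        results[kind.display_name] = time_batch(kind.factory(**options), iterations)
    return results


if __name__ == "__main__":
    for name, seconds in run(["fake", "fake_100"], {"fake": "frames=5"}).items():
        print(f"{name}: {seconds:.6f}s")
```

The runner only ever looks things up in the registry, so options for every kind are specified and merged the same way, and per-iteration timing would have to live in a separate experiment rather than inside `time_batch`.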

The following results were run with:

python benchmarks/decoders/benchmark_decoders.py --bm_video_speed_min_run_seconds=20

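These are the results of four separate invocations of the command above. The header rows are shown once; each block of rows below them is one run: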

[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordAccurate                     |            53.1            |           119.0           |    12.8    |     16.1    |     68.6     |                 
      DecordAccurateBatch                |            52.5            |           117.6           |    13.0    |     16.1    |     47.9     |                 
      TorchAudio                         |           468.6            |           524.2           |     8.0    |     14.0    |     69.0     |                 
      TorchVision[backend=video_reader]  |           343.2            |           331.4           |    12.7    |     15.9    |     44.1     |              
      TorchCodecCoreNonBatch             |            47.4            |           109.7           |     9.6    |     12.6    |     42.5     |                 
      TorchCodecCoreBatch                |            49.8            |            44.1           |    11.7    |     14.6    |     61.9     |                 
      TorchCodecCore:                    |            49.2            |            44.1           |     9.5    |     12.7    |     43.2     |        9.5      
      TorchCodecCore:num_threads=1       |           111.9            |           102.8           |     6.9    |     11.9    |     53.9     |                 
      TorchCodecPublic                   |            50.5            |            44.3           |    11.6    |     14.9    |     45.3     |                 


      DecordAccurate                     |            53.4            |           119.2           |    12.9    |     16.7    |     68.0     |                 
      DecordAccurateBatch                |            52.8            |           119.1           |    13.1    |     16.2    |     48.0     |                 
      TorchAudio                         |           472.7            |           519.2           |     8.0    |     14.0    |     72.3     |                 
      TorchVision[backend=video_reader]  |           343.8            |           328.3           |    12.7    |     15.9    |     44.2     |                 
      TorchCodecCoreNonBatch             |            47.5            |           109.8           |     9.5    |     12.7    |     46.1     |                 
      TorchCodecCoreBatch                |            50.2            |            44.6           |    11.6    |     14.6    |     61.4     |                 
      TorchCodecCore:                    |            49.4            |            43.8           |     9.5    |     12.6    |     46.4     |        9.5      
      TorchCodecCore:num_threads=1       |           111.1            |           101.8           |     6.9    |     11.8    |     69.1     |                 
      TorchCodecPublic                   |            49.3            |            44.0           |    11.7    |     14.8    |     48.8     |                 


      DecordAccurate                     |            52.6            |           117.8           |    12.9    |     16.1    |     68.0     |                 
      DecordAccurateBatch                |            52.5            |           120.1           |    13.0    |     16.1    |     48.1     |                 
      TorchAudio                         |           470.9            |           520.7           |     7.9    |     15.4    |     77.1     |                 
      TorchVision[backend=video_reader]  |           351.0            |           329.0           |    12.7    |     16.1    |     44.3     |                 
      TorchCodecCoreNonBatch             |            47.3            |           109.3           |     9.5    |     12.6    |     49.7     |                 
      TorchCodecCoreBatch                |            49.7            |            44.0           |    11.6    |     14.8    |     61.6     |                 
      TorchCodecCore:                    |            50.2            |            43.8           |     9.5    |     12.7    |     49.8     |        9.5      
      TorchCodecCore:num_threads=1       |           111.9            |           102.1           |     6.9    |     11.8    |     61.4     |                 
      TorchCodecPublic                   |            49.6            |            44.4           |    11.8    |     15.0    |     53.8     | 
      
      DecordAccurate                     |            52.7            |           117.7           |    12.9    |     16.2    |     68.3     |                 
      DecordAccurateBatch                |            52.2            |           117.8           |    13.0    |     16.1    |     48.0     |                 
      TorchAudio                         |           468.1            |           515.4           |     7.9    |     15.3    |     84.1     |                 
      TorchVision[backend=video_reader]  |           348.0            |           330.3           |    12.7    |     16.0    |     50.7     |                 
      TorchCodecCoreNonBatch             |            47.9            |           109.9           |     9.6    |     12.6    |     57.6     |                 
      TorchCodecCoreBatch                |            49.7            |            44.2           |    11.6    |     14.8    |     61.6     |                 
      TorchCodecCore:                    |            50.1            |            44.7           |     9.6    |     12.7    |     57.8     |        9.5      
      TorchCodecCore:num_threads=1       |           111.5            |           102.5           |     6.9    |     11.8    |     69.3     |                 
      TorchCodecPublic                   |            49.9            |            44.4           |    11.8    |     14.9    |     60.7     |                 

Some observations:

  1. The sampler-inspired experiments (random and uniform) are remarkably consistent from run to run for every decoder.
  2. 1 next and 10 next are also remarkably consistent from run to run for every decoder.
  3. 100 next is consistent across:
    a. DecordAccurate.
    b. DecordAccurateBatch.
    c. TorchVision.
    d. TorchCodecCoreBatch.
  4. 100 next has remarkable variation across:
    a. TorchAudio.
    b. TorchCodecCoreNonBatch.
    c. TorchCodecCore.
    d. TorchCodecPublic.
  5. TorchCodecCore is consistently slightly faster than TorchCodecPublic. This means we have an opportunity to shave off some time in the logic in the public API.
  6. While both TorchCodecCore and TorchCodecPublic display variation across runs, they notably always move together within a run. That is, if TorchCodecCore has a "good" run, then so does TorchCodecPublic. That means there may be something systematic going on that determines if a run is "good" or not. Maybe something to do with how the video gets laid out in memory?
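  7. TorchVision is consistently among the best performers in 100 next: it is the fastest in the second and third runs, and a close second or third in the other two.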
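
To make the run-to-run comparison concrete, the following sketch recomputes the spread of the 100 next column from the four runs above. The numbers are copied from the tables as printed; the helper names (`spread`, `relative_spread`, `run_order`) are made up for this example, and max minus min is just one simple way to quantify variation, not something the benchmark itself reports.

```python
# "100 next()" column, one value per run, copied from the four tables above.
ONE_HUNDRED_NEXT = {
    "DecordAccurate": [68.6, 68.0, 68.0, 68.3],
    "DecordAccurateBatch": [47.9, 48.0, 48.1, 48.0],
    "TorchAudio": [69.0, 72.3, 77.1, 84.1],
    "TorchVision[backend=video_reader]": [44.1, 44.2, 44.3, 50.7],
    "TorchCodecCoreNonBatch": [42.5, 46.1, 49.7, 57.6],
    "TorchCodecCoreBatch": [61.9, 61.4, 61.6, 61.6],
    "TorchCodecCore": [43.2, 46.4, 49.8, 57.8],
    "TorchCodecCore:num_threads=1": [53.9, 69.1, 61.4, 69.3],
    "TorchCodecPublic": [45.3, 48.8, 53.8, 60.7],
}


def spread(values):
    # Absolute run-to-run variation: slowest run minus fastest run.
    return max(values) - min(values)


def relative_spread(values):
    # Variation as a fraction of the fastest run.
    return spread(values) / min(values)


def run_order(values):
    # Indices of the runs sorted from fastest to slowest.
    return sorted(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Most stable decoders first.
    for name, values in sorted(ONE_HUNDRED_NEXT.items(), key=lambda kv: relative_spread(kv[1])):
        print(f"{name:36s} spread={spread(values):5.1f}  relative={relative_spread(values):6.1%}")

    core = ONE_HUNDRED_NEXT["TorchCodecCore"]
    public = ONE_HUNDRED_NEXT["TorchCodecPublic"]
    # Do the two decoders rank the four runs identically ("move together")?
    print("same run ordering:", run_order(core) == run_order(public))
    # Per-run gap between the public API and the core API.
    print("public minus core:", [round(p - c, 1) for c, p in zip(core, public)])
```

With only four runs per decoder this is a sanity check rather than a statistical test, but it gives a quick way to see which decoders vary and whether TorchCodecCore and TorchCodecPublic track each other.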

@facebook-github-bot added the CLA Signed label Nov 5, 2024
@scotts force-pushed the readme_decord branch 2 times, most recently from 529fb38 to 6831c2d November 6, 2024 05:58
@scotts marked this pull request as ready for review November 6, 2024 06:00
@scotts requested a review from ahmadsharif1 November 6, 2024 06:00

@NicolasHug left a comment

I didn't look at everything in great detail, but this all looks sensible to me. The plot looks good as well.

As a side note, I have also noticed high levels of variance when running the samplers benchmarks, both within and across benchmark runs.

@scotts merged commit 8ac81b7 into meta-pytorch:main Nov 6, 2024
21 of 29 checks passed
@scotts deleted the readme_decord branch November 6, 2024 15:59