More benchmark refactoring, more benchmarks. #337

scotts · 2024-11-05T21:10:33Z

This PR makes the following changes:

Renames the existing Decord decoder kind to DecordAccurate. This is because the API calls use accurate seeking.
Adds a new Decord decoder kind, DecordAccurateBatch. This uses the batch APIs. We believe this is an accurate API.
Adds a Decord benchmark kind to the README graph.
Renames the existing TorchCodecCore decoder kind to TorchCodecCoreNonBatch.
Adds the decoder kind TorchCodecCore - while it has the same name as a previous decoder kind, it's using the best core API for each scenario. We can directly compare it to TorchCodecPublic. Any systematic difference is likely caused by the logic in VideoDecoder itself.
Removes all of the fine-grained calls to timeit inside of the experiments. If we want that data, we should create separate experiments for it. In general, if we run something for N iterations and time how long all N iterations take, we can't also time each individual iteration, because the cost of the fine-grained timers would add to the overall time. We have to choose: either time each iteration, or time the batch as a whole.
Refactors benchmark_decoders.py so that we have a registry of decoder kinds, and we consult that registry to know what to run. This eliminates a lot of the bespoke logic. Adding a new decoder kind is now easy: just add a new entry to the registry, and the rest of the code works. As a bonus, this unifies how options for decoder kinds are specified and added.
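For illustration, the registry idea and the "one timer around the whole batch" rule from the list above could look roughly like the sketch below. This is not the code in benchmark_decoders.py: `DecoderKind`, `DECODER_REGISTRY`, `make_fake_decoder`, and `run_benchmarks` are hypothetical names, and the "decoder" is a stand-in that does trivial work.

```python
import timeit
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DecoderKind:
    """Hypothetical registry entry: display name, factory, default options."""
    display_name: str
    factory: Callable[..., Callable[[], None]]
    default_options: Dict[str, object] = field(default_factory=dict)


def make_fake_decoder(num_threads: int = 0) -> Callable[[], None]:
    # Stand-in for a real decoder: returns a callable that "decodes" one frame.
    def decode_next() -> None:
        sum(range(100))
    return decode_next


# Adding a decoder kind means adding one entry here; the loop below is unchanged.
DECODER_REGISTRY: Dict[str, DecoderKind] = {
    "fake_default": DecoderKind("FakeDefault", make_fake_decoder),
    "fake_single_thread": DecoderKind(
        "FakeSingleThread", make_fake_decoder, {"num_threads": 1}
    ),
}


def run_benchmarks(kinds, num_iterations=100, overrides=None):
    """Time num_iterations calls per decoder kind with a single outer timer."""
    overrides = overrides or {}
    results = {}
    for key in kinds:
        kind = DECODER_REGISTRY[key]
        # Per-kind options are merged in one place: defaults, then overrides.
        options = {**kind.default_options, **overrides.get(key, {})}
        decode_next = kind.factory(**options)
        # One timer around the whole batch; no per-iteration timers inside.
        start = timeit.default_timer()
        for _ in range(num_iterations):
            decode_next()
        results[kind.display_name] = timeit.default_timer() - start
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmarks(DECODER_REGISTRY.keys()).items():
        print(f"{name}: {seconds:.6f}s")
```

The point of the design is that the run loop never mentions a specific decoder, so per-decoder logic lives only in the registry entry and its factory.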

The following results were run with:

python benchmarks/decoders/benchmark_decoders.py --bm_video_speed_min_run_seconds=20

These are the results of four separate runs of the above command. The column header and thread count are shown once, for the first run:

[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordAccurate                     |            53.1            |           119.0           |    12.8    |     16.1    |     68.6     |                 
      DecordAccurateBatch                |            52.5            |           117.6           |    13.0    |     16.1    |     47.9     |                 
      TorchAudio                         |           468.6            |           524.2           |     8.0    |     14.0    |     69.0     |                 
      TorchVision[backend=video_reader]  |           343.2            |           331.4           |    12.7    |     15.9    |     44.1     |              
      TorchCodecCoreNonBatch             |            47.4            |           109.7           |     9.6    |     12.6    |     42.5     |                 
      TorchCodecCoreBatch                |            49.8            |            44.1           |    11.7    |     14.6    |     61.9     |                 
      TorchCodecCore:                    |            49.2            |            44.1           |     9.5    |     12.7    |     43.2     |        9.5      
      TorchCodecCore:num_threads=1       |           111.9            |           102.8           |     6.9    |     11.9    |     53.9     |                 
      TorchCodecPublic                   |            50.5            |            44.3           |    11.6    |     14.9    |     45.3     |                 


      DecordAccurate                     |            53.4            |           119.2           |    12.9    |     16.7    |     68.0     |                 
      DecordAccurateBatch                |            52.8            |           119.1           |    13.1    |     16.2    |     48.0     |                 
      TorchAudio                         |           472.7            |           519.2           |     8.0    |     14.0    |     72.3     |                 
      TorchVision[backend=video_reader]  |           343.8            |           328.3           |    12.7    |     15.9    |     44.2     |                 
      TorchCodecCoreNonBatch             |            47.5            |           109.8           |     9.5    |     12.7    |     46.1     |                 
      TorchCodecCoreBatch                |            50.2            |            44.6           |    11.6    |     14.6    |     61.4     |                 
      TorchCodecCore:                    |            49.4            |            43.8           |     9.5    |     12.6    |     46.4     |        9.5      
      TorchCodecCore:num_threads=1       |           111.1            |           101.8           |     6.9    |     11.8    |     69.1     |                 
      TorchCodecPublic                   |            49.3            |            44.0           |    11.7    |     14.8    |     48.8     |                 


      DecordAccurate                     |            52.6            |           117.8           |    12.9    |     16.1    |     68.0     |                 
      DecordAccurateBatch                |            52.5            |           120.1           |    13.0    |     16.1    |     48.1     |                 
      TorchAudio                         |           470.9            |           520.7           |     7.9    |     15.4    |     77.1     |                 
      TorchVision[backend=video_reader]  |           351.0            |           329.0           |    12.7    |     16.1    |     44.3     |                 
      TorchCodecCoreNonBatch             |            47.3            |           109.3           |     9.5    |     12.6    |     49.7     |                 
      TorchCodecCoreBatch                |            49.7            |            44.0           |    11.6    |     14.8    |     61.6     |                 
      TorchCodecCore:                    |            50.2            |            43.8           |     9.5    |     12.7    |     49.8     |        9.5      
      TorchCodecCore:num_threads=1       |           111.9            |           102.1           |     6.9    |     11.8    |     61.4     |                 
      TorchCodecPublic                   |            49.6            |            44.4           |    11.8    |     15.0    |     53.8     | 
      
      DecordAccurate                     |            52.7            |           117.7           |    12.9    |     16.2    |     68.3     |                 
      DecordAccurateBatch                |            52.2            |           117.8           |    13.0    |     16.1    |     48.0     |                 
      TorchAudio                         |           468.1            |           515.4           |     7.9    |     15.3    |     84.1     |                 
      TorchVision[backend=video_reader]  |           348.0            |           330.3           |    12.7    |     16.0    |     50.7     |                 
      TorchCodecCoreNonBatch             |            47.9            |           109.9           |     9.6    |     12.6    |     57.6     |                 
      TorchCodecCoreBatch                |            49.7            |            44.2           |    11.6    |     14.8    |     61.6     |                 
      TorchCodecCore:                    |            50.1            |            44.7           |     9.6    |     12.7    |     57.8     |        9.5      
      TorchCodecCore:num_threads=1       |           111.5            |           102.5           |     6.9    |     11.8    |     69.3     |                 
      TorchCodecPublic                   |            49.9            |            44.4           |    11.8    |     14.9    |     60.7     |

Some observations:

The sampler-inspired experiments (random and uniform) are remarkably consistent across runs for all decoders.
1 next and 10 next are also remarkably consistent across runs for all decoders.
100 next is consistent across runs for:
a. DecordAccurate.
b. DecordAccurateBatch.
c. TorchVision.
d. TorchCodecCoreBatch.
100 next varies considerably across runs for:
a. TorchAudio.
b. TorchCodecCoreNonBatch.
c. TorchCodecCore.
d. TorchCodecPublic.
TorchCodecCore is consistently slightly faster than TorchCodecPublic. This means we have an opportunity to shave off some time in the logic in the public API.
While both TorchCodecCore and TorchCodecPublic display variation across runs, they notably always move together within a run. That is, if TorchCodecCore has a "good" run, then so does TorchCodecPublic. That means there may be something systematic going on that determines if a run is "good" or not. Maybe something to do with how the video gets laid out in memory?
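TorchVision is consistently among the best performers in 100 next, although TorchCodecCoreNonBatch and TorchCodecCore edge it out in the first run and DecordAccurateBatch does in the fourth.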
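The observations above can be checked with a little arithmetic on the 100 next() column. The sketch below copies those values from the four tables and computes, per decoder, the range across runs relative to the mean, plus the per-run gap between TorchCodecPublic and TorchCodecCore. The helper names (`relative_spread`, `rank_order`) and the choice of spread metric are assumptions made for illustration, not part of the benchmark code.

```python
from statistics import mean

# "100 next()" column for the four runs, copied from the tables above.
HUNDRED_NEXT = {
    "DecordAccurate": [68.6, 68.0, 68.0, 68.3],
    "DecordAccurateBatch": [47.9, 48.0, 48.1, 48.0],
    "TorchAudio": [69.0, 72.3, 77.1, 84.1],
    "TorchVision[backend=video_reader]": [44.1, 44.2, 44.3, 50.7],
    "TorchCodecCoreNonBatch": [42.5, 46.1, 49.7, 57.6],
    "TorchCodecCoreBatch": [61.9, 61.4, 61.6, 61.6],
    "TorchCodecCore": [43.2, 46.4, 49.8, 57.8],
    "TorchCodecCore:num_threads=1": [53.9, 69.1, 61.4, 69.3],
    "TorchCodecPublic": [45.3, 48.8, 53.8, 60.7],
}


def relative_spread(values):
    """(max - min) / mean: a simple, unit-free measure of run-to-run variation."""
    return (max(values) - min(values)) / mean(values)


def rank_order(values):
    """Indices of the runs sorted from fastest to slowest."""
    return sorted(range(len(values)), key=lambda i: values[i])


core = HUNDRED_NEXT["TorchCodecCore"]
public = HUNDRED_NEXT["TorchCodecPublic"]
# Per-run gap between the public API and the core API.
public_minus_core = [p - c for p, c in zip(public, core)]

if __name__ == "__main__":
    for name, values in HUNDRED_NEXT.items():
        print(f"{name:36s} spread={relative_spread(values):.1%}")
    print("public - core per run:", [round(d, 1) for d in public_minus_core])
    print("same run ordering:", rank_order(core) == rank_order(public))
```

If the two TorchCodec kinds really move together, their runs should sort into the same order and the gap should keep the same sign in every run.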

NicolasHug

I didn't look at everything in great detail, but this all looks sensible to me. The plot looks good as well.

As a side note, I have also noticed high levels of variance when running the samplers benchmarks, both within and across benchmark runs.
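One plausible contributor to variance across runs in a random-seek experiment is that each run draws different seek points. The follow-up PR #340 mentions setting a random seed; a minimal sketch of that idea is below. `sample_seek_points` is a hypothetical helper, not a function from the benchmark, and the 13.013 s duration is taken from the video line in the results above.

```python
import random


def sample_seek_points(duration_seconds, num_points, seed=None):
    """Draw seek timestamps uniformly at random; a fixed seed repeats the draw."""
    rng = random.Random(seed)  # private generator, leaves global state untouched
    return [rng.uniform(0.0, duration_seconds) for _ in range(num_points)]


if __name__ == "__main__":
    first = sample_seek_points(13.013, 10, seed=0)
    second = sample_seek_points(13.013, 10, seed=0)
    print("identical seek points across runs:", first == second)
```

With a fixed seed, every run seeks to the same timestamps, so any remaining variance comes from the decoders and the machine rather than from the sampled positions.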

benchmarks/decoders/benchmark_decoders.py

scotts added 3 commits November 5, 2024 13:07

More benchmark refactoring

5937eac

Merge branch 'main' of github.com:pytorch/torchcodec into readme_decord

6601d50

Fix formatting

6ddfa6c

facebook-github-bot added the CLA Signed label Nov 5, 2024

scotts force-pushed the readme_decord branch 2 times, most recently from 529fb38 to 6831c2d November 6, 2024 05:58

scotts marked this pull request as ready for review November 6, 2024 06:00

scotts requested a review from ahmadsharif1 November 6, 2024 06:00

NicolasHug approved these changes Nov 6, 2024

ahmadsharif1 approved these changes Nov 6, 2024

benchmarks/decoders/benchmark_decoders.py Outdated

Benchmark code formatting, more refactoring

26c9ccb

scotts force-pushed the readme_decord branch from 6831c2d to 26c9ccb November 6, 2024 15:46

scotts merged commit 8ac81b7 into meta-pytorch:main Nov 6, 2024
21 of 29 checks passed

scotts deleted the readme_decord branch November 6, 2024 15:59

scotts mentioned this pull request Nov 6, 2024

Resolve sampler benchmark variability with setting random seed #340

Merged
