More benchmark refactoring, more benchmarks. #337
Merged
This PR makes the following changes:
- Uses the name `DecordAccurate` for the Decord decoder kind, because the API calls use accurate seeking.
- Adds `DecordAccurateBatch`. This uses the batch APIs. We believe this is an accurate API.
- Renames the `TorchCodecCore` decoder kind to `TorchCodecCoreNonBatch`.
- Adds a new `TorchCodecCore`. While it has the same name as a previous decoder kind, it uses the best core API for each scenario. We can directly compare it to `TorchCodecPublic`. Any systematic difference is likely caused by the logic in `VideoDecoder` itself.
- Removes the use of `timeit` inside of the experiments. If we want that data, we should create separate experiments for it. In general, if we are going to run something for N iterations and then time how long the N iterations take, we can't also time each of the N iterations individually. We don't want the cost of the fine-grained timers to add to the overall time. If we want fine-grained timers, we can't time the batch. And if we time the batch, we can't use fine-grained timers.
- Refactors `benchmark_decoders.py` so that we have a registry of decoder kinds, and we access that registry to know what to run. This eliminates a lot of the bespoke logic. Adding new decoder kinds is now easy: just add a new entry to the registry, and the rest of the code works. As a bonus, this unifies specifying and adding options for decoder kinds.

The following results were run with:
These are four different calls of the above:
Some observations:
a. `DecordAccurate`
b. `DecordAccurateBatch`
c. `TorchVision`
d. `TorchCodecCoreBatch`

a. `TorchAudio`
b. `TorchCodecCoreNonBatch`
c. `TorchCodecCore`
d. `TorchCodecPublic`

- `TorchCodecCore` is consistently slightly faster than `TorchCodecPublic`. This means we have an opportunity to shave off some time in the logic in the public API.
- While `TorchCodecCore` and `TorchCodecPublic` display variation across runs, they notably always move together within a run. That is, if `TorchCodecCore` has a "good" run, then so does `TorchCodecPublic`. That means there may be something systematic going on that determines whether a run is "good" or not. Maybe something to do with how the video gets laid out in memory?
- `TorchVision` is consistently the best performer in 100 next.
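
For readers unfamiliar with the registry pattern described in the list of changes, here is a minimal sketch of the idea. It is not the code in `benchmark_decoders.py`: `DecoderKind`, `FakeDecoder`, `DECODER_REGISTRY`, and `run_benchmark` are hypothetical names invented for illustration, and the stand-in decoder does no real video decoding. The sketch also times each run of N iterations as a whole, with no per-iteration timers inside the loop.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class DecoderKind:
    """Hypothetical registry entry: display name, factory, default options."""
    display_name: str
    factory: Callable[..., Any]
    default_options: Dict[str, Any] = field(default_factory=dict)


class FakeDecoder:
    """Stand-in decoder so the sketch runs without any video library."""

    def __init__(self, step: int = 1):
        self.step = step

    def decode_frames(self, num_frames: int) -> list:
        # Pretend each "frame" is just an integer.
        return [i * self.step for i in range(num_frames)]


# Adding a decoder kind means adding one entry here; the runner is unchanged.
DECODER_REGISTRY: Dict[str, DecoderKind] = {
    "fake_step1": DecoderKind("FakeStep1", FakeDecoder),
    "fake_step2": DecoderKind("FakeStep2", FakeDecoder, {"step": 2}),
}


def run_benchmark(kind_names, num_iterations: int = 100, num_frames: int = 10):
    """Return total wall-clock seconds per decoder kind for N iterations."""
    results = {}
    for name in kind_names:
        kind = DECODER_REGISTRY[name]
        decoder = kind.factory(**kind.default_options)
        # One timer around the whole loop, none inside it, so the timer
        # overhead is not multiplied by the number of iterations.
        start = time.perf_counter()
        for _ in range(num_iterations):
            decoder.decode_frames(num_frames)
        results[kind.display_name] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    for display_name, seconds in run_benchmark(DECODER_REGISTRY.keys()).items():
        print(f"{display_name}: {seconds:.6f}s")
```

Keeping options in the registry entry means the runner never needs a per-decoder branch. Per-iteration timings, if wanted, would go in a separate experiment.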