[Data] [stats] Add RefBundle retrieval time metric to iterator dataset stats #58422
Signed-off-by: xgui <[email protected]>
Signed-off-by: xgui <[email protected]>
Signed-off-by: xgui <[email protected]>
Code Review
This pull request introduces a new metric, iter_get_ref_bundles_s, to measure the time spent retrieving RefBundles during dataset iteration. The changes are well-integrated across the stats collection, reporting, and testing components. I've found one area for improvement to ensure the new metric is captured consistently.
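As an illustration of the kind of measurement described above, here is a minimal sketch of timing each retrieval from an upstream iterator and accumulating avg/min/max/total. The names `RetrievalStats` and `timed_iter` are hypothetical and are not taken from the PR or from Ray; the defaults for an unused accumulator (avg and min of `inf`, max and total of 0) are chosen only to match the `prefetch_block` entries in the PR's example output.

```python
import time


class RetrievalStats:
    """Hypothetical accumulator for per-call timings (not Ray's own class)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.min = float("inf")  # stays inf until a sample is recorded
        self.max = 0.0

    def add(self, elapsed):
        # Record one measured duration, in seconds.
        self.total += elapsed
        self.count += 1
        self.min = min(self.min, elapsed)
        self.max = max(self.max, elapsed)

    def avg(self):
        # Assumption: report inf when nothing was recorded.
        return self.total / self.count if self.count else float("inf")


def timed_iter(source, stats):
    """Yield items from `source`, timing only the retrieval of each item."""
    it = iter(source)
    while True:
        start = time.perf_counter()
        try:
            item = next(it)
        except StopIteration:
            return
        # Time spent by the consumer between yields is not counted.
        stats.add(time.perf_counter() - start)
        yield item


stats = RetrievalStats()
items = list(timed_iter(range(3), stats))
print(items, stats.count)
```

Timing only the `next()` call keeps downstream work (fetching, batching, training) out of the retrieval figure, which is the separation the new metric is meant to provide.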
Signed-off-by: Xinyuan <[email protected]>
Signed-off-by: xgui <[email protected]>
srinathk10 left a comment
LGTM
…t stats (ray-project#58422)

## Why These Changes Are Needed

This PR adds a new metric to track the time spent retrieving `RefBundle` objects during dataset iteration. This metric provides better visibility into the performance breakdown of batch iteration, specifically capturing the time spent in `get_next_ref_bundle()` calls within the `prefetch_batches_locally` function.

## Related Issue Number

N/A

## Example

```
dataloader/train = {
    'producer_throughput': 8361.841782656593,
    'iter_stats': {
        'prefetch_block-avg': inf, 'prefetch_block-min': inf, 'prefetch_block-max': 0, 'prefetch_block-total': 0,
        'get_ref_bundles-avg': 0.05172277254545271, 'get_ref_bundles-min': 1.1991999997462699e-05, 'get_ref_bundles-max': 11.057470971999976, 'get_ref_bundles-total': 15.361663445999454,
        'fetch_block-avg': 0.31572694455743233, 'fetch_block-min': 0.0006362799999806157, 'fetch_block-max': 2.1665870369999993, 'fetch_block-total': 93.45517558899996,
        'block_to_batch-avg': 0.001048687573988573, 'block_to_batch-min': 2.10620000302697e-05, 'block_to_batch-max': 0.049948245999985375, 'block_to_batch-total': 2.048086831999683,
        'format_batch-avg': 0.0001013781433686053, 'format_batch-min': 1.415700000961806e-05, 'format_batch-max': 0.009682661999988795, 'format_batch-total': 0.19799151399888615,
        'collate-avg': 0.01303446213312943, 'collate-min': 0.00025646699998560507, 'collate-max': 0.9855495820000328, 'collate-total': 25.456304546001775,
        'finalize-avg': 0.012211385266257683, 'finalize-min': 0.004209667999987232, 'finalize-max': 0.3785081949999949, 'finalize-total': 23.848835425001255,
        'time_spent_blocked-avg': 0.04783407008137157, 'time_spent_blocked-min': 1.2316999971062614e-05, 'time_spent_blocked-max': 12.46102861700001, 'time_spent_blocked-total': 93.46777293900004,
        'time_spent_training-avg': 0.015053571562211652, 'time_spent_training-min': 1.3704999958008557e-05, 'time_spent_training-max': 1.079616685000019, 'time_spent_training-total': 29.399625260999358
    }
}
```

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: xgui <[email protected]>
Signed-off-by: Xinyuan <[email protected]>
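To read the example output as a breakdown, one possible approach is to pull out the `-total` entries and rank the stages by cumulative time. The helper name `totals_by_stage` is hypothetical, and the sample dict below is a rounded subset of the example values in the PR description, not new measurements.

```python
def totals_by_stage(iter_stats):
    """Return (stage, total_seconds) pairs, largest total first."""
    suffix = "-total"
    totals = {
        key[: -len(suffix)]: value
        for key, value in iter_stats.items()
        if key.endswith(suffix)
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Rounded subset of the 'iter_stats' example from the PR description.
sample = {
    "get_ref_bundles-avg": 0.0517,
    "get_ref_bundles-total": 15.36,
    "fetch_block-total": 93.46,
    "collate-total": 25.46,
    "finalize-total": 23.85,
}

for stage, seconds in totals_by_stage(sample):
    print(f"{stage}: {seconds:.2f}s")
```

In this subset, `fetch_block` accounts for the largest total and `get_ref_bundles` the smallest, so the new metric makes it possible to tell `RefBundle` retrieval time apart from block fetching.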