Description
@Zenglinxiao
When you implemented #1912 you added `__setstate__` / `__getstate__` logic for multiprocessing.
If I am not wrong, and @anderleich / @panosk faced the same issue, here is what is happening:
When using build_vocab.py there is a call to make_transforms() in the main process, and then we spawn n_threads processes. Because we pass the transforms created in the main process, the pickling/unpickling mechanism triggers another call to warm_up() in `__setstate__`, hence we could avoid the first call to warm_up() in make_transforms().
Even when we use n_threads=1 we still spawn another process, so the behavior is the same.
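The mechanism above can be illustrated with a minimal, hypothetical sketch (this is not OpenNMT-py's actual transform code; the class, method names, and counter are only stand-ins): a transform that warms up in `make_transforms()` and then warms up *again* when it is unpickled in a spawned worker.

```python
import pickle


class Transform:
    """Hypothetical sketch of a transform whose heavy state is rebuilt per process."""

    def __init__(self):
        self.warm_up_calls = 0

    def warm_up(self):
        # Stands in for loading big resources (models, vocabularies, ...).
        self.warm_up_calls += 1

    def __getstate__(self):
        # Heavy resources are dropped before pickling to the worker.
        return {}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.warm_up_calls = 0
        # Unpickling in the worker process triggers warm_up again.
        self.warm_up()


def make_transforms():
    t = Transform()
    t.warm_up()  # first call, in the main process
    return t


t = make_transforms()
# Spawn-based multiprocessing pickles the transform and unpickles it
# in the worker; a pickle round-trip simulates that here.
worker_copy = pickle.loads(pickle.dumps(t))
# Result: warm_up ran once in the main process AND once in the worker,
# so the expensive loading happened twice for the same transform.
```

With this layout, the first `warm_up()` in `make_transforms()` is wasted whenever the transform is only ever used inside a spawned worker.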
When we train the story is a little different.
If we use num_workers=0 the DataLoader is not used; everything happens in the main process, hence calling warm_up() is required somewhere (currently in the make_transforms() call of build_dynamic_dataset_iter).
If num_workers>0 then we fall back into the same situation as in build_vocab.
What do you think would be the best approach to avoid the double warm_up (which is quite annoying for transforms that load big stuff)?
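One pattern that is sometimes used for this kind of problem (offered only as a discussion starter, not as OpenNMT-py's actual design; all names below are hypothetical) is to make warm_up() idempotent with a guard flag, and reset the flag in `__getstate__` so each process loads exactly once:

```python
import pickle


class Transform:
    """Sketch: an idempotent warm_up so a repeated call is a cheap no-op."""

    def __init__(self):
        self._warmed_up = False
        self.load_count = 0

    def warm_up(self):
        if self._warmed_up:
            return  # already warmed up in this process: skip the heavy load
        self.load_count += 1  # stands in for loading big resources
        self._warmed_up = True

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_warmed_up"] = False  # force a fresh load in the new process
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.warm_up()


t = Transform()
t.warm_up()
t.warm_up()  # no-op: the guard prevents a double load in the same process
# After a pickle round-trip (simulating a spawned worker), the copy
# reloads once in its own process, but never twice per process.
worker_copy = pickle.loads(pickle.dumps(t))
```

This keeps the main-process call in make_transforms() harmless for the num_workers=0 / single-process path, while spawned workers still get exactly one load each.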