Description
@Zenglinxiao
When you implemented #1912 you added `__setstate__` / `__getstate__` logic for multiprocessing.
If I am not wrong, and @anderleich / @panosk faced the same issue, here is what is happening:
When using build_vocab.py there is a call to make_transforms() in the main process, and then we spawn n_threads processes. Because we pass the transforms created in the main process, the pickling/unpickling mechanism triggers another call to warm_up() in `__setstate__`, hence we could avoid the first call to warm_up() in make_transforms().
Even when we use n_threads=1 we still spawn another process, so the behavior is the same.
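The mechanism above can be illustrated with a minimal, hypothetical sketch (this is not OpenNMT-py's actual transform code; the class, method names, and counter are only stand-ins): a transform that warms up in `make_transforms()` and then warms up *again* when it is unpickled in a spawned worker.

```python
import pickle


class Transform:
    """Hypothetical sketch of a transform whose heavy state is rebuilt per process."""

    def __init__(self):
        self.warm_up_calls = 0

    def warm_up(self):
        # Stands in for loading big resources (models, vocabularies, ...).
        self.warm_up_calls += 1

    def __getstate__(self):
        # Heavy resources are dropped before pickling to the worker.
        return {}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.warm_up_calls = 0
        # Unpickling in the worker process triggers warm_up again.
        self.warm_up()


def make_transforms():
    t = Transform()
    t.warm_up()  # first call, in the main process
    return t


t = make_transforms()
# Spawn-based multiprocessing pickles the transform and unpickles it
# in the worker; a pickle round-trip simulates that here.
worker_copy = pickle.loads(pickle.dumps(t))
# Result: warm_up ran once in the main process AND once in the worker,
# so the expensive loading happened twice for the same transform.
```

With this layout, the first `warm_up()` in `make_transforms()` is wasted whenever the transform is only ever used inside a spawned worker.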
When we train the story is a little different.
If we use num_workers=0 the DataLoader is not used; everything happens in the main process, hence calling warm_up() is required somewhere (currently in the make_transforms() call of build_dynamic_dataset_iter).
If num_workers>0 then we fall back into the same situation as in build_vocab.
What do you think would be the best approach to avoid the double warm_up (which is quite annoying for transforms that load big stuff)?
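One pattern that is sometimes used for this kind of problem (offered only as a discussion starter, not as OpenNMT-py's actual design; all names below are hypothetical) is to make warm_up() idempotent with a guard flag, and reset the flag in `__getstate__` so each process loads exactly once:

```python
import pickle


class Transform:
    """Sketch: an idempotent warm_up so a repeated call is a cheap no-op."""

    def __init__(self):
        self._warmed_up = False
        self.load_count = 0

    def warm_up(self):
        if self._warmed_up:
            return  # already warmed up in this process: skip the heavy load
        self.load_count += 1  # stands in for loading big resources
        self._warmed_up = True

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_warmed_up"] = False  # force a fresh load in the new process
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.warm_up()


t = Transform()
t.warm_up()
t.warm_up()  # no-op: the guard prevents a double load in the same process
# After a pickle round-trip (simulating a spawned worker), the copy
# reloads once in its own process, but never twice per process.
worker_copy = pickle.loads(pickle.dumps(t))
```

This keeps the main-process call in make_transforms() harmless for the num_workers=0 / single-process path, while spawned workers still get exactly one load each.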