Multiple calls to transforms warm_up() #2317

@vince62s

@Zenglinxiao
When you implemented #1912 you added __setstate__ / __getstate__ logic for multiprocessing.

If I am not wrong, and @anderleich / @panosk faced the same issue, here is what is happening:

When using build_vocab.py, there is a call to make_transforms() in the main process, and then we spawn n_threads processes. Because we pass them the transforms created in the main process, the pickling/unpickling mechanism triggers another call to warm_up() in __setstate__. Hence we could avoid the first call to warm_up() in make_transforms().
Even when we use n_threads=1 we spawn another process, so the behavior is the same.
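
To make the sequence concrete, here is a minimal sketch of the pattern as I understand it. ToyTransform and this make_transforms() are hypothetical stand-ins, not the actual OpenNMT-py code, and a pickle round trip in a single process stands in for sending the object to a spawned worker:

```python
import pickle


class ToyTransform:
    """Hypothetical stand-in for a transform; not the OpenNMT-py class."""

    warm_up_calls = 0  # class-level counter, shared within one process

    def __init__(self, opts):
        self.opts = opts
        self.resource = None

    def warm_up(self):
        # Imagine loading a large model or vocabulary here.
        type(self).warm_up_calls += 1
        self.resource = f"loaded({self.opts})"

    def __getstate__(self):
        # Only the options are pickled, not the heavy resource.
        return {"opts": self.opts}

    def __setstate__(self, state):
        self.opts = state["opts"]
        self.resource = None
        # Rebuilding the resource on unpickling is the second warm_up.
        self.warm_up()


def make_transforms(opts):
    # Hypothetical simplification: build one transform and warm it up.
    transform = ToyTransform(opts)
    transform.warm_up()  # first call, in the main process
    return transform


main_transform = make_transforms("bpe")
# Stand-in for what happens when the object is handed to a worker process.
worker_transform = pickle.loads(pickle.dumps(main_transform))
print(ToyTransform.warm_up_calls)  # 2
```

In a real multi-process run the two calls happen in different processes, so the heavy loading done in the main process is simply wasted.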

When we train, the story is a little different.
If we use num_worker=0, the DataLoader is not used and everything happens in the main process, hence calling warm_up() is required somewhere (currently in the make_transforms() call of build_dynamic_dataset_iter).
If num_worker>0, then we fall back into the same situation as in build_vocab.

What do you think would be the best approach to avoid the double warm_up() (which is quite annoying for transforms that load big resources)?
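
One possible direction, purely as a sketch to anchor the discussion: defer warm_up() until the first time a transform is actually applied, so that neither make_transforms() nor __setstate__ loads anything. LazyToyTransform and its apply() method are hypothetical names for illustration, not the existing API, and I have not checked how this would fit the real transforms:

```python
import pickle


class LazyToyTransform:
    """Hypothetical transform that warms up on first use only."""

    warm_up_calls = 0  # class-level counter, shared within one process

    def __init__(self, opts):
        self.opts = opts
        self.resource = None  # nothing loaded yet

    def warm_up(self):
        # Imagine loading a large model or vocabulary here.
        type(self).warm_up_calls += 1
        self.resource = f"loaded({self.opts})"

    def apply(self, example):
        # Load lazily, in whichever process ends up doing the work.
        if self.resource is None:
            self.warm_up()
        return f"{example}|{self.resource}"

    def __getstate__(self):
        # Only the options are pickled, not the heavy resource.
        return {"opts": self.opts}

    def __setstate__(self, state):
        self.opts = state["opts"]
        self.resource = None  # defer loading to the first apply()


main_transform = LazyToyTransform("bpe")  # no warm_up here
# Stand-in for handing the object to a worker process: still no warm_up.
worker_transform = pickle.loads(pickle.dumps(main_transform))
first = worker_transform.apply("a")
second = worker_transform.apply("b")
print(LazyToyTransform.warm_up_calls)  # 1
```

With this shape the num_worker=0 case would also warm up once, on the first apply() in the main process. The trade-off is that a loading failure shows up at first use rather than at construction time.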

cc @francoishernandez
