doesn't work with multi-process DataLoader #4

@vayzenb

Hi,

I'm trying to speed up training by using multiple worker processes in the DataLoader, but this throws an error. Any insight would be appreciated.

Here is a snippet of the code:

trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True)
valloader = torch.utils.data.DataLoader(val_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)

learning_rule = KrotovsRule()
optimizer = Local(named_params=model.named_parameters(), lr=0.01)

evaluator = HebbianEvaluator(model=model, score_name='accuracy',
                                score_function=lambda engine: engine.state.metrics['accuracy'], epochs=1, supervised_from=-1)

trainer = HebbianTrainer(model=model, learning_rule=learning_rule, optimizer=optimizer, supervised_from=-1, device='cuda')

evaluator.attach(trainer.engine, Events.EPOCH_COMPLETED(every=1), trainloader, valloader)

trainer.run(train_loader=trainloader, epochs=1)

Here is the error:

Traceback (most recent call last):
  File "cornet_hebbian_training.py", line 84, in <module>
    trainer.run(train_loader=valloader, epochs=1)
  File "/home/vayzenbe/GitHub_Repos/GiNN/Models/pytorch_hebbian/trainers.py", line 31, in run
    self.engine.run(train_loader, max_epochs=epochs)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 702, in run
    return self._internal_run()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 775, in _internal_run
    self._handle_exception(e)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
    raise e
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 745, in _internal_run
    time_taken = self._run_once_on_dataset()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 850, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
    raise e
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 801, in _run_once_on_dataset
    self.state.batch = next(self._dataloader_iter)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vayzenbe/GitHub_Repos/GiNN/Models/load_without_faces.py", line 62, in __getitem__
    tensor_image = self.transform(image)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 869, in forward
    i, j, h, w = self.get_params(img, self.scale, self.ratio)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 831, in get_params
    log_ratio = torch.log(torch.tensor(ratio))
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
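
The last line of the traceback names the 'spawn' start method as the way to use CUDA with multiprocessing. Below is a minimal sketch of how that could be requested for the loader's workers, using the `multiprocessing_context` argument of `torch.utils.data.DataLoader`. The helper name `make_spawn_loader` and its defaults are hypothetical and not part of pytorch_hebbian. This is untested against the setup above. The traceback shows `torch.tensor(ratio)` inside the transform initializing CUDA in the worker, which might indicate that the default tensor type is set to CUDA somewhere, so switching the start method may not be the whole story.

```python
import torch
from torch.utils.data import DataLoader


def make_spawn_loader(dataset, batch_size=128, num_workers=4):
    """Hypothetical helper: build a DataLoader whose workers are spawned, not forked."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        # Pinned memory only helps when a GPU is present.
        pin_memory=torch.cuda.is_available(),
        # Request the 'spawn' start method named in the error message.
        # This argument is only valid when num_workers > 0.
        multiprocessing_context="spawn",
    )
```

With the snippet above, this would be called as `valloader = make_spawn_loader(val_dataset)`. Spawned workers re-import the main script and receive a pickled copy of the dataset, so the training code would need to sit under an `if __name__ == "__main__":` guard and the dataset (including its transforms) would need to be picklable.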
