Hi,
I'm trying to speed up training by using multiprocessing in the DataLoader (`num_workers > 0`), but this throws an error. Any insight would be appreciated.
Here is a snippet of my code:
```python
import torch
from ignite.engine import Events

# model, train_dataset, val_dataset and the pytorch_hebbian classes
# (KrotovsRule, Local, HebbianEvaluator, HebbianTrainer) are defined/imported elsewhere in the script.
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True)
# num_workers=4 loads batches in 4 worker processes
valloader = torch.utils.data.DataLoader(val_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)
learning_rule = KrotovsRule()
optimizer = Local(named_params=model.named_parameters(), lr=0.01)
evaluator = HebbianEvaluator(model=model, score_name='accuracy',
                             score_function=lambda engine: engine.state.metrics['accuracy'],
                             epochs=1, supervised_from=-1)
trainer = HebbianTrainer(model=model, learning_rule=learning_rule, optimizer=optimizer,
                         supervised_from=-1, device='cuda')
# Run the evaluator at the end of every training epoch
evaluator.attach(trainer.engine, Events.EPOCH_COMPLETED(every=1), trainloader, valloader)
trainer.run(train_loader=trainloader, epochs=1)
```
Here is the error (from a run that passed `valloader`, the loader with `num_workers=4`, as `train_loader`):
```
Traceback (most recent call last):
  File "cornet_hebbian_training.py", line 84, in <module>
    trainer.run(train_loader=valloader, epochs=1)
  File "/home/vayzenbe/GitHub_Repos/GiNN/Models/pytorch_hebbian/trainers.py", line 31, in run
    self.engine.run(train_loader, max_epochs=epochs)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 702, in run
    return self._internal_run()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 775, in _internal_run
    self._handle_exception(e)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
    raise e
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 745, in _internal_run
    time_taken = self._run_once_on_dataset()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 850, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 469, in _handle_exception
    raise e
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 801, in _run_once_on_dataset
    self.state.batch = next(self._dataloader_iter)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vayzenbe/GitHub_Repos/GiNN/Models/load_without_faces.py", line 62, in __getitem__
    tensor_image = self.transform(image)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 869, in forward
    i, j, h, w = self.get_params(img, self.scale, self.ratio)
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 831, in get_params
    log_ratio = torch.log(torch.tensor(ratio))
  File "/home/vayzenbe/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
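The last line of the worker traceback points at one fix: start the DataLoader workers with the `'spawn'` method instead of the default fork. Below is a minimal sketch for a single loader, assuming a PyTorch version whose `DataLoader` accepts the `multiprocessing_context` argument. `make_spawn_loader` is a hypothetical helper, and the random `TensorDataset` is a hypothetical stand-in for the real `val_dataset`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_spawn_loader(dataset, batch_size=128, num_workers=4):
    """Hypothetical helper: a DataLoader whose workers use the 'spawn' start method.

    'spawn' starts each worker as a fresh Python interpreter instead of
    forking the parent, so workers do not inherit the parent's CUDA state.
    """
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
        multiprocessing_context="spawn",
    )


# Stand-in for val_dataset (hypothetical random data on the CPU).
val_dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
# No worker processes are started until the loader is iterated.
valloader = make_spawn_loader(val_dataset)
```

With `'spawn'`, each worker re-imports the main script. The code that iterates over the loader (here `trainer.run(...)`) therefore has to sit under an `if __name__ == "__main__":` guard, and the dataset and its transforms must be picklable. Calling `torch.multiprocessing.set_start_method("spawn")` once inside that guard changes the default for every loader instead. The worker traceback also shows `torch.tensor(ratio)` in `RandomResizedCrop.get_params` initializing CUDA. This suggests the default tensor type was set to a CUDA type somewhere, for example with `torch.set_default_tensor_type`. That is an assumption, since that part of the script isn't shown. If it is the case, keeping the default on the CPU and moving batches to the GPU in the main process would keep the workers from touching CUDA at all.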