Description
I tried to train the model on my own dataset and got the following error:
UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at C:\actions-runner_work\pytorch\pytorch\pytorch\aten\src\ATen\Context.cpp:85.)
Using a different number of positional encodings than DINOv2, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Using patch size 16 instead of 14, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Loading pretrain weights
Unable to initialize TensorBoard. Logging is turned off for this session. Run 'pip install tensorboard' to enable logging.
Not using distributed mode
git:
sha: 443f480e2406840b1024296e5b4199c74a70a0d0, status: has uncommited changes, branch: main
Namespace(num_classes=2, grad_accum_steps=4, amp=True, lr=0.0001, lr_encoder=0.00015, batch_size=4, weight_decay=0.0001, epochs=10, lr_drop=100, clip_max_norm=0.1, lr_vit_layer_decay=0.8, lr_component_decay=0.7, do_benchmark=False, dropout=0, drop_path=0.0, drop_mode='standard', drop_schedule='constant', cutoff_epoch=0, pretrained_encoder=None, pretrain_weights='rf-detr-nano.pth', pretrain_exclude_keys=None, pretrain_keys_modify_to_load=None, pretrained_distiller=None, encoder='dinov2_windowed_small', vit_encoder_num_layers=12, window_block_indexes=None, position_embedding='sine', out_feature_indexes=[3, 6, 9, 12], freeze_encoder=False, layer_norm=True, rms_norm=False, backbone_lora=False, force_no_pretrain=False, dec_layers=2, dim_feedforward=2048, hidden_dim=256, sa_nheads=8, ca_nheads=16, num_queries=300, group_detr=13, two_stage=True, projector_scale=['P4'], lite_refpoint_refine=True, num_select=300, dec_n_points=2, decoder_norm='LN', bbox_reparam=True, freeze_batch_norm=False, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, cls_loss_coef=1.0, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, aux_loss=True, sum_group_losses=False, use_varifocal_loss=False, use_position_supervised_loss=False, ia_bce_loss=True, dataset_file='roboflow', coco_path=None, dataset_dir='C:\codes\obstacle_detector\data\dataset', square_resize_div_64=True, output_dir='output', dont_save_weights=False, checkpoint_interval=10, seed=42, resume='', start_epoch=0, eval=False, use_ema=True, ema_decay=0.993, ema_tau=100, num_workers=2, device='cuda', world_size=1, dist_url='env://', sync_bn=True, fp16_eval=False, encoder_only=False, backbone_only=False, resolution=384, use_cls_token=False, multi_scale=True, expanded_scales=True, do_random_resize_via_padding=False, warmup_epochs=0.0, lr_scheduler='step', lr_min_factor=0.0, early_stopping=False, early_stopping_patience=10, early_stopping_min_delta=0.001, early_stopping_use_ema=False, gradient_checkpointing=False, patch_size=16, num_windows=2, positional_encoding_size=24, mask_downsample_ratio=4, tensorboard=True, wandb=False, project=None, run=None, class_names=['obstacle'], run_test=True, segmentation_head=False, distributed=False)
number of params: 30147076
[544]
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[544]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[544]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Get benchmark
Start training
Grad accum steps: 4
Total batch size: 16
LENGTH OF DATA LOADER: 43
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "c:\codes\obstacle_detector\rfdetr\train.py", line 5, in
model.train(
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\rfdetr\detr.py", line 83, in train
self.train_from_config(config, **kwargs)
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\rfdetr\detr.py", line 191, in train_from_config
self.model.train(
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\rfdetr\main.py", line 341, in train
train_stats = train_one_epoch(
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\rfdetr\engine.py", line 88, in train_one_epoch
for data_iter_step, (samples, targets) in enumerate(
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\rfdetr\util\misc.py", line 239, in log_every
for obj in iterable:
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\torch\utils\data\dataloader.py", line 494, in iter
return self._get_iterator()
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\torch\utils\data\dataloader.py", line 427, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\codes\obstacle_detector\rfdetr\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1170, in init
w.start()
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\0000018283959\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
The code I used was:
from rfdetr.detr import RFDETRNano

model = RFDETRNano(pretrain_weights='rf-detr-nano.pth', device='cuda')
model.train(
    dataset_dir='dataset',
    epochs=10,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
    output_dir='output'
)
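For context, this is the standard Windows spawn-start error: the DataLoader workers (num_workers=2 in the config dump above) re-import the main module, and without a main guard the top-level model.train(...) call runs again inside each worker before bootstrapping finishes. A minimal sketch of the same script wrapped in the if __name__ == '__main__': guard that the RuntimeError asks for (same arguments and file names as above, nothing else changed) looks like:

from rfdetr.detr import RFDETRNano


def main():
    # Same model and arguments as in the failing script above.
    model = RFDETRNano(pretrain_weights='rf-detr-nano.pth', device='cuda')
    model.train(
        dataset_dir='dataset',
        epochs=10,
        batch_size=4,
        grad_accum_steps=4,
        lr=1e-4,
        output_dir='output',
    )


if __name__ == '__main__':
    # On Windows, multiprocessing uses the "spawn" start method, which
    # re-imports this module in every DataLoader worker. The guard keeps
    # the training call from being re-executed in those child processes.
    main()

Setting num_workers=0 in the training config should also sidestep spawning worker processes entirely, but the guard is the idiomatic fix the error message points at.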
Environment:
Windows 11
rfdetr 1.3.0
torch 2.9.0+cu130
torchvision 0.24.0+cu130