Hi,
On the first run of the notebook "llama3_text2cypher_chat.ipynb" I got a CUDA out-of-memory error when executing the cell with trainer.train().
I run it locally on a PC with an RTX 2070 (8 GB of VRAM), and I use the quantized version by setting "load_in_4bit" to True in the first cells.
Just loading the model with "model, tokenizer = FastLanguageModel.from_pretrained(...)" already consumes 6 GB of VRAM, even for the 4-bit quantized one.
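For reference, here is roughly what I mean (the model name and max_seq_length below are placeholders, I use exactly what the notebook specifies), plus how I checked the VRAM usage right after loading:

```python
from unsloth import FastLanguageModel
import torch

# Roughly the loading cell from the notebook:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # placeholder, I use the notebook's value
    max_seq_length = 2048,                       # placeholder, I use the notebook's value
    dtype = None,                                # auto-detect
    load_in_4bit = True,                         # set to True, as in the first cells
)

# How I measured the ~6 GB of VRAM after loading:
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
```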
Is this normal? I really need to run it on my machine and not on Colab.
Thanks,