
Issue in inference_s2s_batch.sh #218

@Lalaramarya

Description


Thank you for your help in resolving the earlier issues! However, I'm now facing a new problem during inference:

Generating: 0%| | 0/3000 [00:00<?, ?it/s]We detected that you are passing past_key_values as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate Cache class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
Generating: 16%|████████████████████████▊ | 469/3000 [00:24<02:12, 19.07it/s]
[2025-03-31 20:48:37][root][INFO] - LLM Inference Time: 25.14s
Error executing job with overrides: ['++model_config.llm_name=qwen2-0.5b', '++model_config.llm_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/Qwen2-0.5B', '++model_config.llm_dim=896', '++model_config.encoder_name=whisper', '++model_config.encoder_projector_ds_rate=5', '++model_config.encoder_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/small.pt', '++model_config.encoder_dim=768', '++model_config.encoder_projector=linear', '++model_config.codec_decoder_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/pretrained_models/CosyVoice-300M-SFT', '++model_config.codec_decode=true', '++model_config.vocab_config.code_layer=3', '++model_config.vocab_config.total_audio_vocabsize=4160', '++model_config.vocab_config.total_vocabsize=156160', '++model_config.code_type=CosyVoice', '++model_config.codec_decoder_type=CosyVoice', '++model_config.group_decode=true', '++model_config.group_decode_adapter_type=linear', '++dataset_config.dataset=speech_dataset_s2s', '++dataset_config.val_data_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/dev_manifest.jsonl', '++dataset_config.train_data_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/dev_manifest.jsonl', '++dataset_config.input_type=mel', '++dataset_config.mel_size=80', '++dataset_config.inference_mode=true', '++dataset_config.manifest_format=jsonl', '++dataset_config.split_size=0.002', '++dataset_config.load_from_cache_file=false', '++dataset_config.task_type=s2s', '++dataset_config.seed=777', '++dataset_config.vocab_config.code_layer=3', '++dataset_config.vocab_config.total_audio_vocabsize=4160', '++dataset_config.vocab_config.total_vocabsize=156160', '++dataset_config.code_type=CosyVoice', '++dataset_config.num_latency_tokens=0', '++dataset_config.do_layershift=false', '++train_config.model_name=s2s', '++train_config.freeze_encoder=true', '++train_config.freeze_llm=true', '++train_config.freeze_encoder_projector=true', '++train_config.freeze_group_decode_adapter=true', '++train_config.batching_strategy=custom', '++train_config.num_epochs=1', '++train_config.val_batch_size=1', '++train_config.num_workers_dataloader=2', '++train_config.task_type=s2s', '++decode_config.text_repetition_penalty=1.2', '++decode_config.audio_repetition_penalty=1.2', '++decode_config.max_new_tokens=3000', '++decode_config.task_type=s2s', '++decode_config.do_sample=false', '++decode_config.top_p=1.0', '++decode_config.top_k=0', '++decode_config.temperature=1.0', '++decode_config.decode_text_only=false', '++decode_config.do_layershift=false', '++decode_log=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English-20250201T121121Z-002/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English/s2s_decode__trp1.2_arp1.2_seed777_greedy', '++decode_config.num_latency_tokens=0', '++ckpt_path=/DATA/Lalaram/SLAM_omni/SLAM-LLM/models/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English-20250201T121121Z-002/Qwen2-0.5b-whisper_small-latency0-group3-single-round-English/model.pt', '++output_text_only=false', '++inference_online=false', '++speech_sample_rate=22050', '++audio_prompt_path=/DATA/Lalaram/SLAM_omni_Jsn/SLAM-LLM/examples/s2s/audio_prompt/en/prompt_3.wav']
Traceback (most recent call last):
File "/DATA/Lalaram/SLAM_omni_Jsn/SLAM-LLM/examples/s2s/inference_s2s.py", line 102, in main_hydra
batch_inference(cfg)
File "/DATA/Lalaram/SLAM_omni_Jsn/SLAM-LLM/examples/s2s/generate/generate_s2s_batch.py", line 176, in main
q.write(key + "\t" + source_text + "\n")
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
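
The traceback points at generate_s2s_batch.py, line 176, where q.write(key + "\t" + source_text + "\n") fails because source_text is None for at least one sample in batch mode. A minimal defensive sketch of that write is shown below; the variable names come from the traceback, and treating a missing transcript as an empty string is an assumption for illustration, not the repository's intended fix:

    # Guard against samples whose source text is missing in the manifest.
    # Writing an empty field keeps the decode log aligned instead of crashing.
    text = source_text if source_text is not None else ""
    q.write(key + "\t" + text + "\n")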

I hit this error when running inference_s2s_batch.sh with both the pre-trained and the fine-tuned model. However, when I load the pre-trained model with inference_s2s_online.sh, it generates both the target text and the audio successfully. Please look into this.
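
Since the online script works but the batch script does not, one likely difference is the jsonl manifest that batch inference reads (dataset_config.val_data_path points at dev_manifest.jsonl). A quick sanity check is to scan the manifest for entries whose text field is missing or null. The field names source_text and key below are assumptions based on the traceback; adjust them to whatever keys the manifest actually uses:

    import json

    manifest = "/DATA/Lalaram/SLAM_omni/SLAM-LLM/Dataset/VoiceAssistant-400K-SLAM-Omni/data/dev_manifest.jsonl"
    with open(manifest) as f:
        for i, line in enumerate(f):
            entry = json.loads(line)
            # Flag entries with no usable source text; these would make
            # source_text None during batch decoding.
            if not entry.get("source_text"):
                print(f"line {i}: missing or empty source_text -> {entry.get('key', '<no key>')}")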
