Description
Using openvino_genai to run inference on the Spark-TTS LLM (Qwen2 architecture): with OVModelForCausalLM the result is correct, but with openvino_genai every generated token is "!", i.e. token id 0 after tokenization.

openvino_genai version: 2025.3.0.0
Code to reproduce:

```python
import time

from transformers import AutoTokenizer
import openvino_genai

prompt = "<|task_tts|><|start_content|>量子力学(quantum mechanics)是描述原子尺度及以下微观世界行为的物理学分支,是现代物理学的两大支柱之一。<|end_content|><|start_global_token|><|bicodec_global_2391|><|bicodec_global_1229|><|bicodec_global_1008|><|bicodec_global_3279|><|bicodec_global_273|><|bicodec_global_1590|><|bicodec_global_1232|><|bicodec_global_2201|><|bicodec_global_1356|><|bicodec_global_2700|><|bicodec_global_972|><|bicodec_global_1061|><|bicodec_global_225|><|bicodec_global_3848|><|bicodec_global_3128|><|bicodec_global_3572|><|bicodec_global_758|><|bicodec_global_4095|><|bicodec_global_2290|><|bicodec_global_3325|><|bicodec_global_3445|><|bicodec_global_2683|><|bicodec_global_972|><|bicodec_global_3911|><|bicodec_global_1265|><|bicodec_global_3342|><|bicodec_global_3305|><|bicodec_global_253|><|bicodec_global_113|><|bicodec_global_3665|><|bicodec_global_507|><|bicodec_global_316|><|end_global_token|>"

pipe = openvino_genai.LLMPipeline("./spark_tts_ov/LLM", device="CPU")
tokenizer = AutoTokenizer.from_pretrained("./spark_tts_ov/LLM")

# Warm-up run
t0 = time.perf_counter()
result = pipe.generate("你好", max_new_tokens=10)
print(f"warm up time: {time.perf_counter() - t0:.4f}")

# Every streamed token comes back as "!" (token id 0)
for token in pipe.generate(prompt, stream=True, max_new_tokens=500):
    print(token)
    print(tokenizer.encode(token))
```
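For comparison, a minimal sketch of the OVModelForCausalLM baseline that produces correct output from the same export. The model directory, `device` argument, and the helper name `run_reference` are assumptions matching the repro above, not part of the original report; the guard lets the script degrade gracefully when the exported model is absent.

```python
from pathlib import Path

MODEL_DIR = "./spark_tts_ov/LLM"  # assumed to be the same export dir as above

def run_reference(prompt: str, max_new_tokens: int = 10):
    # Imports are local so the script still runs when optimum-intel
    # or the exported model is not available.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = OVModelForCausalLM.from_pretrained(MODEL_DIR, device="CPU")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt so only the newly generated ids are inspected.
    new_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return new_ids.tolist()

if Path(MODEL_DIR).exists():
    ids = run_reference("你好")
    print(ids)
    # If the genai bug also reproduced here, every id would be 0.
    print("all-zero ids:", all(i == 0 for i in ids))
else:
    print("model dir missing, skipping baseline run")
```

If this path prints non-zero token ids while the openvino_genai pipeline above yields only id 0, the exported IRs themselves are fine and the difference lies in the genai pipeline.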