Description
Thanks for this work! I took the TinyLlama w8a8 model from Hugging Face and converted it to .mlir using torch-mlir. But when I compile the .mlir file to an IREE .vmfb (iree-compile tinyllama11.mlir --iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu --iree-llvmcpu-target-cpu=host -o tiny_cpu.vmfb), it reports:
tinyllama11.mlir:449:12: note: see current operation: %594 = "torch.aten.scaled_dot_product_attention"(%573, %587, %593, %503, %420, %449, %419, %449) : (!torch.vtensor<[1,32,4,64],f16>, !torch.vtensor<[1,32,4,64],f16>, !torch.vtensor<[1,32,4,64],f16>, !torch.vtensor<[1,1,4,4],i1>, !torch.float, !torch.bool, !torch.float, !torch.bool) -> !torch.vtensor<[1,32,4,64],f16>
Is this because the scaled_dot_product_attention op isn't supported yet? Is there a way for me to decompose attention into conventional ops (similar to matmul + linear), like the sketch below? Thanks!
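For context, here is the kind of decomposition I have in mind, sketched in plain PyTorch (the helper name sdpa_decomposed is my own; this just mirrors the textbook definition, not any specific torch-mlir lowering):

```python
import math
import torch

def sdpa_decomposed(q, k, v, attn_mask=None, scale=None):
    """Scaled dot-product attention written as plain matmul + softmax."""
    if scale is None:
        scale = 1.0 / math.sqrt(q.shape[-1])
    # Attention scores: (..., seq_q, seq_k)
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            # Boolean mask (like the i1 mask in the IR above): True = keep.
            scores = scores.masked_fill(~attn_mask, float("-inf"))
        else:
            # Additive float mask.
            scores = scores + attn_mask
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```

My hope is that substituting something like this before export would leave only matmul/softmax ops in the .mlir instead of torch.aten.scaled_dot_product_attention. Is that the recommended route, or is there a built-in decomposition I should enable instead?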