Description
Thanks for this work! I took the TinyLlama w8a8 model from Hugging Face and converted it to .mlir using torch-mlir. But when I compile the .mlir file to an IREE .vmfb (iree-compile tinyllama11.mlir --iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu --iree-llvmcpu-target-cpu=host -o tiny_cpu.vmfb), it reports:
tinyllama11.mlir:449:12: note: see current operation: %594 = "torch.aten.scaled_dot_product_attention"(%573, %587, %593, %503, %420, %449, %419, %449) : (!torch.vtensor<[1,32,4,64],f16>, !torch.vtensor<[1,32,4,64],f16>, !torch.vtensor<[1,32,4,64],f16>, !torch.vtensor<[1,1,4,4],i1>, !torch.float, !torch.bool, !torch.float, !torch.bool) -> !torch.vtensor<[1,32,4,64],f16>
Is this because the scaled_dot_product_attention op isn't supported yet? Is there a way for me to decompose attention into conventional ops (similar to matmul + linear), like the sketch below? Thanks!
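For context, here is the kind of decomposition I have in mind, sketched in plain PyTorch (the helper name sdpa_decomposed is my own; this just mirrors the textbook definition, not any specific torch-mlir lowering):

```python
import math
import torch

def sdpa_decomposed(q, k, v, attn_mask=None, scale=None):
    """Scaled dot-product attention written as plain matmul + softmax."""
    if scale is None:
        scale = 1.0 / math.sqrt(q.shape[-1])
    # Attention scores: (..., seq_q, seq_k)
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    if attn_mask is not None:
        if attn_mask.dtype == torch.bool:
            # Boolean mask (like the i1 mask in the IR above): True = keep.
            scores = scores.masked_fill(~attn_mask, float("-inf"))
        else:
            # Additive float mask.
            scores = scores + attn_mask
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```

My hope is that substituting something like this before export would leave only matmul/softmax ops in the .mlir instead of torch.aten.scaled_dot_product_attention. Is that the recommended route, or is there a built-in decomposition I should enable instead?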