This revision adds the first matrix intrinsic, llvm.matrix.multiply.
It uses the more general LLVM_OneResultOp for now; the goal is to eventually use the
specific Matrix builders that @fhahn has created recently.
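
For reference, a minimal Python sketch of what llvm.matrix.multiply computes. It assumes the default column-major layout, in which each matrix is passed as a flattened vector. The function and parameter names here are illustrative only and are not part of this revision or of any LLVM/MLIR API.

```python
def matrix_multiply(a, b, outer_rows, inner, outer_cols):
    """Reference semantics sketch for a flattened, column-major matrix multiply.

    a holds an (outer_rows x inner) matrix, b an (inner x outer_cols) matrix,
    both stored column by column in a flat list. The result is the
    (outer_rows x outer_cols) product, flattened the same way.
    """
    assert len(a) == outer_rows * inner
    assert len(b) == inner * outer_cols
    result = [0.0] * (outer_rows * outer_cols)
    for col in range(outer_cols):
        for row in range(outer_rows):
            acc = 0.0
            for k in range(inner):
                # Column-major indexing: element (r, c) lives at c * num_rows + r.
                acc += a[k * outer_rows + row] * b[col * inner + k]
            result[col * outer_rows + row] = acc
    return result


# [[1, 2], [3, 4]] times [[5, 6], [7, 8]], both given column by column.
product = matrix_multiply([1.0, 3.0, 2.0, 4.0], [5.0, 7.0, 6.0, 8.0], 2, 2, 2)
```

Here `product` holds [[19, 22], [43, 50]] in the same column-major flattening.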
When the resulting LLVM IR is piped through:
  opt -O3 -enable-matrix | llc -O3 -march=x86-64 -mcpu=skylake-avx512
it has been verified to generate ymm instructions.
Additional function attribute support will be needed to generate proper zmm instructions, but at least things run end to end.
Benchmarking will be provided separately with the experimental metaprogramming ModelBuilder tool when ready.
Can you elaborate?