This patch maps DestructiveTernaryCommWithRev intrinsics to pseudo instructions. This makes it easier to choose between generating fmla/fmls/fnmla/fnmls and fmad/fmsb/fnmad/fnmsb, which reduces the number of mov instructions generated in compute-intensive code.
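The pairing above can be illustrated with a lane-wise Python model. This is an illustrative sketch, not LLVM code: the function names mirror the SVE mnemonics, `_lanewise` and `pairs_agree` are hypothetical helpers, and predication is ignored so every lane is active. Each fmla-style instruction overwrites the addend register, while its fmad-style partner computes the same value but overwrites a multiplicand register. Being free to pick either form lets the compiler keep whichever input is still live and avoid a mov.

```python
from typing import Callable, Dict, List, Tuple

Vec = List[float]
Ternary = Callable[[Vec, Vec, Vec], Vec]


def _lanewise(op: Callable[[float, float, float], float]) -> Ternary:
    """Lift a scalar three-operand op to a lane-wise vector op (all lanes active)."""
    def apply(a: Vec, b: Vec, c: Vec) -> Vec:
        return [op(x, y, z) for x, y, z in zip(a, b, c)]
    return apply


# Accumulator-destructive forms, operands (Zda, Zn, Zm): the result overwrites the addend Zda.
fmla = _lanewise(lambda da, n, m: da + n * m)
fmls = _lanewise(lambda da, n, m: da - n * m)
fnmla = _lanewise(lambda da, n, m: -da - n * m)
fnmls = _lanewise(lambda da, n, m: -da + n * m)

# Multiplicand-destructive forms, operands (Zdn, Zm, Za): the result overwrites the factor Zdn.
fmad = _lanewise(lambda dn, m, a: a + dn * m)
fmsb = _lanewise(lambda dn, m, a: a - dn * m)
fnmad = _lanewise(lambda dn, m, a: -a - dn * m)
fnmsb = _lanewise(lambda dn, m, a: -a + dn * m)

# Each accumulator-destructive form paired with its "reversed" multiplicand-destructive twin.
PAIRS: Dict[str, Tuple[Ternary, Ternary]] = {
    "fmla/fmad": (fmla, fmad),
    "fmls/fmsb": (fmls, fmsb),
    "fnmla/fnmad": (fnmla, fnmad),
    "fnmls/fnmsb": (fnmls, fnmsb),
}


def pairs_agree(acc: Vec, x: Vec, y: Vec) -> bool:
    """True if every pair computes the same lanes once the destructive operand is swapped."""
    return all(acc_form(acc, x, y) == mul_form(x, y, acc)
               for acc_form, mul_form in PAIRS.values())


# Same value, different register overwritten: fmla clobbers acc, fmad clobbers x.
print(fmla([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))  # [16.0, 26.0]
print(fmad([3.0, 4.0], [5.0, 6.0], [1.0, 2.0]))  # [16.0, 26.0]
print(pairs_agree([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))  # True
```

The model only captures the arithmetic. Which form actually saves a mov depends on which input register is dead after the operation, and that is a register-allocation decision.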
llvm/test/CodeGen/AArch64/sve-intrinsic-fmla-fmad.ll | ||
---|---|---|
32 | (On Diff #490545) | nit: Expect multiple small test cases, one for each of the different instructions. |
ping
llvm/test/CodeGen/AArch64/sve-intrinsic-fmla-fmad.ll | ||
---|---|---|
32 | (On Diff #490545) | Modified. |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
---|---|---|
660 | The intrinsics are defined to always merge into the first source operand, so passing this to a pseudo node which assumes the inactive lanes are undef doesn't seem entirely right. |
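The concern can be illustrated with a lane-wise Python model. This is an illustrative sketch: predicates are modelled as lists of booleans, and `mla_merging_reference` and `active_lanes_match` are hypothetical helpers. For a merging multiply-add, inactive lanes must hold the first source operand. Predicated FMLA provides this for free because it leaves Zda untouched in those lanes. The reversed FMAD form leaves the multiplicand there instead, so the two forms only agree on active lanes, which is enough only when inactive lanes are undef.

```python
from typing import List

Vec = List[float]
Pred = List[bool]


def fmla_m(pg: Pred, zda: Vec, zn: Vec, zm: Vec) -> Vec:
    """Predicated FMLA: active lanes get Zda + Zn*Zm, inactive lanes keep Zda."""
    return [da + n * m if p else da for p, da, n, m in zip(pg, zda, zn, zm)]


def fmad_m(pg: Pred, zdn: Vec, zm: Vec, za: Vec) -> Vec:
    """Predicated FMAD: active lanes get Za + Zdn*Zm, inactive lanes keep Zdn."""
    return [a + dn * m if p else dn for p, dn, m, a in zip(pg, zdn, zm, za)]


def mla_merging_reference(pg: Pred, op1: Vec, op2: Vec, op3: Vec) -> Vec:
    """Merging multiply-add: op1 + op2*op3 on active lanes, op1 on inactive lanes."""
    return [a + b * c if p else a for p, a, b, c in zip(pg, op1, op2, op3)]


def active_lanes_match(pg: Pred, x: Vec, y: Vec) -> bool:
    """Compare two results only on the lanes the predicate enables."""
    return all(a == b for p, a, b in zip(pg, x, y) if p)


pg = [True, False]
op1, op2, op3 = [1.0, 1.0], [3.0, 3.0], [5.0, 5.0]
ref = mla_merging_reference(pg, op1, op2, op3)

print(fmla_m(pg, op1, op2, op3))  # [16.0, 1.0]  matches ref on every lane
print(fmad_m(pg, op2, op3, op1))  # [16.0, 3.0]  inactive lane holds op2, not op1
print(active_lanes_match(pg, fmad_m(pg, op2, op3, op1), ref))  # True
```

So swapping to the multiplicand-destructive form is only safe when nothing depends on the inactive lanes (as with the `_x` forms). Otherwise the merge into the first operand has to be preserved explicitly.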
Hi @lizhijin, I don't think this patch makes much sense, because at the code generation layer we already have pseudo instructions that allow better FMLA/FMAD usage based on what the register allocator chooses to do. I suspect the problem you care about is due to how the C/C++ builtins are lowered for things like svmla_x?, which currently overly restricts code generation. I've created a patch series that ends with D143767 that I believe fulfils the intent of what you wanted to achieve. Please let me know if I've misunderstood the issue you wanted to solve.
To prevent some duplication of effort, I just wanted to update my previous comment to say that all the dependent patches have now landed, and I'm planning to add _u intrinsics/builtins for the integer MLA instructions soon.