Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics and unit tests.
Repository: rG LLVM Github Monorepo
Event Timeline
clang/include/clang/Basic/arm_mve.td, line 289:
I wonder if we should implement at least the simple case (integer and unpredicated) using standard IR nodes instead of an IR intrinsic.

We already implement vmaxq using an icmp and a select. We haven't implemented vabsq yet, but when we do, it will surely be done in a similar way, to take advantage of the existing pattern matching showcased in llvm/test/CodeGen/Thumb2/mve-abs.ll. So possibly we should code-generate vmaxaq(a,b) as if it were vmaxq(a, vabsq(b)), and write a more complicated isel pattern that matches that whole tree?

The advantage would be that if a user had literally written a combination of vmaxq and vabsq, codegen would be able to fold them together into a single instruction at compile time.

The FP versions might make sense to do the same way, using the standard @llvm.fabs IR intrinsic for the abs part.
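A scalar per-lane model may help pin down what the integer instruction computes before deciding which IR tree to emit. This is a sketch under stated assumptions: it follows the ACLE form in which vmaxaq_s8/vminaq_s8 take an unsigned first operand and a signed second operand and return an unsigned vector, and the function names below are hypothetical plain-C stand-ins, not arm_mve.h intrinsics.

```c
#include <stdint.h>

/* Hypothetical helper: absolute value of a signed 8-bit lane, returned as
 * unsigned so that |-128| = 128 is representable. */
static uint8_t abs_s8_as_u8(int8_t b) {
    int v = b;                          /* widen first: negating -128 in int8_t would overflow */
    return (uint8_t)(v < 0 ? -v : v);
}

/* Hypothetical lane model of VMAXA: max(a, |b|), compared as unsigned.
 * Roughly select(icmp ugt a, abs(b)), a, abs(b)) in IR terms. */
static uint8_t vmaxa_lane_s8(uint8_t a, int8_t b) {
    uint8_t ab = abs_s8_as_u8(b);
    return a > ab ? a : ab;
}

/* Hypothetical lane model of VMINA: min(a, |b|), compared as unsigned. */
static uint8_t vmina_lane_s8(uint8_t a, int8_t b) {
    uint8_t ab = abs_s8_as_u8(b);
    return a < ab ? a : ab;
}

/* Apply the VMAXA lane model across a 16 x 8-bit vector held in plain arrays. */
static void vmaxa_model_s8(uint8_t out[16], const uint8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = vmaxa_lane_s8(a[i], b[i]);
}
```

Note the unsigned comparison: with a = 100 and b = -128 the model yields 128, because |-128| only fits once the lane is treated as unsigned. An isel pattern for the vmaxq(a, vabsq(b)) tree would presumably need to account for that.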
Nice one. Good to see codegen changes coming out of these intrinsics.
It took a while for me to figure out what the integer instruction was doing. That's a strange one.
I have a question about the fp case below.
llvm/lib/Target/ARM/ARMInstrMVE.td, line 3658:
If I'm reading the ARMARM correctly, the fp case seems to perform the abs on both operands.
llvm/lib/Target/ARM/ARMInstrMVE.td, line 3658:
My bad. Fix coming under separate cover.