This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #215860) | Is there a VMINV patch too? Am I correct that this would enable both min and max reduction intrinsics? |
llvm/test/CodeGen/Thumb2/mve-vmaxv.ll | ||
14 ↗ | (On Diff #215860) | The old code for these was quite large, right? Much larger than needing to materialise the constant? |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #215860) | It would enable both. There isn't a separate patch for vminv so I could add it to this one if you'd like. |
llvm/test/CodeGen/Thumb2/mve-vmaxv.ll | ||
14 ↗ | (On Diff #215860) | Yep, in this case it was 78 instructions long! And the others varied from 15 to 61. |
llvm/lib/Target/ARM/ARMInstrMVE.td | ||
---|---|---|
664 ↗ | (On Diff #215860) | Guess these need to be changed to MQPR. |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #218350) | Is this actually needed for codegen? |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #218350) | It is, as otherwise the smax and umax vector reductions aren't generated and can't be selected on. |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #218350) | As in, the reductions get expanded somewhere and your tests would fail? |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #218350) | Indeed, the tests fail and we get markedly worse codegen. 65c65,70 < vmaxv.u32 r2, q0 --- > vmov.f32 s5, s3 > vmax.u32 q0, q0, q1 > vmov.32 r2, q0[1] > vdup.32 q1, r2 > vmax.u32 q0, q0, q1 > vmov r2, s0 |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #218350) | Ok, cheers. Then I expect Dave is right that the vmin could now get generated and will then assert because it can't be selected...? |
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1064 ↗ | (On Diff #218350) | Yeah, I think there's two different places that these intrinsics can be expanded, once in a pre-isel pass if this returns false, or in ISEL if the instructions are expand. So I think the selection would be OK, but it would be best to add vminv to make sure the vectoriser doesn't start generating them only for them the be messily expanded. |