This is an archive of the discontinued LLVM Phabricator instance.

[X86][XOP] Add support for lowering vector rotations
ClosedPublic

Authored by RKSimon on Oct 18 2015, 7:04 AM.

Details

Summary

This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future. I can commit this separately if people require but there is no way to add tests to just this part - I was surprised to find that other targets (PPC ALTIVEC?) didn't already have vector rotate support.

Elena - I've tried to set up the X86 lowering to make it straightforward to add AVX512 rotations in the future.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 37708.Oct 18 2015, 7:04 AM
RKSimon retitled this revision from to [X86][XOP] Add support for lowering vector rotations.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, delena, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
delena edited edge metadata.Oct 19 2015, 12:19 AM

Hi Simon,

I'd write X86 specific combiner for vector rotation because

  • VPROT supports 8 bits immediates only
  • AVX-512 supports variable rotation

And getConstantSplatNode() does not allows "undef". Do you know this?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3906 ↗(On Diff #37708)

Intel XOP set allows only 8 bit immediate even EltSizeInBits=64.

lib/Target/X86/X86ISelLowering.cpp
18845 ↗(On Diff #37708)

I suppose you can put assert(Subtarget->hasXOP()).

18852 ↗(On Diff #37708)

One "if" is enough here.
if (auto *RotateConst = Amt->getConstantSplatNode())

18856 ↗(On Diff #37708)

I think that the value should be positive here. You already know the rotate direction. And it should fit in 8 bits.

Hi Simon,

I'd write X86 specific combiner for vector rotation because

  • VPROT supports 8 bits immediates only
  • AVX-512 supports variable rotation

I'm not sure I agree - AMD XOP has both (128-bit) vector rotation by variable (per element not like SSE2 shifts) and by immediate (same immediate value for all elements) - similar enough to AVX512 but also pretty generic.

The changes to DAGCombiner are limited on purpose, they're just introducing basic vector type support to rotation pattern matching code that at the moment only works for scalars. All of the DAGCombiner changes will work fine for XOP / AVX512 on x86 targets, and could easily be used for other targets (I mentioned ALTIVEC but there may be others too).

And getConstantSplatNode() does not allows "undef". Do you know this?

Yes - I just didn't want to overload this patch with changes to the DAGCombiner as we don't yet have any targets that lower rotations of vector types (with which to test it), the changes were just enough to get it working for basic examples. With that in place I can then extend the DAGCombiner changes to be handle both all cases of scalar and vector types.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3906 ↗(On Diff #37708)

This is referring to the size of the scalar element to check for a rotation pattern from 2 shifts - it has nothing to do with instruction sizes and isn't target specific.

lib/Target/X86/X86ISelLowering.cpp
18845 ↗(On Diff #37708)

OK - as I said I was trying to setup for you guys to be able to easily add AVX512 support.

18856 ↗(On Diff #37708)

Would you prefer I add an assert to prove this?

Could you, please, add a test with variable rotation-right that requires the "sub" ?
One test with negative immediate.
And one test for 256-bit vector, just to cover all lines that you added.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3906 ↗(On Diff #37708)

Agree. thank you.

lib/Target/X86/X86ISelLowering.cpp
18856 ↗(On Diff #37708)

Yes, I suppose you don't need zextOrTrunc() here. Just extract the integer and check the size with assert.

RKSimon updated this revision to Diff 37920.Oct 20 2015, 2:23 PM
RKSimon edited edge metadata.

Updated with 256-bit tests. I've removed ROTR lowering from XOP for now - we don't have any combines that just create ROTR (all attempt to lower with either ROTL/ROTR depending on what is legal/custom), so we can avoid the extra cost of the subtract and always use ROTL on XOP.

RKSimon marked 2 inline comments as done.Oct 20 2015, 2:26 PM
RKSimon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
18843 ↗(On Diff #37920)

I need both if() as we need to try to cast to BuildVectorSDNode before we can extract the splat.

Updated with 256-bit tests. I've removed ROTR lowering from XOP for now - we don't have any combines that just create ROTR (all attempt to lower with either ROTL/ROTR depending on what is legal/custom), so we can avoid the extra cost of the subtract and always use ROTL on XOP.

The rotation immediate can be negative. It will mean rotation right. I assume that the combiner catches this scenario and generates ROTL with negative value. Could you, please, add a test for this?

Updated with 256-bit tests. I've removed ROTR lowering from XOP for now - we don't have any combines that just create ROTR (all attempt to lower with either ROTL/ROTR depending on what is legal/custom), so we can avoid the extra cost of the subtract and always use ROTL on XOP.

The rotation immediate can be negative. It will mean rotation right. I assume that the combiner catches this scenario and generates ROTL with negative value. Could you, please, add a test for this?

I've tried to create such a test but without success - similar to shifts, for the ISD rotate ops any value less than 0 or greater than/equal to bitsize is undefined so I consider this to be correct for the XOP lowering. I've already removed the unnecessary XOP lowering of ISD::ROTR.

Would updating the lowering assert to check that the (sign extended rotation) constants are in range be satisfactory instead? Failing that I would have to add a ((1 << bitsize) - 1) mask to force the rotation to be in range which I believe isn't necessary in the same way that we don't do this for AVX2/XOP shifts by variable.

The rotation immediate can be negative. It will mean rotation right. I assume that the combiner catches this scenario and generates ROTL with negative value. Could you, please, add a test for this?

I've tried to create such a test but without success - similar to shifts, for the ISD rotate ops any value less than 0 or greater than/equal to bitsize is undefined so I consider this to be correct for the XOP lowering. I've already removed the unnecessary XOP lowering of ISD::ROTR.

Would updating the lowering assert to check that the (sign extended rotation) constants are in range be satisfactory instead? Failing that I would have to add a ((1 << bitsize) - 1) mask to force the rotation to be in range which I believe isn't necessary in the same way that we don't do this for AVX2/XOP shifts by variable.

I ran a scalar test with debugger to see what happens. I modified the DAG combiner and set ISD::ROTR to illegal. So I worked only with ROTL. Now I see that immediate is always positive.
Because "rotr $25" and "rotl $7" for 32 bit elt are the same.
Now I tried this test case:

define i32 @rotate_right_32(i32 %a, i32 %b) {
entry:

%and = and i32 %b, 31
%shl = lshr i32 %a, %and
%0 = sub i32 0, %b
%and3 = and i32 %0, 31
%shr = shl i32 %a, %and3
%or = or i32 %shl, %shr
ret i32 %or

}
With disabled ROTR I received one instruction more:

negl    %esi
movb    %sil, %cl
roll    %cl, %edi
movl    %edi, %eax
retq

And this code when we have the both ROTR and ROTL

movb    %sil, %cl
rorl    %cl, %edi
movl    %edi, %eax
retq

I'm almost sure that your code is correct. Could you, please, check the vector version of the test above? You should see the "negl" operation, probably in "sub" form.
And if you see that the code is correct, you can commit.
Thank you.

Thanks Elena, I ran that code replacing i32 with <4 x i32> and got the following as you expected:

vpxor   %xmm2, %xmm2, %xmm2
vpsubd  %xmm1, %xmm2, %xmm1
vpand   .LCPI1_0(%rip), %xmm1, %xmm1
vprotd  %xmm1, %xmm0, %xmm0
retq

Thank you for checking this. LGTM.

This revision was automatically updated to reflect the committed changes.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 9 2022, 8:59 AM