This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Do not use vtrn for vectorshuffle if the order is reversed
ClosedPublic

Authored by jketema on Aug 25 2015, 2:26 PM.

Details

Summary

The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.

Diff Detail

Event Timeline

jketema updated this revision to Diff 33120.Aug 25 2015, 2:26 PM
jketema retitled this revision from to [ARM] Do not use vtrn for vectorshuffle if the order is reversed.
jketema updated this object.
jketema added reviewers: rengolin, aemerson.
jketema added a subscriber: llvm-commits.
ab accepted this revision.Sep 1 2015, 6:33 PM
ab added a reviewer: ab.
ab added a subscriber: ab.

LGTM with the additional suggested testcase (if it does make sense), thanks!

I'm surprised we didn't see this earlier, I guess it was exposed by r240118?

lib/Target/ARM/ARMISelLowering.cpp
5059–5062

Looking at this again, I realize: isn't this a tad too conservative? What happens when you have:

<-1, 4, 2, 6, 1, 5, 3, 7>

More importantly: could this break, say with:

<-1, 5, 3, 7, 1, 5, 3, 7>

which isn't a vtrn, but looks like it will match? After this patch the '>=' should catch this, I think. If so, could you add a testcase?

This revision is now accepted and ready to land.Sep 1 2015, 6:33 PM
jketema added inline comments.Sep 2 2015, 3:30 AM
lib/Target/ARM/ARMISelLowering.cpp
5059–5062

The first one still generates a vtrn as expected, using both the upper and lower part. The second one also generates a vtrn but uses the upper part twice and not the lower part. Curiously, there don't seem to be any test cases for this behavior.

jketema updated this revision to Diff 33797.Sep 2 2015, 4:15 AM
jketema edited edge metadata.

I've added two test cases based on the suggestion by Ahmed.

jketema added inline comments.Sep 2 2015, 4:20 AM
lib/Target/ARM/ARMISelLowering.cpp
5059–5062

I've added two test cases. Do these look ok?

ab added a comment.Sep 2 2015, 6:17 AM

I think the new tests generated vtrn because v8i32 had to be split. After that, there are no double-width shuffles anymore, so we'll just match (and later CSE) two identical shuffles. Can you try with v8i16 = v4i16,v4i16?

Also, from what I found, the <-1, 4, ...> also isn't tested, so it'd be great if you added it too.
I expected it to fail, because M[i] == 0 is false (when i == 0), so we will try and fail to match the upper result (WhichResult == 1) in the lower lanes (i == 0).

The investigation is much appreciated, thanks again!

jketema requested a review of this revision.Sep 3 2015, 2:48 PM
jketema edited edge metadata.

Is this ok now with the added tests?

ab requested changes to this revision.Sep 3 2015, 3:48 PM
ab edited edge metadata.

No: the added tests that generate <8 x i32> aren't testing the patched code: <8 x i32> isn't legal on ARM, so the tests' shufflevectors will be split into 2 separate <4 x i32> shufflevectors. Please test using (for instance) <4 x i16> and <8 x i16> instead of <4 x i32> and <8 x i32>.

This revision now requires changes to proceed.Sep 3 2015, 3:48 PM
jketema updated this revision to Diff 33999.Sep 3 2015, 5:52 PM
jketema edited edge metadata.

Update tests.

jketema added a comment.EditedSep 3 2015, 5:57 PM

Hi Ahmed,

I've updated the tests. Thanks for pointing out the issue with the type sizes, I had not realized this. Your comments also made me realise that there's an easier way to test for the incorrect vectors, as you see.

I agree that the case where only the upper or lower half is used, is indeed a bit conservative. Replacing

WhichResult = M[i] == 0 ? 0 : 1;

by

for (int k = i; k < i + NumElts; ++k) {
  if (M[k] >= 0) {
    WhichResult = (unsigned) M[k] % 2;
    break;
  }
}

could probably solve this, and would accept <-1, 4, 2, 6, 1, 5, 3, 7>. What do you think?

ab accepted this revision.Sep 8 2015, 5:42 PM
ab edited edge metadata.

Nice! We still can't catch <-1, 4, 2, 6> though, right? If so, a FIXME would be enough.

LGTM, thanks!

test/CodeGen/ARM/vtrn.ll
379

No need to check BB#0? (Same below)

This revision is now accepted and ready to land.Sep 8 2015, 5:42 PM
jketema updated this revision to Diff 34366.Sep 9 2015, 2:03 PM
jketema edited edge metadata.

Hi Ahmed,

I've added a FIXME comment, as <-1, 4, 2, 6> is indeed still rejected.

The check for @ BB#0 seems necessary, otherwise the tests fail by matching vtrn in the .type line. If there's a better way to not match on this line, please let me know.

jketema requested a review of this revision.Sep 9 2015, 2:04 PM
jketema edited edge metadata.

If this is all ok now, please commit on my behalf.

jmolloy accepted this revision.Sep 10 2015, 1:44 AM
jmolloy added a reviewer: jmolloy.
jmolloy added a subscriber: jmolloy.

Committed on Jeroen's behalf in rL257254.

This revision is now accepted and ready to land.Sep 10 2015, 1:44 AM
jketema closed this revision.Sep 10 2015, 1:55 PM

Thanks for committing James.