The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.
Diff Detail
Event Timeline
LGTM with the additional suggested testcase (if it does make sense), thanks!
I'm surprised we didn't see this earlier, I guess it was exposed by r240118?
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
5059–5062 | Looking at this again, I realize: isn't this a tad too conservative? What happens when you have: <-1, 4, 2, 6, 1, 5, 3, 7> More importantly: could this break, say with: <-1, 5, 3, 7, 1, 5, 3, 7> which isn't a vtrn, but looks like it will match? After this patch the '>=' should catch this, I think. If so, could you add a testcase? |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
5059–5062 | The first one still generates a vtrn as expected, using both the upper and lower part. The second one also generates a vtrn but uses the upper part twice and not the lower part. Curiously, there don't seem to be any test cases for this behavior. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
5059–5062 | I've added two test cases. Do these look ok? |
I think the new tests generated vtrn because v8i32 had to be split. After that, there are no double-width shuffles anymore, so we'll just match (and later CSE) two identical shuffles. Can you try with v8i16 = v4i16,v4i16?
Also, from what I found, the <-1, 4, ...> also isn't tested, so it'd be great if you added it too.
I expected it to fail, because M[i] == 0 is false (when i == 0), so we will try and fail to match the upper result (WhichResult == 1) in the lower lanes (i == 0).
The investigation is much appreciated, thanks again!
No: the added tests that generate <8 x i32> aren't testing the patched code: <8 x i32> isn't legal on ARM, so the tests' shufflevectors will be split into 2 separate <4 x i32> shufflevectors. Please test using (for instance) <4 x i16> and <8 x i16> instead of <4 x i32> and <8 x i32>.
Hi Ahmed,
I've updated the tests. Thanks for pointing out the issue with the type sizes, I had not realized this. Your comments also made me realise that there's an easier way to test for the incorrect vectors, as you see.
I agree that the case where only the upper or lower half is used, is indeed a bit conservative. Replacing
WhichResult = M[i] == 0 ? 0 : 1;
by
for (int k = i; k < i + NumElts; ++k) { if (M[k] >= 0) { WhichResult = (unsigned) M[k] % 2; break; } }
could probably solve this, and would accept <-1, 4, 2, 6, 1, 5, 3, 7>. What do you think?
Nice! We still can't catch <-1, 4, 2, 6> though, right? If so, a FIXME would be enough.
LGTM, thanks!
test/CodeGen/ARM/vtrn.ll | ||
---|---|---|
379 | No need to check BB#0? (Same below) |
Hi Ahmed,
I've added a FIXME comment, as <-1, 4, 2, 6> is indeed still rejected.
The check for @ BB#0 seems necessary, otherwise the tests fail by matching vtrn in the .type line. If there's a better way to not match on this line, please let me know.
Looking at this again, I realize: isn't this a tad too conservative? What happens when you have:
More importantly: could this break, say with:
which isn't a vtrn, but looks like it will match? After this patch the '>=' should catch this, I think. If so, could you add a testcase?