This is an archive of the discontinued LLVM Phabricator instance.

[AArch64 - BE] BUILD_VECTOR lane order is reversed in big-endian mode
AbandonedPublic

Authored by rmaprath on Aug 28 2014, 3:51 AM.

Details

Reviewers
t.p.northover
Summary

Hi Tim,

This is a revamp of [1].

That patch was rejected mainly because it lacked enough testing. I've added tests to cover all ModImmTypes and addressed the other minor points you mentioned there.

Notes:-

  • The need to use rev64 instructions with big-endian vectors is documented at [2]
  • Most ModImmTypes have a symmetric counter-part, and the lane reversal causes them to be encoded in that opposite pattern. A few ModImmTypes (7,8, 11, 12) do not have that property and gets pushed into memory. But this is irrelevant as far as correctness is concerned.

Thanks.

  • Asiri

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140714/226398.html
[2] http://llvm.org/docs/BigEndianNEON.html

Diff Detail

Event Timeline

rmaprath updated this revision to Diff 13026.Aug 28 2014, 3:51 AM
rmaprath retitled this revision from to [AArch64 - BE] BUILD_VECTOR lane order is reversed in big-endian mode.
rmaprath updated this object.
rmaprath edited the test plan for this revision. (Show Details)
rmaprath added a reviewer: t.p.northover.
rmaprath added a subscriber: Unknown Object (MLST).
t.p.northover edited edge metadata.Aug 29 2014, 4:11 AM

Hi Asiri,

Sorry for the delay on this one, you picked just the wrong time to start working on it as I've been away for the last week.

That patch was rejected mainly because it lacked enough testing.

No it wasn't, it was rejected (or at least delayed, it may well be correct) mainly because I thought the implementation was suspect. The testing would have to improve as well, but that was less important to me.

I'll take another look in light of your question, once I've caught up with everything.

Cheers.

Tim.

Hi Tim,

Thanks for looking into this.

Earlier I had some uncertain points about your original review comments (as they were written for James, not at the level I can process ;)), but I had a chat with James and got them resolved.

  • Asiri

Hi Asiri,

I'm afraid I still think that this isn't fixing the real issue, which is the dodgy casting that goes on *after* resolveBuildVector. Putting the change there seems like it would be trying to make two bugs cancel out into a feature (though I can't find any currently incorrect code that results).

  • It makes the resolveBuildVector's output incompatible with its fundamental building-block, isConstantSplat.
  • It means that CnstBits no longer represent the in-register value directly: "CnstBits & 1" no longer checks if the value would be even, for example.
  • We now end up passing a big-endian value into endian-agnostic isAdvSIMDModImmTypeN functions.

We bodge that last point up with some REVs afterwards, but that only works in the direct MOVI case, I think. For example:

@vec = global <4 x i16> <i16 1234, i16 5678, i16 9101, i16 1121>

define i16 @foo() {
  %in = load <4 x i16>* @vec
  %res_vec = and <4 x i16> %in, <i16 65400, i16 65535, i16 65400, i16 65535>
  %elt = extractelement <4 x i16> %res_vec, i16 0
  ret i16 %elt
}

This currently produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where that REV32 is highly suspect. But after your patch it produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87, lsl #16
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where both the BIC and the REV32 are wrong, and don't cancel each other out.

Do my concerns make more sense now?

Cheers.

Tim.

Hi Tim,

I'm afraid I still think that this isn't fixing the real issue, which is the dodgy casting that goes on *after* resolveBuildVector. Putting the change there seems like it would be trying to make two bugs cancel out into a feature (though I can't find any currently incorrect code that results).

I had an itching about this and was trying to come up with a counter-example for a couple of days to prove that this fix was bit too narrow. But I finally gave in (after trying to break resolveBUILD_VECTOR with various ModImmTypes) and thought this is something we have to live with given how we handle vectors in big-endian mode.

  • It makes the resolveBuildVector's output incompatible with its fundamental building-block, isConstantSplat.
  • It means that CnstBits no longer represent the in-register value directly: "CnstBits & 1" no longer checks if the value would be even, for example.
  • We now end up passing a big-endian value into endian-agnostic isAdvSIMDModImmTypeN functions.

We bodge that last point up with some REVs afterwards, but that only works in the direct MOVI case, I think.

AFAICS the REV instructions are redundant (in the MOVI case). My understanding was that we have to perform the lane reversal within resolveBuildVector() because at the callee end (when a vector is passed to a function) we get something like:

define i16 @f(<4 x i16> %arg) nounwind {
  ; CHECK:          rev64   v0.4h, v0.4h
  ; CHECK-NEXT      umov    w0, v0.h[0]
  ; CHECK-NEXT      ret
  %v = extractelement <4 x i16> %arg, i32 0
  ret i16 %v
}

So, to cater for that rev64, we keep the vector lane-reversed. Perhaps a proper fix will have to get rid of that rev64.

Admittedly, my understanding of the problem was a bit narrow at the time so my focus was on somehow resurrecting the existing patch :)

For example:

@vec = global <4 x i16> <i16 1234, i16 5678, i16 9101, i16 1121>

define i16 @foo() {
  %in = load <4 x i16>* @vec
  %res_vec = and <4 x i16> %in, <i16 65400, i16 65535, i16 65400, i16 65535>
  %elt = extractelement <4 x i16> %res_vec, i16 0
  ret i16 %elt
}

This currently produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where that REV32 is highly suspect. But after your patch it produces:

ld1     { v0.4h }, [x8]
bic     v0.2s, #0x87, lsl #16
rev32   v0.4h, v0.4h
umov    w0, v0.h[0]

where both the BIC and the REV32 are wrong, and don't cancel each other out.

Thanks for this example. This should be enough for me to look for a different fix.

Cheers!

Asiri.

AFAICS the REV instructions are redundant (in the MOVI case). My understanding was that we have to perform the lane reversal within resolveBuildVector() because at the callee end (when a vector is passed to a function) we get something like:

Ah, if you're looking at the call case specifically then some kind of
rev is needed for ABI compatibility, but what we're emitting at the
moment is a merged version of the necessary REV and the dodgy one.
When passing a v4i16 emit:

REV64 v0.2s, v0.2s

However, the DAG that actually gets created is closer to:

REV32 v0.4h, v0.4h // By the code associated with resolveBuildVector
REV64 v0.4h, v0.4h // By LowerCall

And the combination of those two operations is indeed a "REV64
v0.2s"[1]. I think that first one is a problem (particularly in the
BIC/OR case, but conceptually not what we want even for MOVI).

Cheers.

Tim.

[1] A good way to do these in your head is that the bigger size always
gets attached to the REV mnemonic, and combining two REVs with a
common size ("16" in this case) you drop that common size. So we have
64 <-> 16 and 32 <-> 16, which means the result is a 32 <-> 64,
written REV64 v0.2s.

Hi TIm,

However, the DAG that actually gets created is closer to:

REV32 v0.4h, v0.4h // By the code associated with resolveBuildVector
REV64 v0.4h, v0.4h // By LowerCall

And the combination of those two operations is indeed a "REV64
v0.2s"[1]. I think that first one is a problem (particularly in the
BIC/OR case, but conceptually not what we want even for MOVI).

Thanks for this explanation, helped a lot :)

I will play around getting rid of that dodgy rev.

Cheers!

Asiri.

rmaprath abandoned this revision.Sep 3 2014, 6:11 AM