This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Avoid unnecessary vector byte-swapping in big-endian
ClosedPublic

Authored by pbarrio on Jan 18 2018, 2:33 AM.

Details

Summary

Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but the same overall vector size. This
is not a problem in little-endian but, in big-endian, it requires
additional byte reversals to preserve the lane ordering while keeping
the correct endianness of the data inside each lane.
For example:

%1 = load <4 x half>, <4 x half>* %p

results in the following assembly:

ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h

This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:

ld1 { v0.4h }, [x1]
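
To see why the REV32 was needed in the first place, here is a toy Python model (not LLVM code; lane values are arbitrary examples) of the two load strategies on a big-endian core:

```python
import struct

# Four 16-bit lanes in memory, lane 0 at the lowest address (big-endian).
lanes = [0x1122, 0x3344, 0x5566, 0x7788]
mem = b"".join(struct.pack(">H", v) for v in lanes)

def ld1_4h(mem):
    # LD1 {v.4h}: each 16-bit element is loaded with its byte order
    # corrected, so the register lanes match the memory lanes directly.
    return [struct.unpack(">H", mem[i:i + 2])[0] for i in range(0, 8, 2)]

def ld1_2s_as_4h(mem):
    # LD1 {v.2s}: each 32-bit element is loaded big-endian; viewing the
    # register as .4h afterwards finds the two halfwords of every word
    # in swapped positions.
    out = []
    for i in range(0, 8, 4):
        word = struct.unpack(">I", mem[i:i + 4])[0]
        out += [word & 0xFFFF, word >> 16]  # .4h lane 0 is the low half
    return out

def rev32_4h(h):
    # REV32 v.4h: reverse the 16-bit elements within each 32-bit word.
    return [x for i in range(0, len(h), 2) for x in (h[i + 1], h[i])]

assert ld1_4h(mem) == lanes                   # direct .4h load: no fix-up
assert ld1_2s_as_4h(mem) != lanes             # promoted load scrambles lanes
assert rev32_4h(ld1_2s_as_4h(mem)) == lanes   # the extra REV32 repairs it
```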

Diff Detail

Event Timeline

pbarrio created this revision.Jan 18 2018, 2:33 AM
efriedma added inline comments.Jan 19 2018, 11:24 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
725

Could you use changeVectorElementTypeToInteger or something like that rather than listing out all the types individually?

742

Missing break.

pbarrio updated this revision to Diff 130864.Jan 22 2018, 4:06 AM

Instead of an ugly switch, reuse existing LLVM functionality to do the
conversion of FP vector types into integer vector types.
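
For context, the helper suggested above, EVT::changeVectorElementTypeToInteger, keeps the lane count and replaces each FP element type with an integer type of the same bit width. A toy Python model of that mapping (type names as strings; purely illustrative, not the LLVM API):

```python
import re

def change_vector_element_type_to_integer(vt):
    # Toy model of the mapping: "v4f16" -> "v4i16". The lane count and
    # element width stay the same; only float becomes integer.
    m = re.fullmatch(r"v(\d+)f(\d+)", vt)
    return f"v{m.group(1)}i{m.group(2)}"

assert change_vector_element_type_to_integer("v4f16") == "v4i16"
assert change_vector_element_type_to_integer("v2f64") == "v2i64"
```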

pbarrio marked an inline comment as done.Jan 22 2018, 4:07 AM

Thanks for the suggestion, very helpful :)

efriedma accepted this revision.Jan 22 2018, 10:57 AM

LGTM with one minor tweak.

lib/Target/AArch64/AArch64ISelLowering.cpp
724

We can assume VT is a vector type here.

This revision is now accepted and ready to land.Jan 22 2018, 10:57 AM
pbarrio updated this revision to Diff 130937.Jan 22 2018, 11:44 AM

VT should be a vector in addTypeForNEON, so the check can be an assert
instead of a proper test in the conditional.

pbarrio marked an inline comment as done.Jan 22 2018, 11:45 AM

Waiting a bit just in case there's extra feedback for the last change. I will commit this patch later today.

This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.

This apparently defeats the logic in splitStores() (aka the infamous FeatureMisalignedSlow128Bit) because that code had an exception for v2i64 vector types but wants to split all other vector types. An internal test case of ours:

define void @test_split_16B(<4 x float> %val, <4 x float>* %addr) {
  store <4 x float> %val, <4 x float>* %addr, align 8
  ret void
}

would no longer show the store getting split.
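
A rough Python sketch of the splitting heuristic as described in this comment (the exact conditions are an assumption, not the real splitStores() code): under-aligned 128-bit vector stores are split, with v2i64 exempted.

```python
import re

def should_split_store(vt, align_bytes):
    # Assumed model of splitStores() under FeatureMisalignedSlow128Bit:
    # split 128-bit vector stores with less than 16-byte alignment,
    # except when the store's type is v2i64.
    m = re.fullmatch(r"v(\d+)[if](\d+)", vt)
    n, bits = int(m.group(1)), int(m.group(2))
    if n * bits != 128 or align_bytes >= 16:
        return False
    return vt != "v2i64"

assert should_split_store("v4f32", 8) is True    # "all other vector types"
assert should_split_store("v2i64", 8) is False   # the v2i64 exception
assert should_split_store("v4f32", 16) is False  # aligned stores left alone
```

If the promoted store now arrives at this check as a type that matches the exception, the split no longer fires, which would match the regression described above.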

Independently of this, I disabled FeatureMisalignedSlow128 for Apple a few hours later, so my interest is limited now :)

But I think Exynos is still using the "feature", so they may want to look into this.