Page MenuHomePhabricator

[AArch64] generate vuzp instead of mov
ClosedPublic

Authored by sebpop on Feb 28 2018, 1:40 PM.

Details

Summary

when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a
specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the
BUILD_VECTOR with either vuzp1 or vuzp2.

With this patch LLVM generates the following code for the first function fun1 in the testcase:
adrp x8, .LCPI0_0
ldr q0, [x8, :lo12:.LCPI0_0]
tbl v0.16b, { v0.16b }, v0.16b
ext v1.16b, v0.16b, v0.16b, #8
uzp1 v0.8b, v0.8b, v1.8b
str d0, [x8]
ret

Without this patch LLVM currently generates this code:
adrp x8, .LCPI0_0
ldr q0, [x8, :lo12:.LCPI0_0]
tbl v0.16b, { v0.16b }, v0.16b
mov v1.16b, v0.16b
mov v1.b[1], v0.b[2]
mov v1.b[2], v0.b[4]
mov v1.b[3], v0.b[6]
mov v1.b[4], v0.b[8]
mov v1.b[5], v0.b[10]
mov v1.b[6], v0.b[12]
mov v1.b[7], v0.b[14]
str d1, [x8]
ret

Diff Detail

Repository
rL LLVM

Event Timeline

sebpop created this revision.Feb 28 2018, 1:40 PM
SjoerdMeijer accepted this revision.Feb 28 2018, 2:29 PM

Looks very sensible to me.

This revision is now accepted and ready to land.Feb 28 2018, 2:29 PM
SjoerdMeijer added a comment.EditedFeb 28 2018, 2:53 PM

Was just thinking though that we probably need some negative tests where we expect the rewrite not to happen? Because e.g. the sequence has all even values except one value, if that makes sense.

Should AArch64 define define isExtractSubvectorCheap()? See https://reviews.llvm.org/D27774 .

Was just thinking though that we probably need some negative tests where we expect the rewrite not to happen? Because e.g. the sequence has all even values except one value, if that makes sense.

I will prepare two more negative test-cases. Thanks for pointing this out.

Should AArch64 define define isExtractSubvectorCheap()? See https://reviews.llvm.org/D27774 .

Yes. I think that makes sense: aarch64-neon should follow arm-neon.

This brings the question: do we want this vuzp pattern also recognized for arm-neon?

arm-neon already generates the correct vuzp for the given shuffle, as far as I can tell.

sebpop closed this revision.Mar 1 2018, 7:53 AM

Committed with two more tests in https://reviews.llvm.org/rL326443

This change made compiling freetype for aarch64 trigger assertion failures, see PR36582.

We are seeing this problem in testing too.
Sebastian, do you have time to work on this?

sebpop added a comment.Mar 5 2018, 8:38 AM

We are seeing this problem in testing too.
Sebastian, do you have time to work on this?

Yes, I'm working on a fix.

Ok, excellent, thanks a lot!