This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Narrow ZExt that feed binop followed by trunc.
Abandoned · Public

Authored by fhahn on Jul 3 2020, 8:56 AM.

Details

Summary

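In the pattern below, the trunc can be eliminated by narrowing the
zexts to the truncated type, provided that type is still at least as
wide as the zext source type (so the zexts, if any, remain zexts).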

trunc (binop (zext), (zext)) to ty -> binop (zext to ty) (zext to ty)

Initially limited to add/sub.

This transform is only performed if the shortened zexts are free (can be
folded into the binary op).
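As a rough sanity check of the equivalence the summary relies on, the scalar pattern can be brute-forced at small bit widths. This is only an illustrative sketch and not the Alive proofs referenced in this review: the helper names (zext, trunc, wide_then_trunc, narrow, check) and the default widths are assumptions made for the example.

```python
from itertools import product


def zext(value, src_bits, dst_bits):
    # Zero extension keeps the unsigned value; the widths are only validated.
    assert 0 <= value < (1 << src_bits) and dst_bits >= src_bits
    return value


def trunc(value, bits):
    # Keep the low `bits` bits (two's complement wrap-around).
    return value & ((1 << bits) - 1)


def wide_then_trunc(op, a, b, src_bits, wide_bits, ty_bits):
    # trunc (binop (zext a), (zext b)) to ty
    wide = trunc(op(zext(a, src_bits, wide_bits), zext(b, src_bits, wide_bits)), wide_bits)
    return trunc(wide, ty_bits)


def narrow(op, a, b, src_bits, ty_bits):
    # binop (zext a to ty), (zext b to ty)
    return trunc(op(zext(a, src_bits, ty_bits), zext(b, src_bits, ty_bits)), ty_bits)


def check(src_bits=4, ty_bits=6, wide_bits=8):
    # Exhaustively compare both forms for add and sub (hypothetical small widths).
    ops = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y}
    for op in ops.values():
        for a, b in product(range(1 << src_bits), repeat=2):
            if wide_then_trunc(op, a, b, src_bits, wide_bits, ty_bits) != narrow(op, a, b, src_bits, ty_bits):
                return False
    return True
```

Calling check() compares every pair of 4-bit inputs for both operations. The check says nothing about whether the narrowed zexts are free on a given target, which is the cost question the summary raises.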

I am not entirely sure VectorCombine is the right place to do the
transform, but I think we want to limit it to cases where we know the
shorter zexts are free/legal on the target. I am not sure if we have an
easy way to check the latter though.

Alive proof sketches (scalar versions so we do not run into timeouts):

On AArch64, codegen for the following input can be improved (this is
from hot code in SPEC2006/h264):

define <8 x i32> @test(<8 x i16>* %p1, <8 x i16>* %p2) {
  %l.1 = load <8 x i16>, <8 x i16>* %p1, align 2
  %ext.1 = zext <8 x i16> %l.1 to <8 x i64>
  %l.2 = load <8 x i16>, <8 x i16>* %p2, align 2
  %ext.2 = zext <8 x i16> %l.2 to <8 x i64>
  %sub = sub nsw <8 x i64> %ext.1, %ext.2
  %t = trunc <8 x i64> %sub to <8 x i32>
  ret <8 x i32> %t
}

Without patch

ldr     q0, [x0]
ldr     q1, [x1]
ushll2  v2.4s, v0.8h, #0
ushll   v0.4s, v0.4h, #0
ushll2  v3.4s, v1.8h, #0
ushll   v1.4s, v1.4h, #0
usubl2  v4.2d, v0.4s, v1.4s
usubl   v0.2d, v0.2s, v1.2s
usubl   v1.2d, v2.2s, v3.2s
usubl2  v5.2d, v2.4s, v3.4s
xtn     v1.2s, v1.2d
xtn     v0.2s, v0.2d
xtn2    v1.4s, v5.2d
xtn2    v0.4s, v4.2d
ret

With patch

ldr     q0, [x0]
ldr     q2, [x1]
usubl2  v1.4s, v0.8h, v2.8h
usubl   v0.4s, v0.4h, v2.4h
ret
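To make it easier to see why the two-instruction sequence is sufficient, here is a small lane-by-lane model of the example. It is a sketch under stated assumptions: usubl and usubl2 are modeled as unsigned widening subtracts of the lower and upper four 16-bit lanes, and the result is assumed to be returned as v0 (lanes 0-3) followed by v1 (lanes 4-7). The function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def ir_reference(p1, p2):
    # Per lane: zext i16 -> i64, sub in i64, trunc to i32.
    return [((a - b) & MASK64) & MASK32 for a, b in zip(p1, p2)]


def usubl(vn, vm):
    # Model of "usubl Vd.4s, Vn.4h, Vm.4h": widening subtract of lanes 0-3.
    return [(a - b) & MASK32 for a, b in zip(vn[:4], vm[:4])]


def usubl2(vn, vm):
    # Model of "usubl2 Vd.4s, Vn.8h, Vm.8h": widening subtract of lanes 4-7.
    return [(a - b) & MASK32 for a, b in zip(vn[4:], vm[4:])]


def with_patch(p1, p2):
    # Mirrors the "With patch" sequence; the v0/v1 return layout is an assumption.
    v1 = usubl2(p1, p2)
    v0 = usubl(p1, p2)
    return v0 + v1
```

For any two lists of eight values in the range 0..65535, with_patch and ir_reference should agree lane by lane in this model.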

Event Timeline

fhahn created this revision. Jul 3 2020, 8:56 AM
Herald added a project: Restricted Project. Jul 3 2020, 8:56 AM
fhahn updated this revision to Diff 275408. Jul 3 2020, 8:58 AM

Move llvm/test/Transforms/VectorCombine/AArch64/lit.local.cfg to NFC test patch.

Doesn't instcombine handle this already? https://godbolt.org/z/TQLYd3

fhahn abandoned this revision. Jul 3 2020, 11:28 AM

Doesn't instcombine handle this already? https://godbolt.org/z/TQLYd3

Oh indeed, that's great. Looks like this patch is not needed :)