This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
ClosedPublic

Authored by huihuiz on Oct 20 2020, 3:26 PM.

Details

Summary

Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.

Diff Detail

Event Timeline

huihuiz created this revision.Oct 20 2020, 3:26 PM
huihuiz requested review of this revision.Oct 20 2020, 3:26 PM

Current upstream mis-compile, take t.ll , run "llc -mtriple=aarch64-linux-gnu -mattr=+sve < t.ll"

define <vscale x 4 x i32> @test(<vscale x 4 x i32> %a) {
  %pg = shufflevector <vscale x 4 x i1> insertelement (<vscale x 4 x i1> undef, i1 true, i32 0), <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer
  %elt = insertelement <vscale x 4 x i32> undef, i32 257, i32 0
  %splat = shufflevector <vscale x 4 x i32> %elt, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
  %umin = call <vscale x 4 x i32> @llvm.aarch64.sve.umin.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %splat)
  ret <vscale x 4 x i32> %umin
}

declare <vscale x 4 x i32> @llvm.aarch64.sve.umin.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)

Then you see it's mis-compiled into
// %bb.0:

umin    z0.s, z0.s, #1

This patch fixes this error, and generate
// %bb.0:

mov     w8, #257
ptrue   p0.s
mov     z1.s, w8
umin    z0.s, p0/m, z0.s, z1.s
sdesmalen added inline comments.Oct 21 2020, 9:01 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3146

Maybe I'm missing something obvious, but I don't see why any masking is needed.
Is removing ImmVal = ImmVal & 0xFF; not sufficient?

huihuiz added inline comments.Oct 21 2020, 9:44 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3146

We need to apply masking "ImmVal & 0xFF" for i8 at least. Otherwise we got regression.

Take t.ll below, if we don't apply the mask for i8, then we generate "mov z1.b, #-127 ".
With the mask we generate "umax z0.b, z0.b, #129".
There are similar regression for i8 in range [128, 255]

define <vscale x 16 x i8> @umax_i8_large(<vscale x 16 x i8> %a) {
  %elt = insertelement <vscale x 16 x i8> undef, i8 129, i32 0
  %splat = shufflevector <vscale x 16 x i8> %elt, <vscale x 16 x i8> undef, <vscale x 16 x i32> zeroinitializer
  %cmp = icmp ugt <vscale x 16 x i8> %a, %splat
  %res = select <vscale x 16 x i1> %cmp, <vscale x 16 x i8> %a, <vscale x 16 x i8> %splat
  ret <vscale x 16 x i8> %res
}
efriedma added inline comments.Oct 21 2020, 2:25 PM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3146

To add to this, the SelectionDAG node for the operand looks something like the following: nxv16i8 AArch64ISD::DUP(i32 -127).

sdesmalen accepted this revision.Oct 23 2020, 4:22 AM

LGTM, thanks for fixing @huihuiz!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3139

nit: DL can be inlined directly into its single use, i.e.

Imm = CurDAG->getTargetConstant(ImmVal, SDLoc(N), MVT::i32);
3146

Thanks, I see what you mean now!

This revision is now accepted and ready to land.Oct 23 2020, 4:22 AM
This revision was landed with ongoing or failed builds.Oct 23 2020, 9:44 AM
This revision was automatically updated to reflect the committed changes.
huihuiz marked an inline comment as done.

Thanks @sdesmalen for the review!
Fixed in the commit patch.