Page MenuHomePhabricator

[AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
ClosedPublic

Authored by huihuiz on Oct 20 2020, 3:26 PM.

Details

Summary

Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.

Diff Detail

Unit TestsFailed

TimeTest
340 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
80 mslinux > Polly.ScopInfo/NonAffine::non-affine-loop-condition-dependent-access_3.ll
Script: -- : 'RUN: at line 1'; opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-import-jscop-dir=/mnt/disks/ssd0/agent/llvm-project/polly/test/ScopInfo/NonAffine -polly-codegen-verify -basic-aa -polly-scops -polly-allow-nonaffine-branches -polly-allow-nonaffine-loops=false -analyze < /mnt/disks/ssd0/agent/llvm-project/polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll | FileCheck /mnt/disks/ssd0/agent/llvm-project/polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll --check-prefix=INNERMOST

Event Timeline

huihuiz created this revision.Oct 20 2020, 3:26 PM
huihuiz requested review of this revision.Oct 20 2020, 3:26 PM

Current upstream mis-compile, take t.ll , run "llc -mtriple=aarch64-linux-gnu -mattr=+sve < t.ll"

define <vscale x 4 x i32> @test(<vscale x 4 x i32> %a) {
  %pg = shufflevector <vscale x 4 x i1> insertelement (<vscale x 4 x i1> undef, i1 true, i32 0), <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer
  %elt = insertelement <vscale x 4 x i32> undef, i32 257, i32 0
  %splat = shufflevector <vscale x 4 x i32> %elt, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
  %umin = call <vscale x 4 x i32> @llvm.aarch64.sve.umin.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %splat)
  ret <vscale x 4 x i32> %umin
}

declare <vscale x 4 x i32> @llvm.aarch64.sve.umin.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)

Then you see it's mis-compiled into
// %bb.0:

umin    z0.s, z0.s, #1

This patch fixes this error, and generate
// %bb.0:

mov     w8, #257
ptrue   p0.s
mov     z1.s, w8
umin    z0.s, p0/m, z0.s, z1.s
sdesmalen added inline comments.Oct 21 2020, 9:01 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3146

Maybe I'm missing something obvious, but I don't see why any masking is needed.
Is removing ImmVal = ImmVal & 0xFF; not sufficient?

huihuiz added inline comments.Oct 21 2020, 9:44 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3146

We need to apply masking "ImmVal & 0xFF" for i8 at least. Otherwise we got regression.

Take t.ll below, if we don't apply the mask for i8, then we generate "mov z1.b, #-127 ".
With the mask we generate "umax z0.b, z0.b, #129".
There are similar regression for i8 in range [128, 255]

define <vscale x 16 x i8> @umax_i8_large(<vscale x 16 x i8> %a) {
  %elt = insertelement <vscale x 16 x i8> undef, i8 129, i32 0
  %splat = shufflevector <vscale x 16 x i8> %elt, <vscale x 16 x i8> undef, <vscale x 16 x i32> zeroinitializer
  %cmp = icmp ugt <vscale x 16 x i8> %a, %splat
  %res = select <vscale x 16 x i1> %cmp, <vscale x 16 x i8> %a, <vscale x 16 x i8> %splat
  ret <vscale x 16 x i8> %res
}
efriedma added inline comments.Oct 21 2020, 2:25 PM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3146

To add to this, the SelectionDAG node for the operand looks something like the following: nxv16i8 AArch64ISD::DUP(i32 -127).

sdesmalen accepted this revision.Oct 23 2020, 4:22 AM

LGTM, thanks for fixing @huihuiz!

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3139

nit: DL can be inlined directly into its single use, i.e.

Imm = CurDAG->getTargetConstant(ImmVal, SDLoc(N), MVT::i32);
3146

Thanks, I see what you mean now!

This revision is now accepted and ready to land.Oct 23 2020, 4:22 AM
This revision was landed with ongoing or failed builds.Oct 23 2020, 9:44 AM
This revision was automatically updated to reflect the committed changes.
huihuiz marked an inline comment as done.

Thanks @sdesmalen for the review!
Fixed in the commit patch.