This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors
ClosedPublic

Authored by dmgreen on Feb 17 2022, 1:09 AM.

Details

Summary

We have a combine for converting mul(dup(ext(..)), ...) into mul(ext(dup(..)), ..), for allowing more uses of smull and umull instructions. Currently it looks for vector insert and shuffle vectors to detect the element that we can convert to a vector extend. Not all cases will have a shufflevector/insert element though.

This started by extending the recognition to buildvectors (with elements that may be individually extended). The new method seems to cover all the cases that the old method captured though, as the shuffle will eventually be lowered to buildvectors, so the old method has been removed to keep the code a little simpler. The new code detects legal build_vector(ext(a), ext(b), ..), converting them to ext(build_vector(a, b, ..)) providing all the extends/types match up.

Found whilst looking at D96522 / D118979.

Diff Detail

Event Timeline

dmgreen created this revision.Feb 17 2022, 1:09 AM
dmgreen requested review of this revision.Feb 17 2022, 1:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2022, 1:09 AM

Looks good overall, with potential wins in both performance and codesize. Just 1 minor thing (and a nitpick about a comment).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13512

Do we want to keep this comment (or replace it, rather than remove it)?

13542–13543

Is this correct? The previous behaviour used PreExtendType directly as an operand to DAG.getAnyExtOrTrunc (which kept it aligned with PreExtendVT), but this clamps it to a smallest type of i32.

dmgreen added inline comments.Feb 17 2022, 5:50 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13512

Oh yeah I can add this back in. I previously had both performCommonVectorExtendCombine and performBuildVectorExtendCombine methods being called, but removing the shuffle version made no effect on any of the tests we have if I removed it. The methods are still quite similar in places.

13542–13543

It needs to be a legal type, so we can't just create i8 or i16 out of nowhere. The BUILD_VECTOR can be made up of elements that are implicitly truncated.

/// BUILD_VECTOR(ELT0, ELT1, ELT2, ELT3,...) - Return a fixed-width vector
/// with the specified, possibly variable, elements. The types of the
/// operands must match the vector element type, except that integer types
/// are allowed to be larger than the element type, in which case the
/// operands are implicitly truncated. The types of the operands must all
/// be the same.
dmgreen updated this revision to Diff 409615.Feb 17 2022, 5:52 AM

Add back comment

NickGuy accepted this revision.Feb 17 2022, 5:53 AM

I wasn't aware of the implicit truncation. In that case, LGTM

This revision is now accepted and ready to land.Feb 17 2022, 5:53 AM
This revision was landed with ongoing or failed builds.Feb 21 2022, 7:44 AM
This revision was automatically updated to reflect the committed changes.
srj added a subscriber: srj.Feb 22 2022, 10:11 AM

FYI, this seems to have injected a build failure in Halide, but unfortunately I don't have a repro case yet (it's only been isolated in some proprietary code). Will work on providing one.

OK thanks. Let me know. Hopefully a reproduce should be able to be relatively small, given what this patch altered.

Reduced test case:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-eabi"

define void @foo() local_unnamed_addr {
  %tmp = xor <16 x i1> zeroinitializer, <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>
  %tmp6 = load <8 x i16>, <8 x i16>* null, align 2
  %tmp7 = sub <8 x i16> zeroinitializer, %tmp6
  %tmp8 = shufflevector <8 x i16> %tmp7, <8 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %tmp9 = icmp slt i64 0, undef
  %tmp10 = zext i1 %tmp9 to i16
  %tmp11 = insertelement <16 x i16> undef, i16 %tmp10, i64 0
  %tmp12 = shufflevector <16 x i16> %tmp11, <16 x i16> undef, <16 x i32> zeroinitializer
  %tmp13 = mul nuw <16 x i16> %tmp8, %tmp12
  %tmp14 = icmp ne <16 x i16> %tmp13, zeroinitializer
  %tmp15 = and <16 x i1> %tmp14, %tmp
  %tmp16 = sext <16 x i1> %tmp15 to <16 x i8>
  %tmp17 = bitcast i8* undef to <16 x i8>*
  store <16 x i8> %tmp16, <16 x i8>* %tmp17, align 1
  ret void
}
$ llc t.ll
Value type is non-standard value, Other.
UNREACHABLE executed at llvm-project/llvm/include/llvm/Support/MachineValueType.h:869!

Oh it's an i1 type. I see.

Thanks for the reproducer.

Because the fix is pretty simple, I will recommit this with a fix for i1 types. Please let me know if there are any further problems.