This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] enable binop identity constant folds for add
ClosedPublic

Authored by LuoYuanke on Feb 12 2022, 8:16 PM.

Diff Detail

Event Timeline

LuoYuanke created this revision.Feb 12 2022, 8:16 PM
LuoYuanke requested review of this revision.Feb 12 2022, 8:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2022, 8:16 PM
xbolva00 added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2148

You can use Constantexpr::getBinOpIdentity to check if identity constant.

LuoYuanke added inline comments.Feb 13 2022, 2:47 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2148

Sorry, this is SDNode and it seems there is no getBinOpIdentity() API.
BTW, there is some regression on this patch.

  1. When the select can be combined with its operands, we don't need to invert the select folding. See below example.
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vbmi2,+avx512vl --show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64
define <16 x i16> @test_int_x86_avx512_mask_vpshldv_w_256(<16 x i16> %x0, <16 x i16> %x1, <16 x i16>* %x2p, <16 x i16> %x4, i16 %x3) {
  %x2 = load <16 x i16>, <16 x i16>* %x2p
  %1 = call <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x2)
  %2 = bitcast i16 %x3 to <16 x i1>
  %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %x0
  %4 = call <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x4)
  %5 = bitcast i16 %x3 to <16 x i1>
  %6 = select <16 x i1> %5, <16 x i16> %4, <16 x i16> zeroinitializer
  %res3 = add <16 x i16> %3, %6
  ret <16 x i16> %res3
}

declare <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x4)
  1. The freeze node seems to prevent the sub combine for below case.
define <4 x i32> @test_srem_allones(<4 x i32> %X) nounwind {
  %srem = srem <4 x i32> %X, <i32 4294967295, i32 4294967295, i32 4294967295, i32 4294967295>
  %cmp = icmp eq <4 x i32> %srem, <i32 0, i32 0, i32 0, i32 0>
  %ret = zext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %ret
}
RKSimon added a subscriber: RKSimon.
RKSimon added inline comments.
llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll
695 ↗(On Diff #408232)

Any chance you can track down the missing combine please?

The ISD::SUB should definitely fold away, not sure if the ISD::XOR is a zeroinitializer or not - but X86ISD::PCMPEQ will fold to all ones if the inputs are equal. And we should have constant folding for X86ISD::VSRLI

RKSimon added inline comments.Feb 13 2022, 2:53 AM
llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll
695 ↗(On Diff #408232)

Sorry - missed your reply above!

LuoYuanke updated this revision to Diff 408255.Feb 13 2022, 6:51 AM

Fix regression for sub(freeze(x), x).

LuoYuanke marked an inline comment as done.Feb 13 2022, 6:52 AM
LuoYuanke added inline comments.Feb 13 2022, 6:55 AM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241

This vmovdqa64 is emitted because the function need to return value by zmm0. Not sure if it is a regression.

xbolva00 added inline comments.Feb 13 2022, 6:59 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3333

Some general solution? FREEZE should be dropped much sooner, no?

LuoYuanke added inline comments.Feb 13 2022, 7:21 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3333

It is not dropped soon, because compiler can't guarantee it is NOT undef or poison value.

13469 SDValue DAGCombiner::visitFREEZE(SDNode *N) {
13470   SDValue N0 = N->getOperand(0);
13471
13472   if (DAG.isGuaranteedNotToBeUndefOrPoison(N0, /*PoisonOnly*/ false))
13473     return N0;
13474
13475   return SDValue();
13476 }

The freeze node live until instruction selection.

ISEL: Starting selection on root node: t40: v4i32 = freeze t2
RKSimon added inline comments.Feb 13 2022, 7:29 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3336

Is there anyway we can reduce the scope of this initially, its likely that losing freeze like this might have other effects - if we're just after the (sub x, x) -> 0 fold, maybe create a peekThroughFreeze helper:

if (peekThroughFreeze(N0) == peekThroughFreeze(N1))

return tryFoldToZero(DL, TLI, VT, DAG, LegalOperations);
RKSimon added inline comments.Feb 13 2022, 9:24 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2148

You can use Constantexpr::getBinOpIdentity to check if identity constant.

We already have SelectionDAG::getNeutralElement - I wonder if adding a SelectionDAG::isNeutralElement helper sibling would be useful?

LuoYuanke added inline comments.Feb 13 2022, 5:40 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2148

I prefer to inverting the operation one by one, so that the patch can be small. After all the operators are inverted, we can refactor the code by using isNeutralElement() and getNeutralElement(). What do you think?

3336

Good suggestion. I'll apply your idea. Thanks.

LuoYuanke updated this revision to Diff 408306.Feb 13 2022, 5:47 PM

Address Simon's comments.

LuoYuanke added inline comments.Feb 13 2022, 6:41 PM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241

It seems fold select to its previous operands (psrl) is better, because the add operands is communitive so there is more chance to meet the hint (return register) of register allocator.

LuoYuanke added inline comments.Feb 16 2022, 11:41 PM
llvm/test/CodeGen/X86/avx512vnni-intrinsics-upgrade.ll
66

This can be improved in RA by evict the previous assigned physical register (zmm0) with below patch, but there is some risk on performance regression, because we change the general RA evicting rule. If anyone concern about this additional vmovdqa64, I can separate sub from add in the patch and we may submit sub patch first.

diff --git a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
index 718e12e5d602..863394fffeb6 100644
--- a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
+++ b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
@@ -168,6 +168,7 @@ bool DefaultEvictionAdvisor::canEvictHintInterference(
     const SmallVirtRegSet &FixedRegisters) const {
   EvictionCost MaxCost;
   MaxCost.setBrokenHints(1);
+  MaxCost.MaxWeight = VirtReg.weight();
   return canEvictInterferenceBasedOnCost(VirtReg, PhysReg, true, MaxCost,
                                          FixedRegisters);
 }
RKSimon added inline comments.Feb 17 2022, 1:06 PM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241

These adds were just used for simplicity to make the result dependent on all 3 intrinsics.

We'd avoid all of the intrinsics-upgrade changes if we just changed these add ops to something else, preferably something that we're not going to add to foldSelectWithIdentityConstant in the future.

Alternatively we split these tests into the 3 normal / {k} / {k}{z} variants

xbolva00 added inline comments.Feb 17 2022, 1:08 PM
llvm/test/CodeGen/X86/avx512vnni-intrinsics-upgrade.ll
66

In any case, Consider posting this patch for RA on Phabricator.

RKSimon added inline comments.Feb 22 2022, 8:29 AM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241

@LuoYuanke Something that might work is to return a { <8 x i64>, <8 x i64>, <8 x i64> } structure : https://gcc.godbolt.org/z/39ahrqM7E

define { <8 x i64>, <8 x i64>, <8 x i64> } @test_int_x86_avx512_mask_psrl_qi_512(<8 x i64> %x0, i32 %x1, <8 x i64> %x2, i8 %x3) {
  %res = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 4, <8 x i64> %x2, i8 %x3)
  %res1 = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 5, <8 x i64> %x2, i8 -1)
  %res2 = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 6, <8 x i64> zeroinitializer, i8 %x3)

  %r0 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } poison, <8 x i64> %res, 0
  %r1 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } %r0, <8 x i64> %res1, 1
  %r2 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } %r1, <8 x i64> %res2, 2
  ret { <8 x i64>, <8 x i64>, <8 x i64> } %r2
}
declare <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64>, i32, <8 x i64>, i8)

test_int_x86_avx512_mask_psrl_qi_512:   # @test_int_x86_avx512_mask_psrl_qi_512
        vmovdqa64       %zmm1, %zmm3            # encoding: [0x62,0xf1,0xfd,0x48,0x6f,0xd9]
        kmovw   %esi, %k1                       # encoding: [0xc5,0xf8,0x92,0xce]
        vpsrlq  $4, %zmm0, %zmm3 {%k1}          # encoding: [0x62,0xf1,0xe5,0x49,0x73,0xd0,0x04]
        vpsrlq  $5, %zmm0, %zmm1                # encoding: [0x62,0xf1,0xf5,0x48,0x73,0xd0,0x05]
        vpsrlq  $6, %zmm0, %zmm2 {%k1} {z}      # encoding: [0x62,0xf1,0xed,0xc9,0x73,0xd0,0x06]
        vmovdqa64       %zmm3, %zmm0            # encoding: [0x62,0xf1,0xfd,0x48,0x6f,0xc3]
        retq                                    # encoding: [0xc3]

@LuoYuanke Please can you rebase and add test coverage for 'add+select' to vector-bo-select.ll? I've updated some of the intrinsic tests to avoid the issue so these shouldn't show up any more, I'll finish this cleanup when I have a free moment.

Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2022, 9:28 AM

@LuoYuanke Please can you rebase and add test coverage for 'add+select' to vector-bo-select.ll? I've updated some of the intrinsic tests to avoid the issue so these shouldn't show up any more, I'll finish this cleanup when I have a free moment.

@RKSimon, it is very kind of you. Thanks! Let me follow your approach and update other test cases.

RKSimon retitled this revision from [SDAG] enable binop identity constant folds for add/sub to [SDAG] enable binop identity constant folds for add.Mar 8 2022, 12:10 AM

@LuoYuanke please can you rebase this?

@LuoYuanke rebase? I think this might be ready now

@LuoYuanke rebase? I think this might be ready now

Thanks, Simon. Yes, I think it is ready now.

RKSimon accepted this revision.Mar 20 2022, 1:36 AM

LGTM - cheers!

This revision is now accepted and ready to land.Mar 20 2022, 1:36 AM
This revision was landed with ongoing or failed builds.Mar 20 2022, 4:26 AM
This revision was automatically updated to reflect the committed changes.