This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] enable binop identity constant folds for add
ClosedPublic

Authored by LuoYuanke on Feb 12 2022, 8:16 PM.

Diff Detail

Event Timeline

LuoYuanke created this revision.Feb 12 2022, 8:16 PM
LuoYuanke requested review of this revision.Feb 12 2022, 8:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2022, 8:16 PM
xbolva00 added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2168

You can use ConstantExpr::getBinOpIdentity to check whether a constant is an identity constant.

LuoYuanke added inline comments.Feb 13 2022, 2:47 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2168

Sorry, this is an SDNode, and it seems there is no getBinOpIdentity() API at the DAG level.
BTW, there are some regressions with this patch.

  1. When the select can be combined with its operands, we don't need to invert the select folding. See the example below.
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vbmi2,+avx512vl --show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64
define <16 x i16> @test_int_x86_avx512_mask_vpshldv_w_256(<16 x i16> %x0, <16 x i16> %x1, <16 x i16>* %x2p, <16 x i16> %x4, i16 %x3) {
  %x2 = load <16 x i16>, <16 x i16>* %x2p
  %1 = call <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x2)
  %2 = bitcast i16 %x3 to <16 x i1>
  %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %x0
  %4 = call <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x4)
  %5 = bitcast i16 %x3 to <16 x i1>
  %6 = select <16 x i1> %5, <16 x i16> %4, <16 x i16> zeroinitializer
  %res3 = add <16 x i16> %3, %6
  ret <16 x i16> %res3
}

declare <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x4)
  2. The freeze node seems to prevent the sub combine in the case below.
define <4 x i32> @test_srem_allones(<4 x i32> %X) nounwind {
  %srem = srem <4 x i32> %X, <i32 4294967295, i32 4294967295, i32 4294967295, i32 4294967295>
  %cmp = icmp eq <4 x i32> %srem, <i32 0, i32 0, i32 0, i32 0>
  %ret = zext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %ret
}
RKSimon added a subscriber: RKSimon.
RKSimon added inline comments.
llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll
695 ↗(On Diff #408232)

Any chance you can track down the missing combine please?

The ISD::SUB should definitely fold away. I'm not sure whether the ISD::XOR operand is zeroinitializer or not, but X86ISD::PCMPEQ will fold to all-ones if the inputs are equal, and we should have constant folding for X86ISD::VSRLI.

RKSimon added inline comments.Feb 13 2022, 2:53 AM
llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll
695 ↗(On Diff #408232)

Sorry - missed your reply above!

LuoYuanke updated this revision to Diff 408255.Feb 13 2022, 6:51 AM

Fix regression for sub(freeze(x), x).

LuoYuanke marked an inline comment as done.Feb 13 2022, 6:52 AM
LuoYuanke added inline comments.Feb 13 2022, 6:55 AM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241 ↗(On Diff #408255)

This vmovdqa64 is emitted because the function needs to return its value in zmm0. Not sure if it is a regression.

xbolva00 added inline comments.Feb 13 2022, 6:59 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3356

Some general solution? FREEZE should be dropped much sooner, no?

LuoYuanke added inline comments.Feb 13 2022, 7:21 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3356

It is not dropped sooner, because the compiler can't guarantee the value is NOT undef or poison.

SDValue DAGCombiner::visitFREEZE(SDNode *N) {
  SDValue N0 = N->getOperand(0);

  if (DAG.isGuaranteedNotToBeUndefOrPoison(N0, /*PoisonOnly*/ false))
    return N0;

  return SDValue();
}

The freeze node lives until instruction selection:

ISEL: Starting selection on root node: t40: v4i32 = freeze t2
RKSimon added inline comments.Feb 13 2022, 7:29 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3359

Is there any way we can reduce the scope of this initially? It's likely that losing freeze like this might have other effects. If we're just after the (sub x, x) -> 0 fold, maybe create a peekThroughFreeze helper:

if (peekThroughFreeze(N0) == peekThroughFreeze(N1))
  return tryFoldToZero(DL, TLI, VT, DAG, LegalOperations);
RKSimon added inline comments.Feb 13 2022, 9:24 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2168

You can use ConstantExpr::getBinOpIdentity to check whether a constant is an identity constant.

We already have SelectionDAG::getNeutralElement - I wonder if adding a SelectionDAG::isNeutralElement helper sibling would be useful?

LuoYuanke added inline comments.Feb 13 2022, 5:40 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2168

I prefer inverting the operations one by one, so that each patch can stay small. After all the operators are inverted, we can refactor the code to use isNeutralElement() and getNeutralElement(). What do you think?

3359

Good suggestion. I'll apply your idea. Thanks.

LuoYuanke updated this revision to Diff 408306.Feb 13 2022, 5:47 PM

Address Simon's comments.

LuoYuanke added inline comments.Feb 13 2022, 6:41 PM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241 ↗(On Diff #408255)

It seems folding the select into its preceding operand (psrl) is better: since add is commutative, there is more chance to match the register allocator's hint (the return register).

LuoYuanke added inline comments.Feb 16 2022, 11:41 PM
llvm/test/CodeGen/X86/avx512vnni-intrinsics-upgrade.ll
66 ↗(On Diff #408306)

This can be improved in RA by evicting the previously assigned physical register (zmm0) with the patch below, but there is some risk of performance regression, because it changes the general RA eviction rule. If anyone is concerned about this additional vmovdqa64, I can separate sub from add in this patch and we can submit the sub patch first.

diff --git a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
index 718e12e5d602..863394fffeb6 100644
--- a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
+++ b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
@@ -168,6 +168,7 @@ bool DefaultEvictionAdvisor::canEvictHintInterference(
     const SmallVirtRegSet &FixedRegisters) const {
   EvictionCost MaxCost;
   MaxCost.setBrokenHints(1);
+  MaxCost.MaxWeight = VirtReg.weight();
   return canEvictInterferenceBasedOnCost(VirtReg, PhysReg, true, MaxCost,
                                          FixedRegisters);
 }
RKSimon added inline comments.Feb 17 2022, 1:06 PM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241 ↗(On Diff #408255)

These adds were just used for simplicity to make the result dependent on all 3 intrinsics.

We'd avoid all of the intrinsics-upgrade changes if we just changed these add ops to something else, preferably something that we're not going to add to foldSelectWithIdentityConstant in the future.

Alternatively, we could split these tests into the 3 normal / {k} / {k}{z} variants.

xbolva00 added inline comments.Feb 17 2022, 1:08 PM
llvm/test/CodeGen/X86/avx512vnni-intrinsics-upgrade.ll
66 ↗(On Diff #408306)

In any case, consider posting this RA patch on Phabricator.

RKSimon added inline comments.Feb 22 2022, 8:29 AM
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
4241 ↗(On Diff #408255)

@LuoYuanke Something that might work is to return a { <8 x i64>, <8 x i64>, <8 x i64> } structure : https://gcc.godbolt.org/z/39ahrqM7E

define { <8 x i64>, <8 x i64>, <8 x i64> } @test_int_x86_avx512_mask_psrl_qi_512(<8 x i64> %x0, i32 %x1, <8 x i64> %x2, i8 %x3) {
  %res = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 4, <8 x i64> %x2, i8 %x3)
  %res1 = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 5, <8 x i64> %x2, i8 -1)
  %res2 = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 6, <8 x i64> zeroinitializer, i8 %x3)

  %r0 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } poison, <8 x i64> %res, 0
  %r1 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } %r0, <8 x i64> %res1, 1
  %r2 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } %r1, <8 x i64> %res2, 2
  ret { <8 x i64>, <8 x i64>, <8 x i64> } %r2
}
declare <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64>, i32, <8 x i64>, i8)

test_int_x86_avx512_mask_psrl_qi_512:   # @test_int_x86_avx512_mask_psrl_qi_512
        vmovdqa64       %zmm1, %zmm3            # encoding: [0x62,0xf1,0xfd,0x48,0x6f,0xd9]
        kmovw   %esi, %k1                       # encoding: [0xc5,0xf8,0x92,0xce]
        vpsrlq  $4, %zmm0, %zmm3 {%k1}          # encoding: [0x62,0xf1,0xe5,0x49,0x73,0xd0,0x04]
        vpsrlq  $5, %zmm0, %zmm1                # encoding: [0x62,0xf1,0xf5,0x48,0x73,0xd0,0x05]
        vpsrlq  $6, %zmm0, %zmm2 {%k1} {z}      # encoding: [0x62,0xf1,0xed,0xc9,0x73,0xd0,0x06]
        vmovdqa64       %zmm3, %zmm0            # encoding: [0x62,0xf1,0xfd,0x48,0x6f,0xc3]
        retq                                    # encoding: [0xc3]

@LuoYuanke Please can you rebase and add test coverage for 'add+select' to vector-bo-select.ll? I've updated some of the intrinsic tests to avoid the issue so these shouldn't show up any more, I'll finish this cleanup when I have a free moment.

Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2022, 9:28 AM

@LuoYuanke Please can you rebase and add test coverage for 'add+select' to vector-bo-select.ll? I've updated some of the intrinsic tests to avoid the issue so these shouldn't show up any more, I'll finish this cleanup when I have a free moment.

@RKSimon, that is very kind of you. Thanks! Let me follow your approach and update the other test cases.

RKSimon retitled this revision from [SDAG] enable binop identity constant folds for add/sub to [SDAG] enable binop identity constant folds for add.Mar 8 2022, 12:10 AM

@LuoYuanke please can you rebase this?

@LuoYuanke rebase? I think this might be ready now

@LuoYuanke rebase? I think this might be ready now

Thanks, Simon. Yes, I think it is ready now.

RKSimon accepted this revision.Mar 20 2022, 1:36 AM

LGTM - cheers!

This revision is now accepted and ready to land.Mar 20 2022, 1:36 AM
This revision was landed with ongoing or failed builds.Mar 20 2022, 4:26 AM
This revision was automatically updated to reflect the committed changes.