Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2168

You can use ConstantExpr::getBinOpIdentity to check whether a constant is an identity constant.
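For context, the helper referenced here lives at the IR layer; a minimal sketch of its typical use follows (the isAddIdentity wrapper is hypothetical, only ConstantExpr::getBinOpIdentity itself is an existing API):

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Hypothetical wrapper for illustration: ConstantExpr::getBinOpIdentity
// returns the identity constant for a binary opcode (0 for Add, 1 for Mul,
// ...), or null if the opcode has none. Constants are uniqued, so pointer
// equality suffices. Note this operates on IR Constants, not SDNodes,
// which is the mismatch raised in the reply below.
static bool isAddIdentity(Constant *C, Type *Ty) {
  Constant *Identity = ConstantExpr::getBinOpIdentity(Instruction::Add, Ty);
  return Identity && C == Identity;
}
```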
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2168

Sorry, this is SDNode and it seems there is no getBinOpIdentity() API.

```llvm
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vbmi2,+avx512vl --show-mc-encoding | FileCheck %s --check-prefixes=CHECK,X64

define <16 x i16> @test_int_x86_avx512_mask_vpshldv_w_256(<16 x i16> %x0, <16 x i16> %x1, <16 x i16>* %x2p, <16 x i16> %x4, i16 %x3) {
  %x2 = load <16 x i16>, <16 x i16>* %x2p
  %1 = call <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x2)
  %2 = bitcast i16 %x3 to <16 x i1>
  %3 = select <16 x i1> %2, <16 x i16> %1, <16 x i16> %x0
  %4 = call <16 x i16> @llvm.fshl.v16i16(<16 x i16> %x0, <16 x i16> %x1, <16 x i16> %x4)
  %5 = bitcast i16 %x3 to <16 x i1>
  %6 = select <16 x i1> %5, <16 x i16> %4, <16 x i16> zeroinitializer
  %res3 = add <16 x i16> %3, %6
  ret <16 x i16> %res3
}

declare <16 x i16> @llvm.fshl.v16i16(<16 x i16>, <16 x i16>, <16 x i16>)
```
```llvm
define <4 x i32> @test_srem_allones(<4 x i32> %X) nounwind {
  %srem = srem <4 x i32> %X, <i32 4294967295, i32 4294967295, i32 4294967295, i32 4294967295>
  %cmp = icmp eq <4 x i32> %srem, <i32 0, i32 0, i32 0, i32 0>
  %ret = zext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %ret
}
```
llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll:695 (On Diff #408232)

Any chance you can track down the missing combine, please? The ISD::SUB should definitely fold away. Not sure whether the ISD::XOR is a zeroinitializer or not, but X86ISD::PCMPEQ will fold to all-ones if the inputs are equal, and we should have constant folding for X86ISD::VSRLI.
llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll:695 (On Diff #408232)

Sorry - missed your reply above!
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll:4241 (On Diff #408255)

This vmovdqa64 is emitted because the function needs to return its value in zmm0. Not sure if it is a regression.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3356

Some general solution? FREEZE should be dropped much sooner, no?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3356

It is not dropped sooner, because the compiler can't guarantee the value is NOT undef or poison:

```cpp
SDValue DAGCombiner::visitFREEZE(SDNode *N) {
  SDValue N0 = N->getOperand(0);

  if (DAG.isGuaranteedNotToBeUndefOrPoison(N0, /*PoisonOnly*/ false))
    return N0;

  return SDValue();
}
```

The freeze node lives until instruction selection:

```
ISEL: Starting selection on root node: t40: v4i32 = freeze t2
```
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3359

Is there any way we can reduce the scope of this initially? It's likely that losing freeze like this might have other effects. If we're just after the (sub x, x) -> 0 fold, maybe create a peekThroughFreeze helper:

```cpp
if (peekThroughFreeze(N0) == peekThroughFreeze(N1))
  return tryFoldToZero(DL, TLI, VT, DAG, LegalOperations);
```
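A minimal sketch of what such a helper could look like, using only the existing SDValue and ISD::FREEZE APIs (the function itself is the hypothetical one suggested above, not an existing API):

```cpp
// Hypothetical helper per the suggestion above: look through any chain of
// FREEZE nodes to reach the underlying value, so that two operands that
// differ only by freeze compare equal.
static SDValue peekThroughFreeze(SDValue V) {
  while (V.getOpcode() == ISD::FREEZE)
    V = V.getOperand(0);
  return V;
}
```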
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2168

We already have SelectionDAG::getNeutralElement - I wonder if adding a SelectionDAG::isNeutralElement helper sibling would be useful?
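For what it's worth, a minimal sketch of such a sibling, assuming splat/scalar integer constants only; the helper name and matching strategy are assumptions, while getNeutralElement and isConstOrConstSplat are existing APIs:

```cpp
// Sketch of the suggested isNeutralElement sibling (hypothetical; not an
// existing SelectionDAG member). Builds the neutral element for Opcode and
// compares it against V's splat/scalar constant value.
static bool isNeutralElement(SelectionDAG &DAG, unsigned Opcode,
                             const SDLoc &DL, EVT VT, SDValue V) {
  SDValue Neutral = DAG.getNeutralElement(Opcode, DL, VT, SDNodeFlags());
  if (!Neutral)
    return false;
  // Splat-only matching for brevity; a real helper would also handle
  // non-splat vector constants and FP neutral elements (e.g. -0.0 for FADD).
  ConstantSDNode *CV = isConstOrConstSplat(V);
  ConstantSDNode *CN = isConstOrConstSplat(Neutral);
  return CV && CN && CV->getAPIntValue() == CN->getAPIntValue();
}
```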
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2168

I prefer to convert the operations one by one, so that the patch can stay small. After all the operators are converted, we can refactor the code to use isNeutralElement() and getNeutralElement(). What do you think?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:3359

Good suggestion. I'll apply your idea. Thanks.
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll:4241 (On Diff #408255)

It seems folding the select into its previous operand (psrl) is better: add is commutative, so there is a better chance of matching the register allocator's hint (the return register).
llvm/test/CodeGen/X86/avx512vnni-intrinsics-upgrade.ll:66 (On Diff #408306)

This can be improved in RA by evicting the previously assigned physical register (zmm0) with the patch below, but there is some risk of performance regression, because it changes the general RA eviction rule. If anyone is concerned about this additional vmovdqa64, I can separate sub from add in this patch, and we can submit the sub patch first.

```diff
diff --git a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
index 718e12e5d602..863394fffeb6 100644
--- a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
+++ b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
@@ -168,6 +168,7 @@ bool DefaultEvictionAdvisor::canEvictHintInterference(
     const SmallVirtRegSet &FixedRegisters) const {
   EvictionCost MaxCost;
   MaxCost.setBrokenHints(1);
+  MaxCost.MaxWeight = VirtReg.weight();
   return canEvictInterferenceBasedOnCost(VirtReg, PhysReg, true, MaxCost,
                                          FixedRegisters);
 }
```
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll:4241 (On Diff #408255)

These adds were just used for simplicity to make the result dependent on all 3 intrinsics. We'd avoid all of the intrinsics-upgrade changes if we just changed these add ops to something else, preferably something that we're not going to add to foldSelectWithIdentityConstant in the future. Alternatively, we could split these tests into the 3 normal / {k} / {k}{z} variants.
llvm/test/CodeGen/X86/avx512vnni-intrinsics-upgrade.ll:66 (On Diff #408306)

In any case, consider posting this RA patch on Phabricator.
llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll:4241 (On Diff #408255)

@LuoYuanke Something that might work is to return a { <8 x i64>, <8 x i64>, <8 x i64> } structure: https://gcc.godbolt.org/z/39ahrqM7E

```llvm
define { <8 x i64>, <8 x i64>, <8 x i64> } @test_int_x86_avx512_mask_psrl_qi_512(<8 x i64> %x0, i32 %x1, <8 x i64> %x2, i8 %x3) {
  %res  = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 4, <8 x i64> %x2, i8 %x3)
  %res1 = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 5, <8 x i64> %x2, i8 -1)
  %res2 = call <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64> %x0, i32 6, <8 x i64> zeroinitializer, i8 %x3)
  %r0 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } poison, <8 x i64> %res, 0
  %r1 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } %r0, <8 x i64> %res1, 1
  %r2 = insertvalue { <8 x i64>, <8 x i64>, <8 x i64> } %r1, <8 x i64> %res2, 2
  ret { <8 x i64>, <8 x i64>, <8 x i64> } %r2
}
declare <8 x i64> @llvm.x86.avx512.mask.psrl.qi.512(<8 x i64>, i32, <8 x i64>, i8)
```

```asm
test_int_x86_avx512_mask_psrl_qi_512: # @test_int_x86_avx512_mask_psrl_qi_512
  vmovdqa64 %zmm1, %zmm3               # encoding: [0x62,0xf1,0xfd,0x48,0x6f,0xd9]
  kmovw     %esi, %k1                  # encoding: [0xc5,0xf8,0x92,0xce]
  vpsrlq    $4, %zmm0, %zmm3 {%k1}     # encoding: [0x62,0xf1,0xe5,0x49,0x73,0xd0,0x04]
  vpsrlq    $5, %zmm0, %zmm1           # encoding: [0x62,0xf1,0xf5,0x48,0x73,0xd0,0x05]
  vpsrlq    $6, %zmm0, %zmm2 {%k1} {z} # encoding: [0x62,0xf1,0xed,0xc9,0x73,0xd0,0x06]
  vmovdqa64 %zmm3, %zmm0               # encoding: [0x62,0xf1,0xfd,0x48,0x6f,0xc3]
  retq                                 # encoding: [0xc3]
```
@LuoYuanke Please can you rebase and add test coverage for 'add+select' to vector-bo-select.ll? I've updated some of the intrinsic tests to avoid the issue, so these shouldn't show up any more; I'll finish this cleanup when I have a free moment.
@RKSimon, it is very kind of you. Thanks! Let me follow your approach and update the other test cases.