This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] canonicalize splat shuffle after cmp
Closed · Public

Authored by spatel on Jan 28 2020, 12:34 PM.


Details

Reviewers

nikic
lebedev.ri
efriedma

Commits

rG87f6314f8cd1: [InstCombine] canonicalize splat shuffle after cmp

Summary

cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M

As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.

This patch handles the special (but presumably most common) case of splat shuffles. If both operands are splats, then we can do the comparison on the non-splat inputs followed by a splat of the compare result. That should take care of the regression noted in D73411.
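The identity in the summary can be checked on concrete lanes. The following is a small C++ model, not LLVM code: vectors are std::vector, a splat shuffle is a mask that repeats one source lane, and the helpers applyMask, laneCmp, before and after are hypothetical names. It compares "cmp (splat V1, M), SplatC" against "splat (cmp V1, SplatC'), M", where SplatC' has V1's width, so length-changing splats are covered too. Undef mask lanes and poison are deliberately left out of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-source shuffle: result lane I reads source lane Mask[I].
// (Undef mask elements are not modeled in this sketch.)
template <typename T>
std::vector<T> applyMask(const std::vector<T> &V, const std::vector<std::size_t> &Mask) {
  std::vector<T> R;
  R.reserve(Mask.size());
  for (std::size_t Idx : Mask)
    R.push_back(V.at(Idx));
  return R;
}

// Lane-wise compare of two equal-length vectors with predicate P.
template <typename T, typename Pred>
std::vector<bool> laneCmp(const std::vector<T> &A, const std::vector<T> &B, Pred P) {
  std::vector<bool> R;
  R.reserve(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R.push_back(P(A[I], B[I]));
  return R;
}

// Original form: cmp (splat V1, M), SplatC
// The compare runs at the shuffle's result width.
template <typename T, typename Pred>
std::vector<bool> before(const std::vector<T> &V1, std::size_t Lane,
                         std::size_t ResultLen, T Scalar, Pred P) {
  const std::vector<std::size_t> M(ResultLen, Lane);
  return laneCmp(applyMask(V1, M), std::vector<T>(ResultLen, Scalar), P);
}

// Canonical form: splat (cmp V1, SplatC'), M
// SplatC' has V1's width; only the i1 result is splatted afterwards.
template <typename T, typename Pred>
std::vector<bool> after(const std::vector<T> &V1, std::size_t Lane,
                        std::size_t ResultLen, T Scalar, Pred P) {
  const std::vector<bool> NarrowCmp = laneCmp(V1, std::vector<T>(V1.size(), Scalar), P);
  const std::vector<std::size_t> M(ResultLen, Lane);
  return applyMask(NarrowCmp, M);
}
```

For example, with X = {1, 50, -3, 43}, splatting lane 3 to four lanes and testing "sgt 42" yields all-true in both forms, and a 2-lane input splatted to 4 lanes (as in splat_icmp_larger_size) also agrees.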

There's another potential fold requested in PR37463 to scalarize the compare, but that's another patch (and it's not clear if we can do that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision.Jan 28 2020, 12:34 PM

Herald added a project: Restricted Project. Jan 28 2020, 12:34 PM

Herald added subscribers: hiraditya, mcrosier.

nikic accepted this revision.

LGTM

Does it make sense to extend this for arbitrary shuffles in the future? The mask construction logic in https://github.com/llvm/llvm-project/blob/4aa8cdfeebec115b928e0ccb452551b520d00f0b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L1612-L1660 can probably mostly be reused for the icmp case.

This revision is now accepted and ready to land. Jan 28 2020, 1:05 PM

spatel added a comment.

In D73575#1845464, @nikic wrote:

LGTM

Does it make sense to extend this for arbitrary shuffles in the future? The mask construction logic in https://github.com/llvm/llvm-project/blob/4aa8cdfeebec115b928e0ccb452551b520d00f0b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L1612-L1660 can probably mostly be reused for the icmp case.

Yes, I was debating whether it was worth lifting that code or taking this more direct, splat-only approach. Based on the vector code that I've looked at so far, the splat case is by far the most common case. I'll add a TODO here though in case we want to revisit that choice.
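nikic's suggestion amounts to undoing the shuffle on the constant: for an arbitrary single-source mask M, find a constant C' at V1's width such that shuffle(C', M) == C. The C++ sketch below illustrates that idea with a hypothetical helper, unshuffleConstant. It is not the code at the linked InstructionCombining.cpp lines, and it ignores undef mask lanes and two-source shuffles. It shows why the general case can fail: two result lanes that read the same source lane must agree on the constant. Lanes of C' that the mask never reads are left unconstrained; for a compare, any value would do there because the shuffle never reads those result lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: given the result-width constant C and a single-source
// shuffle mask M (entry I says which lane of V1 feeds result lane I), build a
// V1-width constant NewC such that shuffle(NewC, M) == C.
// Lanes of NewC that M never reads are left as std::nullopt (unconstrained).
// Returns std::nullopt if no such NewC exists.
std::optional<std::vector<std::optional<int>>>
unshuffleConstant(const std::vector<int> &C, const std::vector<std::size_t> &M,
                  std::size_t V1Len) {
  if (C.size() != M.size())
    return std::nullopt; // C must have the shuffle's result width
  std::vector<std::optional<int>> NewC(V1Len);
  for (std::size_t I = 0; I < M.size(); ++I) {
    const std::size_t Idx = M[I];
    if (Idx >= V1Len)
      return std::nullopt; // would read a second shuffle operand: not modeled
    if (NewC[Idx] && *NewC[Idx] != C[I])
      return std::nullopt; // two result lanes need different values in lane Idx
    NewC[Idx] = C[I];
  }
  return NewC;
}
```

For C = {10, 20, 10, 30} and M = {1, 0, 1, 3}, this yields C' = {20, 10, (unread), 30}; for C = {10, 20} and the splat mask {0, 0}, it fails. In the splat-only case of this patch, the constant must itself be a splat, so the conflict cannot arise.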

Closed by commit rG87f6314f8cd1: [InstCombine] canonicalize splat shuffle after cmp (authored by spatel). Jan 29 2020, 5:50 AM

This revision was automatically updated to reflect the committed changes.

nikic mentioned this in D73411: [InstCombine] Process newly inserted instructions in the correct order. Jan 29 2020, 9:24 AM

nikic mentioned this in rG80581966771a: [InstCombine] Process newly inserted instructions in the correct order. Jan 30 2020, 12:45 AM

LuoYuanke added a subscriber: LuoYuanke. Nov 11 2021, 12:27 AM

LuoYuanke added inline comments.

llvm/test/Transforms/InstCombine/gep-inbounds-null.ll

On X86 with AVX-512, the result of a vector compare is placed in a %k (mask) register, but there is no shuffle instruction for %k registers. Here is a test case that regressed due to this patch; it can be reproduced with "llc -mcpu=skylake-avx512". Any ideas on how to improve it?

define void @before_canonicalization_i1(<16 x i1> %msk, i32 %in, <16 x i1>* %dst) {
entry:
  %insrt = insertelement <16 x i32> undef, i32 %in, i32 0
  %splat = shufflevector <16 x i32> %insrt, <16 x i32> poison, <16 x i32> zeroinitializer
  %mul = mul <16 x i32> <i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789>, %splat
  %cmp1 = icmp eq <16 x i32> %mul, <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
  %and = and <16 x i1> %msk, %cmp1
  %bc = bitcast <16 x i1> %and to i16
  %cmp = icmp ne i16 %bc, 0
  br i1 %cmp, label %b1, label %b2
b1:
  store <16 x i1> %and, <16 x i1>* %dst, align 8
  ret void
b2:
  store <16 x i1> %cmp1, <16 x i1>* %dst, align 8
  ret void
}

define void @after_canonicalization_i1(<16 x i1> %msk, i32 %in, <16 x i1>* %dst) {
entry:
  %insrt = insertelement <16 x i32> undef, i32 %in, i32 0
  %0 = mul <16 x i32> %insrt, <i32 789, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %1 = icmp eq <16 x i32> %0, zeroinitializer
  %cmp1 = shufflevector <16 x i1> %1, <16 x i1> poison, <16 x i32> zeroinitializer
  %and = and <16 x i1> %cmp1, %msk
  %bc = bitcast <16 x i1> %and to i16
  %cmp.not = icmp eq i16 %bc, 0
  br i1 %cmp.not, label %b2, label %b1
b1:
  store <16 x i1> %and, <16 x i1>* %dst, align 8
  ret void
b2:
  store <16 x i1> %cmp1, <16 x i1>* %dst, align 8
  ret void
}

spatel added inline comments. Nov 11 2021, 11:51 AM

llvm/test/Transforms/InstCombine/gep-inbounds-null.ll
This example (not sure if this was over-reduced from original code...) shows a few potential missed folds. I'm not sure yet why we failed all of them. Reduced the vector length for readability:

define <2 x i1> @src(i32 %in) {
  %insrt = insertelement <2 x i32> undef, i32 %in, i32 0
  %m = mul <2 x i32> %insrt, <i32 789, i32 poison>
  %r = icmp eq <2 x i32> %m, zeroinitializer
  ret <2 x i1> %r
}

Scalarize the mul? Followed by scalarize the icmp? We also missed eliminating the mul - probably because we didn't recognize the constant with poison.
https://alive2.llvm.org/ce/z/TZyTU2
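The missed mul fold has a simple justification for the one defined lane: 789 is odd, so multiplication by 789 is a bijection on 32-bit integers (it has an inverse modulo 2^32), and therefore x * 789 == 0 exactly when x == 0. The small C++ check below illustrates that reasoning; it is not from the review, and the helper names inverseMod2To32, mulThenCmp and cmpOnly are hypothetical. It computes the inverse with Newton's iteration and compares the two forms on a few inputs, relying on unsigned 32-bit arithmetic wrapping the same way i32 does.

```cpp
#include <cassert>
#include <cstdint>

// Inverse of an odd 32-bit value modulo 2^32 via Newton's iteration
// X <- X * (2 - A * X). For odd A, X = A is already correct in the low 3 bits
// (A * A == 1 mod 8), and each step doubles the number of correct bits:
// 3 -> 6 -> 12 -> 24 -> 48, so four steps cover all 32 bits.
constexpr std::uint32_t inverseMod2To32(std::uint32_t A) {
  std::uint32_t X = A;
  for (int I = 0; I < 4; ++I)
    X *= 2u - A * X;
  return X;
}

constexpr std::uint32_t K = 789; // the multiplier from the reduced test
static_assert(K % 2 == 1, "the argument below needs an odd multiplier");
static_assert(K * inverseMod2To32(K) == 1u, "789 is invertible modulo 2^32");

// i32 arithmetic wraps modulo 2^32, matching unsigned 32-bit arithmetic in C++.
bool mulThenCmp(std::uint32_t X) { return X * K == 0; } // icmp eq (mul X, 789), 0
bool cmpOnly(std::uint32_t X) { return X == 0; }        // icmp eq X, 0
```

With an even multiplier the argument breaks: 0x80000000 * 2 wraps to 0 even though the input is nonzero, so removing the mul depends on the constant being odd.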

LuoYuanke added inline comments. Nov 12 2021, 6:32 AM

llvm/test/Transforms/InstCombine/gep-inbounds-null.ll

I mean that we should avoid a <X x i1> shuffle after an icmp when AVX-512 is enabled, because there is no shuffle instruction for k registers. Take the code below as an example.

cat shufvXi32.ll

define <16 x i1> @shuffle(<16 x i1> %msk, i32 %in) {
entry:
  %insrt = insertelement <16 x i32> undef, i32 %in, i32 0
  %splat = shufflevector <16 x i32> %insrt, <16 x i32> poison, <16 x i32> zeroinitializer
  %mul = mul <16 x i32> <i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789>, %splat
  %cmp1 = icmp eq <16 x i32> %mul, <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
  %and = and <16 x i1> %msk, %cmp1
  ret <16 x i1> %and
}

opt -S < shufvXi32.ll -instcombine -o shufvXi1.ll

This produces the following transformed code:

define <16 x i1> @shuffle(<16 x i1> %msk, i32 %in) {
entry:
  %insrt = insertelement <16 x i32> undef, i32 %in, i32 0
  %0 = mul <16 x i32> %insrt, <i32 789, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %1 = icmp eq <16 x i32> %0, zeroinitializer
  %cmp1 = shufflevector <16 x i1> %1, <16 x i1> poison, <16 x i32> zeroinitializer
  %and = and <16 x i1> %cmp1, %msk
  ret <16 x i1> %and
}

llc -mcpu=skylake-avx512 shufvXi32.ll
We get the following assembly:

# %bb.0:                                # %entry
        vpsllw  $7, %xmm0, %xmm0
        vpmovb2m        %xmm0, %k1
        vpbroadcastd    %edi, %zmm0
        vpmulld .LCPI0_0(%rip){1to16}, %zmm0, %zmm0
        vptestnmd       %zmm0, %zmm0, %k0 {%k1}
        vpmovm2b        %k0, %xmm0
        vzeroupper
        retq

llc -mcpu=skylake-avx512 shufvXi1.ll
We get the following assembly:

# %bb.0:                                # %entry
        vpsllw  $7, %xmm0, %xmm0
        vpxor   %xmm1, %xmm1, %xmm1
        vmovd   %edi, %xmm2
        movl    $789, %eax                      # imm = 0x315
        vmovd   %eax, %xmm3
        vpmulld %xmm3, %xmm2, %xmm2
        vptestnmd       %zmm2, %zmm2, %k0
        vpmovm2w        %k0, %ymm2
        vpbroadcastw    %xmm2, %ymm2
        vpmovw2m        %ymm2, %k1
        vpcmpgtb        %xmm0, %xmm1, %k0 {%k1}
        vpmovm2b        %k0, %xmm0
        vzeroupper
        retq

As you can see, more instructions are generated for shufvXi1.ll.

spatel added inline comments. Nov 12 2021, 9:13 AM

llvm/test/Transforms/InstCombine/gep-inbounds-null.ll
We can't really avoid the transform based on target constraints: it's a canonicalization, so it's up to some later pass to invert it if necessary. I think the mul and maybe even the icmp are distractions from the real problem in these examples. There's a question of whether we should transform a splat of a bool to sext in IR:
https://alive2.llvm.org/ce/z/ra54Xe
But that doesn't appear to change codegen, so I think there needs to be a backend fix that does that transform (or the inverse?). Is there a bug report for this example? If not, can you file it? Thanks!

LuoYuanke added inline comments. Nov 13 2021, 6:37 PM

llvm/test/Transforms/InstCombine/gep-inbounds-null.ll
I filed a bug (https://bugs.llvm.org/show_bug.cgi?id=52500) in Bugzilla. Thanks for the suggestion.

Revision Contents

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (21 lines)
llvm/test/Transforms/InstCombine/gep-inbounds-null.ll (4 lines)
llvm/test/Transforms/InstCombine/getelementptr.ll (8 lines)
llvm/test/Transforms/InstCombine/icmp-vec.ll (22 lines)

Diff 241128

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

@@ ... @@ static Instruction *foldVectorCmp(CmpInst &Cmp,
   Type *V1Ty = V1->getType();
   if (match(RHS, m_ShuffleVector(m_Value(V2), m_Undef(), m_Specific(M))) &&
       V1Ty == V2->getType() && (LHS->hasOneUse() || RHS->hasOneUse())) {
     Value *NewCmp = IsFP ? Builder.CreateFCmp(Pred, V1, V2)
                          : Builder.CreateICmp(Pred, V1, V2);
     return new ShuffleVectorInst(NewCmp, UndefValue::get(NewCmp->getType()), M);
   }
 
+  // Try to canonicalize compare with splatted operand and splat constant.
+  // TODO: We could generalize this for more than splats. See/use the code in
+  //       InstCombiner::foldVectorBinop().
+  Constant *C;
+  if (!LHS->hasOneUse() || !match(RHS, m_Constant(C)))
+    return nullptr;
+
+  // Length-changing splats are ok, so adjust the constants as needed:
+  // cmp (shuffle V1, M), C --> shuffle (cmp V1, C'), M
+  Constant *ScalarC = C->getSplatValue(/* AllowUndefs */ true);
+  Constant *ScalarM = M->getSplatValue(/* AllowUndefs */ true);
+  if (ScalarC && ScalarM) {
+    // We allow undefs in matching, but this transform removes those for safety.
+    // Demanded elements analysis should be able to recover some/all of that.
+    C = ConstantVector::getSplat(V1Ty->getVectorNumElements(), ScalarC);
+    M = ConstantVector::getSplat(M->getType()->getVectorNumElements(), ScalarM);
+    Value *NewCmp = IsFP ? Builder.CreateFCmp(Pred, V1, C)
+                         : Builder.CreateICmp(Pred, V1, C);
+    return new ShuffleVectorInst(NewCmp, UndefValue::get(NewCmp->getType()), M);
+  }
+
   return nullptr;
 }
 
 // extract(uadd.with.overflow(A, B), 0) ult A
 // -> extract(uadd.with.overflow(A, B), 1)
 static Instruction *foldICmpOfUAddOv(ICmpInst &I) {
   CmpInst::Predicate Pred = I.getPredicate();
   Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
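The new code accepts a splat mask and a splat constant even when some lanes are undef, then rebuilds both without undefs and gives the constant V1's element count. Below is a simplified C++ model of that decision, exercised on the constants from the icmp-vec.ll tests. The names Lanes, splatValueAllowUndefs, Rewritten and canonicalizeSplatCmp are hypothetical. Undef lanes are modeled as std::nullopt, the LHS one-use check is omitted, and an all-undef vector is treated as "not a splat", which may differ from how LLVM's Constant::getSplatValue handles that case.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A vector constant whose lanes are either a concrete value or undef
// (modeled as std::nullopt).
template <typename T>
using Lanes = std::vector<std::optional<T>>;

// Simplified splat detection with undefs allowed: all defined lanes must hold
// the same value. Returns std::nullopt if two defined lanes differ, and also if
// every lane is undef (a simplification of this sketch).
template <typename T>
std::optional<T> splatValueAllowUndefs(const Lanes<T> &V) {
  std::optional<T> Splat;
  for (const std::optional<T> &Lane : V) {
    if (!Lane)
      continue; // an undef lane is compatible with any splat value
    if (Splat && *Splat != *Lane)
      return std::nullopt; // two different defined values: not a splat
    Splat = Lane;
  }
  return Splat;
}

// Operands of the rewritten form: shuffle (cmp V1, NarrowC), Mask
struct Rewritten {
  std::vector<int> NarrowC;      // splat constant with V1's element count
  std::vector<std::size_t> Mask; // undef-free splat mask, result width
};

// Hypothetical model of the patch's check: fire only if both the compare
// constant C and the shuffle mask M are splats (undef lanes tolerated), then
// rebuild both as undef-free splats. The LHS one-use check is not modeled.
std::optional<Rewritten> canonicalizeSplatCmp(std::size_t V1Len,
                                              const Lanes<int> &C,
                                              const Lanes<std::size_t> &M) {
  const std::optional<int> ScalarC = splatValueAllowUndefs(C);
  const std::optional<std::size_t> ScalarM = splatValueAllowUndefs(M);
  if (!ScalarC || !ScalarM)
    return std::nullopt;
  return Rewritten{std::vector<int>(V1Len, *ScalarC),
                   std::vector<std::size_t>(M.size(), *ScalarM)};
}
```

On the splat_icmp_undef inputs (mask <2, undef, undef, 2>, constant <undef, 42, undef, 42>), this model produces the undef-free mask <2, 2, 2, 2> and constant <42, 42, 42, 42> seen in the new CHECK lines, while the not_splat_icmp and not_splat_icmp2 inputs are rejected.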

llvm/test/Transforms/InstCombine/gep-inbounds-null.ll

@@ ... @@ entry:
   %cnd = icmp eq <2 x i8*> %gep, zeroinitializer
   ret <2 x i1> %cnd
 }
 
 define <2 x i1> @test_vector_index(i8* %base, <2 x i64> %idx) {
 ; CHECK-LABEL: @test_vector_index(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i8*> undef, i8* [[BASE:%.*]], i32 0
-; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i8*> [[DOTSPLATINSERT]], <2 x i8*> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[CND:%.*]] = icmp eq <2 x i8*> [[DOTSPLAT]], zeroinitializer
+; CHECK-NEXT: [[TMP0:%.*]] = icmp eq <2 x i8*> [[DOTSPLATINSERT]], zeroinitializer
+; CHECK-NEXT: [[CND:%.*]] = shufflevector <2 x i1> [[TMP0]], <2 x i1> undef, <2 x i32> zeroinitializer
 ; CHECK-NEXT: ret <2 x i1> [[CND]]
 ;
 entry:
   %gep = getelementptr inbounds i8, i8* %base, <2 x i64> %idx
   %cnd = icmp eq <2 x i8*> %gep, zeroinitializer
   ret <2 x i1> %cnd
 }

llvm/test/Transforms/InstCombine/getelementptr.ll

@@ ... @@ ;
   %C = icmp eq <2 x i32*> %A, %B
   ret <2 x i1> %C
 }
 
 define <2 x i1> @test13_vector2(i64 %X, <2 x %S*> %P) nounwind {
 ; CHECK-LABEL: @test13_vector2(
 ; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> undef, i64 [[X:%.*]], i32 0
 ; CHECK-NEXT: [[TMP1:%.*]] = shl <2 x i64> [[DOTSPLATINSERT]], <i64 2, i64 undef>
-; CHECK-NEXT: [[A_IDX:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[C:%.*]] = icmp eq <2 x i64> [[A_IDX]], <i64 -4, i64 -4>
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <2 x i64> [[TMP1]], <i64 -4, i64 -4>
+; CHECK-NEXT: [[C:%.*]] = shufflevector <2 x i1> [[TMP2]], <2 x i1> undef, <2 x i32> zeroinitializer
 ; CHECK-NEXT: ret <2 x i1> [[C]]
 ;
   %A = getelementptr inbounds %S, <2 x %S*> %P, <2 x i64> zeroinitializer, <2 x i32> <i32 1, i32 1>, i64 %X
   %B = getelementptr inbounds %S, <2 x %S*> %P, <2 x i64> <i64 0, i64 0>, <2 x i32> <i32 0, i32 0>
   %C = icmp eq <2 x i32*> %A, %B
   ret <2 x i1> %C
 }
 
 ; This is a test of icmp + shl nuw in disguise - 4611... is 0x3fff...
 define <2 x i1> @test13_vector3(i64 %X, <2 x %S*> %P) nounwind {
 ; CHECK-LABEL: @test13_vector3(
 ; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> undef, i64 [[X:%.*]], i32 0
 ; CHECK-NEXT: [[TMP1:%.*]] = shl <2 x i64> [[DOTSPLATINSERT]], <i64 2, i64 undef>
-; CHECK-NEXT: [[A_IDX:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[C:%.*]] = icmp eq <2 x i64> [[A_IDX]], <i64 4, i64 4>
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <2 x i64> [[TMP1]], <i64 4, i64 4>
+; CHECK-NEXT: [[C:%.*]] = shufflevector <2 x i1> [[TMP2]], <2 x i1> undef, <2 x i32> zeroinitializer
 ; CHECK-NEXT: ret <2 x i1> [[C]]
 ;
   %A = getelementptr inbounds %S, <2 x %S*> %P, <2 x i64> zeroinitializer, <2 x i32> <i32 1, i32 1>, i64 %X
   %B = getelementptr inbounds %S, <2 x %S*> %P, <2 x i64> <i64 0, i64 0>, <2 x i32> <i32 1, i32 1>, i64 1
   %C = icmp eq <2 x i32*> %A, %B
   ret <2 x i1> %C
 }

llvm/test/Transforms/InstCombine/icmp-vec.ll

@@ ... @@ ;
   %cmp = icmp eq <2 x i8> %shufx, %shufy
   call void @use_v2i8(<2 x i8> %shufx)
   call void @use_v2i8(<2 x i8> %shufy)
   ret <2 x i1> %cmp
 }
 
 define <4 x i1> @splat_icmp(<4 x i8> %x) {
 ; CHECK-LABEL: @splat_icmp(
-; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
-; CHECK-NEXT: [[CMP:%.*]] = icmp sgt <4 x i8> [[SPLATX]], <i8 42, i8 42, i8 42, i8 42>
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i8> [[X:%.*]], <i8 42, i8 42, i8 42, i8 42>
+; CHECK-NEXT: [[CMP:%.*]] = shufflevector <4 x i1> [[TMP1]], <4 x i1> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   %cmp = icmp sgt <4 x i8> %splatx, <i8 42, i8 42, i8 42, i8 42>
   ret <4 x i1> %cmp
 }
 
 define <4 x i1> @splat_icmp_undef(<4 x i8> %x) {
 ; CHECK-LABEL: @splat_icmp_undef(
-; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 2>
-; CHECK-NEXT: [[CMP:%.*]] = icmp ult <4 x i8> [[SPLATX]], <i8 undef, i8 42, i8 undef, i8 42>
+; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <4 x i8> [[X:%.*]], <i8 42, i8 42, i8 42, i8 42>
+; CHECK-NEXT: [[CMP:%.*]] = shufflevector <4 x i1> [[TMP1]], <4 x i1> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 2>
   %cmp = icmp ult <4 x i8> %splatx, <i8 undef, i8 42, i8 undef, i8 42>
   ret <4 x i1> %cmp
 }
 
 define <4 x i1> @splat_icmp_larger_size(<2 x i8> %x) {
 ; CHECK-LABEL: @splat_icmp_larger_size(
-; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <2 x i8> [[X:%.*]], <2 x i8> undef, <4 x i32> <i32 1, i32 undef, i32 1, i32 undef>
-; CHECK-NEXT: [[CMP:%.*]] = icmp eq <4 x i8> [[SPLATX]], <i8 42, i8 42, i8 undef, i8 42>
+; CHECK-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[X:%.*]], <i8 42, i8 42>
+; CHECK-NEXT: [[CMP:%.*]] = shufflevector <2 x i1> [[TMP1]], <2 x i1> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <2 x i8> %x, <2 x i8> undef, <4 x i32> <i32 1, i32 undef, i32 1, i32 undef>
   %cmp = icmp eq <4 x i8> %splatx, <i8 42, i8 42, i8 undef, i8 42>
   ret <4 x i1> %cmp
 }
 
 define <4 x i1> @splat_fcmp_smaller_size(<5 x float> %x) {
 ; CHECK-LABEL: @splat_fcmp_smaller_size(
-; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <5 x float> [[X:%.*]], <5 x float> undef, <4 x i32> <i32 1, i32 undef, i32 1, i32 undef>
-; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq <4 x float> [[SPLATX]], <float 4.200000e+01, float 4.200000e+01, float undef, float 4.200000e+01>
+; CHECK-NEXT: [[TMP1:%.*]] = fcmp oeq <5 x float> [[X:%.*]], <float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01>
+; CHECK-NEXT: [[CMP:%.*]] = shufflevector <5 x i1> [[TMP1]], <5 x i1> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <5 x float> %x, <5 x float> undef, <4 x i32> <i32 1, i32 undef, i32 1, i32 undef>
   %cmp = fcmp oeq <4 x float> %splatx, <float 42.0, float 42.0, float undef, float 42.0>
   ret <4 x i1> %cmp
 }
 
+; Negative test
+
 define <4 x i1> @splat_icmp_extra_use(<4 x i8> %x) {
 ; CHECK-LABEL: @splat_icmp_extra_use(
 ; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 ; CHECK-NEXT: call void @use_v4i8(<4 x i8> [[SPLATX]])
 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt <4 x i8> [[SPLATX]], <i8 42, i8 42, i8 42, i8 42>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   call void @use_v4i8(<4 x i8> %splatx)
   %cmp = icmp sgt <4 x i8> %splatx, <i8 42, i8 42, i8 42, i8 42>
   ret <4 x i1> %cmp
 }
 
+; Negative test
+
 define <4 x i1> @not_splat_icmp(<4 x i8> %x) {
 ; CHECK-LABEL: @not_splat_icmp(
 ; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> undef, <4 x i32> <i32 3, i32 2, i32 3, i32 3>
 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt <4 x i8> [[SPLATX]], <i8 42, i8 42, i8 42, i8 42>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 3, i32 2, i32 3, i32 3>
   %cmp = icmp sgt <4 x i8> %splatx, <i8 42, i8 42, i8 42, i8 42>
   ret <4 x i1> %cmp
 }
 
+; Negative test
+
 define <4 x i1> @not_splat_icmp2(<4 x i8> %x) {
 ; CHECK-LABEL: @not_splat_icmp2(
 ; CHECK-NEXT: [[SPLATX:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt <4 x i8> [[SPLATX]], <i8 43, i8 42, i8 42, i8 42>
 ; CHECK-NEXT: ret <4 x i1> [[CMP]]
 ;
   %splatx = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
   %cmp = icmp sgt <4 x i8> %splatx, <i8 43, i8 42, i8 42, i8 42>
   ret <4 x i1> %cmp
 }
