This is an archive of the discontinued LLVM Phabricator instance.

Differential D63390

[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
ClosedPublic

Authored by lebedev.ri on Jun 16 2019, 3:24 PM.

Details

Reviewers

RKSimon
craig.topper
xbolva00
spatel

Commits

rGcdd43eac4fe3: [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
rL364286: [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible

Summary

This addresses the regression exposed by D50222 in test/CodeGen/X86/jump_sign.ll.
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing against zero the result of a urem by a non-power-of-two constant,
and the value being divided may have at most a single bit set (or no bits set at all),
then the urem is not needed.
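
The reasoning behind the fold can also be checked by brute force outside of Alive. A value with at most one bit set is either 0 or a power of two, and a power of two is divisible only by other powers of two, so for a non-power-of-two C the remainder is zero exactly when %x is zero. The sketch below is not part of the patch; the helper names isPowerOf2 and foldHoldsForDivisorsBelow are made up for illustration, and it only covers 32-bit values and divisors below a chosen limit.

```cpp
#include <cassert>
#include <cstdint>

// True if v has exactly one bit set.
constexpr bool isPowerOf2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Hypothetical helper: for every non-power-of-two divisor c in [1, limit)
// and every 32-bit x with at most one bit set, check that
//   (x % c == 0)  is equivalent to  (x == 0).
bool foldHoldsForDivisorsBelow(uint32_t limit) {
  assert(limit >= 1 && "divisor range must not be empty of meaning");
  for (uint32_t c = 1; c < limit; ++c) {
    if (isPowerOf2(c))
      continue; // the fold explicitly excludes power-of-two divisors
    // x with no bits set: remainder is 0 and x is 0, both sides agree.
    if ((0u % c == 0) != true)
      return false;
    // x with exactly one bit set: x is nonzero, so remainder must be nonzero.
    for (unsigned k = 0; k < 32; ++k) {
      uint32_t x = uint32_t{1} << k;
      if (x % c == 0)
        return false;
    }
  }
  return true;
}
```

The power-of-two exclusion matters: with C = 4 and %x = 8, the remainder is zero although %x is not, so dropping the urem there would change the result.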

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Jun 16 2019, 3:24 PM

lebedev.ri mentioned this in D63391: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2).

lebedev.ri added a parent revision: D63391: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2).

lebedev.ri mentioned this in D50222: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case).

Thanks, you fixed it!

Can you add some new explicit tests for this fold?

Diffusion mentioned this in rL363537: [NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x…. Jun 17 2019, 2:47 AM

lebedev.ri mentioned this in rG25a043e78a9c: [NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x…. Jun 17 2019, 2:48 AM

+Standalone test coverage

Looks fine.

Thanks for alive proof as well!

This revision is now accepted and ready to land. Jun 17 2019, 3:22 AM

lebedev.ri added a reviewer: spatel. Jun 17 2019, 3:24 AM

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Thank you for the reviews.

In D63390#1546676, @spatel wrote:

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Indeed, this fold is missed in the middle-end too, but the regression at hand is in a back-end test,
so I'm not sure if we should just hand-wave and only fix it in the middle-end.
Also, where should the middle-end fix be? Again InstCombine?

I'm also not sure if this is the best approach: the fold doesn't care *where* those bits are,
it only cares about the *count* of ones, and is thus easily defeated by shift operations,
as can be seen from the tests.
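
A rough way to see why shifts get in the way: the fold asks the known-bits analysis for an upper bound on the number of set bits, and that bound comes from how many bit positions are not known to be zero. The toy model below is only a sketch under that assumption. ToyKnownBits and both helper functions are hypothetical stand-ins, not LLVM's KnownBits or DAG.computeKnownBits(), and the real analysis may be more precise in some cases.

```cpp
#include <bitset>
#include <cstdint>

// Toy stand-in for a known-bits result on a 32-bit value (not LLVM's class).
struct ToyKnownBits {
  uint32_t Zero = 0; // bit positions known to be 0
  uint32_t One = 0;  // bit positions known to be 1

  // Upper bound on the population count: any bit not known zero may be set.
  unsigned countMaxPopulation() const {
    return static_cast<unsigned>(std::bitset<32>(~Zero).count());
  }
};

// Known bits of (x & Mask) for a completely unknown x:
// every bit outside Mask is known zero.
ToyKnownBits knownBitsOfAndWithConstant(uint32_t Mask) {
  ToyKnownBits K;
  K.Zero = ~Mask;
  return K;
}

// Known bits of (v << amt) for an unknown shift amount: a left shift can
// only add trailing zeros, so just the known trailing zeros of v survive.
ToyKnownBits knownBitsOfShlByUnknown(const ToyKnownBits &V) {
  unsigned TrailingZeros = 0;
  while (TrailingZeros < 32 && ((V.Zero >> TrailingZeros) & 1u))
    ++TrailingZeros;
  ToyKnownBits K;
  K.Zero = TrailingZeros == 32 ? ~uint32_t{0}
                               : ((uint32_t{1} << TrailingZeros) - 1);
  return K;
}
```

In this model, x & 4 has a maximum population of 1, so the fold would apply. (x & 1) << y still has at most one bit set at run time, but since the position of that bit is unknown, no bit is known zero and the bound degrades to 32.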

In D63390#1546687, @lebedev.ri wrote:

Indeed, this fold is missed in the middle-end too, but the regression at hand is in a back-end test,
so I'm not sure if we should just hand-wave and only fix it in the middle-end.
Also, where should the middle-end fix be? Again InstCombine?

Yep, yet another clause under InstCombiner::visitICmpInst().

I'm also not sure if this is the best approach: the fold doesn't care *where* those bits are,
it only cares about the *count* of ones, and is thus easily defeated by shift operations,
as can be seen from the tests.

True, although without some real-world evidence that this matters, I'd say "good enough". :)

Closed by commit rGcdd43eac4fe3: [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible (authored by lebedev.ri). Jun 25 2019, 3:05 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: hiraditya. Jun 25 2019, 3:05 AM

Diffusion mentioned this in rL364563: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2). Jun 27 2019, 9:45 AM

lebedev.ri mentioned this in rG0627b09863b8: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2). Jun 27 2019, 9:47 AM

Diffusion mentioned this in rL364600: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3). Jun 27 2019, 2:52 PM

lebedev.ri mentioned this in rG29d05c005fa8: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3). Jun 27 2019, 2:53 PM

Diffusion mentioned this in rL364737: [NFC][InstCombine] Copy test for omit urem when possible from TargetLowering. Jul 1 2019, 2:41 AM

Diffusion mentioned this in rL364738: [InstCombine] Omit 'urem' where possible.

lebedev.ri mentioned this in rG0f82f64c8326: [NFC][InstCombine] Copy test for omit urem when possible from TargetLowering. Jul 1 2019, 2:42 AM

lebedev.ri mentioned this in rGf55818e3a720: [InstCombine] Omit 'urem' where possible.

Revision Contents

Path                                          Size
lib/CodeGen/SelectionDAG/TargetLowering.cpp   17 lines
test/CodeGen/X86/jump_sign.ll                 7 lines

Diff 204968

lib/CodeGen/SelectionDAG/TargetLowering.cpp

@@ if (N0.getOpcode() == ISD::ZERO_EXTEND) {
           if (Op0.getOpcode() == ISD::AssertZext &&
               cast<VTSDNode>(Op0.getOperand(1))->getVT() == MVT::i1)
             return DAG.getSetCC(dl, VT, Op0,
                                 DAG.getConstant(0, dl, Op0.getValueType()),
                                 Cond == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ);
         }
       }
 
+    // Given:
+    //   icmp eq/ne (urem %x, C), 0
+    // Iff C is not a power of two (those should not get to here though),
+    // and %x may have at most one bit set, omit the 'urem':
+    //   icmp eq/ne %x, 0
+    if (N0.getOpcode() == ISD::UREM && N1C->isNullValue() &&
+        (Cond == ISD::SETEQ || Cond == ISD::SETNE)) {
+      if (auto *N01C = dyn_cast<ConstantSDNode>(N0.getOperand(1).getNode())) {
+        // We shouldn't have 'urem %x, power-of-2' by now, but just to be sure.
+        if (!N01C->getAPIntValue().isPowerOf2()) {
+          KnownBits Known = DAG.computeKnownBits(N0.getOperand(0));
+          if (Known.countMaxPopulation() == 1)
+            return DAG.getSetCC(dl, VT, N0.getOperand(0), N1, Cond);
+        }
+      }
+    }
+
     if (SDValue V =
             optimizeSetCCOfSignedTruncationCheck(VT, N0, N1, Cond, DCI, dl))
       return V;
   }
 
   // These simplifications apply to splat vectors as well.
   // TODO: Handle more splat vector cases.
   if (auto *N1C = isConstOrConstSplat(N1)) {

test/CodeGen/X86/jump_sign.ll

 ; PR13966
 @b = common global i32 0, align 4
 @a = common global i32 0, align 4
 define i32 @func_test1(i32 %p1) nounwind uwtable {
 ; CHECK-LABEL: func_test1:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: movl b, %eax
-; CHECK-NEXT: xorl %ecx, %ecx
 ; CHECK-NEXT: cmpl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT: setb %cl
 ; CHECK-NEXT: movl a, %eax
-; CHECK-NEXT: andl %eax, %ecx
-; CHECK-NEXT: imull $-85, %ecx, %ecx
-; CHECK-NEXT: cmpb $86, %cl
-; CHECK-NEXT: jb .LBB18_2
+; CHECK-NEXT: testb %al, %cl
+; CHECK-NEXT: je .LBB18_2
 ; CHECK-NEXT: # %bb.1: # %if.then
 ; CHECK-NEXT: decl %eax
 ; CHECK-NEXT: movl %eax, a
 ; CHECK-NEXT: .LBB18_2: # %if.end
 ; CHECK-NEXT: retl
 entry:
 %t0 = load i32, i32* @b, align 4
 %cmp = icmp ult i32 %t0, %p1