This is an archive of the discontinued LLVM Phabricator instance.


Differential D63390

[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
ClosedPublic

Authored by lebedev.ri on Jun 16 2019, 3:24 PM.


Details

Reviewers

RKSimon
craig.topper
xbolva00
spatel

Commits

rGcdd43eac4fe3: [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
rL364286: [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible

Summary

This addresses the regression exposed by D50222 in test/CodeGen/X86/jump_sign.ll.
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
That is, when a urem is compared with zero (eq/ne), the divisor is known to have at least two bits set
(so it is neither zero nor a power of two), and the dividend may have at most a single bit set (or no bits set at all),
the urem is not needed: the dividend itself can be compared with zero.
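
The claim can also be sanity-checked by brute force at a small bit width. The following is a minimal sketch, not part of the patch: it assumes 8-bit unsigned values, and `foldHolds8` is a hypothetical helper name introduced only for this illustration. The Alive link above remains the actual proof.

```cpp
#include <bitset>

// Hypothetical helper (not from the patch): exhaustively checks, for 8-bit
// values, that (x urem y) == 0 is equivalent to x == 0 whenever x has at most
// one bit set and y has at least two bits set.
bool foldHolds8() {
  for (unsigned x = 0; x < 256; ++x) {
    if (std::bitset<8>(x).count() > 1)
      continue; // x must be zero or a power of two
    for (unsigned y = 0; y < 256; ++y) {
      if (std::bitset<8>(y).count() < 2)
        continue; // y must be neither zero nor a power of two
      bool withUrem = (x % y) == 0; // original comparison
      bool withoutUrem = (x == 0);  // comparison after the fold
      if (withUrem != withoutUrem)
        return false;
    }
  }
  return true;
}
```

The check passes because a divisor with two or more set bits can never evenly divide a power of two, while zero is divisible by every nonzero divisor.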

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Jun 16 2019, 3:24 PM

lebedev.ri mentioned this in D63391: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2).

lebedev.ri added a parent revision: D63391: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2).

lebedev.ri mentioned this in D50222: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case).

Thanks, you fixed it!

Can you add some new explicit tests for this fold?

Diffusion mentioned this in rL363537: [NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x…. Jun 17 2019, 2:47 AM

lebedev.ri mentioned this in rG25a043e78a9c: [NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x…. Jun 17 2019, 2:48 AM

+Standalone test coverage

Looks fine.

Thanks for alive proof as well!

This revision is now accepted and ready to land. Jun 17 2019, 3:22 AM

lebedev.ri added a reviewer: spatel. Jun 17 2019, 3:24 AM

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Thank you for the reviews.

In D63390#1546676, @spatel wrote:

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Indeed, this fold is missed in the middle-end too, but the regression at hand is in a back-end test,
so I'm not sure we should just hand-wave it and only fix it in the middle-end.
Also, where should the middle-end fix be? InstCombine again?

I'm also not sure this is the best approach: the fold doesn't care *where* those bits are,
it only cares about the *count* of ones, and is thus easily defeated by shift operations,
as can be seen from the tests.

In D63390#1546687, @lebedev.ri wrote:

Thank you for the reviews.

In D63390#1546676, @spatel wrote:

LGTM. I have no idea if it's worth the trade-off, but we could do this sooner in IR (instcombine) instead of or in addition to SDAG?

Indeed, this fold is missed in the middle-end too, but the regression at hand is in a back-end test,
so I'm not sure we should just hand-wave it and only fix it in the middle-end.
Also, where should the middle-end fix be? InstCombine again?

Yep, yet another clause under InstCombiner::visitICmpInst().

I'm also not sure this is the best approach: the fold doesn't care *where* those bits are,
it only cares about the *count* of ones, and is thus easily defeated by shift operations,
as can be seen from the tests.

True, although without some real-world evidence that this matters, I'd say "good enough". :)
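
To make the limitation discussed above concrete, here is a small sketch of how a population-count bound derived from known bits behaves. It is not LLVM code: `ToyKnownBits` and the three `known...` helpers are hypothetical names for a simplified 32-bit model, and the shift case assumes that nothing at all is known about the shift amount. Only the final condition mirrors the check in the patch.

```cpp
#include <bitset>
#include <cstdint>

// Toy 32-bit model of known-bits tracking. All names here are hypothetical;
// only the idea (a known-zero mask plus a known-one mask) mirrors what the
// patch queries through countMaxPopulation()/countMinPopulation().
struct ToyKnownBits {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1

  // Upper bound on set bits: every bit not known to be zero might be set.
  unsigned countMaxPopulation() const {
    return 32 - static_cast<unsigned>(std::bitset<32>(Zero).count());
  }
  // Lower bound on set bits: every bit known to be one is set.
  unsigned countMinPopulation() const {
    return static_cast<unsigned>(std::bitset<32>(One).count());
  }
};

// `and %x, C` with nothing known about %x: bits clear in C are known zero.
ToyKnownBits knownAndWithConstant(uint32_t C) { return {~C, 0}; }

// `or %y, C` with nothing known about %y: bits set in C are known one.
ToyKnownBits knownOrWithConstant(uint32_t C) { return {0, C}; }

// `shl 1, %amt` with an unknown %amt: the single set bit may land in any
// position, so in this model no individual bit is known either way.
ToyKnownBits knownShlOneByUnknown() { return {0, 0}; }

// The condition the patch tests before dropping the urem.
bool canOmitUrem(const ToyKnownBits &X, const ToyKnownBits &Y) {
  return X.countMaxPopulation() == 1 && Y.countMinPopulation() >= 2;
}
```

With the operands from the standalone tests, `canOmitUrem(knownAndWithConstant(128), knownOrWithConstant(6))` is true. Replacing the dividend with `knownShlOneByUnknown()` makes it false, even though such a value still has at most one bit set: per-bit masks cannot express "one bit, position unknown", which is the sense in which shifts defeat the fold.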

Closed by commit rGcdd43eac4fe3: [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible (authored by lebedev.ri). Jun 25 2019, 3:05 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: hiraditya. Jun 25 2019, 3:05 AM

Diffusion mentioned this in rL364563: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2). Jun 27 2019, 9:45 AM

lebedev.ri mentioned this in rG0627b09863b8: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2). Jun 27 2019, 9:47 AM

Diffusion mentioned this in rL364600: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3). Jun 27 2019, 2:52 PM

lebedev.ri mentioned this in rG29d05c005fa8: [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3). Jun 27 2019, 2:53 PM

Diffusion mentioned this in rL364737: [NFC][InstCombine] Copy test for omit urem when possible from TargetLowering. Jul 1 2019, 2:41 AM

Diffusion mentioned this in rL364738: [InstCombine] Omit 'urem' where possible.

lebedev.ri mentioned this in rG0f82f64c8326: [NFC][InstCombine] Copy test for omit urem when possible from TargetLowering. Jul 1 2019, 2:42 AM

lebedev.ri mentioned this in rGf55818e3a720: [InstCombine] Omit 'urem' where possible.

Revision Contents

Path                                                                             Size
lib/CodeGen/SelectionDAG/TargetLowering.cpp                                      12 lines
test/CodeGen/X86/jump_sign.ll                                                    7 lines
test/CodeGen/X86/omit-urem-of-power-of-two-or-zero-when-comparing-with-zero.ll   14 lines

Diff 205010

lib/CodeGen/SelectionDAG/TargetLowering.cpp

[3,019 unchanged lines omitted; enclosing context: if (N0.getOpcode() == ISD::ZERO_EXTEND) {]
           if (Op0.getOpcode() == ISD::AssertZext &&
               cast<VTSDNode>(Op0.getOperand(1))->getVT() == MVT::i1)
             return DAG.getSetCC(dl, VT, Op0,
                                 DAG.getConstant(0, dl, Op0.getValueType()),
                                 Cond == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ);
         }
       }
 
+      // Given:
+      //   icmp eq/ne (urem %x, %y), 0
+      // Iff %x has 0 or 1 bits set, and %y has at least 2 bits set, omit 'urem':
+      //   icmp eq/ne %x, 0
+      if (N0.getOpcode() == ISD::UREM && N1C->isNullValue() &&
+          (Cond == ISD::SETEQ || Cond == ISD::SETNE)) {
+        KnownBits XKnown = DAG.computeKnownBits(N0.getOperand(0));
+        KnownBits YKnown = DAG.computeKnownBits(N0.getOperand(1));
+        if (XKnown.countMaxPopulation() == 1 && YKnown.countMinPopulation() >= 2)
+          return DAG.getSetCC(dl, VT, N0.getOperand(0), N1, Cond);
+      }
+
       if (SDValue V =
               optimizeSetCCOfSignedTruncationCheck(VT, N0, N1, Cond, DCI, dl))
         return V;
     }
 
     // These simplifications apply to splat vectors as well.
     // TODO: Handle more splat vector cases.
     if (auto *N1C = isConstOrConstSplat(N1)) {
[3,176 unchanged lines omitted]

test/CodeGen/X86/jump_sign.ll

[391 unchanged lines omitted]

 ; PR13966
 @b = common global i32 0, align 4
 @a = common global i32 0, align 4
 define i32 @func_test1(i32 %p1) nounwind uwtable {
 ; CHECK-LABEL: func_test1:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: movl b, %eax
-; CHECK-NEXT: xorl %ecx, %ecx
 ; CHECK-NEXT: cmpl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT: setb %cl
 ; CHECK-NEXT: movl a, %eax
-; CHECK-NEXT: andl %eax, %ecx
-; CHECK-NEXT: imull $-85, %ecx, %ecx
-; CHECK-NEXT: cmpb $86, %cl
-; CHECK-NEXT: jb .LBB18_2
+; CHECK-NEXT: testb %al, %cl
+; CHECK-NEXT: je .LBB18_2
 ; CHECK-NEXT: # %bb.1: # %if.then
 ; CHECK-NEXT: decl %eax
 ; CHECK-NEXT: movl %eax, a
 ; CHECK-NEXT: .LBB18_2: # %if.end
 ; CHECK-NEXT: retl
 entry:
   %t0 = load i32, i32* @b, align 4
   %cmp = icmp ult i32 %t0, %p1
[19 unchanged lines omitted]

test/CodeGen/X86/omit-urem-of-power-of-two-or-zero-when-comparing-with-zero.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse,+sse2,+avx,+avx2 | FileCheck %s
 
 ; Given:
 ;   icmp eq/ne (urem %x, C), 0
 ; Iff C is not a power of two (those should not get to here though),
 ; and %x may have at most one bit set, omit the 'urem':
 ;   icmp eq/ne %x, 0
 
 ;------------------------------------------------------------------------------;
 ; Basic scalar tests
 ;------------------------------------------------------------------------------;
 
 define i1 @p0_scalar_urem_by_const(i32 %x, i32 %y) {
 ; CHECK-LABEL: p0_scalar_urem_by_const:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: andl $128, %edi
-; CHECK-NEXT: imull $-1431655765, %edi, %eax # imm = 0xAAAAAAAB
-; CHECK-NEXT: rorl %eax
-; CHECK-NEXT: cmpl $1431655766, %eax # imm = 0x55555556
-; CHECK-NEXT: setb %al
+; CHECK-NEXT: testb $-128, %dil
+; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %t0 = and i32 %x, 128 ; clearly a power-of-two or zero
   %t1 = urem i32 %t0, 6 ; '6' is clearly not a power of two
   %t2 = icmp eq i32 %t1, 0
   ret i1 %t2
 }
 
 define i1 @p1_scalar_urem_by_nonconst(i32 %x, i32 %y) {
 ; CHECK-LABEL: p1_scalar_urem_by_nonconst:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: movl %edi, %eax
-; CHECK-NEXT: andl $128, %eax
-; CHECK-NEXT: orl $6, %esi
-; CHECK-NEXT: xorl %edx, %edx
-; CHECK-NEXT: divl %esi
-; CHECK-NEXT: testl %edx, %edx
+; CHECK-NEXT: testb $-128, %dil
 ; CHECK-NEXT: sete %al
 ; CHECK-NEXT: retq
   %t0 = and i32 %x, 128 ; clearly a power-of-two or zero
   %t1 = or i32 %y, 6 ; two bits set, clearly not a power of two
   %t2 = urem i32 %t0, %t1
   %t3 = icmp eq i32 %t2, 0
   ret i1 %t3
 }
[195 unchanged lines omitted]