Details

Reviewers

RKSimon
spatel
lebedev.ri
xbolva00

Commits

rGb8d6ba3ca203: [X86] Override BuildSDIVPow2 for X86.
rL371104: [X86] Override BuildSDIVPow2 for X86.

Summary

As noted in PR43197, we can use test+add+cmov+sra to implement
signed division by a power of 2.

This is based off the similar version in AArch64, but I've
adjusted it to use target independent nodes where AArch64 uses
target specific CMP and CSEL nodes. I've also blocked INT_MIN
as the transform isn't valid for that.
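
To make the test+add+cmov+sra idea concrete, here is a minimal scalar sketch of the arithmetic for i32. It is not the LLVM code: `ashr32` and `sdivPow2` are hypothetical helper names, and `lg2` is assumed to be in the range 1..31. The work is done on unsigned bit patterns so that the add wraps like the machine instruction.

```cpp
#include <cstdint>

// Arithmetic shift right on a 32-bit pattern (models an SRA), for 1 <= s <= 31.
inline uint32_t ashr32(uint32_t v, unsigned s) {
  uint32_t logical = v >> s;
  uint32_t signFill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> s) : 0u;
  return logical | signFill;
}

// Hypothetical helper: x / (+/- 2^lg2) with truncation toward zero.
// negativeDivisor selects the divisor -(2^lg2).
inline int32_t sdivPow2(int32_t x, unsigned lg2, bool negativeDivisor) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t pow2MinusOne = (1u << lg2) - 1u;
  uint32_t biased = ux + pow2MinusOne;      // add
  uint32_t sel = (x < 0) ? biased : ux;     // test + cmov
  uint32_t q = ashr32(sel, lg2);            // sra
  if (negativeDivisor)
    q = 0u - q;                             // neg, only for negative divisors
  // Assumes the usual two's-complement conversion (guaranteed since C++20).
  return static_cast<int32_t>(q);
}
```

Adding 2^lg2 - 1 to a negative dividend before the shift turns the shift's round-toward-negative-infinity into the round-toward-zero that sdiv requires.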

I've limited this to i32 and i64 on 64-bit targets for now and only
when CMOV is supported. i8 and i16 need further investigation to be
sure they get promoted to i32 well.

I adjusted a few tests to enable cmov to demonstrate the new
codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode
without cmov to avoid perturbing the scenario that is being
set up there.

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Sep 2 2019, 3:58 PM

Herald added a project: Restricted Project. Sep 2 2019, 3:58 PM

Herald added subscribers: hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B37643: Diff 218396. Sep 2 2019, 4:00 PM

llvm/test/CodeGen/X86/pr32588.ll

19 ↗

(On Diff #218396)

Maybe not a big deal?

Middle-end can reduce this IR to
define void @fn1() local_unnamed_addr #0 {

%t0 = load i32, i32* @c, align 4
%tobool1 = icmp eq i32 %t0, 0
%xor = zext i1 %tobool1 to i32
store i32 %xor, i32* @d, align 4
ret void

}

and then llc..
fn1: # @fn1

xor     eax, eax
cmp     dword ptr [rip + c], 0
sete    al
mov     dword ptr [rip + d], eax
ret

We should add the minimal/motivating test from PR43197, so we cover that exact case, and we should have at least 1 test with a negative divisor constant rather than relying on srem transforms to get that case.

llvm/lib/Target/X86/X86ISelLowering.cpp
20109–20112 ↗ (On Diff #218396)

That's a complicated string of logic. It would be easier to read if it was split into 2-3 clauses:

- Check target capability
- Check type
- Check divisor value

20112 ↗ (On Diff #218396)

The min-signed check is not necessary?
https://rise4fun.com/Alive/Ki68

Name: sdiv X, {+/-} power-of-2-constant
Pre: isPowerOf2(C1) || isPowerOf2(-C1)
%r = sdiv i32 %x, C1
=>
%x_is_negative = icmp slt i32 %x, 0
%x_with_offset = add i32 %x, ((1 << countTrailingZeros(C1)) - 1)
%normx = select i1 %x_is_negative, i32 %x_with_offset, i32 %x
%a = ashr i32 %normx, countTrailingZeros(C1)
%neg = sub i32 0, %a
%divisor_is_negative = icmp slt i32 C1, 0 ; constant fold
%r = select i1 %divisor_is_negative, i32 %neg, %a

20121 ↗ (On Diff #218396)

This comment doesn't match what we're creating - the 'add' is the true operand of the select:
// (N0 < 0) ? (N0 + Pow2M1) : N0

Address review comments

With context

Do we already have test coverage for vector div-by-power-of-2?

Can you please add a test for

sdiv X, 2 ?

(gcc somehow avoids cmov in this case)

In D67087#1656269, @lebedev.ri wrote:

Do we already have test coverage for vector div-by-power-of-2?

There's some in vec_sdiv_to_shift.ll and vector-idiv-v2i32.ll and combine-sdiv.ll

Limit the transform to not apply to division by 2 or -2. Add more test cases
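
As a rough illustration of why the divisor 2 can be left to the default expansion: the bias is 1, which is exactly the sign bit of the dividend, so a logical shift by 31 produces it without any compare or cmov. The sketch below assumes i32 and uses a hypothetical helper name, `sdivBy2`; it is not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: x / 2 with truncation toward zero, using shr + add + sar.
inline int32_t sdivBy2(int32_t x) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t biased = ux + (ux >> 31);  // shr 31 gives 1 only when x is negative
  // Arithmetic shift right by 1: keep the sign bit while shifting.
  uint32_t q = (biased >> 1) | (biased & 0x80000000u);
  // Assumes the usual two's-complement conversion (guaranteed since C++20).
  return static_cast<int32_t>(q);
}
```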

i8 and i16

gcc does this for char/long too. Maybe you can try lifting the restrictions and see how it looks?
(If we are missing test cases like sdiv i8 %x, 4 or sdiv i16 %x, 4, plus the srem variants, please add them. I was unable to find them in X86_combine-sdiv / srem.ll / rem.ll.)

Playing with chars, I think there is a way to improve -Oz codesize for this case too.

foo(char): # @foo(char)

mov     eax, edi
mov     cl, 4
cbw
idiv    cl
ret

gcc with -Os
foo(char):

mov     dl, 4
movsx   ax, dil
idiv    dl
ret

with short, -Oz
foo(long): # @foo(long)

mov     rax, rdi
push    4
pop     rcx
cqo
idiv    rcx
ret

Thinking about this sequence in relation to some other folds: if the add was pulled through the select operands, we'd have a select-of-constants where the false op is a zero, and that would get reduced to a 'shift+and' and defeat the goal of this patch. So if cmov is the better option for this case, it's possible that we are doing the wrong transform in some other cases.
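
The equivalence described above can be sketched for i32 as three ways of computing the biased dividend that feeds the final arithmetic shift. All function names are hypothetical and `lg2` is assumed to be in 1..31. The first form is the select built by this patch, the second pulls the add through the select so it becomes a select-of-constants (which reduces to a sign mask and an 'and'), and the third is the sar/shr sequence visible on the left-hand side of the test diffs.

```cpp
#include <cstdint>

// Arithmetic shift right on a 32-bit pattern, for 1 <= s <= 31.
inline uint32_t ashr32(uint32_t v, unsigned s) {
  uint32_t signFill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> s) : 0u;
  return (v >> s) | signFill;
}

// Form 1: select between (x + bias) and x, i.e. test + add + cmov.
inline uint32_t biasViaSelect(int32_t x, unsigned lg2) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t bias = (1u << lg2) - 1u;
  return (x < 0) ? ux + bias : ux;
}

// Form 2: x + select(x < 0, bias, 0), with the select folded to mask + and.
inline uint32_t biasViaMask(int32_t x, unsigned lg2) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t bias = (1u << lg2) - 1u;
  uint32_t signMask = ashr32(ux, 31);  // all ones when x is negative
  return ux + (signMask & bias);
}

// Form 3: sar 31, shr (32 - lg2), add.
inline uint32_t biasViaShifts(int32_t x, unsigned lg2) {
  uint32_t ux = static_cast<uint32_t>(x);
  return ux + (ashr32(ux, 31) >> (32u - lg2));
}
```

All three produce the same bit pattern, so which one is preferable is purely a codegen question for the target.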

Another consideration: if we want to avoid having the 'select' getting squashed by some other transform, then using the x86-specific opcode should ensure that (that might be why AArch chose to go with target-specific opcodes?). Having the tests here (and as usual, we should have the baseline versions committed 1st) should signal if someone changes the behavior, but it might be worth leaving a TODO comment.

llvm/lib/Target/X86/X86ISelLowering.cpp
20110 ↗ (On Diff #218527)

Does the caller always check this anyway? If so, could just make it an assert.

Diffusion mentioned this in rL370937: [X86] Pre-commit test cases and test run line changes for D67087. Sep 4 2019, 10:33 AM

craig.topper mentioned this in rGf0081dac81b3: [X86] Pre-commit test cases and test run line changes for D67087. Sep 4 2019, 10:33 AM

Rebase with tests pre-committed. I've also added i8/i16/i64 tests.

spatel added inline comments. Sep 5 2019, 10:22 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
20110 ↗ (On Diff #218527)

This question wasn't answered. The only caller is DAGCombiner::visitSDIVLike()?

Change divisor check to an assert.

LGTM

This revision is now accepted and ready to land. Sep 5 2019, 10:50 AM

Closed by commit rL371104: [X86] Override BuildSDIVPow2 for X86. (authored by ctopper). Sep 5 2019, 11:14 AM

This revision was automatically updated to reflect the committed changes.

I've also blocked INT_MIN as the transform isn't valid for that.

The commit message says this, but I don't see any corresponding code?

In D67087#1659983, @efriedma wrote:

I've also blocked INT_MIN as the transform isn't valid for that.

The commit message says this, but I don't see any corresponding code?

Oops, that was a mistake in my use of Alive. Sanjay pointed out it was fine, but I forgot to update the description.
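
For readers who want to check the INT_MIN case by hand, the sketch below specializes the same select+add+sra+neg sequence to an i32 divisor of INT_MIN, where the shift amount is 31 and the bias is 0x7FFFFFFF. The helper name `sdivByIntMin` is hypothetical, and the arithmetic is done on unsigned bit patterns so it wraps like the hardware.

```cpp
#include <cstdint>

// Hypothetical helper: x / INT32_MIN via select + add + sra 31 + neg.
inline int32_t sdivByIntMin(int32_t x) {
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t sel = (x < 0) ? ux + 0x7FFFFFFFu : ux;        // bias is 2^31 - 1
  uint32_t q = (sel & 0x80000000u) ? 0xFFFFFFFFu : 0u;   // sra by 31
  // Negate because the divisor is negative; assumes two's-complement conversion.
  return static_cast<int32_t>(0u - q);
}
```

Only x == INT_MIN keeps its sign bit after the bias is added, so the result is 1 for that input and 0 for every other dividend, which matches sdiv.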

Diff 218957

llvm/trunk/lib/Target/X86/X86ISelLowering.h

SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
bool Reciprocal) const override;		bool Reciprocal) const override;

/// Use rcp* to speed up fdiv calculations.		/// Use rcp* to speed up fdiv calculations.
SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,		SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
int &RefinementSteps) const override;		int &RefinementSteps) const override;

/// Reassociate floating point divisions into multiply by reciprocal.		/// Reassociate floating point divisions into multiply by reciprocal.
unsigned combineRepeatedFPDivisors() const override;		unsigned combineRepeatedFPDivisors() const override;

		SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
		SmallVectorImpl<SDNode *> &Created) const override;
};		};

namespace X86 {		namespace X86 {
FastISel *createFastISel(FunctionLoweringInfo &funcInfo,		FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
const TargetLibraryInfo *libInfo);		const TargetLibraryInfo *libInfo);
} // end namespace X86		} // end namespace X86

// Base class for all X86 non-masked store operations.		// Base class for all X86 non-masked store operations.

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

	/// CPU if a division's cost is not at least twice the cost of a multiplication.			/// CPU if a division's cost is not at least twice the cost of a multiplication.
	/// This is because we still need one division to calculate the reciprocal and			/// This is because we still need one division to calculate the reciprocal and
	/// then we need two multiplies by that reciprocal as replacements for the			/// then we need two multiplies by that reciprocal as replacements for the
	/// original divisions.			/// original divisions.
	unsigned X86TargetLowering::combineRepeatedFPDivisors() const {			unsigned X86TargetLowering::combineRepeatedFPDivisors() const {
	return 2;			return 2;
	}			}

SDValue
X86TargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,
                                 SelectionDAG &DAG,
                                 SmallVectorImpl<SDNode *> &Created) const {
  AttributeList Attr = DAG.getMachineFunction().getFunction().getAttributes();
  if (isIntDivCheap(N->getValueType(0), Attr))
    return SDValue(N,0); // Lower SDIV as SDIV

  assert((Divisor.isPowerOf2() || (-Divisor).isPowerOf2()) &&
         "Unexpected divisor!");

  // Only perform this transform if CMOV is supported otherwise the select
  // below will become a branch.
  if (!Subtarget.hasCMov())
    return SDValue();

  // fold (sdiv X, pow2)
  EVT VT = N->getValueType(0);
  // FIXME: Support i8/i16.
  if ((VT != MVT::i32 && !(Subtarget.is64Bit() && VT == MVT::i64)))
    return SDValue();

  unsigned Lg2 = Divisor.countTrailingZeros();

  // If the divisor is 2 or -2, the default expansion is better.
  if (Lg2 == 1)
    return SDValue();

  SDLoc DL(N);
  SDValue N0 = N->getOperand(0);
  SDValue Zero = DAG.getConstant(0, DL, VT);
  SDValue Pow2MinusOne = DAG.getConstant((1ULL << Lg2) - 1, DL, VT);

  // If N0 is negative, we need to add (Pow2 - 1) to it before shifting right.
  SDValue Cmp = DAG.getSetCC(DL, MVT::i8, N0, Zero, ISD::SETLT);
  SDValue Add = DAG.getNode(ISD::ADD, DL, VT, N0, Pow2MinusOne);
  SDValue CMov = DAG.getNode(ISD::SELECT, DL, VT, Cmp, Add, N0);

  Created.push_back(Cmp.getNode());
  Created.push_back(Add.getNode());
  Created.push_back(CMov.getNode());

  // Divide by pow2.
  SDValue SRA =
      DAG.getNode(ISD::SRA, DL, VT, CMov, DAG.getConstant(Lg2, DL, MVT::i64));

  // If we're dividing by a positive value, we're done. Otherwise, we must
  // negate the result.
  if (Divisor.isNonNegative())
    return SRA;

  Created.push_back(SRA.getNode());
  return DAG.getNode(ISD::SUB, DL, VT, Zero, SRA);
}

	/// Result of 'and' is compared against zero. Change to a BT node if possible.			/// Result of 'and' is compared against zero. Change to a BT node if possible.
	/// Returns the BT node and the condition code needed to use it.			/// Returns the BT node and the condition code needed to use it.
	static SDValue LowerAndToBT(SDValue And, ISD::CondCode CC,			static SDValue LowerAndToBT(SDValue And, ISD::CondCode CC,
	const SDLoc &dl, SelectionDAG &DAG,			const SDLoc &dl, SelectionDAG &DAG,
	SDValue &X86CC) {			SDValue &X86CC) {
	assert(And.getOpcode() == ISD::AND && "Expected AND node!");			assert(And.getOpcode() == ISD::AND && "Expected AND node!");
	SDValue Op0 = And.getOperand(0);			SDValue Op0 = And.getOperand(0);
	SDValue Op1 = And.getOperand(1);			SDValue Op1 = And.getOperand(1);

llvm/trunk/test/CodeGen/X86/combine-sdiv.ll

; CHECK-NEXT: retq
%1 = sdiv i16 %x, -256		%1 = sdiv i16 %x, -256
ret i16 %1		ret i16 %1
}		}

define i32 @combine_i32_sdiv_pow2(i32 %x) {		define i32 @combine_i32_sdiv_pow2(i32 %x) {
; CHECK-LABEL: combine_i32_sdiv_pow2:		; CHECK-LABEL: combine_i32_sdiv_pow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $edi killed $edi def $rdi		; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-NEXT: movl %edi, %eax		; CHECK-NEXT: leal 15(%rdi), %eax
; CHECK-NEXT: sarl $31, %eax		; CHECK-NEXT: testl %edi, %edi
; CHECK-NEXT: shrl $28, %eax		; CHECK-NEXT: cmovnsl %edi, %eax
; CHECK-NEXT: addl %edi, %eax
; CHECK-NEXT: sarl $4, %eax		; CHECK-NEXT: sarl $4, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = sdiv i32 %x, 16		%1 = sdiv i32 %x, 16
ret i32 %1		ret i32 %1
}		}

define i32 @combine_i32_sdiv_negpow2(i32 %x) {		define i32 @combine_i32_sdiv_negpow2(i32 %x) {
; CHECK-LABEL: combine_i32_sdiv_negpow2:		; CHECK-LABEL: combine_i32_sdiv_negpow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $edi killed $edi def $rdi		; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-NEXT: movl %edi, %eax		; CHECK-NEXT: leal 255(%rdi), %eax
; CHECK-NEXT: sarl $31, %eax		; CHECK-NEXT: testl %edi, %edi
; CHECK-NEXT: shrl $24, %eax		; CHECK-NEXT: cmovnsl %edi, %eax
; CHECK-NEXT: addl %edi, %eax
; CHECK-NEXT: sarl $8, %eax		; CHECK-NEXT: sarl $8, %eax
; CHECK-NEXT: negl %eax		; CHECK-NEXT: negl %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = sdiv i32 %x, -256		%1 = sdiv i32 %x, -256
ret i32 %1		ret i32 %1
}		}

define i64 @combine_i64_sdiv_pow2(i64 %x) {		define i64 @combine_i64_sdiv_pow2(i64 %x) {
; CHECK-LABEL: combine_i64_sdiv_pow2:		; CHECK-LABEL: combine_i64_sdiv_pow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movq %rdi, %rax		; CHECK-NEXT: leaq 15(%rdi), %rax
; CHECK-NEXT: sarq $63, %rax		; CHECK-NEXT: testq %rdi, %rdi
; CHECK-NEXT: shrq $60, %rax		; CHECK-NEXT: cmovnsq %rdi, %rax
; CHECK-NEXT: addq %rdi, %rax
; CHECK-NEXT: sarq $4, %rax		; CHECK-NEXT: sarq $4, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = sdiv i64 %x, 16		%1 = sdiv i64 %x, 16
ret i64 %1		ret i64 %1
}		}

define i64 @combine_i64_sdiv_negpow2(i64 %x) {		define i64 @combine_i64_sdiv_negpow2(i64 %x) {
; CHECK-LABEL: combine_i64_sdiv_negpow2:		; CHECK-LABEL: combine_i64_sdiv_negpow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movq %rdi, %rax		; CHECK-NEXT: leaq 255(%rdi), %rax
; CHECK-NEXT: sarq $63, %rax		; CHECK-NEXT: testq %rdi, %rdi
; CHECK-NEXT: shrq $56, %rax		; CHECK-NEXT: cmovnsq %rdi, %rax
; CHECK-NEXT: addq %rdi, %rax
; CHECK-NEXT: sarq $8, %rax		; CHECK-NEXT: sarq $8, %rax
; CHECK-NEXT: negq %rax		; CHECK-NEXT: negq %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = sdiv i64 %x, -256		%1 = sdiv i64 %x, -256
ret i64 %1		ret i64 %1
}		}

llvm/trunk/test/CodeGen/X86/combine-srem.ll

; AVX-NEXT: retq
ret <4 x i32> %1		ret <4 x i32> %1
}		}

; TODO fold (srem x, INT_MIN)		; TODO fold (srem x, INT_MIN)
define i32 @combine_srem_by_minsigned(i32 %x) {		define i32 @combine_srem_by_minsigned(i32 %x) {
; CHECK-LABEL: combine_srem_by_minsigned:		; CHECK-LABEL: combine_srem_by_minsigned:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $edi killed $edi def $rdi		; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-NEXT: movl %edi, %eax		; CHECK-NEXT: leal 2147483647(%rdi), %eax
; CHECK-NEXT: sarl $31, %eax		; CHECK-NEXT: testl %edi, %edi
; CHECK-NEXT: shrl %eax		; CHECK-NEXT: cmovnsl %edi, %eax
; CHECK-NEXT: addl %edi, %eax
; CHECK-NEXT: andl $-2147483648, %eax # imm = 0x80000000		; CHECK-NEXT: andl $-2147483648, %eax # imm = 0x80000000
; CHECK-NEXT: addl %edi, %eax		; CHECK-NEXT: addl %edi, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = srem i32 %x, -2147483648		%1 = srem i32 %x, -2147483648
ret i32 %1		ret i32 %1
}		}

define <4 x i32> @combine_vec_srem_by_minsigned(<4 x i32> %x) {		define <4 x i32> @combine_vec_srem_by_minsigned(<4 x i32> %x) {
[...]
; CHECK-NEXT: retq
%1 = srem i16 %x, -256		%1 = srem i16 %x, -256
ret i16 %1		ret i16 %1
}		}

define i32 @combine_srem_pow2(i32 %x) {		define i32 @combine_srem_pow2(i32 %x) {
; CHECK-LABEL: combine_srem_pow2:		; CHECK-LABEL: combine_srem_pow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movl %edi, %eax		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: leal 15(%rax), %ecx
; CHECK-NEXT: sarl $31, %ecx		; CHECK-NEXT: testl %edi, %edi
; CHECK-NEXT: shrl $28, %ecx		; CHECK-NEXT: cmovnsl %edi, %ecx
; CHECK-NEXT: addl %edi, %ecx
; CHECK-NEXT: andl $-16, %ecx		; CHECK-NEXT: andl $-16, %ecx
; CHECK-NEXT: subl %ecx, %eax		; CHECK-NEXT: subl %ecx, %eax
		; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = srem i32 %x, 16		%1 = srem i32 %x, 16
ret i32 %1		ret i32 %1
}		}

define i32 @combine_srem_negpow2(i32 %x) {		define i32 @combine_srem_negpow2(i32 %x) {
; CHECK-LABEL: combine_srem_negpow2:		; CHECK-LABEL: combine_srem_negpow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movl %edi, %eax		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: leal 255(%rax), %ecx
; CHECK-NEXT: sarl $31, %ecx		; CHECK-NEXT: testl %edi, %edi
; CHECK-NEXT: shrl $24, %ecx		; CHECK-NEXT: cmovnsl %edi, %ecx
; CHECK-NEXT: addl %edi, %ecx
; CHECK-NEXT: andl $-256, %ecx		; CHECK-NEXT: andl $-256, %ecx
; CHECK-NEXT: subl %ecx, %eax		; CHECK-NEXT: subl %ecx, %eax
		; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = srem i32 %x, -256		%1 = srem i32 %x, -256
ret i32 %1		ret i32 %1
}		}

define i64 @combine_i64_srem_pow2(i64 %x) {		define i64 @combine_i64_srem_pow2(i64 %x) {
; CHECK-LABEL: combine_i64_srem_pow2:		; CHECK-LABEL: combine_i64_srem_pow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movq %rdi, %rax		; CHECK-NEXT: movq %rdi, %rax
; CHECK-NEXT: movq %rdi, %rcx		; CHECK-NEXT: leaq 15(%rdi), %rcx
; CHECK-NEXT: sarq $63, %rcx		; CHECK-NEXT: testq %rdi, %rdi
; CHECK-NEXT: shrq $60, %rcx		; CHECK-NEXT: cmovnsq %rdi, %rcx
; CHECK-NEXT: addq %rdi, %rcx
; CHECK-NEXT: andq $-16, %rcx		; CHECK-NEXT: andq $-16, %rcx
; CHECK-NEXT: subq %rcx, %rax		; CHECK-NEXT: subq %rcx, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = srem i64 %x, 16		%1 = srem i64 %x, 16
ret i64 %1		ret i64 %1
}		}

define i64 @combine_i64_srem_negpow2(i64 %x) {		define i64 @combine_i64_srem_negpow2(i64 %x) {
; CHECK-LABEL: combine_i64_srem_negpow2:		; CHECK-LABEL: combine_i64_srem_negpow2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movq %rdi, %rax		; CHECK-NEXT: movq %rdi, %rax
; CHECK-NEXT: movq %rdi, %rcx		; CHECK-NEXT: leaq 255(%rdi), %rcx
; CHECK-NEXT: sarq $63, %rcx		; CHECK-NEXT: testq %rdi, %rdi
; CHECK-NEXT: shrq $56, %rcx		; CHECK-NEXT: cmovnsq %rdi, %rcx
; CHECK-NEXT: addq %rdi, %rcx
; CHECK-NEXT: andq $-256, %rcx		; CHECK-NEXT: andq $-256, %rcx
; CHECK-NEXT: subq %rcx, %rax		; CHECK-NEXT: subq %rcx, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%1 = srem i64 %x, -256		%1 = srem i64 %x, -256
ret i64 %1		ret i64 %1
}		}

llvm/trunk/test/CodeGen/X86/rem.ll

; CHECK-NEXT: retl
%tmp1 = srem i32 %X, 255		%tmp1 = srem i32 %X, 255
ret i32 %tmp1		ret i32 %tmp1
}		}

define i32 @test2(i32 %X) {		define i32 @test2(i32 %X) {
; CHECK-LABEL: test2:		; CHECK-LABEL: test2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax		; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl %eax, %ecx		; CHECK-NEXT: leal 255(%eax), %ecx
; CHECK-NEXT: sarl $31, %ecx		; CHECK-NEXT: testl %eax, %eax
; CHECK-NEXT: shrl $24, %ecx		; CHECK-NEXT: cmovnsl %eax, %ecx
; CHECK-NEXT: addl %eax, %ecx
; CHECK-NEXT: andl $-256, %ecx		; CHECK-NEXT: andl $-256, %ecx
; CHECK-NEXT: subl %ecx, %eax		; CHECK-NEXT: subl %ecx, %eax
; CHECK-NEXT: retl		; CHECK-NEXT: retl
%tmp1 = srem i32 %X, 256		%tmp1 = srem i32 %X, 256
ret i32 %tmp1		ret i32 %tmp1
}		}

define i32 @test3(i32 %X) {		define i32 @test3(i32 %X) {

llvm/trunk/test/CodeGen/X86/srem-seteq.ll

; CHECK-NEXT: ret{{[l|q]}}
ret i32 %ret		ret i32 %ret
}		}

; We can lower remainder of division by powers of two much better elsewhere.		; We can lower remainder of division by powers of two much better elsewhere.
define i32 @test_srem_pow2(i32 %X) nounwind {		define i32 @test_srem_pow2(i32 %X) nounwind {
; X86-LABEL: test_srem_pow2:		; X86-LABEL: test_srem_pow2:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %edx		; X86-NEXT: leal 15(%ecx), %edx
; X86-NEXT: sarl $31, %edx		; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: shrl $28, %edx		; X86-NEXT: cmovnsl %ecx, %edx
; X86-NEXT: addl %ecx, %edx
; X86-NEXT: andl $-16, %edx		; X86-NEXT: andl $-16, %edx
; X86-NEXT: xorl %eax, %eax		; X86-NEXT: xorl %eax, %eax
; X86-NEXT: cmpl %edx, %ecx		; X86-NEXT: cmpl %edx, %ecx
; X86-NEXT: sete %al		; X86-NEXT: sete %al
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: test_srem_pow2:		; X64-LABEL: test_srem_pow2:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: movl %edi, %ecx		; X64-NEXT: # kill: def $edi killed $edi def $rdi
; X64-NEXT: sarl $31, %ecx		; X64-NEXT: leal 15(%rdi), %ecx
; X64-NEXT: shrl $28, %ecx		; X64-NEXT: testl %edi, %edi
; X64-NEXT: addl %edi, %ecx		; X64-NEXT: cmovnsl %edi, %ecx
; X64-NEXT: andl $-16, %ecx		; X64-NEXT: andl $-16, %ecx
; X64-NEXT: xorl %eax, %eax		; X64-NEXT: xorl %eax, %eax
; X64-NEXT: cmpl %ecx, %edi		; X64-NEXT: cmpl %ecx, %edi
; X64-NEXT: sete %al		; X64-NEXT: sete %al
; X64-NEXT: retq		; X64-NEXT: retq
%srem = srem i32 %X, 16		%srem = srem i32 %X, 16
%cmp = icmp eq i32 %srem, 0		%cmp = icmp eq i32 %srem, 0
%ret = zext i1 %cmp to i32		%ret = zext i1 %cmp to i32
ret i32 %ret		ret i32 %ret
}		}

; The fold is only valid for positive divisors, and we can't negate INT_MIN.		; The fold is only valid for positive divisors, and we can't negate INT_MIN.
define i32 @test_srem_int_min(i32 %X) nounwind {		define i32 @test_srem_int_min(i32 %X) nounwind {
; X86-LABEL: test_srem_int_min:		; X86-LABEL: test_srem_int_min:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %edx		; X86-NEXT: leal 2147483647(%ecx), %edx
; X86-NEXT: sarl $31, %edx		; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: shrl %edx		; X86-NEXT: cmovnsl %ecx, %edx
; X86-NEXT: addl %ecx, %edx
; X86-NEXT: andl $-2147483648, %edx # imm = 0x80000000		; X86-NEXT: andl $-2147483648, %edx # imm = 0x80000000
; X86-NEXT: xorl %eax, %eax		; X86-NEXT: xorl %eax, %eax
; X86-NEXT: addl %ecx, %edx		; X86-NEXT: addl %ecx, %edx
; X86-NEXT: sete %al		; X86-NEXT: sete %al
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: test_srem_int_min:		; X64-LABEL: test_srem_int_min:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: movl %edi, %ecx		; X64-NEXT: # kill: def $edi killed $edi def $rdi
; X64-NEXT: sarl $31, %ecx		; X64-NEXT: leal 2147483647(%rdi), %ecx
; X64-NEXT: shrl %ecx		; X64-NEXT: testl %edi, %edi
; X64-NEXT: addl %edi, %ecx		; X64-NEXT: cmovnsl %edi, %ecx
; X64-NEXT: andl $-2147483648, %ecx # imm = 0x80000000		; X64-NEXT: andl $-2147483648, %ecx # imm = 0x80000000
; X64-NEXT: xorl %eax, %eax		; X64-NEXT: xorl %eax, %eax
; X64-NEXT: addl %edi, %ecx		; X64-NEXT: addl %edi, %ecx
; X64-NEXT: sete %al		; X64-NEXT: sete %al
; X64-NEXT: retq		; X64-NEXT: retq
%srem = srem i32 %X, 2147483648		%srem = srem i32 %X, 2147483648
%cmp = icmp eq i32 %srem, 0		%cmp = icmp eq i32 %srem, 0
%ret = zext i1 %cmp to i32		%ret = zext i1 %cmp to i32

This is an archive of the discontinued LLVM Phabricator instance.

[X86] Override BuildSDIVPow2 for X86.
ClosedPublic
