This is an archive of the discontinued LLVM Phabricator instance.

ComputeKnownBits: be a bit smarter about ADDs
Closed, Public

Authored by escha on Jun 17 2015, 11:53 AM.


Details

Reviewers

chandlerc
resistor

Summary

If our two inputs have known top-zero bit counts M and N, we trivially know that the output cannot have any bits set in the top (min(M, N)-1) bits, since nothing could carry past that point.
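The summary's bound follows from a short argument. With bit width W, the inputs are below 2^(W-M) and 2^(W-N), so their sum is below 2^(W-min(M, N)+1), which leaves at least min(M, N) - 1 leading zeros. The "minus one" accounts for the single carry that can spill into the highest possibly-set bit. The C++ sketch below checks this exhaustively at an 8-bit width. It is a standalone illustration, not LLVM code, and `leadingZeros8` and `addHighZeroClaimHolds` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of leading zero bits in an 8-bit value (returns 8 for zero).
unsigned leadingZeros8(uint8_t v) {
  unsigned n = 0;
  for (int bit = 7; bit >= 0 && !((v >> bit) & 1); --bit)
    ++n;
  return n;
}

// Exhaustively checks the claim at 8 bits: if a has at least M and b at least
// N leading zeros, the wrapped sum a + b has at least min(M, N) - 1 leading
// zeros. Using the exact leading-zero counts is enough, because any smaller
// "known" counts only produce a weaker bound.
bool addHighZeroClaimHolds() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      unsigned k = std::min(leadingZeros8(uint8_t(a)), leadingZeros8(uint8_t(b)));
      uint8_t sum = uint8_t(a + b);  // wraps modulo 2^8, like an ISD::ADD
      if (k > 1 && leadingZeros8(sum) < k - 1)
        return false;
    }
  }
  return true;
}
```

The bound cannot be raised to min(M, N): 0x0F + 0x0F = 0x1E, where both inputs have four leading zeros but the sum has only three.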

Diff Detail

Repository: rL LLVM

Event Timeline

escha updated this revision to Diff 27857. Jun 17 2015, 11:53 AM

escha retitled this revision from to ComputeKnownBits: be a bit smarter about ADDs.

escha updated this object.

escha edited the test plan for this revision.

escha added a reviewer: resistor.

escha set the repository for this revision to rL LLVM.

escha added a subscriber: Unknown Object (MLST).

Added a test on x86 that utilizes this feature of ComputeKnownBits.

Fix one test that broke as a result of this, because LLVM decided to shrink the constant.

Fix one last test now that I've run all of the CodeGen tests.

escha added a reviewer: chandlerc. Jun 17 2015, 1:38 PM

Is there any hope of fixing the regression on test/CodeGen/X86/win64_frame.ll?

I really can't think of anything without going into x86-specific backend hacks (or other larger-scale changes in the constant optimization bits of the backend) :-/

LLVM tries to aggressively shrink constants in the general case, and unfortunately it hurts here... but I imagine it already hurts in a lot of other cases, this optimization just exposes one more.

In D10512#196063, @escha wrote:

I really can't think of anything without going into x86-specific backend hacks (or other larger-scale changes in the constant optimization bits of the backend) :-/

LLVM tries to aggressively shrink constants in the general case, and unfortunately it hurts here... but I imagine it already hurts in a lot of other cases, this optimization just exposes one more.

I agree, and we should fix this. The code that generates 'andq $-16, %rax' should, more generally, also match $34359738352 as -16 by checking that the upper bits of the incoming value are known to be zero.
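A minimal sketch of the suggested check, assuming the matcher can see the known-zero mask of the AND's input: two masks are interchangeable when they differ only in bits that the input already has clear. In win64_frame.ll the input is 4 * zext(%ecx) + 15, which is below 2^35, so with this patch bits 35..63 are known zero. 34359738352 is 0x7fffffff0 (2^35 - 16), which is -16 with exactly those bits cleared. `masksEquivalentGivenKnownZero` and the constant names are hypothetical, for illustration only, and are not part of the X86 backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true if x & maskA == x & maskB for
// every x whose set bits all lie outside `knownZero`, i.e. the masks differ
// only in positions where the input is already known to be zero.
bool masksEquivalentGivenKnownZero(uint64_t maskA, uint64_t maskB,
                                   uint64_t knownZero) {
  return ((maskA ^ maskB) & ~knownZero) == 0;
}

// Bits 35..63: the 29 high bits this patch proves clear for 4 * zext(%ecx) + 15.
constexpr uint64_t kHigh29Zero = ~((uint64_t{1} << 35) - 1);

// The masks from win64_frame.ll: `andq $-16` before the patch and
// `movabsq $34359738352` after it.
constexpr uint64_t kMaskMinus16 = ~uint64_t{15};
constexpr uint64_t kShrunkMask = 34359738352ULL;
```

A matcher using a check like this could keep emitting the short `andq $-16` form whenever the shrunk constant and -16 differ only in known-zero bits.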

Is this patch okay? Are there any other concerns?

Patch LGTM. Can you file a PR for X86?

This revision is now accepted and ready to land. Jul 10 2015, 11:08 AM

escha closed this revision. Jul 10 2015, 11:29 AM

Revision Contents

Path                                                    Size
lib/CodeGen/SelectionDAG/SelectionDAG.cpp               19 lines
test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll    10 lines
test/CodeGen/X86/win64_frame.ll                          5 lines

Diff 27867

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Context not available.
     // Output known-0 bits are known if clear or set in both the low clear bits
     // common to both LHS & RHS. For example, 8+(X<<3) is known to have the
     // low 3 bits clear.
+    // Output known-0 bits are also known if the top bits of each input are
+    // known to be clear. For example, if one input has the top 10 bits clear
+    // and the other has the top 8 bits clear, we know the top 7 bits of the
+    // output must be clear.
     computeKnownBits(Op.getOperand(0), KnownZero2, KnownOne2, Depth+1);
-    unsigned KnownZeroOut = KnownZero2.countTrailingOnes();
+    unsigned KnownZeroHigh = KnownZero2.countLeadingOnes();
+    unsigned KnownZeroLow = KnownZero2.countTrailingOnes();

     computeKnownBits(Op.getOperand(1), KnownZero2, KnownOne2, Depth+1);
-    KnownZeroOut = std::min(KnownZeroOut,
+    KnownZeroHigh = std::min(KnownZeroHigh,
+                             KnownZero2.countLeadingOnes());
+    KnownZeroLow = std::min(KnownZeroLow,
                             KnownZero2.countTrailingOnes());

     if (Op.getOpcode() == ISD::ADD) {
-      KnownZero |= APInt::getLowBitsSet(BitWidth, KnownZeroOut);
+      KnownZero |= APInt::getLowBitsSet(BitWidth, KnownZeroLow);
+      if (KnownZeroHigh > 1)
+        KnownZero |= APInt::getHighBitsSet(BitWidth, KnownZeroHigh - 1);
       break;
     }
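To follow the patched ADD logic outside SelectionDAG, here is a simplified C++ model of it. It assumes a fixed 64-bit width and represents each operand's known-zero bits as a plain `uint64_t` mask instead of an `APInt`. `addKnownZero` and the counting helpers are hypothetical names standing in for the `countLeadingOnes`/`countTrailingOnes` calls and the `getLowBitsSet`/`getHighBitsSet` masks in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of consecutive one bits starting at bit 63 (stand-in for countLeadingOnes).
unsigned countLeadingOnes64(uint64_t v) {
  unsigned n = 0;
  for (int bit = 63; bit >= 0 && ((v >> bit) & 1); --bit)
    ++n;
  return n;
}

// Number of consecutive one bits starting at bit 0 (stand-in for countTrailingOnes).
unsigned countTrailingOnes64(uint64_t v) {
  unsigned n = 0;
  for (int bit = 0; bit < 64 && ((v >> bit) & 1); ++bit)
    ++n;
  return n;
}

// Bits known to be zero in LHS + RHS, given each operand's known-zero mask.
uint64_t addKnownZero(uint64_t knownZeroLHS, uint64_t knownZeroRHS) {
  unsigned high = std::min(countLeadingOnes64(knownZeroLHS),
                           countLeadingOnes64(knownZeroRHS));
  unsigned low = std::min(countTrailingOnes64(knownZeroLHS),
                          countTrailingOnes64(knownZeroRHS));
  uint64_t knownZero = 0;
  // Low bits clear in both inputs stay clear, since no carry is generated there.
  if (low > 0)
    knownZero |= (low >= 64) ? ~uint64_t{0} : ((uint64_t{1} << low) - 1);
  // The rule added by this patch: at most one carry reaches the top
  // min(M, N) bits, so the top min(M, N) - 1 bits stay clear.
  if (high > 1)
    knownZero |= ~uint64_t{0} << (64 - (high - 1));
  return knownZero;
}
```

With this model, the two examples from the code comments behave as described: `addKnownZero(~uint64_t{8}, 7)` (8 + (X<<3)) gives the low three bits, and operands whose top 10 and top 8 bits are known zero give the top 7 bits.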

Context not available.
     // information if we know (at least) that the low two bits are clear. We
     // then return to the caller that the low bit is unknown but that other bits
     // are known zero.
-    if (KnownZeroOut >= 2) // ADDE
-      KnownZero |= APInt::getBitsSet(BitWidth, 1, KnownZeroOut);
+    if (KnownZeroLow >= 2) // ADDE
+      KnownZero |= APInt::getBitsSet(BitWidth, 1, KnownZeroLow);
     break;
   }
   case ISD::SREM:
Context not available.
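The ADDE rule in the second hunk, which only renames `KnownZeroOut` to `KnownZeroLow`, can be checked the same way. This sketch assumes `APInt::getBitsSet(BitWidth, 1, KnownZeroLow)` sets the half-open bit range [1, KnownZeroLow). It verifies at 8 bits that when both inputs have at least two low zero bits, a carry-in of 0 or 1 can only affect bit 0. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of trailing zero bits in an 8-bit value (returns 8 for zero).
unsigned trailingZeros8(uint8_t v) {
  unsigned n = 0;
  for (int bit = 0; bit < 8 && !((v >> bit) & 1); ++bit)
    ++n;
  return n;
}

// Bits [1, k) as an 8-bit mask, mirroring getBitsSet(8, 1, k) for 1 <= k <= 8.
uint8_t bitsOneToK(unsigned k) {
  return uint8_t(((1u << k) - 1) & ~1u);
}

// For every a, b with at least two common low zero bits and either carry-in,
// bits 1..k-1 of a + b + carry are zero; only bit 0 can be set by the carry.
bool addeLowBitsClaimHolds() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      for (unsigned carryIn = 0; carryIn < 2; ++carryIn) {
        unsigned k = std::min(trailingZeros8(uint8_t(a)), trailingZeros8(uint8_t(b)));
        if (k < 2)
          continue;  // the patch only uses the rule when KnownZeroLow >= 2
        uint8_t sum = uint8_t(a + b + carryIn);
        if (sum & bitsOneToK(k))
          return false;
      }
  return true;
}
```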

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll

Context not available.
   ; CHECK: ubfx x9, x0, #0, #32
   ; CHECK: lsl x9, x9, #2
   ; CHECK: add x9, x9, #15
-  ; CHECK: and x9, x9, #0xfffffffffffffff0
+  ; CHECK: and x9, x9, #0x7fffffff0
   ; CHECK: mov x10, sp
   ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
   ; CHECK: mov sp, x[[VLASPTMP]]
Context not available.
   ; CHECK: ubfx x9, x0, #0, #32
   ; CHECK: lsl x9, x9, #2
   ; CHECK: add x9, x9, #15
-  ; CHECK: and x9, x9, #0xfffffffffffffff0
+  ; CHECK: and x9, x9, #0x7fffffff0
   ; CHECK: mov x10, sp
   ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
   ; CHECK: mov sp, x[[VLASPTMP]]
Context not available.
   ; CHECK: ubfx x9, x0, #0, #32
   ; CHECK: lsl x9, x9, #2
   ; CHECK: add x9, x9, #15
-  ; CHECK: and x9, x9, #0xfffffffffffffff0
+  ; CHECK: and x9, x9, #0x7fffffff0
   ; CHECK: mov x10, sp
   ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
   ; CHECK: mov sp, x[[VLASPTMP]]
Context not available.
   ; CHECK: ubfx x9, x0, #0, #32
   ; CHECK: lsl x9, x9, #2
   ; CHECK: add x9, x9, #15
-  ; CHECK: and x9, x9, #0xfffffffffffffff0
+  ; CHECK: and x9, x9, #0x7fffffff0
   ; CHECK: mov x10, sp
   ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
   ; CHECK: mov sp, x[[VLASPTMP]]
Context not available.
   ; CHECK: ubfx x9, x0, #0, #32
   ; CHECK: lsl x9, x9, #2
   ; CHECK: add x9, x9, #15
-  ; CHECK: and x9, x9, #0xfffffffffffffff0
+  ; CHECK: and x9, x9, #0x7fffffff0
   ; CHECK: mov x10, sp
   ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
   ; CHECK: mov sp, x[[VLASPTMP]]
Context not available.
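The new AArch64 constant can be reproduced by hand. After `ubfx x9, x0, #0, #32` the value is below 2^32, `lsl #2` leaves 30 known leading zeros, and the immediate 15 has 60, so the new ADD rule marks the top min(30, 60) - 1 = 29 bits (bits 35..63) as zero. Clearing those bits from 0xfffffffffffffff0 gives 0x7fffffff0. The sketch below models the instruction sequence in C++. `roundedAllocaBytes` and the constant names are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the tested sequence (not LLVM code):
// ubfx x9, x0, #0, #32 ; lsl x9, x9, #2 ; add x9, x9, #15
uint64_t roundedAllocaBytes(uint64_t x0) {
  uint64_t x9 = x0 & 0xFFFFFFFFull;  // ubfx #0, #32 keeps only the low 32 bits
  x9 <<= 2;                          // lsl #2: four bytes per i32 element
  return x9 + 15;                    // add #15: round up before the 16-byte mask
}

// The old mask, the high bits the new ADD rule proves clear, and the shrunk
// mask that now appears in the CHECK lines.
constexpr uint64_t kOldMask = ~uint64_t{15};                  // 0xfffffffffffffff0
constexpr uint64_t kKnownZeroHigh = ~uint64_t{0} << 35;       // bits 35..63
constexpr uint64_t kShrunkMask = kOldMask & ~kKnownZeroHigh;  // 0x7fffffff0
```

Because bits 35..63 of the ADD result are always zero, AND-ing with either mask yields the same value, which is why the constant could be shrunk without changing behavior.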

test/CodeGen/X86/win64_frame.ll

Context not available.

   alloca i32, i32 %a
   ; CHECK: movl %ecx, %eax
-  ; CHECK: leaq 15(,%rax,4), %rax
-  ; CHECK: andq $-16, %rax
+  ; CHECK: leaq 15(,%rax,4), %rcx
+  ; CHECK: movabsq $34359738352, %rax
+  ; CHECK: andq %rcx, %rax
   ; CHECK: callq __chkstk
   ; CHECK: subq %rax, %rsp

Context not available.