This is an archive of the discontinued LLVM Phabricator instance.


Differential D61307

[InstCombine] Add new combine to sub folding
AbandonedPublic

Authored by cdawson on Apr 30 2019, 6:29 AM.


Details

Reviewers

spatel
craig.topper
majnemer
lebedev.ri

Summary

(X | Y) - Y --> (X | Y) ^ Y
(Y | X) - Y --> (X | Y) ^ Y

I verified the correctness using Alive:
https://rise4fun.com/Alive/czes

This transform enables these further transforms that already exist in instcombine:
(X | Y) ^ Y --> X & ~Y
(Y | X) ^ Y --> X & ~Y

As a result, the full expected transform is:
(X | Y) - Y --> X & ~Y
(Y | X) - Y --> X & ~Y
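
As a small sanity check alongside the Alive proof, the three forms can be compared exhaustively at a narrow bit width. The sketch below is not part of the patch: the function name is hypothetical, and it only covers 8-bit values. The identity holds because every set bit of Y is also set in (X | Y), so the subtraction never borrows and simply clears those bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the patch.
// Returns true if (X | Y) - Y, (X | Y) ^ Y and X & ~Y agree for every
// pair of 8-bit values. The casts keep all results modulo 2^8.
bool subOfOrMatchesXorAndAndNot() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      uint8_t Or = static_cast<uint8_t>(X | Y);
      uint8_t Sub = static_cast<uint8_t>(Or - Y);     // (X | Y) - Y
      uint8_t Xor = static_cast<uint8_t>(Or ^ Y);     // (X | Y) ^ Y
      uint8_t AndNot = static_cast<uint8_t>(X & ~Y);  // X & ~Y
      if (Sub != Xor || Xor != AndNot)
        return false;
    }
  }
  return true;
}
```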

I've added tests for cases where Y is constant and where Y is non-constant (with operands in either order).

In the constant case the optimisation is a clear win as we go from 2 instructions to 1 as we can pre-compute ~Y.

I checked that the combine still appears to be profitable when Y is non-constant by compiling for x86_64 with -mcpu=btver2, where I observed that we go from generating

	movl	%ecx, %eax
	orl	%edx, %eax
	subl	%edx, %eax

to generating

	andnl	%ecx, %edx, %eax


Event Timeline

cdawson created this revision. Apr 30 2019, 6:29 AM

Herald added a project: Restricted Project. Apr 30 2019, 6:29 AM

Herald added a subscriber: hiraditya.

cdawson retitled this revision from [InstCombine] Add new combine to sub folding. to [InstCombine] Add new combine to sub folding. Apr 30 2019, 6:29 AM

lebedev.ri added a reviewer: lebedev.ri. Apr 30 2019, 6:36 AM

Were there tests with extra uses? If not, do add.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1535–1536	I do not understand. You don't change the inner pattern at all. Just do:

	    if (match(Op0, m_c_Or(m_Specific(Op1), m_Value())))
	      return BinaryOperator::CreateXor(Op0, Op1);

cdawson added inline comments. Apr 30 2019, 7:45 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1535–1536	I totally agree with the change to the return statement and the m_c_Or check. One point I'd like clarification on is the removal of the m_OneUse. I was including that since the transformation is to facilitate a further (X | Y) ^ Y --> X & ~Y which will only happen if there are no extra uses of (X | Y). If you think we should always transform (X | Y) - Y --> (X | Y) ^ Y I'm happy to remove the m_OneUse.

lebedev.ri added inline comments. Apr 30 2019, 8:47 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1535–1536	> I was including that since the transformation is to facilitate a further (X | Y) ^ Y --> X & ~Y which will only happen if there are no extra uses of (X | Y).

	Because it needs to produce 2 instructions: `~Y` and then `X & (~Y)`, and instcombine is not allowed to increase instruction count, so unless `(X | Y)` was one-use, and would go away, that transform isn't allowed to happen.

	So i guess the question is: ignoring any other transforms that may or may not happen afterwards, do we have a preference between `(X | Y) - Y` and `(X | Y) ^ Y`? I'd guess bitop is better regardless of any further optimizations, because LHS is already a bitop.

	But if we would prefer `(X | Y) - Y` (then we'd need an inverse transform), then perhaps this should instead be

	    if (match(Op0, m_OneUse(m_c_Or(m_Specific(Op1), m_Value(X)))))
	      return BinaryOperator::CreateAnd(X, Builder.CreateNot(Op1));

	directly.
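
The instruction-count argument in this comment can be made concrete with a toy model. The sketch below is not LLVM code: the struct and function names are hypothetical, and it only tallies the instructions left after each rewrite of a non-constant `(X | Y) - Y`, depending on whether the `or` has other users.

```cpp
#include <cassert>

// Toy model, not LLVM code: instruction counts for `(X | Y) - Y`
// with a non-constant Y, before and after each candidate rewrite.
struct Counts {
  int Before;       // or + sub
  int AfterXor;     // or + xor
  int AfterAndNot;  // not + and, plus the or if it must stay alive
};

Counts countInstructions(bool OrHasOtherUses) {
  Counts C;
  C.Before = 2;
  // sub -> xor swaps one instruction for another, so the count is
  // unchanged whether or not the or has other users.
  C.AfterXor = 2;
  // X & ~Y needs a `not` and an `and`. The `or` only disappears when
  // the sub was its single user; otherwise it remains as a third
  // instruction and the rewrite would grow the code.
  C.AfterAndNot = 2 + (OrHasOtherUses ? 1 : 0);
  return C;
}
```

Under this model the sub-to-xor step never increases the count, while the and-not step breaks even only in the one-use case. That is consistent with the `@test73` expectations in the diff, where the `or` has a second user and the output stops at `xor`.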

spatel added inline comments. Apr 30 2019, 9:08 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1535–1536	The xor is probably better for analysis (bit-tracking, demanded-bits, etc), so we want (sub->xor) IMO.

lebedev.ri added inline comments. Apr 30 2019, 9:14 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1535–1536	That was my view, too.

nikic added a subscriber: nikic. Apr 30 2019, 9:29 AM

Addressed review comments

cdawson marked 5 inline comments as done. May 1 2019, 6:08 AM

lebedev.ri added inline comments. May 1 2019, 6:12 AM

llvm/test/Transforms/InstCombine/sub.ll
1270–1313	Please precommit all the tests.

Right, no commit rights i guess?
This looks good to me. I think this needs one more test.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1531–1532	This needs a newline

llvm/test/Transforms/InstCombine/sub.ll
1287	This needs a negative test, with swapped order in `sub`.
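
To see why a swapped-order negative test matters, a single counterexample is enough: subtraction is not commutative, so `Y - (X | Y)` must not be rewritten to the xor. The helper names below are hypothetical and the values are 8-bit for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not from the patch.

// Y - (X | Y): the operand order the fold must NOT match.
uint8_t swappedSub(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>(Y - (X | Y));
}

// (X | Y) ^ Y: what the fold would produce if it matched by mistake.
uint8_t xorOfOr(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>((X | Y) ^ Y);
}
```

With X = 1 and Y = 0 the swapped subtraction wraps to 255 while the xor gives 1, so the two expressions differ.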

LGTM too with Roman's test suggestions in place.

Another couple of minor improvements for those:

I prefer to give the tests slightly meaningful names and/or copy the code comment to help explain the transform.
It's nice to have a test with vector type, so we know those work/won't regress too.

This revision is now accepted and ready to land. May 1 2019, 12:52 PM

cdawson mentioned this in D61517: [InstCombine] Add new combine to add folding. May 3 2019, 8:51 AM

Apologies for the spam.

Since the mentioned review I discovered that the non-constant case was already handled in visitSub but the constant case wasn't due to an earlier optimisation. I have spun off a new review for this: D61517
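
The comment does not name the earlier optimisation. One plausible reading, suggested by the "add folding" title of D61517 and stated here only as an assumption, is that a subtraction of a constant has already been rewritten as an addition of the negated constant before this code runs. Under that assumption the constant case looks like `(A | C) + (-C)`, and the sketch below (hypothetical function name, 8-bit values) checks that this form still equals `A & ~C`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from either patch. Assumes the constant
// case reaches the combiner as an add of the negated constant.
// Returns true if (A | C) + (-C) == A & ~C for all 8-bit A and C.
bool addOfNegatedConstantMatchesAndNot() {
  for (unsigned A = 0; A < 256; ++A) {
    for (unsigned C = 0; C < 256; ++C) {
      uint8_t Or = static_cast<uint8_t>(A | C);
      uint8_t NegC = static_cast<uint8_t>(0u - C);     // -C modulo 2^8
      uint8_t Add = static_cast<uint8_t>(Or + NegC);   // (A | C) + (-C)
      uint8_t AndNot = static_cast<uint8_t>(A & ~C);   // A & ~C
      if (Add != AndNot)
        return false;
    }
  }
  return true;
}
```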

Was this committed already?

This patch has been superseded by D61517

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp	7 lines
llvm/test/Transforms/InstCombine/sub.ll	45 lines

Diff 197534

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

 Instruction *InstCombiner::visitSub(BinaryOperator &I) {

   if (Instruction *X = foldVectorBinop(I))
     return X;

   // (A*B)-(A*C) -> A*(B-C) etc
   if (Value *V = SimplifyUsingDistributiveLaws(I))
     return replaceInstUsesWith(I, V);

-  // If this is a 'B = x-(-A)', change to B = x+A.
   Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
+  // (X | Y) - Y --> (X | Y) ^ Y
+  // (Y | X) - Y --> (Y | X) ^ Y
+  if (match(Op0, m_c_Or(m_Specific(Op1), m_Value())))
+    return BinaryOperator::CreateXor(Op0, Op1);
+
+  // If this is a 'B = x-(-A)', change to B = x+A.
   if (Value *V = dyn_castNegVal(Op1)) {
     BinaryOperator *Res = BinaryOperator::CreateAdd(Op0, V);

     if (const auto *BO = dyn_cast<BinaryOperator>(Op1)) {
       assert(BO->getOpcode() == Instruction::Sub &&
              "Expected a subtraction operator!");
       if (BO->hasNoSignedWrap() && I.hasNoSignedWrap())
         Res->setHasNoSignedWrap(true);

llvm/test/Transforms/InstCombine/sub.ll

 ;
   %1 = xor <2 x i32> %x, <i32 -1, i32 -1>
   %2 = icmp sgt <2 x i32> %1, <i32 -256, i32 -128>
   %3 = select <2 x i1> %2, <2 x i32> %1, <2 x i32> <i32 -256, i32 -128>
   %res = sub <2 x i32> zeroinitializer, %3
   ret <2 x i32> %res
 }

+define i32 @test70(i32 %A) {
+; CHECK-LABEL: @test70(
+; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[A:%.*]], -124
+; CHECK-NEXT:    ret i32 [[TMP1]]
+;
+  %B = or i32 %A, 123
+  %C = sub i32 %B, 123
+  ret i32 %C
+}
+
+define i32 @test71(i32 %A, i32 %B) {
+; CHECK-LABEL: @test71(
+; CHECK-NEXT:    [[TMP1:%.*]] = xor i32 [[B:%.*]], -1
+; CHECK-NEXT:    [[D:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    ret i32 [[D]]
+;
+  %C = or i32 %A, %B
+  %D = sub i32 %C, %B
+  ret i32 %D
+}
+
+define i32 @test72(i32 %A, i32 %B) {
+; CHECK-LABEL: @test72(
+; CHECK-NEXT:    [[TMP1:%.*]] = xor i32 [[B:%.*]], -1
+; CHECK-NEXT:    [[D:%.*]] = and i32 [[TMP1]], [[A:%.*]]
+; CHECK-NEXT:    ret i32 [[D]]
+;
+  %C = or i32 %B, %A
+  %D = sub i32 %C, %B
+  ret i32 %D
+}
+
+define i32 @test73(i32 %A, i32 %B) {
+; CHECK-LABEL: @test73(
+; CHECK-NEXT:    [[C:%.*]] = or i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[D:%.*]] = xor i32 [[C]], [[B]]
+; CHECK-NEXT:    [[E:%.*]] = mul i32 [[C]], [[D]]
+; CHECK-NEXT:    ret i32 [[E]]
+;
+  %C = or i32 %A, %B
+  %D = sub i32 %C, %B
+  %E = mul i32 %C, %D
+  ret i32 %E
+}
+
 define i32 @nsw_inference1(i32 %x, i32 %y) {
 ; CHECK-LABEL: @nsw_inference1(
 ; CHECK-NEXT:    [[X2:%.*]] = or i32 [[X:%.*]], 1024
 ; CHECK-NEXT:    [[Y2:%.*]] = and i32 [[Y:%.*]], 1
 ; CHECK-NEXT:    [[Z:%.*]] = sub nuw nsw i32 [[X2]], [[Y2]]
 ; CHECK-NEXT:    ret i32 [[Z]]
 ;
   %x2 = or i32 %x, 1024