This is an archive of the discontinued LLVM Phabricator instance.


Differential D44266

[InstCombine] remove use restriction for min/max with not operands (PR35875)
Abandoned · Public

Authored by spatel on Mar 8 2018, 10:59 AM.


Details

Reviewers

efriedma
craig.topper
majnemer
dmgreen
dberlin

Summary

I looked around instcombine for a precedent for this, but I don't see any existing folds where it comes up.
I'm proposing that we pull 'not' ops through min/max even if it means adding an instruction (because all of the 'not' values have other uses).
The justification for this is that we're still eliminating all 4 uses of the 'not' values in the min/max (each 'not' feeds both the compare and the select), so it's still a net reduction of uses.
The bigger motivating case can be seen in the last PR35875 test. We end up eliminating instructions there because we can absorb 'not' into 'sub'.

Possible alternatives are:

- Do a very narrow/large pattern match for min/max+sub. To get the motivating example, it would actually have to be a match of 3-way min/max + sub.
- Defer this fold to some other pass (aggressive-instcombine, GVN?) where we can check users to make sure we won't increase the instruction count.
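To make the motivating case concrete, here is a minimal C++ sketch (not LLVM code) that models i8 values as `uint8_t` and exhaustively checks the shape of the umin3_not_all_ops_extra_uses_invert_subs test: subtracting the 3-way umin of the 'not' values from one of those 'not' values gives the same result as subtracting the original value from the 3-way umax. All helper names (`umin8`, `umax8`, `not8`, `sub_of_not_min`, `folded`, `motivating_fold_holds`) are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modeling i8 values as uint8_t (not LLVM APIs).
static uint8_t umin8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t umax8(uint8_t a, uint8_t b) { return a > b ? a : b; }
static uint8_t not8(uint8_t v) { return static_cast<uint8_t>(v ^ 0xFF); }  // xor i8 %v, -1

// Original shape: sub of a 'not' value and the 3-way umin of the 'not' values.
static uint8_t sub_of_not_min(uint8_t x, uint8_t y, uint8_t z) {
  uint8_t xn = not8(x), yn = not8(y), zn = not8(z);
  uint8_t minxyz = umin8(umin8(xn, zn), yn);
  return static_cast<uint8_t>(xn - minxyz);  // wraps modulo 256, like an i8 sub
}

// Folded shape: the umin of 'not' values becomes a umax of the original
// values, and the 'not' is absorbed by swapping the sub's operands.
static uint8_t folded(uint8_t x, uint8_t y, uint8_t z) {
  uint8_t maxxyz = umax8(umax8(x, z), y);
  return static_cast<uint8_t>(maxxyz - x);
}

// Exhaustively compare both shapes over all 2^24 (x, y, z) triples.
static bool motivating_fold_holds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned z = 0; z < 256; ++z) {
        const uint8_t bx = static_cast<uint8_t>(x);
        const uint8_t by = static_cast<uint8_t>(y);
        const uint8_t bz = static_cast<uint8_t>(z);
        if (sub_of_not_min(bx, by, bz) != folded(bx, by, bz)) return false;
      }
  return true;
}
```

In the updated test, this is why the three 'not' ops of the inputs disappear entirely: the subs no longer need them, and only the single 'not' of the umax result remains for its own extra use.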


Event Timeline

spatel created this revision. Mar 8 2018, 10:59 AM

Herald added a reviewer: dberlin. Mar 8 2018, 10:59 AM

Herald added a subscriber: mcrosier.

craig.topper mentioned this in D44398: [InstCombine] Replace calls to getNumUses with hasNUses or hasNUsesOrMore. Mar 12 2018, 11:32 AM

Diffusion mentioned this in rL327315: [InstCombine] Replace calls to getNumUses with hasNUses or hasNUsesOrMore. Mar 12 2018, 11:50 AM

Ping.

Sorry for the delay. I obviously like this because it fixes my regression :)

Before I went away I ran a load of benchmarks. They eventually finished and showed no regressions except for this rgb to cmy case (which went up 48% on v6m targets!)

If this is unpalatable because it can increase instruction count, we may be able to do something in IsFreeOrProfitableToInvert in InstCombineSelect. Is checking the users in InstCombine (more than just counting them) not a done thing?

In D44266#1041299, @dmgreen wrote:

Before I went away I ran a load of benchmarks. They eventually finished and showed no regressions except for this rgb to cmy case (which went up 48% on v6m targets!)

Thanks for verifying that this change does what we intended on the motivating case (and does no noticeable harm otherwise).

If this is unpalatable because it can increase instruction count, we may be able to do something in IsFreeOrProfitableToInvert in InstCombineSelect. Is checking the users in InstCombine (more than just counting them) not a done thing?

It is done sometimes, but I view that code construct with suspicion. It's usually just covering for a misplaced pattern match. I.e., in this case it wouldn't be much different from starting a larger match from a sub.

I still don't know how we should handle this, but staring at it a bit more, I see that this is just an application of DeMorgan's Law. So if we proceed down this path, we also want to loosen the use restrictions to allow:

Name: not_not_and_sub_with_not
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%a = and i8 %nx, %ny
%s = sub i8 %a, %nz
  =>
%o = or i8 %x, %y
%s = sub i8 %z, %o

https://rise4fun.com/Alive/cvv
...or if we decide to use larger matches starting from the sub, we'd want to match basic logic ops as well as min/max.
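The Alive proof above can also be sanity-checked by brute force. This is a hedged C++ sketch, assuming i8 is modeled as `uint8_t` with wrapping subtraction; the function names are hypothetical and their bodies simply mirror the value names in the Alive source (`%nx`, `%a`, `%nz`, `%o`).

```cpp
#include <cassert>
#include <cstdint>

// Source side (hypothetical helper):
//   %nx = xor %x, -1 ; %ny = xor %y, -1 ; %nz = xor %z, -1
//   %a = and %nx, %ny ; %s = sub %a, %nz
static uint8_t not_not_and_sub_src(uint8_t x, uint8_t y, uint8_t z) {
  uint8_t nx = static_cast<uint8_t>(~x);
  uint8_t ny = static_cast<uint8_t>(~y);
  uint8_t nz = static_cast<uint8_t>(~z);
  uint8_t a = nx & ny;                  // DeMorgan: equals ~(x | y)
  return static_cast<uint8_t>(a - nz);  // wrapping i8 sub
}

// Target side (hypothetical helper): %o = or %x, %y ; %s = sub %z, %o
static uint8_t not_not_and_sub_tgt(uint8_t x, uint8_t y, uint8_t z) {
  uint8_t o = x | y;
  return static_cast<uint8_t>(z - o);
}

// Exhaustively compare source and target over all i8 triples.
static bool not_not_and_sub_with_not_holds() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned z = 0; z < 256; ++z) {
        const uint8_t bx = static_cast<uint8_t>(x);
        const uint8_t by = static_cast<uint8_t>(y);
        const uint8_t bz = static_cast<uint8_t>(z);
        if (not_not_and_sub_src(bx, by, bz) != not_not_and_sub_tgt(bx, by, bz))
          return false;
      }
  return true;
}
```

The same structure as the min/max case is visible here: the 'not' of the and-result cancels against the 'not' of the subtrahend once the sub operands are swapped.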

Ping * 2.

Ping * 3.

Anyone want to take a shot at this bit of IR canonicalization theory? :)

spatel mentioned this in D45317: Eliminate a bitwise 'not' op of 'not' min/max by inverting the min/max. Apr 5 2018, 6:05 AM

a.elovikov added a subscriber: a.elovikov. Apr 5 2018, 6:54 AM

There are basically three equivalencies here: ~~a == a, (~a-~b)==(b-a), and UMIN(~a, ~b) == ~UMAX(a, b). The tricky bit is turning those three equivalencies into a profitable transform. instcombine is generally bad at this sort of transform; we don't keep track of potential transforms, so instead we usually just suppress transforms which are not immediately profitable. So maybe we miss some transforms, but at least we aren't making the user's code worse.

So I guess these are the options:

- Make the code worse, and hope it gets better later. That's what this patch does, but it's not something we like to do usually; often it happens to be profitable for the particular testcase someone is looking at, but it makes the code worse in other cases.
- Use a gigantic match for your exact testcase. This works, but obviously isn't very general.
- Make a new pass (or AggressiveInstCombine) which does some sort of "global" optimization (to, for example, minimize the total number of not operations in a function).
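The three equivalences in the comment above hold for every i8 value, which an exhaustive C++ check can confirm. This is a minimal sketch that models i8 as `uint8_t` (so subtraction wraps modulo 256, as in LLVM IR) and covers only the unsigned min/max case named in the comment; `not8` and `equivalences_hold` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bitwise 'not' of an i8 modeled as uint8_t.
static uint8_t not8(uint8_t v) { return static_cast<uint8_t>(~v); }

// Checks, for every pair of i8 values a and b:
//   ~~a == a
//   (~a - ~b) == (b - a)            (wrapping subtraction)
//   UMIN(~a, ~b) == ~UMAX(a, b)
static bool equivalences_hold() {
  for (unsigned i = 0; i < 256; ++i) {
    const uint8_t a = static_cast<uint8_t>(i);
    if (not8(not8(a)) != a) return false;
    for (unsigned j = 0; j < 256; ++j) {
      const uint8_t b = static_cast<uint8_t>(j);
      const uint8_t lhs_sub = static_cast<uint8_t>(not8(a) - not8(b));
      const uint8_t rhs_sub = static_cast<uint8_t>(b - a);
      if (lhs_sub != rhs_sub) return false;
      const uint8_t umin_of_nots = not8(a) < not8(b) ? not8(a) : not8(b);
      const uint8_t umax = a > b ? a : b;
      if (umin_of_nots != not8(umax)) return false;
    }
  }
  return true;
}
```

Correctness of each identity is the easy part; as the comment says, the hard part is deciding when applying them actually reduces the instruction count.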

ArturGainullin added a subscriber: ArturGainullin. Apr 6 2018, 1:37 AM

In D44266#1059212, @efriedma wrote:

There are basically three equivalencies here: ~~a == a, (~a-~b)==(b-a), and UMIN(~a, ~b) == ~UMAX(a, b). The tricky bit is turning those three equivalencies into a profitable transform. instcombine is generally bad at this sort of transform; we don't keep track of potential transforms, so instead we usually just suppress transforms which are not immediately profitable. So maybe we miss some transforms, but at least we aren't making the user's code worse.

So I guess these are the options:

- Make the code worse, and hope it gets better later. That's what this patch does, but it's not something we like to do usually; often it happens to be profitable for the particular testcase someone is looking at, but it makes the code worse in other cases.
- Use a gigantic match for your exact testcase. This works, but obviously isn't very general.
- Make a new pass (or AggressiveInstCombine) which does some sort of "global" optimization (to, for example, minimize the total number of not operations in a function).

Thanks! I agree with all of this. I figured it was worth a discussion to see if there was support for the 'more instructions, but less uses' argument, but I knew that was shaky at best. Also, getting a perf win by removing a line of code from the compiler was worth a shot.

I think AggressiveInstCombine is a better place to house rare and narrow matchers like what we need here (see also the discussion in D45173). It seems likely that we could also transfer some existing InstCombine functionality over there to reduce the scope and cost of InstCombine. Yes, that pass becomes the new island of misfit folds, but it's easy to remove if people don't like it (only running at -O3 currently).

spatel mentioned this in D45986: [AggressiveInstCombine] convert a chain of 'or-shift' bits into masked compare. Apr 23 2018, 1:46 PM

spatel mentioned this in rL331311: [AggressiveInstCombine] convert a chain of 'or-shift' bits into masked compare. May 1 2018, 2:06 PM

spatel mentioned this in D46760: [InstCombine] Enhance narrowUDivURem. May 20 2018, 10:56 AM

lebedev.ri mentioned this in D47113: [CVP] Teach CorrelatedValuePropagation to reduce the width of lshr instruction. May 21 2018, 3:19 AM

Revision Contents

Path                                                Size
lib/Transforms/InstCombine/InstCombineSelect.cpp    3 lines
test/Transforms/InstCombine/max-of-nots.ll          32 lines

Diff 137614

lib/Transforms/InstCombine/InstCombineSelect.cpp

@@ ... @@ if (SelectPatternResult::isMinOrMax(SPF)) {
       Value *NewCast = Builder.CreateCast(CastOp, NewSI, SelType);
       return replaceInstUsesWith(SI, NewCast);
     }

     // MAX(~a, ~b) -> ~MIN(a, b)
     // MIN(~a, ~b) -> ~MAX(a, b)
     Value *A, *B;
-    if (match(LHS, m_Not(m_Value(A))) && match(RHS, m_Not(m_Value(B))) &&
-        (LHS->getNumUses() <= 2 || RHS->getNumUses() <= 2)) {
+    if (match(LHS, m_Not(m_Value(A))) && match(RHS, m_Not(m_Value(B)))) {
       CmpInst::Predicate InvertedPred = getInverseMinMaxPred(SPF);
       Value *InvertedCmp = Builder.CreateICmp(InvertedPred, A, B);
       Value *NewSel = Builder.CreateSelect(InvertedCmp, A, B);
       return BinaryOperator::CreateNot(NewSel);
     }

     if (Instruction *I = factorizeMinMaxTree(SPF, LHS, RHS, Builder))
       return I;
...
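The new code path builds `icmp InvertedPred A, B`, a `select` on that compare, and a final 'not'. In the updated CHECK lines, the umin's `icmp ult` on the 'not' values becomes an `icmp ugt` on the original values. A minimal C++ sketch, assuming i8 is modeled as `uint8_t` (and `int8_t` for the signed analogue), can check that this rewrite preserves the value for every pair of inputs; the function names are hypothetical and not part of InstCombine.

```cpp
#include <cassert>
#include <cstdint>

// Before the fold (unsigned): select (icmp ult ~x, ~y), ~x, ~y
static uint8_t umin_of_nots(uint8_t x, uint8_t y) {
  const uint8_t nx = static_cast<uint8_t>(~x);
  const uint8_t ny = static_cast<uint8_t>(~y);
  return nx < ny ? nx : ny;
}

// After the fold: not (select (icmp ugt x, y), x, y)
static uint8_t not_of_umax(uint8_t x, uint8_t y) {
  const uint8_t sel = x > y ? x : y;  // inverted predicate: ult -> ugt
  return static_cast<uint8_t>(~sel);
}

// Signed analogue: smin of 'not' values vs. 'not' of smax.
static int8_t smin_of_nots(int8_t x, int8_t y) {
  const int8_t nx = static_cast<int8_t>(~x);
  const int8_t ny = static_cast<int8_t>(~y);
  return nx < ny ? nx : ny;
}
static int8_t not_of_smax(int8_t x, int8_t y) {
  const int8_t sel = x > y ? x : y;
  return static_cast<int8_t>(~sel);
}

// Exhaustively compare before/after for all i8 pairs, unsigned and signed.
static bool inverted_minmax_fold_holds() {
  for (int i = 0; i < 256; ++i) {
    for (int j = 0; j < 256; ++j) {
      const uint8_t ux = static_cast<uint8_t>(i), uy = static_cast<uint8_t>(j);
      if (umin_of_nots(ux, uy) != not_of_umax(ux, uy)) return false;
      const int8_t sx = static_cast<int8_t>(i - 128);
      const int8_t sy = static_cast<int8_t>(j - 128);
      if (smin_of_nots(sx, sy) != not_of_smax(sx, sy)) return false;
    }
  }
  return true;
}
```

Note that the fold only rewrites the min/max itself: when `~x` and `~y` have other uses (as in the extra-use tests), those 'not' ops survive, which is where the extra instruction discussed in the summary comes from.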

test/Transforms/InstCombine/max-of-nots.ll

@@ ... @@ ;
   call void @extra_use(i8 %nx)
   ret i8 %minxy
 }

 define i8 @umin_not_2_extra_use(i8 %x, i8 %y) {
 ; CHECK-LABEL: @umin_not_2_extra_use(
 ; CHECK-NEXT: [[NX:%.*]] = xor i8 [[X:%.*]], -1
 ; CHECK-NEXT: [[NY:%.*]] = xor i8 [[Y:%.*]], -1
-; CHECK-NEXT: [[CMPXY:%.*]] = icmp ult i8 [[NX]], [[NY]]
-; CHECK-NEXT: [[MINXY:%.*]] = select i1 [[CMPXY]], i8 [[NX]], i8 [[NY]]
+; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i8 [[X]], [[Y]]
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i8 [[X]], i8 [[Y]]
+; CHECK-NEXT: [[MINXY:%.*]] = xor i8 [[TMP2]], -1
 ; CHECK-NEXT: call void @extra_use(i8 [[NX]])
 ; CHECK-NEXT: call void @extra_use(i8 [[NY]])
 ; CHECK-NEXT: ret i8 [[MINXY]]
 ;
   %nx = xor i8 %x, -1
   %ny = xor i8 %y, -1
   %cmpxy = icmp ult i8 %nx, %ny
   %minxy = select i1 %cmpxy, i8 %nx, i8 %ny
@@ ... @@
 declare void @use8(i8)

 define i8 @umin3_not_all_ops_extra_uses(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @umin3_not_all_ops_extra_uses(
 ; CHECK-NEXT: [[XN:%.*]] = xor i8 [[X:%.*]], -1
 ; CHECK-NEXT: [[YN:%.*]] = xor i8 [[Y:%.*]], -1
 ; CHECK-NEXT: [[ZN:%.*]] = xor i8 [[Z:%.*]], -1
-; CHECK-NEXT: [[CMPXZ:%.*]] = icmp ult i8 [[XN]], [[ZN]]
-; CHECK-NEXT: [[MINXZ:%.*]] = select i1 [[CMPXZ]], i8 [[XN]], i8 [[ZN]]
-; CHECK-NEXT: [[CMPXYZ:%.*]] = icmp ult i8 [[MINXZ]], [[YN]]
-; CHECK-NEXT: [[MINXYZ:%.*]] = select i1 [[CMPXYZ]], i8 [[MINXZ]], i8 [[YN]]
+; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i8 [[X]], [[Z]]
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i8 [[X]], i8 [[Z]]
+; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i8 [[TMP2]], [[Y]]
+; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i8 [[TMP2]], i8 [[Y]]
+; CHECK-NEXT: [[MINXYZ:%.*]] = xor i8 [[TMP4]], -1
 ; CHECK-NEXT: call void @use8(i8 [[XN]])
 ; CHECK-NEXT: call void @use8(i8 [[YN]])
 ; CHECK-NEXT: call void @use8(i8 [[ZN]])
 ; CHECK-NEXT: ret i8 [[MINXYZ]]
 ;
   %xn = xor i8 %x, -1
   %yn = xor i8 %y, -1
   %zn = xor i8 %z, -1
   %cmpxz = icmp ult i8 %xn, %zn
   %minxz = select i1 %cmpxz, i8 %xn, i8 %zn
   %cmpxyz = icmp ult i8 %minxz, %yn
   %minxyz = select i1 %cmpxyz, i8 %minxz, i8 %yn
   call void @use8(i8 %xn)
   call void @use8(i8 %yn)
   call void @use8(i8 %zn)
   ret i8 %minxyz
 }

 define void @umin3_not_all_ops_extra_uses_invert_subs(i8 %x, i8 %y, i8 %z) {
 ; CHECK-LABEL: @umin3_not_all_ops_extra_uses_invert_subs(
-; CHECK-NEXT: [[XN:%.*]] = xor i8 [[X:%.*]], -1
-; CHECK-NEXT: [[YN:%.*]] = xor i8 [[Y:%.*]], -1
-; CHECK-NEXT: [[ZN:%.*]] = xor i8 [[Z:%.*]], -1
-; CHECK-NEXT: [[CMPXZ:%.*]] = icmp ult i8 [[XN]], [[ZN]]
-; CHECK-NEXT: [[MINXZ:%.*]] = select i1 [[CMPXZ]], i8 [[XN]], i8 [[ZN]]
-; CHECK-NEXT: [[CMPXYZ:%.*]] = icmp ult i8 [[MINXZ]], [[YN]]
-; CHECK-NEXT: [[MINXYZ:%.*]] = select i1 [[CMPXYZ]], i8 [[MINXZ]], i8 [[YN]]
-; CHECK-NEXT: [[XMIN:%.*]] = sub i8 [[XN]], [[MINXYZ]]
-; CHECK-NEXT: [[YMIN:%.*]] = sub i8 [[YN]], [[MINXYZ]]
-; CHECK-NEXT: [[ZMIN:%.*]] = sub i8 [[ZN]], [[MINXYZ]]
+; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i8 [[X:%.*]], [[Z:%.*]]
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i8 [[X]], i8 [[Z]]
+; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i8 [[TMP2]], [[Y:%.*]]
+; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i8 [[TMP2]], i8 [[Y]]
+; CHECK-NEXT: [[MINXYZ:%.*]] = xor i8 [[TMP4]], -1
+; CHECK-NEXT: [[XMIN:%.*]] = sub i8 [[TMP4]], [[X]]
+; CHECK-NEXT: [[YMIN:%.*]] = sub i8 [[TMP4]], [[Y]]
+; CHECK-NEXT: [[ZMIN:%.*]] = sub i8 [[TMP4]], [[Z]]
 ; CHECK-NEXT: call void @use8(i8 [[MINXYZ]])
 ; CHECK-NEXT: call void @use8(i8 [[XMIN]])
 ; CHECK-NEXT: call void @use8(i8 [[YMIN]])
 ; CHECK-NEXT: call void @use8(i8 [[ZMIN]])
 ; CHECK-NEXT: ret void
 ;
   %xn = xor i8 %x, -1
   %yn = xor i8 %y, -1
...