This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A
ClosedPublic

Authored by dmgreen on Sep 17 2018, 10:00 AM.

Details

Summary

This is an attempt to get out of a local-minimum that instcombine currently gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at least neutral. This involves using IsFreeToInvert, which has been expanded a little to include selects that can be easily inverted.
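For illustration (a hand-written sketch of the fold, not one of the patch's actual tests; @src and @tgt are placeholder names), the ~A - Min(~A, ~B) case:

define i8 @src(i8 %a, i8 %b) {
  %na = xor i8 %a, -1
  %nb = xor i8 %b, -1
  %cmp = icmp ult i8 %na, %nb
  %min = select i1 %cmp, i8 %na, i8 %nb    ; umin(~a, ~b)
  %sub = sub i8 %na, %min                  ; ~a - umin(~a, ~b)
  ret i8 %sub
}

becomes

define i8 @tgt(i8 %a, i8 %b) {
  %cmp = icmp ugt i8 %a, %b
  %max = select i1 %cmp, i8 %a, i8 %b      ; umax(a, b)
  %sub = sub i8 %max, %a                   ; umax(a, b) - a; both nots and the min are gone
  ret i8 %sub
}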

This is trying to fix PR35875, using the ideas from Sanjay. It is a large improvement to one of our rgb to cmy kernels.

Diff Detail

Repository
rL LLVM

Event Timeline

dmgreen created this revision. Sep 17 2018, 10:00 AM
dmgreen added a comment. Edited Sep 18 2018, 10:32 AM

Thanks for pointing to D52070. I think I saw that (it gave me the idea for this), but hadn't realised it had come back out.

I'll rebase this once that's in.

dmgreen updated this revision to Diff 166727. Sep 24 2018, 12:01 PM

Turns out there wasn't any conflict, but I've tried to clean this up a little and add a few more tests.

I was hoping to find a more general solution for min/max with nots, but I'm not seeing it, so just a few nits in the inline comments.
Craig's been fighting infinite loops in this area, so let's see if he has any comments on the safety constraints.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1670 ↗(On Diff #166727)

typo: invertible

1683–1684 ↗(On Diff #166727)

The use constraints deserve a code comment.

lib/Transforms/InstCombine/InstCombineInternal.h
181 ↗(On Diff #166727)

typo: invertible

181–183 ↗(On Diff #165778)

This is similar to what I was imagining in a comment in D51964, but I'm still not sure if we need to special-case min/max patterns for the extra-uses.

The LHS->hasNUsesOrMore(3) hack in the caller might be enough...

I was hoping to find a more general solution for min/max with nots, but I'm not seeing it, so just a few nits in the inline comments.

I have another, I'm afraid, over in D52508.

Craig's been fighting infinite loops in this area, so let's see if he has any comments on the safety constraints.

Any suggestions on finding these? I've run the test suite and a bootstrap; I presume that wouldn't catch much?

lib/Transforms/InstCombine/InstCombineInternal.h
181–183 ↗(On Diff #165778)

The LHS->hasNUsesOrMore(3) isn't _meant_ as a hack. Unless you mean it's ugly? It should have 2 uses from the min/max, so 3+ means we wouldn't invert all uses.

The motivating case here (umin3_not_all_ops_extra_uses_invert_subs) is something like min(min(not(a), not(b)), not(c)), but with subs in there too. So there are two mins: one we are folding from, and another we are checking is freely invertible. That's what this is trying to catch.
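In IR, that shape is roughly the following (a hand-written sketch with made-up value names, not the exact test):

define void @umin3_sketch(i8 %a, i8 %b, i8 %c, i8* %p0, i8* %p1, i8* %p2) {
  %na = xor i8 %a, -1
  %nb = xor i8 %b, -1
  %nc = xor i8 %c, -1
  %cmp1 = icmp ult i8 %na, %nb
  %min1 = select i1 %cmp1, i8 %na, i8 %nb    ; inner umin, checked for free invertibility
  %cmp2 = icmp ult i8 %min1, %nc
  %min2 = select i1 %cmp2, i8 %min1, i8 %nc  ; outer umin, the one we fold from
  %s0 = sub i8 %na, %min2
  %s1 = sub i8 %nb, %min2
  %s2 = sub i8 %nc, %min2
  store i8 %s0, i8* %p0
  store i8 %s1, i8* %p1
  store i8 %s2, i8* %p2
  ret void
}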

I was reluctant to make IsFreeToInvert recursive, but that would make it more powerful and remove the need to special-case m_Not here.

dmgreen updated this revision to Diff 166946. Sep 25 2018, 10:40 AM

Added a comment and corrected spelling

spatel added inline comments. Sep 25 2018, 10:51 AM
lib/Transforms/InstCombine/InstCombineInternal.h
181–183 ↗(On Diff #165778)

Yeah - the hack comment was really about the ugliness, and the root cause of that is not having intrinsics for integer min/max...but it'd still be easier reading if we had hasNUsesOrLess()?

spatel accepted this revision. Sep 28 2018, 7:12 AM

Craig's been fighting infinite loops in this area, so let's see if he has any comments on the safety constraints.

Any suggestions on finding these? I've run the test suite and a bootstrap; I presume that wouldn't catch much?

That's a good first try, but the earlier cases escaped to the wild...though not for long. :)
We end up being able to reduce the problems to relatively short IR tests which seem obvious in hindsight, but I don't know how to spot them in advance any better than what we've done.
I don't have any other feedback, so LGTM.

This revision is now accepted and ready to land. Sep 28 2018, 7:12 AM
dmgreen added a comment. Edited Oct 2 2018, 2:49 AM

Thanks

Let me know if you see anything funny from this patch.

This revision was automatically updated to reflect the committed changes.

I'm seeing a regression on Goldmont and Silvermont CPUs in an RGB to CMYK conversion benchmark in 32-bit mode. What I've observed is that the 3 subtracts in the code all now have the same LHS register. X86 destroys the LHS of a subtract instruction, so we have to make copies before the subtracts. We're in 32-bit mode, so our 8-bit register choices are %al, %bl, %cl, %dl, %ah, %bh, %ch, %dh. Silvermont and Goldmont have bad partial register handling for writing high and low 8-bit registers. Unlike Sandy Bridge, Haswell, and Skylake, the high and low registers aren't renamed independently. I tried playing around with promoting everything to 32 bits to avoid the partial registers, but that was actually worse somehow.

@dmgreen what target are you using?

dmgreen added subscribers: craig.topper, spatel. Edited Oct 11 2018, 3:21 PM

Oh, no. That's not what I wanted to hear. I presume we are looking at the same bit of code!

This was intended to fix things on the rgb kernel, and did pretty well on our tests. I think it was a 45% increase on some CPUs, such as the Cortex-M0+. You can probably guess this was on Arm, in that case a Thumb1 target. It sounds like we were hitting pretty much the opposite conditions, with the old code doing badly around the selects of nots in our case. I had presumed the smaller IR would have produced better code for everyone.

We do end up converting the whole thing to i32 because i8 is not a legal type for our registers. We will do things with i8 for vectors on, say, AArch64 or v8-A, but the Cortex-M microcontrollers won't have that. I presume that Goldmont is in the same boat, not having vectorisation? (But, perhaps, supporting the instructions?) The vectorised code looked pretty good (especially on AArch64; I tried Skylake too, which was better, but had a lot of shuffling going on).

Um, in terms of fixes, I guess our alternatives are to make this instcombine dependent on the target, or to try to undo it somehow in the backend. My understanding is that the first option is not generally done in instcombine. Can someone give me some history there? Was it something that was decided as a rule, or just something that never came up? (I guess, also in this case, not supporting this fold because it happens to cause the extra subs in a random test isn't a great reason not to do it.)

The other option would be something like: if there are multiple subs with the same (LHS?) operand, and that whole thing is invertible (which may be easier said than done), we invert it all back using ~a - ~b = b - a (roughly as sketched below). I'm not sure how easy that is exactly. This case is a bit of a tangle of multiple uses.
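A hand-written sketch with made-up value names, assuming ~max is already needed elsewhere (as it is in the kernel):

  %max  = select i1 %c, i8 %x, i8 %y   ; umax(x, y)
  %nmax = xor i8 %max, -1              ; ~max, also stored out
  %s0   = sub i8 %max, %x              ; shared LHS %max
  %s1   = sub i8 %max, %y

would be turned back, via ~a - ~b = b - a, into

  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %s0 = sub i8 %nx, %nmax              ; shared operand now on the RHS
  %s1 = sub i8 %ny, %nmax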
Dave

Goldmont does have SSE4.2, so it should support 128-bit vectors. Not sure why we didn't vectorize. Here's the IR for the loop that changed:

for.body8.us:                                     ; preds = %for.body8.us, %for.body.us
  %EritePtr.0134.us = phi i8* [ %call.i7, %for.body.us ], [ %incdec.ptr58.us, %for.body8.us ]
  %ReadPtr.0133.us = phi i8* [ %phi.call3.i, %for.body.us ], [ %incdec.ptr10.us, %for.body8.us ]
  %i.0132.us = phi i32 [ 0, %for.body.us ], [ %inc.us, %for.body8.us ]
  %incdec.ptr.us = getelementptr inbounds i8, i8* %ReadPtr.0133.us, i32 1
  %44 = load i8, i8* %ReadPtr.0133.us, align 1, !tbaa !14
  %incdec.ptr9.us = getelementptr inbounds i8, i8* %ReadPtr.0133.us, i32 2
  %45 = load i8, i8* %incdec.ptr.us, align 1, !tbaa !14
  %incdec.ptr10.us = getelementptr inbounds i8, i8* %ReadPtr.0133.us, i32 3
  %46 = load i8, i8* %incdec.ptr9.us, align 1, !tbaa !14
  %47 = icmp ugt i8 %44, %46
  %48 = select i1 %47, i8 %44, i8 %46
  %49 = icmp ugt i8 %48, %45
  %50 = select i1 %49, i8 %48, i8 %45
  %51 = xor i8 %50, -1
  %sub45.us = sub i8 %50, %44
  %sub49.us = sub i8 %50, %45
  %sub53.us = sub i8 %50, %46
  %incdec.ptr55.us = getelementptr inbounds i8, i8* %EritePtr.0134.us, i32 1
  store i8 %sub45.us, i8* %EritePtr.0134.us, align 1, !tbaa !14
  %incdec.ptr56.us = getelementptr inbounds i8, i8* %EritePtr.0134.us, i32 2
  store i8 %sub49.us, i8* %incdec.ptr55.us, align 1, !tbaa !14
  %incdec.ptr57.us = getelementptr inbounds i8, i8* %EritePtr.0134.us, i32 3
  store i8 %sub53.us, i8* %incdec.ptr56.us, align 1, !tbaa !14
  %incdec.ptr58.us = getelementptr inbounds i8, i8* %EritePtr.0134.us, i32 4
  store i8 %51, i8* %incdec.ptr57.us, align 1, !tbaa !14
  %inc.us = add nuw i32 %i.0132.us, 1
  %exitcond138 = icmp eq i32 %inc.us, %mul
  br i1 %exitcond138, label %for.cond5.for.end_crit_edge.us, label %for.body8.us

I think the fact that we're pretty register-constrained is hurting my attempt at promoting to 32 bits. We only had 8 32-bit registers to start with. We lost one to the load pointer, one to the store pointer, one to the loop index, and one to the stack pointer. One seems to be used by a loop around this one. So that left us 3 registers inside this loop to do the math with.

Theoretically, if I could turn the subtracts into add(negate(RHS), LHS), there should be less register pressure: the RHS of the subtract isn't used again, so the negate can happen without a copy, and then we can put it on the LHS of an add and have no issue overwriting it.
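For the first subtract in the loop above, that would be roughly (just a sketch of the idea; %neg is a made-up name, %44 and %50 are from the IR above):

  %neg = sub i8 0, %44          ; %44 (the RHS) has no later uses, so clobbering it is fine
  %sub45.us = add i8 %neg, %50  ; add commutes, so %50 (the shared LHS) needn't be copied first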

Yeah, that looks like similar IR to what I was looking at. The vectorised version on Skylake (https://godbolt.org/z/RBS2Os) has a lot of shuffling; perhaps that's deemed unprofitable on Goldmont?

I can agree that 8 registers are hard to deal with. Can you explain the "promoting everything to 32-bits"? Do you mean essentially zexts/truncs around the whole max/max/xor/sub block, roughly as sketched below? I gave that a try and the subs still seemed to be using bl's. (It uses cmp's, not branches, though, which looks better to my untrained eyes.)
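A rough sketch of the promotion I tried for the first output byte (the %...32 names are made up; %44/%45/%46 are the loads from the IR above):

  %a32 = zext i8 %44 to i32
  %b32 = zext i8 %45 to i32
  %c32 = zext i8 %46 to i32
  %cmp1 = icmp ugt i32 %a32, %c32
  %max1 = select i1 %cmp1, i32 %a32, i32 %c32
  %cmp2 = icmp ugt i32 %max1, %b32
  %max2 = select i1 %cmp2, i32 %max1, i32 %b32
  %sub32 = sub i32 %max2, %a32
  %sub45.us = trunc i32 %sub32 to i8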

Um, in terms of fixes, I guess our alternatives are to make this instcombine dependent on the target, or to try to undo it somehow in the backend. My understanding is that the first option is not generally done in instcombine. Can someone give me some history there? Was it something that was decided as a rule, or just something that never came up? (I guess, also in this case, not supporting this fold because it happens to cause the extra subs in a random test isn't a great reason not to do it.)

Instcombine is an early IR canonicalization pass. Its purpose is to reduce logically equivalent IR sequences to some common form (usually minimal instruction count). Often, the canonical/minimal IR form also happens to be the maximal perf form for a target, but there's no guarantee on that. It's the backend's job to transform the code for better perf based on target capabilities. So as long as this patch reduced the IR correctly, I don't think it's at fault (although we sometimes temporarily revert to avoid regressions while we fix the later passes). As always, consider an alternate scenario where the benchmark source was already in the logically equivalent form that this patch created. The perf problem already exists for that hypothetical benchmark independent of this patch, so we have to deal with the perf problem some other way.

Looking back at https://bugs.llvm.org/show_bug.cgi?id=35717#c14 ... is this the source for the loop that is causing problems:

void rgb_to_cmyk(char * restrict A, char * restrict B, unsigned I) {
    for (int i = 0; i < I; i++) {
      char xc = *A++;
      char xm = *A++;
      char xy = *A++;

      xc = 255-xc;
      xm = 255-xm;
      xy = 255-xy;

      char xk;
      if (xc < xm)
        xk = xc < xy ? xc : xy;
      else
        xk = xm < xy ? xm : xy;

      xc = xc - xk;
      xm = xm - xk;
      xy = xy - xk;

      *B++ = xk;
      *B++ = xc;
      *B++ = xm;
      *B++ = xy;
    }
}

That vectorizes for an AVX2 target, but not AVX1 or earlier, so we should see if the vectorizer cost model is behaving as expected before dealing with the backend problems?