The branch/select is removed in InstCombine and then recreated in CodeGenPrepare, effectively making it invisible to most of the optimization pipeline, which leads to poor optimization of cttz/ctlz when the target doesn't have a zero-defined version of them.
Event Timeline
The branch/select is removed in InstCombine and then recreated in CodeGenPrepare, effectively making it invisible to most of the optimization pipeline, which leads to poor optimization of cttz/ctlz when the target doesn't have a zero-defined version of them.
That canonicalization transform looks perfectly valid to me. It is undone in CodeGenPrepare if cttz/ctlz (with is_zero_undef == 0) is not supported by the target.
I don't see how it can lead to bad codegen. Could you please provide an example that shows the issue?
@andreadb it hides the branch from many of the subsequent optimization passes, resulting in bad codegen. I stumbled on bad codegen from cttz/ctlz several times recently. Things that I noticed are: doing the 0-case check several times, failure to constant fold the 0 case when there is one, etc.
It doesn't make sense to re-implement all of this in CodeGenPrepare, nor does it make sense to special-case it throughout the whole pipeline. Canonicalization should help subsequent passes, not hide information from them, which is what this is doing.
There is obviously a tradeoff here. Canonicalizing toward the intrinsics provides later passes with more information about what the code is doing. There are a number of transformations and analyses that have special logic for ctlz (etc.). However, when we then expand the intrinsic, we need to make sure that we can optimize away redundancies in those expansions.
I really think we need some examples here. Many backends (although perhaps not x86) run EarlyCSE and other IR-level cleanups in CodeGen. Maybe that's a better solution here?
To me, bad codegen has a different meaning, which is why I was concerned by your initial post and the lack of specific tests for the problematic cases. I now understand that you meant what I would call poor codegen (as a synonym of suboptimal).
It doesn't make sense to re-implement all of this in CodeGenPrepare, nor does it make sense to special-case it throughout the whole pipeline. Canonicalization should help subsequent passes, not hide information from them, which is what this is doing.
As far as I remember, this combine rule was originally added to specifically target [cttz|ctlz]+select pairs introduced by SimplifyCFG as the result of aggressively flattening simple if-then CFGs. Most (if not all) of those simple if-then constructs originated from conditional statements in x86 bmi/lzcnt intrinsic definitions. SimplifyCFG was originally modified to enable a more aggressive flattening of simple if-then branches. However, there was a concern with the performance of cttz/ctlz+select for targets with no tzcnt/bmi. So Sanjay implemented a "despeculation" logic in CodeGenPrepare (http://llvm.org/viewvc/llvm-project?view=revision&revision=253573) to revert the CFG flattening done by SimplifyCFG (only for targets which don't provide a cheap cttz/ctlz - see also: llvm.org/viewvc/llvm-project?rev=255660&view=rev).
If there is a problem, then it is very likely to be caused by a bad interaction between these three entities:
- SimplifyCFG (which aggressively flattens simple if-then blocks even in the presence of one expensive cttz/ctlz).
- InstCombine (which canonicalizes a cttz/ctlz+select into a single cttz/ctlz [is_zero_undef=0]).
- CodeGenPrepare (which is able to despeculate a cttz/ctlz [is_zero_undef=0] by undoing the flattening of the CFG done by SimplifyCFG).
I am not suggesting that the current design is optimal. However, as Hal pointed out, there is a trade-off here. I am not convinced that InstCombine is "hiding information". FWIW, the "branch" is already hidden by the speculation/CFG flattening performed by SimplifyCFG.
I think that we really need to see some code samples to get a better understanding of what is going wrong. There may be a better place to fix your issue.
Cheers,
Andrea
Let's call it poor codegen if that is clearer. Indeed it isn't bad in the sense that it does something invalid; it is bad in the sense that it could be better. It doesn't seem to be a good idea to remove the branch and then reintroduce it later on when the target needs a branch anyway.
The reason is that the check for 0 is now implicit and, even if some optimizations are done, it is very easy to defeat them. For instance, it is able to set the zero_undef flag when doing:
if (n == 0) { // ... } else { ctlz(n); }
But not in trivial variations such as:
if (n == 0xff) { // ... } else { ctlz(~n); }
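To make the difference concrete, here is a hand-written IR sketch of the first shape (not actual compiler output; @elided_zero_case and the function name are made up, standing in for the elided "// ..." body). The dominating compare is directly against %n, so isKnownNonZero can prove the ctlz operand is non-zero and the flag can be set:

declare i32 @llvm.ctlz.i32(i32, i1)
declare i32 @elided_zero_case()   ; placeholder for the elided "// ..." body

define i32 @first_case(i32 %n) {
entry:
  %cmp = icmp eq i32 %n, 0
  br i1 %cmp, label %zero, label %nonzero

zero:
  %r = call i32 @elided_zero_case()
  ret i32 %r

nonzero:
  ; %n is known non-zero here thanks to the dominating branch,
  ; so InstCombine can use is_zero_undef=true.
  %c = call i32 @llvm.ctlz.i32(i32 %n, i1 true)
  ret i32 %c
}

In the second shape the compare is against 0xff and the ctlz operand is ~n, so the same non-zero fact is hidden behind the xor.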
This second case is a real one I faced when working on a serialization/deserialization library. While tweaking the code to get the optimization is possible, this would be brittle, and considering I faced variations of this problem several times, I thought there is probably something to do here.
It doesn't seem that teaching all passes about this special case is going to fly. There are just too many cases and variations to match. All the code required to do these optimizations already exists, so I think it is smarter to leverage it.
Grepping quickly in there, it doesn't look like SimplifyCFG creates the select. I think we are good here. It may be profitable to explode the select into a branch/phi in that case, but that isn't suitable for InstCombine anyway as it preserves the control flow, and it may not be necessary at all in the end. Improving the ability of the compiler to handle select seems like the right path anyway if that's the case, as it'll be profitable for other structures as well.
This patch goes against my understanding that InstCombine is only for target-independent transforms.
Like Andrea and Hal have requested, I'd like to see an example for how we get to the problem state.
I tried to put together an IR example from what you described here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109398.html
int goo(int n) {
  int a = (n == 0x0) ? 32 : __builtin_clz(n);
  int b = ((a * 36) + 35) >> 8;
  return b;
}
$ ./clang -O2 ctlz.c -S -o - -emit-llvm
source_filename = "ctlz.c"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

define i32 @goo(i32 %n) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %n, 0
  br i1 %cmp, label %cond.end, label %cond.false

cond.false:
  %0 = tail call i32 @llvm.ctlz.i32(i32 %n, i1 true)
  %phitmp = mul nuw nsw i32 %0, 36
  %phitmp4 = add nuw nsw i32 %phitmp, 35
  %phitmp56 = lshr i32 %phitmp4, 8
  br label %cond.end

cond.end:
  %cond = phi i32 [ %phitmp56, %cond.false ], [ 4, %entry ]   ; <--- constant folded as expected?
  ret i32 %cond
}
I have one case that boils down to this for instance:
define i64 @_D2gc1d4util6decodeFMKAxhZm({ i64, i8* }* nocapture readonly %data) local_unnamed_addr #2 {
entry:
  %0 = getelementptr inbounds { i64, i8* }, { i64, i8* }* %data, i64 0, i32 1
  %1 = load i8*, i8** %0, align 8
  %2 = load i8, i8* %1, align 1
  %3 = icmp eq i8 %2, -1
  br i1 %3, label %then, label %endif

then:                                             ; preds = %entry
  ret i64 23

endif:                                            ; preds = %entry
  %4 = xor i8 %2, -1
  %5 = tail call i8 @llvm.ctlz.i8(i8 %4, i1 false)
  %6 = zext i8 %5 to i64
  ret i64 %6
}
This doesn't optimize further.
And the codegen:
_D2gc1d4util6decodeFMKAxhZm:
        movq    8(%rdi), %rax
        movb    (%rax), %al
        cmpb    $-1, %al
        je      .LBB2_5
        je      .LBB2_2
        notb    %al
        movzbl  %al, %eax
        bsrl    %eax, %eax
        xorl    $7, %eax
        movzbl  %al, %eax
        retq
.LBB2_5:
        movl    $23, %eax
        retq
.LBB2_2:
        movb    $8, %al
        movzbl  %al, %eax
        retq
Note the double branch.
Thanks Amaury,
Sorry in advance for this long post.
Your code example gave me some hints about what is going wrong.
I think I have been able to find a small reproducer for your particular scenario (see below).
static int my_clz(int n) {
  return (n) ? __builtin_clz(n) : 32;
}

int foo(int n) {
  unsigned val;
  if (n == -1)
    return 32;
  return my_clz(~n);
}
'my_clz' matches the definition of __lzcnt32 from 'lzcntintrin.h'. Before we run SimplifyCFG, that function would look like this:
define internal i32 @my_clz(i32 %n) {
entry:
  %tobool = icmp ne i32 %n, 0
  br i1 %tobool, label %cond.true, label %cond.end

cond.true:                                        ; preds = %entry
  %0 = call i32 @llvm.ctlz.i32(i32 %n, i1 true)
  br label %cond.end

cond.end:                                         ; preds = %entry, %cond.true
  %cond = phi i32 [ %0, %cond.true ], [ 32, %entry ]
  ret i32 %cond
}
SimplifyCFG would first speculate the (potentially expensive) call to llvm.ctlz, and then flatten the CFG by inserting a select:
define internal i32 @my_clz(i32 %n) {
entry:
  %tobool = icmp eq i32 %n, 0
  %0 = call i32 @llvm.ctlz.i32(i32 %n, i1 true)
  %cond = select i1 %tobool, i32 32, i32 %0
  ret i32 %cond
}
At this point, our "problematic" instcombine kicks in, and that entire code sequence is simplified into @llvm.ctlz.i32(i32 %n, i1 false).
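For completeness, a hand-written sketch of what the my_clz body reduces to after that fold (the expected result, not pasted compiler output):

declare i32 @llvm.ctlz.i32(i32, i1)

define internal i32 @my_clz(i32 %n) {
entry:
  %0 = call i32 @llvm.ctlz.i32(i32 %n, i1 false)
  ret i32 %0
}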
I can see how this is going to be sub-optimal if we are in the following scenario:
- We are building for a non-LZCNT target (example: SandyBridge), and
- Our instcombine is triggered before my_clz is inlined into foo.
Before CodeGenPrepare, we would end up with code like this:
define i32 @_Z3fooi(i32 %n) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %n, -1
  br i1 %cmp, label %cleanup, label %if.end

if.end:                                           ; preds = %entry
  %neg = xor i32 %n, -1
  %0 = tail call i32 @llvm.ctlz.i32(i32 %neg, i1 false) #2   ;; <--- suboptimal flag!
  br label %cleanup

cleanup:                                          ; preds = %entry, %if.end
  %retval.0 = phi i32 [ %0, %if.end ], [ 32, %entry ]
  ret i32 %retval.0
}
Basically, I can see how triggering that instcombine prematurely might lead to poor codegen for your non-lzcnt target.
Ideally, we would want that canonicalization to be performed directly before (or during) codegen to help instruction selection on targets that have a fast ctz/clz defined on zero.
What if instead we move that transform into CodeGenPrepare::optimizeSelectInst()? I think that would fix the issue in a cleaner way, since we would not need to introduce new target hooks to conditionalize its execution.
-Andrea
Thanks for posting the example. That made it clear, and the proposal to move the functionality of foldSelectCttzCtlz() later sounds good to me.
Before CodeGenPrepare, we would end up with code like this:
define i32 @_Z3fooi(i32 %n) local_unnamed_addr #0 {
entry:
  %cmp = icmp eq i32 %n, -1
  br i1 %cmp, label %cleanup, label %if.end

if.end:                                           ; preds = %entry
  %neg = xor i32 %n, -1
  %0 = tail call i32 @llvm.ctlz.i32(i32 %neg, i1 false) #2   ;; <--- suboptimal flag!
  br label %cleanup

cleanup:                                          ; preds = %entry, %if.end
  %retval.0 = phi i32 [ %0, %if.end ], [ 32, %entry ]
  ret i32 %retval.0
}
Can you clarify why the flag is suboptimal by itself? The intrinsic carries the same semantics as the unfolded sequence, doesn't it?
This seems to me like just a missing optimization to recover that at this point: can't we just figure out that %neg can't be zero and turn the flag to true?
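Concretely (hand-edited from the IR quoted above, not compiler output), that would mean rewriting if.end as:

if.end:                                           ; preds = %entry
  %neg = xor i32 %n, -1
  ; %n != -1 on this path (the dominating branch), so %neg != 0 and the
  ; flag can safely be true.
  %0 = tail call i32 @llvm.ctlz.i32(i32 %neg, i1 true) #2
  br label %cleanup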
include/llvm/Analysis/TargetTransformInfo.h
  Line 198: s/defnied/defined.
test/Transforms/InstCombine/select-cmp-cttz-ctlz.ll
  Line 3: A comment before the two new lines would be nice to have. Also, I suspect this won't work without the X86 backend configured in.
Can you clarify why the flag is suboptimal by itself? The intrinsic carries the same semantics as the unfolded sequence, doesn't it?
Yes. That is exactly what I originally pointed out in this thread.
This seems to me like just a missing optimization to recover that at this point: can't we just figure out that %neg can't be zero and turn the flag to true?
I agree that we are currently missing an optimization.
That said, (if I remember correctly) the only place where we form cttz/ctlz with is_zero_undef=false is in foldSelectCttzCtlz() and the only goal of that transform is to canonicalize cttz/ctlz in preparation for codegen. That's why I suggested considering the possibility of moving that transform into CGP. If we do this, then we no longer need to add extra optimization rules to "fix" the fact that we prematurely canonicalized.
We are also doing it in InstCombine (see foldCttzCtlz) using isKnownNonZero, but it is unable to figure that one out. Looking at the implementation, it looks pretty ad hoc.
Just to clarify, foldSelectCttzCtlz() is the only place where we introduce cttz/ctlz with is_zero_undef=false.
We could potentially extend the logic in isKnownNonZero to add more cases. That said, I am not suggesting that we do that, since we can solve this entire issue by just moving the canonicalization rule from InstCombine to CGP.
Out of curiosity, I had a quick look at what extra rules would be needed to fix our reproducer. We would need to teach isKnownNonZero how to look through an xor (see below).
// Sketch of the extra case in isKnownNonZero() (ValueTracking.cpp).
Value *X;
ConstantInt *C;
if (match(V, m_Xor(m_Value(X), m_ConstantInt(C))))
  return isKnownNonEqual(X, C, Q);
However, function isKnownNonEqual would not know how to analyze users of X. So, we would need to add extra logic in ValueTracking.cpp to loop over the users of X in search of a dominating ICmpInst([ICMP_EQ|ICMP_NE], X, C) (similarly to what function isKnownNonNullFromDominatingCondition() does).
So, although it is possible, it may not be the best way to fix this.