This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
Closed, Public

Authored by andreadb on Dec 18 2014, 12:42 PM.

Details

Reviewers

qcolombet
majnemer
hfinkel

Commits

rG22ee3f63b940: [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.
rL224899: [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.

Summary

Following up on the suggestions from D6679, here is a new patch to optimize cttz/ctlz in CodeGenPrepare.

If the control flow models an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph.

Example:
;;
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry ]
  ...
;;

The call to @llvm.cttz.i64 can be speculated, which makes it possible to remove 'then.bb' and merge 'entry' with 'end.bb'.

The constraints are:
a) The 'then' basic block is taken only if the input operand to cttz/ctlz is not zero.
b) The phi node propagates the size in bits of %val (the input to cttz/ctlz) when %val is zero.
c) The target says that it is "cheap" to speculate cttz/ctlz.

If all these constraints are met, CodeGenPrepare can hoist the call to cttz/ctlz from the 'then' basic block into the 'entry' basic block. The new cttz/ctlz instruction will also have the 'undef on zero' flag set to 'false'.
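
The rewrite is sound because a cttz/ctlz call with the 'undef on zero' flag set to 'false' already yields the bit width for a zero input, which is the same value the phi node supplied on the zero path. Below is a minimal C++ sketch of that equivalence for the 64-bit case. The helper names (cttz64, ctlz64, branchyCttz, branchyCtlz) are hypothetical and are written as plain loops; they only stand in for the intrinsics and are not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for @llvm.cttz.i64(x, i1 false): counts trailing
// zero bits and is defined to return 64 when x == 0.
unsigned cttz64(uint64_t x) {
  unsigned n = 0;
  while (n < 64 && ((x >> n) & 1) == 0)
    ++n;
  return n;
}

// Hypothetical stand-in for @llvm.ctlz.i64(x, i1 false): counts leading
// zero bits and is defined to return 64 when x == 0.
unsigned ctlz64(uint64_t x) {
  unsigned n = 0;
  while (n < 64 && ((x >> (63 - n)) & 1) == 0)
    ++n;
  return n;
}

// Shape of the code before the transformation: the zero case is handled by
// a branch, and the count is only computed for non-zero inputs.
unsigned branchyCttz(uint64_t x) { return x == 0 ? 64u : cttz64(x); }
unsigned branchyCtlz(uint64_t x) { return x == 0 ? 64u : ctlz64(x); }
```

For every input, branchyCttz(x) and cttz64(x) agree, so the branch carries no extra information once the zero-defined form of the count is used. The same holds for the ctlz pair.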

I added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false', which means that CodeGenPrepare doesn't try to speculate calls to cttz/ctlz unless the target says that it is profitable to do so.

On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.
This may change in the future. For now, I have not enabled the transformation for all x86-64 targets with feature CMOV because I am not 100% sure that it is always a win to speculate bsf/bsr. So, I left a couple of TODO comments in the code.
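
As an illustration of the hook design described above, here is a small stand-alone C++ sketch. The class names TargetLoweringModel and X86LoweringModel and their constructor parameters are hypothetical; they only mirror the two hook names and the feature checks from this patch and are not the real LLVM classes.

```cpp
#include <cassert>

// Hypothetical, simplified model of the two hooks; not the real
// TargetLowering class.
struct TargetLoweringModel {
  virtual ~TargetLoweringModel() = default;
  // Conservative defaults: speculation stays off unless a target opts in.
  virtual bool isCheapToSpeculateCttz() const { return false; }
  virtual bool isCheapToSpeculateCtlz() const { return false; }
};

// Hypothetical model of the X86 overrides: cttz is considered cheap only
// with BMI (TZCNT), and ctlz only with LZCNT.
struct X86LoweringModel : TargetLoweringModel {
  bool HasBMI;
  bool HasLZCNT;
  X86LoweringModel(bool Bmi, bool Lzcnt) : HasBMI(Bmi), HasLZCNT(Lzcnt) {}
  bool isCheapToSpeculateCttz() const override { return HasBMI; }
  bool isCheapToSpeculateCtlz() const override { return HasLZCNT; }
};
```

Keeping the default at 'false' means a target that never overrides the hooks sees no change in behavior, and the two hooks stay independent because the two instructions are gated by different feature bits.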

Please let me know what you think.

Thanks!
Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb updated this revision to Diff 17464. Dec 18 2014, 12:42 PM

andreadb retitled this revision to [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.

andreadb updated this object.

andreadb edited the test plan for this revision.

andreadb added reviewers: qcolombet, hfinkel, majnemer.

andreadb added a subscriber: Unknown Object (MLST).

Uploaded a new version of the patch. The previous patch had a wrong test in it. Sorry for the confusion.

Patch updated.
Added missing check for TLI.

If your transformation fires, you need to make sure that ModifiedDT is set to true so that the DT will be recalculated. With that fixed, LGTM.

This revision is now accepted and ready to land. Dec 23 2014, 5:04 PM

Closed by commit rL224899: [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm. (authored by adibiagio). Dec 28 2014, 3:09 AM

This revision was automatically updated to reflect the committed changes.

Thanks for the review Hal!

I modified the patch so that ModifiedDT is set to true when this new transformation fires.
I also removed the TODO comments in X86ISelLowering.cpp as suggested by Chandler.
Committed revision 224899.

-Andrea

andreadb mentioned this in D6853: [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz. Jan 6 2015, 5:31 AM

Diffusion mentioned this in rL225274: [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz. Jan 6 2015, 9:42 AM

andreadb mentioned this in D7554: [TTI] improved cost heuristic for cttz/ctlz calls. Feb 11 2015, 5:50 AM

Revision Contents

Path                                              Size
llvm/trunk/include/llvm/Target/TargetLowering.h   10 lines
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp         141 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h       4 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp     10 lines
llvm/trunk/test/CodeGen/X86/cttz-ctlz.ll          250 lines

Diff 17652

llvm/trunk/include/llvm/Target/TargetLowering.h

...
   /// On architectures that don't natively support some vector loads efficiently,
   /// casting the load to a smaller vector of larger types and loading
   /// is more efficient, however, this can be undone by optimizations in
   /// dag combiner.
   virtual bool isLoadBitCastBeneficial(EVT /* Load */, EVT /* Bitcast */) const {
     return true;
   }

+  /// \brief Return true if it is cheap to speculate a call to intrinsic cttz.
+  virtual bool isCheapToSpeculateCttz() const {
+    return false;
+  }
+
+  /// \brief Return true if it is cheap to speculate a call to intrinsic ctlz.
+  virtual bool isCheapToSpeculateCtlz() const {
+    return false;
+  }
+
   /// \brief Return if the target supports combining a
   /// chain like:
   /// \code
   ///   %andResult = and %val1, #imm-with-one-bit-set;
   ///   %icmpResult = icmp %andResult, 0
   ///   br i1 %icmpResult, label %dest1, label %dest2
   /// \endcode
   /// into a single machine instruction of a form like:
...

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

...
       assert(0 && "Did you modified shouldPromote and forgot to update this?");
     ToBePromoted->setOperand(U.getOperandNo(), NewVal);
   }
   Transition->removeFromParent();
   Transition->insertAfter(ToBePromoted);
   Transition->setOperand(getTransitionOriginalValueIdx(), ToBePromoted);
 }

+// See if we can speculate calls to intrinsic cttz/ctlz.
+//
+// Example:
+// entry:
+//   ...
+//   %cmp = icmp eq i64 %val, 0
+//   br i1 %cmp, label %end.bb, label %then.bb
+//
+// then.bb:
+//   %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
+//   br label %end.bb
+//
+// end.bb:
+//   %cond = phi i64 [ %c, %then.bb ], [ 64, %entry ]
+//
+// ==>
+//
+// entry:
+//   ...
+//   %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 false)
+//
+static bool OptimizeBranchInst(BranchInst *BrInst, const TargetLowering &TLI) {
+  assert(BrInst->isConditional() && "Expected a conditional branch!");
+  BasicBlock *ThenBB = BrInst->getSuccessor(1);
+  BasicBlock *EndBB = BrInst->getSuccessor(0);
+
+  // See if ThenBB contains only one instruction (excluding the
+  // terminator and DbgInfoIntrinsic calls).
+  IntrinsicInst *II = nullptr;
+  for (BasicBlock::iterator I = ThenBB->begin(),
+                            E = std::prev(ThenBB->end()); I != E; ++I) {
+    // Skip debug info.
+    if (isa<DbgInfoIntrinsic>(I))
+      continue;
+
+    if (II)
+      // Avoid speculating more than one instruction.
+      return false;
+
+    // See if this is a call to intrinsic cttz/ctlz.
+    if (match(cast<Instruction>(I), m_Intrinsic<Intrinsic::cttz>())) {
+      // Avoid speculating expensive intrinsic calls.
+      if (!TLI.isCheapToSpeculateCttz())
+        return false;
+    }
+    else if (match(cast<Instruction>(I), m_Intrinsic<Intrinsic::ctlz>())) {
+      // Avoid speculating expensive intrinsic calls.
+      if (!TLI.isCheapToSpeculateCtlz())
+        return false;
+    } else
+      return false;
+
+    II = cast<IntrinsicInst>(I);
+  }
+
+  // Look for PHI nodes with 'II' as the incoming value from 'ThenBB'.
+  BasicBlock *EntryBB = BrInst->getParent();
+  for (BasicBlock::iterator I = EndBB->begin();
+       PHINode *PN = dyn_cast<PHINode>(I); ++I) {
+    Value *ThenV = PN->getIncomingValueForBlock(ThenBB);
+    Value *OrigV = PN->getIncomingValueForBlock(EntryBB);
+
+    if (!OrigV || ThenV != II)
+      return false;
+
+    if (ConstantInt *CInt = dyn_cast<ConstantInt>(OrigV)) {
+      unsigned BitWidth = ThenV->getType()->getIntegerBitWidth();
+
+      // Don't try to simplify this phi node if 'ThenV' is a cttz/ctlz
+      // intrinsic call, but 'OrigV' is not equal to the 'size-of' in bits
+      // of the value in input to the cttz/ctlz.
+      if (CInt->getValue() != BitWidth)
+        return false;
+
+      // Hoist the call to cttz/ctlz from ThenBB into EntryBB.
+      EntryBB->getInstList().splice(BrInst, ThenBB->getInstList(),
+                                    ThenBB->begin(), std::prev(ThenBB->end()));
+
+      // Update PN setting ThenV as the incoming value from both 'EntryBB'
+      // and 'ThenBB'. Eventually, method 'OptimizeInst' will fold this
+      // phi node if all the incoming values are the same.
+      PN->setIncomingValue(PN->getBasicBlockIndex(EntryBB), ThenV);
+      PN->setIncomingValue(PN->getBasicBlockIndex(ThenBB), ThenV);
+
+      // Clear the 'undef on zero' flag of the cttz/ctlz intrinsic call.
+      if (cast<ConstantInt>(II->getArgOperand(1))->isOne()) {
+        Type *Ty = II->getArgOperand(0)->getType();
+        Value *Args[] = { II->getArgOperand(0),
+                          ConstantInt::getFalse(II->getContext()) };
+        Module *M = EntryBB->getParent()->getParent();
+        Value *IF = Intrinsic::getDeclaration(M, II->getIntrinsicID(), Ty);
+        IRBuilder<> Builder(BrInst);
+        Instruction *NewI = Builder.CreateCall(IF, Args);
+
+        // Replace the old call to cttz/ctlz.
+        II->replaceAllUsesWith(NewI);
+        II->eraseFromParent();
+      }
+
+      // Update BrInst condition so that the branch to EndBB is always taken.
+      // Later on, method 'ConstantFoldTerminator' will simplify this branch
+      // replacing it with a direct branch to 'EndBB'.
+      // As a side effect, CodeGenPrepare will attempt to simplify the control
+      // flow graph by deleting basic block 'ThenBB' and merging 'EntryBB' into
+      // 'EndBB' (calling method 'EliminateFallThrough').
+      BrInst->setCondition(ConstantInt::getTrue(BrInst->getContext()));
+      return true;
+    }
+  }
+
+  return false;
+}
+
 /// Some targets can do store(extractelement) with one instruction.
 /// Try to push the extractelement towards the stores when the target
 /// has this feature and this is profitable.
 bool CodeGenPrepare::OptimizeExtractElementInst(Instruction *Inst) {
   unsigned CombineCost = UINT_MAX;
   if (DisableStoreExtract || !TLI ||
       (!StressStoreExtract &&
        !TLI->canCombineStoreAndExtract(Inst->getOperand(0)->getType(),
...
   if (SelectInst *SI = dyn_cast<SelectInst>(I))
     return OptimizeSelectInst(SI);

   if (ShuffleVectorInst *SVI = dyn_cast<ShuffleVectorInst>(I))
     return OptimizeShuffleVectorInst(SVI);

   if (isa<ExtractElementInst>(I))
     return OptimizeExtractElementInst(I);

+  if (BranchInst *BI = dyn_cast<BranchInst>(I)) {
+    if (TLI && BI->isConditional() && BI->getCondition()->hasOneUse()) {
+      // Check if the branch condition compares a value against zero.
+      if (ICmpInst *ICI = dyn_cast<ICmpInst>(BI->getCondition())) {
+        if (ICI->getPredicate() == ICmpInst::ICMP_EQ &&
+            match(ICI->getOperand(1), m_Zero())) {
+          BasicBlock *ThenBB = BI->getSuccessor(1);
+          BasicBlock *EndBB = BI->getSuccessor(0);
+
+          // Check if ThenBB is only reachable from this basic block; also,
+          // check if EndBB has more than one predecessor.
+          if (ThenBB->getSinglePredecessor() &&
+              !EndBB->getSinglePredecessor()) {
+            TerminatorInst *TI = ThenBB->getTerminator();
+
+            if (TI->getNumSuccessors() == 1 && TI->getSuccessor(0) == EndBB &&
+                // Try to speculate calls to intrinsic cttz/ctlz from 'ThenBB'.
+                OptimizeBranchInst(BI, *TLI)) {
+              ModifiedDT = true;
+              return true;
+            }
+          }
+        }
+      }
+    }
+    return false;
+  }
+
   return false;
 }

 // In this pass we look for GEP and cast instructions that are used
 // across basic blocks and rewrite them to improve basic-block-at-a-time
 // selection.
 bool CodeGenPrepare::OptimizeBlock(BasicBlock &BB, bool& ModifiedDT) {
   SunkAddrs.clear();
...
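
The conditions checked by OptimizeBranchInst can be summarised in a small stand-alone model. The sketch below is only an illustration: the Candidate struct, its fields and the shouldSpeculate function are hypothetical names that do not exist in LLVM, and the CFG shape checks done by the caller (single predecessor of the 'then' block, comparison against zero) are assumed to have passed already.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one if-then candidate; the fields are
// illustrative and do not correspond to LLVM data structures.
struct Candidate {
  unsigned NonDebugInstsInThen; // instructions in 'then', excluding the terminator
  bool IsCttzOrCtlz;            // the single instruction is a cttz/ctlz call
  unsigned BitWidth;            // width in bits of the counted integer type
  uint64_t PhiValueOnZero;      // constant the phi takes on the zero path
  bool TargetSaysCheap;         // result of the matching isCheapToSpeculate* hook
};

// Returns true only when every constraint from the summary holds.
bool shouldSpeculate(const Candidate &C) {
  // Speculate at most one instruction, and only a cttz/ctlz call.
  if (C.NonDebugInstsInThen != 1 || !C.IsCttzOrCtlz)
    return false;
  // The target must report the intrinsic as cheap to speculate.
  if (!C.TargetSaysCheap)
    return false;
  // The phi must yield the bit width on the zero path, since that is what
  // the zero-defined form of the intrinsic returns for a zero input.
  return C.PhiValueOnZero == C.BitWidth;
}
```

The last check is what separates the positive tests (phi constants 64, 32, 16) from the 'c' and 'd' variants in cttz-ctlz.ll, which use 63, 31 and 15 and must be left untouched.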

llvm/trunk/lib/Target/X86/X86ISelLowering.h

...
   MachineBasicBlock *
   EmitInstrWithCustomInserter(MachineInstr *MI,
                               MachineBasicBlock *MBB) const override;


   /// This method returns the name of a target specific DAG node.
   const char *getTargetNodeName(unsigned Opcode) const override;

+  bool isCheapToSpeculateCttz() const override;
+
+  bool isCheapToSpeculateCtlz() const override;
+
   /// Return the value type to use for ISD::SETCC.
   EVT getSetCCResultType(LLVMContext &Context, EVT VT) const override;

   /// Determine which of the bits specified in Mask are known to be either
   /// zero or one and return them in the KnownZero/KnownOne bitsets.
   void computeKnownBitsForTargetNode(const SDValue Op,
                                      APInt &KnownZero,
                                      APInt &KnownOne,
...

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

...
 bool X86TargetLowering::isExtractSubvectorCheap(EVT ResVT,
                                                 unsigned Index) const {
   if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
     return false;

   return (Index == 0 || Index == ResVT.getVectorNumElements());
 }

+bool X86TargetLowering::isCheapToSpeculateCttz() const {
+  // Don't try to speculate cttz if we can't directly use TZCNT.
+  return Subtarget->hasBMI();
+}
+
+bool X86TargetLowering::isCheapToSpeculateCtlz() const {
+  // Don't try to speculate ctlz if we can't directly use LZCNT.
+  return Subtarget->hasLZCNT();
+}
+
 /// isUndefOrInRange - Return true if Val is undef or if its value falls within
 /// the specified range (L, H].
 static bool isUndefOrInRange(int Val, int Low, int Hi) {
   return (Val < 0) || (Val >= Low && Val < Hi);
 }

 /// isUndefOrEqual - Val is either less than zero (undef) or equal to the
 /// specified value.
...

llvm/trunk/test/CodeGen/X86/cttz-ctlz.ll

				; RUN: opt -S -codegenprepare -mtriple=x86_64-unknown-unknown -mattr=+bmi < %s | FileCheck %s --check-prefix=ALL --check-prefix=BMI
				; RUN: opt -S -codegenprepare -mtriple=x86_64-unknown-unknown -mattr=+lzcnt < %s | FileCheck %s --check-prefix=ALL --check-prefix=LZCNT
				; RUN: opt -S -codegenprepare -mtriple=x86_64-unknown-unknown < %s | FileCheck %s --check-prefix=ALL --check-prefix=GENERIC


				define i64 @test1(i64 %A) {
				; ALL-LABEL: @test1(
				; LZCNT: [[CTLZ:%[A-Za-z0-9]+]] = call i64 @llvm.ctlz.i64(i64 %A, i1 false)
				; LZCNT-NEXT: ret i64 [[CTLZ]]
				; BMI: icmp eq i64 %A, 0
				; BMI: call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				; GENERIC: icmp eq i64 %A, 0
				; GENERIC: call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
				ret i64 %cond
				}


				define i32 @test2(i32 %A) {
				; ALL-LABEL: @test2(
				; LZCNT: [[CTLZ:%[A-Za-z0-9]+]] = call i32 @llvm.ctlz.i32(i32 %A, i1 false)
				; LZCNT-NEXT: ret i32 [[CTLZ]]
				; BMI: icmp eq i32 %A, 0
				; BMI: call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				; GENERIC: icmp eq i32 %A, 0
				; GENERIC: call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 32, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3(i16 signext %A) {
				; ALL-LABEL: @test3(
				; LZCNT: [[CTLZ:%[A-Za-z0-9]+]] = call i16 @llvm.ctlz.i16(i16 %A, i1 false)
				; LZCNT-NEXT: ret i16 [[CTLZ]]
				; BMI: icmp eq i16 %A, 0
				; BMI: call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				; GENERIC: icmp eq i16 %A, 0
				; GENERIC: call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 16, %entry ]
				ret i16 %cond
				}


				define i64 @test1b(i64 %A) {
				; ALL-LABEL: @test1b(
				; LZCNT: icmp eq i64 %A, 0
				; LZCNT: call i64 @llvm.cttz.i64(i64 %A, i1 true)
				; BMI: [[CTTZ:%[A-Za-z0-9]+]] = call i64 @llvm.cttz.i64(i64 %A, i1 false)
				; BMI-NEXT: ret i64 [[CTTZ]]
				; GENERIC: icmp eq i64 %A, 0
				; GENERIC: call i64 @llvm.cttz.i64(i64 %A, i1 true)
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
				ret i64 %cond
				}


				define i32 @test2b(i32 %A) {
				; ALL-LABEL: @test2b(
				; LZCNT: icmp eq i32 %A, 0
				; LZCNT: call i32 @llvm.cttz.i32(i32 %A, i1 true)
				; BMI: [[CTTZ:%[A-Za-z0-9]+]] = call i32 @llvm.cttz.i32(i32 %A, i1 false)
				; BMI-NEXT: ret i32 [[CTTZ]]
				; GENERIC: icmp eq i32 %A, 0
				; GENERIC: call i32 @llvm.cttz.i32(i32 %A, i1 true)
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.cttz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 32, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3b(i16 signext %A) {
				; ALL-LABEL: @test3b(
				; LZCNT: icmp eq i16 %A, 0
				; LZCNT: call i16 @llvm.cttz.i16(i16 %A, i1 true)
				; BMI: [[CTTZ:%[A-Za-z0-9]+]] = call i16 @llvm.cttz.i16(i16 %A, i1 false)
				; BMI-NEXT: ret i16 [[CTTZ]]
				; GENERIC: icmp eq i16 %A, 0
				; GENERIC: call i16 @llvm.cttz.i16(i16 %A, i1 true)
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.cttz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 16, %entry ]
				ret i16 %cond
				}


				define i64 @test1c(i64 %A) {
				; ALL-LABEL: @test1c(
				; ALL: icmp eq i64 %A, 0
				; ALL: call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 63, %entry ]
				ret i64 %cond
				}

				define i32 @test2c(i32 %A) {
				; ALL-LABEL: @test2c(
				; ALL: icmp eq i32 %A, 0
				; ALL: call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 31, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3c(i16 signext %A) {
				; ALL-LABEL: @test3c(
				; ALL: icmp eq i16 %A, 0
				; ALL: call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 15, %entry ]
				ret i16 %cond
				}


				define i64 @test1d(i64 %A) {
				; ALL-LABEL: @test1d(
				; ALL: icmp eq i64 %A, 0
				; ALL: call i64 @llvm.cttz.i64(i64 %A, i1 true)
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 63, %entry ]
				ret i64 %cond
				}


				define i32 @test2d(i32 %A) {
				; ALL-LABEL: @test2d(
				; ALL: icmp eq i32 %A, 0
				; ALL: call i32 @llvm.cttz.i32(i32 %A, i1 true)
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.cttz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 31, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3d(i16 signext %A) {
				; ALL-LABEL: @test3d(
				; ALL: icmp eq i16 %A, 0
				; ALL: call i16 @llvm.cttz.i16(i16 %A, i1 true)
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.cttz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 15, %entry ]
				ret i16 %cond
				}


				declare i64 @llvm.ctlz.i64(i64, i1)
				declare i32 @llvm.ctlz.i32(i32, i1)
				declare i16 @llvm.ctlz.i16(i16, i1)
				declare i64 @llvm.cttz.i64(i64, i1)
				declare i32 @llvm.cttz.i32(i32, i1)
				declare i16 @llvm.cttz.i16(i16, i1)