This is an archive of the discontinued LLVM Phabricator instance.

Differential D6679

[SimplifyCFG] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
Abandoned, Public

Authored by andreadb on Dec 16 2014, 4:05 AM.

Details

Reviewers

qcolombet
majnemer
hfinkel

Summary

Hi David, Hal, Quentin (and all),

If we know that the control flow is modelling an if-statement where the only instruction in 'then' basic block (excluding the terminator) is a call to cttz/ctlz, it may be beneficial to speculate the cttz/ctlz call and let SimplifyCFG convert the associated phi instruction in the 'end' basic block into a select.

Example:
;;
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry ]

;;

The call to @llvm.cttz.i64 could be speculated. This would allow folding the entire code sequence into:

%cmp = icmp eq i64 %val, 0
%c = tail call i64 @llvm.cttz.i64(i64 %val, i1 false)
%cond = select i1 %cmp, i64 64, i64 %c

The constraints are:
a) The 'then' basic block is taken only if the input operand to the cttz/ctlz is nonzero;
b) The phi node propagates the bit width of %val (the input to the cttz/ctlz) if %val is zero.

If all these constraints are met, the optimizer can hoist the call to cttz/ctlz from the 'then' basic block into the 'entry' basic block. The phi instruction would then be replaced by a select statement.
The new cttz/ctlz instruction will also have the 'undef on zero' flag set to 'false'. This would allow the instruction combiner to further simplify the code by getting rid of the redundant select.
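
The equivalence argued above can be checked with a small standalone C++ sketch. It is only a model, not LLVM code: `cttz_zero_defined`, `branchy`, `speculated` and `check_equivalence` are hypothetical names, and the GCC/Clang builtin `__builtin_ctzll` stands in for the intrinsic with the 'undef on zero' flag set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of @llvm.cttz.i64(%val, i1 false): the result is
// defined for zero and equals the bit width (64).
inline std::uint64_t cttz_zero_defined(std::uint64_t val) {
  return val == 0 ? 64 : static_cast<std::uint64_t>(__builtin_ctzll(val));
}

// Original control flow: the count is computed only on the nonzero path.
inline std::uint64_t branchy(std::uint64_t val) {
  if (val == 0)
    return 64; // phi operand coming from 'entry'
  return static_cast<std::uint64_t>(__builtin_ctzll(val)); // %c in 'then.bb'
}

// After speculation: the call is hoisted and the phi becomes a select.
inline std::uint64_t speculated(std::uint64_t val) {
  std::uint64_t c = cttz_zero_defined(val);
  return val == 0 ? 64 : c;
}

// Checks that the three forms agree on a few inputs, including zero.
inline void check_equivalence() {
  const std::uint64_t inputs[] = {0, 1, 8, 40, std::uint64_t{1} << 63};
  for (std::uint64_t v : inputs) {
    assert(branchy(v) == speculated(v));
    assert(speculated(v) == cttz_zero_defined(v));
  }
}
```

Because `speculated` always agrees with `cttz_zero_defined` in this model, the select adds nothing once the flag is cleared, which is the redundancy the instruction combiner is expected to remove.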

The IR from the example can be obtained from the following code:
///
unsigned long long foo(unsigned long long A) {

  return A ? __builtin_ctzll(A) : 64;
}

///

On X86-64 targets with feature TZCNT, this patch would allow the backend to generate optimal assembly code for function 'foo':

tzcntq %rdi, %rax
retq

On X86-64 targets with no TZCNT, the call to cttz with the 'undef on zero' flag cleared would be expanded into a longer sequence involving a conditional move instruction:

bsfq  %rdi, %rcx
movl  $64, %eax
cmovneq  %rcx, %rax
popq  %rbp
retq

On X86 (not x86-64) targets with no LZCNT/TZCNT and no CMOV instructions, the backend would further expand the conditional moves, introducing machine basic blocks. Basically, it would revert this optimization, reintroducing the if-else structure.

My only question is: is SimplifyCFG the correct place to put this logic? If not, where do you think I should put it?

Please, let me know what you think.

Thanks,
Andrea

Diff Detail

Event Timeline

andreadb updated this revision to Diff 17329. (Dec 16 2014, 4:05 AM)

andreadb retitled this revision from to [SimplifyCFG] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz..

andreadb updated this object.

andreadb edited the test plan for this revision.

andreadb added reviewers: majnemer, hfinkel, qcolombet.

andreadb added a subscriber: Unknown Object (MLST).

Hi Andrea,

On X86 (not x86-64) targets with no LZCNT/TZCNT and no CMOV instructions, the backend would further expand the conditional moves, introducing machine basic blocks. Basically, it would revert this optimization, reintroducing the if-else structure.

Have you checked that the output assembly is the same (or equivalent) performance-wise on such targets?

My only question is: is SimplifyCFG the correct place to put this logic? If not, where do you think I should put it?

Assuming the answer of my previous question is yes, I think it makes sense to have that in SimplifyCFG.

Thanks,
-Quentin

In D6679#102065, @qcolombet wrote:

Hi Andrea,

...

Assuming the answer of my previous question is yes, I think it makes sense to have that in SimplifyCFG.

I agree; the transformation seems reasonable. If it turns out not to be good for targets that can't completely fold the result, you could put the transformation in CodeGenPrep, where you can directly query the target.

majnemer added inline comments. (Dec 16 2014, 1:13 PM)

lib/Transforms/Utils/SimplifyCFG.cpp
1624–1629	This could be: if (match(ThenV, m_Intrinsic<Intrinsic::cttz>(Op0) || match(ThenV, m_Intrinsic<Intrinsic::ctlz>(Op0))
1631–1633	If you use `m_APInt`, this will also work with vector types.
1635–1640	Please use dyn_cast.
1643–1646	Why not just use `match(Cmp->getOperand(1), m_Zero())`

Hi Quentin, Hal, David,

thanks a lot for the useful feedback!

In D6679#102065, @qcolombet wrote:

Hi Andrea,

On X86 (not x86-64) targets with no LZCNT/TZCNT and no CMOV instructions, the backend would further expand the conditional moves, introducing machine basic blocks. Basically, it would revert this optimization, reintroducing the if-else structure.

Have you checked that the output assembly is the same (or equivalent) performance-wise on such targets?

So, I checked the output assembly for those targets. At first I thought that the codegen was equivalent performance-wise. But I was wrong, since there is unfortunately an important difference: the count leading/trailing zeros instruction is now always dominating the control flow. Therefore, it would always be speculatively executed.
While this is OK for the case where the input value is known not to be zero (since BSF/BSR would be executed anyway), it is suboptimal for the case where the value is zero.

Before this patch, with a value of zero in input to cttz/ctlz, the instructions dynamically executed would have been:

"TEST+conditional branch+propagation of constant"

With this patch, we would execute instead:

"BSF/BSR+conditional branch+propagation of constant".

bsf/bsr would be able to set the rFLAGS, so the backend avoids inserting an extra TEST. However, BSF/BSR is much slower (it may be microcoded on old x86 targets...). So, I am afraid that my patch would slow down the code for x86 with no CMOV.
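
The cost argument above can be illustrated with a minimal sketch that counts how many bit scans are dynamically executed. This is a model under stated assumptions, not generated code: `scan_count`, `bit_scan`, `before_patch`, `after_patch_no_cmov` and `scans_for` are hypothetical names, and `after_patch_no_cmov` mimics a target without CMOV where the select is expanded back into a branch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter: number of bit-scan operations executed so far.
static int scan_count = 0;

// Stand-in for BSF. The zero case returns 0 only to keep this model
// well defined; the callers never use that value.
std::uint64_t bit_scan(std::uint64_t val) {
  ++scan_count;
  return val == 0 ? 0 : static_cast<std::uint64_t>(__builtin_ctzll(val));
}

// Before the patch: TEST + conditional branch; the scan is skipped for zero.
std::uint64_t before_patch(std::uint64_t val) {
  if (val == 0)
    return 64;
  return bit_scan(val);
}

// After the patch on a target without CMOV: the scan dominates the branch,
// so it runs even when the input is zero.
std::uint64_t after_patch_no_cmov(std::uint64_t val) {
  std::uint64_t c = bit_scan(val); // always executed
  if (val == 0)
    return 64;                     // branch on the flags set by the scan
  return c;
}

// Resets the counter, runs one variant, and reports how many scans it cost.
int scans_for(std::uint64_t (*f)(std::uint64_t), std::uint64_t val) {
  scan_count = 0;
  (void)f(val);
  return scan_count;
}
```

In this model both variants return the same values, but only the speculated one pays for a scan when the input is zero, which is the slowdown described for x86 without CMOV.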

I'll see if I can move this logic into CodeGenPrepare as suggested by Hal.

My idea is to do the following (if you agree):

move this logic into CodeGenPrepare;
guard the code against a check on the subtarget (something like 'isCheapToSpeculateCttzCtlz').
- On X86 that method would return true if we have TZCNT/LZCNT or if we have feature CMOV.
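
As a rough sketch of the proposed guard, assuming a simplified feature record rather than a real subtarget class: `X86FeaturesSketch` and its fields are hypothetical, and the function only encodes the rule stated above (TZCNT/LZCNT or CMOV available).

```cpp
#include <cassert>

// Hypothetical feature flags for illustration only; this is not an LLVM class.
struct X86FeaturesSketch {
  bool HasTZCNT = false;
  bool HasLZCNT = false;
  bool HasCMOV = false;
};

// Hypothetical hook named after the suggestion in the comment above.
// Speculation is treated as cheap when the count is a single instruction
// that is defined on zero (TZCNT/LZCNT), or when the select can be lowered
// to a conditional move instead of a branch (CMOV).
inline bool isCheapToSpeculateCttzCtlz(const X86FeaturesSketch &F) {
  return F.HasTZCNT || F.HasLZCNT || F.HasCMOV;
}
```

A pass could consult such a predicate before hoisting the call, so that targets lacking all three features keep the original branch.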

Thanks again for your time.
I'll prepare a new patch.

Cheers,
Andrea

Hi Andrea,

My idea is to do the following (if you agree):

move this logic into CodeGenPrepare;

guard the code against a check on the subtarget (something like 'isCheapToSpeculateCttzCtlz').

On X86 that method would return true if we have TZCNT/LZCNT or if we have feature CMOV.

Sounds good to me.

Thanks,
-Quentin

andreadb mentioned this in D6728: [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. (Dec 18 2014, 12:42 PM)

Uploaded a new patch here: http://reviews.llvm.org/D6728

andreadb mentioned this in D6891: [InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag cleared. (Jan 9 2015, 10:41 AM)

Revision Contents

Path	Size
lib/Transforms/Utils/SimplifyCFG.cpp	88 lines
test/Transforms/SimplifyCFG/cttz-ctlz.ll	229 lines

Diff 17329

lib/Transforms/Utils/SimplifyCFG.cpp

[1,532 lines not shown; context: for (BasicBlock::iterator BBI = ThenBB->begin(),]

     // Don't hoist the instruction if it's unsafe or expensive.
     if (!isSafeToSpeculativelyExecute(I, DL) &&
         !(HoistCondStores &&
           (SpeculatedStoreValue = isSafeToSpeculateStore(I, BB, ThenBB,
                                                          EndBB))))
       return false;
     if (!SpeculatedStoreValue &&
-        ComputeSpeculationCost(I, DL) > PHINodeFoldingThreshold)
-      return false;
+        ComputeSpeculationCost(I, DL) > PHINodeFoldingThreshold) {
+      // Special case where the only instruction in the basic block (excluding
+      // the terminator) is a cttz/ctlz intrinsic call. It may still be
+      // beneficial to hoist it from 'ThenBB'.
+      if (!isa<IntrinsicInst>(I))
+        return false;
+
+      IntrinsicInst *II = cast<IntrinsicInst>(I);
+      if (II->getIntrinsicID() != Intrinsic::cttz &&
+          II->getIntrinsicID() != Intrinsic::ctlz)
+        return false;
+    }

     // Store the store speculation candidate.
     if (SpeculatedStoreValue)
       SpeculatedStore = cast<StoreInst>(I);

     // Do not hoist the instruction if any of its operands are defined but not
     // used in BB. The transformation will prevent the operand from
     // being sunk into the use block.
     for (User::op_iterator i = I->op_begin(), e = I->op_end();

[31 lines not shown; context: for (BasicBlock::iterator I = EndBB->begin();]

     if (ThenV == OrigV)
       continue;

     // Don't convert to selects if we could remove undefined behavior instead.
     if (passingValueIsAlwaysUndefined(OrigV, PN) ||
         passingValueIsAlwaysUndefined(ThenV, PN))
       return false;

+    // See if we can hoist a cttz/ctlz from ThenBB into BB.
+    //
+    // Example:
+    // entry:
+    //   ...
+    //   %cmp = icmp eq i64 %val, 0
+    //   br i1 %cmp, label %end.bb, label %then.bb
+    //
+    // then.bb:
+    //   %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
+    //   br label %EndBB
+    //
+    // end.bb:
+    //   %cond = phi i64 [ %c, %then.bb ], [ 64, %entry ]
+    //
+    // ==>
+    //
+    // entry:
+    //   ...
+    //   %cmp = icmp eq i64 %val, 0
+    //   %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 false)
+    //   select i1 %cmp, i64 64, i64 %c
+    //
+    if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(ThenV)) {
+      // Don't convert this phi node into a select if 'ThenV' is a cttz/ctlz
+      // intrinsic call, but 'OrigV' is not equal to the 'size-of' in bits of
+      // the value in input to the cttz/ctlz.
+      if (II->getIntrinsicID() == Intrinsic::cttz ||
+          II->getIntrinsicID() == Intrinsic::ctlz) {
      [inline comment, majnemer] This could be: if (match(ThenV, m_Intrinsic<Intrinsic::cttz>(Op0) || match(ThenV, m_Intrinsic<Intrinsic::ctlz>(Op0))
+        unsigned BitWidth = ThenV->getType()->getIntegerBitWidth();
+        ConstantInt *CInt = dyn_cast<ConstantInt>(OrigV);
+        if (!CInt || !CInt->equalsInt(BitWidth))
+          return false;
      [inline comment, majnemer] If you use `m_APInt`, this will also work with vector types.
+
+        // Don't convert to select if 'ThenBB' is not on the false edge of the
+        // conditional branch.
+        if (!isa<ICmpInst>(BrCond))
+          return false;
+
+        ICmpInst *Cmp = cast<ICmpInst>(BrCond);
      [inline comment, majnemer] Please use dyn_cast.
+        if (Cmp->getPredicate() != ICmpInst::ICMP_EQ ||
+            Cmp->getOperand(0) != II->getArgOperand(0) ||
+            !isa<ConstantInt>(Cmp->getOperand(1)) ||
+            // Make sure that 'ThenBB' is only taken if the input to the
+            // cttz/ctlz intrinsic call is not zero.
+            !cast<ConstantInt>(Cmp->getOperand(1))->isZero())
      [inline comment, majnemer] Why not just use `match(Cmp->getOperand(1), m_Zero())`
+          return false;
+      }
+    }
+
     HaveRewritablePHIs = true;
     ConstantExpr *OrigCE = dyn_cast<ConstantExpr>(OrigV);
     ConstantExpr *ThenCE = dyn_cast<ConstantExpr>(ThenV);
     if (!OrigCE && !ThenCE)
       continue; // Known safe and cheap.

     if ((ThenCE && !isSafeToSpeculativelyExecute(ThenCE, DL)) ||
         (OrigCE && !isSafeToSpeculativelyExecute(OrigCE, DL)))

[50 lines not shown; context: if (OrigV == ThenV)]

       continue;

     // Create a select whose true value is the speculatively executed value and
     // false value is the preexisting value. Swap them if the branch
     // destinations were inverted.
     Value *TrueV = ThenV, *FalseV = OrigV;
     if (Invert)
       std::swap(TrueV, FalseV);

+    // A call to cttz/ctlz is only speculated if ThenBB is on the false edge of
+    // the conditional branch. If so, then flag 'Invert' is always expected to
+    // be set, and 'FalseV' would point to the call to cttz/ctlz.
+    if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(FalseV)) {
+      Intrinsic::ID ID = II->getIntrinsicID();
+      if ((ID == Intrinsic::cttz || ID == Intrinsic::ctlz) &&
+          cast<ConstantInt>(II->getArgOperand(1))->isOne()) {
+        // Construct a new call to cttz/ctlz and clear the
+        // "undefined on zero" flag.
+        Type *Ty = II->getArgOperand(0)->getType();
+        Value *Args[] = { II->getArgOperand(0),
+                          ConstantInt::getFalse(II->getContext()) };
+        Module *M = BB->getParent()->getParent();
+        Value *IF = Intrinsic::getDeclaration(M, ID, Ty);
+        Instruction *NewI = Builder.CreateCall(IF, Args);
+
+        // Replace the old call to cttz/ctlz with 'NewI'.
+        II->replaceAllUsesWith(NewI);
+        II->eraseFromParent();
+        FalseV = NewI;
+      }
+    }
+
     Value *V = Builder.CreateSelect(BrCond, TrueV, FalseV,
                                     TrueV->getName() + "." + FalseV->getName());
     PN->setIncomingValue(OrigI, V);
     PN->setIncomingValue(ThenI, V);
   }

   ++NumSpeculations;
   return true;

[2,983 lines not shown]

test/Transforms/SimplifyCFG/cttz-ctlz.ll

				; RUN: opt < %s -simplifycfg -S | FileCheck %s

				define i64 @test1(i64 %A) {
				; CHECK-LABEL: @test1(
				; CHECK: [[CTLZ:%[A-Za-z0-9]+]] = call i64 @llvm.ctlz.i64(i64 %A, i1 false)
				; CHECK-NEXT: select i1 %tobool, i64 64, i64 [[CTLZ]]
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
				ret i64 %cond
				}


				define i32 @test2(i32 %A) {
				; CHECK-LABEL: @test2(
				; CHECK: [[CTLZ:%[A-Za-z0-9]+]] = call i32 @llvm.ctlz.i32(i32 %A, i1 false)
				; CHECK-NEXT: select i1 %tobool, i32 32, i32 [[CTLZ]]
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 32, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3(i16 signext %A) {
				; CHECK-LABEL: @test3(
				; CHECK: [[CTLZ:%[A-Za-z0-9]+]] = call i16 @llvm.ctlz.i16(i16 %A, i1 false)
				; CHECK-NEXT: select i1 %tobool, i16 16, i16 [[CTLZ]]
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 16, %entry ]
				ret i16 %cond
				}


				define i64 @test1b(i64 %A) {
				; CHECK-LABEL: @test1b(
				; CHECK: [[CTTZ:%[A-Za-z0-9]+]] = call i64 @llvm.cttz.i64(i64 %A, i1 false)
				; CHECK-NEXT: select i1 %tobool, i64 64, i64 [[CTTZ]]
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
				ret i64 %cond
				}


				define i32 @test2b(i32 %A) {
				; CHECK-LABEL: @test2b(
				; CHECK: [[CTTZ:%[A-Za-z0-9]+]] = call i32 @llvm.cttz.i32(i32 %A, i1 false)
				; CHECK-NEXT: select i1 %tobool, i32 32, i32 [[CTTZ]]
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.cttz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 32, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3b(i16 signext %A) {
				; CHECK-LABEL: @test3b(
				; CHECK: [[CTTZ:%[A-Za-z0-9]+]] = call i16 @llvm.cttz.i16(i16 %A, i1 false)
				; CHECK-NEXT: select i1 %tobool, i16 16, i16 [[CTTZ]]
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.cttz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 16, %entry ]
				ret i16 %cond
				}


				define i64 @test1c(i64 %A) {
				; CHECK-LABEL: @test1c(
				; CHECK: call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				; CHECK: phi i64 [ %0, %cond.true ], [ 63, %entry ]
				; CHECK-NEXT: ret
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.ctlz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 63, %entry ]
				ret i64 %cond
				}

				define i32 @test2c(i32 %A) {
				; CHECK-LABEL: @test2c(
				; CHECK: call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				; CHECK: phi i32 [ %0, %cond.true ], [ 31, %entry ]
				; CHECK-NEXT: ret
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.ctlz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 31, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3c(i16 signext %A) {
				; CHECK-LABEL: @test3c(
				; CHECK: call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				; CHECK: phi i16 [ %0, %cond.true ], [ 15, %entry ]
				; CHECK-NEXT: ret
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.ctlz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 15, %entry ]
				ret i16 %cond
				}


				define i64 @test1d(i64 %A) {
				; CHECK-LABEL: @test1d(
				; CHECK: call i64 @llvm.cttz.i64(i64 %A, i1 true)
				; CHECK: phi i64 [ %0, %cond.true ], [ 63, %entry ]
				; CHECK-NEXT: ret
				entry:
				%tobool = icmp eq i64 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i64 [ %0, %cond.true ], [ 63, %entry ]
				ret i64 %cond
				}


				define i32 @test2d(i32 %A) {
				; CHECK-LABEL: @test2d(
				; CHECK: call i32 @llvm.cttz.i32(i32 %A, i1 true)
				; CHECK: phi i32 [ %0, %cond.true ], [ 31, %entry ]
				; CHECK-NEXT: ret
				entry:
				%tobool = icmp eq i32 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i32 @llvm.cttz.i32(i32 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i32 [ %0, %cond.true ], [ 31, %entry ]
				ret i32 %cond
				}


				define signext i16 @test3d(i16 signext %A) {
				; CHECK-LABEL: @test3d(
				; CHECK: call i16 @llvm.cttz.i16(i16 %A, i1 true)
				; CHECK: phi i16 [ %0, %cond.true ], [ 15, %entry ]
				; CHECK-NEXT: ret
				entry:
				%tobool = icmp eq i16 %A, 0
				br i1 %tobool, label %cond.end, label %cond.true

				cond.true: ; preds = %entry
				%0 = tail call i16 @llvm.cttz.i16(i16 %A, i1 true)
				br label %cond.end

				cond.end: ; preds = %entry, %cond.true
				%cond = phi i16 [ %0, %cond.true ], [ 15, %entry ]
				ret i16 %cond
				}


				declare i64 @llvm.ctlz.i64(i64, i1)
				declare i32 @llvm.ctlz.i32(i32, i1)
				declare i16 @llvm.ctlz.i16(i16, i1)
				declare i64 @llvm.cttz.i64(i64, i1)
				declare i32 @llvm.cttz.i32(i32, i1)
				declare i16 @llvm.cttz.i16(i16, i1)