Details

Reviewers

reames
craig.topper
lebedev.ri

Summary

Hi.

First, I hope you and your relatives are well.

I wrote a patch that addresses issue 17128.
The goal of this patch is to replace a snippet such as:

int cttz(unsigned long x){
	unsigned long i = 0;
	while(i < 64 && (((x >> i) & 0x1) == 0))
		i++;
	return i;
}

with a call to the LLVM cttz intrinsic, which can then be lowered to the corresponding assembly instruction if the architecture has one.
In my case, the intrinsic was lowered to the bfsq instruction.
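
As a rough C-level illustration of the equivalence the patch targets (a sketch, not part of the patch): the names cttz_loop and cttz_builtin are hypothetical, and __builtin_ctzll is the GCC/Clang builtin, used here as a stand-in for the LLVM intrinsic. The builtin is undefined for a zero argument, so the sketch guards that case to preserve the loop's result of 64.

```c
#include <stdint.h>

/* The loop from the summary, written with a fixed 64-bit type. */
int cttz_loop(uint64_t x) {
    uint64_t i = 0;
    while (i < 64 && ((x >> i) & 0x1) == 0)
        i++;
    return (int)i;
}

/* Hypothetical hand-written equivalent using the GCC/Clang builtin.
 * __builtin_ctzll(0) is undefined, so zero is handled explicitly to
 * keep the loop's result of 64. */
int cttz_builtin(uint64_t x) {
    return x == 0 ? 64 : __builtin_ctzll(x);
}
```

For any nonzero x both functions return the index of the lowest set bit; the only input where the guard matters is x == 0.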

To check my results, I wrote the cttz.ll test to confirm that the patch works, and I ran the check-llvm-unit and check-llvm targets.
The second gave me one failure, for Bindings/Go/go.test:

# llvm.org/llvm/bindings/go/llvm.test
/usr/lib/go-1.11/pkg/tool/linux_amd64/link: running /usr/bin/c++ failed: exit status 1
ld.lld: error: unknown --compress-debug-sections value: zlib-gnu
collect2: error: ld returned 1 exit status

I do not think this problem is related to my patch but rather to my configuration.

I also quickly benchmarked my modifications.
First, I measured the compilation time of the following program by compiling it 100 times with clang -O3 -S -emit-llvm, using time as the measuring tool:

#include <stdlib.h>


int cttz(unsigned long x){
	unsigned long i = 0;
	while(i < 64 && (((x >> i) & 0x1) == 0))
		i++;
	return i;
}

int main(void){
	int bits_field;
	int first_set;

	bits_field = rand();

	first_set = cttz(bits_field);

	return first_set;
}

The results are the following (in seconds):

                 100 compilations   mean for 1 compilation
without patch    20.54              .21
with patch       16.19              .17

So, the patch reduces compilation time by approximately 21%.
However, I am not really sure of this result, as I first thought that the modification would make compilation slower.
Maybe it is quicker because the loop is removed and therefore no longer has to be optimized.

Then, I measured the performance of the generated code by running the above program 10000 times, using time as the measuring tool. The results are as follows (in milliseconds):

                 10000 runs   mean for 1 run
without patch    8730         .873
with patch       8220         .822

So, the patch reduces code execution time by around 6%.
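
Timing a whole process with time also measures process startup and the call to rand(), so an in-process timer may give a clearer picture of the function itself. The following is only a sketch under that assumption: time_cttz_calls and its checksum parameter are hypothetical names, and the inputs are fixed powers of two rather than rand() values so that the result can be checked.

```c
#include <stdint.h>
#include <time.h>

/* The loop under test, with a fixed 64-bit type. */
static int cttz(uint64_t x) {
    uint64_t i = 0;
    while (i < 64 && ((x >> i) & 0x1) == 0)
        i++;
    return (int)i;
}

/* Calls cttz() `iterations` times and returns the elapsed CPU time in
 * seconds. The sum of all results is stored in *checksum so that the
 * calls have an observable effect. */
double time_cttz_calls(unsigned iterations, uint64_t *checksum) {
    uint64_t sum = 0;
    clock_t start = clock();
    for (unsigned n = 0; n < iterations; n++)
        sum += (uint64_t)cttz((uint64_t)1 << (n % 64));
    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With a large iteration count, comparing the returned time for binaries built with and without the patch isolates the loop from process overhead, although an optimizing compiler may still transform the benchmark loop itself.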

If you see any way to improve the patch or any mistake I made, feel free to share.

Best regards.

Diff Detail

Event Timeline

eiffel created this revision. Jan 4 2021, 9:17 AM

Herald added a subscriber: hiraditya. Jan 4 2021, 9:17 AM

eiffel requested review of this revision. Jan 4 2021, 9:17 AM

Herald added a project: Restricted Project. Jan 4 2021, 9:17 AM

Herald added a subscriber: llvm-commits.

nikic added a reviewer: lebedev.ri. Jan 4 2021, 9:28 AM

xbolva00 added a reviewer: craig.topper. Jan 4 2021, 9:52 AM

lebedev.ri retitled this revision from "[SCEV] Replace cttz loop by call to cttz intrinsic." to "[LoopIdiom] Replace cttz loop by call to cttz intrinsic.". Jan 4 2021, 9:55 AM

Harbormaster completed remote builds in B83906: Diff 314386. Jan 4 2021, 10:02 AM

Add a check in recognizeAndReplaceCTZ() that CurLoop has a unique exit block (getUniqueExitBlock()).
This should fix the check-all failure when compiling compiler-rt/lib/gwp_asan/tests/mutex_test.cpp.

eiffel updated this revision to Diff 421805. Apr 10 2022, 1:06 PM

Herald added a project: Restricted Project. Apr 10 2022, 1:06 PM

Herald added a subscriber: StephenFan.

Hi.

This contribution is a bit old, but I rebased it and tested it.
All the tests pass:

[100%] Running the LLVM regression tests

Testing Time: 6737.87s
  Skipped          :     7
  Unsupported      :   472
  Passed           : 47326
  Expectedly Failed:   161
[100%] Built target check-llvm

Could I please get some opinions on it?

Best regards and thank you in advance.

Harbormaster completed remote builds in B158921: Diff 421805. Apr 10 2022, 1:39 PM

craig.topper added inline comments. Apr 11 2022, 10:00 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1885	Drop the comment about jdoerfert. It's fine for a commit message, but doesn't add anything to the understanding of the code.
1892	Use `auto *AddIncValue0`
1900	`auto`
1906	`auto`

Hi.

Thank you for your reviews.
I addressed your comments and updated the patch. I only ran the transforms tests, and they pass:

[100%] Running lit suite /home/francis/llvm/llvm-project/llvm/test/Transforms

Testing Time: 357.54s
  Unsupported      :   11
  Passed           : 7333
  Expectedly Failed:   33
[100%] Built target check-llvm-transforms

Best regards.

Harbormaster completed remote builds in B159518: Diff 422612. Apr 13 2022, 1:45 PM

yurai007 added a subscriber: yurai007. Apr 14 2022, 7:12 AM

yurai007 added inline comments.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1972	nit: const
llvm/test/Transforms/LoopIdiom/cttz.ll
67	Would it make sense to add some negative tests covering scenarios where the pattern is (as expected) not recognized?
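
As a possible starting point for such negative tests, here is a C-level sketch of loops that resemble the idiom but do not count trailing zeros, and so should be left alone. The function names are hypothetical; the two near misses correspond to the "add is a ++" and "phi starts at 0" checks in detectCTZIdiom, and the IR that a compiler would produce for them is not shown here.

```c
#include <stdint.h>

/* Reference: counts trailing zeros, returning 32 for x == 0. */
int cttz_ref(uint32_t x) {
    uint32_t i = 0;
    while (i < 32 && ((x >> i) & 1u) == 0)
        i++;
    return (int)i;
}

/* Near miss 1: the counter advances by 2, so odd bit positions are skipped. */
int near_miss_step2(uint32_t x) {
    uint32_t i = 0;
    while (i < 32 && ((x >> i) & 1u) == 0)
        i += 2;
    return (int)i;
}

/* Near miss 2: the counter starts at 1, so bit 0 is never tested. */
int near_miss_start1(uint32_t x) {
    uint32_t i = 1;
    while (i < 32 && ((x >> i) & 1u) == 0)
        i++;
    return (int)i;
}
```

For x = 2 the reference returns 1 while near_miss_step2 returns 32, and for x = 1 the reference returns 0 while near_miss_start1 returns 32, so replacing either near miss with cttz would change behavior.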

eiffel updated this revision to Diff 423752. Apr 19 2022, 3:39 PM

Hi.

I updated my code to address the const nit.

However, I am not really sure about the output for cttz64 when run with -O1.
Indeed, I get:

; ./bin/opt -O1 -S < ../llvm-project/llvm/test/Transforms/LoopIdiom/cttz.ll
; ModuleID = '<stdin>'
source_filename = "<stdin>"

; Function Attrs: nofree norecurse nosync nounwind readnone uwtable
define i32 @cttz32(i32 %x) local_unnamed_addr #0 {
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true), !range !0
  ret i32 %0
}

; Function Attrs: nofree norecurse nosync nounwind readnone ssp uwtable
define i32 @cttz64(i64 %x) local_unnamed_addr #1 {
entry:
  br label %land.rhs

land.rhs:                                         ; preds = %while.body, %entry
  %i.06 = phi i64 [ 0, %entry ], [ %inc, %while.body ]
  %0 = shl nuw i64 1, %i.06
  %1 = and i64 %0, %x
  %cmp1 = icmp eq i64 %1, 0
  br i1 %cmp1, label %while.body, label %while.end.split.loop.exit2

while.body:                                       ; preds = %land.rhs
  %inc = add nuw nsw i64 %i.06, 1
  %cmp = icmp ult i64 %i.06, 63
  br i1 %cmp, label %land.rhs, label %while.end

while.end.split.loop.exit2:                       ; preds = %land.rhs
  %extract.t1.le = trunc i64 %i.06 to i32
  br label %while.end

while.end:                                        ; preds = %while.body, %while.end.split.loop.exit2
  %i.0.lcssa.off0 = phi i32 [ %extract.t1.le, %while.end.split.loop.exit2 ], [ 64, %while.body ]
  ret i32 %i.0.lcssa.off0
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i32 @llvm.cttz.i32(i32, i1 immarg) #2

attributes #0 = { nofree norecurse nosync nounwind readnone uwtable }
attributes #1 = { nofree norecurse nosync nounwind readnone ssp uwtable }
attributes #2 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

!0 = !{i32 0, i32 33}

While the result is correct for cttz32, the intrinsic is not present for cttz64.
Note that when I run the following, I get output that is correct for both:

; ./bin/opt -loop-idiom -loop-deletion -S < ../llvm-project/llvm/test/Transforms/LoopIdiom/cttz.ll          
; ModuleID = '<stdin>'
source_filename = "<stdin>"

; Function Attrs: norecurse nounwind readnone uwtable
define i32 @cttz32(i32 %x) #0 {
entry:
  br label %while.end

while.end:                                        ; preds = %entry
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  ret i32 %0
}

; Function Attrs: nounwind readnone ssp uwtable
define i32 @cttz64(i64 %x) #1 {
entry:
  br label %while.end

while.end:                                        ; preds = %entry
  %0 = call i64 @llvm.cttz.i64(i64 %x, i1 true)
  %conv = trunc i64 %0 to i32
  ret i32 %conv
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i32 @llvm.cttz.i32(i32, i1 immarg) #2

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i64 @llvm.cttz.i64(i64, i1 immarg) #2

attributes #0 = { norecurse nounwind readnone uwtable }
attributes #1 = { nounwind readnone ssp uwtable }
attributes #2 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

Could someone with more experience share their thoughts on this?

Best regards and thank you in advance.

Harbormaster completed remote builds in B160334: Diff 423752. Apr 19 2022, 5:02 PM

In D94015#3460499, @eiffel wrote:

However, I am not really sure about the output for cttz64 when run with -O1.

That's because -O1 runs a pipeline of transformations, and apparently cttz32 is still in the expected form when it reaches LIR, so the pattern is recognized successfully.
However, that is not the case for cttz64: perhaps the IR was mutated by earlier passes, and as a consequence recognizeAndReplaceCTZ bails out on an unexpected pattern.
If you want to find out what is going on, the LLVM_DEBUG macro and the -debug / -print-changed options may be useful for debugging.
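
To illustrate why a purely structural matcher is sensitive to what earlier passes did: detectCTZIdiom in this diff accepts one exact shape (a shl of 1 by the counter, immediately followed by an and), yet the same computation can be expressed in other equivalent shapes. The C sketch below shows two such spellings; the function names are hypothetical, and it does not establish which shape any particular pass pipeline actually produces.

```c
#include <stdint.h>

/* Source-level spelling: shift x right and test the low bit. */
int cttz_shr(uint32_t x) {
    uint32_t i = 0;
    while (i < 32 && ((x >> i) & 1u) == 0)
        i++;
    return (int)i;
}

/* Spelling that mirrors the IR in cttz.ll: build the mask 1 << i and
 * AND it with x. The i < 32 check runs first, so the shift is defined. */
int cttz_shl_mask(uint32_t x) {
    uint32_t i = 0;
    while (i < 32 && ((1u << i) & x) == 0)
        i++;
    return (int)i;
}
```

Both functions return the same value for every input, but a matcher written against only one of the two shapes would miss the other, which is the kind of mismatch the debug options above can help locate.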

yurai007 added inline comments. Apr 20 2022, 12:27 PM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1975	Since this runs on all targets, I wonder whether checking the size is enough. For example, in the case of the CTLZ transformation, profitability is checked by an additional query to TTI. More context: https://reviews.llvm.org/D32605#752717

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

Diff 423752

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

[237 lines not shown; context: private:]
   /// @{

   bool runOnNoncountableLoop();

   bool recognizePopcount();
   void transformLoopToPopcount(BasicBlock *PreCondBB, Instruction *CntInst,
                                PHINode *CntPhi, Value *Var);
   bool recognizeAndInsertFFS(); /// Find First Set: ctlz or cttz
+  bool recognizeAndReplaceCTZ();
+  void transformLoopToCTZ(Loop *CurLoop, PHINode *CntPhi, Value *Val);
   void transformLoopToCountable(Intrinsic::ID IntrinID, BasicBlock *PreCondBB,
                                 Instruction *CntInst, PHINode *CntPhi,
                                 Value *Var, Instruction *DefX,
                                 const DebugLoc &DL, bool ZeroCheck,
                                 bool IsCntPhiUsedOutsideLoop);

   bool recognizeShiftUntilBitTest();
   bool recognizeShiftUntilZero();
[1,314 lines not shown]

 bool LoopIdiomRecognize::runOnNoncountableLoop() {
   LLVM_DEBUG(dbgs() << DEBUG_TYPE " Scanning: F["
                     << CurLoop->getHeader()->getParent()->getName()
                     << "] Noncountable Loop %"
                     << CurLoop->getHeader()->getName() << "\n");

   return recognizePopcount() || recognizeAndInsertFFS() ||
-         recognizeShiftUntilBitTest() || recognizeShiftUntilZero();
+         recognizeShiftUntilBitTest() || recognizeShiftUntilZero() ||
+         recognizeAndReplaceCTZ();
 }

 /// Check if the given conditional branch is based on the comparison between
 /// a variable and zero, and if the variable is non-zero or zero (JmpOnZero is
 /// true), the control yields to the loop entry. If the branch matches the
 /// behavior, the variable involved in the comparison is returned. This function
 /// will be called to see if the precondition and postcondition of the loop are
 /// in desirable form.
[257 lines not shown; context: for (Instruction &Inst : llvm::make_range(]
     break;
   }
   if (!CntInst)
     return false;

   return true;
 }

+/// Return true if the idiom is detected in the loop.
+///
+/// Additionally:
+/// 1) \p CntPhi is set to the corresponding phi node.
+/// 2) \p Val is set to the value whose trailing zeros are being counted.
+///
+/// The core idiom we are trying to detect is:
+/// \code
+///    x = init-val;
+///  land.rhs:
+///    i = phi (0, i.next);
+///    count = phi(0, count.next);
+///    shl = 1 << i;
+///    and = shl & x; // Val
+///    if (and != 0)
+///      goto while.exit;
+///    else
+///      goto while.body;
+///  while.body:
+///    i.next = i + 1;
+///    count.next = count + 1;
+///    if (i.next < 32)
+///      goto land.rhs;
+///  while.exit:
+///    count.exit = phi(count.next, count); // CntPhi
+/// \endcode
+static bool detectCTZIdiom(Loop *CurLoop, BasicBlock *LoopHeader,
+                           BasicBlock *LoopExit, PHINode *&CntPhi,
+                           Value *&Val) {
+  BasicBlock *LoopLatch = CurLoop->getLoopLatch();
+
+  // Step 1: Get the phi after the loop and check everything about it.
+  CntPhi = dyn_cast<PHINode>(&LoopExit->front());
+
+  if (!CntPhi || CntPhi->getNumIncomingValues() != 2)
+    return false;
+
+  auto *AddIncValue0 = dyn_cast<Instruction>(CntPhi->getIncomingValue(0));
+
+  // Check that the first incoming value is an add which is inside LoopBody.
+  if (!AddIncValue0 || AddIncValue0->getOpcode() != Instruction::Add ||
+      AddIncValue0->getParent() != LoopLatch)
+    return false;
+
+  auto *Int = dyn_cast<ConstantInt>(AddIncValue0->getOperand(1));
+
+  // Check that this add is a ++.
+  if (!Int || !Int->isOne())
+    return false;
+
+  auto *PhiIntValue1 = dyn_cast<PHINode>(CntPhi->getIncomingValue(1));
+
+  // Check that the incoming values comes from LoopHeader and LoopBody.
+  // Check that the second incoming value comes from LoopBody.
+  if (!PhiIntValue1 || PhiIntValue1->getParent() != LoopHeader)
+    return false;
+
+  // The phi must have a 0 value as first incoming value.
+  Int = dyn_cast<ConstantInt>(PhiIntValue1->getIncomingValue(0));
+  if (!Int || !Int->isZero())
+    return false;
+
+  // Step 2: Check that the first non phi instruction is a left shift.
+  Instruction *Shl = LoopHeader->getFirstNonPHI();
+
+  if (!Shl || Shl->getOpcode() != Instruction::Shl)
+    return false;
+
+  // The shift is done on 1 with i.
+  Int = dyn_cast<ConstantInt>(Shl->getOperand(0));
+  if (!Int || !Int->isOne())
+    return false;
+
+  // Step 3:
+  // Check that the instruction after the shift is an and which uses the result
+  // of the previous shift as one of its operands.
+  // Its other operand is the x which should be used as CTZ operand.
+  Instruction *And = Shl->getNextNode();
+  if (!And || And->getOpcode() != Instruction::And)
+    return false;
+
+  // And first operand is the result of the previous Shl.
+  if (And->getOperand(0) != Shl)
+    return false;
+
+  // And second operand is Val.
+  Val = And->getOperand(1);
+
+  return true;
+}

+/// Recognizes a count trailing zeros idiom in a non-countable loop.
+///
+/// If detected, transforms the relevant code to issue the cttz intrinsic
+/// function call, and returns true; otherwise, returns false.
+bool LoopIdiomRecognize::recognizeAndReplaceCTZ() {
+  // The loop should have two blocks:
+  // 1. A header.
+  // 2. A body where the counter is incremented.
+  // Give up if the loop has not 2 blocks or multiple backedges.
+  if (CurLoop->getNumBackEdges() != 1 || CurLoop->getNumBlocks() != 2)
+    return false;
+
+  // It should have a preheader containing nothing but an unconditional branch
+  // to the loop header.
+  BasicBlock *PH = CurLoop->getLoopPreheader();
+  if (!PH || &PH->front() != PH->getTerminator())
+    return false;
+  auto *EntryBI = dyn_cast<BranchInst>(PH->getTerminator());
+  if (!EntryBI || EntryBI->isConditional())
+    return false;
+
+  // The header counts minimum 5 instructions:
+  // land.rhs:
+  //   %i.07 = phi i32 [ 0, %entry ], [ %inc, %while.body ]
+  //   %0 = shl nuw i32 1, %i.07
+  //   %1 = and i32 %0, %x
+  //   %cmp1 = icmp eq i32 %1, 0
+  //   br i1 %cmp1, label %while.body, label %while.end
+  const ssize_t IdiomCanonicalSize = 5;
+  BasicBlock *LoopHeader = CurLoop->getHeader();
+
+  if (LoopHeader->sizeWithoutDebug() < IdiomCanonicalSize)
+    return false;
+
+  // The loop should have only one exit block.
+  BasicBlock *LoopExit = CurLoop->getUniqueExitBlock();
+
+  if (!LoopExit)
+    return false;
+
+  PHINode *CntPhi;
+  Value *Val;
+  if (!detectCTZIdiom(CurLoop, LoopHeader, LoopExit, CntPhi, Val))
+    return false;
+
+  transformLoopToCTZ(CurLoop, CntPhi, Val);
+  return true;
+}

+/// Create a cttz intrinsic instruction ready to be used with is_zero_undef set
+/// to true.
+static CallInst *createCtzIntrinsic(IRBuilder<> &IRBuilder, Value *Val,
+                                    const DebugLoc &DL) {
+  // The cttz intrinsic takes 2 arguments:
+  // 1. src to count the trailing zeros from. In this case src is Val.
+  // 2. is_zero_undef, a boolean, if set cttz returns src type size in bits when
+  // src is 0 or undef otherwise. For this call, we hardcode true.
+  Value *Ops[] = {Val, IRBuilder.getTrue()};
+
+  Module *M = IRBuilder.GetInsertBlock()->getParent()->getParent();
+  Function *Func =
+      Intrinsic::getDeclaration(M, Intrinsic::cttz, Val->getType());
+  CallInst *CI = IRBuilder.CreateCall(Func, Ops);
+  CI->setDebugLoc(DL);
+
+  return CI;
+}

+/// Replace loop with a cttz intrinsic.
+void LoopIdiomRecognize::transformLoopToCTZ(Loop *CurLoop, PHINode *CntPhi,
+                                            Value *Val) {
+  // Step 1: Create the cttz intrinsic.
+  IRBuilder<> Builder(CntPhi);
+  const DebugLoc &DL = CntPhi->getDebugLoc();
+  Instruction *Ctz = createCtzIntrinsic(Builder, Val, DL);
+
+  // Step 2: Replace the phi by Ctz everywhere it appears.
+  CntPhi->replaceAllUsesWith(Ctz);
+  CntPhi->eraseFromParent();
+
+  // Step 3: Forget the loop.
+  SE->forgetLoop(CurLoop);
+}

 /// Recognize CTLZ or CTTZ idiom in a non-countable loop and convert the loop
 /// to countable (with CTLZ / CTTZ trip count). If CTLZ / CTTZ inserted as a new
 /// trip count returns true; otherwise, returns false.
 bool LoopIdiomRecognize::recognizeAndInsertFFS() {
   // Give up if the loop has multiple blocks or multiple backedges.
   if (CurLoop->getNumBackEdges() != 1 || CurLoop->getNumBlocks() != 1)
     return false;

[1,082 lines not shown]

llvm/test/Transforms/LoopIdiom/cttz.ll

This file was added.

; RUN: opt -loop-idiom < %s -S | FileCheck %s

; To recognize this pattern:
; int cttz(unsigned int x) {
;   unsigned int i = 0;
;   while(i < 32 && (((x >> i) & 0x1) == 0))
;     i++;
;   return i;
; }
;
; CHECK: entry
; CHECK: llvm.cttz.i32
; CHECK: ret
define i32 @cttz32(i32 %x) norecurse nounwind readnone uwtable {
entry:
  br label %land.rhs

land.rhs:                                         ; preds = %entry, %while.body
  %i.06 = phi i32 [ 0, %entry ], [ %inc, %while.body ]
  %0 = shl nuw i32 1, %i.06
  %1 = and i32 %0, %x
  %cmp1 = icmp eq i32 %1, 0
  br i1 %cmp1, label %while.body, label %while.end

while.body:                                       ; preds = %land.rhs
  %inc = add nuw nsw i32 %i.06, 1
  %cmp = icmp ult i32 %i.06, 31
  br i1 %cmp, label %land.rhs, label %while.end

while.end:                                        ; preds = %while.body, %land.rhs
  %i.0.lcssa = phi i32 [ %inc, %while.body ], [ %i.06, %land.rhs ]
  ret i32 %i.0.lcssa
}

; To recognize this pattern:
; int cttz(unsigned long x) {
;   unsigned long i = 0;
;   while(i < 64 && (((x >> i) & 0x1) == 0))
;     i++;
;   return i;
; }
;
; CHECK: entry
; CHECK: llvm.cttz.i64
; CHECK: ret
define i32 @cttz64(i64 %x) nounwind uwtable readnone ssp {
entry:
  br label %land.rhs

land.rhs:                                         ; preds = %entry, %while.body
  %i.06 = phi i64 [ 0, %entry ], [ %inc, %while.body ]
  %0 = shl nuw i64 1, %i.06
  %1 = and i64 %0, %x
  %cmp1 = icmp eq i64 %1, 0
  br i1 %cmp1, label %while.body, label %while.end

while.body:                                       ; preds = %land.rhs
  %inc = add nuw nsw i64 %i.06, 1
  %cmp = icmp ult i64 %i.06, 63
  br i1 %cmp, label %land.rhs, label %while.end

while.end:                                        ; preds = %while.body, %land.rhs
  %i.0.lcssa = phi i64 [ %inc, %while.body ], [ %i.06, %land.rhs ]
  %conv = trunc i64 %i.0.lcssa to i32
  ret i32 %conv
}
No newline at end of file

This is an archive of the discontinued LLVM Phabricator instance.

[LoopIdiom] Replace cttz loop by call to cttz intrinsic.
Needs Review, Public
