This is an archive of the discontinued LLVM Phabricator instance.


Differential D71563

[SCEV] Recognise the hardwareloop "loop.decrement.reg" intrinsic
ClosedPublic

Authored by SjoerdMeijer on Dec 16 2019, 12:01 PM.


Details

Reviewers

samparker
fhahn
reames
nikic
sanjoy.google
apilipenko

Commits

rG67bf9a6154d4: [SVEV] Recognise hardware-loop intrinsic loop.decrement.reg

Summary

Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same semantics as a sub expression. The generic HardwareLoop pass introduced three intrinsics to model hardware loops in IR; @loop.decrement.reg is the one used to update the loop induction variable.
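To make the "same semantics as a sub" claim concrete, here is a minimal, self-contained sketch. It does not use LLVM: `loopDecrementReg`, `addRecAt` and `closedFormMatches` are hypothetical helpers that model the intrinsic as a wrapping 32-bit subtraction and compare the iterated value against the closed form of an add-recurrence, which is roughly the kind of expression SCEV can build once it sees through the intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for @llvm.loop.decrement.reg: modelled here as a
// plain (wrapping) subtraction of its two operands.
inline uint32_t loopDecrementReg(uint32_t counter, uint32_t step) {
  return counter - step;
}

// Closed form of the add-recurrence {start - step, +, -step} at iteration n,
// evaluated with wrapping 32-bit arithmetic.
inline uint32_t addRecAt(uint32_t start, uint32_t step, uint32_t n) {
  return start - step - n * step;
}

// Returns true if repeatedly applying the decrement agrees with the closed
// form for the first `iterations` loop iterations.
inline bool closedFormMatches(uint32_t start, uint32_t step,
                              uint32_t iterations) {
  uint32_t counter = start;
  for (uint32_t n = 0; n < iterations; ++n) {
    counter = loopDecrementReg(counter, step);
    if (counter != addRecAt(start, step, n))
      return false;
  }
  return true;
}

// Example checks; call this from main() in a real program.
inline void checkLoopDecrementModel() {
  assert(loopDecrementReg(100, 1) == 99);
  assert(closedFormMatches(100, 1, 100));
}
```

The point of the sketch is only that a value updated by a constant subtraction has a simple closed form per iteration, so an analysis that treats the intrinsic as a sub can reason about it like any other induction variable.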

Teaching SCEV about @loop.decrement.reg means we can also use SCEV for hardware loops. For example, we would like to rematerialize the loop iteration count value in loop exit blocks for hardware loops, and this change allows us to do exactly that. This will enable us to remove any use of loop iteration counts in the hardware loop. I will follow up shortly to further support this (but that will be more ARM specific).

The "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus, while hardware loops and their trip counts now become analysable by SCEV, this attribute prevents the usual loop transformations from being applied to hardware loops, which is what we want. I have added test cases for loop unrolling, IndVarSimplify, and LFTR to show this.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

SjoerdMeijer created this revision. Dec 16 2019, 12:01 PM

llvm.loop.decrement.reg is not documented in the langref. Is this a target specific thing?

Also, is the entire gamut of optimizations this enables OK? For instance, would it be okay to LFTR replace the trip count of the loop such that it doesn't use the loop_decrement_reg intrinsic anymore?

Thanks for taking a look!

llvm.loop.decrement.reg is not documented in the langref. Is this a target specific thing?

No, it's a target independent thing, and is generated by the generic Hardwareloop pass. It is defined here:

https://github.com/llvm/llvm-project/blob/ff07fc66d9eef577f3b44716f72e581a18cd9ac9/llvm/include/llvm/IR/Intrinsics.td#L1304

Also, is the entire gamut of optimizations this enables OK? For instance, would it be okay to LFTR replace the trip count of the loop such that it doesn't use the loop_decrement_reg intrinsic anymore?

Hmmm, I don't have a good answer for that yet. First, just to be clear, the llvm.loop.decrement.reg intrinsic is exactly the same as a sub expression, so I think SCEV analysis functions should be okay. But yes, I can imagine that SCEV transformations that modify the loop update expression could potentially be problematic. This is a bit of unexplored territory for me, so first I need to look into what exactly LFTR does. So, I will need to do some homework first, but if we assume for a moment that there will be problematic cases with SCEV transformations modifying the loop update expression (if they ignore the loop.decrement intrinsic when one is present), does that make this approach completely unfeasible, or would you still see possibilities to use the SCEV analysis functions?

LFTR is quite picky, it only handles add/sub/getelementptr instructions in the addrec. So at least that particular case shouldn't pose a problem.

In D71563#1786529, @SjoerdMeijer wrote:

Thanks for taking a look!

llvm.loop.decrement.reg is not documented in the langref. Is this a target specific thing?

No, it's a target independent thing, and is generated by the generic Hardwareloop pass. It is defined here:

https://github.com/llvm/llvm-project/blob/ff07fc66d9eef577f3b44716f72e581a18cd9ac9/llvm/include/llvm/IR/Intrinsics.td#L1304

Also, is the entire gamut of optimizations this enables OK? For instance, would it be okay to LFTR replace the trip count of the loop such that it doesn't use the loop_decrement_reg intrinsic anymore?

Hmmm, I don't have a good answer for that yet. First, just to be clear, the llvm.loop.decrement.reg intrinsic is exactly the same as a sub expression, so I think SCEV analysis functions should be okay. But yes, I can imagine that SCEV transformations that modify the loop update expression could potentially be problematic. This is a bit of unexplored territory for me, so first I need to look into what exactly LFTR does. So, I will need to do some homework first, but if we assume for a moment that there will be problematic cases with SCEV transformations modifying the loop update expression (if they ignore the loop.decrement intrinsic when one is present), does that make this approach completely unfeasible, or would you still see possibilities to use the SCEV analysis functions?

I think the right mental model is that SCEV's users can rewrite the loop exit condition in arbitrary ways once SCEV can "understand" it. It may not be so bad in practice, but this is how I think about it.

Thanks again for sharing your thoughts and comments. I've done a bit of my homework:

LFTR is quite picky, it only handles add/sub/getelementptr instructions in the addrec. So at least that particular case shouldn't pose a problem.

That's one way of phrasing it :-) The other way is that LFTR analyses expressions, and gracelessly bails if it doesn't understand the expression. So yes, I also don't see a problem here.

I think the right mental model is that SCEV's users can rewrite the loop exit condition in arbitrary ways once SCEV can "understand" it. It may not be so bad in practice, but this is how I think about it.

Yep, got it, cheers.

What do we think? Is this something we could add to SCEV?
If there are more places to investigate or change, I am of course more than willing to do this.

It's currently the backend's responsibility to lower and handle any of the intrinsics that are inserted for hardware loops, so I don't think we should be concerned with other transforms triggering. I think the benefit of SCEV still working for these loops far outweighs the codegen effort, especially since this effort already has to be done because of isel and machine optimisations. That said, it probably wouldn't hurt to add some tests for common transforms, especially ones where the loop body would be duplicated.

I have added test cases for LFTR and loop unrolling showing that these transformations don't trigger, as int_loop_decrement_reg is described as IntrNoDuplicate.
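As a rough illustration of why a no-duplicate marking keeps the unroller away: unrolling copies the loop body, so a transformation that honours the marking has to refuse any loop that contains such a call. The sketch below is a toy model under that assumption. `Instr`, `noDuplicate` and `canDuplicateBody` are hypothetical names for illustration, not LLVM's API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: a name plus a flag saying the instruction must not be
// duplicated (the role IntrNoDuplicate plays for the intrinsic).
struct Instr {
  std::string name;
  bool noDuplicate;
};

// A body-duplicating transform (such as unrolling) is only allowed when no
// instruction in the loop body carries the no-duplicate flag.
inline bool canDuplicateBody(const std::vector<Instr> &body) {
  return std::none_of(body.begin(), body.end(),
                      [](const Instr &I) { return I.noDuplicate; });
}

// Example checks; call this from main() in a real program.
inline void checkNoDuplicateModel() {
  std::vector<Instr> hardwareLoop = {{"load", false},
                                     {"store", false},
                                     {"loop.decrement.reg", true}};
  std::vector<Instr> plainLoop = {{"load", false},
                                  {"store", false},
                                  {"sub", false}};
  assert(!canDuplicateBody(hardwareLoop)); // unrolling is refused
  assert(canDuplicateBody(plainLoop));     // unrolling may proceed
}
```

Analysis is unaffected by the flag in this model: only transforms that copy the body need to consult it.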


I reckon something that would consistently drive the unroller, like a triple, should be added to that unroll test. Otherwise, LGTM.

This revision is now accepted and ready to land. Jan 9 2020, 8:45 AM

Thanks, and I will add that before committing.

Closed by commit rG67bf9a6154d4: [SVEV] Recognise hardware-loop intrinsic loop.decrement.reg (authored by SjoerdMeijer). Jan 10 2020, 1:36 AM

This revision was automatically updated to reflect the committed changes.

SCEV code looks fine, no comment on the expected semantics of the intrinsic.

llvm/lib/Analysis/ScalarEvolution.cpp
4512	This might be slightly cleaner using m_Intrinsic pattern match, but it's minor at best and a subjective judgement call. Take your pick.
4513	Rather than a single-case switch, use an if please.

SjoerdMeijer mentioned this in rG07028b5a8780: [SCEV] Follow up of D71563: addressing post commit comment. NFC. Jan 13 2020, 1:02 AM

Thanks, and comment addressed in rG07028b5a8780.
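For readers who want to see what the "use an if" suggestion amounts to, here is a minimal, self-contained sketch of that shape. It deliberately avoids LLVM's headers: `IntrinsicID`, `Opcode`, `IntrinsicCall`, `BinaryOp` and `matchLoopDecrement` are hypothetical stand-ins invented for illustration, and the actual follow-up commit is not shown on this page and may differ.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the LLVM types involved in the review.
enum class IntrinsicID { loop_decrement_reg, other };
enum class Opcode { Add, Sub };

struct IntrinsicCall {
  IntrinsicID id;
  int lhs; // first operand (the counter)
  int rhs; // second operand (the decrement)
};

struct BinaryOp {
  Opcode opcode;
  int lhs;
  int rhs;
};

// Map a loop.decrement.reg call onto a binary Sub. A single comparison
// replaces the one-case switch from the reviewed diff.
inline std::optional<BinaryOp> matchLoopDecrement(const IntrinsicCall &call) {
  if (call.id == IntrinsicID::loop_decrement_reg)
    return BinaryOp{Opcode::Sub, call.lhs, call.rhs};
  return std::nullopt;
}

// Example checks; call this from main() in a real program.
inline void checkMatchLoopDecrement() {
  IntrinsicCall dec{IntrinsicID::loop_decrement_reg, 10, 1};
  std::optional<BinaryOp> matched = matchLoopDecrement(dec);
  assert(matched.has_value());
  assert(matched->opcode == Opcode::Sub);

  IntrinsicCall unrelated{IntrinsicID::other, 10, 1};
  assert(!matchLoopDecrement(unrelated).has_value());
}
```

With only one intrinsic to recognise, the `if` keeps the control flow flat; a switch would only start to pay off if more intrinsics were mapped to binary operators.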

Revision Contents

Path                                                          Size
llvm/lib/Analysis/ScalarEvolution.cpp                         11 lines
llvm/test/Transforms/IndVarSimplify/lftr.ll                   29 lines
llvm/test/Transforms/LoopUnroll/ARM/dont-unroll-loopdec.ll    40 lines
llvm/unittests/Analysis/ScalarEvolutionTest.cpp               28 lines

Diff 237266

llvm/lib/Analysis/ScalarEvolution.cpp

[4,501 lines above not shown]
 } // end anonymous namespace

 /// Try to map \p V into a BinaryOp, and return \c None on failure.
 static Optional<BinaryOp> MatchBinaryOp(Value *V, DominatorTree &DT) {
   auto *Op = dyn_cast<Operator>(V);
   if (!Op)
     return None;

+  // Recognise intrinsic loop.decrement.reg, and as this has exactly the same
+  // semantics as a Sub, return a binary sub expression.
+  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
+    switch (II->getIntrinsicID()) {
+    case Intrinsic::loop_decrement_reg:
+      return BinaryOp(Instruction::Sub, II->getOperand(0), II->getOperand(1));
+    default:
+      return None;
+    }
+  }
+
   // Implementation detail: all the cleverness here should happen without
   // creating new SCEV expressions -- our caller knowns tricks to avoid creating
   // SCEV expressions when possible, and we should not break that.

   switch (Op->getOpcode()) {
   case Instruction::Add:
   case Instruction::Sub:
   case Instruction::Mul:
[8,085 lines below not shown]

llvm/test/Transforms/IndVarSimplify/lftr.ll

[147 lines above not shown; enclosing context: loop:]
   %i2 = mul i32 %i, %i
   %c = icmp ule i32 %i2, 1000
   br i1 %c, label %loop, label %loopexit

 loopexit:
   ret i32 %i
 }

+define i32 @quadratic_sgt_loopdec() {
+; CHECK-LABEL: @quadratic_sgt_loopdec(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[I:%.*]] = phi i32 [ 10, [[ENTRY:%.*]] ], [ [[I_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[I_NEXT]] = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 [[I]], i32 1)
+; CHECK-NEXT: store i32 [[I]], i32* @A
+; CHECK-NEXT: [[I2:%.*]] = mul i32 [[I]], [[I]]
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp sgt i32 [[I2]], 0
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[LOOPEXIT:%.*]]
+; CHECK: loopexit:
+; CHECK-NEXT: ret i32 0
+
+entry:
+  br label %loop
+
+loop:
+  %i = phi i32 [ 10, %entry ], [ %i.next, %loop ]
+  %i.next = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %i, i32 1)
+  store i32 %i, i32* @A
+  %i2 = mul i32 %i, %i
+  %c = icmp sgt i32 %i2, 0
+  br i1 %c, label %loop, label %loopexit
+
+loopexit:
+  ret i32 %i
+}
+
 @data = common global [240 x i8] zeroinitializer, align 16

 define void @test_zext(i8* %a) #0 {
 ; CHECK-LABEL: @test_zext(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
[460 lines not shown; enclosing context: for.body29:]
   %and = and i1 %cmp, %cmp
   br i1 %and, label %for.body29, label %exit

 exit:
   ret void
 }

+
+declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32)

llvm/test/Transforms/LoopUnroll/ARM/dont-unroll-loopdec.ll

This file was added.

; RUN: opt -mtriple=thumbv8.1m.main -mattr=+mve.fp -loop-unroll --loop-unroll -S < %s | FileCheck %s

; CHECK-LABEL: foo
; CHECK: 5:
; CHECK: 6: ; preds = %6, %5
; CHECK: 15: ; preds = %6
; CHECK: br label %16
; CHECK: 16: ; preds = %15, %3
; CHECK: ret void
; CHECK: }

define void @foo(i8* nocapture, i8* nocapture readonly, i32) {
  %4 = icmp sgt i32 %2, 0
  br i1 %4, label %5, label %16

; <label>:5:
  br label %6

; <label>:6:
  %7 = phi i32 [ %13, %6 ], [ %2, %5 ]
  %8 = phi i8* [ %10, %6 ], [ %1, %5 ]
  %9 = phi i8* [ %12, %6 ], [ %0, %5 ]
  %10 = getelementptr inbounds i8, i8* %8, i32 1
  %11 = load i8, i8* %8, align 1
  %12 = getelementptr inbounds i8, i8* %9, i32 1
  store i8 %11, i8* %9, align 1

  %13 = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %7, i32 1)

  %14 = icmp sgt i32 %7, 1
  br i1 %14, label %6, label %15

; <label>:15:
  br label %16

; <label>:16:
  ret void
}

declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32)

llvm/unittests/Analysis/ScalarEvolutionTest.cpp

[1,677 lines above not shown; enclosing context: TEST_F(ScalarEvolutionsTest, SCEVExpanderShlNSW) {]

   checkOneCase("define void @f(i8* %arrayidx) { "
                " %1 = load i8, i8* %arrayidx "
                " %2 = and i8 %1, -128 "
                " ret void "
                "} ");
 }

+TEST_F(ScalarEvolutionsTest, SCEVLoopDecIntrinsic) {
+  LLVMContext C;
+  SMDiagnostic Err;
+  std::unique_ptr<Module> M = parseAssemblyString(
+      "define void @foo(i32 %N) { "
+      "entry: "
+      " %cmp3 = icmp sgt i32 %N, 0 "
+      " br i1 %cmp3, label %for.body, label %for.cond.cleanup "
+      "for.cond.cleanup: "
+      " ret void "
+      "for.body: "
+      " %i.04 = phi i32 [ %inc, %for.body ], [ 100, %entry ] "
+      " %inc = call i32 @llvm.loop.decrement.reg.i32.i32.i32(i32 %i.04, i32 1) "
+      " %exitcond = icmp ne i32 %inc, 0 "
+      " br i1 %exitcond, label %for.cond.cleanup, label %for.body "
+      "} "
+      "declare i32 @llvm.loop.decrement.reg.i32.i32.i32(i32, i32) ",
+      Err, C);
+
+  ASSERT_TRUE(M && "Could not parse module?");
+  ASSERT_TRUE(!verifyModule(*M) && "Must have been well formed!");
+
+  runWithSE(*M, "foo", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
+    auto *ScevInc = SE.getSCEV(getInstructionByName(F, "inc"));
+    EXPECT_TRUE(isa<SCEVAddRecExpr>(ScevInc));
+  });
+}
+
 TEST_F(ScalarEvolutionsTest, SCEVComputeConstantDifference) {
   LLVMContext C;
   SMDiagnostic Err;
   std::unique_ptr<Module> M = parseAssemblyString(
       "define void @foo(i32 %sz, i32 %pp) { "
       "entry: "
       " %v0 = add i32 %pp, 0 "
       " %v3 = add i32 %pp, 3 "
[237 lines below not shown]