This is an archive of the discontinued LLVM Phabricator instance.

Differential D29200

[JumpThread] Enhance finding partial redundant loads by continuing scanning single predecessor
ClosedPublic

Authored by junbuml on Jan 26 2017, 3:26 PM.

Details

Reviewers

rengolin
reames
davidxl
haicheng
mcrosier

Commits

rG180bc5a02170: [JumpThread] Enhance finding partial redundant loads by continuing scanning…
rL293896: [JumpThread] Enhance finding partial redundant loads by continuing scanning…

Summary

While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor.
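The idea in the summary can be sketched outside LLVM with a toy CFG. Everything below is an assumption made for illustration: `Block`, `scanBlock`, `availableViaSinglePreds`, and the two `demo*` helpers are hypothetical names, instructions are plain strings, and the budget accounting is simplified relative to the patch. It only shows the shape of the walk: scan a predecessor bottom-up, and if the scan reaches the top of the block without a hit, keep going into the unique predecessor while the shared instruction budget lasts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a basic block: instruction names in program order plus
// the list of CFG predecessors. This is not the LLVM class.
struct Block {
  std::vector<std::string> insts;
  std::vector<const Block *> preds;

  // Loosely mirrors the idea behind BasicBlock::getSinglePredecessor().
  const Block *singlePred() const {
    return preds.size() == 1 ? preds.front() : nullptr;
  }
};

// Scan `bb` bottom-up for `target`, visiting at most `budget` instructions.
// `scanned` accumulates visited instructions across calls; `reachedTop` is
// set only when the whole block was walked without a hit.
bool scanBlock(const Block &bb, const std::string &target, unsigned budget,
               unsigned &scanned, bool &reachedTop) {
  reachedTop = false;
  for (auto it = bb.insts.rbegin(); it != bb.insts.rend(); ++it) {
    if (budget == 0)
      return false; // budget exhausted before reaching the top
    --budget;
    ++scanned;
    if (*it == target)
      return true;
  }
  reachedTop = true;
  return false;
}

// Scan `pred`, then continue through the chain of single predecessors while
// the previous scan reached the top of its block and budget remains.
bool availableViaSinglePreds(const Block &pred, const std::string &target,
                             unsigned maxInsts) {
  unsigned scanned = 0;
  bool reachedTop = false;
  if (scanBlock(pred, target, maxInsts, scanned, reachedTop))
    return true;

  const Block *next = pred.singlePred();
  // The emptiness check guarantees progress in this toy model; real basic
  // blocks always end in a terminator, so they are never empty.
  while (reachedTop && next && !next->insts.empty() && scanned < maxInsts) {
    if (scanBlock(*next, target, maxInsts - scanned, scanned, reachedTop))
      return true;
    next = next->singlePred();
  }
  return false;
}

// Hypothetical CFG: entry (holds the load) is the only predecessor of cond1.
bool demoTwoHop(unsigned maxInsts) {
  Block entry{{"load P", "icmp", "br"}, {}};
  Block cond1{{"br"}, {&entry}};
  return availableViaSinglePreds(cond1, "load P", maxInsts);
}

// Hypothetical CFG: the scanned block has two predecessors, so the walk stops.
bool demoTwoPreds() {
  Block entry{{"load P", "br"}, {}};
  Block other{{"br"}, {}};
  Block merge{{"br"}, {&entry, &other}};
  return availableViaSinglePreds(merge, "load P", 6);
}
```

With a budget of 6, `demoTwoHop` finds the load by walking from `cond1` into `entry`; with a budget of 3 the scan runs out one instruction short. `demoTwoPreds` returns false because a block with two predecessors has no single predecessor to continue into.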

Event Timeline

junbuml created this revision. Jan 26 2017, 3:26 PM

junbuml added a reviewer: haicheng. Jan 27 2017, 8:09 AM

rengolin added inline comments. Jan 30 2017, 5:14 AM

lib/Transforms/Scalar/JumpThreading.cpp
1009	Why not just transform the single access above into the loop below? Why do you need both?
test/Transforms/JumpThreading/thread-loads.ll
312	Is this just the `phi`, or is there also a call to `fn2(%l2)`? If `c1/c3` is false and `c2` is true, then only `fn2` is called, not `fn3`, which means the new `cond3` block has to be only conditionalised via `c1` not `c2`, as in the original IR.
326	Isn't there just one hop here? `%l2 -> %l1`? I thought you were testing multiple predecessors.

Addressed Renato's comments.

junbuml added inline comments. Jan 30 2017, 1:17 PM

test/Transforms/JumpThreading/thread-loads.ll
312	If c1/c3 is false and c2 is true, only fn2() will be called, so the new cond2 branches unconditionally to %end after calling fn2(). I added CHECKs for the branch to %end in cond2 and for the call to fn2(%l2) in cond2. When %c1 is false, entry branches directly to %cond3; we have this CHECK in entry. So we only branch to cond2 from cond1 by %c1 when c1/c3 is false.
326	It's one hop from cond2 to entry, but cond2 -> cond1 -> entry is two hops. I also added another test which has three hops.

zzheng added a subscriber: zzheng. Jan 30 2017, 2:29 PM

zzheng added inline comments.

include/llvm/Analysis/Loads.h
89	Can we use Optional<unsigned &> NumScannedInst here?

junbuml added inline comments. Jan 31 2017, 7:38 AM

include/llvm/Analysis/Loads.h
89	Honestly, I don't know much about Optional<>, but it looks okay to use here. However, I believe we can make a separate patch to consider using Optional for the other parameters too, not just this one. Please let me know if we have to use Optional specifically for this parameter, unlike the other parameters.

rengolin added inline comments. Jan 31 2017, 8:17 AM

include/llvm/Analysis/Loads.h
89	I agree this sounds like something for a different patch.

zzheng added inline comments. Jan 31 2017, 8:41 AM

include/llvm/Analysis/Loads.h
89	I agree with you guys. I was only looking at the new parameter and its usage and thought it's appropriate to use Optional<>.
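For readers weighing the two styles discussed in this thread, here is a minimal standalone sketch. It uses `std::optional` from C++17 rather than LLVM's `Optional`, and all function names are hypothetical. One assumption worth stating: in C++17, `std::optional` cannot hold a reference type directly, so the sketch wraps the reference in `std::reference_wrapper`. Whether LLVM's `Optional` accepted `unsigned &` at the time is not something this sketch verifies.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Style used by the patch: a nullable out-parameter as a raw pointer.
unsigned scanWithPointer(unsigned n, unsigned *numScanned = nullptr) {
  for (unsigned i = 0; i < n; ++i)
    if (numScanned)
      ++(*numScanned); // only counted when the caller asked for it
  return n;
}

// Alternative: an optional reference, spelled with reference_wrapper because
// std::optional<unsigned &> is ill-formed in C++17.
using OptCounter = std::optional<std::reference_wrapper<unsigned>>;

unsigned scanWithOptional(unsigned n, OptCounter numScanned = std::nullopt) {
  for (unsigned i = 0; i < n; ++i)
    if (numScanned)
      ++numScanned->get(); // get() yields the referenced unsigned
  return n;
}

// Small helpers that run each variant and report the counter afterwards.
unsigned countViaPointer(unsigned n) {
  unsigned count = 0;
  scanWithPointer(n, &count);
  return count;
}

unsigned countViaOptional(unsigned n) {
  unsigned count = 0;
  scanWithOptional(n, std::ref(count));
  return count;
}
```

Both variants let callers omit the counter entirely. The pointer form matches the neighbouring `AA` and `IsLoadCSE` parameters, which is the consistency argument made in the thread.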

LGTM, thanks!

test/Transforms/JumpThreading/thread-loads.ll
326	Right, now it makes more sense. Thanks!

This revision is now accepted and ready to land. Jan 31 2017, 9:12 AM

Closed by commit rL293896: [JumpThread] Enhance finding partial redundant loads by continuing scanning… (authored by junbuml). Feb 2 2017, 7:24 AM

This revision was automatically updated to reflect the committed changes.

Why is the right answer here to improve jump threading and not PRE if you want to catch more "partially redundant" loads?

Additionally, why is this necessary at all?
-gvn already takes care of both of your testcases.

llvm/trunk/lib/Analysis/Loads.cpp
316	(On Diff #86809)	Scanned
348	(On Diff #86809)	Scanned
349	(On Diff #86809)	Scanned

Also, for the record, all of these testcases are fully redundant loads.

they are all of the form

load
if (...)
{
  load
}

This is a full redundancy. The initial load dominates the others, with nothing in between, and so they are fully redundant. This is also why every pass we have that does any load elimination will already eliminate them.

A partially redundant load would be

if (a)
  load a
else
  <nothing>
load a

This load is partially redundant because it unnecessarily happens twice when we take the true branch of if (a).

PRE will turn it into:

if (a)
 load a
else
 load a
result = phi(...)

As mentioned, GVN already does this transformation, so I'd really like to see a testcase where doing this in jump threading enables an optimization we don't already get.
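The before and after shapes in the comment above can be mimicked in ordinary C++ by counting loads along each path. This is only an illustration of the transformation's effect, not of how GVN or PRE is implemented; `Counter`, `load`, `beforePRE`, `afterPRE`, and the two `loads*` helpers are hypothetical names.

```cpp
#include <cassert>

// Counts every read of *p so the two versions can be compared.
struct Counter {
  int loads = 0;
};

int load(const int *p, Counter &c) {
  ++c.loads;
  return *p;
}

// Before PRE: when `a` is true, *p is loaded in the branch and again after it.
int beforePRE(bool a, const int *p, Counter &c) {
  if (a)
    (void)load(p, c); // first load, only on the true path
  return load(p, c);  // second load, executed on every path
}

// After PRE: each path performs exactly one load, and the merged value plays
// the role of the phi.
int afterPRE(bool a, const int *p, Counter &c) {
  int result;
  if (a)
    result = load(p, c);
  else
    result = load(p, c); // load inserted on the path that had none
  return result;         // result = phi(...)
}

// Helpers reporting how many loads each version executes for a given `a`.
int loadsBefore(bool a) {
  int x = 7;
  Counter c;
  beforePRE(a, &x, c);
  return c.loads;
}

int loadsAfter(bool a) {
  int x = 7;
  Counter c;
  afterPRE(a, &x, c);
  return c.loads;
}
```

On the true path the original shape performs two loads and the transformed shape performs one; on the false path both perform one. That path-dependent saving is what makes the load partially, rather than fully, redundant.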

Revision Contents

Path	Size
include/llvm/Analysis/Loads.h	3 lines
lib/Analysis/Loads.cpp	6 lines
lib/Transforms/Scalar/JumpThreading.cpp	23 lines
test/Transforms/JumpThreading/thread-loads.ll	38 lines

Diff 85973

include/llvm/Analysis/Loads.h

[79 lines hidden]
 /// location in memory, as opposed to the value operand of a store.
 ///
 /// \returns The found value, or nullptr if no value is found.
 Value *FindAvailableLoadedValue(LoadInst *Load,
                                 BasicBlock *ScanBB,
                                 BasicBlock::iterator &ScanFrom,
                                 unsigned MaxInstsToScan = DefMaxInstsToScan,
                                 AliasAnalysis *AA = nullptr,
-                                bool *IsLoadCSE = nullptr);
+                                bool *IsLoadCSE = nullptr,
+                                unsigned *NumScanedInst = nullptr);

 }

 #endif

lib/Analysis/Loads.cpp

[306 lines hidden; context: llvm::DefMaxInstsToScan("available-load-scan-limit", cl::init(6), cl::Hidden,]
     cl::desc("Use this to specify the default maximum number of instructions "
              "to scan backward from a given instruction, when searching for "
              "available loaded value"));

 Value *llvm::FindAvailableLoadedValue(LoadInst *Load,
                                       BasicBlock *ScanBB,
                                       BasicBlock::iterator &ScanFrom,
                                       unsigned MaxInstsToScan,
-                                      AliasAnalysis *AA, bool *IsLoadCSE) {
+                                      AliasAnalysis *AA, bool *IsLoadCSE,
+                                      unsigned *NumScanedInst) {
   if (MaxInstsToScan == 0)
     MaxInstsToScan = ~0U;

   Value *Ptr = Load->getPointerOperand();
   Type *AccessTy = Load->getType();

   // We can never remove a volatile load
   if (Load->isVolatile())
[15 lines hidden; context: while (ScanFrom != ScanBB->begin()) {]
     // would affect codegen).
     Instruction *Inst = &*--ScanFrom;
     if (isa<DbgInfoIntrinsic>(Inst))
       continue;

     // Restore ScanFrom to expected value in case next test succeeds
     ScanFrom++;

+    if (NumScanedInst)
+      ++(*NumScanedInst);
+
     // Don't scan huge blocks.
     if (MaxInstsToScan-- == 0)
       return nullptr;

     --ScanFrom;
     // If this is a load of Ptr, the loaded value is available.
     // (This is true even if the load is volatile or atomic, although
     // those cases are unlikely.)
[70 lines hidden]

lib/Transforms/Scalar/JumpThreading.cpp

[993 lines hidden; context: bool JumpThreadingPass::SimplifyPartiallyRedundantLoad(LoadInst *LI) {]
   // block. Check to see if it is available in any of the predecessor blocks.
   for (BasicBlock *PredBB : predecessors(LoadBB)) {
     // If we already scanned this predecessor, skip it.
     if (!PredsScanned.insert(PredBB).second)
       continue;

     // Scan the predecessor to see if the value is available in the pred.
     BBIt = PredBB->end();
-    Value *PredAvailable = FindAvailableLoadedValue(LI, PredBB, BBIt,
-                                                    DefMaxInstsToScan,
-                                                    nullptr,
-                                                    &IsLoadCSE);
+    unsigned NumScanedInst = 0;
+    Value *PredAvailable =
+        FindAvailableLoadedValue(LI, PredBB, BBIt, DefMaxInstsToScan, nullptr,
+                                 &IsLoadCSE, &NumScanedInst);
+
+    // If PredBB has a single predecessor, continue scanning through the single
+    // precessor.
+    BasicBlock *CurPredBB = PredBB;
+    BasicBlock *SinglePredPredBB = PredBB->getSinglePredecessor();
+    while (!PredAvailable && BBIt == CurPredBB->begin() && SinglePredPredBB &&
+           NumScanedInst < DefMaxInstsToScan) {
+      BBIt = SinglePredPredBB->end();
+      PredAvailable = FindAvailableLoadedValue(
+          LI, SinglePredPredBB, BBIt, (DefMaxInstsToScan - NumScanedInst),
+          nullptr, &IsLoadCSE, &NumScanedInst);
+      CurPredBB = SinglePredPredBB;
+      SinglePredPredBB = SinglePredPredBB->getSinglePredecessor();
+    }

     if (!PredAvailable) {
       OneUnavailablePred = PredBB;
       continue;
     }

     if (IsLoadCSE)
       CSELoads.push_back(cast<LoadInst>(PredAvailable));

[1,010 lines hidden]

test/Transforms/JumpThreading/thread-loads.ll

[296 lines hidden]
 ret1:
   ret void

 ret2:
   %xxx = tail call i32 (...) @f1() nounwind
   ret void
 }

+define i32 @fn_SinglePred(i1 %c2,i64* %P) {
+; CHECK-LABEL: @fn_SinglePred
+; CHECK-LABEL: entry:
+; CHECK: %[[L1:.*]] = load i64, i64* %P
+; CHECK: br i1 %c, label %cond3, label %cond1
+; CHECK-LABEL: cond2:
+; CHECK-NOT: load
+; CHECK: %l2 = phi i64 [ %[[L1]], %cond1 ]
+; CHECK-LABEL: cond3:
+; CHECK: call void @fn2(i64 %l1)
+; CHECK: call void @fn3(i64 %l1)
+
+entry:
+  %l1 = load i64, i64* %P
+  %c = icmp eq i64 %l1, 0
+  br i1 %c, label %cond2, label %cond1
+
+cond1:
+  br i1 %c2, label %cond2, label %end
+
+cond2:
+  %l2 = load i64, i64* %P
+  call void @fn2(i64 %l2)
+  %c3 = icmp eq i64 %l2, 0
+  br i1 %c3, label %cond3, label %end
+
+cond3:
+  call void @fn3(i64 %l2)
+  br label %end
+
+end:
+  ret i32 0
+}
+
+declare void @fn2(i64)
+declare void @fn3(i64)
+
+
 !0 = !{!3, !3, i64 0}
 !1 = !{!"omnipotent char", !2}
 !2 = !{!"Simple C/C++ TBAA"}
 !3 = !{!"int", !1}
 !4 = !{ i32 0, i32 1 }
 !5 = !{ i32 8, i32 10 }
 !6 = !{!6}
 !7 = !{!7, !6}
 !8 = !{!8, !6}
 !9 = !{!7}
 !10 = !{!8}