This is an archive of the discontinued LLVM Phabricator instance.

Differential D109844

[DSE] Track earliest escape, use for loads in isReadClobber.
ClosedPublic

Authored by fhahn on Sep 15 2021, 12:23 PM.

Details

Reviewers

nikic
xbolva00
asbirlea

Commits

rG5ce89279c098: [DSE] Track earliest escape, use for loads in isReadClobber.

Summary

At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Context-sensitive escape queries in isReadClobber were removed a while
ago in d1a1cce5b130 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B if A dominates B. If two escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses can be
analyzed, the earliest escape is set to the first instruction in the
function entry block.
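
The merge rule described above can be sketched on a toy dominator tree. This is only an illustration under stated assumptions, not the code from the patch: blocks are plain integers, `idom` is a hypothetical immediate-dominator array (block 0 is the entry), and `Inst`, `kTerminator` and `mergeEarliest` are made-up names.

```cpp
#include <vector>

// Toy model (hypothetical, not LLVM API): an instruction is a
// (block, index) pair; kTerminator marks the block terminator.
struct Inst {
  int block;
  int index;
};
constexpr int kTerminator = 1 << 30;

// Depth of block b in the dominator tree; the entry is its own idom.
static int depth(const std::vector<int> &idom, int b) {
  int d = 0;
  while (idom[b] != b) {
    b = idom[b];
    ++d;
  }
  return d;
}

// True if block a dominates block b (every block dominates itself).
static bool dominates(const std::vector<int> &idom, int a, int b) {
  while (true) {
    if (a == b)
      return true;
    if (idom[b] == b)
      return false;
    b = idom[b];
  }
}

// Nearest common dominator of two blocks.
static int nearestCommonDominator(const std::vector<int> &idom, int a, int b) {
  int da = depth(idom, a), db = depth(idom, b);
  while (da > db) { a = idom[a]; --da; }
  while (db > da) { b = idom[b]; --db; }
  while (a != b) { a = idom[a]; b = idom[b]; }
  return a;
}

// Merge a newly seen capture into the current earliest capture.
static Inst mergeEarliest(const std::vector<int> &idom, Inst earliest,
                          Inst cur) {
  if (earliest.block == cur.block)
    return cur.index < earliest.index ? cur : earliest;
  if (dominates(idom, earliest.block, cur.block))
    return earliest; // already earlier than the current use
  if (dominates(idom, cur.block, earliest.block))
    return cur;
  // Neither dominates: fall back to the common dominator's terminator.
  return {nearestCommonDominator(idom, earliest.block, cur.block),
          kTerminator};
}
```

For a diamond where blocks 1 and 2 both capture and both have block 0 as immediate dominator, merging the two captures yields the terminator of block 0.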

If the query instruction dominates the earliest escape and is not in a
cycle, then the pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.
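
As a source-level illustration of the case described above, consider the following sketch. The function and the `escape` helper are hypothetical, and nothing here claims that a particular compiler version removes the store; it only shows the shape of the code the paragraph is about.

```cpp
// Hypothetical sink; stands in for any call that captures a pointer.
static int *g_sink = nullptr;
static void escape(int *p) { g_sink = p; }

// `slot` holds a pointer that is loaded inside the function.
int readThenEscape(int **slot) {
  int local = 1;       // (1) candidate dead store, overwritten at (3)
  int *loaded = *slot; // pointer obtained by a load
  int v = *loaded;     // (2) read through the loaded pointer
  local = 2;           // (3) overwrites (1)
  escape(&local);      // first escape of `local`, after (2)
  return v;
}
```

At (2) the address of `local` has not yet been stored anywhere, so the loaded pointer cannot refer to it. The read at (2) therefore does not need to be treated as a read of the store at (1).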

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,

NewPM-O3: +0.05%
NewPM-ReleaseThinLTO: +0.05%
NewPM-ReleaseLTO-g: +0.03%

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Sep 15 2021, 12:23 PM

Herald added a subscriber: hiraditya. Sep 15 2021, 12:23 PM

fhahn requested review of this revision. Sep 15 2021, 12:23 PM

Herald added a project: Restricted Project. Sep 15 2021, 12:23 PM

Harbormaster completed remote builds in B124060: Diff 372771. Sep 15 2021, 12:24 PM

A note on the compile-time numbers in the description: they also include using the cache to handle the case where UseInst is a call, which fixes PR50220.

rnk added a subscriber: rnk. Sep 15 2021, 1:17 PM

aeubanks added a subscriber: aeubanks. Sep 15 2021, 1:24 PM

fhahn mentioned this in D109907: [DSE] Use cached escape info for calls. Sep 16 2021, 12:09 PM

The changes to also handle calls are available in D109907

Thanks for this, changes look reasonable to me.

IIUC, there will be a +0.30% in tramp3d with this, but without the caching it would be +0.51%?

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
901	Does it make sense to use SmallDenseMap?
1271	Could you name either "MayBeCapturedBefore" or return the negated condition and rename "MayNotBeCapturedBefore"?
1277	If this can be `nullptr` (per the return condition below), shouldn't inserting into `Inst2Obj` be conditioned on it being non null?
1283	Negated condition looks more intuitive to me: `return Iter.first->second && DT.properlyDominates(I, Iter.first->second) && !isPotentiallyReachable(I->getNextNode(), I, nullptr, &DT, &LI);`
1762	Remove `DeadInst` from `Inst2Obj`.

Passing-by remark:
rGd1a1cce5b130630df0c821e8cafe5f683ccccb90 mentions not being cost-beneficial,
this mentions cost, but does not say anything about benefit.

Addressed comments, thanks!

In D109844#3005193, @lebedev.ri wrote:

Passing-by remark:
rGd1a1cce5b130630df0c821e8cafe5f683ccccb90 mentions not being cost-beneficial,
this mentions cost, but does not say anything about benefit.

Good point! With -O3 -flto MultiSource/SPEC2006/SPEC2017 there are a few
improvements (~200 more stores removed):

Metric: dse.NumFastStores

Program
 test-suite...urce/Applications/lua/lua.test    10.00   11.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   675.00  694.00   2.8%
 test-suite...510.parest_r/510.parest_r.test   4527.00 4646.00  2.6%
 test-suite...lancbmk_r/523.xalancbmk_r.test   965.00  971.00   0.6%
 test-suite.../Benchmarks/Bullet/bullet.test   1183.00 1190.00  0.6%
 test-suite...:: External/Povray/povray.test   346.00  348.00   0.6%
 test-suite...006/453.povray/453.povray.test   540.00  543.00   0.6%
 test-suite...511.povray_r/511.povray_r.test   552.00  555.00   0.5%
 test-suite...7rate/502.gcc_r/502.gcc_r.test   1381.00 1388.00  0.5%
 test-suite...6.blender_r/526.blender_r.test   3488.00 3504.00  0.5%
 test-suite.../CINT2000/252.eon/252.eon.test   2118.00 2126.00  0.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   542.00  544.00   0.4%

Also, @rnk mentioned seeing plenty of stores not being removed in Chrome + auto-init in PR50220. Perhaps they could let us know the impact of the change + D109907. And this also fixes a regression compared to legacy DSE.

Harbormaster completed remote builds in B124397: Diff 373224. Sep 17 2021, 8:06 AM

fhahn added inline comments. Sep 17 2021, 8:08 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1271	I changed it to `notCapturedBefore` & added a doc-comment.
1277	If it is `nullptr` this means there is no capturing instruction. I added an early exit below.
1283	Updated and inverted the function name.
1762	Done, thanks!

fhahn added a subscriber: Gerolf. Sep 17 2021, 8:09 AM

fhahn mentioned this in D109978: [CaptureTracking] Allow passing LI to PointerMayBeCapturedBefore (NFC). Sep 17 2021, 9:51 AM

asbirlea added inline comments. Sep 17 2021, 2:49 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1277	I meant adding this if condition to not call insert with a nullptr key: `if (Iter.first->second) { auto Ins = Inst2Obj.insert({Iter.first->second, {}}); Ins.first->second.push_back(Object); }`
1283	I believe `DT.dominates(I, Iter.first->second) && I != Iter.first->second` is congruent with `DT.properlyDominates(I, Iter.first->second)`.

Thanks for putting this together!

In D109844#3006260, @fhahn wrote:

Also, @rnk mentioned seeing plenty of stores not being removed in Chrome + auto-init in PR50220. Perhaps they could let us know the impact of the change + D109907. And this also fixes a regression compared to legacy DSE.

I probably won't be able to do any benchmarking, but if anyone is interested, you can probably run any benchmark set you like with -ftrivial-auto-var-init=pattern before and after this patch and check how many extra stores are removed. I strongly suspect it will help on code patterns where initialization is separate from declaration:

struct Foo v;
...
v.x = 0;
...
escape(&v);

Even if this pattern isn't prevalent in common benchmark suites, I think it's pretty important to users of these security hardening flags.

nikic added inline comments. Sep 19 2021, 1:48 PM

llvm/include/llvm/Analysis/CaptureTracking.h
70	nit: cannot -> can
llvm/lib/Analysis/CaptureTracking.cpp
178	This looks incorrect to me. Why would reachability between two captures matter? The way I'm reading this, if you have code like `if (...) { capture1(); } else { capture2(); }` outside a loop, then one of the captures will be ignored and we'll compute an incorrect earliest capture (which should be before the if in this case). Possibly this check should be `DT.dominates(EarliestCapture, I)`? Though not sure it's necessary to explicitly check this at all.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1272	notCaptureBeforeOrAt maybe? Not important for the load case, but is for calls.
1278	`/**/` for booleans
1287	May be worth noting that the `dominates()` check is an approximation of `!isPotentiallyReachable`, presumably for compile-time reasons. Specifically, it misses this case, where the code-paths are disjoint, but non-dominating: `if (...) { capture(); return; } I;`
1321	Use `isIdentifiedFunctionLocal()` and `isEscapeSource()` here? Or else note that this is a compile-time optimization. I think we should be at least using isEscapeSource() though.
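
The disjoint-but-non-dominating case from the comment on line 1287 can be reproduced on a toy CFG. The sketch below is only an illustration with assumed names: block 0 holds the `if`, block 1 holds `capture(); return;`, and block 2 holds the query instruction `I`. Neither helper is an LLVM API; both are naive searches written for this example.

```cpp
#include <vector>

using CFG = std::vector<std::vector<int>>; // successors per block

// True if `to` can be reached from `from` along CFG edges
// (a block trivially reaches itself in this toy model).
static bool reachable(const CFG &cfg, int from, int to) {
  std::vector<bool> seen(cfg.size(), false);
  std::vector<int> work{from};
  while (!work.empty()) {
    int b = work.back();
    work.pop_back();
    if (b == to)
      return true;
    if (seen[b])
      continue;
    seen[b] = true;
    for (int s : cfg[b])
      work.push_back(s);
  }
  return false;
}

// True if every path from entry block 0 to `b` passes through `a`,
// i.e. `b` becomes unreachable from the entry once `a` is removed.
static bool dominates(const CFG &cfg, int a, int b) {
  if (a == b)
    return true;
  std::vector<bool> seen(cfg.size(), false);
  std::vector<int> work{0};
  while (!work.empty()) {
    int cur = work.back();
    work.pop_back();
    if (cur == a || seen[cur])
      continue;
    if (cur == b)
      return false;
    seen[cur] = true;
    for (int s : cfg[cur])
      work.push_back(s);
  }
  return true;
}
```

With `CFG cfg = {{1, 2}, {}, {}}`, `dominates(cfg, 2, 1)` is false, so a dominance-based check cannot prove anything about `I`, while `reachable(cfg, 1, 2)` is also false, which shows that the capture can never execute before `I`.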

fhahn mentioned this in rG7f6a4826ac49: [CaptureTracking] Allow passing LI to PointerMayBeCapturedBefore (NFC). Sep 20 2021, 1:18 AM

Address latest comments. This patch changes the capture tracking check to use dominance instead of reachability and the DSE check to only use reachability. This slightly increases compile-time geomeans by ~+0.01% but also increases the number of stores removed.

http://llvm-compile-time-tracker.com/compare.php?from=92904cc68fbc1d000387b30accc8b05b3fe95daa&to=86d4647eca500205d5871e6faa0a55ea698f2277&stat=instructions

llvm/lib/Analysis/CaptureTracking.cpp
178	> This looks incorrect to me.
	Yep, this is not correct. I couldn't come up with a problematic test case with the earlier version of the patch due to the dominance checks in DSE. The latest version has the logic flipped now, as in dominance check here and reachability check only in DSE.
	> Possibly this check should be DT.dominates(EarliestCapture, I)? Though not sure it's necessary to explicitly check this at all.
	Yes, this should check for dominance. We could skip the check, but then we would fail to skip exploring branches early, I think.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
901	I think we can have a substantial number of tracked escapes, but I am not sure what a good cut-off is for choosing SmallDenseMap vs DenseMap.
1277	Ok got it, thanks! Updated the code.
1283	The dominance check is gone now.
1287	Yes, the original motivation was to have `dominates` as a quick proxy check to skip the more expensive `isPotentiallyReachable` check at the expense of precision. But in the latest version, `isPotentiallyReachable` isn't used during capture-tracking analysis and using `isPotentiallyReachable` here directly only leads to a small increase in compile-time (geomeans increase by ~0.01%). The number of removed stores increases noticeably.
1321	Good point, I'll update it to use `isIdentifiedFunctionLocal` . It shouldn't be much more expensive in the grand scheme of things.

Harbormaster completed remote builds in B124674: Diff 373601. Sep 20 2021, 8:33 AM

fhahn mentioned this in rG963d3a22b34d: [DSE] Add additional tests to cover review comments. Sep 20 2021, 9:07 AM

fhahn mentioned this in D110094: [DSE] Use notCapturedBeforeOrAt for isInvisibleToCallerBeforeRet (WIP). Sep 20 2021, 11:58 AM

nikic added inline comments. Sep 20 2021, 12:05 PM

llvm/lib/Analysis/CaptureTracking.cpp
178	As implemented now, I don't think the dominance check makes sense: You're calling isSafeToPrune() from captured(), which will effectively do the same check trying to determine the common dominator. With the current code structure, I'd suggest dropping it (and the whole isSafeToPrune method really). What you probably have in mind is doing the check in shouldExplore() instead. That will cut off the search early, in exchange for a dominance check for each (non-capturing) use. In the CapturesBefore tracking, not checking reachability for each use was a major compile-time improvement, but possibly the dominance check is cheap enough that skipping extra use exploration is a better tradeoff. If you do want to move it into shouldExplore(), I think you'll need to drop your consistency assertions with PointerMayBeCapturedBefore(), as they could disagree in edge-cases, where you would not hit the use limit because some are discarded early.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1321	What about using isEscapeSource() in place of the `isa<LoadInst>` check?

Drop isSafeToPrune.

fhahn added inline comments. Sep 20 2021, 12:17 PM

llvm/lib/Analysis/CaptureTracking.cpp
178	Right, I removed `isSafeToPrune` altogether. I guess we can look into using the dominance check in `shouldExplore` as a potential follow-up.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1321	I can take a look at that, but I'll need to make it accessible first. Happy to do that here in this patch or as a follow-up.

Harbormaster completed remote builds in B124731: Diff 373684. Sep 20 2021, 12:50 PM

Rebased on top of recent DSE changes. I think all comments should be addressed now :)

Harbormaster completed remote builds in B125102: Diff 374204. Sep 22 2021, 6:43 AM

LGTM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1292	It might be worthwhile to add a block reachability cache in the future (possibly that would help with D110094?)
1321	A followup is fine with me.
llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
127	To check my understanding, it would be fine to optimize this case right? That is, what we actually care about is not an escape before the load (`%in.lv.2`), but before the instruction producing the loaded value (`%in.lv.1`). That is, our context instruction could be `ReadUO` rather than `UseInst`. If that's correct, then we might be able to sink this new functionality into AA proper (e.g. via a BatchAA flag for more expensive capture analysis). I previously thought this wasn't possible because alias() does not have a context instruction, but with this we don't actually need one.
134	Isn't this test case the same as `test_captured_and_clobbered_before_load_same_bb_1`?

This revision is now accepted and ready to land. Sep 22 2021, 10:38 AM

fhahn marked 3 inline comments as done. Sep 23 2021, 4:44 AM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1292	Sounds good!
llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
127	We should be able to use `ReadUO`; once we loaded the object, the escapes cannot change it.
	> then we might be able to sink this new functionality into AA proper (e.g. via a BatchAA flag for more expensive capture analysis)
	Sounds good. Do you think it would be worthwhile to also move the caching of the escapes to BatchAA? Are there any users of BatchAA you would be interested in (other than DSE)?
134	yes, it should use `escape_writeonly` instead of `escape_and_clobber` and call it before the first load. I'll fix that in the committed version.

This revision was landed with ongoing or failed builds. Sep 23 2021, 4:51 AM

Closed by commit rG5ce89279c098: [DSE] Track earliest escape, use for loads in isReadClobber. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn marked 3 inline comments as done.

fhahn added a commit: rG5ce89279c098: [DSE] Track earliest escape, use for loads in isReadClobber.

nikic added inline comments. Sep 23 2021, 1:03 PM

llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
127	A quick prototype for moving to AA: https://github.com/llvm/llvm-project/commit/8e3a5699f206ae36c513a01f953d758ee93e7f54 This has some compile-time impact, not sure how much is just from the dropped LoopInfo: https://llvm-compile-time-tracker.com/compare.php?from=1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff&to=8e3a5699f206ae36c513a01f953d758ee93e7f54&stat=instructions (Note that this implicitly enables the call capture part of the change.) I think we need some different way to expose the capture cache as a dependency though. I don't really like shoving more datastructures directly in AAQueryInfo, especially conditionally used ones. It would be nicer to pass in some CaptureInfo object that can then be backed either by a simple or by a reachability based implementation.
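
One way to picture the suggestion above (a capture-information object with interchangeable backends) is the following toy sketch. All names are hypothetical and objects and instructions are plain integers; it is not the interface that was later added to LLVM, only an illustration of the design idea under those assumptions.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>

using ObjectId = int;
using InstId = int;
constexpr InstId kNeverCaptured = -1;

// Interface an alias query could consult (hypothetical).
struct ToyCaptureInfo {
  virtual ~ToyCaptureInfo() = default;
  // True if `obj` is known not to be captured before or at `at`.
  virtual bool notCapturedBeforeOrAt(ObjectId obj, InstId at) = 0;
};

// Simple backend: context-insensitive, ignores the query position.
class SimpleToyCaptureInfo final : public ToyCaptureInfo {
public:
  explicit SimpleToyCaptureInfo(std::function<bool(ObjectId)> mayBeCaptured)
      : mayBeCaptured_(std::move(mayBeCaptured)) {}
  bool notCapturedBeforeOrAt(ObjectId obj, InstId) override {
    auto it = cache_.find(obj);
    if (it == cache_.end())
      it = cache_.emplace(obj, mayBeCaptured_(obj)).first;
    return !it->second;
  }

private:
  std::function<bool(ObjectId)> mayBeCaptured_;
  std::unordered_map<ObjectId, bool> cache_;
};

// Reachability-based backend: caches the earliest capture per object
// and asks a reachability oracle whether it can execute before `at`.
class EarliestToyCaptureInfo final : public ToyCaptureInfo {
public:
  EarliestToyCaptureInfo(std::function<InstId(ObjectId)> findEarliest,
                         std::function<bool(InstId, InstId)> reaches)
      : findEarliest_(std::move(findEarliest)), reaches_(std::move(reaches)) {}
  bool notCapturedBeforeOrAt(ObjectId obj, InstId at) override {
    auto it = cache_.find(obj);
    if (it == cache_.end())
      it = cache_.emplace(obj, findEarliest_(obj)).first;
    if (it->second == kNeverCaptured)
      return true;
    return at != it->second && !reaches_(it->second, at);
  }

private:
  std::function<InstId(ObjectId)> findEarliest_;
  std::function<bool(InstId, InstId)> reaches_;
  std::unordered_map<ObjectId, InstId> cache_;
};
```

The caller picks a backend once and passes it by reference, so the query code itself does not need to carry conditionally used data structures.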

fhahn added inline comments. Sep 23 2021, 2:10 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1292	It looks like caching reachability for a pair of blocks helps a little bit, but not too much. It does soften the block of D110094 a bit more, but it only increases the total number of stores removed by 0.04% on my test set (SPEC2006/SPEC2017/MultiSource) Interestingly, it increases compile-time for `consumer-typeset` and `kimwitu++`. Perhaps there's a better way to cache it. http://llvm-compile-time-tracker.com/compare.php?from=a4964cf36bb0f777d82d6a1025435bbd9f67bc4f&to=3d62e94919f0786904633c21cbd66866e1e06ef6&stat=instructions https://github.com/llvm/llvm-project/commit/3d62e94919f0786904633c21cbd66866e1e06ef6
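
A cache of reachability results for pairs of blocks, as discussed above, could look roughly like the sketch below. It is not the prototype from the linked commit: the class name, the integer block ids and the naive search are all assumptions made for illustration, and the cache is only valid as long as the CFG does not change.

```cpp
#include <map>
#include <utility>
#include <vector>

// Memoizes "can block `from` reach block `to`" on a fixed toy CFG.
class BlockReachabilityCache {
public:
  explicit BlockReachabilityCache(std::vector<std::vector<int>> successors)
      : successors_(std::move(successors)) {}

  bool reaches(int from, int to) {
    const std::pair<int, int> key(from, to);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      ++hits_; // answered from the cache, no graph walk needed
      return it->second;
    }
    const bool result = search(from, to);
    cache_.emplace(key, result);
    return result;
  }

  int hits() const { return hits_; }

private:
  // Plain depth-first search; a block trivially reaches itself here.
  bool search(int from, int to) const {
    std::vector<bool> seen(successors_.size(), false);
    std::vector<int> worklist{from};
    while (!worklist.empty()) {
      int block = worklist.back();
      worklist.pop_back();
      if (block == to)
        return true;
      if (seen[block])
        continue;
      seen[block] = true;
      for (int succ : successors_[block])
        worklist.push_back(succ);
    }
    return false;
  }

  std::vector<std::vector<int>> successors_;
  std::map<std::pair<int, int>, bool> cache_;
  int hits_ = 0;
};
```

Whether such a cache pays off depends on how often the same block pair is queried, which is consistent with the mixed compile-time results reported above.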

nikic mentioned this in D110368: [AA] Move earliest escape tracking from DSE to AA. Sep 23 2021, 2:28 PM

FYI, we're seeing compiler crashes that look like they're likely from this change: https://bugs.chromium.org/p/chromium/issues/detail?id=1252762

I haven't been able to make a reduced repro case yet.

thakis added a reverting change: rGdf56fc6ebbee: Revert "[DSE] Track earliest escape, use for loads in isReadClobber.". Sep 24 2021, 6:58 AM

Reduced repro: https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6

Reverted in df56fc6ebbe for now.

In D109844#3020675, @thakis wrote:

Reduced repro: https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6

Reverted in df56fc6ebbe for now.

Thanks for the report! The issue was that the earliest escape was a ptrtoint which gets removed later. The code to invalidate the cached earliest escape only considered removed memory instructions. I recommitted the patch in 6f28fb708149 with a fix to check whether we need to invalidate the cache for any deleted instruction.
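
The invalidation issue described above can be sketched with a toy cache that keeps a reverse map from the escaping instruction to the objects whose cached result depends on it. The two maps loosely mirror `EarliestEscapes` and `Inst2Obj` from the diff, but the types, names and integer ids are assumptions made for this example.

```cpp
#include <unordered_map>
#include <vector>

using ObjectId = int;
using InstId = int;
constexpr InstId kNoCapture = -1;

struct ToyEscapeCache {
  // object -> cached earliest escaping instruction
  std::unordered_map<ObjectId, InstId> earliestEscape;
  // escaping instruction -> objects whose cache entry refers to it
  std::unordered_map<InstId, std::vector<ObjectId>> instToObjects;

  void record(ObjectId obj, InstId inst) {
    earliestEscape[obj] = inst;
    if (inst != kNoCapture)
      instToObjects[inst].push_back(obj);
  }

  // Must run for every deleted instruction, not only for memory
  // instructions: the earliest escape may be, e.g., a ptrtoint.
  void onInstructionDeleted(InstId inst) {
    auto it = instToObjects.find(inst);
    if (it == instToObjects.end())
      return;
    for (ObjectId obj : it->second)
      earliestEscape.erase(obj); // forces recomputation on next query
    instToObjects.erase(it);
  }

  bool hasEntry(ObjectId obj) const { return earliestEscape.count(obj) != 0; }
};
```

If the deletion hook is skipped for some instruction kinds, a stale entry keeps pointing at a deleted instruction, which matches the failure mode reported above.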

nikic mentioned this in rGba664d906644: [AA] Move earliest escape tracking from DSE to AA. Sep 25 2021, 1:47 PM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/CaptureTracking.h	14 lines
llvm/lib/Analysis/CaptureTracking.cpp	76 lines
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp	57 lines
llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll	65 lines

Diff 374508

llvm/include/llvm/Analysis/CaptureTracking.h

Show All 17 Lines
namespace llvm {		namespace llvm {

class Value;		class Value;
class Use;		class Use;
class DataLayout;		class DataLayout;
class Instruction;		class Instruction;
class DominatorTree;		class DominatorTree;
class LoopInfo;		class LoopInfo;
		class Function;

/// getDefaultMaxUsesToExploreForCaptureTracking - Return default value of		/// getDefaultMaxUsesToExploreForCaptureTracking - Return default value of
/// the maximal number of uses to explore before giving up. It is used by		/// the maximal number of uses to explore before giving up. It is used by
/// PointerMayBeCaptured family analysis.		/// PointerMayBeCaptured family analysis.
unsigned getDefaultMaxUsesToExploreForCaptureTracking();		unsigned getDefaultMaxUsesToExploreForCaptureTracking();

/// PointerMayBeCaptured - Return true if this pointer value may be captured		/// PointerMayBeCaptured - Return true if this pointer value may be captured
/// by the enclosing function (which is required to exist). This routine can		/// by the enclosing function (which is required to exist). This routine can
Show All 24 Lines	namespace llvm {
/// is zero, a default value is assumed.		/// is zero, a default value is assumed.
bool PointerMayBeCapturedBefore(const Value *V, bool ReturnCaptures,		bool PointerMayBeCapturedBefore(const Value *V, bool ReturnCaptures,
bool StoreCaptures, const Instruction *I,		bool StoreCaptures, const Instruction *I,
const DominatorTree *DT,		const DominatorTree *DT,
bool IncludeI = false,		bool IncludeI = false,
unsigned MaxUsesToExplore = 0,		unsigned MaxUsesToExplore = 0,
const LoopInfo *LI = nullptr);		const LoopInfo *LI = nullptr);

		// Returns the 'earliest' instruction that captures \p V in \F. An instruction
		// A is considered earlier than instruction B, if A dominates B. If 2 escapes
		// do not dominate each other, the terminator of the common dominator is
		// chosen. If not all uses can be analyzed, the earliest escape is set to
		// the first instruction in the function entry block. If \p V does not escape,
		// nullptr is returned. Note that the caller of the function has to ensure
		// that the instruction the result value is compared against is not in a
		// cycle.
		Instruction *FindEarliestCapture(const Value *V, Function &F,
		                                 bool ReturnCaptures, bool StoreCaptures,
		                                 const DominatorTree &DT,
		                                 unsigned MaxUsesToExplore = 0);

/// This callback is used in conjunction with PointerMayBeCaptured. In		/// This callback is used in conjunction with PointerMayBeCaptured. In
/// addition to the interface here, you'll need to provide your own getters		/// addition to the interface here, you'll need to provide your own getters
/// to see whether anything was captured.		/// to see whether anything was captured.
struct CaptureTracker {		struct CaptureTracker {
virtual ~CaptureTracker();		virtual ~CaptureTracker();

/// tooManyUses - The depth of traversal has breached a limit. There may be		/// tooManyUses - The depth of traversal has breached a limit. There may be
/// capturing instructions that will not be passed into captured().		/// capturing instructions that will not be passed into captured().
Show All 37 Lines

llvm/lib/Analysis/CaptureTracking.cpp

Show First 20 Lines • Show All 137 Lines • ▼ Show 20 Lines	struct CapturesBefore : public CaptureTracker {

bool ReturnCaptures;		bool ReturnCaptures;
bool IncludeI;		bool IncludeI;

bool Captured;		bool Captured;

const LoopInfo *LI;		const LoopInfo *LI;
};		};

		/// Find the 'earliest' instruction before which the pointer is known not to
		/// be captured. Here an instruction A is considered earlier than instruction
		/// B, if A dominates B. If 2 escapes do not dominate each other, the
		/// terminator of the common dominator is chosen. If not all uses cannot be
		/// analyzed, the earliest escape is set to the first instruction in the
		/// function entry block.
		// NOTE: Users have to make sure instructions compared against the earliest
		// escape are not in a cycle.
		struct EarliestCaptures : public CaptureTracker {

		  EarliestCaptures(bool ReturnCaptures, Function &F, const DominatorTree &DT)
		      : DT(DT), ReturnCaptures(ReturnCaptures), Captured(false), F(F) {}

		  void tooManyUses() override {
		    Captured = true;
		    EarliestCapture = &*F.getEntryBlock().begin();
		  }

		  bool captured(const Use *U) override {
		    Instruction *I = cast<Instruction>(U->getUser());
		    if (isa<ReturnInst>(I) && !ReturnCaptures)
		      return false;

		    if (!EarliestCapture) {
		      EarliestCapture = I;
		    } else if (EarliestCapture->getParent() == I->getParent()) {
		      if (I->comesBefore(EarliestCapture))
		        EarliestCapture = I;
		    } else {
		      BasicBlock *CurrentBB = I->getParent();
		      BasicBlock *EarliestBB = EarliestCapture->getParent();
		      if (DT.dominates(EarliestBB, CurrentBB)) {
		        // EarliestCapture already comes before the current use.
		      } else if (DT.dominates(CurrentBB, EarliestBB)) {
		        EarliestCapture = I;
		      } else {
		        // Otherwise find the nearest common dominator and use its terminator.
		        auto *NearestCommonDom =
		            DT.findNearestCommonDominator(CurrentBB, EarliestBB);
		        EarliestCapture = NearestCommonDom->getTerminator();
		      }
		    }
		    Captured = true;

		    // Return false to continue analysis; we need to see all potential
		    // captures.
		    return false;
		  }

		  Instruction *EarliestCapture = nullptr;

		  const DominatorTree &DT;

		  bool ReturnCaptures;

		  bool Captured;

		  Function &F;
		};
}		}

/// PointerMayBeCaptured - Return true if this pointer value may be captured		/// PointerMayBeCaptured - Return true if this pointer value may be captured
/// by the enclosing function (which is required to exist). This routine can		/// by the enclosing function (which is required to exist). This routine can
/// be expensive, so consider caching the results. The boolean ReturnCaptures		/// be expensive, so consider caching the results. The boolean ReturnCaptures
/// specifies whether returning the value (or part of it) from the function		/// specifies whether returning the value (or part of it) from the function
/// counts as capturing it or not. The boolean StoreCaptures specified whether		/// counts as capturing it or not. The boolean StoreCaptures specified whether
/// storing the value (or part of it) into memory anywhere automatically		/// storing the value (or part of it) into memory anywhere automatically
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	bool llvm::PointerMayBeCapturedBefore(const Value *V, bool ReturnCaptures,
PointerMayBeCaptured(V, &CB, MaxUsesToExplore);		PointerMayBeCaptured(V, &CB, MaxUsesToExplore);
if (CB.Captured)		if (CB.Captured)
++NumCapturedBefore;		++NumCapturedBefore;
else		else
++NumNotCapturedBefore;		++NumNotCapturedBefore;
return CB.Captured;		return CB.Captured;
}		}

		Instruction *llvm::FindEarliestCapture(const Value *V, Function &F,
		                                       bool ReturnCaptures, bool StoreCaptures,
		                                       const DominatorTree &DT,
		                                       unsigned MaxUsesToExplore) {
		  assert(!isa<GlobalValue>(V) &&
		         "It doesn't make sense to ask whether a global is captured.");

		  EarliestCaptures CB(ReturnCaptures, F, DT);
		  PointerMayBeCaptured(V, &CB, MaxUsesToExplore);
		  if (CB.Captured)
		    ++NumCapturedBefore;
		  else
		    ++NumNotCapturedBefore;
		  return CB.EarliestCapture;
		}

void llvm::PointerMayBeCaptured(const Value *V, CaptureTracker *Tracker,		void llvm::PointerMayBeCaptured(const Value *V, CaptureTracker *Tracker,
unsigned MaxUsesToExplore) {		unsigned MaxUsesToExplore) {
assert(V->getType()->isPointerTy() && "Capture is for pointers only!");		assert(V->getType()->isPointerTy() && "Capture is for pointers only!");
if (MaxUsesToExplore == 0)		if (MaxUsesToExplore == 0)
MaxUsesToExplore = DefaultMaxUsesToExplore;		MaxUsesToExplore = DefaultMaxUsesToExplore;

SmallVector<const Use *, 20> Worklist;		SmallVector<const Use *, 20> Worklist;
Worklist.reserve(getDefaultMaxUsesToExploreForCaptureTracking());		Worklist.reserve(getDefaultMaxUsesToExploreForCaptureTracking());
▲ Show 20 Lines • Show All 189 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

Show All 32 Lines
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/CaptureTracking.h"		#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"		#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/MustExecute.h"		#include "llvm/Analysis/MustExecute.h"
▲ Show 20 Lines • Show All 843 Lines • ▼ Show 20 Lines	struct DSEState {
// Post-order numbers for each basic block. Used to figure out if memory		// Post-order numbers for each basic block. Used to figure out if memory
// accesses are executed before another access.		// accesses are executed before another access.
DenseMap<BasicBlock *, unsigned> PostOrderNumbers;		DenseMap<BasicBlock *, unsigned> PostOrderNumbers;

/// Keep track of instructions (partly) overlapping with killing MemoryDefs per		/// Keep track of instructions (partly) overlapping with killing MemoryDefs per
/// basic block.		/// basic block.
DenseMap<BasicBlock *, InstOverlapIntervalsTy> IOLs;		DenseMap<BasicBlock *, InstOverlapIntervalsTy> IOLs;

		DenseMap<const Value *, Instruction *> EarliestEscapes;
		DenseMap<Instruction *, TinyPtrVector<const Value *>> Inst2Obj;

DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,		DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,
PostDominatorTree &PDT, const TargetLibraryInfo &TLI,		PostDominatorTree &PDT, const TargetLibraryInfo &TLI,
const LoopInfo &LI)		const LoopInfo &LI)
: F(F), AA(AA), BatchAA(AA), MSSA(MSSA), DT(DT), PDT(PDT), TLI(TLI),		: F(F), AA(AA), BatchAA(AA), MSSA(MSSA), DT(DT), PDT(PDT), TLI(TLI),
DL(F.getParent()->getDataLayout()), LI(LI) {}		DL(F.getParent()->getDataLayout()), LI(LI) {}

static DSEState get(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,		static DSEState get(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
DominatorTree &DT, PostDominatorTree &PDT,		DominatorTree &DT, PostDominatorTree &PDT,
(351 lines not shown)	if (MaybeTermLoc->second) {
return BatchAA.isMustAlias(TermLoc.Ptr, LocUO);		return BatchAA.isMustAlias(TermLoc.Ptr, LocUO);
}		}
int64_t InstWriteOffset = 0;		int64_t InstWriteOffset = 0;
int64_t DepWriteOffset = 0;		int64_t DepWriteOffset = 0;
return isOverwrite(MaybeTerm, AccessI, TermLoc, Loc, InstWriteOffset,		return isOverwrite(MaybeTerm, AccessI, TermLoc, Loc, InstWriteOffset,
DepWriteOffset) == OW_Complete;		DepWriteOffset) == OW_Complete;
}		}

		/// Returns true if \p Object is not captured before or by \p I.
		asbirlea: Could you name either "MayBeCapturedBefore" or return the negated condition and rename "MayNotBeCapturedBefore"?
		fhahn: I changed it to `notCapturedBefore` & added a doc-comment.
		bool notCapturedBeforeOrAt(const Value *Object, Instruction *I) {
		nikic: notCaptureBeforeOrAt maybe? Not important for the load case, but is for calls.
		if (!isIdentifiedFunctionLocal(Object))
		return false;

		auto Iter = EarliestEscapes.insert({Object, nullptr});
		if (Iter.second) {
		asbirlea: If this can be `nullptr` (per the return condition below), shouldn't inserting into `Inst2Obj` be conditioned on it being non null?
		fhahn: If it is `nullptr` this means there is no capturing instruction. I added an early exit below.
		asbirlea: I meant adding this if condition to not call insert with a nullptr key: `if(Iter.first->second) { auto Ins = Inst2Obj.insert({Iter.first->second, {}}); Ins.first->second.push_back(Object); }`
		fhahn: Ok got it, thanks! Updated the code.
		Instruction *EarliestCapture = FindEarliestCapture(
		nikic: `/**/` for booleans
		Object, F, /*ReturnCaptures=*/false, /*StoreCaptures=*/true, DT);
		if (EarliestCapture) {
		auto Ins = Inst2Obj.insert({EarliestCapture, {}});
		Ins.first->second.push_back(Object);
		}
		asbirlea: Negated condition looks more intuitive to me: `return Iter.first->second && DT.properlyDominates(I, Iter.first->second) && !isPotentiallyReachable(I->getNextNode(), I, nullptr, &DT, &LI);`
		fhahn: Updated and inverted the function name.
		asbirlea: I believe `DT.dominates(I, Iter.first->second) && I != Iter.first->second` is congruent with `DT.properlyDominates(I, Iter.first->second)`.
		fhahn: The dominance check is gone now.
		Iter.first->second = EarliestCapture;
		}

		// No capturing instruction.
		nikic: May be worth noting that the `dominates()` check is an approximation of `!isPotentiallyReachable`, presumably for compile-time reasons. Specifically, it misses this case, where the code-paths are disjoint, but non-dominating: `if (...) { capture(); return; } I;`
		fhahn: Yes, the original motivation was to have `dominates` as a quick proxy check to skip the more expensive `isPotentiallyReachable` check at the expense of precision. But in the latest version, `isPotentiallyReachable` isn't used during capture-tracking analysis, and using `isPotentiallyReachable` here directly only leads to a small increase in compile-time (geomeans increase by ~0.01%). The number of removed stores increases noticeably.
		if (!Iter.first->second)
		return true;

		return I != Iter.first->second &&
		!isPotentiallyReachable(Iter.first->second, I, nullptr, &DT, &LI);
		nikic: It might be worthwhile to add a block reachability cache in the future (possibly that would help with D110094?)
		fhahn: Sounds good!
		fhahn: It looks like caching reachability for a pair of blocks helps a little bit, but not too much. It does soften the block of D110094 a bit more, but it only increases the total number of stores removed by 0.04% on my test set (SPEC2006/SPEC2017/MultiSource). Interestingly, it increases compile-time for `consumer-typeset` and `kimwitu++`. Perhaps there's a better way to cache it. http://llvm-compile-time-tracker.com/compare.php?from=a4964cf36bb0f777d82d6a1025435bbd9f67bc4f&to=3d62e94919f0786904633c21cbd66866e1e06ef6&stat=instructions https://github.com/llvm/llvm-project/commit/3d62e94919f0786904633c21cbd66866e1e06ef6
		}
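The distinction nikic raises above, between a dominance check and a reachability check, can be sketched on a toy block-level CFG. This is only an illustration under simplifying assumptions: it does not use LLVM's `DominatorTree` or `isPotentiallyReachable`, and the `CFG` type and the helpers `reaches`, `dominates`, and `earlyReturnCFG` are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <vector>

// Toy CFG (assumption: not LLVM's representation). Block i has the successor
// list G[i]; block 0 is the entry block.
using CFG = std::vector<std::vector<int>>;

// Returns true if block To can be reached from block From along CFG edges,
// optionally pretending that block Removed does not exist.
bool reaches(const CFG &G, int From, int To, int Removed = -1) {
  if (From == Removed || To == Removed)
    return false;
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> Work;
  Work.push_back(From);
  Seen[From] = true;
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (B == To)
      return true;
    for (int S : G[B])
      if (S != Removed && !Seen[S]) {
        Seen[S] = true;
        Work.push_back(S);
      }
  }
  return false;
}

// A dominates B if B is reachable from the entry and every path from the
// entry to B passes through A.
bool dominates(const CFG &G, int A, int B) {
  if (A == B)
    return true;
  return reaches(G, 0, B) && !reaches(G, 0, B, A);
}

// CFG for `if (...) { capture(); return; } I;`
// Block 0: entry with the branch, block 1: capture() and return, block 2: I.
CFG earlyReturnCFG() { return CFG{{1, 2}, {}, {}}; }
```

On this CFG, `dominates(G, 2, 1)` is false, so a check that requires the query block to dominate the capturing block gives up. `reaches(G, 1, 2)` is also false, which shows that the capture can never execute before `I`; this is the extra precision the reachability-based check recovers.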

// Returns true if \p Use may read from \p DefLoc.		// Returns true if \p Use may read from \p DefLoc.
bool isReadClobber(const MemoryLocation &DefLoc, Instruction *UseInst) {		bool isReadClobber(const MemoryLocation &DefLoc, Instruction *UseInst) {
if (isNoopIntrinsic(UseInst))		if (isNoopIntrinsic(UseInst))
return false;		return false;

// Monotonic or weaker atomic stores can be re-ordered and do not need to be		// Monotonic or weaker atomic stores can be re-ordered and do not need to be
// treated as read clobber.		// treated as read clobber.
if (auto SI = dyn_cast<StoreInst>(UseInst))		if (auto SI = dyn_cast<StoreInst>(UseInst))
return isStrongerThan(SI->getOrdering(), AtomicOrdering::Monotonic);		return isStrongerThan(SI->getOrdering(), AtomicOrdering::Monotonic);

if (!UseInst->mayReadFromMemory())		if (!UseInst->mayReadFromMemory())
return false;		return false;

if (auto *CB = dyn_cast<CallBase>(UseInst))		if (auto *CB = dyn_cast<CallBase>(UseInst))
if (CB->onlyAccessesInaccessibleMemory())		if (CB->onlyAccessesInaccessibleMemory())
return false;		return false;

		// BasicAA does not spend linear time to check whether local objects escape
		// before potentially aliasing accesses. To improve DSE results, compute and
		// cache escape info for local objects in certain circumstances.
		if (auto *LI = dyn_cast<LoadInst>(UseInst)) {
		// If the load reads from a loaded underlying object, it cannot alias
		// DefLoc if DefUO is a local object that has not escaped before the
		// load.
		auto *ReadUO = getUnderlyingObject(LI->getPointerOperand());
		auto *DefUO = getUnderlyingObject(DefLoc.Ptr);
		if (DefUO && ReadUO && isa<LoadInst>(ReadUO) &&
		nikic: Use `isIdentifiedFunctionLocal()` and `isEscapeSource()` here? Or else note that this is a compile-time optimization. I think we should be at least using isEscapeSource() though.
		fhahn: Good point, I'll update it to use `isIdentifiedFunctionLocal`. It shouldn't be much more expensive in the grand scheme of things.
		nikic: What about using isEscapeSource() in place of the `isa<LoadInst>` check?
		fhahn: I can take a look at that, but I'll need to make it accessible first. Happy to do that here in this patch or as a follow-up.
		nikic: A followup is fine with me.
		notCapturedBeforeOrAt(DefUO, UseInst)) {
		assert(
		!PointerMayBeCapturedBefore(DefLoc.Ptr, false, true, UseInst, &DT,
		false, 0, &this->LI) &&
		"cached analysis disagrees with fresh PointerMayBeCapturedBefore");
		return false;
		}
		}

// NOTE: For calls, the number of stores removed could be slightly improved		// NOTE: For calls, the number of stores removed could be slightly improved
// by using AA.callCapturesBefore(UseInst, DefLoc, &DT), but that showed to		// by using AA.callCapturesBefore(UseInst, DefLoc, &DT), but that showed to
// be expensive compared to the benefits in practice. For now, avoid more		// be expensive compared to the benefits in practice. For now, avoid more
// expensive analysis to limit compile-time.		// expensive analysis to limit compile-time.
return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc));		return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc));
}		}

/// Returns true if a dependency between \p Current and \p KillingDef is		/// Returns true if a dependency between \p Current and \p KillingDef is
(409 lines not shown)	while (!NowDeadInsts.empty()) {
// Try to preserve debug information attached to the dead instruction.		// Try to preserve debug information attached to the dead instruction.
salvageDebugInfo(*DeadInst);		salvageDebugInfo(*DeadInst);
salvageKnowledge(DeadInst);		salvageKnowledge(DeadInst);

// Remove the Instruction from MSSA.		// Remove the Instruction from MSSA.
if (MemoryAccess *MA = MSSA.getMemoryAccess(DeadInst)) {		if (MemoryAccess *MA = MSSA.getMemoryAccess(DeadInst)) {
if (MemoryDef *MD = dyn_cast<MemoryDef>(MA)) {		if (MemoryDef *MD = dyn_cast<MemoryDef>(MA)) {
SkipStores.insert(MD);		SkipStores.insert(MD);

		// Clear any cached escape info for objects associated with the
		// removed instructions.
		auto Iter = Inst2Obj.find(DeadInst);
		if (Iter != Inst2Obj.end()) {
		for (const Value *Obj : Iter->second)
		EarliestEscapes.erase(Obj);
		asbirlea: Remove `DeadInst` from `Inst2Obj`.
		fhahn: Done, thanks!
		Inst2Obj.erase(DeadInst);
}		}
		}

Updater.removeMemoryAccess(MA);		Updater.removeMemoryAccess(MA);
}		}

auto I = IOLs.find(DeadInst->getParent());		auto I = IOLs.find(DeadInst->getParent());
if (I != IOLs.end())		if (I != IOLs.end())
I->second.erase(DeadInst);		I->second.erase(DeadInst);
// Remove its operands		// Remove its operands
for (Use &O : DeadInst->operands())		for (Use &O : DeadInst->operands())
(483 lines not shown)
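The cache bookkeeping in this file (`EarliestEscapes` plus the reverse map `Inst2Obj`, with entries dropped when a capturing instruction is deleted) can be modelled with standard containers. The following is a minimal sketch rather than the patch's code: `EscapeCache`, `lookup`, `instructionRemoved`, and `NoCapture` are hypothetical names, and strings and integers stand in for `const Value *` and `Instruction *`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Obj = std::string;   // stand-in for `const Value *` (assumption)
using Inst = int;          // stand-in for `Instruction *` (assumption)
const Inst NoCapture = -1; // plays the role of a nullptr "no capture" result

struct EscapeCache {
  std::unordered_map<Obj, Inst> EarliestEscapes;
  std::unordered_map<Inst, std::vector<Obj>> Inst2Obj;
  int Computations = 0; // counts cache misses, for illustration only

  // Returns the cached earliest capture for O, computing it on first use.
  template <typename ComputeFn> Inst lookup(const Obj &O, ComputeFn Compute) {
    auto Res = EarliestEscapes.insert(std::make_pair(O, NoCapture));
    if (Res.second) {
      ++Computations;
      Inst Earliest = Compute(O);
      // Only register a reverse mapping when there is a capturing
      // instruction, mirroring the review request not to insert a null key.
      if (Earliest != NoCapture)
        Inst2Obj[Earliest].push_back(O);
      Res.first->second = Earliest;
    }
    return Res.first->second;
  }

  // Drops cached results that depend on a deleted instruction.
  void instructionRemoved(Inst Dead) {
    auto It = Inst2Obj.find(Dead);
    if (It == Inst2Obj.end())
      return;
    for (const Obj &O : It->second)
      EarliestEscapes.erase(O);
    Inst2Obj.erase(It);
  }
};
```

A second `lookup` for the same object is served from the cache; after `instructionRemoved` is called with the capturing instruction, the next `lookup` recomputes the result.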

llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -passes='dse' -S %s | FileCheck %s		; RUN: opt -passes='dse' -S %s | FileCheck %s

declare void @escape_and_clobber(i32*)		declare void @escape_and_clobber(i32*)
declare void @escape_writeonly(i32*) writeonly		declare void @escape_writeonly(i32*) writeonly
declare void @clobber()		declare void @clobber()

define i32 @test_not_captured_before_load_same_bb(i32** %in.ptr) {		define i32 @test_not_captured_before_load_same_bb(i32** %in.ptr) {
; CHECK-LABEL: @test_not_captured_before_load_same_bb(		; CHECK-LABEL: @test_not_captured_before_load_same_bb(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
;		;
%a = alloca i32, align 4		%a = alloca i32, align 4
store i32 55, i32* %a		store i32 55, i32* %a
%in.lv.1 = load i32* , i32** %in.ptr, align 2		%in.lv.1 = load i32* , i32** %in.ptr, align 2
%in.lv.2 = load i32 , i32* %in.lv.1, align 2		%in.lv.2 = load i32 , i32* %in.lv.1, align 2
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @escape_and_clobber(i32* %a)		call void @escape_and_clobber(i32* %a)
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}
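For readers less used to LLVM IR, the test above corresponds roughly to the C++ below. This is a loose analog written for illustration, not something taken from the patch: in the test `@escape_and_clobber` is an opaque external declaration, whereas here it gets a made-up body (and a hypothetical global `LastEscaped`) so that the snippet is self-contained.

```cpp
#include <cassert>

// Hypothetical stand-in for the opaque external function in the test: it
// records the value it observes and then overwrites the object.
int LastEscaped = 0;
void escape_and_clobber(int *P) {
  LastEscaped = *P;
  *P = 0;
}

int not_captured_before_load(int **InPtr) {
  int A;
  A = 55;                 // dead store: overwritten below before any read of A
  int *Lv1 = *InPtr;      // %in.lv.1: a pointer loaded from memory
  int Lv2 = *Lv1;         // %in.lv.2: cannot read A, as A has not escaped yet
  A = 99;                 // killing store
  escape_and_clobber(&A); // first, and therefore earliest, escape of A
  return Lv2;
}
```

Because `A` only escapes at the final call, the loaded pointer `Lv1` cannot point to `A`, so the load through it is not a read-clobber and the first store can be removed, which is what the updated CHECK lines expect.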

define i32 @test_not_captured_before_load_same_bb_escape_unreachable_block(i32** %in.ptr) {		define i32 @test_not_captured_before_load_same_bb_escape_unreachable_block(i32** %in.ptr) {
; CHECK-LABEL: @test_not_captured_before_load_same_bb_escape_unreachable_block(		; CHECK-LABEL: @test_not_captured_before_load_same_bb_escape_unreachable_block(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
; CHECK: unreach:		; CHECK: unreach:
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: ret i32 0		; CHECK-NEXT: ret i32 0
(30 lines not shown)	;
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @clobber()		call void @clobber()
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_captured_after_load_same_bb_2_clobbered_later(i32** %in.ptr) {		define i32 @test_captured_after_load_same_bb_2_clobbered_later(i32** %in.ptr) {
; CHECK-LABEL: @test_captured_after_load_same_bb_2_clobbered_later(		; CHECK-LABEL: @test_captured_after_load_same_bb_2_clobbered_later(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: call void @clobber()		; CHECK-NEXT: call void @clobber()
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
;		;
%a = alloca i32, align 4		%a = alloca i32, align 4
(36 lines not shown)
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: call void @clobber()		; CHECK-NEXT: call void @clobber()
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
;		;
%a = alloca i32, align 4		%a = alloca i32, align 4
store i32 55, i32* %a		store i32 55, i32* %a
%in.lv.1 = load i32* , i32** %in.ptr, align 2		%in.lv.1 = load i32* , i32** %in.ptr, align 2
call void @escape_writeonly(i32* %a)		call void @escape_writeonly(i32* %a)
		nikic: To check my understanding, it would be fine to optimize this case, right? That is, what we actually care about is not an escape before the load (`%in.lv.2`), but before the instruction producing the loaded value (`%in.lv.1`). That is, our context instruction could be `ReadUO` rather than `UseInst`. If that's correct, then we might be able to sink this new functionality into AA proper (e.g. via a BatchAA flag for more expensive capture analysis). I previously thought this wasn't possible because alias() does not have a context instruction, but with this we don't actually need one.
		fhahn: We should be able to use `ReadUO`; once we loaded the object, the escapes cannot change it. In reply to "then we might be able to sink this new functionality into AA proper (e.g. via a BatchAA flag for more expensive capture analysis)": Sounds good. Do you think it would be worth it to also move the caching of the escapes to BatchAA? Are there any users of BatchAA you would be interested in (other than DSE)?
		nikic: A quick prototype for moving to AA: https://github.com/llvm/llvm-project/commit/8e3a5699f206ae36c513a01f953d758ee93e7f54 This has some compile-time impact, not sure how much is just from the dropped LoopInfo: https://llvm-compile-time-tracker.com/compare.php?from=1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff&to=8e3a5699f206ae36c513a01f953d758ee93e7f54&stat=instructions (Note that this implicitly enables the call capture part of the change.) I think we need some different way to expose the capture cache as a dependency though. I don't really like shoving more datastructures directly in AAQueryInfo, especially conditionally used ones. It would be nicer to pass in some CaptureInfo object that can then be backed either by a simple or by a reachability based implementation.
%in.lv.2 = load i32 , i32* %in.lv.1, align 2		%in.lv.2 = load i32 , i32* %in.lv.1, align 2
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @clobber()		call void @clobber()
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_captured_before_load_same_bb_2(i32** %in.ptr) {		define i32 @test_captured_before_load_same_bb_2(i32** %in.ptr) {
		nikic: Isn't this test case the same as `test_captured_and_clobbered_before_load_same_bb_1`?
		fhahn: Yes, it should use `escape_writeonly` instead of `escape_and_clobber` and call it before the first load. I'll fix that in the committed version.
; CHECK-LABEL: @test_captured_before_load_same_bb_2(		; CHECK-LABEL: @test_captured_before_load_same_bb_2(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4		; CHECK-NEXT: store i32 55, i32* [[A]], align 4
		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: call void @clobber()		; CHECK-NEXT: call void @clobber()
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
;		;
%a = alloca i32, align 4		%a = alloca i32, align 4
store i32 55, i32* %a		store i32 55, i32* %a
		call void @escape_writeonly(i32* %a)
%in.lv.1 = load i32* , i32** %in.ptr, align 2		%in.lv.1 = load i32* , i32** %in.ptr, align 2
call void @escape_and_clobber(i32* %a)
%in.lv.2 = load i32 , i32* %in.lv.1, align 2		%in.lv.2 = load i32 , i32* %in.lv.1, align 2
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @clobber()		call void @clobber()
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_not_captured_before_load_same_bb_clobber(i32** %in.ptr) {		define i32 @test_not_captured_before_load_same_bb_clobber(i32** %in.ptr) {
; CHECK-LABEL: @test_not_captured_before_load_same_bb_clobber(		; CHECK-LABEL: @test_not_captured_before_load_same_bb_clobber(
(35 lines not shown)	;
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @escape_and_clobber(i32* %a)		call void @escape_and_clobber(i32* %a)
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_captured_sibling_path_to_load_other_blocks_1(i32** %in.ptr, i1 %c.1) {		define i32 @test_captured_sibling_path_to_load_other_blocks_1(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_captured_sibling_path_to_load_other_blocks_1(		; CHECK-LABEL: @test_captured_sibling_path_to_load_other_blocks_1(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: br label [[EXIT]]		; CHECK-NEXT: br label [[EXIT]]
(21 lines not shown)	exit:
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @clobber()		call void @clobber()
ret i32 %p		ret i32 %p
}		}

define i32 @test_only_captured_sibling_path_with_ret_to_load_other_blocks(i32** %in.ptr, i1 %c.1) {		define i32 @test_only_captured_sibling_path_with_ret_to_load_other_blocks(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_only_captured_sibling_path_with_ret_to_load_other_blocks(		; CHECK-LABEL: @test_only_captured_sibling_path_with_ret_to_load_other_blocks(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: ret i32 0		; CHECK-NEXT: ret i32 0
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
(162 lines not shown)	exit:
call void @escape_writeonly(i32* %a)		call void @escape_writeonly(i32* %a)
call void @clobber()		call void @clobber()
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_not_captured_before_load_other_blocks_1(i32** %in.ptr, i1 %c.1) {		define i32 @test_not_captured_before_load_other_blocks_1(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_not_captured_before_load_other_blocks_1(		; CHECK-LABEL: @test_not_captured_before_load_other_blocks_1(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: br label [[EXIT]]		; CHECK-NEXT: br label [[EXIT]]
(17 lines not shown)
exit:		exit:
call void @escape_and_clobber(i32* %a)		call void @escape_and_clobber(i32* %a)
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_not_captured_before_load_other_blocks_2(i32** %in.ptr, i1 %c.1) {		define i32 @test_not_captured_before_load_other_blocks_2(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_not_captured_before_load_other_blocks_2(		; CHECK-LABEL: @test_not_captured_before_load_other_blocks_2(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: else:		; CHECK: else:
(19 lines not shown)

exit:		exit:
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_not_captured_before_load_other_blocks_3(i32** %in.ptr, i1 %c.1) {		define i32 @test_not_captured_before_load_other_blocks_3(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_not_captured_before_load_other_blocks_3(		; CHECK-LABEL: @test_not_captured_before_load_other_blocks_3(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: else:		; CHECK: else:
(17 lines not shown)

exit:		exit:
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_not_captured_before_load_other_blocks_4(i32** %in.ptr, i1 %c.1) {		define i32 @test_not_captured_before_load_other_blocks_4(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_not_captured_before_load_other_blocks_4(		; CHECK-LABEL: @test_not_captured_before_load_other_blocks_4(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: br label [[EXIT]]		; CHECK-NEXT: br label [[EXIT]]
(22 lines not shown)	exit:
call void @clobber()		call void @clobber()
ret i32 %p		ret i32 %p
}		}

define i32 @test_not_captured_before_load_other_blocks_5(i32** %in.ptr, i1 %c.1) {		define i32 @test_not_captured_before_load_other_blocks_5(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_not_captured_before_load_other_blocks_5(		; CHECK-LABEL: @test_not_captured_before_load_other_blocks_5(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[EXIT:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: br label [[EXIT]]		; CHECK-NEXT: br label [[EXIT]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: [[P:%.*]] = phi i32 [ [[IN_LV_2]], [[THEN]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[P:%.*]] = phi i32 [ [[IN_LV_2]], [[THEN]] ], [ 0, [[ENTRY:%.*]] ]
(55 lines not shown)	exit:
call void @clobber()		call void @clobber()
ret i32 %p		ret i32 %p
}		}

define i32 @test_not_captured_before_load_other_blocks_7(i32** %in.ptr, i1 %c.1) {		define i32 @test_not_captured_before_load_other_blocks_7(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_not_captured_before_load_other_blocks_7(		; CHECK-LABEL: @test_not_captured_before_load_other_blocks_7(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[EXIT:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])		; CHECK-NEXT: call void @escape_writeonly(i32* [[A]])
; CHECK-NEXT: br label [[EXIT]]		; CHECK-NEXT: br label [[EXIT]]
; CHECK: exit:		; CHECK: exit:
;		;
call void @escape_and_clobber(i32* %a)		call void @escape_and_clobber(i32* %a)
%res = add i32 %in.lv.2, %lv		%res = add i32 %in.lv.2, %lv
ret i32 %res		ret i32 %res
}		}

define i32 @test_captured_after_loop(i32** %in.ptr, i1 %c.1) {		define i32 @test_captured_after_loop(i32** %in.ptr, i1 %c.1) {
; CHECK-LABEL: @test_captured_after_loop(		; CHECK-LABEL: @test_captured_after_loop(
; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4		; CHECK-NEXT: [[A:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[LOOP]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])

declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg)		declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg)

@global = external global [10 x i16]*		@global = external global [10 x i16]*

define void @test_memset_not_captured_before_load() {		define void @test_memset_not_captured_before_load() {
; CHECK-LABEL: @test_memset_not_captured_before_load(		; CHECK-LABEL: @test_memset_not_captured_before_load(
; CHECK-NEXT: [[A:%.*]] = alloca [2 x i32], align 4		; CHECK-NEXT: [[A:%.*]] = alloca [2 x i32], align 4
; CHECK-NEXT: [[CAST_A:%.*]] = bitcast [2 x i32]* [[A]] to i8*
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* [[CAST_A]], i32 4
; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* align 1 [[TMP1]], i8 0, i32 4, i1 false)
; CHECK-NEXT: [[LV_1:%.*]] = load [10 x i16]*, [10 x i16]** @global, align 8		; CHECK-NEXT: [[LV_1:%.*]] = load [10 x i16]*, [10 x i16]** @global, align 8
; CHECK-NEXT: [[GEP_A_0:%.*]] = getelementptr inbounds [2 x i32], [2 x i32]* [[A]], i32 0, i32 0		; CHECK-NEXT: [[GEP_A_0:%.*]] = getelementptr inbounds [2 x i32], [2 x i32]* [[A]], i32 0, i32 0
; CHECK-NEXT: store i32 1, i32* [[GEP_A_0]], align 4		; CHECK-NEXT: store i32 1, i32* [[GEP_A_0]], align 4
; CHECK-NEXT: [[GEP_LV:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* [[LV_1]], i64 0, i32 1		; CHECK-NEXT: [[GEP_LV:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* [[LV_1]], i64 0, i32 1
; CHECK-NEXT: [[LV_2:%.*]] = load i16, i16* [[GEP_LV]], align 2		; CHECK-NEXT: [[LV_2:%.*]] = load i16, i16* [[GEP_LV]], align 2
; CHECK-NEXT: [[EXT_LV_2:%.*]] = zext i16 [[LV_2]] to i32		; CHECK-NEXT: [[EXT_LV_2:%.*]] = zext i16 [[LV_2]] to i32
; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds [2 x i32], [2 x i32]* [[A]], i32 0, i32 1		; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds [2 x i32], [2 x i32]* [[A]], i32 0, i32 1
; CHECK-NEXT: store i32 [[EXT_LV_2]], i32* [[GEP_A_1]], align 4		; CHECK-NEXT: store i32 [[EXT_LV_2]], i32* [[GEP_A_1]], align 4
;		;
ret void		ret void
}		}

define void @test_test_not_captured_before_load(i1 %c.1) {		define void @test_test_not_captured_before_load(i1 %c.1) {
; CHECK-LABEL: @test_test_not_captured_before_load(		; CHECK-LABEL: @test_test_not_captured_before_load(
; CHECK-NEXT: bb:		; CHECK-NEXT: bb:
; CHECK-NEXT: [[A:%.*]] = alloca [2 x i32], align 4		; CHECK-NEXT: [[A:%.*]] = alloca [2 x i32], align 4
; CHECK-NEXT: [[CAST_A:%.*]] = bitcast [2 x i32]* [[A]] to i8*		; CHECK-NEXT: [[CAST_A:%.*]] = bitcast [2 x i32]* [[A]] to i8*
; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* [[CAST_A]], i8 0, i32 8, i1 false)		; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, i8* [[CAST_A]], i32 4
		; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* align 1 [[TMP0]], i8 0, i32 4, i1 false)
; CHECK-NEXT: [[LV_1:%.*]] = load [10 x i16]*, [10 x i16]** @global, align 8		; CHECK-NEXT: [[LV_1:%.*]] = load [10 x i16]*, [10 x i16]** @global, align 8
; CHECK-NEXT: [[GEP_LV:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* [[LV_1]], i64 0, i32 1		; CHECK-NEXT: [[GEP_LV:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* [[LV_1]], i64 0, i32 1
; CHECK-NEXT: [[LV_2:%.*]] = load i16, i16* [[GEP_LV]], align 2		; CHECK-NEXT: [[LV_2:%.*]] = load i16, i16* [[GEP_LV]], align 2
; CHECK-NEXT: [[GEP_A_0:%.*]] = getelementptr inbounds [2 x i32], [2 x i32]* [[A]], i32 0, i32 0		; CHECK-NEXT: [[GEP_A_0:%.*]] = getelementptr inbounds [2 x i32], [2 x i32]* [[A]], i32 0, i32 0
; CHECK-NEXT: store i32 1, i32* [[GEP_A_0]], align 4		; CHECK-NEXT: store i32 1, i32* [[GEP_A_0]], align 4
; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C_1:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: call void @escape_and_clobber(i32* [[GEP_A_0]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[GEP_A_0]])
}		}


declare noalias i32* @alloc() nounwind		declare noalias i32* @alloc() nounwind

define i32 @test_not_captured_before_load_same_bb_noalias_call(i32** %in.ptr) {		define i32 @test_not_captured_before_load_same_bb_noalias_call(i32** %in.ptr) {
; CHECK-LABEL: @test_not_captured_before_load_same_bb_noalias_call(		; CHECK-LABEL: @test_not_captured_before_load_same_bb_noalias_call(
; CHECK-NEXT: [[A:%.*]] = call i32* @alloc()		; CHECK-NEXT: [[A:%.*]] = call i32* @alloc()
; CHECK-NEXT: store i32 55, i32* [[A]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A]], align 4
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
;		;
%a = call i32* @alloc()		%a = call i32* @alloc()
store i32 55, i32* %a		store i32 55, i32* %a
%in.lv.1 = load i32* , i32** %in.ptr, align 2		%in.lv.1 = load i32* , i32** %in.ptr, align 2
%in.lv.2 = load i32 , i32* %in.lv.1, align 2		%in.lv.2 = load i32 , i32* %in.lv.1, align 2
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @escape_and_clobber(i32* %a)		call void @escape_and_clobber(i32* %a)
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

define i32 @test_not_captured_before_load_same_bb_noalias_arg(i32** %in.ptr, i32* noalias %a) {		define i32 @test_not_captured_before_load_same_bb_noalias_arg(i32** %in.ptr, i32* noalias %a) {
; CHECK-LABEL: @test_not_captured_before_load_same_bb_noalias_arg(		; CHECK-LABEL: @test_not_captured_before_load_same_bb_noalias_arg(
; CHECK-NEXT: store i32 55, i32* [[A:%.*]], align 4
; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2		; CHECK-NEXT: [[IN_LV_1:%.*]] = load i32*, i32** [[IN_PTR:%.*]], align 2
; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2		; CHECK-NEXT: [[IN_LV_2:%.*]] = load i32, i32* [[IN_LV_1]], align 2
; CHECK-NEXT: store i32 99, i32* [[A]], align 4		; CHECK-NEXT: store i32 99, i32* [[A:%.*]], align 4
; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])		; CHECK-NEXT: call void @escape_and_clobber(i32* [[A]])
; CHECK-NEXT: ret i32 [[IN_LV_2]]		; CHECK-NEXT: ret i32 [[IN_LV_2]]
;		;
store i32 55, i32* %a		store i32 55, i32* %a
%in.lv.1 = load i32* , i32** %in.ptr, align 2		%in.lv.1 = load i32* , i32** %in.ptr, align 2
%in.lv.2 = load i32 , i32* %in.lv.1, align 2		%in.lv.2 = load i32 , i32* %in.lv.1, align 2
store i32 99, i32* %a, align 4		store i32 99, i32* %a, align 4
call void @escape_and_clobber(i32* %a)		call void @escape_and_clobber(i32* %a)
ret i32 %in.lv.2		ret i32 %in.lv.2
}		}

		define i32 @instruction_captures_multiple_objects(i32* %p.1, i32** %p.2, i32** %p.3, i1 %c) {
		; CHECK-LABEL: @instruction_captures_multiple_objects(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[A_1:%.*]] = alloca i32, align 4
		; CHECK-NEXT: [[A_2:%.*]] = alloca i32, align 4
		; CHECK-NEXT: store i32 0, i32* [[P_1:%.*]], align 8
		; CHECK-NEXT: br i1 [[C:%.*]], label [[THEN:%.*]], label [[ELSE:%.*]]
		; CHECK: then:
		; CHECK-NEXT: [[LV_2:%.*]] = load i32*, i32** [[P_2:%.*]], align 8
		; CHECK-NEXT: [[LV_2_2:%.*]] = load i32, i32* [[LV_2]], align 4
		; CHECK-NEXT: ret i32 [[LV_2_2]]
		; CHECK: else:
		; CHECK-NEXT: [[LV_3:%.*]] = load i32*, i32** [[P_3:%.*]], align 8
		; CHECK-NEXT: [[LV_3_2:%.*]] = load i32, i32* [[LV_3]], align 4
		; CHECK-NEXT: call void @capture_and_clobber_multiple(i32* [[A_1]], i32* [[A_2]])
		; CHECK-NEXT: ret i32 [[LV_3_2]]
		;
		entry:
		%a.1 = alloca i32
		%a.2 = alloca i32
		store i32 0, i32* %p.1, align 8
		br i1 %c, label %then, label %else

		then:
		store i32 99, i32* %a.2, align 4
		%lv.2 = load i32*, i32** %p.2
		%lv.2.2 = load i32, i32* %lv.2
		store i32 0, i32* %a.1, align 8
		ret i32 %lv.2.2

		else:
		%lv.3 = load i32*, i32** %p.3
		%lv.3.2 = load i32, i32* %lv.3
		call void @capture_and_clobber_multiple(i32* %a.1, i32* %a.2)
		ret i32 %lv.3.2
		}

		declare void @capture_and_clobber_multiple(i32*, i32*)