This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Track earliest escape, use for loads in isReadClobber.
ClosedPublic

Authored by fhahn on Sep 15 2021, 12:23 PM.

Details

Summary

At the moment, DSE only considers whether a pointer may be captured at
all in a function. This leads to cases where we fail to remove stores to
local objects because we do not check if they escape before potential
read-clobbers or after.

Doing context-sensitive escape queries in isReadClobber has been removed
a while ago in d1a1cce5b130 to save compile-time. See PR50220 for more
context.

This patch introduces a new capture tracker, which keeps track of the
'earliest' capture. An instruction A is considered earlier than instruction
B, if A dominates B. If 2 escapes do not dominate each other, the
terminator of the common dominator is chosen. If not all uses cannot be
analyzed, the earliest escape is set to the first instruction in the
function entry block.

If the query instruction dominates the earliest escape and is not in a
cycle, then pointer does not escape before the query instruction.

This patch uses this information when checking if a load of a loaded
underlying object may alias a write to a stack object. If the stack
object does not escape before the load, they do not alias.

I will share a follow-up patch to also use the information for call
instructions to fix PR50220.

In terms of compile-time, the impact is low in general,

NewPM-O3: +0.05%
NewPM-ReleaseThinLTO: +0.05%
NewPM-ReleaseLTO-g: +0.03

with the largest change being tramp3d-v4 (+0.30%)
http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Compared to always computing the capture information on demand, we get
the following benefits from the caching:
NewPM-O3: -0.03%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.04%

The biggest speedup is tramp3d-v4 (-0.21%).
http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions

Overall there is a small, but noticeable benefit from caching. I am not
entirely sure if the speedups warrant the extra complexity of caching.
The way the caching works also means that we might miss a few cases, as
it is less precise. Also, there may be a better way to cache things.

Diff Detail

Event Timeline

fhahn created this revision.Sep 15 2021, 12:23 PM
fhahn requested review of this revision.Sep 15 2021, 12:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2021, 12:23 PM

A note for the compile time numbers in the description: those are with also using the cache to handle the case where UseInst is a call which fixes PR50220.

rnk added a subscriber: rnk.Sep 15 2021, 1:17 PM

The changes to also handle calls are available in D109907

Thanks for this, changes look reasonable to me.

IIUC, there will be a +0.30% in tramp3d with this, but without the caching it would be +0.51%?

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
901

Does it make sense to use SmallDenseMap?

1271

Could you name either "MayBeCapturedBefore" or return the negated condition and rename "MayNotBeCapturedBefore"?

1277

If this can be nullptr (per the return condition below), shouldn't inserting into Inst2Obj be conditioned on it being non null?

1283

Negated condition looks more intuitive to me:

return Iter.first->second && DT.properlyDominates(I, Iter.first->second) &&
    !isPotentiallyReachable(I->getNextNode(), I, nullptr, &DT, &LI);
1762

Remove DeadInst from Inst2Obj.

Passing-by remark:
rGd1a1cce5b130630df0c821e8cafe5f683ccccb90 mentions not being cost-beneficial,
this mentions cost, but does not say anything about benefit.

fhahn updated this revision to Diff 373224.Sep 17 2021, 8:05 AM
fhahn marked 4 inline comments as done.

Addressed comments, thanks!

Passing-by remark:
rGd1a1cce5b130630df0c821e8cafe5f683ccccb90 mentions not being cost-beneficial,
this mentions cost, but does not say anything about benefit.

Good point! With -O3 -flto MultiSource/SPEC2006/SPEC2017 there are a few
improvements (~200 more stores removed):

Metric: dse.NumFastStores

Program
 test-suite...urce/Applications/lua/lua.test    10.00   11.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   675.00  694.00   2.8%
 test-suite...510.parest_r/510.parest_r.test   4527.00 4646.00  2.6%
 test-suite...lancbmk_r/523.xalancbmk_r.test   965.00  971.00   0.6%
 test-suite.../Benchmarks/Bullet/bullet.test   1183.00 1190.00  0.6%
 test-suite...:: External/Povray/povray.test   346.00  348.00   0.6%
 test-suite...006/453.povray/453.povray.test   540.00  543.00   0.6%
 test-suite...511.povray_r/511.povray_r.test   552.00  555.00   0.5%
 test-suite...7rate/502.gcc_r/502.gcc_r.test   1381.00 1388.00  0.5%
 test-suite...6.blender_r/526.blender_r.test   3488.00 3504.00  0.5%
 test-suite.../CINT2000/252.eon/252.eon.test   2118.00 2126.00  0.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   542.00  544.00   0.4%

Also, @rnk mentioned seeing plenty of stores not being removed in Chrome + auto-init in PR50220. Perhaps they could let us know the impact of the change + D109907. And this also fixes a regression compared to legacy DSE.

fhahn added inline comments.Sep 17 2021, 8:08 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1271

I changed it to notCapturedBefore & added a doc-comment.

1277

If it is nullptr this means there is no capturing instruction. I added an early exit below.

1283

Updated and inverted the function name.

1762

Done, thanks!

fhahn added a subscriber: Gerolf.Sep 17 2021, 8:09 AM
asbirlea added inline comments.Sep 17 2021, 2:49 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1277

I meant adding this if condition to not call insert with a nullptr key:

if(Iter.first->second) {
    auto Ins = Inst2Obj.insert({Iter.first->second, {}});
     Ins.first->second.push_back(Object);
}
1283

I believe DT.dominates(I, Iter.first->second) && I != Iter.first->second is congruent with DT.properlyDominates(I, Iter.first->second).

rnk added a comment.Sep 17 2021, 2:58 PM

Thanks for putting this together!

Also, @rnk mentioned seeing plenty of stores not being removed in Chrome + auto-init in PR50220. Perhaps they could let us know the impact of the change + D109907. And this also fixes a regression compared to legacy DSE.

I probably won't be able to do any benchmarking, but if anyone is interested, you can probably run any benchmark set you like with -fauto-var-init=pattern before and after this patch and check how many extra stores are removed. I strongly suspect it will help on code patterns where initialization is separate from declaration:

struct Foo v;
...
v.x = 0;
...
escape(&v);

Even if this pattern isn't prevalent in common benchmark suites, I think it's pretty important to users of these security hardening flags.

nikic added inline comments.Sep 19 2021, 1:48 PM
llvm/include/llvm/Analysis/CaptureTracking.h
70

nit: cannot -> can

llvm/lib/Analysis/CaptureTracking.cpp
178

This looks incorrect to me. Why would reachability between two captures matter? The way I'm reading this, if you have code like this...

if (...) {
    capture1();
} else {
    capture2();
}

...outside a loop, then one of the captures will be ignored and we'll compute an incorrect earliest capture (which should be before the if in this case).

Possibly this check should be DT.dominates(EarliestCapture, I)? Though not sure it's necessary to explicitly check this at all.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1272

notCaptureBeforeOrAt maybe? Not important for the load case, but is for calls.

1278

/**/ for booleans

1287

May be worth noting that the dominates() check is an approximation of !isPotentiallyReachable, presumably for compile-time reasons. Specifically, it misses this case, where the code-paths are disjoint, but non-dominating:

if (...) {
    capture();
    return;
}
I;
1321

Use isIdentifiedFunctionLocal() and isEscapeSource() here? Or else note that this is a compile-time optimization. I think we should be at least using isEscapeSource() though.

fhahn updated this revision to Diff 373601.Sep 20 2021, 8:33 AM
fhahn marked 7 inline comments as done.

Address latest comments. This patch changes the capture tracking check to use dominance instead of reachability and the DSE check to only use reachability. This slightly increase compile-time geomeans by ~+0.01% but also increases the number of stores removed

http://llvm-compile-time-tracker.com/compare.php?from=92904cc68fbc1d000387b30accc8b05b3fe95daa&to=86d4647eca500205d5871e6faa0a55ea698f2277&stat=instructions

llvm/lib/Analysis/CaptureTracking.cpp
178

This looks incorrect to me.

Yep this is not correct. I couldn't come up with a problematic test case with the earlier version of the patch due to the dominance checks in DSE. The latest version has the logic flipped now, as in dominance check here and reachability check only in DSE.

Possibly this check should be DT.dominates(EarliestCapture, I)? Though not sure it's necessary to explicitly check this at all.

yes, this should check for dominance. We could skip the check, but then we would fail to skip exploring branches early I think.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
901

I think we can have a substantial number of tracked escapes, but I am not sure what a good cut-off is for choosing SmallDenseMap vs DenseMap.

1277

Ok got it, thanks! Updated the code.

1283

The dominance check is gone now.

1287

yes, the original motivation was to have dominates as a quick proxy check to skip the more expensive isPotentiallyReachable check at expense of precision. But in the latest version, isPotentiallyReachable isn't used during capture-tracking analysis and using isPotentiallyReachable here directly only leads to a small increase in compile-time (geomeans increase by ~0.01%). The number of removed stores increases noticeably.

1321

Good point, I'll update it to use isIdentifiedFunctionLocal . It shouldn't be much more expensive in the grand scheme of things.

nikic added inline comments.Sep 20 2021, 12:05 PM
llvm/lib/Analysis/CaptureTracking.cpp
178

As implemented now, I don't think the dominance check makes sense: You're calling isSafeToPrune() from captured(), which will effectively do the same check trying to determine the common dominator. With the current code structure, I'd suggest dropping it (and the whole isSafeToPrune method really).

What you probably have in mind is doing the check in shouldExplore() instead. That will cut of search early, in exchange for a dominance check for each (non-capturing) use. In the CapturesBefore tracking, not checking reachability for each use was a major compile-time improvement, but possibly the dominance check is cheap enough that skipping extra use exploration is a better tradeoff.

If you do want to move it into shouldExplore(), I think you'll need to drop your consistency assertions with PointerMayBeCapturedBefore(), as they could disagree in edge-cases, where you would not hit the use limit because some are discarded early.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1321

What about using isEscapeSource() in place of the isa<LoadInst> check?

fhahn updated this revision to Diff 373684.Sep 20 2021, 12:11 PM

Drop isSafeToPrune.

fhahn added inline comments.Sep 20 2021, 12:17 PM
llvm/lib/Analysis/CaptureTracking.cpp
178

Right, I removed isSafeToPrune alltogether. I guess we can look into using the dominance check in shouldExplore as potential followup

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1321

I can take look at that, but I'll need to make it accessible first. Happy to do that here in this patch or as follow up/

fhahn updated this revision to Diff 374204.Sep 22 2021, 6:25 AM

Rebased on top of recent DSE changes. I think all comments should be addressed now :)

nikic accepted this revision.Sep 22 2021, 10:38 AM

LGTM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1292

It might be worthwhile to add a block reachability cache in the future (possibly that would help with D110094?)

1321

A followup is fine with me.

llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
127

To check my understanding, it would be fine to optimize this case right? That is, what we actually care about is not an escape before the load (%in.lv.2), but before the instruction producing the loaded value (%in.lv.1). That is, our context instruction could be ReadUO rather than UseInst.

If that's correct, then we might be able to sink this new functionality into AA proper (e.g. via a BatchAA flag for more expensive capture analysis). I previously thought this wasn't possible because alias() does not have a context instruction, but with this we don't actually need one.

134

Isn't this test case the same as test_captured_and_clobbered_before_load_same_bb_1?

This revision is now accepted and ready to land.Sep 22 2021, 10:38 AM
fhahn marked 3 inline comments as done.Sep 23 2021, 4:44 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1292

Sounds good!

llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
127

We should be able to use ReadUO , once we loaded the object the escapes cannot change it.

then we might be able to sink this new functionality into AA proper (e.g. via a BatchAA flag for more expensive capture analysis)

Sounds good. Do you think it would be worth to also move the caching of the escapes to BatchAA? Are there any users of BatchAA you would be interested in (other than DSE)?

134

yes, it should use escape_writeonly instead of escape_and_clobber and call it before the first load. I'll fix that in the committed version.

This revision was landed with ongoing or failed builds.Sep 23 2021, 4:51 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 3 inline comments as done.
nikic added inline comments.Sep 23 2021, 1:03 PM
llvm/test/Transforms/DeadStoreElimination/captures-before-load.ll
127

A quick prototype for moving to AA: https://github.com/llvm/llvm-project/commit/8e3a5699f206ae36c513a01f953d758ee93e7f54

This has some compile-time impact, not sure how much is just from the dropped LoopInfo: https://llvm-compile-time-tracker.com/compare.php?from=1e3c6fc7cb9d2ee6a5328881f95d6643afeadbff&to=8e3a5699f206ae36c513a01f953d758ee93e7f54&stat=instructions (Note that this implicitly enables the call capture part of the change.)

I think we need some different way to expose the capture cache as a dependency though. I don't really like shoving more datastructures directly in AAQueryInfo, especially conditionally used ones. It would be nicer to pass in some CaptureInfo object that can then be backed either by a simple or by a reachability based implementation.

fhahn added inline comments.Sep 23 2021, 2:10 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1292

It looks like caching reachability for a pair of blocks helps a little bit, but not too much. It does soften the block of D110094 a bit more, but it only increases the total number of stores removed by 0.04% on my test set (SPEC2006/SPEC2017/MultiSource)

Interestingly, it increases compile-time for consumer-typeset and kimwitu++. Perhaps there's a better way to cache it.

http://llvm-compile-time-tracker.com/compare.php?from=a4964cf36bb0f777d82d6a1025435bbd9f67bc4f&to=3d62e94919f0786904633c21cbd66866e1e06ef6&stat=instructions

https://github.com/llvm/llvm-project/commit/3d62e94919f0786904633c21cbd66866e1e06ef6

thakis added a subscriber: thakis.Sep 24 2021, 6:51 AM

FYI, we're seeing compiler crashes that look like they're likely from this change: https://bugs.chromium.org/p/chromium/issues/detail?id=1252762

I haven't been able to make a reduced repro case yet.

fhahn added a comment.Sep 24 2021, 9:15 AM

Thanks for the report! The issue was that the earliest escape was a ptrtoint which gets removed later. The code to invalidate the cached earliest escape only considered removed memory instructions. I recommitted the patch in 6f28fb708149 with a fix to check whether we need to invalidate the cache for any deleted instruction.