This is an archive of the discontinued LLVM Phabricator instance.

[GVN] PRE can't hoist loads across calls in general.
Needs ReviewPublic

Authored by eli.friedman on Jun 6 2016, 2:19 PM.

Details

Summary

Issue exposed by noalias or more aggressive alias analysis.

I'm not particularly happy with the extra loop this patch adds, but I'm
not sure how to go about fixing it.

I'm also not very happy with the use of mayHaveSideEffects; we only care
specifically about control flow here.

Diff Detail

Event Timeline

eli.friedman retitled this revision from to [GVN] PRE can't hoist loads across calls in general..
eli.friedman updated this object.
eli.friedman added a subscriber: llvm-commits.
dberlin edited edge metadata.Jun 6 2016, 2:25 PM
dberlin added a subscriber: dberlin.

I'm very concerned about adding these loops everywhere that are N^2 and
require checking literally every instruction in every block to figure
things out.

Why are we not just adding edges to an exit block or something that the CFG
based algorithms will naturally see as a hoist blocker?

It probably doesn't make sense to turn every call instruction into an invoke or equivalent; that would cause the size of the IR to explode, and make the IR more difficult to use for passes which don't actually care about "trivial" edges (including most of GVN itself).

It might be possible to add some sort of sub-block abstraction over basic blocks... for example:

block:
  call void @a() ; sub-block 0
  %y = add i32 %x, 1 ; sub-block 1
  call void @b() ; sub-block 1
  ret i32 %y ; sub-block 2

Then we provide some sort of sub-block tree that includes edges to trivial unwind blocks.

Assuming we have such a thing, the PRE algorithm can iterate over it to behave correctly. (I'm assuming MemoryDependenceAnalysis itself wouldn't use it because of the overhead involved.)

The question, of course, is how exactly you would implement this. If you try to compute it on the fly, it boils down to exactly the same loop I've already written. If you try to maintain it as a side-table, you now have a gigantic hashtable containing every instruction in the function, and modifying the IR requires special sub-block-aware methods to insert, remove, or move instructions. Attaching it to the IR itself adds an overhead of at least one pointer per Instruction, plus extra overhead for passes which don't care.

sanjoy edited edge metadata.Jun 6 2016, 4:44 PM
sanjoy added a subscriber: broune.

One possibility is writing a "strong post dominance" analysis pass that was brought up by @broune earlier [0]. If it looks like a lot of LLVM's transform passes are buggy in the face of exit(0) or throwing or inf-looping calls, perhaps we will be okay paying the cost of computing and preserving this analysis?

We also need some infrastructural work around "halting" or "always_returns" attributes; I suspect the bug you're fixing here will also occur if the function being called does volatile int i = 0; while (true) i++; instead of exit(0).

[0]: http://lists.llvm.org/pipermail/llvm-dev/2015-July/087744.html

Whatever we choose, i think we need to take a step back and evaluate the
larger problem and whether we are solving it in the best way, rather than
just add tons of "check every instruction between hear and there" code :)

(If we end up thinking that is truly the best way, great, i'll not object.
i'm more concerned that we are just trying to patch what we see as bugs
instead of stopping to think about whether the whole thing is just broken
and in need of rethinking, because i suspect it is.)

eli.friedman edited edge metadata.

Improved control-flow check, fix for scalar PRE.

Still has crappy O(N^2) loop. My current thinking is that it's possible to precompute it on a
per-block basis in GVN::runImpl.

reames edited edge metadata.Jun 13 2016, 10:11 PM

Yuck, how have we managed to be this wrong for this long and not notice it?

Like Danny, I'm hesitant to add the instruction walks here. MDA already did that once, so doing the walk again seems very wasteful. I can see a couple of options here:

  1. Add an analysis which tracks where a basic block always exits if entered. By returning a safe result ("maybe?"), the invalidation wouldn't be too painful. This would be a step in a direction I've been thinking about for a while to consolidate all of our various dereferenceability checks.
  2. Extend MDA to track this information when doing it's walk. This would be strictly weaker than the current code though.

At minimum, the patch should be rewritten as an extension to isValueFullyAvailableInBlock; that's the bit that's supposed to be reasoning about anticipation for LoadPRE.

lib/Transforms/Scalar/GVN.cpp
2422

Can you point to the test case which requires this part? ScalarPRE is not supposed to need to reason about availability. Where does this break?

Also, can this be rephrased in terms of the same isValueFullyAvailableInBlock helper?

Yuck, how have we managed to be this wrong for this long and not notice it?

For the load case, you have to have a pointer which is both NoAlias relative to an arbitrary call and not dereferenceable... so it's basically impossible to trigger at the moment unless you have noalias metadata.

Like Danny, I'm hesitant to add the instruction walks here. MDA already did that once, so doing the walk again seems very wasteful. I can see a couple of options here:

  1. Add an analysis which tracks where a basic block always exits if entered. By returning a safe result ("maybe?"), the invalidation wouldn't be too painful. This would be a step in a direction I've been thinking about for a while to consolidate all of our various dereferenceability checks.
  2. Extend MDA to track this information when doing it's walk. This would be strictly weaker than the current code though.

See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160613/364534.html

I'm hesitant to add a new analysis pass because keeping the analysis up-to-date is inevitably painful... especially given that GVN probably wants to check this at higher resolution than just per-block. Maybe it's the right approach, though.

At minimum, the patch should be rewritten as an extension to isValueFullyAvailableInBlock; that's the bit that's supposed to be reasoning about anticipation for LoadPRE.

I'm not following; this isn't something we need to check on a per-predecessor basis. The issue is basically whether it's legal to hoist a given load from the middle of its parent BB to the beginning of that BB.

lib/Transforms/Scalar/GVN.cpp
2422

See testcase; this is basically just avoiding dividing by zero.

I'm hesitant to add a new analysis pass because keeping the analysis
up-to-date is inevitably painful... especially given that GVN probably
wants to check this at higher resolution than just per-block.

Again, it's very wasteful for it to have to know/compute (even once), where
the blocks that it has to look at in more detail are, when nobody makes
things *more* maythrow then they were before.

If the CFG properly represented "this block, somewhere, throws", it would
know "okay, if i want to hoist out of this block, i have to look harder".

Instead, we now have a bunch of passes that are, at a minimum, going to
look at every instruction in each block and start keeping track of the same
info. In fact, every patch we've done so far here (even mine!) tries to
recompute this very info so it can figure out where it needs to look harder
in constant time.

Also, the current PRE does not really do good PRE, and makes no attempt to
place instructions like a real PRE would.
Any real PRE or store sinking pass is going to want to know what blocks are
"transparent", that is, it is possible to hoist or sink through.

MemorySSA will tell them if the memory state is killed in the block in
constant time.
SSA will tell them if the operands are killed or reusable in the block in
constant time.

The per-block bit you are talking about is precisely the last piece that
tells you if a block is transparent in constant time.

Without it, you must touch every instruction in the program to know which
blocks are transparent.

Maybe it's the right approach, though.

At minimum, the patch should be rewritten as an extension to

isValueFullyAvailableInBlock; that's the bit that's supposed to be
reasoning about anticipation for LoadPRE.

I'm not following; this isn't something we need to check on a
per-predecessor basis. The issue is basically whether it's legal to hoist
a given load from the middle of its parent BB to the beginning of that BB.

This is just because of how current PRE works (which, as mentioned, is
pretty wasteful). If you only want to fix that behavior, then yeah, compute
it once, use that.

sanjoy resigned from this revision.Jun 24 2017, 12:38 PM

Inactive, as far as I can tell.

davide added a subscriber: davide.Jun 24 2017, 12:50 PM
reames resigned from this revision.Feb 25 2020, 9:01 AM

Resigning from a stale review (2016). Feel free to re-add if thread ever revived.