This is an archive of the discontinued LLVM Phabricator instance.

Fix Load Control Dependence in MemCpy Generation
ClosedPublic

Authored by niravd on Mar 10 2016, 2:05 PM.

Details

Summary

In Memcpy lowering we had missed a dependence from the load of the operation to successor operations. This causes us to potentially construct an in initial DAG with a memory dependence not fully represented in the chain sub-DAG but rather require looking at the entire DAG breaking alias analysis by allowing incorrect repositioning of memory operations.

To work around this, r200033 changed DAGCombiner::GatherAllAliases to be conservative if any possible issues to happen. Unfortunately this check forbade many non-problematic situations as well. For example, it's common for incoming argument lowering to add a non-aliasing load hanging off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would find that other (unvisited) use of the EntryNode chain, and just give up entirely. Furthermore, the check was incomplete: it would not actually detect all such potentially problematic DAG constructions, because GatherAllAliases did not guarantee to visit all chain nodes going up to the root EntryNode. This is in general fine -- giving up early will just miss a potential optimization, not generate incorrect results. But, for this non-chain dependency detection code, it's possible that you could have a load attached to a higher-up chain node than any which were visited. If that load aliases your store, but the only dependency is through the value operand of a non-aliasing store, it would've been missed by this code, and potentially reordered.

With the dependence added, this check can be removed and Alas Analysis can be much more aggressive. This fixes code quality regression in the Consecutive Store Merge cleanup (D14834).

Test Change:

ppc64-align-long-double.ll now may see multiple serializations
of its stores

Diff Detail

Repository
rL LLVM

Event Timeline

niravd updated this revision to Diff 50349.Mar 10 2016, 2:05 PM
niravd retitled this revision from to Fix Load Control Dependence in MemCpy Generation.
niravd updated this object.
niravd added reviewers: jyknight, hfinkel.
jyknight edited edge metadata.Mar 11 2016, 8:51 AM

Great! I'm fairly certain the reasoning we discussed (and that I've written below) is correct, but I'd really appreciate if Hal would chime in too, as a sanity check.

A longer commit message is really needed here, though, since it's pretty tricky. Here's more description:

In memcpy lowering, add dependency on loads to outgoing chain, and revert the hack added in r200033.
The code in DAGCombiner::GatherAllAliases which this deletes was a hack which guarded against a situation that ought not to occur -- a memory dependency which has been expressed only through value dependence. All such memory dependencies ought to be encoded by the chain, not by random other dag value edges.

That is trivially true for all loads and stores coming directly from IR (since they all start chained together, and only get relaxed by alias analysis), but other DAG-building code must handle the chain properly as well to preserve that property. Unfortunately, memcpy lowering did not, which is what prompted the hack to be added in the first place.

Unfortunately, the check was not just theoretically-redundant, but also forbade many situations which were not problematic. For example, it's common for incoming argument lowering to add a non-aliasing load hanging off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would find that other (unvisited) use of the EntryNode chain, and just give up entirely.

Furthermore, the check was incomplete: it would not actually detect all such potentially problematic DAG constructions, because GatherAllAliases did not guarantee to visit all chain nodes going up to the root EntryNode. This is in general fine -- giving up early will just miss a potential optimization, not generate incorrect results. But, for this non-chain dependency detection code, it's possible that you could have a load attached to a higher-up chain node than any which were visited. If that load aliases your store, but the only dependency is through the value operand of a non-aliasing store, it would've been missed by this code, and potentially reordered.

safely lift instructions in cases.

You mean "in some cases"?

niravd updated this revision to Diff 51038.Mar 18 2016, 10:26 AM
niravd edited edge metadata.
  • In visitSTORE, always use FindBetterChain, rather than only when UseAA
niravd updated this revision to Diff 51043.Mar 18 2016, 10:44 AM
niravd edited edge metadata.

Overwrite accidentally added diff

niravd updated this object.Mar 21 2016, 11:21 AM
niravd updated this revision to Diff 51196.Mar 21 2016, 11:25 AM
  • In visitSTORE, always use FindBetterChain, rather than only when UseAA
niravd added a subscriber: niravd.Mar 21 2016, 11:29 AM

Oops. I didn't send this to the mailing list so the summary is missing.
Adding below:

In Memcpy lowering we had missed a dependence from the load of the
operation to successor operations. This causes us to potentially construct
an in initial DAG with a memory dependence not fully represented in the
chain sub-DAG but rather require looking at the entire DAG breaking alias
analysis by allowing incorrect repositioning of memory operations.

To work around this, r200033 changed DAGCombiner::GatherAllAliases to be
conservative if any possible issues to happen. Unfortunately this check
forbade many non-problematic situations as well. For example, it's common
for incoming argument lowering to add a non-aliasing load hanging off of
EntryNode. Then, if GatherAllAliases visited EntryNode, it would find that
other (unvisited) use of the EntryNode chain, and just give up entirely.
Furthermore, the check was incomplete: it would not actually detect all
such potentially problematic DAG constructions, because GatherAllAliases
did not guarantee to visit all chain nodes going up to the root EntryNode.
This is in general fine -- giving up early will just miss a potential
optimization, not generate incorrect results. But, for this non-chain
dependency detection code, it's possible that you could have a load
attached to a higher-up chain node than any which were visited. If that
load aliases your store, but the only dependency is through the value
operand of a non-aliasing store, it would've been missed by this code, and
potentially reordered.

With the dependence added, this check can be removed and Alas Analysis can
be much more aggressive. This fixes code quality regression in the
Consecutive Store Merge cleanup (D14834).

Test Change:

ppc64-align-long-double.ll now may see multiple serializations
of its stores

Ping. Your thoughts on this Hal?

hfinkel edited edge metadata.Mar 23 2016, 12:02 PM

Ping. Your thoughts on this Hal?

This is quite interesting. The question is: Can we guarantee that all data dependencies are also encoded as chain dependencies? If so, then I'm happy with this -- although I'd also like some loud comments (maybe in include/llvm/CodeGen/ISDOpcodes.h), and ideally some verification code which can be enabled under expensive checks.

My worry here is not just about what we generate, but about what FindBetterChain might do. It would need to specifically not remove a chain dependency that was present to protect/represent a value dependency.

ppc64-align-long-double.ll now may see multiple serializations of its stores

What does this mean? I see you've put in CHECK-DAGs, but do you mean the order has changed or that there are now multiple stores?

Ping. Your thoughts on this Hal?

This is quite interesting. The question is: Can we guarantee that all data dependencies are also encoded as chain dependencies? If so, then I'm happy with this -- although I'd also like some loud comments (maybe in include/llvm/CodeGen/ISDOpcodes.h), and ideally some verification code which can be enabled under expensive checks.

*NO* we can't, as is exemplified by newly reported issue in D18336 (which is broken both before and after this change). However, what we *should* guarantee is that memory dependencies are always encoded by the chain, and not indirectly through value dependencies. That was what was happening before, in the memcpy expansion, and what ought not happen.

To fix that other issue, we will need to ensure none of the stores being merged are a predecessor of any others.

My worry here is not just about what we generate, but about what FindBetterChain might do. It would need to specifically not remove a chain dependency that was present to protect/represent a value dependency.

Well, this code being removed here did not actually try to prevent that. As it turns out, neither did anything else. (But that's another bug!)

I think the chain should not need to represent value dependencies (it doesn't now), but neither should value dependencies represent memory ordering dependencies (and this hack added because memcpy was using them that way).

ppc64-align-long-double.ll now may see multiple serializations of its stores

What does this mean? I see you've put in CHECK-DAGs, but do you mean the order has changed or that there are now multiple stores?

ppc64-align-long-double.ll now may see multiple serializations of its stores

What does this mean? I see you've put in CHECK-DAGs, but do you mean the order has changed or that there are now multiple stores?

Just that we no longer have an order dependencies so the store so they may appear in any order.

niravd added a comment.EditedMar 28 2016, 7:06 AM

Now that D18336 is committed, Hal's concerns should be addressed so I think we can move on this.

spatel added a subscriber: spatel.Mar 30 2016, 7:34 AM
jyknight accepted this revision.Apr 8 2016, 11:27 AM
jyknight edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Apr 8 2016, 11:27 AM
This revision was automatically updated to reflect the committed changes.