This is an archive of the discontinued LLVM Phabricator instance.

[RFC] [DAGCombine] Better store-to-load forwarding
Needs ReviewPublic

Authored by hfinkel on Sep 9 2016, 2:34 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This is a patch/RFC. DAGCombiner::visitLOAD currently has this code:

// If this load is directly stored, replace the load value with the stored
// value.
// TODO: Handle store large -> read small portion.
// TODO: Handle TRUNCSTORE/LOADEXT
if (ISD::isNormalLoad(N) && !LD->isVolatile()) {
  if (ISD::isNON_TRUNCStore(Chain.getNode())) {
    StoreSDNode *PrevST = cast<StoreSDNode>(Chain);
    if (PrevST->getBasePtr() == Ptr &&
        PrevST->getValue().getValueType() == N->getValueType(0))
    return CombineTo(N, Chain.getOperand(1), Chain);
  }
}

And so, I wondered, what might fixing these TODOs look like? Furthermore, I'm not even sure we want to do this. Improving this turns out not to be the right way to fix the problem at which I was looking. The code might nevertheless be useful. A couple observations:

  1. We can't really fix this without some kind of dead-store elimination later in the pipeline (which we don't currently have). Why? If we remove a load in favor of some other arithmetic (shifts, masks, sign extensions, etc.), but keep the (now unused) store, we often increase code size by performing the forwarding (perhaps with little resulting benefit, depending on the architecture).
  2. Our practice of canonicalizing addressing offsets using 'or' instead of 'add' when we know the addends have disjoint set bits is not uniformly handled well in DAGCombine. A lot of code does not even look for the 'or' case, such as in BaseIndexOffset::match and in FindBaseOffset. We also don't fold "(A|c1)+c2 => A+(c1+c2)" where A and c1 have no common bits. The patch includes fixes for these things. Our address selection code, in X86DAGToDAGISel::matchAddressRecursively for example, recurses through these to find the total accumulated offset, an so I suspect that we don't often notice.

This patch does not have any test-case updates, but around 50 regression tests would need to be updated.

Thoughts?

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 70901.Sep 9 2016, 2:34 PM
hfinkel retitled this revision from to [RFC] [DAGCombine] Better store-to-load forwarding.
hfinkel updated this object.
hfinkel added a subscriber: llvm-commits.