This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Eliminate dead stores to stack.
ClosedPublic

Authored by courbet on Jan 31 2019, 1:32 PM.

Details

Summary

A store to an object whose lifetime is about to end can be removed.

See PR40550 for motivation.

Diff Detail

Repository
rL LLVM

Event Timeline

courbet created this revision.Jan 31 2019, 1:32 PM

Sorry, something wrong happened with the diff

courbet marked an inline comment as done.
courbet added inline comments.
test/DebugInfo/COFF/inlining.ll
32 ↗(On Diff #184601)

Before:

addl    $6, x(%rip)
 addl    $4, x(%rip)
 movl    $1, -4(%rsp)
 leaq    -4(%rsp), %rax
 addl    %eax, x(%rip)
 addl    $2, x(%rip)
 addl    $3, x(%rip)
 addl    $5, x(%rip)
 addl    $7, x(%rip)
 retq

After:

addl    $6, x(%rip)
 addl    $4, x(%rip)
 leaq    -4(%rsp), %rax
 addl    %eax, x(%rip)
 addl    $2, x(%rip)
 addl    $3, x(%rip)
 addl    $5, x(%rip)
 addl    $7, x(%rip)

I just realized that there is nothing that guarantees that an alloca's lifetime is ended in one go, so I'll also have to check that the store is within the lifetime.end base and size.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 7:02 AM
courbet planned changes to this revision.Feb 1 2019, 7:02 AM
niravd added a comment.Feb 1 2019, 7:19 AM

I just realized that there is nothing that guarantees that an alloca's lifetime is ended in one go, so I'll also have to check that the store is within the lifetime.end base and size.

Yes. You should use the BaseIndexOffset alias framework here.

Also, you'll need to deal with the additional output from indexed stores.

We should probably also add this some LIFETIME nodes casing to FindBetterChains and improve LIFETIME_END's chain so we can deal with overlapping lifetimes.

courbet updated this revision to Diff 185280.Feb 5 2019, 4:15 AM

Only remove stores to dead stack space when they are fully contained inside the
object whose lifetime gets terminated.

Also, you'll need to deal with the additional output from indexed stores.

I think I'm fine because of the hasOneUse() check. I'm not sure how I should test this though, do you have a suggestion for some test IR for an indexed store ?

niravd added a comment.Feb 5 2019, 7:30 AM

Also, you'll need to deal with the additional output from indexed stores.

I think I'm fine because of the hasOneUse() check. I'm not sure how I should test this though, do you have a suggestion for some test IR for an indexed store ?

I believe the CombineTo will fail from too few values on indexed stores, even if the value isn't used. Better to check isIndexed() or do the unfolding back to an ADD/SUB.

I suspect the only way this would trigger currently is if an intrinsic lowers into an indexed store in ARM or AArch64.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386 ↗(On Diff #185280)

I think we should merge this with FindBetterChains. At the least we should have load/store aliasing be able to bypass unrelated lifetime nodes and vice versa.

courbet updated this revision to Diff 185575.Feb 6 2019, 9:17 AM
  • Add an explicit isIndexed() check to ignore indexed stores.
  • Teach FindBetterChain to bypass unrelated lifetime nodes.
courbet marked an inline comment as done.Feb 6 2019, 9:18 AM
courbet added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386 ↗(On Diff #185280)

I'm not exactly sure how to merge it, but teaching it to bypass unrelated lifetime nodes sounds like a good idea. Done.

courbet updated this revision to Diff 185577.Feb 6 2019, 9:20 AM

clang-format

niravd added inline comments.Feb 6 2019, 10:47 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386 ↗(On Diff #185280)

I was thinking of this as two combines: One sharing logic with FindBetterChains to improve LIFETIME_END's chain to simplify it's chain to (presumably just the store chain) and the other to check the chain of a LIFETIME_END like we do with dead stores via overlapping stores. You already have the second.

15397 ↗(On Diff #185577)

LIFETIME_START as well?

15402 ↗(On Diff #185577)

You can remove "!ST->hasOneUse()" now that you have the indexed check.

15412 ↗(On Diff #185577)

Nit: return CombineTo(ST, ST->getChain);

lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
90 ↗(On Diff #185577)

This is "contains".

courbet updated this revision to Diff 185716.Feb 7 2019, 1:11 AM
courbet marked 4 inline comments as done.
  • search dead stores through non-aliasing LIFETIME_STARTs
  • Better comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386 ↗(On Diff #185280)

I see. There is an issue with making GatherAllAliases work with LifetimeSDNode: LifetimeSDNode cannot be a MemSDNode (or LSBaseSDNode) because MemSDNodes really represent memory accesses, and lifetime nodes really are just references to memory, they never access it. Code manipulating the MemSDNode member MachineMemOperand assumes either load or store, and changing this would require significant changes.

On the other hand now that I've made FindBetterChain able to go past lifetime start/end nodes as you've suggested in previous comments, it will be able to move stores out of unrelated LIFETIME_{START,END}.

15397 ↗(On Diff #185577)

Indeed, but only for lifetime_starts that do not overlap. I'm not checking the overlap for lifetime_end because you cannot have:

lifetime_end(a);
lifetime_end(a);

but you might have:

lifetime_start(a);
lifetime_end(a);
lifetime_start(a); // 3
lifetime_end(a);   // 4

and you don't want 4 to go past 3.

15402 ↗(On Diff #185577)

I still need to check the !hasOneUse() check because the store might be depended on by several users.

15412 ↗(On Diff #185577)

Not sure I understand why ?

lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
90 ↗(On Diff #185577)

Not exactly. contains means "one is within the other" (and is not symmetrical), here it's "one might overlap the other" (and is symmetrical). I've added graphical comments + the symmetric formula for reference.

niravd accepted this revision.Feb 7 2019, 7:06 AM

LGTM, modulo potentially excising the hasOneUse as I've suggested provided you agree with my line of reasoning there.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15397 ↗(On Diff #185577)

Yes. Maybe add the check for LIFETIME_END as well, just as a sanity check. I worry that we may have a case where a and a' overlap.

lifetime_start(a);
lifetime_end(a);
lifetime_start(a'); 3
lifetime_end(a');
4

But it's fine as is.

15402 ↗(On Diff #185577)

Shoudn't the fact that that the store is an immediate (or at least only through TF) predecessor to the LIFETIME_END be sufficient proof that the store can be removed? In which case we should be able to remove it completely.

The only way it'll fail the hasOneUse case is if we have some redundancy somewhere (presumably because we gave up partway in our alias analysis).

I don't expect to see this very often, so either way is probably fine.

15412 ↗(On Diff #185577)

Oh right. You're returning LIFETIME_END. Nevermind.

lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
90 ↗(On Diff #185577)

Right!

This revision is now accepted and ready to land.Feb 7 2019, 7:06 AM
courbet marked 2 inline comments as done.Feb 8 2019, 4:46 AM
courbet added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15402 ↗(On Diff #185577)

Shoudn't the fact that that the store is an immediate (or at least only through TF) predecessor to the LIFETIME_END be sufficient proof that > the store can be removed? In which case we should be able to remove it completely.

Consider the example in test onealloc_readback_1. During the combine, just before we visit t23 in the dag looks like this:

SelectionDAG has 26 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t9: ch = lifetime.start<16 to 32> t0, TargetFrameIndex:i64<0>
  t10: ch = lifetime.start<0 to 16> t9, TargetFrameIndex:i64<0>
  t11: i64 = Constant<0>
  t13: v16i8,ch = load<(load 16 from %ir.a, align 1)> t10, t2, undef:i64
      t7: i64 = add nuw FrameIndex:i64<0>, Constant:i64<16>
    t14: ch = store<(store 16 into %ir.part1, align 1)> t10, t13, t7, undef:i64
  t15: ch = TokenFactor t13:1, t14
    t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t16: v16i8,ch = load<(load 16 from %ir.b, align 1)> t15, t4, undef:i64
  t17: ch = store<(store 16 into %ir.part21)> t15, t16, FrameIndex:i64<0>, undef:i64
  t28: v16i8,ch = load<(load 16 from %ir.part21)> t17, FrameIndex:i64<0>, undef:i64
          t18: ch = TokenFactor t16:1, t17
        t19: ch = lifetime.end<16 to 32> t18, TargetFrameIndex:i64<0>
        t26: ch = store<(store 16 into %ir.a, align 1)> t0, t28, t2, undef:i64
      t30: ch = TokenFactor t19, t28:1, t26
    t23: ch = lifetime.end<0 to 16> t30, TargetFrameIndex:i64<0>
  t25: ch = X86ISD::RET_FLAG t23, TargetConstant:i32<0>

Note how the store t17 is in immediate predecessor (modulo TF & non-aliasing lifetime end) of lifetime end t23. However, load t28 also depends on t17. But the lifetime end t23 does not care about the load, so it makes sense for t23 not to have a chain on t28.

courbet updated this revision to Diff 185946.Feb 8 2019, 4:48 AM
  • Add the check for overlap when traversing LIFETIME_ENDs.
  • print the extents of the lifetime nodes when printing the DAG for debug.
This revision was automatically updated to reflect the committed changes.
niravd added inline comments.Feb 26 2019, 1:27 PM
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15524

Ahha! We never add the lifetime to the Alias list like we should if we dont' skip past it. I suspect this is why we had to revert this patch.

courbet marked 2 inline comments as done.Mar 1 2019, 7:21 AM

I finally had time to track this down.

The issue was that we were missing a hasOneUse() check on the chain (we were only checking it on the store itself), so there could be other nodes dependind on the store (typically through tokenfactors), e.g.

store ----> TF ---> LIFETIME_END
             \-----> some_user

We were deleting the store...

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15524

I think you meant to put this comment on the lifetime skipping in GatherAllAliases. But you're right, thanks.