Page MenuHomePhabricator

[DAGCombiner] Eliminate dead stores to stack.
AcceptedPublic

Authored by courbet on Thu, Jan 31, 1:32 PM.

Details

Reviewers
niravd
Summary

A store to an object whose lifetime is about to end can be removed.

See PR40550 for motivation.

Diff Detail

Event Timeline

courbet created this revision.Thu, Jan 31, 1:32 PM

Sorry, something wrong happened with the diff

courbet marked an inline comment as done.
courbet added inline comments.
test/DebugInfo/COFF/inlining.ll
32

Before:

addl    $6, x(%rip)
 addl    $4, x(%rip)
 movl    $1, -4(%rsp)
 leaq    -4(%rsp), %rax
 addl    %eax, x(%rip)
 addl    $2, x(%rip)
 addl    $3, x(%rip)
 addl    $5, x(%rip)
 addl    $7, x(%rip)
 retq

After:

addl    $6, x(%rip)
 addl    $4, x(%rip)
 leaq    -4(%rsp), %rax
 addl    %eax, x(%rip)
 addl    $2, x(%rip)
 addl    $3, x(%rip)
 addl    $5, x(%rip)
 addl    $7, x(%rip)

I just realized that there is nothing that guarantees that an alloca's lifetime is ended in one go, so I'll also have to check that the store is within the lifetime.end base and size.

Herald added a project: Restricted Project. · View Herald TranscriptFri, Feb 1, 7:02 AM
courbet planned changes to this revision.Fri, Feb 1, 7:02 AM
niravd added a comment.Fri, Feb 1, 7:19 AM

I just realized that there is nothing that guarantees that an alloca's lifetime is ended in one go, so I'll also have to check that the store is within the lifetime.end base and size.

Yes. You should use the BaseIndexOffset alias framework here.

Also, you'll need to deal with the additional output from indexed stores.

We should probably also add this some LIFETIME nodes casing to FindBetterChains and improve LIFETIME_END's chain so we can deal with overlapping lifetimes.

courbet updated this revision to Diff 185280.Tue, Feb 5, 4:15 AM

Only remove stores to dead stack space when they are fully contained inside the
object whose lifetime gets terminated.

Also, you'll need to deal with the additional output from indexed stores.

I think I'm fine because of the hasOneUse() check. I'm not sure how I should test this though, do you have a suggestion for some test IR for an indexed store ?

niravd added a comment.Tue, Feb 5, 7:30 AM

Also, you'll need to deal with the additional output from indexed stores.

I think I'm fine because of the hasOneUse() check. I'm not sure how I should test this though, do you have a suggestion for some test IR for an indexed store ?

I believe the CombineTo will fail from too few values on indexed stores, even if the value isn't used. Better to check isIndexed() or do the unfolding back to an ADD/SUB.

I suspect the only way this would trigger currently is if an intrinsic lowers into an indexed store in ARM or AArch64.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386

I think we should merge this with FindBetterChains. At the least we should have load/store aliasing be able to bypass unrelated lifetime nodes and vice versa.

courbet updated this revision to Diff 185575.Wed, Feb 6, 9:17 AM
  • Add an explicit isIndexed() check to ignore indexed stores.
  • Teach FindBetterChain to bypass unrelated lifetime nodes.
courbet marked an inline comment as done.Wed, Feb 6, 9:18 AM
courbet added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386

I'm not exactly sure how to merge it, but teaching it to bypass unrelated lifetime nodes sounds like a good idea. Done.

courbet updated this revision to Diff 185577.Wed, Feb 6, 9:20 AM

clang-format

niravd added inline comments.Wed, Feb 6, 10:47 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386

I was thinking of this as two combines: One sharing logic with FindBetterChains to improve LIFETIME_END's chain to simplify it's chain to (presumably just the store chain) and the other to check the chain of a LIFETIME_END like we do with dead stores via overlapping stores. You already have the second.

15396

LIFETIME_START as well?

15401

You can remove "!ST->hasOneUse()" now that you have the indexed check.

15411

Nit: return CombineTo(ST, ST->getChain);

lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
90

This is "contains".

courbet updated this revision to Diff 185716.Thu, Feb 7, 1:11 AM
courbet marked 4 inline comments as done.
  • search dead stores through non-aliasing LIFETIME_STARTs
  • Better comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15386

I see. There is an issue with making GatherAllAliases work with LifetimeSDNode: LifetimeSDNode cannot be a MemSDNode (or LSBaseSDNode) because MemSDNodes really represent memory accesses, and lifetime nodes really are just references to memory, they never access it. Code manipulating the MemSDNode member MachineMemOperand assumes either load or store, and changing this would require significant changes.

On the other hand now that I've made FindBetterChain able to go past lifetime start/end nodes as you've suggested in previous comments, it will be able to move stores out of unrelated LIFETIME_{START,END}.

15396

Indeed, but only for lifetime_starts that do not overlap. I'm not checking the overlap for lifetime_end because you cannot have:

lifetime_end(a);
lifetime_end(a);

but you might have:

lifetime_start(a);
lifetime_end(a);
lifetime_start(a); // 3
lifetime_end(a);   // 4

and you don't want 4 to go past 3.

15401

I still need to check the !hasOneUse() check because the store might be depended on by several users.

15411

Not sure I understand why ?

lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
90

Not exactly. contains means "one is within the other" (and is not symmetrical), here it's "one might overlap the other" (and is symmetrical). I've added graphical comments + the symmetric formula for reference.

niravd accepted this revision.Thu, Feb 7, 7:06 AM

LGTM, modulo potentially excising the hasOneUse as I've suggested provided you agree with my line of reasoning there.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15396

Yes. Maybe add the check for LIFETIME_END as well, just as a sanity check. I worry that we may have a case where a and a' overlap.

lifetime_start(a);
lifetime_end(a);
lifetime_start(a'); 3
lifetime_end(a');
4

But it's fine as is.

15401

Shoudn't the fact that that the store is an immediate (or at least only through TF) predecessor to the LIFETIME_END be sufficient proof that the store can be removed? In which case we should be able to remove it completely.

The only way it'll fail the hasOneUse case is if we have some redundancy somewhere (presumably because we gave up partway in our alias analysis).

I don't expect to see this very often, so either way is probably fine.

15411

Oh right. You're returning LIFETIME_END. Nevermind.

lib/CodeGen/SelectionDAG/SelectionDAGAddressAnalysis.cpp
90

Right!

This revision is now accepted and ready to land.Thu, Feb 7, 7:06 AM
courbet marked 2 inline comments as done.Fri, Feb 8, 4:46 AM
courbet added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15401

Shoudn't the fact that that the store is an immediate (or at least only through TF) predecessor to the LIFETIME_END be sufficient proof that > the store can be removed? In which case we should be able to remove it completely.

Consider the example in test onealloc_readback_1. During the combine, just before we visit t23 in the dag looks like this:

SelectionDAG has 26 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t9: ch = lifetime.start<16 to 32> t0, TargetFrameIndex:i64<0>
  t10: ch = lifetime.start<0 to 16> t9, TargetFrameIndex:i64<0>
  t11: i64 = Constant<0>
  t13: v16i8,ch = load<(load 16 from %ir.a, align 1)> t10, t2, undef:i64
      t7: i64 = add nuw FrameIndex:i64<0>, Constant:i64<16>
    t14: ch = store<(store 16 into %ir.part1, align 1)> t10, t13, t7, undef:i64
  t15: ch = TokenFactor t13:1, t14
    t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t16: v16i8,ch = load<(load 16 from %ir.b, align 1)> t15, t4, undef:i64
  t17: ch = store<(store 16 into %ir.part21)> t15, t16, FrameIndex:i64<0>, undef:i64
  t28: v16i8,ch = load<(load 16 from %ir.part21)> t17, FrameIndex:i64<0>, undef:i64
          t18: ch = TokenFactor t16:1, t17
        t19: ch = lifetime.end<16 to 32> t18, TargetFrameIndex:i64<0>
        t26: ch = store<(store 16 into %ir.a, align 1)> t0, t28, t2, undef:i64
      t30: ch = TokenFactor t19, t28:1, t26
    t23: ch = lifetime.end<0 to 16> t30, TargetFrameIndex:i64<0>
  t25: ch = X86ISD::RET_FLAG t23, TargetConstant:i32<0>

Note how the store t17 is in immediate predecessor (modulo TF & non-aliasing lifetime end) of lifetime end t23. However, load t28 also depends on t17. But the lifetime end t23 does not care about the load, so it makes sense for t23 not to have a chain on t28.

courbet updated this revision to Diff 185946.Fri, Feb 8, 4:48 AM
  • Add the check for overlap when traversing LIFETIME_ENDs.
  • print the extents of the lifetime nodes when printing the DAG for debug.