Page MenuHomePhabricator

[MachineCopyPropagation] Eliminate spillage copies that might be caused by eviction chain
Needs ReviewPublic

Authored by lkail on Mar 21 2022, 2:21 AM.

Details

Reviewers
aditya_nandakumar
qcolombet
Group Reviewers
Restricted Project
Summary

Remove spill-reload like copy chains. For example

r0 = COPY r1
r1 = COPY r2
r2 = COPY r3
r3 = COPY r4
<def-use r4>
r4 = COPY r3
r3 = COPY r2
r2 = COPY r1
r1 = COPY r0

will be folded into

r0 = COPY r1
r1 = COPY r4
<def-use r4>
r4 = COPY r1
r1 = COPY r0

Diff Detail

Unit TestsFailed

TimeTest
1,940 msx64 debian > MemProfiler-x86_64-linux-dynamic.TestCases::test_merge_mib.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fmemory-profile -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -shared-libsan -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/memprof/TestCases/test_merge_mib.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/memprof/X86_64LinuxDynamicConfig/TestCases/Output/test_merge_mib.cpp.tmp

Event Timeline

lkail created this revision.Mar 21 2022, 2:21 AM
lkail requested review of this revision.Mar 21 2022, 2:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 2:21 AM
lkail retitled this revision from [MachineCopyPropagaion][WIP] Eliminate spillage copies that might caused by eviction chain to [MachineCopyPropagation][WIP] Eliminate spillage copies that might caused by eviction chain.Mar 21 2022, 2:21 AM
lkail updated this revision to Diff 416891.Mar 21 2022, 4:23 AM
lkail updated this revision to Diff 416893.Mar 21 2022, 4:32 AM
lkail updated this revision to Diff 417201.Mar 22 2022, 12:35 AM
qcolombet added a subscriber: aditya_nandakumar.

At first glance it looks similar to what @aditya_nandakumar implemented internally to get rid of copies produced by eviction chains.

@aditya_nandakumar could you take a look?

Talked to Aditya offline (a while back actually) and he told me he doesn't expect to have time to look at this any time soon.
Adding myself as a reviewers.

Hi @lkail ,

Thanks for your patience.
This goes in the right direction.

I think we miss a few comments and some cleanups (clang-format, remove stall comments, use proper LLVM_DEBUG macros, etc.) and we're good to go!

Cheers,
-Quentin

llvm/lib/CodeGen/MachineCopyPropagation.cpp
1052

Add comments.
What this does: explain the algorithm at a high level.

1060

Not for this patch, but you may want to use isCopyInstr instead of isCopy to catch more cases.
That said, it probably won't make much of a difference, since in particular most copies you're trying to remove here comes from splitting in regalloc (i.e., we'll have plain COPY).

1060

To be on the safe side, you may want to check that the operation has no implicit operand.

1067

Instead of copying the chain, can we hold on to the reference until we're done with the processing?

1074

Maybe worth splitting the reload from the spill chain as this assert is strange at first glance.
Perhaps it wouldn't be as problematic when the method is properly documented.

Put differently, let's leave it like this for now, add a bunch of comments and we'll see if it still feels weird after that.

1077

We'll need to expand on that commet because naively, a 2 pairs chain would be beneficial to remove, but this is not something this code can do since we don't recolor outside of the spill chain.

I'd suggest putting something like: We need at least 3 pairs of copies for the transformation to apply, because the first outermost pair cannot be removed since we don't recolor outside of the chain and that we need at least one temporary spill slot to shorten the chain.
If we only have a chain of two pairs, we already have the shortest sequence this code can handle: the outermost pair for the temporary spill slot, and the pair that use that temporary spill slot for the other end of the chain.

1089

Here and other places where you have "debug" statements:
Put this in LLVM_DEBUG macros.
Or remove completely.

1094

Move that into its own helper function and call it from an assert.

static bool LLVM_ATTRIBUTE_UNUSED isValidChain(const SmallVectorImpl<MachineInstr *> &Chain) {
  // your checks here.
}
...
assert(isValidChain(Chain) && "Invalid chain to process");
1099

I feel that this is a bit late to check that.
We should not put a copy in the "candidate" chains if the copy is not foldable.

I would suggest to handle that in the main loop.

1106

By construction Chain[Len - 4]->getOperand(0) == Chain[Len - 3]->getOperand(1), so I would instead put that in a variable and use it in both places.

E.g., something like:

// Pull the last spill slot used only within the chain as the final spill slot.
MCRegister LastReusableRegSpillSlot = Chain[Len - 4]->getOperand(0).getReg()
// Update the chain to skip all the intermediate register spill slots:
// Spilling:
Chain[0]->getOperand(0).setReg(LastReusableRegSpillSlot);
// Reload:
Chain[1]->getOperand(1).setReg(LastReusableRegSpillSlot);
1108

Maybe be worth adding a comment here that although the variable is called MaybeDeadCopies, we really are going to remove the related instructions.

The fact that we use MaybeDeadCopies to do our code cleanup is slightly confusing because if we don't actually delete the intermediate copies (a.k.a. what remains of the chain at this point) the resulting code would be incorrect.

1173

At first it is strange to see that we look for a copy when Reg is a def, but I guess it makes sense because:

  1. We are not going to recolor Reg
  2. We need to consider this chain before it gets clobbered later in that same loop

Assuming I understood that correctly, it deserves its comment here.

1181

Use LeadRegs.find and avoid the double lookups (one in count and one in operator[]).

1191

I think this statement deserves its own comment.
IIUC here we unconditionally clobber all the registers (as opposed to only clobbering the definitions) because we only rewrite the chain itself (i.e., we don't attempt to rewrite uses after the chain).

BTW, you need to take into account regmasks too.

1191

Shouldn't we clear the SpillChains here for defs and not-preversed-by-regmasks regs at this point?

llvm/test/CodeGen/PowerPC/mcp-elim-eviction-chain.mir
135

Add a test with regmasks.

lkail updated this revision to Diff 471178.Oct 27 2022, 9:02 AM
lkail retitled this revision from [MachineCopyPropagation][WIP] Eliminate spillage copies that might caused by eviction chain to [MachineCopyPropagation] Eliminate spillage copies that might caused by eviction chain.
lkail edited the summary of this revision. (Show Details)
lkail added a reviewer: Restricted Project.

Hi @qcolombet , thanks for your detail comments. I have uploaded another patch which is very different from previous one. Not sure I have addressed all your comments.

lkail marked 7 inline comments as done.Oct 27 2022, 9:04 AM
lkail retitled this revision from [MachineCopyPropagation] Eliminate spillage copies that might caused by eviction chain to [MachineCopyPropagation] Eliminate spillage copies that might be caused by eviction chain.Oct 27 2022, 9:13 AM
lkail added a comment.Tue, Nov 15, 9:44 PM

Gentle ping.

Hi @lkail,

I am halfway through.

I'm sharing my comments so far if you want to get started with some of the nitpicks.

Cheers,
-Quentin

llvm/lib/CodeGen/MachineCopyPropagation.cpp
106

Maybe rename in LastSeenUseInCopy.

Essentially, I would avoid LastUse alone as it carries a lot of expected semantic that I don't think apply here.

196

Use MI directly here instead of adding it line 200.

290

Could Current be const here?

309

If Def is clobbered between DefCopy and Current I would have expected that DefCopy would have been removed from Copies.

Put differently, I feel that if we have to check that here, the Copies map is holding hard-to-reason-about information.

Are we missing some call to clobberRegister?

1059

I think you can simplify this with a range based loop with rbegin/rend.

1060

That should be a simple range based loop:

for (const MachineInstr *MI: RC)
  MI->dump();
1079

Technically we could collapse this sequence to:

// r0 = COPY r4
// <def-use r4>
// r4 = COPY r0

I.e., I think it is worth explaining that the propagation doesn't check whether or not r0 can be altered outside of the chain and that's why we conservatively keep its value as it was before the rewrite. (Would be a nice follow-up fix BTW :)).

1082

Typo: until

1082

typo: chain uses

1095

typo: encountered

1100

That sounds weird.
Would you mind sharing the assertion?

1102

Could you add a comment on what the mappings hold (all three of them)?

I haven't read the code past this point yet, but for instance I would have expected that the key in these maps would be a Register not MachineInstr.

1103

typo: until

1105

Instead of tracking that, should we just invalidate the chain / stop it before that point?

1118

Maybe add a todo that if the outermost pair of copies modifies a register that is dead outside of that pair, we could eliminate one more pair.

lkail added inline comments.Fri, Nov 18, 12:07 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
196

Correct me if I'm woring, Copies.insert insert successfully only when the key doesn't exist before. Suppose we have

L0: R0 = COPY R1
L1: R2 = COPY R1

LastUseSeenInCopy should track MI in L1 rather than L0.

309

I can imagine there might be some RegMasks doesn't implicit def any registers, just clobber them. When we are traversing the MBB and are encountered with a RegMask without any other implicit-def, we don't know which register to clobber directly. I'm not sure we have a way to enumerate registers a RegMask clobbers. If there is such a way, I think we should clobber registers when we are traversing the MBB, not checking RegMask clobbers here.

1100

It hits DenseMapIterator's

  pointer operator->() const {
    assert(isHandleInSync() && "invalid iterator access!");
...
  }

I dived into it a bit, looks it's checking the validity of the iterator, i.e., if the container is updated, the iterator constructed before the update is invalid.
Code like

        auto Leader = ChainLeader.find(MaybePrevReload);
...
        ChainLeader.insert({MaybeSpill, Leader->second});
        ChainLeader.insert({&MI, Leader->second});

Should be avoided.

1102

I haven't read the code past this point yet, but for instance I would have expected that the key in these maps would be a Register not MachineInstr.

I separate the algorithm implementation in to two stages. stage1: Collect spill-reload chains. stage2: Fold the chains. If using Register, we are unable to track different spill-reload chains that share same registers.

1105

The implementation doesn't invalidate any chain in stage1. Compared to previous implementation, I think current one is easier to reason and easier to maintain. When we are iterating MI inside the MBB, we don't know which COPY might be one of the innermost spill-reload pair and we don't want to lose track of the innermost spill-reload pair. The Source of the innermost spill is allowed to be re-use and re-def between the innermost spill-reload pair.

lkail updated this revision to Diff 476364.Fri, Nov 18, 12:08 AM

Address comments.

lkail marked 10 inline comments as done.Fri, Nov 18, 12:09 AM
lkail updated this revision to Diff 476369.Fri, Nov 18, 12:17 AM
lkail marked an inline comment as done.
lkail updated this revision to Diff 476371.Fri, Nov 18, 12:30 AM
lkail added inline comments.Fri, Nov 18, 12:57 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
1079

Maybe we can check if r0 is killed to remove one more COPY.

qcolombet added inline comments.Tue, Nov 22, 7:20 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
196

Ah good point!

309

I think I see our misunderstanding.
Given the name of the function I would expect that this function only does some queries on the tracker, but you're actually using this function to do some bookkeeping as well.

So the conclusion is either rename this function to more accurately represents what it does (I don't have a good name for now) or move the bookkeeping in the main loop (i.e., I thought we were calling clobberRegister from the main loop already.)

Regarding your comment on RegMasks, I am not sure I follow:

  • RegMasks always list all the registers they preserve/clobber
  • Liveness sets at basic block boundaries are not represented with RegMasks, but anyway we don't care because the tracking is always purely local to a basic block in that pass. (Unless you've changed that and I missed it :)).
1059

You should be able to use an even more compact form:

for (auto I : make_range(SC.rbegin(), SC.rend())
1079

Yep, but for that to be accurate we would need to flip the direction of the analysis (from top-down, to bottom-up) to get proper liveness construction.
(Or use the kill flag, but generally speaking we try to avoid relying on this.)

1100

Yep, every time you insert something in the dense map, you may invalidate the iterators.

1102

I see, make sense.

lkail added inline comments.Wed, Nov 23, 1:36 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
309

RegMasks always list all the registers they preserve/clobber

Ah, I see. Is it a good idea to enumerate them via TRI->getNumRegs() and check RegMask to see if they are preserve/clobber?

For the origin question

If Def is clobbered between DefCopy and Current I would have expected that DefCopy would have been removed from Copies.

Does it imply we should not check RegMask and don't update bookkeeping by calling Tracker::clobberRegister here, checking if RegMask clobbers registers should already have been conducted in the main loop?

Correct me if I still fail to get your point.

qcolombet added inline comments.Thu, Nov 24, 12:15 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
309

Ah, I see. Is it a good idea to enumerate them via TRI->getNumRegs() and check RegMask to see if they are preserve/clobber?

That's the idea. Though we wouldn't need to enumerate all the registers, only the ones that you care about.
What you're doing here is already fine.

The thing that bothers me in the current code is the call to clobberRegister. This modifies the state of the tracker. Usually the find methods only query the RegMasks directly (with MachineOperand::clobbersPhysReg).

Correct me if I still fail to get your point.

You got the point.
Let's keep the code as is for now and let me do a full pass on the code so that I have a better model of how the whole things works.

I'll probably won't have time to do it before next week though.