This is an archive of the discontinued LLVM Phabricator instance.

[MachineCopyPropagation] Do more backward copy propagations
Needs ReviewPublic

Authored by yshui on Mar 15 2021, 1:25 PM.

Details

Summary

Previously we only do backward copy propagations if there is no read or write to both the Def and Src of the copy between the definition site and the copy site.

However, this restriction can be relaxed. Reads of the Src can be allowed as long as those Reads are rewritten to use Def instead during the propagation.

This commit implements that, potentially improve the generated code. In particular, this resolve an inconsistency between optimized build and optimized build with debuginfo, where DBG_VALUE instructions could prevent backward copy propagation from happening (See: https://bugs.llvm.org/show_bug.cgi?id=49446)

Diff Detail

Event Timeline

yshui created this revision.Mar 15 2021, 1:25 PM
yshui requested review of this revision.Mar 15 2021, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2021, 1:25 PM
yshui added a comment.EditedMar 15 2021, 1:27 PM

Things still need to be done:

  1. Review the failed tests and see which ones are actual bad CodeGen, and which ones need to be updated.
  2. Add a test case for this propagation case.

I need some help with (2) as I can't come up with a simple test case that generates the kind of machine IR pattern I need.

yshui updated this revision to Diff 330794.Mar 15 2021, 1:31 PM

Fix typos in comments.

foad added a subscriber: foad.Mar 15 2021, 1:48 PM
yshui updated this revision to Diff 330797.Mar 15 2021, 1:53 PM

Adjust how DBG_VALUE is rewritten.

yshui updated this revision to Diff 330803.Mar 15 2021, 2:07 PM

Remove a TODO which is no longer relevant

yshui updated this revision to Diff 330805.Mar 15 2021, 2:14 PM

Adjust a debug message.

Harbormaster completed remote builds in B93909: Diff 330797.

Please do move https://reviews.llvm.org/D98401 regression test to this patch set

foad added inline comments.Mar 16 2021, 4:40 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
331

Hi, I tried testing this on AMDGPU but quite a lot of tests crash here because Copy is nullptr.

yshui added inline comments.Mar 16 2021, 4:56 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
331

Thanks, I will have a look.

yshui updated this revision to Diff 330966.EditedMar 16 2021, 6:34 AM

Strengthen the invariant maintained by the CopyTracker.

trackUse assumed that when a copy instruction exists in Copies, it can be found via both its Def and its Src, and vice-versa. This wasn't true previously, causing segfaults because findCopyDefViaUnit returned nullptr.

yshui marked an inline comment as done.Mar 16 2021, 6:35 AM
yshui added inline comments.
llvm/lib/CodeGen/MachineCopyPropagation.cpp
331

Looks like I made an assumption that was stronger than what CopyTracker actually guaranteed.

This should be fixed now.

foad added inline comments.Mar 16 2021, 7:08 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
331

Yes that fixes the crashes thanks. There are no affected tests for AMDGPU -- of course I was hoping for some massive codegen improvents!

yshui updated this revision to Diff 331113.Mar 16 2021, 3:16 PM

I reviewed the changes in CodeGen tests to the best of my ability, and they look fine to me. If someone else could review and confirm, that would be appreciated.

Also fix some clang-tidy complaints.

yshui marked an inline comment as done.Mar 16 2021, 3:20 PM

Please upload all patches with full context (-U9999)

yshui updated this revision to Diff 331208.Mar 17 2021, 3:46 AM

Include full context in the patch

yshui updated this revision to Diff 331238.Mar 17 2021, 5:20 AM
yshui added a comment.Mar 17 2021, 5:20 AM

Update Mips tests.

My knowledge of Mips is much more limited. I am especially not sure about the $gp motion. Someone more knowledgeable should definitely double check.

I don't know why it only affects microMIPS (presumably it does its lowering subtly differently from normal MIPS) but those changes look sensible to me

I don't know why it only affects microMIPS (presumably it does its lowering subtly differently from normal MIPS) but those changes look sensible to me

MIIPS changes look good to me. Using $gp register in instructions like lw $25, %call16(__divti3)($gp) is a common way. I do not know why initially another register were involved.

yshui added a comment.Mar 17 2021, 1:50 PM

Since this patch already changed output of many test cases, do we still need a test case specifically for this patch?

lkail added a comment.Mar 17 2021, 9:06 PM

I haven't looked into implementation details. There are some coding style issues need to be fixed.

llvm/lib/CodeGen/MachineCopyPropagation.cpp
100

Any reason of moving these two functions?

378

This comment should be ended with a period.

382

Comment should be a complete sentence.

922
940

Missing period.

lkail added a comment.Mar 17 2021, 9:18 PM

Since this patch already changed output of many test cases, do we still need a test case specifically for this patch?

You'd better add some mir tests to demonstrate the idea, w/o -g.

condy added a comment.Mar 18 2021, 1:05 AM

Since this patch already changed output of many test cases, do we still need a test case specifically for this patch?

You'd better add some mir tests to demonstrate the idea, w/o -g.

A piece of tests can be copied from https://reviews.llvm.org/D98401 @yshui

yshui added a comment.Mar 18 2021, 3:39 AM

Since this patch already changed output of many test cases, do we still need a test case specifically for this patch?

You'd better add some mir tests to demonstrate the idea, w/o -g.

Thanks for your review. There is a test case in https://reviews.llvm.org/D98401, which is quite big. I wasn't able to reduce it further, is it OK to just include that as test case?

yshui added inline comments.Mar 18 2021, 4:07 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
100

I realized these two functions don't have to be part of CopyTracker. If this is not fine I can put them back.

lkail added a comment.Mar 18 2021, 4:53 AM

Thanks for your review. There is a test case in https://reviews.llvm.org/D98401, which is quite big. I wasn't able to reduce it further, is it OK to just include that as test case?

Have you tried using bugpoint to reduce it further, or writing MIR test manually?

llvm/lib/CodeGen/MachineCopyPropagation.cpp
100

I would prefer leave them as they are to make code diff less noisy. If you have strong desire to move, please post another NFC patch.

yshui added a comment.Mar 18 2021, 5:01 AM

Thanks for your review. There is a test case in https://reviews.llvm.org/D98401, which is quite big. I wasn't able to reduce it further, is it OK to just include that as test case?

Have you tried using bugpoint to reduce it further, or writing MIR test manually?

I am not sure how to use bugpoint for this, as this is not a miscompilation, bad codegen, or crash. But I did try using llvm-reduce, which did help, but the resulting IR is still pretty big.

I find it difficult to manually create the kind of MIR pattern I want.

yshui added inline comments.Mar 18 2021, 5:08 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
100

Ah, sorry, my mistake. These functions were originally in class MachineCopyPropagation, but I needed them in class CopyTracker, so I moved them out.

yshui updated this revision to Diff 331521.Mar 18 2021, 5:10 AM

Style fixes.

yshui marked 4 inline comments as done.Mar 18 2021, 5:11 AM

There isn't a lot of difference to how debug-info is updated, as far as I can see, but the summary says a fault is resolved. Is this patch just masking the debug-info issue by doing more propagation? (I might have missed something in the implementation).

For the codegen change -- I would suggest this additional feature deserves a dedicated MIR test. All the tests modified are IR-to-assembly, where unrelated changes to other parts of the compiler might obscure this feature being accidentally disabled or otherwise modified. A MIR test for this specific behaviour in this specific pass will guarantee it isn't broken in the future. Plus, it'll make it easier for reviewers and post-review-readers to understand the patch.

llvm/lib/CodeGen/MachineCopyPropagation.cpp
295–298

Performance nit -- is it safe to return a SmallVector * in the pair? That avoids needless copying and potential allocations.

yshui added a comment.Mar 24 2021, 6:36 PM

There isn't a lot of difference to how debug-info is updated, as far as I can see, but the summary says a fault is resolved. Is this patch just masking the debug-info issue by doing more propagation? (I might have missed something in the implementation).

I am not exactly sure what you mean here. The fault is caused by debug-info preventing propagation from happening, this patch fixes that problem.

For the codegen change -- I would suggest this additional feature deserves a dedicated MIR test. All the tests modified are IR-to-assembly, where unrelated changes to other parts of the compiler might obscure this feature being accidentally disabled or otherwise modified. A MIR test for this specific behaviour in this specific pass will guarantee it isn't broken in the future. Plus, it'll make it easier for reviewers and post-review-readers to understand the patch.

I agree, but I mentioned above I have problem creating an small IR input that would generate the kind of MIR pattern I need. There is a test case attached to the issue but it's quite big.

yshui added inline comments.Mar 24 2021, 6:39 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
295–298

I don't see a problem with this. Will update.

yshui updated this revision to Diff 333196.Mar 24 2021, 8:14 PM

Thanks @jmorse for the review.

Update findAvailBackwardCopy to return a pointer to the SmallVector instead.

yshui marked an inline comment as done.Mar 24 2021, 8:15 PM
yshui updated this revision to Diff 333684.Mar 27 2021, 12:47 PM

Rebasing

yshui updated this revision to Diff 333685.EditedMar 27 2021, 2:03 PM

Uh, I just figured out how mir tests work. Guess nobody wanted to point out how dumb I am. 😅

Adapted the MIR test from D98401, thanks @condy

Uh, I just figured out how mir tests work.

No worries, there's a first time for everything :),

I am not exactly sure what you mean here. The fault is caused by debug-info preventing propagation from happening, this patch fixes that problem.

This is true, but I think the patch fixes it by masking the underlying debug-info problem. I've taken a closer look at MCP, I reckon there's a fault in MachineCopyPropagation::BackwardCopyPropagateBlock where it treats all register operands the same, i.e. a "use" by a DBG_VALUE is treated the same as a "use" by any ordinary instruction, causing the debug instruction to prevent an optimisation. The simplest and easiest fix for that, is to have the function ignore debug-info instructions by adding a "if (MI.isDebugInstr()) continue;" statement in the loop before interpreting its register-reads.

That fits with why the extra optimisation gets rid of the different-code-with-debug problem, optimising through register uses would also optimise through debug uses.

If I'm right about that -- could you add that guard, with a debug specific test (ideally in the llvm/tests/DebugInfo/X86 directory), in a separate patch so that we can consider the debug-info matter separate from the optimisation.

That being said: the extra optimisation in this patch is good, as shown by the changes to ll-to-assembly tests, some completely un-necessary copies get eliminated. It's well worth pursuing -- I think more MIR test coverage is needed though (as suggested in D98659#2633657), for example:

  • A full example without debug-info being involved,
  • Testing that no optimisation happens if the "renamable" flag is absent,
  • Checking that propagation doesn't occur (or occurs correctly modified) when sub-registers are read

Plus the scenarios you list in the comment in the trackUse function. The best way to guarantee that those situations are not incorrectly optimised by mistake in the future is by having a test that detects such a scenario. (It's often the case that the tests are a lot more important than the code in the patch itself).

yshui added a comment.EditedApr 2 2021, 2:07 PM

The simplest and easiest fix for that, is to have the function ignore debug-info instructions by adding a "if (MI.isDebugInstr()) continue;" statement in the loop before interpreting its register-reads.

This is what @condy did in D98401, but this is incorrect. A debug-info use is still a use, you still need to update them when doing propagation, otherwise the debug-info would be incorrect. And that is what this patch is trying to do.

That being said: the extra optimisation in this patch is good, as shown by the changes to ll-to-assembly tests, some completely un-necessary copies get eliminated. It's well worth pursuing -- I think more MIR test coverage is needed though (as suggested in D98659#2633657), for example:

  • A full example without debug-info being involved,
  • Testing that no optimisation happens if the "renamable" flag is absent,
  • Checking that propagation doesn't occur (or occurs correctly modified) when sub-registers are read

Plus the scenarios you list in the comment in the trackUse function. The best way to guarantee that those situations are not incorrectly optimised by mistake in the future is by having a test that detects such a scenario. (It's often the case that the tests are a lot more important than the code in the patch itself).

Makes sense, I'll get on with those.

Thanks!

yshui updated this revision to Diff 335397.Apr 5 2021, 9:57 PM

Add tests suggested by @jmorse

jmorse added inline comments.Apr 14 2021, 8:56 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
172–174

The key is only MCRegisters, but invalidateRegister and similar functions only work with register units. How are Uses found so that they can be invalidated too?

210–211

Could you explain how this is different from the original code -- after all, the original implementation invalidated copes for the reguints for source and destinations of copies found. I don't quite understand "entire register" -- is it the set of regunits, the aliases, or something else?

318–320

This would be a lot easier to understand if it had a motivating example of what it's doing -- see the top of MachineCopyPropagation.cpp for a few samples of what I mean.

327

clang-format? (And other single-statement-continues)

400–404

Shouldn't this be left to the rest of the machine-copy-propagate implementation -- after all, this method is called "trackUse", and it seems to be doing more than that.

948

I ran a stage2 RelWithDebInfo build of clang to test this (I was curious), and got an assertion failure in this function:

llvm::MachineRegisterInfo::updateDbgUsersToReg(llvm::Register, ArrayRef<llvm::MachineInstr *>) const: Assertion `MI->getOperand(0).isReg()' failed.
llvm/test/CodeGen/X86/machine-copy-prop.mir
139–149

The rbp-to-rbx copy is deleted without this patch too, the only difference is that some of the registers change. I think this is fragile, and doesn't demonstrate the intention of the optimisation. Could you use a test body that

a) deletes a COPY that wasn't already dead, and
b) rewrites a non-COPY operand, another NOOP for example.
yshui added inline comments.Apr 15 2021, 2:58 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
172–174

Uses is always accessed with registers as key (instead of register units), whether it's to find entries or to invalidate entries.

Unless I misunderstood your question?

210–211

So entries of Copies form pairs. Say we have a Def = COPY Src, then Copies will contain {RegUnits of Def} and {RegUnits of Src} as a pair. trackUse relies on the invariant that when Def = COPY Src is no longer a candidate, all related entries in Copies are removed.

The original code doesn't guarantee this. When a RegUnit of a Src register is passed to the original invalidateRegister, only that RegUnit, as well as the related Def RegUnits will be removed, leaving all other RegUnits of that Src register around.

This change makes sure all the related RegUnits are removed.

400–404

trackUse tracks the uses that are rewrite candidates, so if a use is no longer a candidate for rewrite, trackUse removes it.

I think this makes sense.

948

Interesting. I assume this is on X86? Do you happen to have the offending MIR around?

yshui added inline comments.Apr 18 2021, 6:07 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
948

I can successfully build a RelWithDebInfo stage 2 clang with this patch applied to ab158d35b5a09b2541071ec8351a6ad57dfd7b6e.

Maybe a recent change broke this patch. I will test again after updating.

yshui added inline comments.Apr 18 2021, 3:50 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
948

Hmm, still cannot reproduce with this patch applied to d4528cbb0e703ee17ee6fb4abe72b7246ccb38f1

yshui updated this revision to Diff 348642.EditedMay 29 2021, 9:56 AM

Update:

  • Rebase
  • Add an example of the new copy propagation in the top comments
  • One more updated test cases: CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
  • Style fixes
yshui marked 2 inline comments as done.May 29 2021, 10:08 AM
yshui updated this revision to Diff 349256.Jun 2 2021, 6:49 AM

Rebasing

yshui updated this revision to Diff 350004.EditedJun 4 2021, 6:36 PM

Update:

  • Rebasing
  • More affected tests:
    • CodeGen/RISCV/rvv/vselect-fp-rv64.ll
    • CodeGen/RISCV/rvv/vselect-fp-rv32.ll
yshui updated this revision to Diff 350840.EditedJun 9 2021, 3:23 AM

Rebasing, no new affected tests.

yshui updated this revision to Diff 351684.Jun 12 2021, 2:10 PM

One more affected test: CodeGen/ARM/umulo-128-legalisation-lowering.ll

lkail added inline comments.Jun 17 2021, 4:36 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
54

Is it possible forward copy propagation can also eliminate this COPY if $R1 defined by this COPY is killed at the BB later? If so, I think we should extend forward copy propagation and should be less complex than what current patch does.

283

Missing perid.

yshui added inline comments.Jun 20 2021, 7:41 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
54

One case I can think of where this can't done is if $R0 got clobbered between $R1 = $R0 and the use of $R1 later.

yshui updated this revision to Diff 353263.Jun 20 2021, 8:05 PM

Fix a missing period

yshui marked an inline comment as done.Jun 20 2021, 8:05 PM

Looks like the verifier error I experienced must have been transient / my fault, I can't reproduce it now,

I think I understand what's going on here now, however it involves re-purposing the "Avail" flag attached to copies in an unexpected way. At the very least, it needs re-naming. As far as I understand it before this patch, avail means:

  • False: candidate for def being rewritten in backward copy propagation
  • False: is a clobbered register, in forward copy propagation
  • True: candidate for forward copy propagation

With this patch the "false" value takes on even more meaning: "uses can be rewritten along the way". Realistically, I think if this is to all be understood, the "Avail" flag needs to become an enum or something to communicate more information to the reader about what it means.

How does this new form of backwards copy propagation interact with the existing backwards copy propagation? The reason being, modifying one your tests a little:


name: copyprop3
body: |
  bb.0:
    renamable $rbp = LEA64r $rax, 1, $noreg, 8, $noreg
    NOOP implicit renamable $rbp
    renamable $rax = COPY renamable $rbp
    renamable $rbx = COPY killed renamable $rbp
    NOOP implicit $rax
...

I was expecting $rbp to be rewritten to $rax and the copies dropped -- instead the dead COPY is dropped, and the uses left un-rewritten:

bb.0:
  renamable $rbp = LEA64r $rax, 1, $noreg, 8, $noreg
  NOOP implicit renamable $rbp
  renamable $rax = COPY renamable $rbp
  NOOP implicit $rax

I suspect (90% confidence) that this is because CopyTracker::invalidateRegister is deleting the copy because it's not eligible for the "plain" backwards copy propagation. The design of these two things needs to be unified.

yshui added a comment.EditedJun 21 2021, 5:07 AM

Looks like the verifier error I experienced must have been transient / my fault, I can't reproduce it now,

I think I understand what's going on here now, however it involves re-purposing the "Avail" flag attached to copies in an unexpected way. At the very least, it needs re-naming. As far as I understand it before this patch, avail means:

  • False: candidate for def being rewritten in backward copy propagation
  • False: is a clobbered register, in forward copy propagation
  • True: candidate for forward copy propagation

With this patch the "false" value takes on even more meaning: "uses can be rewritten along the way". Realistically, I think if this is to all be understood, the "Avail" flag needs to become an enum or something to communicate more information to the reader about what it means.

I think it's just "Avail" has different meanings in forward/backward propagation. In backward propagation, it means "def is a candidate for rewriting", this patch doesn't add a new meaning.

How does this new form of backwards copy propagation interact with the existing backwards copy propagation? The reason being, modifying one your tests a little:


name: copyprop3
body: |
  bb.0:
    renamable $rbp = LEA64r $rax, 1, $noreg, 8, $noreg
    NOOP implicit renamable $rbp
    renamable $rax = COPY renamable $rbp
    renamable $rbx = COPY killed renamable $rbp
    NOOP implicit $rax
...

I was expecting $rbp to be rewritten to $rax and the copies dropped -- instead the dead COPY is dropped, and the uses left un-rewritten:

bb.0:
  renamable $rbp = LEA64r $rax, 1, $noreg, 8, $noreg
  NOOP implicit renamable $rbp
  renamable $rax = COPY renamable $rbp
  NOOP implicit $rax

I suspect (90% confidence) that this is because CopyTracker::invalidateRegister is deleting the copy because it's not eligible for the "plain" backwards copy propagation. The design of these two things needs to be unified.

renamable $rax = COPY renamable $rbp isn't a candidate for propagation because $rbp isn't killed in that instruction. The later copy is probably eliminated because it's dead code.

yshui updated this revision to Diff 358223.Jul 13 2021, 3:34 AM

Rebasing, new affected test:

  • CodeGen/Thumb2/mve-vqdmulh.ll