This is an archive of the discontinued LLVM Phabricator instance.

[MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp
AbandonedPublic

Authored by lkail on Jul 25 2019, 2:12 AM.

Details

Summary

After tailduplication, we have redundant copies. For example

bb.6.land.lhs.true.1:                                                                                                                                                                          
; predecessors: %bb.4                                                                                                                                                                          
  successors: %bb.2(0x80000000); %bb.2(100.00%)                                                                                                                                                
                                                                                                                                                                                               
  renamable $x5 = LI8 0                                                                                                                                                                        
  B %bb.2


bb.2.cleanup16:                                                                                                                                                                                
; predecessors: %bb.3, %bb.1, %bb.6                                                                                                                                                            
  liveins: $x5                                                                                                                                                                                 
  $x3 = COPY killed renamable $x5                                                                                                                                                              
  BLR8 implicit $lr8, implicit $rm, implicit $x3

TailDuplication will duplicate the copy of $x5 and blr in bb.2 to the end of bb.6.

We can remove these copies in machine-cp if it's safe to, i.e.

$reg0 = OP ...
... <<< No uses of $reg0 and $reg1
$reg1 = COPY $reg0 <<< $reg0 is killed
...
<RET>

will be transformed to

$reg1 = OP ...
...
<RET>

Diff Detail

Event Timeline

lkail created this revision.Jul 25 2019, 2:12 AM
lkail edited the summary of this revision. (Show Details)Jul 31 2019, 6:35 PM

Is this something that could be fixed in TailDuplication?

lkail added a comment.Aug 1 2019, 7:59 PM

Is this something that could be fixed in TailDuplication?

TailDuplication doesn't take specific instructions, such as generating a immediate number, into account. We are able to add code to do such folding in tailduplication, however I think it's beyond taildup's duty.

For the test case, we have such MIR before taildup

bb.6.land.lhs.true.1:                                                                                                                                                                          
; predecessors: %bb.4                                                                                                                                                                          
  successors: %bb.2(0x80000000); %bb.2(100.00%)                                                                                                                                                
                                                                                                                                                                                               
  renamable $x5 = LI8 0                                                                                                                                                                        
  B %bb.2


bb.2.cleanup16:                                                                                                                                                                                
; predecessors: %bb.3, %bb.1, %bb.6                                                                                                                                                            
  liveins: $x5                                                                                                                                                                                 
  $x3 = COPY killed renamable $x5                                                                                                                                                              
  BLR8 implicit $lr8, implicit $rm, implicit $x3

So taildup copys bb.2s code at the end of bb.6.

jsji added a comment.Aug 12 2019, 2:44 PM

Have you considered extend MachineCopyPropagation to cover this? Looks like to me that this is a 'backward' COPY propagation.

lkail added a comment.Aug 12 2019, 6:45 PM
This comment was removed by lkail.
This comment was removed by lkail.

Have you considered extend MachineCopyPropagation to cover this? Looks like to me that this is a 'backward' COPY propagation.

Yes, it's indeed a backward COPY problem. But I suggest not to implement it in machine-cp. I've inspected MIR of arm64 and x86-64, no such pattern is generated after taildup. If I implement it in machine-cp, I might write some like isMoveImmediate and invoke target hook like TII->FoldImmediate. However currently only PowerPC implements specific TII->FoldImmediate. Taking above into account, this issues seems not general enough to implement in machine-cp.

jsji added a comment.Aug 13 2019, 7:35 AM

If I implement it in machine-cp, I might write some like isMoveImmediate and invoke target hook like TII->FoldImmediate. However currently only PowerPC implements specific TII->FoldImmediate. Taking above into account, this issues seems not general enough to implement in machine-cp.

I don't see this is specific to 'isMoveImmediate`.

//   $reg0 = OP ....
// ...
//   $reg1 = COPY $reg0

If we can be sure that $reg0 is renamable, and no use of $reg0 or $reg1 in between (and of course there might be other restrictions, like ReservedReg etc),
we should be able to remove the COPY here.

// `$reg1` = OP ..
// ...

LI $r,0 in your testcase is just a typical example here.

lkail added a comment.EditedAug 13 2019, 7:38 PM
//   $reg0 = OP ....
// ...
//   $reg1 = COPY $reg0

If we can be sure that $reg0 is renamable, and no use of $reg0 or $reg1 in between (and of course there might be other restrictions, like ReservedReg etc),
we should be able to remove the COPY here.

// `$reg1` = OP ..
// ...

One concern here, there might be uses of $reg0 in succ BBs. To overcome it, we might constraint this opt in terminal BBs considering current machine-cp working on single BB. To summarize, without complex uses and defs(thus we don't need to do complex replace-uses-with after RA), in a terminal BB we should have

$reg0 = OP ...
... <<< No uses of $reg0 and $reg1
$reg1 = COPY $reg0 <<< $reg0 is killed
...
<RET>

For complex scenes,

$reg0 = OP ...
... <<< No uses of $reg1, might have uses of $reg0
$reg1 = COPY $reg0
...
$reg1 = OP ...
... <<< If there are uses of $reg0, seems not profitable to conduct this opt.
$reg0 = OP ...
...
<RET>
jsji added a comment.EditedAug 13 2019, 8:26 PM

To summarize, without complex uses and defs(thus we don't need to do complex replace-uses-with after RA). in a terminal BB

Yes, only do simple and valid backward propagate.

lkail updated this revision to Diff 215033.Aug 14 2019, 12:17 AM
lkail retitled this revision from [PowerPC][PreEmitPeephole] Remove redundant copies of immediate numbers after TailDup to [MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp.
lkail edited the summary of this revision. (Show Details)
lkail added reviewers: craig.topper, bogner, efriedma.

Updated patch to remove redundant copies by implementing backward copy propagation in machine-cp.

lkail updated this revision to Diff 215046.Aug 14 2019, 1:34 AM
jsji added inline comments.Aug 14 2019, 7:06 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
315

Can we check !Copy.getOperand(1).isKill() earlier? before the backward SrcMI search? Maybe together with isRenamable()?

jsji added a comment.Aug 14 2019, 7:09 AM

@lkail Can you please update the summary to describe more details about why we might have such redundant copies after TailDup.
I know we have those in comments, it would be great if we can summarize those into summary, so that reviewers don't have to look into every single comments. Thanks.

lkail updated this revision to Diff 215808.Aug 18 2019, 6:36 PM
lkail edited the summary of this revision. (Show Details)

Updated the patch and summary.

lkail edited the summary of this revision. (Show Details)Aug 18 2019, 7:38 PM
lkail updated this revision to Diff 215814.Aug 18 2019, 9:44 PM
lkail edited the summary of this revision. (Show Details)Aug 18 2019, 9:49 PM
lkail edited the summary of this revision. (Show Details)Aug 22 2019, 6:37 PM

ping.

lkail added a comment.Aug 29 2019, 3:14 AM

Gentle ping.

lkail added a reviewer: gberry.Sep 2 2019, 8:30 PM
lkail edited the summary of this revision. (Show Details)Sep 3 2019, 12:49 AM
lkail edited the summary of this revision. (Show Details)
jsji added a comment.Sep 3 2019, 11:54 AM

LGTM. Please hold for a few days, in case someone has more comments. Thanks.

The testcase needs rebasing, umulo-128-legalisation-lowering.ll needs update too.

lkail updated this revision to Diff 218584.Sep 3 2019, 7:40 PM

Rebased and updated tests.

jsji accepted this revision.Sep 3 2019, 7:47 PM

LGTM.

This revision is now accepted and ready to land.Sep 3 2019, 7:47 PM
lkail updated this revision to Diff 218585.Sep 3 2019, 8:11 PM

Added comment for bool MachineCopyPropagation::eraseIfRedundant(MachineInstr &Copy).

lkail added a comment.Sep 6 2019, 12:24 AM

Hi @craig.topper , would you mind having a look at changes of X86 test cases?

lkail updated this revision to Diff 219053.Sep 6 2019, 2:59 AM

Added reserved register check on SrcOp.getReg().

X86 changes look good.

lkail updated this revision to Diff 219284.Sep 8 2019, 7:14 PM

Check if SrcOp is tied or an implicit-def.

This revision was automatically updated to reflect the committed changes.

Hi,

Unfortunately, with this patch the following program (that I believe to be valid C++) fails under MSan:

#include <set>
#include <string>
#include <vector>

struct Inner {
  __attribute__((noinline)) int Opaque1(int, char) const { return -1; }
  __attribute__((noinline)) void Opaque2(int, char, int) {}
  std::vector<std::set<const std::string*> > output;
  int next = 1;
};

struct Outer {
  __attribute__((noinline)) void Enter(const std::string&);
  Inner* inner;
};

void Outer::Enter(const std::string& str) {
  int state = 0;
  int j = 0;
  for (; j < str.size(); j++) {
    int next_state = inner->Opaque1(state, str[j]);
    if (next_state == -1)
      break;
    else
      state = next_state;
  }
  for (; j < str.size(); j++) {
    inner->Opaque2(state, str[j], inner->next);
    state = inner->next;
    inner->next++;
  }
  inner->output[state].insert(&str);
}

__attribute__((noinline))
void fault(const std::set<const std::string*>& x) {
  for (const std::string *xx : x) {
    std::string copy = *xx;
  }
}

int main() {
  std::string TheString = "a";
  Outer outer;
  outer.inner = new Inner;
  outer.inner->output.resize(2);
  outer.Enter(TheString);
  const std::set<const std::string*>& x = outer.inner->output[1];
  fault(x);
}
==8166==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7f20d47060e0 in std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char> >::__is_long() const include/string:1420:39
    #1 0x7f20d47060e0 in std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char> >::basic_string(std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char> > const&) include/string:1835:16
    #2 0x7f20d58209ad in fault(std::__msan::set<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char> > const*, std::__msan::less<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char> > const*>, std::__msan::allocator<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char> > const*> > const&) test.cc:38:24
    #3 0x7f20d5820b21 in main test.cc:49:3
    #4 0x7f20d3e6dbbc in __libc_start_main
    #5 0x563387ed2d08 in _start

I'm suspecting a miscompile in Outer::Enter. I have verified that the program works without this patch, and fails with this patch. Please let me know if I could help more you to reproduce the problem.

I'm reverting this patch for now, sorry (r371421).

lkail added a comment.Sep 9 2019, 5:07 PM

Hi @gribozavr , thanks for the affirmative action. Could you provide me your command to generate the executable and the target platform you compile to?

lkail updated this revision to Diff 219466.EditedSep 9 2019, 7:13 PM

When replacing copy's src with copy's def, I missed a point that copy's def is clobbered by selected SrcMI and Tracker should have been informed.

lkail reopened this revision.Sep 9 2019, 7:13 PM
This revision is now accepted and ready to land.Sep 9 2019, 7:13 PM
lkail requested review of this revision.Sep 9 2019, 7:14 PM
lkail added a comment.Sep 9 2019, 8:19 PM

@gribozavr Please let me know if the latest patch resolves the issue.

lkail added a comment.Sep 10 2019, 3:03 AM

The failing case is rather large even reduced by bugpoint.

Thank you for the fix! I can confirm that the new patch does not break this test in my environment.

I'm not an expert in this part of the backend, however, I wanted to ask to add more tests to the patch, both positive and negative. The logic in the code is quite complex, it has lots of branches (so there are many possibilities for things to go wrong), but testing is extremely light. The patch is only updating existing tests in three places -- which to me looks more like a happy coincidence (that LLVM already includes tests that happen to trigger the new behavior) than intentional testing. Thanks!

jsji requested changes to this revision.Sep 11 2019, 8:39 AM

Please add more tests as required by @gribozavr . Thanks.

This revision now requires changes to proceed.Sep 11 2019, 8:39 AM
jsji added a comment.Sep 11 2019, 8:43 AM

The patch is only updating existing tests in three places -- which to me looks more like a happy coincidence (that LLVM already includes tests that happen to trigger the new behavior) than intentional testing. Thanks!

redundant-copy-after-tail-dup.ll was newly added test (pre-committed) dedicated for this patch.
But yes, it was mostly focus on positive testing, we should try to add more tests ,
especially due to the nature of MachineCopyPropagation.

lkail updated this revision to Diff 219863.Sep 12 2019, 2:26 AM

Pre-committed machine-backward-cp.mir, including edge cases as far as I can imagine. Some test cases remain unchanged, reviewers might have to read whole diff of machine-backward-cp.mir to see unchanged cases.

lkail marked an inline comment as done.Sep 12 2019, 2:34 AM
lkail added inline comments.
llvm/test/CodeGen/MIR/PowerPC/machine-backward-cp.mir
151 ↗(On Diff #219863)

This case is reduced from @gribozavr 's case.

lkail marked an inline comment as done.Sep 12 2019, 2:54 AM
lkail added inline comments.
llvm/test/CodeGen/MIR/PowerPC/machine-backward-cp.mir
165 ↗(On Diff #219863)

Prior to the fix, $x7 will be considered not clobbered and use of $x5 in the next instruction will be replaced by $x7 which MCP still considers renamable $x5 = COPY killed renamable $x7 a valid copy.

jsji added inline comments.Sep 13 2019, 7:23 AM
llvm/test/CodeGen/MIR/PowerPC/machine-backward-cp.mir
1 ↗(On Diff #219863)

This file shouldn't be in MIR/PowerPC, I have moved it to PowerPC in https://reviews.llvm.org/rL371857.
Please rebase.

lkail updated this revision to Diff 220243.Sep 15 2019, 5:53 AM

Rebased.

jsji accepted this revision.Sep 16 2019, 6:49 AM

LGTM.

This revision is now accepted and ready to land.Sep 16 2019, 6:49 AM
lkail updated this revision to Diff 220427.EditedSep 16 2019, 8:01 PM

Updated the patch. Currently we don't handle non-trivial copy which has more than 2 operands(Cuz I notice current copy forward propagation algorithm considers copy's implicit operands https://reviews.llvm.org/D29522).

lkail requested review of this revision.Sep 16 2019, 8:02 PM
lkail marked an inline comment as done.
lkail added inline comments.
llvm/lib/CodeGen/MachineCopyPropagation.cpp
300

Added check of non-trivial copy.

lkail updated this revision to Diff 220429.Sep 16 2019, 8:30 PM

Added test case that copy has side effect.

jsji resigned from this revision.Sep 17 2019, 6:36 AM

Hmm... OK, too many details that I did not even think of. I would leave this to be reviewed by other experts.

lkail abandoned this revision.Sep 18 2019, 2:56 AM

Seems we don't need the terminal BB constraint. I'll prepare another patch to address this issue.