This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Eliminate redundant register copys after register allocation
Needs ReviewPublic

Authored by inouehrs on Nov 2 2017, 1:48 AM.

Details

Summary

This patch add a PPC-specific path after register allocation to eliminate redundancy related register copies among physical registers.

So far, we eliminate the following two types of redundant register copys.

  1. intra-BB redundant register copy
li Y, 0          li X, 0  (any instruction, not limited to li)
mr X, Y    =>    (erase mr)
..               ..
  1. inter-BB partially redundant register copy
          BB1--------                             BB1--------
          | ..      |                             | ..      |
          | mr Y, X |                             | (erase) |
          | ..      |                             | ..      |
  with    -----------                    with     -----------
  1 pred /     |                         1 pred  /     |
BB--------     |      BB--------      BB---------      |      BB---------
| ..     |     |      | ..     |  =>  | mr Y, X |      |      | ..      |
| ..     |     |      | ..     |      | ..      |      |      | mr X, Y |
----------     |      ----------      -----------      |      -----------
               |     /  with                           |     /  with
          BB2--------   1 succ                    BB2--------   1 succ
          | ..      |                             | ..      |
          | mr X, Y |                             | (erase) |
          | ..      |                             | ..      |
          -----------                             -----------

Diff Detail

Event Timeline

inouehrs created this revision.Nov 2 2017, 1:48 AM
nemanjai edited edge metadata.Nov 6 2017, 12:37 AM

I think that the main loop definitely needs to be re-written and made more readable and easier to follow. To me, it seems like it does too much. It spans nearly 250 lines and does a whole bunch of analyses and transformations.
Could you extract functionality into easy-to-read functions that perform individual things so that the flow is easier to follow?

Also, would it be appropriate to make this target-independent and do it perhaps in MachineCopyPropagation? Perhaps Geoff can comment on that since he's done something similar that he is still working on improvements for in order to finally commit it.

lib/Target/PowerPC/PPCPostRAPeephole.cpp
61 ↗(On Diff #121249)

If we're adding a new peephole, it likely won't only be used for this one purpose. A more general name might be in order.

100 ↗(On Diff #121249)

This seems a bit fragile. I'm not sure there's any hard guarantee that the flag will be on the last source operand.

142 ↗(On Diff #121249)

These loops as well as a check for whether the instruction is a call seem a little odd. Why is it inadequate to use MachineInstr::readsRegister() and MachineInstr::modifiesRegister() for these purposes. Given the TRI, they should return the correct value for any instruction (including calls).

inouehrs updated this revision to Diff 121864.Nov 7 2017, 2:43 AM
  • changed a new pass name
  • code simplified

If we're adding a new peephole, it likely won't only be used for this one purpose. A more general name might be in order.

I changed the pass name. Do you have any suggestion on an alternative name?

This seems a bit fragile. I'm not sure there's any hard guarantee that the flag will be on the last source operand.

I modified to check the flag for both input operands.

These loops as well as a check for whether the instruction is a call seem a little odd. Why is it inadequate to use MachineInstr::readsRegister() and MachineInstr::modifiesRegister() for these purposes. Given the TRI, they should return the correct value for any instruction (including calls).

readsRegister checks whether the instruction reads the specified register or its super register, but it does not check about sub register access. I need to check sub register access for safety.

If we're adding a new peephole, it likely won't only be used for this one purpose. A more general name might be in order.

I changed the pass name. Do you have any suggestion on an alternative name?

If you agree that this would likely be used for other purposes, why not just name it PPCPostRAPeephole? Otherwise if you think this will likely only be used for this purpose, I suppose the name is fine.

This seems a bit fragile. I'm not sure there's any hard guarantee that the flag will be on the last source operand.

I modified to check the flag for both input operands.

Thanks.

These loops as well as a check for whether the instruction is a call seem a little odd. Why is it inadequate to use MachineInstr::readsRegister() and MachineInstr::modifiesRegister() for these purposes. Given the TRI, they should return the correct value for any instruction (including calls).

readsRegister checks whether the instruction reads the specified register or its super register, but it does not check about sub register access. I need to check sub register access for safety.

Hmm.. that certainly doesn't seem to be the case by just looking at the implementation. There's a check for Found = TRI->regsOverlap(MOReg, Reg); and I would assume this would return true for any overlap. But of course, this is based on just a cursory look at the code and I may be missing something.

gberry edited edge metadata.

I'm not sure adding a new pass that relies on kill flags is a good idea, since I believe they are slowly being phased out.
Also, if you stop using kill flags I believe you will need something like https://reviews.llvm.org/D39400 in order to avoid removing copies whose defs are needed for ABI/ISA reasons that aren't captured in the MI representation.

inouehrs updated this revision to Diff 122646.Nov 13 2017, 7:22 AM
  • rewrote without using kill flags as @gberry suggested.
  • slightly increased coverage to support opportunity addressed by D39785.
  • refactored to reduce duplicated code.

@gberry Thank you for the suggestions. I updated the implementation not to use kill flags.
Also, I updated not to optimize reg copies around ABI-dependent special registers such as stack pointer.
Additionally, the current version avoids optimization for specific ABI-dependent opcode (GETtlsldADDR), but I will use isRenamable flag after D39400 is merged.

MatzeB edited edge metadata.Nov 15 2017, 10:32 AM

Is this like the existing MachineCopyPropagation pass but working accross multiple blocks? Would D30751 solve your problem as well?

Is this like the existing MachineCopyPropagation pass but working accross multiple blocks? Would D30751 solve your problem as well?

The two optimization are mostly disjoint.
Copy source forwarding optimizes

mr 3, 4
addi 5, 3, 1

to

mr 3, 4
addi 5, 4, 1

but this optimization intends to rewrite

addi 4, 3, 1 
mr 3, 4

to

addi 3, 3, 1

or eliminate

mr 4, 3
mr 3, 4

Is this like the existing MachineCopyPropagation pass but working accross multiple blocks? Would D30751 solve your problem as well?

The two optimization are mostly disjoint.
Copy source forwarding optimizes

mr 3, 4
addi 5, 3, 1

to

mr 3, 4
addi 5, 4, 1

but this optimization intends to rewrite

addi 4, 3, 1 
mr 3, 4

to

addi 3, 3, 1

or eliminate

mr 4, 3
mr 3, 4

FWIW, the MachineCopyPropagation already handles eliminating the second COPY in the second case above, and could fairly easily be extended to handle the other cases once the change to mark MachineOperands as renamable lands. It is currently limited to single block though. It also only currently deals with instructions whose opcode is COPY (i.e. it doesn't handle other target specific copy operations). I don't see any reason why all of these things couldn't be added to MachineCopyPropagation eventually, but I don't want to hold up your change because it might be made redundant later, especially since I can;t commit to making these extensions to MachineCopyProp.

@gberry Thank you for the comment!
Many redundant register copies (e.g. about 2/3 of the intra-BB redundancies) are generated by optimizers executed after COPYs are expanded to real instructions. To catch them, this optimization is run just before code emission.
So, MachineCopyPropagation, which is executed just after register allocation, may not be able to catch these opportunities.
(For example, when I run this pass just after pseudo instruction expansion, optimizations happen 7780 times during the bootstrap while they happen 10255 times if the pass is executed at the very last of the compilation.)

inouehrs updated this revision to Diff 123880.Nov 22 2017, 12:02 AM

minor touch up in comments and formatting

jtony added inline comments.Nov 27 2017, 1:01 PM
lib/Target/PowerPC/PPCRegCopyElim.cpp
102

There are some other instructions which are equivalent to MR, do we want to handle them too in this patch? Some examples: ADDI Rx,Ry, 0 and ORI Rx,Ry,0

test/CodeGen/PowerPC/remove-cyclic-mr.ll
47 ↗(On Diff #123880)

Minor nit: I know this is auto generated test cases, but this line can be simplified away. Similarly line 1 and line 54.

inouehrs added inline comments.Nov 28 2017, 12:21 AM
lib/Target/PowerPC/PPCRegCopyElim.cpp
102

I picked these instructions since they are emitted for PPC::COPY. Do you see ADDI Rx,Ry, 0 or ORI Rx,Ry,0 frequently? If so, I will add that opcode here.

Is this something that RDF copy propagation would do for us as well? @kparzysz might know. FWIW, if this is something that RDF would help with, I'd rather see our efforts put there rather than recreating a less-powerful version of that infrastructure just for this specific case.

lib/Target/PowerPC/PPCRegCopyElim.cpp
268

X2 is not always special (in leaf functions, it might be allocatable). You can check using:

MF->getRegInfo().isAllocatable(PPC::X2) or !getReservedRegs(*MF).test(PPC::X2) or similar.

Now that https://reviews.llvm.org/rL323991 has actually landed, can you re-evaluate whether this patch is still applicable and if so, whether it needs any changes. Also, have you looked into Hal's suggestion?

inouehrs updated this revision to Diff 133166.Feb 7 2018, 12:53 AM
  • check the reserved register using getReservedRegs instead of own check.
  • rebased to the latest code

Now that https://reviews.llvm.org/rL323991 has actually landed, can you re-evaluate whether this patch is still applicable and if so, whether it needs any changes. Also, have you looked into Hal's suggestion?

As long as I tested, COPY source forwarding (rL323991) does not affect this optimization. The number of optimizations happen in this pass are mostly unchanged by enabling/disabling the COPY source forwarding (11730 with COPY source forwarding and 11242 without COPY source forwarding during the bootstrap test).

Also, I updated the patch to address Hal's suggestion.

inouehrs marked an inline comment as done.Feb 7 2018, 12:59 AM
inouehrs added inline comments.
lib/Target/PowerPC/PPCRegCopyElim.cpp
268

Thank you so much for the suggestion. I totally replaced my own check with getReservedRegs (not only for X2). It makes the code cleaner.

syzaara added inline comments.Apr 23 2018, 2:40 PM
lib/Target/PowerPC/PPCRegCopyElim.cpp
13

s/copys/copies

88

Is this an old comment you forgot to remove? I don't see anything related to IsKill in this function.

122

I think this flow would be more clear if written like:

if (MO.isReg() && TRI->isSuperOrSubRegisterEq(MO.getReg(), Reg)) {
    if (MO.isUse())
        return false;
    else if(MO.isDef() ||  (MO.isRegMask() && MO.clobbersPhysReg(Reg)))
        SrcKilled = true; // we may still have use of Reg in this MI
}
135

s/seccessor/successor

174

Since findRegisterDefOperandIdx already checks for SrcReg, do we need to check DefMO.getReg() == SrcReg here?

197

MO.isUse() would be more clear

211

It seems confusing that we check for registers being equal with isSuperOrSubRegisterEq and then again with if (MO.getReg() == SrcReg).
Can we rewrite to something like:

if (MO.getReg() == SrcReg)
   TmpOperands.push_back(&MO);
else if (TRI->isSuperOrSubRegisterEq(MO.getReg(), SrcReg)) {
   IsClobbered = true;
   return nullptr;
}
315

This is repeated below for the second case on line 443
However, the second case also sets Simplified = CurrentInstrErased which doesn't seem to be set here.
Can this be rewritten in a way that we only have to do this in one place for both cases?

340

Can we put these checks of pred_size/DstLive/control flow for which we just continue into a separate function and do something like:

if (!controlFlowSupported())
    continue;
syzaara added inline comments.Apr 23 2018, 3:16 PM
lib/Target/PowerPC/PPCRegCopyElim.cpp
286

I'm finding this difficult to follow since there is a lot happening in the loop. Can we extract out more functionality into functions? From what I understand, maybe we can do something like:

if (DefMI)
   Simplified |= eliminateIntraBBCopy
else if (controlFlowSupported())
   Simplified |= elimanteInterBBCopy
inouehrs updated this revision to Diff 144073.Apr 26 2018, 2:08 AM
inouehrs marked an inline comment as done.
  • addressed comments from @syzaara
  • rebased to the lates tcode and fix a failing test case
inouehrs marked 9 inline comments as done.Apr 26 2018, 3:48 AM
inouehrs added inline comments.
lib/Target/PowerPC/PPCRegCopyElim.cpp
122

They are calling different methods isSuperOrSubRegisterEq and isSubRegisterEq, but I gave a try to make the control flow clearer.

174

Yes, it intends to confirm that it is not sub/super register.

211

Fixed.
Originally, I intended to make the if-statement parallel as

  • isSuperOrSubRegisterEq for DstReg
  • isSuperOrSubRegisterEq for SrcReg
286

I hope it becomes easier to follow now.

syzaara added inline comments.Apr 26 2018, 12:55 PM
lib/Target/PowerPC/PPCRegCopyElim.cpp
81

I don't see any uses of TII, but there are uses of TRI. Maybe replace this with TRI?

431

Can we change this to

if (!isRegCopy(CopyMI, SrcMO, DstMO))
   continue;

so that we don't have to nest the rest of the code.

455

It seems a little messy to me for this function to have a parameter for OperandsToRewrite1. I wonder if it would be better to have another function like getOperandsToRewrite which we can call on line 482 after we know that we have erased an instruction.

inouehrs updated this revision to Diff 146550.May 14 2018, 1:16 AM
inouehrs marked 4 inline comments as done.
inouehrs marked 2 inline comments as done.May 14 2018, 1:20 AM
inouehrs added inline comments.
lib/Target/PowerPC/PPCRegCopyElim.cpp
81

Removed TII. I do not add TRI as a member since TRI must be passed among static functions anyway.

455

To implement getOperandsToRewrite helper, we need to do almost same (potentially costly) iteration again in the helper. So I have not do this yet.