This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Extend folding RLWINM + RLWINM to post-RA.
AbandonedPublic

Authored by Esme on Oct 21 2020, 12:51 AM.

Details

Summary

This patch depends on D89846.
We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.

Diff Detail

Event Timeline

Esme created this revision.Oct 21 2020, 12:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2020, 12:51 AM
Esme requested review of this revision.Oct 21 2020, 12:51 AM

It would be great if we could split this patch into two parts:
1: extend rlwinm + rlwinm folding to post-ra and add some post-ra testcases, MIR cases are better.
2: add new simplication for rlwinm + andi_rec and add some post-ra and pre-ra testcases.

Esme updated this revision to Diff 299857.Oct 21 2020, 9:42 PM
Esme retitled this revision from [PowerPC] Combine RLWINM and RLWINM/ANDI_rec before or after RA. to [PowerPC] Extend folding RLWINM + RLWINM to post-RA..
Esme edited the summary of this revision. (Show Details)
Esme updated this revision to Diff 299864.Oct 21 2020, 10:27 PM

Some nits.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3893–3918

Comments need to be updated.

3897

How about use the MachineInstr *&ToErase to avoid the null check and default arguments ?

3921

It is preferred to initialize the stack variable.

3922

Remove such kind of comments as the code is self explained.

3927

Common line 3209 and 3217 then, add a check in 3220

3939

A better code style to avoid the duplicate check of the SrcOpCode.

switch (SrcOpCode) {
case RLWINM8
case RLWINM8_rec:
    Is64Bit = true;
    break;
case ...
}
4080

null check?

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
41

Change the name as it is extended to all rotate combining.

519

No {} if there is only one statement.

Esme updated this revision to Diff 299884.Oct 22 2020, 12:15 AM
Esme added a comment.Oct 22 2020, 12:15 AM

Updated as Steven's comments.

Esme marked 8 inline comments as done.Oct 22 2020, 12:25 AM
Esme updated this revision to Diff 300181.Oct 23 2020, 12:30 AM
shchenz added inline comments.Oct 23 2020, 7:49 AM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3893–3918

The strange character before "We"

3898

Can we initial Is64Bit to true or false and then we only need to handle 32-bit or 64-bit opcodes in following switch?

3931

Any special reason that we don't do the folding after RA when there are multiple uses for SrcMI? Before RA, we do such kind of transformation as it is still benefit. We eliminate the dependancy between SrcMI and MI.

3938

What case will have rlwinm input is rlwinm8 or rlwinm8 input is rlwinm? Just curious.

4079

Same as above, after RA, even SrcMI has multiple uses, we still should do the transform?

Esme updated this revision to Diff 300566.Oct 25 2020, 6:48 PM
Esme added inline comments.Oct 25 2020, 7:04 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3938

There is a case in llvm/test/CodeGen/PowerPC/popcnt-zext.ll

declare i16 @llvm.ctpop.i16(i16) nounwind readnone
define i64 @popa_i16_i64(i16 %x) {
  %pop = call i16 @llvm.ctpop.i16(i16 %x)
  %z = zext i16 %pop to i64 ; SimplifyDemandedBits may turn zext (or sext) into aext
  %a = and i64 %z, 16
  ret i64 %a
}

llc -verify-machineinstrs -mtriple=powerpc64-- -mattr=+slow-popcntd < cnt16.ll -debug
We will see such transformation:
From

368B	  %22:gprc = RLWINM %21:gprc, 8, 24, 31
384B	  undef %23.sub_32:g8rc = COPY %22:gprc
400B	  %25:g8rc = RLWINM8 %23:g8rc, 0, 27, 27
416B	  $x3 = COPY %25:g8rc

To

undef %23.sub_32:g8rc = RLWINM %21:gprc, 8, 24, 31
%25:g8rc = RLWINM8 %23:g8rc, 0, 27, 27
$x3 = COPY %25:g8rc

After RA

renamable $r3 = RLWINM killed renamable $r3, 8, 24, 31, implicit-def $x3
renamable $x3 = RLWINM8 killed renamable $x3, 0, 27, 27

We will treat them as DefMI and MI after RA, but they shouldn't be folded. Please correct me if I'm wrong.

shchenz added inline comments.Oct 25 2020, 8:13 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3928

This is redundant or can be changed to an assert?

3938

Thanks, I see, this is just a post-ra issue. Before RA, there are always register copy instructions between RLWINM and RLWINM8.

After Post-RA,
1: for RLWINM8(RLWINM), it should be ok to fold it to single RLWINM8,
2: for RLWINM(RLWINM8), it should be ok to fold it to a single RLWINM.

Esme added inline comments.Oct 25 2020, 8:45 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3938

Thanks! I will mark this as a follow-up work after this patch submitted.

Esme updated this revision to Diff 300573.Oct 25 2020, 9:37 PM
steven.zhang accepted this revision.Oct 29 2020, 5:51 PM

LGTM overall with one minor comment.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
4078

Can we set the NoUse for pre-ra also,so that just check the NoUse here?

This revision is now accepted and ready to land.Oct 29 2020, 5:51 PM
shchenz accepted this revision.Nov 1 2020, 2:48 AM

LGTM too. Please remember to rebase this patch after you commit NFC patch D89846

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3938

sounds good to me.

Esme added a comment.Nov 2 2020, 11:25 PM

Thanks for your reviews, and I apologize for not responding in time. I will commit this patch after all tests done.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
4078

For the scenario of pre-ra, we have to check MRI->use_nodbg_empty(FoldingReg) here instead of setting NoUse before folding as post-ra did, since MRI->use_nodbg_empty(FoldingReg) is false before folding MI and SrcMI.

This revision was landed with ongoing or failed builds.Nov 2 2020, 11:44 PM
This revision was automatically updated to reflect the committed changes.
Esme updated this revision to Diff 304410.Nov 10 2020, 11:39 PM

In post-RA, if SrcMI also defines the register to be forwarded, we can only do the folding if SrcMI is going to be erased.
For example, SrcMI and MI can't not be folded in the scenario:

$r3 = RLWINM_rec $r3, 27, 5, 10, implicit-def $cr0
dead renamable $r3 = RLWINM_rec killed renamable $r3, 8, 5, 10, implicit-def $cr0
BLR8 implicit $lr8, implicit $rm, implicit killed $cr0

I've run the bootstrap-test as well as SEPC.

Esme reopened this revision.Nov 10 2020, 11:40 PM
This revision is now accepted and ready to land.Nov 10 2020, 11:40 PM
Esme requested review of this revision.Nov 12 2020, 9:41 PM

I saw some testcases are removed compared with your committed version. Is there any reason?

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3983

Can these lines be moved after line 3221? So we will bail out early.

4079

Can we move CanErase assignment for SSA to the place where we assign it for post SSA?

Esme marked an inline comment as not done.Nov 17 2020, 1:44 AM

I saw some testcases are removed compared with your committed version. Is there any reason?

Every specific folding patterns with different masks have been tested in llvm/test/CodeGen/PowerPC/fold-rlwinm.mir
This patch didn't change the patterns, so I supposed it would be clearer to remove some redundant test cases.
Do you think they should be added back?

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3983

It's safe to call SrcMI->getOperand(1).getReg() after filtering the opcode of SrcMI.

4079

The condition MRI->use_nodbg_empty(FoldingReg) has to be checked after the folding of MI and SrcMI.

I saw some testcases are removed compared with your committed version. Is there any reason?

Every specific folding patterns with different masks have been tested in llvm/test/CodeGen/PowerPC/fold-rlwinm.mir
This patch didn't change the patterns, so I supposed it would be clearer to remove some redundant test cases.
Do you think they should be added back?

Understanding your concerns here. I prefer to keep that testcases as well. If new added post ra peephole in pre-emit-peephole pass changes the case, the pre-ra cases will not be aware of? This is up to you.^^

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3983

Is it safe if we add SrcMI->getOperand(1).isReg()?. We can bail out early if the operand 1 of SrcMI is not a register.

4079

Is it ok if we change use_nodbg_empty to hasOneNonDBGUse()?

Esme updated this revision to Diff 306584.Nov 19 2020, 7:31 PM

Updated as Zheng's suggestions.

shchenz accepted this revision.Nov 19 2020, 11:12 PM

LGTM. Thanks for the new fix.

This revision is now accepted and ready to land.Nov 19 2020, 11:12 PM
This revision was landed with ongoing or failed builds.Nov 21 2020, 11:38 PM
This revision was automatically updated to reflect the committed changes.
lkail added a subscriber: lkail.Dec 16 2020, 6:44 PM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3928

I suggest checking Register::isPhysicalRegister explicitly, since Reg might also be a stackslot.

3951

Suggest using modifiesRegister or it'll miss case like $x3 = rlwinm $r3, ...

shchenz added inline comments.Dec 16 2020, 7:08 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3928

Could you please explain more here? this is after RA, if FoldingReg is a stackslot, we should get r1/x1? I don't understand how Register::isPhysicalRegister would change the semantic here? Thanks.

3951

That would not happen, for RLWINM, we get input and output are both r3, for RLWINM8 we get input and output are both x3.

lkail added inline comments.Dec 16 2020, 7:32 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3928

If it's the case, an assertion here would be better.

Esme updated this revision to Diff 313010.Dec 20 2020, 6:29 PM

Check if the register used can be forward.

Esme reopened this revision.Dec 20 2020, 7:27 PM
This revision is now accepted and ready to land.Dec 20 2020, 7:27 PM
Esme requested review of this revision.Dec 20 2020, 7:28 PM

I have added a number of comments to code that isn't part of this review which may seem odd. However, you are adding a fair amount of code to a function that is already very long/complex and rather than doing so, I'd prefer that you refactor the original function so the new code integrates more seamlessly.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3938

This is a bit of a bizarre set up that makes it hard to read. How about skipping these switch statements and just doing what you're after:

  1. Find the source MI
  2. Check if they're both 32-bit or both 64-bit instructions

For 2. you can do something like:

static bool sameWidthRLWINM(MachineInstr *SrcMI, MachineInstr *UseMI,
                            bool &Is64Bit) {
  unsigned SrcOpc = SrcMI->getOpcode();
  unsigned UseOpc = UseMI->getOpcode();
  if ((SrcOpc == PPC::RLWINM8 || SrcOpc == PPC::RLWINM8_rec) &&
      (UseOpc == PPC::RLWINM8 || UseOpc == PPC::RLWINM8_rec)) {
    Is64Bit = true;
    return true;
  }
  if ((SrcOpc == PPC::RLWINM || SrcOpc == PPC::RLWINM_rec) &&
      (UseOpc == PPC::RLWINM || UseOpc == PPC::RLWINM_rec))
    return true;
  Is64Bit = false;
  return false;
}
3999

Please describe this "special case" since it appears to be two special cases:
The first part means all bits in a 64-bit register and the second part means the low 32 bits in a 64-bit register.

4007–4008

I am not sure what the intended meaning of lowerest is here, but please use something like least significant bit.

4018–4019

Why can't this whole block be replaced by a call to replaceInstrWithLI()?

4043

I find it unlikely that clang-format produced this formatting.

4053–4054

No need, but is it incorrect to update them? Namely, won't the NewMB/NewME be the same as MBMI/MEMI respectively? If so, I think it is preferable to avoid the condition and leave it to the reader to figure out.

4059–4060

In general, I find that manually messing with kill flags when changing the instructions to be an error-prone thing that should be avoided if possible? Is it possible to re-run the pass that computes kill flags after the peephole?

For example I think that here, if MI.getOperand(1).isKill() before the change, we will not update the kill flag on the previous use of that register so it will be considered live after MI whereas before it wasn't. This is of course not semantically incorrect, but may hinder some optimizations.

llvm/test/CodeGen/PowerPC/vsx_builtins.ll
167

It is not really related to this patch, but what is this test case supposed to do? Why would we produce a 4-bit field in a CR, move it to a GPR and then right shift by 4 (clearly shifting out all the bits)?

Esme updated this revision to Diff 313494.Dec 23 2020, 12:06 AM

Addressed nemanjai's comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
4053–4054

It's incorrect to update MBMI/MEMI with NewMB/NewME if the SrcMI mask is full, since isRunOfOnes(FinalMask) may not hold true in such scenario. Well might it be more clear to replace no need with don't?

4059–4060

Good catch! However I have no idea which pass can solve the problem. It seems that LiveInternals Pass will computes kill flags, but it can't handle the case you mentioned. Do you have any idea about this?

llvm/test/CodeGen/PowerPC/vsx_builtins.ll
167

Yes. you're right. lol... This test case was designed to show the folding patter, rlwinm + rlwinm -> li 0 in post-ra.

Esme updated this revision to Diff 496818.Feb 12 2023, 6:18 PM

Rebase.
LIT, LNT and Bootstrap are clean.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2023, 6:18 PM
Esme added a comment.Aug 20 2023, 10:18 PM

gentle ping