This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Extend folding RLWINM + RLWINM to post-RA.
AbandonedPublic

Authored by Esme on Oct 21 2020, 12:51 AM.

Details

Summary

This patch depends on D89846.
We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.

Diff Detail

Event Timeline

Esme created this revision.Oct 21 2020, 12:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2020, 12:51 AM
Esme requested review of this revision.Oct 21 2020, 12:51 AM

It would be great if we could split this patch into two parts:
1: extend rlwinm + rlwinm folding to post-ra and add some post-ra testcases, MIR cases are better.
2: add new simplication for rlwinm + andi_rec and add some post-ra and pre-ra testcases.

Esme updated this revision to Diff 299857.Oct 21 2020, 9:42 PM
Esme retitled this revision from [PowerPC] Combine RLWINM and RLWINM/ANDI_rec before or after RA. to [PowerPC] Extend folding RLWINM + RLWINM to post-RA..
Esme edited the summary of this revision. (Show Details)
Esme updated this revision to Diff 299864.Oct 21 2020, 10:27 PM

Some nits.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3247–3263

Comments need to be updated.

3251

How about use the MachineInstr *&ToErase to avoid the null check and default arguments ?

3266

It is preferred to initialize the stack variable.

3267

Remove such kind of comments as the code is self explained.

3272

Common line 3209 and 3217 then, add a check in 3220

3284

A better code style to avoid the duplicate check of the SrcOpCode.

switch (SrcOpCode) {
case RLWINM8
case RLWINM8_rec:
    Is64Bit = true;
    break;
case ...
}
3434

null check?

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
40

Change the name as it is extended to all rotate combining.

481

No {} if there is only one statement.

Esme updated this revision to Diff 299884.Oct 22 2020, 12:15 AM
Esme added a comment.Oct 22 2020, 12:15 AM

Updated as Steven's comments.

Esme marked 8 inline comments as done.Oct 22 2020, 12:25 AM
Esme updated this revision to Diff 300181.Oct 23 2020, 12:30 AM
shchenz added inline comments.Oct 23 2020, 7:49 AM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3247–3263

The strange character before "We"

3252

Can we initial Is64Bit to true or false and then we only need to handle 32-bit or 64-bit opcodes in following switch?

3276

Any special reason that we don't do the folding after RA when there are multiple uses for SrcMI? Before RA, we do such kind of transformation as it is still benefit. We eliminate the dependancy between SrcMI and MI.

3283

What case will have rlwinm input is rlwinm8 or rlwinm8 input is rlwinm? Just curious.

3433

Same as above, after RA, even SrcMI has multiple uses, we still should do the transform?

Esme updated this revision to Diff 300566.Oct 25 2020, 6:48 PM
Esme added inline comments.Oct 25 2020, 7:04 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3283

There is a case in llvm/test/CodeGen/PowerPC/popcnt-zext.ll

declare i16 @llvm.ctpop.i16(i16) nounwind readnone
define i64 @popa_i16_i64(i16 %x) {
  %pop = call i16 @llvm.ctpop.i16(i16 %x)
  %z = zext i16 %pop to i64 ; SimplifyDemandedBits may turn zext (or sext) into aext
  %a = and i64 %z, 16
  ret i64 %a
}

llc -verify-machineinstrs -mtriple=powerpc64-- -mattr=+slow-popcntd < cnt16.ll -debug
We will see such transformation:
From

368B	  %22:gprc = RLWINM %21:gprc, 8, 24, 31
384B	  undef %23.sub_32:g8rc = COPY %22:gprc
400B	  %25:g8rc = RLWINM8 %23:g8rc, 0, 27, 27
416B	  $x3 = COPY %25:g8rc

To

undef %23.sub_32:g8rc = RLWINM %21:gprc, 8, 24, 31
%25:g8rc = RLWINM8 %23:g8rc, 0, 27, 27
$x3 = COPY %25:g8rc

After RA

renamable $r3 = RLWINM killed renamable $r3, 8, 24, 31, implicit-def $x3
renamable $x3 = RLWINM8 killed renamable $x3, 0, 27, 27

We will treat them as DefMI and MI after RA, but they shouldn't be folded. Please correct me if I'm wrong.

shchenz added inline comments.Oct 25 2020, 8:13 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3273

This is redundant or can be changed to an assert?

3283

Thanks, I see, this is just a post-ra issue. Before RA, there are always register copy instructions between RLWINM and RLWINM8.

After Post-RA,
1: for RLWINM8(RLWINM), it should be ok to fold it to single RLWINM8,
2: for RLWINM(RLWINM8), it should be ok to fold it to a single RLWINM.

Esme added inline comments.Oct 25 2020, 8:45 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3283

Thanks! I will mark this as a follow-up work after this patch submitted.

Esme updated this revision to Diff 300573.Oct 25 2020, 9:37 PM
steven.zhang accepted this revision.Oct 29 2020, 5:51 PM

LGTM overall with one minor comment.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3432

Can we set the NoUse for pre-ra also,so that just check the NoUse here?

This revision is now accepted and ready to land.Oct 29 2020, 5:51 PM
shchenz accepted this revision.Nov 1 2020, 2:48 AM

LGTM too. Please remember to rebase this patch after you commit NFC patch D89846

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3283

sounds good to me.

Esme added a comment.Nov 2 2020, 11:25 PM

Thanks for your reviews, and I apologize for not responding in time. I will commit this patch after all tests done.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3432

For the scenario of pre-ra, we have to check MRI->use_nodbg_empty(FoldingReg) here instead of setting NoUse before folding as post-ra did, since MRI->use_nodbg_empty(FoldingReg) is false before folding MI and SrcMI.

This revision was landed with ongoing or failed builds.Nov 2 2020, 11:44 PM
This revision was automatically updated to reflect the committed changes.
Esme updated this revision to Diff 304410.Nov 10 2020, 11:39 PM

In post-RA, if SrcMI also defines the register to be forwarded, we can only do the folding if SrcMI is going to be erased.
For example, SrcMI and MI can't not be folded in the scenario:

$r3 = RLWINM_rec $r3, 27, 5, 10, implicit-def $cr0
dead renamable $r3 = RLWINM_rec killed renamable $r3, 8, 5, 10, implicit-def $cr0
BLR8 implicit $lr8, implicit $rm, implicit killed $cr0

I've run the bootstrap-test as well as SEPC.

Esme reopened this revision.Nov 10 2020, 11:40 PM
This revision is now accepted and ready to land.Nov 10 2020, 11:40 PM
Esme requested review of this revision.Nov 12 2020, 9:41 PM

I saw some testcases are removed compared with your committed version. Is there any reason?

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3332

Can these lines be moved after line 3221? So we will bail out early.

3433

Can we move CanErase assignment for SSA to the place where we assign it for post SSA?

Esme marked an inline comment as not done.Nov 17 2020, 1:44 AM

I saw some testcases are removed compared with your committed version. Is there any reason?

Every specific folding patterns with different masks have been tested in llvm/test/CodeGen/PowerPC/fold-rlwinm.mir
This patch didn't change the patterns, so I supposed it would be clearer to remove some redundant test cases.
Do you think they should be added back?

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3332

It's safe to call SrcMI->getOperand(1).getReg() after filtering the opcode of SrcMI.

3433

The condition MRI->use_nodbg_empty(FoldingReg) has to be checked after the folding of MI and SrcMI.

I saw some testcases are removed compared with your committed version. Is there any reason?

Every specific folding patterns with different masks have been tested in llvm/test/CodeGen/PowerPC/fold-rlwinm.mir
This patch didn't change the patterns, so I supposed it would be clearer to remove some redundant test cases.
Do you think they should be added back?

Understanding your concerns here. I prefer to keep that testcases as well. If new added post ra peephole in pre-emit-peephole pass changes the case, the pre-ra cases will not be aware of? This is up to you.^^

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3332

Is it safe if we add SrcMI->getOperand(1).isReg()?. We can bail out early if the operand 1 of SrcMI is not a register.

3433

Is it ok if we change use_nodbg_empty to hasOneNonDBGUse()?

Esme updated this revision to Diff 306584.Nov 19 2020, 7:31 PM

Updated as Zheng's suggestions.

shchenz accepted this revision.Nov 19 2020, 11:12 PM

LGTM. Thanks for the new fix.

This revision is now accepted and ready to land.Nov 19 2020, 11:12 PM
This revision was landed with ongoing or failed builds.Nov 21 2020, 11:38 PM
This revision was automatically updated to reflect the committed changes.
lkail added a subscriber: lkail.Dec 16 2020, 6:44 PM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3273

I suggest checking Register::isPhysicalRegister explicitly, since Reg might also be a stackslot.

3304

Suggest using modifiesRegister or it'll miss case like $x3 = rlwinm $r3, ...

shchenz added inline comments.Dec 16 2020, 7:08 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3273

Could you please explain more here? this is after RA, if FoldingReg is a stackslot, we should get r1/x1? I don't understand how Register::isPhysicalRegister would change the semantic here? Thanks.

3304

That would not happen, for RLWINM, we get input and output are both r3, for RLWINM8 we get input and output are both x3.

lkail added inline comments.Dec 16 2020, 7:32 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3273

If it's the case, an assertion here would be better.

Esme updated this revision to Diff 313010.Dec 20 2020, 6:29 PM

Check if the register used can be forward.

Esme reopened this revision.Dec 20 2020, 7:27 PM
This revision is now accepted and ready to land.Dec 20 2020, 7:27 PM
Esme requested review of this revision.Dec 20 2020, 7:28 PM

I have added a number of comments to code that isn't part of this review which may seem odd. However, you are adding a fair amount of code to a function that is already very long/complex and rather than doing so, I'd prefer that you refactor the original function so the new code integrates more seamlessly.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3283

This is a bit of a bizarre set up that makes it hard to read. How about skipping these switch statements and just doing what you're after:

  1. Find the source MI
  2. Check if they're both 32-bit or both 64-bit instructions

For 2. you can do something like:

static bool sameWidthRLWINM(MachineInstr *SrcMI, MachineInstr *UseMI,
                            bool &Is64Bit) {
  unsigned SrcOpc = SrcMI->getOpcode();
  unsigned UseOpc = UseMI->getOpcode();
  if ((SrcOpc == PPC::RLWINM8 || SrcOpc == PPC::RLWINM8_rec) &&
      (UseOpc == PPC::RLWINM8 || UseOpc == PPC::RLWINM8_rec)) {
    Is64Bit = true;
    return true;
  }
  if ((SrcOpc == PPC::RLWINM || SrcOpc == PPC::RLWINM_rec) &&
      (UseOpc == PPC::RLWINM || UseOpc == PPC::RLWINM_rec))
    return true;
  Is64Bit = false;
  return false;
}
3347

Please describe this "special case" since it appears to be two special cases:
The first part means all bits in a 64-bit register and the second part means the low 32 bits in a 64-bit register.

3356

I am not sure what the intended meaning of lowerest is here, but please use something like least significant bit.

3366

Why can't this whole block be replaced by a call to replaceInstrWithLI()?

3396

I find it unlikely that clang-format produced this formatting.

3408

No need, but is it incorrect to update them? Namely, won't the NewMB/NewME be the same as MBMI/MEMI respectively? If so, I think it is preferable to avoid the condition and leave it to the reader to figure out.

3414

In general, I find that manually messing with kill flags when changing the instructions to be an error-prone thing that should be avoided if possible? Is it possible to re-run the pass that computes kill flags after the peephole?

For example I think that here, if MI.getOperand(1).isKill() before the change, we will not update the kill flag on the previous use of that register so it will be considered live after MI whereas before it wasn't. This is of course not semantically incorrect, but may hinder some optimizations.

llvm/test/CodeGen/PowerPC/vsx_builtins.ll
138

It is not really related to this patch, but what is this test case supposed to do? Why would we produce a 4-bit field in a CR, move it to a GPR and then right shift by 4 (clearly shifting out all the bits)?

Esme updated this revision to Diff 313494.Dec 23 2020, 12:06 AM

Addressed nemanjai's comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3408

It's incorrect to update MBMI/MEMI with NewMB/NewME if the SrcMI mask is full, since isRunOfOnes(FinalMask) may not hold true in such scenario. Well might it be more clear to replace no need with don't?

3414

Good catch! However I have no idea which pass can solve the problem. It seems that LiveInternals Pass will computes kill flags, but it can't handle the case you mentioned. Do you have any idea about this?

llvm/test/CodeGen/PowerPC/vsx_builtins.ll
138

Yes. you're right. lol... This test case was designed to show the folding patter, rlwinm + rlwinm -> li 0 in post-ra.

Esme updated this revision to Diff 496818.Feb 12 2023, 6:18 PM

Rebase.
LIT, LNT and Bootstrap are clean.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2023, 6:18 PM
Esme added a comment.Aug 20 2023, 10:18 PM

gentle ping