This patch depends on D89846.
We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
Details
- Reviewers
shchenz steven.zhang nemanjai qiucf - Group Reviewers
Restricted Project - Commits
- rG1c0941e1524f: [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
rG119ab2181e6e: [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
It would be great if we could split this patch into two parts:
1: extend rlwinm + rlwinm folding to post-ra and add some post-ra testcases, MIR cases are better.
2: add new simplication for rlwinm + andi_rec and add some post-ra and pre-ra testcases.
Some nits.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3893–3918 | Comments need to be updated. | |
3897 | How about use the MachineInstr *&ToErase to avoid the null check and default arguments ? | |
3921 | It is preferred to initialize the stack variable. | |
3922 | Remove such kind of comments as the code is self explained. | |
3927 | Common line 3209 and 3217 then, add a check in 3220 | |
3939 | A better code style to avoid the duplicate check of the SrcOpCode. switch (SrcOpCode) { case RLWINM8 case RLWINM8_rec: Is64Bit = true; break; case ... } | |
4080 | null check? | |
llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp | ||
41 | Change the name as it is extended to all rotate combining. | |
519 | No {} if there is only one statement. |
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3893–3918 | The strange character before "We" | |
3898 | Can we initial Is64Bit to true or false and then we only need to handle 32-bit or 64-bit opcodes in following switch? | |
3931 | Any special reason that we don't do the folding after RA when there are multiple uses for SrcMI? Before RA, we do such kind of transformation as it is still benefit. We eliminate the dependancy between SrcMI and MI. | |
3938 | What case will have rlwinm input is rlwinm8 or rlwinm8 input is rlwinm? Just curious. | |
4079 | Same as above, after RA, even SrcMI has multiple uses, we still should do the transform? |
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3938 | There is a case in llvm/test/CodeGen/PowerPC/popcnt-zext.ll declare i16 @llvm.ctpop.i16(i16) nounwind readnone define i64 @popa_i16_i64(i16 %x) { %pop = call i16 @llvm.ctpop.i16(i16 %x) %z = zext i16 %pop to i64 ; SimplifyDemandedBits may turn zext (or sext) into aext %a = and i64 %z, 16 ret i64 %a } llc -verify-machineinstrs -mtriple=powerpc64-- -mattr=+slow-popcntd < cnt16.ll -debug 368B %22:gprc = RLWINM %21:gprc, 8, 24, 31 384B undef %23.sub_32:g8rc = COPY %22:gprc 400B %25:g8rc = RLWINM8 %23:g8rc, 0, 27, 27 416B $x3 = COPY %25:g8rc To undef %23.sub_32:g8rc = RLWINM %21:gprc, 8, 24, 31 %25:g8rc = RLWINM8 %23:g8rc, 0, 27, 27 $x3 = COPY %25:g8rc After RA renamable $r3 = RLWINM killed renamable $r3, 8, 24, 31, implicit-def $x3 renamable $x3 = RLWINM8 killed renamable $x3, 0, 27, 27 We will treat them as DefMI and MI after RA, but they shouldn't be folded. Please correct me if I'm wrong. |
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3928 | This is redundant or can be changed to an assert? | |
3938 | Thanks, I see, this is just a post-ra issue. Before RA, there are always register copy instructions between RLWINM and RLWINM8. After Post-RA, |
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3938 | Thanks! I will mark this as a follow-up work after this patch submitted. |
LGTM overall with one minor comment.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
4078 | Can we set the NoUse for pre-ra also,so that just check the NoUse here? |
Thanks for your reviews, and I apologize for not responding in time. I will commit this patch after all tests done.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
4078 | For the scenario of pre-ra, we have to check MRI->use_nodbg_empty(FoldingReg) here instead of setting NoUse before folding as post-ra did, since MRI->use_nodbg_empty(FoldingReg) is false before folding MI and SrcMI. |
In post-RA, if SrcMI also defines the register to be forwarded, we can only do the folding if SrcMI is going to be erased.
For example, SrcMI and MI can't not be folded in the scenario:
$r3 = RLWINM_rec $r3, 27, 5, 10, implicit-def $cr0 dead renamable $r3 = RLWINM_rec killed renamable $r3, 8, 5, 10, implicit-def $cr0 BLR8 implicit $lr8, implicit $rm, implicit killed $cr0
I've run the bootstrap-test as well as SEPC.
Every specific folding patterns with different masks have been tested in llvm/test/CodeGen/PowerPC/fold-rlwinm.mir
This patch didn't change the patterns, so I supposed it would be clearer to remove some redundant test cases.
Do you think they should be added back?
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3983 | It's safe to call SrcMI->getOperand(1).getReg() after filtering the opcode of SrcMI. | |
4079 | The condition MRI->use_nodbg_empty(FoldingReg) has to be checked after the folding of MI and SrcMI. |
Understanding your concerns here. I prefer to keep that testcases as well. If new added post ra peephole in pre-emit-peephole pass changes the case, the pre-ra cases will not be aware of? This is up to you.^^
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3983 | Is it safe if we add SrcMI->getOperand(1).isReg()?. We can bail out early if the operand 1 of SrcMI is not a register. | |
4079 | Is it ok if we change use_nodbg_empty to hasOneNonDBGUse()? |
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3928 | Could you please explain more here? this is after RA, if FoldingReg is a stackslot, we should get r1/x1? I don't understand how Register::isPhysicalRegister would change the semantic here? Thanks. | |
3951 | That would not happen, for RLWINM, we get input and output are both r3, for RLWINM8 we get input and output are both x3. |
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3928 | If it's the case, an assertion here would be better. |
I have added a number of comments to code that isn't part of this review which may seem odd. However, you are adding a fair amount of code to a function that is already very long/complex and rather than doing so, I'd prefer that you refactor the original function so the new code integrates more seamlessly.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
3938 | This is a bit of a bizarre set up that makes it hard to read. How about skipping these switch statements and just doing what you're after:
For 2. you can do something like: static bool sameWidthRLWINM(MachineInstr *SrcMI, MachineInstr *UseMI, bool &Is64Bit) { unsigned SrcOpc = SrcMI->getOpcode(); unsigned UseOpc = UseMI->getOpcode(); if ((SrcOpc == PPC::RLWINM8 || SrcOpc == PPC::RLWINM8_rec) && (UseOpc == PPC::RLWINM8 || UseOpc == PPC::RLWINM8_rec)) { Is64Bit = true; return true; } if ((SrcOpc == PPC::RLWINM || SrcOpc == PPC::RLWINM_rec) && (UseOpc == PPC::RLWINM || UseOpc == PPC::RLWINM_rec)) return true; Is64Bit = false; return false; } | |
3999 | Please describe this "special case" since it appears to be two special cases: | |
4007–4008 | I am not sure what the intended meaning of lowerest is here, but please use something like least significant bit. | |
4018–4019 | Why can't this whole block be replaced by a call to replaceInstrWithLI()? | |
4043 | I find it unlikely that clang-format produced this formatting. | |
4053–4054 | No need, but is it incorrect to update them? Namely, won't the NewMB/NewME be the same as MBMI/MEMI respectively? If so, I think it is preferable to avoid the condition and leave it to the reader to figure out. | |
4059–4060 | In general, I find that manually messing with kill flags when changing the instructions to be an error-prone thing that should be avoided if possible? Is it possible to re-run the pass that computes kill flags after the peephole? For example I think that here, if MI.getOperand(1).isKill() before the change, we will not update the kill flag on the previous use of that register so it will be considered live after MI whereas before it wasn't. This is of course not semantically incorrect, but may hinder some optimizations. | |
llvm/test/CodeGen/PowerPC/vsx_builtins.ll | ||
167 | It is not really related to this patch, but what is this test case supposed to do? Why would we produce a 4-bit field in a CR, move it to a GPR and then right shift by 4 (clearly shifting out all the bits)? |
Addressed nemanjai's comments.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
4053–4054 | It's incorrect to update MBMI/MEMI with NewMB/NewME if the SrcMI mask is full, since isRunOfOnes(FinalMask) may not hold true in such scenario. Well might it be more clear to replace no need with don't? | |
4059–4060 | Good catch! However I have no idea which pass can solve the problem. It seems that LiveInternals Pass will computes kill flags, but it can't handle the case you mentioned. Do you have any idea about this? | |
llvm/test/CodeGen/PowerPC/vsx_builtins.ll | ||
167 | Yes. you're right. lol... This test case was designed to show the folding patter, rlwinm + rlwinm -> li 0 in post-ra. |
How about use the MachineInstr *&ToErase to avoid the null check and default arguments ?