This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix bugs in sign-/zero-extension elimination
ClosedPublic

Authored by stefanp on Nov 28 2017, 6:20 AM.

Details

Summary

This patch fixes the following two bugs in PPCInstrInfo::isSignOrZeroExtended helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass.

  • Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer).
  • Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension).

To handle the first case, the parameter for the helpers is changed from MachineInstr to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case.

Diff Detail

Event Timeline

inouehrs created this revision.Nov 28 2017, 6:20 AM
nemanjai requested changes to this revision.Nov 29 2017, 3:58 AM

To be honest, this code seems even more counter-intuitive to me now. I initially thought that this should essentially be a known-bits calculation on virtual registers and suggested that. With these changes, the code even more closely resembles that, but in a very roundabout fashion. Seems like it would be much simpler to implement and follow this if we just built and maintained a map from virtual registers to known zero/sign bits. Then all we need to check in order to decide if we want to get rid of an EXTS[BHW] is whether the correct number of leading bits are known to be sign bits. Similarly, we'd know if it is safe to remove a CLRLDI/RLWINM/... if the correct number of leading bits are known to be zero. Furthermore, with this approach we wouldn't need to consider the depth that we look through.

Of course, we can leave that as a future simplification.

The reason I've requested changes is to get rid of the debugging code and change the names of the functions.

lib/Target/PowerPC/PPCInstrInfo.cpp
2164 ↗(On Diff #124562)

Similarly to isZeroExtendingOp(), this name no longer makes sense.

2215 ↗(On Diff #124562)

This name no longer makes sense. This is really a query for whether or not the virtual register contains a value that is zero-extended.

lib/Target/PowerPC/PPCMIPeephole.cpp
196 ↗(On Diff #124562)

This whole section needs to be cleaned up. Lines too long. Code commented-out. Pre-processor conditions. Presumably leftovers from debugging during development.

Actually, on second look, this injects trap instructions for debugging and should probably just be removed.

This revision now requires changes to proceed.Nov 29 2017, 3:58 AM
inouehrs updated this revision to Diff 124721.Nov 29 2017, 4:36 AM
inouehrs edited edge metadata.
  • addressed comments from @nemanjai
  • rebased to the latest code base and fixed merge conflict
inouehrs marked 3 inline comments as done.Nov 29 2017, 4:41 AM
inouehrs added inline comments.
lib/Target/PowerPC/PPCInstrInfo.cpp
2164 ↗(On Diff #124562)

I changed the method name. Do you have idea on an alternative name?

lib/Target/PowerPC/PPCMIPeephole.cpp
196 ↗(On Diff #124562)

Thank you so much for pointing this out. I removed debug code section.

inouehrs updated this revision to Diff 125162.Dec 1 2017, 9:23 AM
inouehrs marked 2 inline comments as done.
  • fixed a bug
hfinkel added inline comments.Dec 14 2017, 3:50 PM
lib/Target/PowerPC/PPCInstrInfo.h
306 ↗(On Diff #125162)

Comment out of date?

313 ↗(On Diff #125162)

Comment out of date.

jtony added a comment.Dec 20 2017, 7:29 AM

As you mentioned some registers defined by ORIS/ANDIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension). Can you provide some details (maybe in a comment) about how they depend on the immediate? This would be helpful for future maintainers.

lib/Target/PowerPC/PPCInstrInfo.cpp
2164 ↗(On Diff #125162)

We need to update this function name.

2201 ↗(On Diff #125162)

Can you put a comment here to explain why only when the the immediate is great or equal than 33 the value is sign-extended?

inouehrs updated this revision to Diff 130561.Jan 18 2018, 11:19 PM
  • rebased on the latest code
inouehrs updated this revision to Diff 132808.Feb 5 2018, 4:21 AM

update MIR test case to comply with the recent change in the sigil for MIR physical register

nemanjai added inline comments.Feb 5 2018, 9:57 AM
lib/Target/PowerPC/PPCInstrInfo.cpp
3048 ↗(On Diff #132808)

I think the number of opcodes here is large enough to warrant storing them in a data structure with efficient searching such as a dense set. In addition to readability at the point where you're querying this, it can be very clear from the name what the data structure contains.

For example, if you have a container called OpcodesZExt32To64, it is very clear what it is for and any future code that needs such functionality can easily access that container rather than re-writing this whole list.

Of course, the same applies to the similar list for zero extending ops.

3249 ↗(On Diff #132808)

Seems a bit misleading to have a depth parameter that is not actually respected if we're looking through a chain of copies. Although of course, deep chains of copies shouldn't happen in practice.

3254 ↗(On Diff #132808)

Why does the register input matter for and-immediate?

3261 ↗(On Diff #132808)

This recursive call as well as the next also don't check the depth, right? I realize that these immediate-form binary logical ops won't be chained, but it's still good to add a comment explaining we don't need to limit the depth on such ops.

3317 ↗(On Diff #132808)

Comment with the assert.

lib/Target/PowerPC/PPCInstrInfo.h
343 ↗(On Diff #132808)

I still think this setup with separate queries for sign/zero/any extend seems unnecessarily complex. Seems like it would be clearer if we had an enum for the extension type (sign, zero, none) and we just have a single query that will return this. Therefore no need for Boolean parameters, no need for multiple functions, etc.

inouehrs updated this revision to Diff 132996.Feb 6 2018, 7:24 AM

addressed comments

inouehrs updated this revision to Diff 132997.Feb 6 2018, 7:27 AM
inouehrs edited the summary of this revision. (Show Details)
inouehrs marked 8 inline comments as done.Feb 6 2018, 7:34 AM
inouehrs added inline comments.
lib/Target/PowerPC/PPCInstrInfo.cpp
3249 ↗(On Diff #132808)

I changed the names of variable and constant to indicate they shows the depth of binary operators.

3254 ↗(On Diff #132808)

Good catch. I fixed.

3317 ↗(On Diff #132808)

I added comment, but I feel this assert is not necessary.

lib/Target/PowerPC/PPCInstrInfo.h
343 ↗(On Diff #132808)

I agree that the single query approach is cleaner. But I am afraid that it will increase the cost of compilation slightly since it always checks for both sign- and zero-extensions. Do you think the potential increase in compilation cost does not matter too much here?

nemanjai added inline comments.Feb 7 2018, 12:39 PM
lib/Target/PowerPC/PPCInstrInfo.h
343 ↗(On Diff #132808)

It seems to me that the checks are not costly except in chained operations cases (which we traverse the chain as it is). Or am I missing something. What is the worst case complexity of the computations for each?

inouehrs updated this revision to Diff 133414.Feb 8 2018, 6:46 AM
inouehrs marked 3 inline comments as done.
  • changed the return value of isSignOrZeroExtended to std::pair<bool, bool>
inouehrs marked an inline comment as done.Feb 8 2018, 6:48 AM
inouehrs added inline comments.
lib/Target/PowerPC/PPCInstrInfo.h
343 ↗(On Diff #132808)

Thanks. I modified the helper to return std::pair<bool, bool>.
Do you prefer an enum?

inouehrs updated this revision to Diff 142429.Apr 13 2018, 9:56 AM
inouehrs marked an inline comment as done.
  • rebased to the latest tree and ran tests again
  • fix formatting
inouehrs updated this revision to Diff 164397.Sep 7 2018, 6:09 AM
inouehrs removed a reviewer: jtony.
  • rebased to the latest source (fix conflicts and update unit tests)
amyk commandeered this revision.May 13 2019, 9:41 AM
amyk added a reviewer: inouehrs.
amyk added a subscriber: amyk.

Hi @inouehrs, I will be commandeering this revision since we are revisiting this patch.

@amyk I see. Please let me know you need some help.

amyk updated this revision to Diff 201651.May 28 2019, 6:33 AM
amyk added a subscriber: nemanjai.
  • Rebased to the latest source (fix conflicts, ran functional tests)
  • Fixed llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll to remove redundant sign extend instruction
  • Added the following check in PPCMIPeephole.cpp to only perform the LWZ->LWA transformation when the displacement is a multiple of 4
// The transformation from a zero-extending load to a sign-extending
// load is only legal when the displacement is a multiple of 4.
// If the displacement is not at least 4 byte aligned, don't perform
// the transformation.
if (SrcMI->getOperand(1).isGlobal() &&
    SrcMI->getOperand(1).getGlobal()->getAlignment() < 4)
  break;
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2019, 6:33 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
nemanjai accepted this revision.Jun 3 2019, 4:10 AM

LGTM.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5316

The bitwise or seems like an odd choice here since IsSExt/IsZExt shouldn't possibly be true here and also I think logical or seems more natural for bool variables.
The change to logical or doesn't require another revision.

5328

These can probably be logical or's as well. Same with the below ones.

5369

Please remove this todo.

This revision is now accepted and ready to land.Jun 3 2019, 4:10 AM
stefanp commandeered this revision.Dec 2 2021, 6:56 AM
stefanp edited reviewers, added: amyk; removed: stefanp.

Taking over this patch to rebase it on top of current code and to see if it can be eventually checked in.

stefanp updated this revision to Diff 391319.Dec 2 2021, 7:02 AM

Rebased the patch to top of trunk.
A number of test cases had to be re-generated. This rebase shouldn't really
change what the patch does.

stefanp updated this revision to Diff 391321.Dec 2 2021, 7:07 AM

Fixed some formatting issues.

stefanp updated this revision to Diff 393264.Dec 9 2021, 1:06 PM

Found that POPCNTW instruction does not ensure that the upper 32 bits
are cleared. As a result I have removed it from the list.

stefanp updated this revision to Diff 394359.Dec 14 2021, 12:49 PM

Fixed a typo in the handling of the PPC::AND.

stefanp updated this revision to Diff 397308.Jan 4 2022, 8:14 AM

Added more early breaks to make sure that we do not convert from a zero extending
load to a sign extending load with an invalid displacement.

amyk added inline comments.Jan 7 2022, 7:11 AM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5091–5092

Question: should we update this to say if the register and not machine instruction since it is changed to Reg for the parameter?
Or should it stay as machine instruction since you're checking for the different MachineInstr opcodes in this function?
The same question I have applies to definedByZeroExtendingOp().

5130
5316

If the change to logical or still applies, we should probably update these accordingly.

stefanp updated this revision to Diff 400676.Jan 17 2022, 4:58 PM

Updated a couple of comments and changed the OR.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5091–5092

I think it should probably say "machine instruction that defines this register" because that is what we are really interested in. We want to make sure that this register is already sign extended by the instruction that defines it. I'm going to re-write the comment to make it clear.

5316

Sure. I can change it to logical.

stefanp updated this revision to Diff 409971.Feb 18 2022, 10:20 AM

Rebased this patch to the top of main and retested.

stefanp planned changes to this revision.Apr 11 2022, 1:59 PM

Need to rebase this patch again. Will request another review once I have done that.

Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2022, 1:59 PM
stefanp updated this revision to Diff 422500.Apr 13 2022, 6:30 AM

Rebased patch to top of trunk.

This revision is now accepted and ready to land.Apr 13 2022, 6:30 AM
stefanp requested review of this revision.Apr 13 2022, 6:34 AM

Fairly significant changes have been made since I took over this patch and I would prefer to have another set of eyes take a look.
I will request a new review.

Sorry, I had some comments that I haven't submitted because I had not gotten through the entire patch. I am not sure if they all still apply but thought I'd submit them before restarting the review.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5092–5121

I think all explanations of what sign-extending means should be removed and you should just note that this is a test explicitly for sign extension from 32 to 64 bits.

5106

Rather than having tables of these opcodes in the C++ code, we should probably just add a couple of flags in our .td files for the two types of extension. Perhaps isSExt32To64, isZExt32To64. Then we can simply query that on the MI.

Of course, the ones below this set probably have to stay because they have further constraints.

5120

Is there no DenseSet<>::count()?

5289

This block ends up getting very deeply nested and hard to follow. Can we flip the conditions and do early exits? Something like:

if (SrcReg != PPC::X3)
  return ...
...
if previous instr is not the stack adjust or the one before is not a call...
  return ...

and so on.

5314–5315

This code might need to move up if we are flipping the previous condition.

llvm/lib/Target/PowerPC/PPCInstrInfo.h
600–602
// Return true if the register is sign-extended from 32 to 64 bits.
607–609
// Return true if the register is zero-extended from 32 to 64 bits.
stefanp updated this revision to Diff 422958.Apr 14 2022, 1:57 PM

Added two new flags ZExt32To64 and SExt32To64.
Marked instructions that zero extend or sign extend in the TD files with those
flags.
Fixed a few comments as well.
Tried to extract an if statement to reduce the indentation a little.

stefanp updated this revision to Diff 423390.Apr 18 2022, 7:22 AM

Rebased the patch to top of trunk again.

stefanp updated this revision to Diff 430795.May 19 2022, 1:41 PM

Rebased to top of main branch.

amyk added inline comments.May 30 2022, 8:48 PM
llvm/lib/Target/PowerPC/PPCInstrFormats.td
48 ↗(On Diff #423390)
53 ↗(On Diff #423390)
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5155–5167
5155–5167

A minor nit on this comment.

5196–5219

Question: This is something super minor, but is it better to put the check for isZExt32To64() earlier in the function?

I know it may not matter, but I am just wondering in case it may be better to do this prior to all of the different opcode checks, and since the definedBySignExtendingOp also put it's isSExt32To64() pretty early in the function.

In any case, this is just a question so if it's not applicable, you can feel free to disregard this.

stefanp updated this revision to Diff 433142.May 31 2022, 11:10 AM

Rebased the patch to top of main branch.
Updated a couple of typos.
Moved a check earlier in the function.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5196–5219

Question: This is something super minor, but is it better to put the check for isZExt32To64() earlier in the function?

I know it may not matter, but I am just wondering in case it may be better to do this prior to all of the different opcode checks, and since the definedBySignExtendingOp also put it's isSExt32To64() pretty early in the function.

In any case, this is just a question so if it's not applicable, you can feel free to disregard this.

I don't think it matters in this case. For this check we need to get TII so maybe it's better to check other things first that don't require the instruction info? That may save us from having to get the additional information in a limited number of cases. On the other hand we are just checking a single flag for a large number of instructions.
Anyway, I really don't feel strongly about it one way or another. I'm going to move it up just because it is nicer style to have the two functions a little more similar.

amyk added a comment.Jun 14 2022, 7:59 PM

Thanks for addressing my previous comments, Stefan.
I don't believe I have any other further comments. Unless if Nemanja has any additional comments, this personally LGTM.

As tends to be the case with huge patches such as this, it is difficult to get this patch to converge and it still needs a bit of work. But I think it is pretty close overall.

llvm/lib/Target/PowerPC/PPCInstr64Bit.td
897 ↗(On Diff #423390)

I don't follow why this is only marked as zero-extended (whereas for example, andi. is marked as both). The 32-bit sign bit can only be a zero, so a sign-extend from 32-bits (extsw) is guaranteed to essentially be a nop.

962 ↗(On Diff #423390)

Similar comment as for the word versions regarding sign extension..

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5135

s/masked/masked out

5196–5220

Just for consistency, this check is first in the sign extending check... You should probably move it up in this function so it is consistent with that.

5263

I still don't like the deep nesting in this block. It seems like we can get rid of all that deep nesting by rewriting this block as:

case PPC::COPY: {
  Register SrcReg = MI->getOperand(1).getReg();

  // In both ELFv1 and v2 ABI, method parameters and the return value
  // are sign- or zero-extended.
  const MachineFunction *MF = MI->getMF();

  if (!MF->getSubtarget<PPCSubtarget>().isSVR4ABI()) {
    // If this is a copy from another register, we recursively check source.
    auto SrcExt = isSignOrZeroExtended(SrcReg, BinOpDepth, MRI);
    return std::pair<bool, bool>(SrcExt.first || IsSExt,
                                 SrcExt.second || IsZExt);
  }

  const PPCFunctionInfo *FuncInfo = MF->getInfo<PPCFunctionInfo>();
  // We check the ZExt/SExt flags for a method parameter.
  if (MI->getParent()->getBasicBlock() ==
      &MF->getFunction().getEntryBlock()) {
    Register VReg = MI->getOperand(0).getReg();
    if (MF->getRegInfo().isLiveIn(VReg)) {
      IsSExt |= FuncInfo->isLiveInSExt(VReg);
      IsZExt |= FuncInfo->isLiveInZExt(VReg);
      return std::pair<bool, bool>(IsSExt, IsZExt);
    }
  }

  // For a method return value, we check the ZExt/SExt flags in attribute.
  // We assume the following code sequence for method call.
  //   ADJCALLSTACKDOWN 32, implicit dead %r1, implicit %r1
  //   BL8_NOP @func,...
  //   ADJCALLSTACKUP 32, 0, implicit dead %r1, implicit %r1
  //   %5 = COPY %x3; G8RC:%5
  if (SrcReg != PPC::X3) {
    // If this is a copy from another register, we recursively check source.
    auto SrcExt = isSignOrZeroExtended(SrcReg, BinOpDepth, MRI);
    return std::pair<bool, bool>(SrcExt.first || IsSExt,
                                 SrcExt.second || IsZExt);
  }

  const MachineBasicBlock *MBB = MI->getParent();
  MachineBasicBlock::const_instr_iterator II =
      MachineBasicBlock::const_instr_iterator(MI);
  if (II == MBB->instr_begin() || (--II)->getOpcode() != PPC::ADJCALLSTACKUP)
    return std::pair<bool, bool>(IsSExt, IsZExt);

  const MachineInstr &CallMI = *(--II);
  if (!CallMI.isCall() || !CallMI.getOperand(0).isGlobal())
    return std::pair<bool, bool>(IsSExt, IsZExt);

  const Function *CalleeFn =
      dyn_cast<Function>(CallMI.getOperand(0).getGlobal());
  if (!CalleeFn)
    return std::pair<bool, bool>(IsSExt, IsZExt);

  const IntegerType *IntTy = dyn_cast<IntegerType>(CalleeFn->getReturnType());
  if (IntTy && IntTy->getBitWidth() <= 32) {
    const AttributeSet &Attrs = CalleeFn->getAttributes().getRetAttrs();
    IsSExt |= Attrs.hasAttribute(Attribute::SExt);
    IsZExt |= Attrs.hasAttribute(Attribute::ZExt);
  }
  return std::pair<bool, bool>(IsSExt, IsZExt);
}

I have neither compiled nor tested this, so please scrutinize it carefully rather than assuming that my suggestion is verified.

5315–5318

Can any of this ever produce anything useful? If I am reading this correctly, this is basically:

  • The copy is from X3
  • We check if the definition of X3 is sign/zero extended

But this seems pointless since we won't track anything about what defined X3 here.

Edit: actually, the above is so deeply nested that I missed the fact that this code is for non-SVR4 ABI's.

5328

Is there a reason this comment was not addressed and these are still bitwise OR's?

5360–5361

I understand that these names predate this patch, but since we are modifying it, can we give them more descriptive names? Perhaps OperandEnd, OperandStride?

llvm/lib/Target/PowerPC/PPCInstrP10.td
1322 ↗(On Diff #433142)

Did you mean to mark all of these as ZExt32To64? The possible results are 0/1 for these. So I suppose they are technically both sign and zero extending.
Furthermore, the negative forms (setnbc/setnbcr) are not marked as sign extending even though the values are possible values are -1/0.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
1303 ↗(On Diff #433142)

It would seem that the vextu[bhw][lr]x family of instructions have been missed and they're all zero extending.

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
786–799

This code is so unnecessarily lambda-ey. Can you please commit an NFC patch (I don't care if it is before or after this patch) to flatten this out?

803–806

Why? LHA is D-Form (not DS-Form) and LHAX of course doesn't care since it is X-Form.

861–864

This is true if we are transforming it to the DS-Form lwa. It shouldn't matter for the X-Form lwax. Perhaps this code should just set a bool variable IsWordAligned or something similar and then below, we break out:

if (IsWordAligned && (Opc == PPC::LWA || Opc == PPC::LWA_32))
stefanp updated this revision to Diff 442887.Jul 7 2022, 6:32 AM

Fixed the zero extend / sign extend flags on a number of instructions.
Tried to flatten out PPCInstrInfo::isSignOrZeroExtended a little.
Removed the restriction on LHA, LHAX, and LWAX.

I will address the issues with too many lambdas in the code in a separate patch.

stefanp updated this revision to Diff 443028.Jul 7 2022, 12:19 PM

Removed the zero extend and sign extend from popcntw and popcntb and I added
them incorrectly.

nemanjai accepted this revision.Aug 4 2022, 6:35 AM

The remaining comments are minor nits that can be addressed on the commit. Please don't forget the follow-up I requested at https://reviews.llvm.org/D40554#inline-1239933.
Other than those, LGTM.

Thanks for your patience with this patch.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
5174–5178

Since we repeat this code (the subword part), maybe it would be cleaner to extract it into a static local function - something like:

static bool isOpZeroOfSubwordPreincLoad(const MachineInstr &MI)

Of course, since you still need the LWZU variants, it may not be worth it.
This is a minor stylistic comment that shouldn't delay approval.

5312

I am a little unclear on the use of BinOpDepth in this function. It seems very inconsistent. Wouldn't it make sense to always add one for the recursive call as well as to check whether it has been exceeded prior to the call? We seem to do that when checking for reg+reg instructions but not for reg+imm instructions. Is that because a single recursive call avoids exponential recursion whereas the two calls for reg+reg instructions don't? If so, can you please add a comment to that end at the top of the function?

5318

Super minor nit: since we construct the same pair in 3 different early exits as well as at the end, maybe just construct it once and return it in each location (lines 5320, 5324, 5329, 5340).
There is no functional or performance difference, but it is very clear that what we are returning is the exact same thing and couldn't have changed.

5320–5325

Since the comment above this block describes the code below, maybe just move this block above the comment.

5327

I think it may be possible for CallMI.getOperand(0).getGlobal() to return null under some odd circumstances. It would probably be safer to use dyn_cast_if_present here rather than dyn_cast.

5339

This comment is out of date now.

This revision is now accepted and ready to land.Aug 4 2022, 6:35 AM
stefanp updated this revision to Diff 453676.Aug 18 2022, 8:43 AM

Rebased the patch to top of trunk.
Addressed final comments from reviewer.
This is mostly adding or fixing comments and code cleanup.

stefanp updated this revision to Diff 453755.Aug 18 2022, 12:34 PM

Did a final format fix.

This revision was automatically updated to reflect the committed changes.