Page MenuHomePhabricator

[PowerPC] Add linker opt for PC Relative GOT indirect accesses
ClosedPublic

Authored by stefanp on May 13 2020, 7:52 AM.

Details

Summary

A linker optimization is available on PowerPC for GOT indirect PCRelative loads.

The idea is that we can mark a usual GOT indirect load:

pld 3, vec@got@pcrel(0), 1
lwa 3, 4(3)

With a relocation to say that if we don't need to go through the GOT we can let the linker further optimize this and replace a load with a nop.

  pld 3, vec@got@pcrel(0), 1
.Lpcrel1:
.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
  lwa 3, 4(3)

Which will eventually turn it into this by the linker (if we don't actually need to go through the GOT):

plwa 3, vec@got@pcrel+4(0), 1
nop

To create those relocations on the compiler side this patch adds an MCSymbol operand on the pld and the subsequent load or store that depends on it.
This symbol is then read in the MC layer and is used to match the two instructions together.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
NeHuang added inline comments.
llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
108

We can merge it with the check above (97-103) since it also checks for isUsingPCRelativeCalls()

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
415 ↗(On Diff #263713)

nit: static uint64_t UseCounter = 0;

llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll
7

Do we need to add a test description here?

stefanp updated this revision to Diff 265998.May 25 2020, 3:55 AM

Address review comments.

Which will eventually turn it into this by the linker (if we don't actually need to go through the GOT):

plwa 3, vec@got@pcrel+4(0), 1
nop

Which instruction gets replaced by a nop, the pld or the lwa?

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
401 ↗(On Diff #265998)

I think the hasPCRelativeForm check subsumes a bunch of the other checks: for example, if an instruction has one of the listed opcodes, it can't be a call.

amyk added a subscriber: amyk.May 27 2020, 2:55 PM
amyk added inline comments.
llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
95

Maybe add a short comment here about the relocation printing, since other cases are also documented.

llvm/lib/Target/PowerPC/PPCInstrPrefix.td
504 ↗(On Diff #265998)

nit: space after the :.

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
405 ↗(On Diff #265998)

Would it be good to check if it is indeed a store instruction? Something like Use.mayStore(), though I realize it might be a bit redundant if we reached this part of the code.

Which will eventually turn it into this by the linker (if we don't actually need to go through the GOT):

plwa 3, vec@got@pcrel+4(0), 1
nop

Which instruction gets replaced by a nop, the pld or the lwa?

The pld is replaced by the plwa and the lwa is replaced by the nop.

Then your code is missing some checks: you're assuming the operand and the result of the load will be allocated in the same register, without actually verifying that anywhere.

Just looked at what I wrote, and the issue I'm looking for isn't clear.

The code doesn't require that requiring that the pld and the lwa

And now I somehow submitted before I was done writing my comment. Trying again.


Just looked at what I wrote, and the issue I'm seeing isn't clear.

The code doesn't require that the pld and the lwa are adjacent. Therefore, it's possible that some instruction between the pld and the lwa writes to the destination register of the lwa. If there is such an instruction, the linker optimization would cause a miscompile: the loaded value would be clobbered.

stefanp marked 2 inline comments as done.May 28 2020, 6:49 PM

Just looked at what I wrote, and the issue I'm seeing isn't clear.

The code doesn't require that the pld and the lwa are adjacent. Therefore, it's possible that some instruction between the pld and the lwa writes to the destination register of the lwa. If there is such an instruction, the linker optimization would cause a miscompile: the loaded value would be clobbered.

You are correct I have not checked for that. I will have to think about how to resolve the issue because I add the linker opt flag to the instruction before register allocation so I can't just check that the registers are the same at that time.
In the meantime, I will fixup the feedback from the other comments.

I could re-verify the instruction after register allocation in another peephole perhaps...

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
401 ↗(On Diff #265998)

You are correct, the previous checks are redundant. I'll remove all four of the previous checks.

405 ↗(On Diff #265998)

That's a good question. I don't think we need to check if this is a store because:

  1. The check hasPCRelativeForm limits the valid instructions for this.

AND

  1. If this is a load from the valid instruction pool then operand zero is not a use (And we know that Use is a use.)
stefanp updated this revision to Diff 267109.May 28 2020, 6:52 PM

Addressed some of the review comments.
Added code that I had meant to have in this patch but forgot.

Still need to address the concern where the output for the second load is clobbered between the two loads.

stefanp updated this revision to Diff 267685.Jun 1 2020, 11:56 AM

Moved the linker opt detection into the pre-emit peephole.
This should solve the problem where instructions between the PLD and the second load/store make the optimization unsafe.

Can the subject line of the revision characterize the addition as something more specific than a "new linker optimization"?

anil9 added a subscriber: anil9.Jun 8 2020, 11:39 PM
anil9 added inline comments.
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
148

nit : Obtian -> Obtaini

166

Can this -8 be handled in some other way, like defined above as something. Looks a bit like a magic number.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
49

Indent ?

222

nit:

We can get one operand, check condition and exit, instead of getting both of them and then checkiing one by one.

237

This did not fit in one line ?

256

" if this is valid to mark it as a "

llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll
7

enabled, it is

stefanp retitled this revision from [PowerPC] Add new linker optimization for PowerPC to [PowerPC] Add linker opt for PC Relative GOT indirect accesses.Jun 9 2020, 9:02 AM
amyk added inline comments.Jun 10 2020, 2:04 PM
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
137

Could you rephrase this sentence? Not sure if it's just me but it seems a little confusing to read.

149

s/instrctions/instructions

156

s/instrctions/instructions

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
251

Could we do something like

if (!UseFound.isValid())
  continue;

maybe?

stefanp set the repository for this revision to rG LLVM Github Monorepo.Jun 11 2020, 2:32 PM
stefanp marked 3 inline comments as done.Jun 12 2020, 4:53 AM
stefanp added inline comments.
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
148

I'm not sure what you want here... Did I have typo it before and have Obaini there?

166

The 8 is the size of a prefixed instruction in bytes. I don't want to define it somewhere because it is used only in a couple of places but I will add a comment to explain it. This whole section deserves a better comment.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
237

no... off by one character. :(

stefanp updated this revision to Diff 270361.Jun 12 2020, 4:54 AM

Address Review Comments.

stefanp marked an inline comment as done.Jun 12 2020, 4:57 AM
stefanp added inline comments.
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
148

Of course I typoed the comment. :(
Anyway, I think Obtain is fine.

@efriedma
Would you mind looking at the new implementation of this patch to see if you can spot any issues.
Any feed back is very much appreciated!

efriedma added inline comments.Jun 17 2020, 1:37 PM
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
129

The way this is structured is a little confusing. If the instruction is a pld, it returns the label; if it's some other instruction, it emits a relocation itself. It would probably be more clear to split the "emit a relocation" part into a separate function.

154

Isn't getContext().getOrCreateSymbol(SymExpr->getSymbol().getName()); equivalent to SymExpr->getSymbol()?

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
264

To be on the safe side, I think I'd add an implicit def of the register to the pld, and an implicit use to the load, so it's clear to any later passes that the register is actually live between the pld and the load

271

Storing a counter in a static variable isn't threadsafe or deterministic. Please store the counter in PPCFunctionInfo or something like that.

Or actually, in this context, you could probably use createTempSymbol, which manages the counter for you.

stefanp added inline comments.Jun 19 2020, 10:05 AM
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
154

Unfortunately it is not the same thing.
In the case of SymExpr->getSymbol() we get a const MCSymbol& while in the case of getContext().getOrCreateSymbol(SymExpr->getSymbol().getName()); we get an MCSymbol*. The problem is that I call emitLabel and that requires a MCSymbol*. The const doesn't let me call that function and I don't want to const cast or something like that.

There may, however, still be a better way to do this. I just don't know what that is.

stefanp updated this revision to Diff 272119.Jun 19 2020, 10:07 AM

Addressed review comments.

stefanp marked 7 inline comments as done.Jun 19 2020, 10:08 AM
nemanjai added inline comments.Jun 23 2020, 10:51 AM
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
94

I think it would be clear if at the top level here we have something like this:

// If the instruction is a part of the GOT to PC-Rel link time optimization
// instruction pair, return a value, otherwise return None. A true returned
// value means the instruction is the PLDpc and a false value means it is
// the user instruction.
Optional<bool> IsPartOfGOTToPCRelPair = isPartOfGOTToPCRelPair(Inst, STI);

// User of the GOT-indirect address.
if (IsPartOfGOTToPCRelPair.hasValue() && !IsPartOfGOTToPCRelPair.getValue())
  emitGOTToPCRelReloc(Inst);

// existing code to emit the instruction

// Producer of the GOT-indirect address.
if (IsPartOfGOTToPCRelPair.hasValue() && IsPartOfGOTToPCRelPair.getValue())
  emitGOTToPCRelLabel(Inst);
128

This needs to have both a more descriptive name and a comment describing what it does. A few requests here:

  1. Functions whose names indicate they are queries (i.e. check... etc.) should probably not modify code. Either check something or modify something.
  2. Avoid the very generic phrase "linker opt{imization}" throughout this code. You should pick a name for this optimization and use it throughout. Perhaps a name such as "GOT to PC-Rel relaxation" or "Link time GOT to PC-Rel optimization".
  3. One or two examples of the kind of MCInst for which this will return a non-null symbol.
168
// The reason we place the label after the PLDpc instruction is that there
// may be an alignment nop before it since prefixed instructions must not
// cross a 64-byte boundary (please see
// PPCELFStreamer::emitPrefixedInstruction()). When referring to the
// label, we subtract the width of a prefixed instruction (8 bytes) to ensure
// we refer to the PLDpc.
181

TmpLabel is just the current location, right? Please name it as such if so.

llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
99

Could we maybe pull out isPartOfGOTToPCRelPair() I mentioned above and use it here as well as in the streamer code?

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
215

The algorithm here is a bit hard to follow by just reading the code. We search forward looking for a PLDpc. When we find one, we search forward for a user. Once we find the user, if the user defines any registers, we search back to make sure there are no intervening uses or defs of that register.

It might be easier to follow if we do a single pass to collect all address defs and uses and then check code in between for any candidate pairs.
Something along these lines:

struct GOTDefUsePair {
  MachineBasicBlock::iterator DefInst;
  MachineBasicBlock::iterator UseInst;
  Register DefReg;
  Register UseInstDef;
};

SmallVector<GOTDefUsePair> Pairs;
loop over the block
  if (instruction is a candidate PLDpc)
    create a pair object and push it to Pairs
    continue
  if (instruction is a use of any defs in Pairs)
    if (not a kill)
      remove pair object from Pairs (this can also be done by keeping a Valid flag for each pair)
      continue
    add the use iterator and register (if any) to the pair in question

Then after collecting instruction pairs, loop over the vector to eliminate invalid pairs (due to register uses/defs, etc.) and finally add the symbols.

stefanp updated this revision to Diff 273647.Jun 26 2020, 3:43 AM

Address review comments.
Mainly reworked two functions: isPartOfGOTToPCRelPair and addLinkerOpt.
Unfortunately I was not able to move isPartOfGOTToPCRelPair out of PPCELFStreamer because I was not able to find a good place for it.

Gentle ping

nemanjai added inline comments.Jul 10 2020, 12:18 PM
llvm/lib/Target/PowerPC/MCTargetDesc/PPCELFStreamer.cpp
99
// For example, the load that will get the relocation as follows:
// .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
//  lwa 3, 4(3)
113
// For example, the prefixed load from the got that will get the label as follows:
//  pld 3, vec@got@pcrel(0), 1
// .Lpcrel1:
llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
110

Is there not something like printLabel() and printRelocDirective() for this? It seems odd to be manually printing it like this.
If there isn't that's fine, just seems odd.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
246

I know this is based on my suggestion, but I think the name doesn't make sense if this is a store (since a store does not define anything). Perhaps just UseReg is fine.

255

I think a simpler implementation for lines 255-311 would be something like this:

for (auto BBI = MBB.instr_begin(); BBI != MBB.instr_end(); ++BBI) {
  // Look for the initial GOT indirect load.
  if (isGOTPLDpc(*BBI)) {
    GOTDefUsePair CurrentPair{BBI, MachineBasicBlock::iterator(),
                              BBI->getOperand(0).getReg(), PPC::NoRegister,
                              true};
    CandPairs.push_back(CurrentPair);
    continue;
  }

  // We haven't encountered any new PLD instructions, nothing to check.
  if (CandPairs.empty())
    continue;

  // Run through the candidate pairs and see if any of the registers
  // defined in the PLD instructions are used by this instruction.
  // Note: the size of CandPairs can change in the loop.
  for (unsigned Idx = 0; Idx < CandPairs.size(); Idx++) {
    GOTDefUsePair &Pair = CandPairs[Idx];
    // The instruction does not use or modify this PLD's def reg, ignore it.
    if (!BBI->readsRegister(Pair.DefReg, TRI) &&
        !BBI->modifiesRegister(Pair.DefReg, TRI))
      continue;

    // The use needs to be used in the address compuation and not
    // as the register being stored for a store.
    const MachineOperand *UseOp =
        hasPCRelativeForm(*BBI) ? &BBI->getOperand(2) : nullptr;

    // Check for a valid use.
    if (UseOp && UseOp->isReg() && UseOp->getReg() == Pair.DefReg &&
        UseOp->isUse() && UseOp->isKill()) {
      Pair.UseInst = BBI;
      Pair.UseInstDef = BBI->getOperand(0).getReg();
      ValidPairs.push_back(Pair);
    }
    CandPairs.erase(CandPairs.begin() + Idx);
  }
}

// Go through all of the pairs and check for any more valid uses.
for (auto Pair = ValidPairs.begin(); Pair != ValidPairs.end(); Pair++) {
  // We shouldn't be here if we don't have a valid pair.
  assert(Pair->UseInst.isValid() && Pair->StillValid &&
         "Kept an invalid def/use pair for GOT PCRel opt");

The idea is that after the loop over the basic block, we just have a data structure containing pairs of (PLD, MemAccess) where the register defined by PLD is guaranteed to not have uses in between them. Then for each such pair, we check for defs/uses of the register defined by the MemAccess and we're done.

268

s/futher/further

llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll
160
; FIXME: we should always convert X-Form instructions that use
; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt.
391

Please add attributes #0 = { nounwind } and decorate the functions with it so we don't get the .cfi directives since they just make the test case more busy.

stefanp updated this revision to Diff 278461.Jul 16 2020, 7:00 AM
stefanp marked 2 inline comments as done.

Changed the way that the peephole finds candidates according to the suggestion by Nemanja.
Added/fixed some comments.
Cleaned up the testcase.

llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
110

Yes it is a bit funny.
There is a standard way to print the label:

SymExpr->print(O, &MAI);

The problem is that if I do it that I way I get this:

.Lpcrel3@<<invalid>>

I have the MCSymbolRefExpr::VK_PPC_PCREL_OPT there because I need it internally but I don't want it printed.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
215

This makes sense to me. I'll use this code.

Current approach seems fine.

llvm/lib/Target/PowerPC/MCTargetDesc/PPCInstPrinter.cpp
110

There's a method MCSymbol::print

stefanp updated this revision to Diff 278792.Jul 17 2020, 9:25 AM

Modified static_cast to cast.
Made use of MCSymbol::print.

stefanp updated this revision to Diff 278828.Jul 17 2020, 10:00 AM

I should not have changed the cast.
Changed it back now.

nemanjai accepted this revision.Jul 20 2020, 11:28 AM

The remaining comments are minor nits that can be addressed on commit. LGTM.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
315

I think you can just flip this to

if (!Pair->StillValid)
  continue;

and reduce the nesting in the following code.

317

Elaborate slightly. Implicit def of what and implicit use of what and why.

329

I think the name LabelNum is a bit awkward. Maybe it should be something like PCRelLabel or PCRelNumberedLabel?

llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll
46

Why is there no relocation that refers to the store? Similarly below.

106

Please add a FIXME here for us to follow up on this. We should favour the D-Forms in cases such as this so that they can be optimized.

This revision is now accepted and ready to land.Jul 20 2020, 11:28 AM
stefanp updated this revision to Diff 279359.Jul 20 2020, 3:40 PM

Flipped the if statement.
Fixed a comment in the source and added a couple more comments in the test case.

This revision was automatically updated to reflect the committed changes.