This is an archive of the discontinued LLVM Phabricator instance.

[LLD][PowerPC] Implement GOT to PC-Rel relaxation
ClosedPublic

Authored by nemanjai on Jul 22 2020, 1:46 PM.

Details

Summary

This patch implements the handling for the R_PPC64_PCREL_OPT relocation as well as the GOT relocation for the associated R_PPC64_GOT_PCREL34 relocation.

On Power10 targets with PC-Relative addressing, the linker can relax GOT-relative accesses to PC-Relative under some conditions. Since the sequence consists of a prefixed load, followed by a non-prefixed access (load or store), the linker needs to replace the first instruction (as the replacement instruction will be prefixed). The compiler communicates to the linker that this optimization is safe by placing the two aforementioned relocations on the GOT load (of the address).
The linker then does two things:

  1. Convert the load from the got into a PC-Relative add to compute the address relative to the PC
  2. Find the instruction referred to by the second relocation (R_PPC64_PCREL_OPT) and replace the first with the PC-Relative version of it

It is important to synchronize the mapping from legacy memory instructions to their PC-Relative form. Hence, this patch adds a file to be included by both the compiler and the linker so they're always in agreement.

Diff Detail

Event Timeline

nemanjai created this revision.Jul 22 2020, 1:46 PM
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
nemanjai marked an inline comment as done.Jul 27 2020, 9:24 AM
nemanjai added inline comments.
lld/ELF/Arch/PPC64.cpp
42

Perhaps this entire enum should just be absorbed into the new one to eliminate confusion.

nemanjai updated this revision to Diff 280944.Jul 27 2020, 9:39 AM
nemanjai retitled this revision from [WIP][LLD][PowerPC] Implement GOT to PC-Rel relaxation to [LLD][PowerPC] Implement GOT to PC-Rel relaxation.
nemanjai edited the summary of this revision. (Show Details)
nemanjai added reviewers: Restricted Project, hfinkel.
nemanjai added a subscriber: NeHuang.

Add test cases for all the different instructions - thanks for your help producing these @NeHuang.
Run clang-format on the patch since the pre-commit build flagged a bunch of issues.

Overall I think this looks good. I had a handful of comments but nothing major.
I wish we had a better way of doing the conversion between the old forms and the PCRel forms but I can't think of a better way.

stefanp added inline comments.Jul 28 2020, 12:10 PM
lld/ELF/Arch/PPC64.cpp
42

My two cents:
I would agree except that the file PPCLegacyToPCRelMap.def is meant specifically for PCRel (as is in the title of the file) and the update forms have nothing to do with that really...

We should clean these up but not with the PC Rel instructions.

llvm/include/llvm/Target/PPCLegacyToPCRelMap.def
72 ↗(On Diff #280944)

nit:
You should probably add

DFLOADf32 = 48,  //  lfs
DFLOADf64 = 50,  //  lfd

The names DFLOADf32 and DFLOADf64 are internal names we have that won't mean anything to anyone that is not familiar with the compiler. The lfs and lfd are in the ISA.

196 ↗(On Diff #280944)

I would almost prefer to have this function in the compiler in PPCPreEmitPeephole.cpp.
The function was in the compiler in the first place and it is only used in that one file.

If we do it that way we can get rid of the macro PPC_LEGACY_TO_PREFIXED_COMPILER.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
50 ↗(On Diff #280944)

nit:
I get the feeling that clang-format went a little crazy on this file and removed the indentation. :)
Should probably be a separate NFC patch.

nemanjai added inline comments.Jul 28 2020, 12:40 PM
llvm/include/llvm/Target/PPCLegacyToPCRelMap.def
72 ↗(On Diff #280944)

For sure. Will do.

196 ↗(On Diff #280944)

I don't think it would be a good thing to get rid of that macro. I have added the #error in this file to require that one of the two be defined so that this file cannot be included accidentally somewhere else and cause namespace pollution.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
50 ↗(On Diff #280944)

Yeah, I have no idea what clang-format's problem is - why is it modifying lines that I never touched in this patch? I don't mind pre-committing an NFC patch from running clang-format on this file.

NeHuang added inline comments.Jul 28 2020, 1:20 PM
llvm/include/llvm/Target/PPCLegacyToPCRelMap.def
36 ↗(On Diff #280944)

nit: // Loads.

74 ↗(On Diff #280944)

nit: // Stores.

123 ↗(On Diff #280944)

Do we also need to add a FIXME here for PPC::LXVP and PPC::STXVP since they are not supported yet?

NeHuang added inline comments.Jul 28 2020, 1:27 PM
llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
50 ↗(On Diff #280944)

+1, It makes sense to have a separate NFC patch for clang-format changes. Phabricator complains those clang-format issues in https://reviews.llvm.org/D84360?id=279937

I haven't looked into the functional part of the patch. Just made some style issues and spot a few bugs due to incorrect -1lu uses.

lld/ELF/Arch/PPC64.cpp
66

return encoding >> 26;

658

Nit: uint64_t insn = readPrefixedInstruction(loc) & ~0xff000000fc000000; and move the comment below here

662

This file is not consistent on upper/lower case hexadecimals but for newer code we use lower case to be consistent with other files. The suffix lu is redundant (in any case not proper on 32-bit systems where unsigned long is 32-bit).

681

-1lu is wrong on a 32-bit platform: 4294967295.

pcRelInsn == UINT64_C(-1)

682

Consider using more errorOrWarn which can downgrade to warnings if --noinhibit-exec is implemented. error is used by code which can't really proceed. fatal is fatal error which should call exit immediately.

691

The two variables are only used once. Just write (insn & 0x0003ffff0000ffff) | pcRelInsn

lld/ELF/InputSection.cpp
1003

This is wrong on 32-bit platforms.

lld/test/ELF/Inputs/ppc64-got-to-pcrel-relaxation-def.s
5

.globl a, b, c, d can declare multiple variables

32

Are these .size and .type directives used? Dropping them can make the file much smaller

lld/test/ELF/ppc64-got-to-pcrel-relaxation.s
20

Some newer lld tests make sure that instructions are indented more than their containing labels

# CHECK-S-LABEL: <check_LBZ_STB>:
# CHECK-S-NEXT:    plbz 10
llvm/include/llvm/Target/PPCLegacyToPCRelMap.def
21 ↗(On Diff #280944)

The parentheses are redundant

MaskRay added inline comments.Jul 29 2020, 10:19 PM
lld/test/ELF/ppc64-got-to-pcrel-relaxation.s
5

If a linked shared object is used to link another file, the shared object needs to specify --soname=t2 or another fixed string. Otherwise the DT_SONAME field of the other file is dependent on the full path of %t2.so, which can vary on different test machines.

MaskRay added inline comments.Jul 29 2020, 10:37 PM
lld/ELF/InputSection.cpp
1029

This assumes that relocation types on other architectures (which are represented by LLD's R_RELAX_GOT_PC or R_RELAX_GOT_PC_NOPIC) do not use the value R_PPC64_GOT_PCREL34. This is a subtle assumption we want to avoid. I think we need a new RelExpr: R_PPC64_RELAX_GOT_PCREL34.

lld/test/ELF/ppc64-got-to-pcrel-relaxation.s
169

Can these unrelated instructions be removed? We just need prefixed instructions and for the .reloc case, one instruction before .reloc

nemanjai added inline comments.Jul 30 2020, 4:03 PM
lld/ELF/Arch/PPC64.cpp
662

I can correct all the inconsistencies in an NFC patch after this lands. I will make sure I don't add any upper case ones.

lld/test/ELF/ppc64-got-to-pcrel-relaxation.s
169

I could remove them. I am reluctant to do that because the asm doesn't really make sense if we remove them. Although I realize this doesn't make a difference when linking, I just find it less visually jarring to have asm that is clearly incorrect (i.e. storing an undefined register in this case). And removing them doesn't save that many lines.

Of course, if you insist, I will remove them.

nemanjai updated this revision to Diff 282079.Jul 30 2020, 4:26 PM

Fix up usage of unsigned long literal suffixes.
Add new rel-expr.
Fix up minor nits.
Shorten the test cases.

Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2020, 4:26 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
nemanjai updated this revision to Diff 282080.Jul 30 2020, 4:27 PM

Remove erroneous unrelated update.

Harbormaster completed remote builds in B66478: Diff 282079.
sfertile added inline comments.
llvm/include/llvm/Target/PPCLegacyToPCRelMap.def
1 ↗(On Diff #282080)

This file does way more then I expect from a .def file. Are there other examples of a .def file defining so much data and code? If there isn't this needs to be moved to a header and potentially a source file since we don't want to change what a .def file represents in LLVM. If there is I still think we are better moving this to a header + source if it stay close to its current implementation using maps.

Alternatively if we split this up back to LLVM and LLD implementations it will make a clean .def fiie in lld.

My suggestion is to keep the switch in PPCPreEmitPeephole.cpp the way it is. I understand wanting to not have more then 1 place to modify when adding a new relaxable instruction is really desirable, but you have an error in LLD if we find the relocation on an unrecognized instruction to catch the mismatches. Combining both llc and lld implementations makes this much more complicated then it needs to be (and like I mentioned previously, inappropriate foe a .def file).

The LLD implementation I think we could turn into a proper .def file by working with the full encodings instead of primary opcodes and using a macro along the lines of:

MACRO(NONPCREL_ENCODING, PCREL_EQUIVALENT, OPCODE_MASK)

Then we define the enums in Arch/PPC64.cpp, and have 2 functions with switches over the NONPCREL_ENCODINGS: 1 that maps an encoding to its pc-relative equivalent encoding and one to map it to the operand mask. Instead of mixing in the bits for the non-primary opcode we just zero out the non-opcode bits, then use the 2 helper functions to implement getPCRelativeForm almost exactly the same as it is now.

As a bonus we can re-implement isInstructionUpdateForm as a macro in the .def file as well for consistency.

nemanjai planned changes to this revision.Aug 4 2020, 9:41 AM

Need to account for a displacement on the access instruction and add some tests along the lines of

extern int Ext[40];
int test(int a) {
  return a + Ext[10];
}
lld/ELF/Arch/PPC64.cpp
687

As it turns out, the access instruction may actually have a displacement which needs to be added to the computed offset.
I will update this to account for that.

llvm/include/llvm/Target/PPCLegacyToPCRelMap.def
1 ↗(On Diff #282080)

I agree with this and will rewrite this portion. This tight coupling between the compiler and the linker is quite unnecessary since there are at least two compilers on the platform and at least 3 linkers.

nemanjai updated this revision to Diff 283464.Aug 5 2020, 6:18 PM
  • Move implementation completely into LLD
  • Add option to disable the optimization
  • Handle computation of the total offset when the access instruction has a non-zero displacement
  • Add tests with the new option as well as a test with non-zero displacement on the access instructions
MaskRay added inline comments.Aug 5 2020, 9:59 PM
lld/ELF/Options.td
407

Is this an option binutils will support as well?

Note that on x86, --no-relax can control whether R_X86_64_GOTPCRELX is optimized.
I hope PowerPC can adopt the same option.

There is a related thread https://sourceware.org/pipermail/binutils/2020-June/111571.html before I knew the PCREL GOT optimization activities on PowerPC.

nemanjai added inline comments.Aug 6 2020, 7:12 AM
lld/ELF/Options.td
407

I have asked Alan about support for this option in the GNU linkers. I'll get back to you when he responds.

I personally am not in favour of reducing the granularity by having all of this controlled by --no-relax because if we encounter bugs (or mismatches between compiler version and linker version) it is good to be able to turn off only the minimum set of optimizations.
A real example:
In Power9, we got new D-Form instructions which then ended up with relocations on them. The version of the linker in binutils on the various distros did not have support for those. Upgrading binutils was not an option for many customers, so they had to add --no-toc-optimize to their builds. If there was only --no-relax and that implied a bunch of other optimizations are turned off, such customers would be unnecessarily penalized in terms of performance.

Of course, the integration between LLD and Clang/LLVM is much more close so this particular issue is less likely to occur, but I think the argument for having finer granularity is still valid.

sfertile added inline comments.Aug 6 2020, 7:08 PM
lld/ELF/Arch/PPC64.cpp
405

Instead of shifting the primary opcode down to the least significant bits, and then shifting the secondary opcodes into the more significant bits, why don't we just use the significant opcode bits as the enum value.

For example: LDU 0xE8000001, LD = E8000000, LDU = 7C000000. It won't simplify this function since you still need the primary opcode to know how to mask out the encoding bits, but we could follow this patch with an NFC patch where we convert all the other uses of
DFormOpcd and XFormOpcode enums to use PPCInsn and get rid of the other 2 enums. If we do take this root I would suggest having separate enums for the 32-bit instructions and the prefixed instructions.

lld/ELF/Relocations.h
99

Should we drop the 'REL34' in the Expr? Its not really important how many bits it represents, and I think the REL is implied by 'PC' in R_RELAX_GOT_PC.

Also, should it be added to isRelExpr in Relocations.cpp?

sfertile added inline comments.Aug 6 2020, 7:31 PM
lld/ELF/Arch/PPC64.cpp
535

minor nit: The prefixed instruction in the sequence we are going to optimize should always be a paddi, since we relaxed the instruction referenced by the R_PPC64_GOT_PCREL34 from pld to paddi. My understanding could be wrong though, I expect we can only see the optimization relocation on a got-addressing sequence or a masked load/store sequence (which we don't support yet, but would also start with paddi).

nemanjai added inline comments.Aug 7 2020, 4:13 AM
lld/ELF/Arch/PPC64.cpp
405

While I agree that it is kind of odd to essentially reverse the positions of the primary and extended opcodes, I don't really follow what the details of your proposed update are.

How in your proposal does LDU go from 0xe8000001 to 0x7c000000?
I think that any proposal that puts the primary and secondary opcode in the same field is problematic.

How about keeping the bits where they are for DS and DQ form instructions that have extended opcodes. So lets take primary opcode 58 (LD, LDU, LWA):

LD  = 0xe8000000
LDU = 0xe8000001
LWA = 0xe8000002

Then in this function, all DS-Form (PO in { 57, 58, 61, 62 }) would be converted to a PPCInsn simply as encoding & 0xfc000003.

For the two versions of DQ-Form, we would have:
PO == 56 (only LQ) compute as encoding & 0xfc000000
PO == 61 compute as encoding & 0xfc000007
PO == 6 compute as encoding & 0xfc00000f

How does that sound?

535

Sorry, I should probably not use an extended mnemonic here.
The pla 3, 1000 is just paddi 3, 0, 1000, 1

lld/ELF/Relocations.h
99

Should we drop the 'REL34' in the Expr? Its not really important how many bits it represents, and I think the REL is implied by 'PC' in R_RELAX_GOT_PC.

I am definitely OK with this.

Also, should it be added to isRelExpr in Relocations.cpp?

This is just an omission on my part as I didn't know that existed. I'll add it.

nemanjai added inline comments.Aug 7 2020, 5:05 AM
lld/ELF/Relocations.h
99

Just to be clear, the name you are suggesting is R_PPC64_RELAX_GOT_PC, correct? I don't imagine you would like me to drop PPC64 since this is specific to this target.

sfertile added inline comments.Aug 7 2020, 6:55 AM
lld/ELF/Arch/PPC64.cpp
405

Sorry, I should have proofread my comment. The third one was suppsoed to be LDUX instead of `LDU again.

Your suggestion sounds perfect.

535

Thanks, I'm not that familiar with the new prefixed instructions yet so I didn't realize that was a paddi.

lld/ELF/Relocations.h
99

Yep, thats perfect.

nemanjai updated this revision to Diff 284730.Aug 11 2020, 7:58 AM

Cleaned up the definitions of instructions and addressed other minor comments.

nemanjai updated this revision to Diff 284974.Aug 11 2020, 11:15 PM

It is incorrect to relax this relocation if it is on a paddi. Bail out from the optimization if the relocation is on anything but a pld.

sfertile added inline comments.Aug 12 2020, 8:19 AM
lld/ELF/Arch/PPC64.cpp
72

This helper should return true for any 4-byte load/store instruction that is optimizable as part of a pc-rel access sequence? If so a better name would be isPCRelOptimizable or something along those lines.

lld/ELF/Arch/PPCInsns.def
11 ↗(On Diff #284974)

This still isn't really the form of a .def file.

  1. The enum values are supposed to be defined elsewhere, and the def file is included places where those definitions are visible.
  2. I though we agreed to use the encodings directly as the enum values in our previous discussion. So LBZ would be 0x88000000 not 34.
  3. We should have 2 distinct enum types, 1 for the non-prefixed instructions (which use an underlying 4 byte integral type) and the 1 for the prefixed instructions which uses an 8-byte type.

Then we should include the 4-byte instruction, the prefixed instruction it maps to, and the operand mask in the macro:
PCREL_OPT(Instruction, Prefixed_Equivalent, Operand_Mask)

Ex:

PCREL_OPT(LBZ, PLBZ, OPC_AND_RST)

And the helpers are implemented by defining the macro. For example to implement getPCRelativeForm:

switch(arg)
{
default:
  return NOINSN;
#define PCREL_OPT(Encoding, Pcrel_Encoding, Op_mask) case Encoding: return Pcrel_Encoding;
#include PPCInsns.def
#undef PCREL_OPT
}

Similarly:
For checkPPCInsn

switch(Encoding) 
{
default return false;
 #define PCREL_OPT(Encoding, Pcrel_Encoding, Op_mask) case Encoding: return true;
#include PPCInsns.def
#undef PCREL_OPT
}

ditto for getInsnMask

nemanjai added inline comments.Aug 12 2020, 9:49 AM
lld/ELF/Arch/PPC64.cpp
72

Not really. This is meant to check whether the underlying value is one of the enumerators or some value outside of the enumeration.
For example:
checkPPCinsn(static_cast<PPCInsn>(0x6100000A4000000) == true since it is PLWA
while
checkPPCinsn(static_cast<PPCInsn>(0x6100000A400ABCD) == false because it is not the value of any enumerator.

lld/ELF/Arch/PPCInsns.def
11 ↗(On Diff #284974)
  1. I don't think this is true. There are plenty of examples where the enumerators are specified in a .def file. For example include/llvm/IR/Instruction.def. I prefer to leave the defs in the .def file.
  2. This is just a bit of miscommunication. I thought we were only talking about instructions for which we need the extended opcode to disambiguate.
  3. I am not sure how this improves anything, but sure - since I am indifferent and you have a preference, I'll go with your preference.
nemanjai updated this revision to Diff 285171.Aug 12 2020, 1:18 PM

Split the instruction enums into "Legacy" and "Prefixed" and use the .def file only for functions that access the enums as it simplifies the preprocessor logic.

stefanp accepted this revision as: stefanp.Aug 12 2020, 2:35 PM

This patch makes sense to me.
I have a question about the access instruction but I don't think we should worry about it now.
LGTM!

lld/ELF/Arch/PPC64.cpp
536

Is it possible to have:

paddi 3, 0, 1000, 1
plwz 3, 20(2)

Where both instructions are prefixed.
I assume probably not (clang won't generate it?) but I figured it is worth asking.

685

I guess if the access instruction is already a prefixed instruction there won't be an equivalent PC Relative form of it and we will produce an error here.

This revision is now accepted and ready to land.Aug 12 2020, 2:35 PM

Overall the patch looks good to me. I only have two questions and a clang-format nit which can be resolved before committing the code.

lld/ELF/Arch/PPC64.cpp
662

nit: clang-format, this can be resolved before committing the changes.

1311

I have two questions regarding void PPC64::relaxGot(uint8_t *loc, const Relocation &rel, uint64_t val) in lld/ELF/Arch/PCC64cpp line 619, where had similar check added as below:

case R_PPC64_GOT_PCREL34: {
    // Clear the first 8 bits of the prefix and the first 6 bits of the
...
    if ((insn & 0xfc000000) != 0xe4000000)
      error ("expected a 'pld' for got-indirect to pc-relative relaxing");
...
  1. Did it catch the case that paddi instruction with R_PPC64_GOT_PCREL34 emitted from compiler side when it tried to optimize it?
  2. After adding the new check here, does it make sense to remove the same check in PPC64::relaxGot?
nemanjai added inline comments.Aug 12 2020, 4:00 PM
lld/ELF/Arch/PPC64.cpp
536

At this point, it should not be possible (plus it would be a bad thing if it happened since we'd only look at a part of the instruction and part of the displacement).
We bail out before the call to this if we don't have a prefixed version of the instruction (which you already observed below).

662

Not sure what you mean, here. I ran clang-format on the patch and the pre-commit check didn't complain about the formatting.

685

Right, if the access instruction is already prefixed, we'd be looking at the prefix and not the instruction.

1311
  1. This check was initially missing so we didn't know that we were mis-optimizing this.
  2. I suppose it could be removed, but it is there defensively because if we ever get something other than a pld there, it is guaranteed that the resulting executable will be broken. I chose to go with an error there rather than an assert because it is crucial to fail in such a situation even if we build without asserts.
NeHuang added inline comments.Aug 12 2020, 4:56 PM
lld/ELF/Arch/PPC64.cpp
662

Yeah, the clang-format pre-check warning is gone after I refresh the page. It could be an outdated message.

I believe we can have the optimization relocation on a code sequence like:

paddi ra, sym@pcrel
load/store rt, offset(ra)

If so we should have a test showing we relocate it properly even if we don't optimize it yet.

I believe we can have the optimization relocation on a code sequence like:

paddi ra, sym@pcrel
load/store rt, offset(ra)

If so we should have a test showing we relocate it properly even if we don't optimize it yet.

But that's what check_LD_STD_W_PADDI does. Do you mean that I should add checks for the actual values?

But that's what check_LD_STD_W_PADDI does. Do you mean that I should add checks for the actual values?

check_LD_STD_W_PADDI uses paddi referencing the got-entry of the symbol (sym@got@pcrel), My suggestion is for referencing a symbol pcrel (sym@pcrel) combined with a load/store with offset.

This revision was automatically updated to reflect the committed changes.