This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Target/PowerPC/
-
Target/
-
PowerPC/
-
PPCISelLowering.cpp
-
test/CodeGen/PowerPC/
-
CodeGen/
-
PowerPC/
-
ppc32-static.ll

Differential D38554

Fixed ppc32 function relocations in non-pic mode
Needs ReviewPublic

Authored by vit9696 on Oct 4 2017, 11:17 AM.

Download Raw Diff

Details

Reviewers

hfinkel
joerg
• espindola

Summary

Appears to be a regression introduced in r273499 and r273595.
Further discussion at http://lists.llvm.org/pipermail/llvm-dev/2017-October/117982.html

Diff Detail

Event Timeline

vit9696 created this revision.Oct 4 2017, 11:17 AM

Herald added subscribers: kbarton, nemanjai. · View Herald TranscriptOct 4 2017, 11:17 AM

This doesn't look right to me at all. The normal SYSV ABI on PowerPC is effectively PIC, with a few edge cases. This really looks like a step backwards.

In D38554#888745, @joerg wrote:

This doesn't look right to me at all. The normal SYSV ABI on PowerPC is effectively PIC, with a few edge cases. This really looks like a step backwards.

Please clarify what exactly you do not like. I am afraid I did not understand you. Currently you have PLT on regardless of whether you are compiling with PIC and whether you are not. This change fixes it allowing one not to use PLT in non-PIC mode, and it does not break PIC mode anyhow.

Let's start with the obvious: I have no idea why you need this code at all. NetBSD builds a mix of static, PIC and PIE code using GNU ld without hitting any unsupported relocations. As I said before, the normal ABI on PowerPC is mostly PIC by default. I wonder if the root of your problem is that you want EABI and not the SYSV ABI.

In D38554#888793, @joerg wrote:

Let's start with the obvious: I have no idea why you need this code at all. NetBSD builds a mix of static, PIC and PIE code using GNU ld without hitting any unsupported relocations. As I said before, the normal ABI on PowerPC is mostly PIC by default. I wonder if the root of your problem is that you want EABI and not the SYSV ABI.

Just as mentioned on the lists this is needed for a custom embedded platform that indeed uses eabi (PIC executables are currently not compatible with the loader).
Let me make things clearer, especially since somebody mentioned PIE here, though I do not see this comment at the moment.

There is PIC and there is PIE. PIE just makes it possible to offset the whole image, and it is unrelated to this issue like at all. Both PIC and PIE are controlled by compiler arguments (-fpic, -fpie, -fno-pic, -fno-pie). Each target has its defaults as well, making LLVM assume which one should be used if no argument is provided. Indeed in System-V ABI PIC is normally the default one, thus effectively getRelocationModel() == Reloc::PIC_ will evaluate to true.

Now to what PLT has to do to it:

PIC implies that all the function calls go through a PLT table, and thus the linker should implement PLT-relative relocations (e.g. R_PPC_PLTREL24).
non-PIC implies that the functions may directly call each other (e.g. via absolute or relative relocations like R_PPC_REL24).

Nobody forbids the use of PLT in PIC mode, however that's not how things work. Neither gcc, nor llvm (until the mentioned regression) use a PLT table in non-PIC mode. There is simply no reason to add an extra address indirection layer. Yet starting with r273499 LLVM erroneously enforced PLT table addition in non-PIC mode. That means that now even when one specifies -fno-pic explicitly PLT table will be used.

This is wrong and undesired, and this bug is what the presented patch fixes. It does not change the defaults for your ABI: if your defaults are PIC, then you get your PIC/PLT. It only makes -fno-pic be actually non-PIC but not using a PLT table.

Hope that makes it clear and allows the reviewers to accept the patch in the nearest future :)

In that case: the short version is that there is no EABI support at the moment. That patch changes the behavior for SYSV ABI, it is not acceptable as such. The test case doesn't reflect the ABI variance either.

In D38554#888873, @joerg wrote:

In that case: the short version is that there is no EABI support at the moment. That patch changes the behavior for SYSV ABI, it is not acceptable as such. The test case doesn't reflect the ABI variance either.

May I please ask you a second time, where did the behaviour change, so that I could address the issue. At the moment I do not see it. If SYSV ABI uses PIC, then DAG.getTarget().getRelocationModel() == Reloc::PIC_ will not affect it anyhow.
If you imply that it changes the behaviour for static relocation model (i.e. the one that has PIC disabled), then it is intended, since it is a fix to a regression introduced a year ago.

In D38554#888873, @joerg wrote:

In that case: the short version is that there is no EABI support at the moment. That patch changes the behavior for SYSV ABI, it is not acceptable as such. The test case doesn't reflect the ABI variance either.

The only thing I could imagine to be relevant is https://reviews.llvm.org/diffusion/L/browse/cfe/trunk/lib/Driver/ToolChains/Gnu.cpp;314943$2187, that does not handle llvm::Triple::ppc case. As a result fpic has always been off for PPC32 by default, however, starting with LLVM 3.9.x PLT was "accidentally" always on, making the result be "partially?" PIC.

I could add the case there, if it is what you mean and that is the desired behaviour. However, I strongly ask you to be as constructive as possible.

In D38554#888793, @joerg wrote:

Let's start with the obvious: I have no idea why you need this code at all. NetBSD builds a mix of static, PIC and PIE code using GNU ld without hitting any unsupported relocations. As I said before, the normal ABI on PowerPC is mostly PIC by default. I wonder if the root of your problem is that you want EABI and not the SYSV ABI.

Joerg, do you believe that use of the PLT should be considered a feature independent of PIC? And, in this case, a feature that we should always use? If so, why?

I'll point out that the PowerPC Processor ABI Supplement begins the Procedure Linkage Table section by stating, "Much as the global offset table redirects position-independent address calculations to absolute locations, the procedure linkage table redirects position-independent function calls to absolute locations. " This, and further text in that section, do imply to me that the PLT is a feature associated with PIC. Aside from everything else, there is overhead in use of the PLT, and use of non-PIC/static linking is generally assumed to elide those overheads.

It has been over two weeks since the last comment, and no explanations why this could cause issues. Given no response from @joerg I consider his claims invalid, and ask this to be approved and merged into trunk.

The existing code works fine. What you have demonstrated so far is more an issue with lld and not a problem in the code. GNU ld is perfectly fine with translating the PLT references to direct calls. As such, I see no need for changes here.

This revision now requires changes to proceed.Oct 22 2017, 9:05 AM

@joerg, while it is a valid claim that LLD could be implementing PLT reference support, it is not sane to claim that they should be used with STATIC relocation model. Please refer to what gcc does here and what @hfinkel said.

On the contrary, I find little reason for adding the complexity here. That's even ignoring the question of whether the suggested check even covers the relocation models correctly.

@joerg, please read carefully the comment @hfinkel posted (https://reviews.llvm.org/D38554#889007) in case we cannot understand each other.

Firstly, I did test what I submitted, and I can tell that it works expectedly.
Secondly, just as stated above — PLT is an extra indirection layer which affects the performance.
Thirdly, this is not what gcc does, and this is what llvm did previously until @rafael broke it.

If you think that adding a single (and sane) if condition to fix a regression, conform to what another compiler does, and improve the performance is adding more complexity, I have to additionally remind you about the complexity of using the current executables on embedded hardware. While I definitely heard your opinion, I have to state that the decisions made should benefit the project in total and there is no place for personal preconceived actions.

So far I was unable to hear clear reasons that are based on something why this should cause issues (please point it out, if I did not notice something). Furthermore, we started to move to the area where the proposed change is simply "disliked", and it gets stuck because I have to convince you to read the arguments already the two people provided.

There is no layer of indirection here. The call gets resolved to the local symbol by a working linker.

@joerg, this is nonsense in general, and possibly implementation dependent. A simple example as follows:

int f1();
int _start() { return f1(); }

Produces the following disassembly with the current llvm:

Disassembly of section .text:
_start:
       0:	7c 08 02 a6 	mflr 0
       4:	90 01 00 04 	stw 0, 4(1)
       8:	94 21 ff f0 	stwu 1, -16(1)
       c:	93 e1 00 0c 	stw 31, 12(1)
      10:	7c 3f 0b 78 	mr 31, 1
      14:	48 00 00 01 	bl .+0
			00000014:  R_PPC_PLTREL24	Unknown
      18:	80 01 00 14 	lwz 0, 20(1)
      1c:	83 e1 00 0c 	lwz 31, 12(1)
      20:	38 21 00 10 	addi 1, 1, 16
      24:	7c 08 03 a6 	mtlr 0
      28:	4e 80 00 20 	blr

Even if you could find a linker that optimises R_PPC_PLTREL24 into R_PPC_REL24 for local symbols, this will not change the fact that this is an optimisation, and a linker is expected to produce PLT calls when given such code.

A working linker is certainly expected to use the most efficient binding. The assembler (and by extension the compiler) can't tell. This applies in both directions -- in your example, the linker might need to translate the direct call into a PLT call, depending on the target. It all goes back to "use a working linker". So yes, at this point in time the existance of two different relocations for call instructions can be clearly seen as a historic artifact. This doesn't change anything of what I have been saying since the beginning. The PowerPC ABIs are heavily biases towards position independent code and this is just adding complications for no good reason.

Right, I could finally understand your opinion and perhaps even the reasons behind it. However, I cannot agree with them due to the following:
— a working linker you describe is something subjective, in my opinion it is a linker capable of translating relocations defined by a standard or a specification, not necessarily all, and efficiency is not a necessity.
— the assembler indeed does not know what binding should be used, however, the extension to the compiler is incorrect. The compiler, which is responsible for emitting the relocations, is aware of the relocation model, and thus is expected to emit relocations suitable for this model. Since PLT is a PIC-specific feature, which is known to the compiler, the compiler should decide whether to generate PLT relocations or not.
— while you state that two relocation models is a historic artifact, LLVM currently implements a check whether to use PLT or not, so if this code is here, then it should work properly, otherwise it is a bug.
— your statement regarding more common GNU Power-PC ABIs being towards generating position independent may be correct, however, it has nothing to do with embedded ABI we are using here. Furthermore, if the "working" linker generates PLT relocations for a static model, it will simply crash a loader that has no support for PLT.

And to recite the former reasons already mentioned above:
— gcc does not generate PLT relocations in such case
— llvm until the aforementioned two commits did not either
— this change fixes lld compatibility
— the fix is one line long (!)

Taking all this into account the "complexity" as the only last reason you are talking about is not a valid cause to reject the proposed change, which brings actual benefit and no issues. If you think that it is unclear why a PIC check should exist in the PLT logic (although I expect one knowing what PLT is to also be aware of how PIC works), I am fine to edit the diff and include a small comment like "PLT relocations are only necessary for PIC model".

If not, then I ask this diff to be approved and committed upstream.

A PLT is used not only by PIC code. It is required for all dynamic entry points and that's not limited to PIC. It's not even limited to dynamically linked binaries. There is no support for the embedded ABIs as I said before. I'm going to stop responding since it is rather pointless now. My objection stands.

Alright, now I understand that the condition I added in the beginning did not conform to all the other relocation models available in LLVM:

-relocation-model                                 - Choose relocation model
  =static                                         -   Non-relocatable code
  =pic                                            -   Fully relocatable, position independent code
  =dynamic-no-pic                                 -   Relocatable external references, non-relocatable code
  =ropi                                           -   Code and read-only data relocatable, accessed PC-relative
  =rwpi                                           -   Read-write data relocatable, accessed relative to static base
  =ropi-rwpi                                      -   Combination of ropi and rwpi

Initially I added PLT to be enabled only for PIC model, and you are right, in case of dynamic-no-pic model (which allows dynamic relocation) PLT must be used. I updated the diff to enable PLT unless static relocation model is used. No issues from then on?

Even a full static binary will have a PLT when IFUNC is used. As such, a linker has to deal with conversion between direct and PLT branches anyway.

So, am I right that the correct approach is to add an additional check whether eabi target is used?
Another way is to ensure that lld handles PLT relocations and solves them as relative relocations if possible.
I think that both changes should be made regardless of the reasons behind, but I would like to hear the opinion of @ruiu and @davide from lld project first.

TL;DR latest llvm completely breaks compatibility with lld for ppc32 target, and even a simple program won't be linked due to unsupported PLT relocations llvm emits. I could submit a patch solving PLTREL relocations for local symbols, however, adding full PLT support to lld will require more effort than I could spend on this issue.

Looks like I do not have enough knowledge about the topic discussed in this review thread, so I don't think I can say anything about that. Orthogonal to that, if lld cannot handle inputs that GNU linkers can handle, we in general need to fix/implement it to lld.

In D38554#903690, @vit9696 wrote:

So, am I right that the correct approach is to add an additional check whether eabi target is used?
Another way is to ensure that lld handles PLT relocations and solves them as relative relocations if possible.
I think that both changes should be made regardless of the reasons behind, but I would like to hear the opinion of @ruiu and @davide from lld project first.

TL;DR latest llvm completely breaks compatibility with lld for ppc32 target, and even a simple program won't be linked due to unsupported PLT relocations llvm emits. I could submit a patch solving PLTREL relocations for local symbols, however, adding full PLT support to lld will require more effort than I could spend on this issue.

As we've stated, lld support on PPC is not mature enough to motivate features in this sense. Regardless, let me propose the following:

D39079 is going to add support for GCC's -fno-plt option (the nolazybind attribute will be added to functions compiled when -fno-plt is specified). We can add PPC support for this attribute. In this way, by specifying -fno-plt, we'll have a way to directly request the desired code-generation strategy. You can look at D39065 for an example of how to use the attribute (you'll essentially add a check for F->hasFnAttribute(Attribute::NonLazyBind).

@hfinkel, that sounded like a good idea in the beginning, but after I implemented it by changing the condition to the following:

bool Local = TM.shouldAssumeDSOLocal(*Mod, GV);
// Do not use the PLT when explicitly told to do so
if (!Local && GV && GV->isDeclarationForLinker()) {
  const Function *F = dyn_cast_or_null<Function>(GV);
  Local = F && F->hasFnAttribute(Attribute::NonLazyBind);
}
bool UsePlt = !Local && Subtarget.isTargetELF() && !isPPC64;

It did not quite work out. Consider the following example:

unsigned long long fn1();
unsigned fn2();

int __start() {
  unsigned long long v1 = fn1();
  unsigned v2 = fn2();
  return v1 % v2;
}

With -fno-plt and this patch added fn1 and fn2 would work just fine, however, v1 % v2 resolving to __umoddi3 would not, since GV would be null in the first place.

I am not sure if this can be handled without breaking shared builtins linkage, to be honest. So adding basic PLTREL support for lld might be easier after all since IFUNCs are already supported in LLVM and we cannot special case just the relocation model. I submitted a patch here: https://reviews.llvm.org/D39226. I can also submit the -fno-plt patch as well (for completeness) but I am afraid it will be of no real use due to builtins.

hfinkel mentioned this in D39079: New clang option -fno-plt to avoid PLT for external calls.Oct 24 2017, 10:00 AM

In D38554#904915, @vit9696 wrote:
@hfinkel, that sounded like a good idea in the beginning, but after I implemented it by changing the condition to the following:
bool Local = TM.shouldAssumeDSOLocal(*Mod, GV);
// Do not use the PLT when explicitly told to do so
if (!Local && GV && GV->isDeclarationForLinker()) {
  const Function *F = dyn_cast_or_null<Function>(GV);
  Local = F && F->hasFnAttribute(Attribute::NonLazyBind);
}
bool UsePlt = !Local && Subtarget.isTargetELF() && !isPPC64;
It did not quite work out. Consider the following example:
unsigned long long fn1();
unsigned fn2();

int __start() {
  unsigned long long v1 = fn1();
  unsigned v2 = fn2();
  return v1 % v2;
}
With -fno-plt and this patch added fn1 and fn2 would work just fine, however, v1 % v2 resolving to __umoddi3 would not, since GV would be null in the first place.

Interesting. Thanks for trying this.

I am not sure if this can be handled without breaking shared builtins linkage, to be honest. So adding basic PLTREL support for lld might be easier after all since IFUNCs are already supported in LLVM and we cannot special case just the relocation model. I submitted a patch here: https://reviews.llvm.org/D39226. I can also submit the -fno-plt patch as well (for completeness) but I am afraid it will be of no real use due to builtins.

Improving lld is clearly a good thing. If there's interest, we can also look at actually adding support for the embedded ABI (and this can be done incrementally). If improving lld meets our needs and we don't need to add specific EABI handling for now, that's fine too.

Well, the more I think of it, more strange it feels. I indeed was not aware of GNU_IFUNC and kind of expected there to be no difference between EABI and GNU ABI in this sense. While it does not make any sense to emit PLTREL relocations for EABI, it is a must for GNU_IFUNC support and thus there are very few ways (aside -fno-plt) not to emit PLT calls. Hacking in an abi check in the place relocation model check will not be nice. For this reason, once -fno-plt lands and works with compiler-rt calls I will update this with an additional check that enables -fno-plt in EABI by default, which should be sane and more reliable.

In D38554#906397, @vit9696 wrote:

Well, the more I think of it, more strange it feels. I indeed was not aware of GNU_IFUNC and kind of expected there to be no difference between EABI and GNU ABI in this sense. While it does not make any sense to emit PLTREL relocations for EABI, it is a must for GNU_IFUNC support and thus there are very few ways (aside -fno-plt) not to emit PLT calls. Hacking in an abi check in the place relocation model check will not be nice. For this reason, once -fno-plt lands and works with compiler-rt calls I will update this with an additional check that enables -fno-plt in EABI by default, which should be sane and more reliable.

That makes sense to me. You'll need, in general, to add some handling for -meabi in Clang (and parsing of powerpc*-*-eabi* triples). EABI also implies some other things (e.g., the stack alignment is only 8 bytes by default, a function named __eabi is called from main, and so on).

• espindola edited reviewers, added: • espindola; removed: • rafael.Mar 15 2018, 9:09 AM

Without this or D70570, secure-PLT does not work. the @plt relocation annotations cause GNU ld to force BSS-PLT instead.

Herald added subscribers: • wuzish, jsji. · View Herald TranscriptNov 28 2019, 7:22 AM

vit9696 mentioned this in D70570: [PowerPC] Only use PLT annotations if using PIC relocation model.Nov 29 2019, 5:54 AM

Matching up with jhibbits comment: when I (cross) buildworld buildkernel for FreeBSD head to build to -r355547 from scratch, targeting 32-bit powerpc, using the gnu linker with the system-clang, the log shows 2016 notices of the structure:

/usr/local/powerpc64-unknown-freebsd13.0/bin/ld: bss-plt forced due to FILENAME

This is from:

/usr/local/powerpc64-unknown-freebsd13.0/bin/ld -v
GNU ld (GNU Binutils) 2.33.1

The build environment has secure-plt enabled by default for targeting 32-bit powerpc, attempting to leave behind bss-plt. But gnu ld is overriding the request for secure-plt.

Revision Contents

Path

Size

lib/

Target/

PowerPC/

PPCISelLowering.cpp

3 lines

test/

CodeGen/

PowerPC/

ppc32-static.ll

13 lines

Diff 119839

lib/Target/PowerPC/PPCISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,713 Lines • ▼ Show 20 Lines	PrepareCall(SelectionDAG &DAG, SDValue &Callee, SDValue &InFlag, SDValue &Chain,
// we're building with the leopard linker or later, which automatically		// we're building with the leopard linker or later, which automatically
// synthesizes these stubs.		// synthesizes these stubs.
const TargetMachine &TM = DAG.getTarget();		const TargetMachine &TM = DAG.getTarget();
const Module *Mod = DAG.getMachineFunction().getFunction()->getParent();		const Module *Mod = DAG.getMachineFunction().getFunction()->getParent();
const GlobalValue *GV = nullptr;		const GlobalValue *GV = nullptr;
if (auto *G = dyn_cast<GlobalAddressSDNode>(Callee))		if (auto *G = dyn_cast<GlobalAddressSDNode>(Callee))
GV = G->getGlobal();		GV = G->getGlobal();
bool Local = TM.shouldAssumeDSOLocal(*Mod, GV);		bool Local = TM.shouldAssumeDSOLocal(*Mod, GV);
bool UsePlt = !Local && Subtarget.isTargetELF() && !isPPC64;		bool UsePlt = !Local && Subtarget.isTargetELF() && !isPPC64 &&
		DAG.getTarget().getRelocationModel() != Reloc::Static;

if (isFunctionGlobalAddress(Callee)) {		if (isFunctionGlobalAddress(Callee)) {
GlobalAddressSDNode *G = cast<GlobalAddressSDNode>(Callee);		GlobalAddressSDNode *G = cast<GlobalAddressSDNode>(Callee);
// A call to a TLS address is actually an indirect call to a		// A call to a TLS address is actually an indirect call to a
// thread-specific pointer.		// thread-specific pointer.
unsigned OpFlags = 0;		unsigned OpFlags = 0;
if (UsePlt)		if (UsePlt)
OpFlags = PPCII::MO_PLT;		OpFlags = PPCII::MO_PLT;
▲ Show 20 Lines • Show All 8,796 Lines • Show Last 20 Lines

test/CodeGen/PowerPC/ppc32-static.ll

				; RUN: llc < %s -mtriple=powerpc-unknown-linux-gnu -relocation-model=static \| FileCheck --match-full-lines %s

				declare i32 @call_foo(i32, ...)

				define i32 @foo() {
				entry:
				%call = call i32 (i32, ...) @call_foo(i32 0, i32 1)
				ret i32 0
				}

				; LABEL:foo:
				; CHECK-NOT: bl call_foo@PLT
				; CHECK: bl call_foo