This is an archive of the discontinued LLVM Phabricator instance.

AsmPrinter: Don't treat symbols with prefix data as code
Changes PlannedPublic

Authored by angerman on Mar 10 2017, 1:00 AM.

Details

Reviewers
pcc
javed.absar
Summary

Doing so may result in these symbols being relocated via means of trampoline,
which breaks references to prefix data.

This can be demonstrated with the following setup:

- libtest.ll -------------------------------------------------------------------
define i32 @hello() prefix i32 42 {
  ret i32 42
}
--------------------------------------------------------------------------------
- main.c -----------------------------------------------------------------------
#include <stdio.h>

int hello(void);

int main() {
  int *prefix_data = (int*) &hello;
  printf("hi: %d\n", prefix_data[-1]);
  return 0;
}
--------------------------------------------------------------------------------
- Makefile --------------------------------------------------------------------
all : libtest.s main

%.s : %.ll
	llc -o $@ $+

libtest.so : libtest.o
	gcc -shared -o $@ $+

main : main.o libtest.so
	gcc -fPIC -L. -ltest -o $@ $+

run : main
	LD_LIBRARY_PATH=. ./main

clean :
	git clean -f
--------------------------------------------------------------------------------
$ make run # should result in “hi: 42”

will result in hi: <random> on ARMv7 and Aarch64.

Event Timeline

angerman created this revision.Mar 10 2017, 1:00 AM
angerman edited the summary of this revision. (Show Details)Mar 10 2017, 1:01 AM

The sample code demonstrating this defect can be found in https://github.com/bgamari/symbol-type-repro
The corresponding bugreport is https://bugs.llvm.org/show_bug.cgi?id=31484. Ben Gamari is to be credited
for the original patch.

pcc edited edge metadata.Mar 10 2017, 6:39 PM

Does this really solve your problem? The linker should only create the trampoline if an object that takes a reference to the function with prefix data was built without -fPIC. If you change the symbol type to object I would expect the linker to create a copy relocation instead. That would cause the dynamic loader to create a copy of the function body in the main executable's address space, and that copy would not be in executable memory so you would not be able to call it.

In fact, if I patch in your change and compile your test case, that is exactly what happens:

$ readelf -r main

Relocation section '.rela.dyn' at offset 0x508 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000601040  000c00000005 R_X86_64_COPY     0000000000601040 hello + 0

Relocation section '.rela.plt' at offset 0x538 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000200000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
000000601020  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0

and if I change your test case like this:

diff --git a/main.c b/main.c
index d54d1c5..accf187 100644
--- a/main.c
+++ b/main.c
@@ -5,5 +5,6 @@ int hello(void);
 int main() {
         int *prefix_data = (int*) &hello;
         printf("hi: %d\n", prefix_data[-1]);
+        hello();
         return 0;
 }

I get:

$ LD_LIBRARY_PATH=. ./main
hi: 0
Segmentation fault (core dumped)

I think the correct fix is for you to arrange to build any objects that may access prefix data with -fPIC (or with the -relocation-model=pic flag to llc). If I change your test case to do that:

diff --git a/Makefile b/Makefile
index 84ccf76..9d22147 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ all : libtest.s main
 libtest.so : libtest.o
        gcc -shared -o $@ $+
 
-main : main.o libtest.so
+main : main.c libtest.so
        gcc -fPIC -L. -ltest -o $@ $+
 
 run : main

the main binary works as intended.

angerman planned changes to this revision.Mar 10 2017, 9:14 PM
In D30812#698479, @pcc wrote:

Does this really solve your problem? The linker should only create the trampoline if an object that takes a reference to the function with prefix data was built without -fPIC. If you change the symbol type to object I would expect the linker to create a copy relocation instead. That would cause the dynamic loader to create a copy of the function body in the main executable's address space, and that copy would not be in executable memory so you would not be able to call it.

In fact, if I patch in your change and compile your test case, that is exactly what happens:

$ readelf -r main

Relocation section '.rela.dyn' at offset 0x508 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000601040  000c00000005 R_X86_64_COPY     0000000000601040 hello + 0

Relocation section '.rela.plt' at offset 0x538 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000200000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
000000601020  000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0

and if I change your test case like this:

diff --git a/main.c b/main.c
index d54d1c5..accf187 100644
--- a/main.c
+++ b/main.c
@@ -5,5 +5,6 @@ int hello(void);
 int main() {
         int *prefix_data = (int*) &hello;
         printf("hi: %d\n", prefix_data[-1]);
+        hello();
         return 0;
 }

I get:

$ LD_LIBRARY_PATH=. ./main
hi: 0
Segmentation fault (core dumped)

I think the correct fix is for you to arrange to build any objects that may access prefix data with -fPIC (or with the -relocation-model=pic flag to llc). If I change your test case to do that:

diff --git a/Makefile b/Makefile
index 84ccf76..9d22147 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ all : libtest.s main
 libtest.so : libtest.o
        gcc -shared -o $@ $+
 
-main : main.o libtest.so
+main : main.c libtest.so
        gcc -fPIC -L. -ltest -o $@ $+
 
 run : main

the main binary works as intended.

Thank you @pcc for the thorough review and the lengthy test and explanation. This
now ends up being a bit more puzzling that with what we started. Especially as to
why the function to object rewrite we did, seemingly used to work. This should
not have worked, as you have demonstrated.

Our test case clearly fell prey to the implicit %.o : %c rule that make provides, and
which does not come with -fPIC.

I'll try to locate the issue we saw again and see where it deviates from the fixed test
case based on your suggestions to see if we still have a genuine issue or if there is/was
a bug in our toolchain use.

Thank you.

So I've dug into this a bit more by implementing ghc internal linker for arm and arm64.

The issue seems to be the following:

Say we have a function symbol F with P of size S. prefix data, Then P would be at F - S. As long as we are referring to P only from within F, all is good,
however if we refer to P from outside of F, by computing F - S to be the start of the prefix data, this succeeds only if F is not relocated via a jump.

If it is however relocated via a jump (and arm, arm64 permit "veneers" for static relocations), F - S now doesn't point to P, but to something unpredictable.

Now rewriting functions with prefix data to objects, forces them to be GOT relocated, which in turn means that the symbols address needs to be loaded, instead of
relying on the target of the address in place to be able to forward jump to the final address. And therefore F - S will still point to P.

pcc added a comment.Apr 6 2017, 12:09 AM

Is there any chance that you can create a standalone reproducer that demonstrates the problem? I don't think I understand why simply referencing a function via a jump would require the linker to create a veneer and use it as the function's address where its address is taken. Although a linker may create a veneer if a direct call is out of range or requires ARM/Thumb interworking, that doesn't mean that the veneer's address needs to be used when the function's address is taken, and indeed the veneer's address shouldn't be observable if all calls to the veneer are direct.

I will try to see if I run into this again. The case I had yesterday ended up being a botched relocation that did an invalid sign extend.

However the description is, what I believe to be the underlying reason for the function/object mangling in ghc. I am uncertain if it's
still valid though. Your description certainly makes sense, and I'm not certain that the described scenario could happen only if the
PLT is used via linking. What you said makes me believe though that if the symbol's address is used for a relative offset that this
would have to go through the GOT in any case.

I'll revise this issue once I'll have a proper test case at hand.