This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Don't apply --localize flags to common symbols
ClosedPublic

Authored by rupprecht on Oct 26 2018, 3:33 PM.

Details

Summary

--localize-symbol and --localize-hidden will currently localize common symbols. GNU objcopy will not localize these symbols even when explicitly requested, which seems reasonable; common symbols should always be global so they can be merged during linking.

See PR39461

Diff Detail

Event Timeline

rupprecht created this revision.Oct 26 2018, 3:33 PM
MaskRay added inline comments.
tools/llvm-objcopy/llvm-objcopy.cpp
259 ↗(On Diff #171365)

GNU objcopy -L does not apply on SHN_UNDEF symbols.

MaskRay added inline comments.Oct 26 2018, 4:33 PM
tools/llvm-objcopy/llvm-objcopy.cpp
259 ↗(On Diff #171365)
if (Sym.getShndx() != SHN_UNDEF && Sym.getShndx() != SHN_COMMON &&
    (Sym.Binding == STB_GLOBAL || Sym.Binding == STB_WEAK) &&
    (is_contained(Config.SymbolsToLocalize, Sym.Name) ||
     (Config.LocalizeHidden &&
      (Sym.Visibility == STV_HIDDEN || Sym.Visibility == STV_INTERNAL))))

May be a better emulation of GNU objcopy's behavior. But I don't know how much compatibility we want to achieve...

This revision is now accepted and ready to land.Oct 26 2018, 6:10 PM

I can certainly understand your reasoning, but this could also adversely impact a desired use-case of hiding a symbol in an object or library from another object, which is presumably the purpose of --localize-hidden, and is what is discussed in https://chromium.googlesource.com/external/dynamorio/+/master/core/CMakeLists.txt#541:

We need to do extra work to hide symbols in the static library build.
First we do a partial link with ld -r, which makes a single libdynamorio.o
object file. Then we use objcopy --localize-hidden to hide all
non-exported symbols.

Let's say that there's a common symbol foo which is defined in test.o, and is added to a static library to be shipped by a vendor. They might want this symbol to not appear in their interface. Using --localize-hidden would be one way of doing this, but only if the existing llvm-objcopy behaviour is maintained.

Basically, I'm on the fence on this, but I wanted to clarify the argument against it from my point of view.

tools/llvm-objcopy/llvm-objcopy.cpp
259 ↗(On Diff #171365)

Reasonably observations these, but I think that they're a separate issue, and should not be done as part of this patch.

I can certainly understand your reasoning, but this could also adversely impact a desired use-case of hiding a symbol in an object or library from another object, which is presumably the purpose of --localize-hidden, and is what is discussed in https://chromium.googlesource.com/external/dynamorio/+/master/core/CMakeLists.txt#541:

We need to do extra work to hide symbols in the static library build.
First we do a partial link with ld -r, which makes a single libdynamorio.o
object file. Then we use objcopy --localize-hidden to hide all
non-exported symbols.

Let's say that there's a common symbol foo which is defined in test.o, and is added to a static library to be shipped by a vendor. They might want this symbol to not appear in their interface. Using --localize-hidden would be one way of doing this, but only if the existing llvm-objcopy behaviour is maintained.

Basically, I'm on the fence on this, but I wanted to clarify the argument against it from my point of view.

I forgot to mention in the patch description (but it is in https://bugs.llvm.org/show_bug.cgi?id=39461): dynamorio is the thing that breaks when switching to llvm-objcopy, so getting it working has been my test case :)

Here's verification w/ a fresh pull of upstream dynamorio:

$ cmake -DBUILD_TESTS=ON -DCMAKE_OBJCOPY=/path/to/llvm-objcopy ../dynamorio
$ make -j && ctest -R 'code_api\|api.static_' -E FLAKY

GNU objcopy (just omit -DCMAKE_OBJCOPY) & llvm-objcopy w/ this patch passes. llvm-objcopy from trunk crashes:

(gdb) r
Starting program: $HOME/src/dynamorio-build/suite/tests/bin/api.static_noinit
pre-DR init

Program received signal SIGSEGV, Segmentation fault.
0x00005555555e2b92 in modules_init () at $HOME/src/dynamorio/core/module_list.c:145
145         VMVECTOR_ALLOC_VECTOR(loaded_module_areas, GLOBAL_DCONTEXT,
(gdb) bt
#0  0x00005555555e2b92 in modules_init () at $HOME/src/dynamorio/core/module_list.c:145
#1  0x000055555558b5b2 in dynamorio_app_init () at $HOME/src/dynamorio/core/dynamo.c:523
#2  0x000055555558c0ad in dr_app_setup () at $HOME/src/dynamorio/core/dynamo.c:2693
#3  0x0000555555584a5f in main (argc=1, argv=0x7fffffffda08) at $HOME/src/dynamorio/suite/tests/api/static_noinit.c:73

I'm not exactly sure where the SIGSEGV is coming from, but loaded_module_areas is a common symbol, and diffing the objects w/ llvm-readobj -symbols shows that common symbols are where this differs compared to when building w/ objcopy.

Huh. I'd be interested to know WHY making a common symbol local is causing that effect, but I'm happy with this going in to fix it. You might want to prod @jakehehrlich though to be sure.

MaskRay accepted this revision.Oct 29 2018, 10:28 AM

I currently view this change as padding the walls for a somewhat improper use of llvm-objcopy. Sort of like how javascript, PHP, or other such languages see a funny operation at runtime and try and pick the most sensible thing to keep your program running when there should probably be an error. In this case I actually view it as worse because this is an operation you might actually want to perform unlike adding 5 and "10 bananas" to get 15 or whatever crazy thing PHP does.

In general I'm fine with sweeping options like --localize-hidden padding the walls like this if it makes sense. I'm unsure one way or the other if it makes sense here however. I'm less ok with --localize-symbol padding the walls like this since the user should be allowed to do what they want when the specifier is so specific. Any use of --localize-symbol that causes an error like this is an error on the part of the user IMO. I'm sympathetic to needing to add this padding anyway because sometimes forking is not an option but the code you're pulling from still has the bug. So you can probably still convince me we should add this, especially to --localize-hidden. Also there isn't really a way to say "localize hidden symbols but ignore common symbols" right now so this fix might actually be the correct thing to do always. I just want to understand the failure mode better before adding what I currently feel is a blind workaround/foam padding for the user.

I'd also feel better if we added a --localize-hidden-common because then we cover the full use cases. Alternatively we can branch from GNU objcopy again and add --localize-hidden-gnu and *also* add a -gnu option that would make all --*-gnu options the semantics when using the non-gnu counter part name. I think we need such an option for --strip-all-gnu to be useful anyway because otherwise you can't really do a proper drop in replacement.

test/tools/llvm-objcopy/localize.test
50

Slight oddity that doesn't really matter. If Index is SHN_COMMON then the Type should always by STT_COMMON. In relocatable files if the type STT_COMMON then the index must be SHN_COMMON. In fully linked executables it needs to be something specific but the type should still be STT_COMMON. You should never see a symbol with index SHN_COMMON without STT_COMMON.

tools/llvm-objcopy/llvm-objcopy.cpp
259 ↗(On Diff #171365)

This isn't the right way to check for common symbols. You should check the type not shndx.

Another thought I had was that we could add an escape hatch for specific symbols the way "--keep" adds an escape hatch for specific sections. It's as if we have a few operations now

  1. strip section
  2. strip symbol
  3. localize symbol
  4. globalize symbol

And we need an escape hatch for all of these when sweeping changes are made. Again this only solves the problem if chromium maintains patches on top of dynamorio or has forked it.

MaskRay added inline comments.Oct 29 2018, 3:01 PM
test/tools/llvm-objcopy/localize.test
50

It does not need to be STT_COMMON. Index: SHN_COMMON suffices.

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b8871f357fdfa9c0c06d2d3e5600391d8c994f37 added some STT_COMMON support in binutils but the intention is not clear to me.

By default .common sym, size, align emits STT_BOJECT, --elf-stt-common= can toggle the behavior:

% as a.s --elf-stt-common=no && readelf -s a.out | grep COM

4: 0000000000000004     4 OBJECT  GLOBAL DEFAULT  COM sym

% as a.s --elf-stt-common=yes && readelf -s a.out | grep COMMON

4: 0000000000000004     4 COMMON  GLOBAL DEFAULT  COM sym
MaskRay added a comment.EditedOct 29 2018, 3:13 PM

I'd also feel better if we added a --localize-hidden-common because then we cover the full use cases. Alternatively we can branch from GNU objcopy again and add --localize-hidden-gnu and *also* add a -gnu option that would make all --*-gnu options the semantics when using the non-gnu counter part name. I think we need such an option for --strip-all-gnu to be useful anyway because otherwise you can't really do a proper drop in replacement.

Totally agree.

For my comment, if (Sym.getShndx() != SHN_UNDEF && Sym.getShndx() != SHN_COMMON && ... I was just digging into GNU objcopy internals and did not intend to suggest any one direction or the other. It served as a reference what we can further do if we ever want to keep more compatibility. I feel the ELF editing interfaces of objcopy is rather bad. I cannot what an option does unless I read the source code. (bfd/* binutils/objcopy.c). I favor your preference that these options should be simple and for blanket operations the user intends to do (e.g. --localize-hidden), it is reasonable to not specialize case COMMON as this is user fault.

I am still unclear what the good command line ELF editing should do but providing a separate --*-gnu interface seems reasonable.

rupprecht updated this revision to Diff 171590.Oct 29 2018, 3:20 PM
  • Also check STT_COMMON
rupprecht added inline comments.Oct 29 2018, 3:21 PM
test/tools/llvm-objcopy/localize.test
50

It doesn't seem to be required that STT_COMMON is used for common symbols:

$ echo 'int foo;' | bin/clang -xc - -c -o /tmp/common.o; bin/llvm-readobj -symbols /tmp/common.o | grep -B1 -A7 foo
  Symbol {
    Name: foo (16)
    Value: 0x4
    Size: 4
    Binding: Global (0x1)
    Type: Object (0x1)
    Other: 0
    Section: Common (0xFFF2)
  }

However, there seems to be a --elf-stt-common=yes flag that some GNU tools (including objcopy) support:

$ echo 'int foo;' | bin/clang -xc - -c -o /tmp/common.o; objcopy --elf-stt-common=yes /tmp/common.o; bin/llvm-readobj -symbols /tmp/common.o | grep -B1 -A7 foo
  Symbol {
    Name: foo (3)
    Value: 0x4
    Size: 4
    Binding: Global (0x1)
    Type: Common (0x5)
    Other: 0
    Section: Common (0xFFF2)
  }

It looks like STT_COMMON was introduced later, and so it might be the direction that we want to eventually be in, but at least for now tools should be able to handle common objects that aren't explicitly STT_COMMON. Checking both type and section index seems to be standard practice, so I'll do that. (e.g. https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Object/ELFTypes.h#L219)

Ah fair. That looks like many cases we've unfortunately seen before (see
SHF_LINK and SHF_COMPRESSED) that are not always respected (and thus we
must support the old case as well). So disregard my comment about using the
type. That's what I get for blindly reading the spec.

l'm still waiting to hear back from Roland about the sensibilities of this
but what's sensible probably doesn't matter the more I think about it. I'll
give a +2 later tomorrow if I can't think of a better way to resolve this
in a practical manner. As of yet this seems like the best option to me.

On STT_COMMON, it looks like it only matters in fully linked files, since STT_COMMON symbols must always be in SHN_COMMON in relocatable objects. However, STT_COMMON can affect dynamic link-time semantics too, so I guess we may have to check for both. You should check GNU objcopy's behaviour in this case too (i.e. does an STT_COMMON symbol in a fully-linked ELF get affected or not by this option - I assume it is affected), and test accordingly. At the moment, the tests only test for the relocatable output case (i.e. where STT_COMMON is in SHN_COMMON), and we need cases for STT_COMMON not in SHN_COMMON.

jhenderson requested changes to this revision.Oct 30 2018, 3:38 AM

Just doing this to ask for the test update I mentioned.

This revision now requires changes to proceed.Oct 30 2018, 3:38 AM
rupprecht updated this revision to Diff 171820.Oct 30 2018, 3:13 PM
  • Update test cases with common symbols not in SHN_COMMON
  • Rebase after refactoring
jakehehrlich accepted this revision.Oct 30 2018, 3:18 PM

I spoke with Roland. Roland basically agrees that there's no good reason for this to function this way (also pointed out that SHN_COMMON is a legacy thing)...but he also agrees that none of that matters because we need to support this case. So LGTM pending anyone else's code/test specific issues.

On STT_COMMON, it looks like it only matters in fully linked files, since STT_COMMON symbols must always be in SHN_COMMON in relocatable objects. However, STT_COMMON can affect dynamic link-time semantics too, so I guess we may have to check for both. You should check GNU objcopy's behaviour in this case too (i.e. does an STT_COMMON symbol in a fully-linked ELF get affected or not by this option - I assume it is affected), and test accordingly.

GNU objcopy seems to only ignore --localize options for symbols where the index is SHN_COMMON, but not when it is STT_COMMON in some other section.
Based on an earlier comment, I changed this to respect index == SHN_COMMON and type == STT_COMMON equally, i.e. --localize-* will not be applied in either of those cases. So this would be breaking compatibility. I don't have a preference one way or the other; I'm happy to change it back if someone feels strongly.

At the moment, the tests only test for the relocatable output case (i.e. where STT_COMMON is in SHN_COMMON), and we need cases for STT_COMMON not in SHN_COMMON.

I wasn't able to come up with a real-life repro for this, but I changed the test case to EM_EXEC and changed the section.
readobj still prints "Common" for the section name, but you can see from the flag that it is not in the special SHN_COMMON section -- i.e. SHN_COMMON is 0xFFF2, but the STT_COMMON symbol here is in 0x2. But I'm not sure if that specific section number is reliable (based on buildbot configuration etc.), so I just checked that the output section is not 0xF* for that.

If you're able to come up with a real-life way to get STT_COMMON in a fully linked executable, I can adapt this test case to look like it. I tried variants of things like: echo 'int foo; int main() { return 0; }' | gcc -xc - -o /tmp/common-linked -Wa,--elf-stt-common=yes -Wl,-z -Wl,common && bin/llvm-readobj -symbols /tmp/common-linked | grep -B1 -A7 foo, which still converts it to STT_OBJECT when putting it in .bss.

jhenderson accepted this revision.Oct 31 2018, 9:22 AM

LGTM.

I haven't even been able to get the assembler to emit a STT_COMMON symbol. What does the assembly and command-line look like?

This revision is now accepted and ready to land.Oct 31 2018, 9:22 AM

Let's not break compaitableity then and just check the index. The fewer
cases this happens in the better IMO. I guess basically what you had at the
start was actually the best.

  • Localize common symbols only if they are actually in SHN_COMMON

LGTM.

I haven't even been able to get the assembler to emit a STT_COMMON symbol. What does the assembly and command-line look like?

You need to pass in --elf-stt-common=yes, and AFAICT also need to be using GNU as (not sure if clang can produce STT_COMMON like this).

LGTM.

I haven't even been able to get the assembler to emit a STT_COMMON symbol. What does the assembly and command-line look like?

You need to pass in --elf-stt-common=yes, and AFAICT also need to be using GNU as (not sure if clang can produce STT_COMMON like this).

I tried, and I still ended up with an STT_OBJECT...! But it doesn't matter at this point.

test/tools/llvm-objcopy/localize-hidden.test
9

You should change this back if we are reverting to the previous version.

126

I'm not sure that STT_COMMON symbols should be tested if we're putting them in SHN_COMMON (they're no different from STT_OBJECT etc), as there shouldn't be a SHN_COMMON in a linked executable as far as I understand it.

rupprecht updated this revision to Diff 172129.Nov 1 2018, 8:31 AM
  • Revert ET_EXEC/STT_COMMON from tests as we aren't doing that anymore
This revision was automatically updated to reflect the committed changes.