This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Implement -G/--keep-global-symbol(s).
ClosedPublic

Authored by rupprecht on Aug 10 2018, 4:47 PM.

Details

Summary

Port GNU Objcopy -G/--keep-global-symbol(s).

This is slightly different than the already-implemented --globalize-symbol, which marks a symbol as global when copying. When --keep-global-symbol (alias -G) is used, *only* those symbols marked will stay global, and all other globals are demoted to local. (Also note that it doesn't *promote* a symbol to global). Additionally, there is a pluralized version of the flag --keep-global-symbols, which effectively applies --keep-global-symbol for every non-comment in a file.

Diff Detail

Repository
rL LLVM

Event Timeline

rupprecht created this revision.Aug 10 2018, 4:47 PM

Sorry for the slew of reviews. I have another one for implementing "llvm-strip -p" almost ready to go too (about the same size as this), hopefully I'll be able to send it next week.

Let me know I can help out to make up for it, e.g. if you want me to handle doing some reviews here as a first pass before you approve, or do some cleanups you've been meaning to get around to but haven't had time because of all the patches I'm sending :)

What happens if you specify a mixture of --globalize-symbol and --keep-global-symbol?

test/tools/llvm-objcopy/keep-global-symbols.test
3 ↗(On Diff #160208)

Unfinished sentence.

5 ↗(On Diff #160208)

Nit: -G -> -G/--keep-global-symbol

10 ↗(On Diff #160208)

It may be a bit clearer to do:

# RUN: echo " Global3 " > %t-globals2.txt
# RUN: echo "Global4 # Global5" >> %t-globals2.txt
# RUN: echo "Unknown" >> %t-globals2.txt
36 ↗(On Diff #160208)

What happens if you specify -G Weak? Does it become Global? Ditto -G Local?

tools/llvm-objcopy/ObjcopyOpts.td
91 ↗(On Diff #160208)

What happens in GNU objcopy if you have a file containing a line like "Global1 Global2"? Does it a) keep both, b) keep neither, c) keep Global1, or d) keep a symbol called "Global1 Global2"?

tools/llvm-objcopy/llvm-objcopy.cpp
700 ↗(On Diff #160208)

Use lower case for first letter of function name.

708 ↗(On Diff #160208)

I'd prefer no auto here, because it confused me on first read. Also, the const I think is unnecessary, since this is a StringRef.

710 ↗(On Diff #160208)

Nit: missing trailing full stop.

rupprecht marked 5 inline comments as done.Aug 13 2018, 2:18 PM

What happens if you specify a mixture of --globalize-symbol and --keep-global-symbol?

The symbol will be a global in the output file.
That's how it accidentally was in the way I wrote it, but I added a test case to make sure it stays that way and left some more comments.

test/tools/llvm-objcopy/keep-global-symbols.test
36 ↗(On Diff #160208)

Added both as test cases. -G Local doesn't do anything, -G Weak prevents it from being demoted to local, and it remains a weak symbol.

The --keep-global-symbol seems subtly different from --globalize-symbol, i.e. it seems to work in the other direction. --globalize-symbol will promote a local symbol to global, whereas specifying a single --keep-global-symbol implies you want to demote all symbols to local, except for any symbols added with --keep-global-symbol.

tools/llvm-objcopy/ObjcopyOpts.td
91 ↗(On Diff #160208)

It prints this amusing message and then keeps Global1:
objcopy: <path>/keep-global-symbols.test.tmp-globals2.txt:3: Ignoring rubbish found on this line.

See this part of the gnu file parser: https://github.com/redox-os/binutils-gdb/blob/a9d73bb6dacfbbdc826fea178e3a47bf7a4154b1/binutils/objcopy.c#L1076

The way I have it written, this will break compatibility -- it will trim off comments and whitespace before/after the symbol name, but not between, so it will keep a symbol called "Global1 Global2". But, I think this is a case that will never come up; anyone relying on the parsing to work like this is probably doing it wrong anyway. So, I'm inclined to leave it as written just because it's simpler. WDYT?

tools/llvm-objcopy/llvm-objcopy.cpp
700 ↗(On Diff #160208)

Done -- although this is just copying the pattern from all the other static methods here which are written with an upper case first letter.
I'll go fire off an NFC to fix the names in this file.

rupprecht updated this revision to Diff 160447.Aug 13 2018, 2:18 PM
  • Misc updates to tests/comments

What happens if you specify a mixture of --globalize-symbol and --keep-global-symbol?

The symbol will be a global in the output file.
That's how it accidentally was in the way I wrote it, but I added a test case to make sure it stays that way and left some more comments.

I think I wasn't explicit enough with my wording to explain what I was thinking: what happens if you specify --globalize-symbol a --keep-global-symbol b? It looks like the code will turn a into STB_LOCAL, and b will remain as it currently is (similar points about --localize|weaken-symbol also apply). Does this match GNU's behaviour? I'd expect a to be STB_GLOBAL and b to remain unchanged.

test/tools/llvm-objcopy/keep-global-symbols.test
8 ↗(On Diff #160447)

You should probably expand this comment for the new local and weak symbols, as well as Global6.

tools/llvm-objcopy/ObjcopyOpts.td
82 ↗(On Diff #160447)

Since this doesn't make things STB_GLOBAL, and makes things STB_WEAK remain as weak, I'd actually rephrase the help text to be something along the lines of "Convert all symbols except <symbol> to STB_LOCAL".

91 ↗(On Diff #160208)

I think it's reasonable for two reasons:

  1. It is possible using asm at least to create symbol names with whitespace e.g. the following will create a global symbol called "a b":
.global "a b"
"a b":
	nop

In fact, you can even get " a b " as a valid symbol name like that. Maybe we should just not trim whitespace, although I'm not confident about that.

We should probably test that this is kept in fact by our new behaviour, since by my understanding of what you are saying, it is impossible to specify such a name in the keep-global-symbols file.

  1. Users who have been seeing and ignoring the warning deserve what's coming to them...
tools/llvm-objcopy/llvm-objcopy.cpp
384 ↗(On Diff #160447)

Like the help text, the second bit here should probably be updated to explain what happens to symbols that aren't kept.

700 ↗(On Diff #160208)

Yeah, this was a mistake early on in development, and we're trying to fix it. Thanks for the NFC commit in advance.

rupprecht updated this revision to Diff 160699.Aug 14 2018, 3:03 PM
rupprecht marked 8 inline comments as done.
  • Handle --globalize --keep-global correctly, and clean up comments

What happens if you specify a mixture of --globalize-symbol and --keep-global-symbol?

The symbol will be a global in the output file.
That's how it accidentally was in the way I wrote it, but I added a test case to make sure it stays that way and left some more comments.

I think I wasn't explicit enough with my wording to explain what I was thinking: what happens if you specify --globalize-symbol a --keep-global-symbol b? It looks like the code will turn a into STB_LOCAL, and b will remain as it currently is (similar points about --localize|weaken-symbol also apply). Does this match GNU's behaviour? I'd expect a to be STB_GLOBAL and b to remain unchanged.

Ah, I see. I took a closer look at GNU and was very surprised at the results.
I basically ran --globalize-symbol a --keep-global-symbol b for a/b being local, weak, and global. (I added this as a separate test case).
For GNU, b ended up global in the output file. But a was kept local unless it was originally local; when a was local in the input, it ended up being global in the output.

For reference, the GNU business logic is here: https://github.com/redox-os/binutils-gdb/blob/a9d73bb6dacfbbdc826fea178e3a47bf7a4154b1/binutils/objcopy.c#L1552
It seems that:

  • --globalize intentionally only applies to local symbols, so we're already broken w.r.t gnu compatability. (I don't know what the history of that is in llvm-objcopy -- I can fix that if you want). So running GNU objcopy --globalize Weak1 --keep-global-symbol Weak2 will result in Weak1 being local.
  • --globalize omits already global symbols (i.e. (flags & BSF_LOCAL)), but it only checks on the *original* binding type (i.e. it does not look at sym->flags). So --globalize G1 --keep-global-symbol G2 means that G1 will go from global->local (because it isn't included in --keep-global-symbol), but won't go back from local->global because the original binding was already global.

I'm not sure if the weak symbol binding is an issue; I can fix that in its own commit if you want. For the global symbol binding issue, IMHO, that's a bug, and we should do the functionality you suggested: --globalize foo should keep foo global, whether or not --keep-global-symbol covers it. OTOH, it's also an edge case that probably never comes up.

tools/llvm-objcopy/ObjcopyOpts.td
91 ↗(On Diff #160208)

I think trimming whitespace from the end makes sense, e.g. to allow things like:

SomeSymbol # We want this global because X

Anyway, I added a test case that asserts " a b " in the file will refer to symbol "a b" in the object.

jhenderson accepted this revision.Aug 15 2018, 2:48 AM

I basically ran --globalize-symbol a --keep-global-symbol b for a/b being local, weak, and global. (I added this as a separate test case).
For GNU, b ended up global in the output file. But a was kept local unless it was originally local; when a was local in the input, it ended up being global in the output.

For reference, the GNU business logic is here: https://github.com/redox-os/binutils-gdb/blob/a9d73bb6dacfbbdc826fea178e3a47bf7a4154b1/binutils/objcopy.c#L1552
It seems that:

  • --globalize intentionally only applies to local symbols, so we're already broken w.r.t gnu compatability. (I don't know what the history of that is in llvm-objcopy -- I can fix that if you want). So running GNU objcopy --globalize Weak1 --keep-global-symbol Weak2 will result in Weak1 being local.
  • --globalize omits already global symbols (i.e. (flags & BSF_LOCAL)), but it only checks on the *original* binding type (i.e. it does not look at sym->flags). So --globalize G1 --keep-global-symbol G2 means that G1 will go from global->local (because it isn't included in --keep-global-symbol), but won't go back from local->global because the original binding was already global.

I'm not sure if the weak symbol binding is an issue; I can fix that in its own commit if you want. For the global symbol binding issue, IMHO, that's a bug, and we should do the functionality you suggested: --globalize foo should keep foo global, whether or not --keep-global-symbol covers it. OTOH, it's also an edge case that probably never comes up.

I can't remember whether we deliberately decided to deviate from GNU on the --globalize behaviour. To me, our behaviour makes sense: if a user deliberately requests to globalize a weak symbol, why shouldn't we? I could even see a use-case for it: a member, in an archive, with a weak symbol definition, won't normally be used, unless other symbols from the archive are needed, so --globalize could force its inclusion. I'd go back and dig out the change that introduced the behaviour and check if it was discussed previously, and if not, file a bug or send an email on the mailing list to discuss it further.

Overall, our goal has been to match GNU's behaviour except for where it doesn't make much sense. I'd say the second point is definitely dubious behaviour in GNU objcopy, but I'd like @jakehehrlich to confirm that he's happy with the deviation here. Otherwise, LGTM. Feel free to decide what you want to do about the CHECK patterns I mentioned inline, without needing further input from me.

test/tools/llvm-objcopy/keep-global-symbols-mix-globalize.test
43 ↗(On Diff #160699)

I see what you're doing with this CHECK pattern, but I think it will false positive if we for some reason create additional symbols, although I guess I could be persuaded that this is not an issue to worry about.

I think you should probably also check the total symbol count. With that, I think this pattern becomes valid.

test/tools/llvm-objcopy/keep-global-symbols.test
80 ↗(On Diff #160699)

See my earlier comment about checking the number of symbols.

This revision is now accepted and ready to land.Aug 15 2018, 2:48 AM
rupprecht marked 6 inline comments as done.Aug 15 2018, 10:53 AM

I basically ran --globalize-symbol a --keep-global-symbol b for a/b being local, weak, and global. (I added this as a separate test case).
For GNU, b ended up global in the output file. But a was kept local unless it was originally local; when a was local in the input, it ended up being global in the output.

For reference, the GNU business logic is here: https://github.com/redox-os/binutils-gdb/blob/a9d73bb6dacfbbdc826fea178e3a47bf7a4154b1/binutils/objcopy.c#L1552
It seems that:

  • --globalize intentionally only applies to local symbols, so we're already broken w.r.t gnu compatability. (I don't know what the history of that is in llvm-objcopy -- I can fix that if you want). So running GNU objcopy --globalize Weak1 --keep-global-symbol Weak2 will result in Weak1 being local.
  • --globalize omits already global symbols (i.e. (flags & BSF_LOCAL)), but it only checks on the *original* binding type (i.e. it does not look at sym->flags). So --globalize G1 --keep-global-symbol G2 means that G1 will go from global->local (because it isn't included in --keep-global-symbol), but won't go back from local->global because the original binding was already global.

I'm not sure if the weak symbol binding is an issue; I can fix that in its own commit if you want. For the global symbol binding issue, IMHO, that's a bug, and we should do the functionality you suggested: --globalize foo should keep foo global, whether or not --keep-global-symbol covers it. OTOH, it's also an edge case that probably never comes up.

I can't remember whether we deliberately decided to deviate from GNU on the --globalize behaviour. To me, our behaviour makes sense: if a user deliberately requests to globalize a weak symbol, why shouldn't we? I could even see a use-case for it: a member, in an archive, with a weak symbol definition, won't normally be used, unless other symbols from the archive are needed, so --globalize could force its inclusion. I'd go back and dig out the change that introduced the behaviour and check if it was discussed previously, and if not, file a bug or send an email on the mailing list to discuss it further.

Looks like it was added in D46177 without any discussion on GNU compatibility, so it was probably just overlooked (although there is a weak symbol in a test case). I'll start a thread.

Overall, our goal has been to match GNU's behaviour except for where it doesn't make much sense. I'd say the second point is definitely dubious behaviour in GNU objcopy, but I'd like @jakehehrlich to confirm that he's happy with the deviation here. Otherwise, LGTM. Feel free to decide what you want to do about the CHECK patterns I mentioned inline, without needing further input from me.

test/tools/llvm-objcopy/keep-global-symbols-mix-globalize.test
43 ↗(On Diff #160699)

I was actually able to get something much more terse/easier to check via FileCheck (and also has the symbol count) with -elf-output-style=GNU

rupprecht marked an inline comment as done.
  • Make test case simpler by using -elf-output-style=GNU

This LGTM. But maybe worth giving it a day or two to see if anybody responds to the email thread. Or you could commit it as is, and then fix things if people complain in the thread, I don't mind.

test/tools/llvm-objcopy/keep-global-symbols-mix-globalize.test
43 ↗(On Diff #160699)

Nice, I like it!

jhenderson accepted this revision.Aug 16 2018, 3:18 AM
This revision was automatically updated to reflect the committed changes.