This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Allow the visibility of symbols created by --binary and --add-symbol to be specified with --new-symbol-visibility
ClosedPublic

Authored by chrisjackson on Aug 7 2019, 10:25 AM.

Details

Summary

llvm-objcopy creates three symbols whenever it is used for binary input generating ELF output, representing the start address, end address and size of the data in the new .data section. These symbols all have STV_DEFAULT visibility. This diff allows the visibility to be specified with --new-symbol-visibility.

Diff Detail

Event Timeline

chrisjackson created this revision.Aug 7 2019, 10:25 AM

@jakehehrlich/@rupprecht, I looked at this offline with @chrisjackson, and it's something we would find useful. In particular, we're interested in being able to make the new symbols STV_HIDDEN, so that they don't get involved in symbol pre-emption (although I could imagine some users wanting STV_PROTECTED semantics too). Without this patch, there's no way of avoiding this as far as I know.

Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: __attribute__((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: __attribute__((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

Three symbols per embedding in an object results in bloated output. Preventing this with many attribute statements seems like a clumsy approach. With regards to the confusing option, we can just decide on a better name.

Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: __attribute__((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

Three symbols per embedding in an object results in bloated output.

I don't understand this point. How can the visibility change de-bloat the output?

Preventing this with many attribute statements seems like a clumsy approach.

I don't understand this, either. Your are going to reference _binary_a_start, (_binary_a_end or _binary_a_size) - two undefined symbols. Restrict the undefined symbol with a visibility attribute, then the linked exe/dso will change the symbol bindings accordingly. What's the problem?

With regards to the confusing option, we can just decide on a better name.

We can do that once we decide this option is useful.

Can you explain more about the use case? Not sure I understand the symbol preemption scenario.

Also, if we do want this option, it may be useful as a more generic option, e.g. --redefine-visibility=<symbol1>=<vis1>,<symbol2>=<vis2>,...

Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: __attribute__((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

Three symbols per embedding in an object results in bloated output.

I don't understand this point. How can the visibility change de-bloat the output?

Preventing this with many attribute statements seems like a clumsy approach.

I don't understand this, either. Your are going to reference _binary_a_start, (_binary_a_end or _binary_a_size) - two undefined symbols. Restrict the undefined symbol with a visibility attribute, then the linked exe/dso will change the symbol bindings accordingly. What's the problem?

With regards to the confusing option, we can just decide on a better name.

We can do that once we decide this option is useful.

Apologies, I misunderstood and phrased that poorly. Trying again, the desire for the option is because:

  • For our use cases it is more convenient to change the symbol's visibility with an option than to have users alter their source code.
  • As --add-symbol allows visibility to be specified, it seems reasonable to also allow the visibility of the generated symbols to be controllable too.
  • For our use cases it is more convenient to change the symbol's visibility with an option than to have users alter their source code.

I accepted D65891 because:

  • It deals with an existing option: --add-symbol. protected is a natural extension to the existing default/hidden visibility. Nobody will get puzzled by this addition.
  • According to Changes to symbol visibility between RVCT v3.1 for µVision and RVCT v4.0 for µVision you linked, there are some use cases. I guess it is used for preventing inadvertent preemption in static linking.

The first bullet is sufficient for me to accept it, but I hope you can elaborate more on the second bullet. How do you use STV_PROTECTED in object files and shared objects? What semantics do you expected when linking STV_DEFAULT/STV_PROTECTED undefined/defined together?

For our use cases it is more convenient to change the symbol's visibility with an option than to have users alter their source code.

--binary-symbol-visibility can cause confusion. As suggested by rupprecht, it should be generalized to --redefine-visibility= if we do want the option.

  • For our use cases it is more convenient to change the symbol's visibility with an option than to have users alter their source code.

I accepted D65891 because:

  • It deals with an existing option: --add-symbol. protected is a natural extension to the existing default/hidden visibility. Nobody will get puzzled by this addition.
  • According to Changes to symbol visibility between RVCT v3.1 for µVision and RVCT v4.0 for µVision you linked, there are some use cases. I guess it is used for preventing inadvertent preemption in static linking.

The first bullet is sufficient for me to accept it, but I hope you can elaborate more on the second bullet. How do you use STV_PROTECTED in object files and shared objects? What semantics do you expected when linking STV_DEFAULT/STV_PROTECTED undefined/defined together?

For our use cases it is more convenient to change the symbol's visibility with an option than to have users alter their source code.

--binary-symbol-visibility can cause confusion. As suggested by rupprecht, it should be generalized to --redefine-visibility= if we do want the option.

I agree that the generalized version is useful and I have been looking at a patch for this. However, in this specific case the symbol names are automatically generated based on the path of the binary, so it makes it difficult to specify these specific symbol names on the command line.

With regards to the expected semantics, we anticipate standard ELF semantics when combining different visibilities. Our toolchain operates using a default visibility of STV_HIDDEN for definitions unless explicitly exported, in which case they are STV_PROTECTED. This means that declarations need not be explicitly marked with a visibility in order to be exported or not. The new switch allows the replication of the compiler's -fvisibility=hidden switch in llvm-objcopy. As an alternative to the name --binary-symbol-visibility, how about --symbol-visibility? This would affect the visibility of all the new symbols generated by llvm-objcopy, not just those symbols added for binary input.

I agree that the generalized version is useful and I have been looking at a patch for this. However, in this specific case the symbol names are automatically generated based on the path of the binary, so it makes it difficult to specify these specific symbol names on the command line.

The symbol names are generated but that is unavoidable. You also have to specify the name in the source: extern const char _binary_a_start[];

With regards to the expected semantics, we anticipate standard ELF semantics when combining different visibilities. Our toolchain operates using a default visibility of STV_HIDDEN for definitions unless explicitly exported, in which case they are STV_PROTECTED. This means that declarations need not be explicitly marked with a visibility in order to be exported or not. The new switch allows the replication of the compiler's -fvisibility=hidden switch in llvm-objcopy. As an alternative to the name --binary-symbol-visibility, how about --symbol-visibility? This would affect the visibility of all the new symbols generated by llvm-objcopy, not just those symbols added for binary input.

There are no other options that add new symbols. --add-symbol has its own visibility flags, --symbol-visibility will not help that feature. --new-symbol-visibility might be a better name but people need to be taught it will be overridden by --add-symbol visibility flags. I am still not sure the little amount of saved labor (__attribute__((visibility("protected"))) marks) justifies a new option, given that Keil embedded development tools already require other markers like __declspec(dllexport) and __declspec(dllimport). And another question: how did you survive with GNU objcopy? It doesn't allow you to change the visibility of _binary_xxx_start...

In D65893#1620711, @MaskRay wrote:
Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: attribute((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

I agree that the generalized version is useful and I have been looking at a patch for this. However, in this specific case the symbol names are automatically generated based on the path of the binary, so it makes it difficult to specify these specific symbol names on the command line.

The symbol names are generated but that is unavoidable. You also have to specify the name in the source: extern const char _binary_a_start[];

With regards to the expected semantics, we anticipate standard ELF semantics when combining different visibilities. Our toolchain operates using a default visibility of STV_HIDDEN for definitions unless explicitly exported, in which case they are STV_PROTECTED. This means that declarations need not be explicitly marked with a visibility in order to be exported or not. The new switch allows the replication of the compiler's -fvisibility=hidden switch in llvm-objcopy. As an alternative to the name --binary-symbol-visibility, how about --symbol-visibility? This would affect the visibility of all the new symbols generated by llvm-objcopy, not just those symbols added for binary input.

There are no other options that add new symbols. --add-symbol has its own visibility flags, --symbol-visibility will not help that feature. --new-symbol-visibility might be a better name but people need to be taught it will be overridden by --add-symbol visibility flags. I am still not sure the little amount of saved labor (__attribute__((visibility("protected"))) marks) justifies a new option, given that Keil embedded development tools already require other markers like __declspec(dllexport) and __declspec(dllimport). And another question: how did you survive with GNU objcopy? It doesn't allow you to change the visibility of _binary_xxx_start...

In D65893#1620711, @MaskRay wrote:
Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: attribute((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

--add-symbol has its own visibility flags, true, but it is not a requirement to specify those flags. The normal output is STV_DEFAULT, whether or not the "default" flag is specified. Thus, by default newly added symbols are exported when using ELF visibility rules. We are in the process of transitioning to the ELF visibility scheme away from a different private implementation. The old implementation required the use of __declspec(dllexport) to export. This meant that exporting was an opt-in process on the part of the exporter, whereas the normal ELF process is an opt-out process on the part of the exporter. Since exporting was always opt-in, GNU objcopy's behaviour (and up to this point of transition llvm-objcopy's too) resulted in the new symbols not being exported.

Note that the producers of an object may not always be the same as the consumers of that object, so the behaviour really needs to be on the producer's end. Otherwise an updated build of the object will require a change in the consumer's source code - or their linked output will contain exported symbols without them expecting it.

I've taken a brief look at the Kiel/μVision docs and I see that the behaviour they have is somewhat similar to what we require, namely that symbols are STV_HIDDEN by default, so it looks like we aren't the only people for whom this behaviour would be useful.

I agree that the generalized version is useful and I have been looking at a patch for this. However, in this specific case the symbol names are automatically generated based on the path of the binary, so it makes it difficult to specify these specific symbol names on the command line.

The symbol names are generated but that is unavoidable. You also have to specify the name in the source: extern const char _binary_a_start[];

With regards to the expected semantics, we anticipate standard ELF semantics when combining different visibilities. Our toolchain operates using a default visibility of STV_HIDDEN for definitions unless explicitly exported, in which case they are STV_PROTECTED. This means that declarations need not be explicitly marked with a visibility in order to be exported or not. The new switch allows the replication of the compiler's -fvisibility=hidden switch in llvm-objcopy. As an alternative to the name --binary-symbol-visibility, how about --symbol-visibility? This would affect the visibility of all the new symbols generated by llvm-objcopy, not just those symbols added for binary input.

There are no other options that add new symbols. --add-symbol has its own visibility flags, --symbol-visibility will not help that feature. --new-symbol-visibility might be a better name but people need to be taught it will be overridden by --add-symbol visibility flags. I am still not sure the little amount of saved labor (__attribute__((visibility("protected"))) marks) justifies a new option, given that Keil embedded development tools already require other markers like __declspec(dllexport) and __declspec(dllimport). And another question: how did you survive with GNU objcopy? It doesn't allow you to change the visibility of _binary_xxx_start...

In D65893#1620711, @MaskRay wrote:
Quote gABI:

Second, if any reference to or definition of a name is a symbol with a non-default visibility attribute, the visibility attribute must be propagated to the resolving symbol in the linked object. If different visibility attributes are specified for distinct references to or definitions of a symbol, the most constraining visibility attribute must be propagated to the resolving symbol in the linked object. The attributes, ordered from least to most constraining, are: STV_PROTECTED, STV_HIDDEN and STV_INTERNAL.

You can achieve the same with: attribute((visibility("hidden"))) extern char _binary_a_start[]; So I'm not sure why another option is needed (I think my another concern is that the option name --binary-symbol-visibility can make people puzzled if they add/delete/change symbols). See D55682.

--add-symbol has its own visibility flags, true, but it is not a requirement to specify those flags. The normal output is STV_DEFAULT, whether or not the "default" flag is specified. Thus, by default newly added symbols are exported when using ELF visibility rules. We are in the process of transitioning to the ELF visibility scheme away from a different private implementation. The old implementation required the use of __declspec(dllexport) to export. This meant that exporting was an opt-in process on the part of the exporter, whereas the normal ELF process is an opt-out process on the part of the exporter. Since exporting was always opt-in, GNU objcopy's behaviour (and up to this point of transition llvm-objcopy's too) resulted in the new symbols not being exported.

Note that the producers of an object may not always be the same as the consumers of that object, so the behaviour really needs to be on the producer's end. Otherwise an updated build of the object will require a change in the consumer's source code - or their linked output will contain exported symbols without them expecting it.

I've taken a brief look at the Kiel/μVision docs and I see that the behaviour they have is somewhat similar to what we require, namely that symbols are STV_HIDDEN by default, so it looks like we aren't the only people for whom this behaviour would be useful.

Since exporting was always opt-in, GNU objcopy's behaviour (and up to this point of transition llvm-objcopy's too) resulted in the new symbols not being exported.

Is your GNU objcopy patched to use STV_HIDDEN or STV_PROTECTED for _binary_a_{start,end,size}?

objcopy -I binary -O elf64-x86-64 a b => Symbols in b are STV_HIDDEN/STV_PROTECTED?

Is your GNU objcopy patched to use STV_HIDDEN or STV_PROTECTED for _binary_a_{start,end,size}?

objcopy -I binary -O elf64-x86-64 a b => Symbols in b are STV_HIDDEN/STV_PROTECTED?

No our GNU objcopy is not patched to change the visibility. This was not necessary, because, the GNU objcopy tool is only used with our old export scheme as described above.

We are in the process of transitioning to the ELF visibility scheme away from a different private implementation. The old implementation required the use of __declspec(dllexport) to export. This meant that exporting was an opt-in process on the part of the exporter, whereas the normal ELF process is an opt-out process on the part of the exporter.

Since exporting was always opt-in, GNU objcopy's behaviour (and up to this point of transition llvm-objcopy's too) resulted in the new symbols not being exported.

I'm rather confused. If your GNU objcopy isn't pached, how can it result in the new symbols not being exported? It doesn't have a flag to control the visibility of _binary_a_start.

Note that the producers of an object may not always be the same as the consumers of that object, so the behaviour really needs to be on the producer's end.

I don't follow the causality here. The visibility is the most restricted one among the producer (definition) and consumers (references).

Actually I think the undefined reference (extern const char _binary_a_start[];) should really have a visibility attribute to annotate that the linked component should have the reference resolved within the same component (executable or DSO). It cannot be left undefined but interposed at runtime by another DSO.

Otherwise an updated build of the object will require a change in the consumer's source code - or their linked output will contain exported symbols without them expecting it.

If you migrate to something like -fvisibility=hidden, you'll need visibility attributes to mark exported functions/objects.

I've taken a brief look at the Kiel/μVision docs and I see that the behaviour they have is somewhat similar to what we require, namely that symbols are STV_HIDDEN by default, so it looks like we aren't the only people for whom this behaviour would be useful.

The link you gave to me is a Kiel/μVision link, so I thought you were a Kiel/μVision developer/user. Then you said you are not those people... Sorry if I misunderstood your affiliation but I don't find why this option can be justified by their non-standard behaviors...

@chrisjackson and I work in the same team for Sony on the PlayStation toolchain, but our required semantics looks to be similar to the published ones in the Kiel documentation he mentioned.

Regardless, I realised something on Friday that means that a switch is essentially a requirement. You cannot be guaranteed to have references to all three symbols in your source code. A typical usage might look something like this:

extern uint8_t *_binary_start; // actually the symbol name from the binary input object
extern uint8_t *_binary_end;

for (uint8_t C = _binary_start; C != _binary_end; ++C) {
  // do something with C
}

You can certainly decorate those two symbols with a visibility attribute, as you suggested before. However, what you can't do is specify a visibility for them if they aren't referenced:

__attribute__((visibility("hidden")))
extern uint8_t *_binary_start; // referenced
__attribute__((visibility("hidden")))
extern uint8_t *_binary_end; // referenced
__attribute__((visibility("hidden")))
extern uint64_t _binary_size; // unreferenced

The above will result in the first two symbols being marked STV_HIDDEN, but the third symbol will not appear in the object file, as it is unreferenced (and introducing an artificial reference may or may not work depending on compiler optimizations etc). This means that when the binary input object gets linked together with this object, there will be nothing overriding the STV_DEFAULT visibility of the third symbol, meaning it gets exported. There is nothing apart from changing the visibility of the symbols in that object that can prevent this. As a result, I think a --symbol-visibility switch as described above in llvm-objcopy would be very useful. A --redefine-visibility switch could also be useful potentially, but is a little trickier to use, since it would require potentially being specified two or three times, whereas a --symbol-visibility switch would only need specifying once (and could also be used to change the default visibility of --add-symbol specified symbols, simplifying users' command-lines).

@chrisjackson and I work in the same team for Sony on the PlayStation toolchain, but our required semantics looks to be similar to the published ones in the Kiel documentation he mentioned.

Regardless, I realised something on Friday that means that a switch is essentially a requirement. You cannot be guaranteed to have references to all three symbols in your source code. A typical usage might look something like this:

Agreed. There is no way to make the unrefenced symbol hidden in an object file.

Thanks for providing more background.

Have done some --export-dynamic and isPreemptible lld work recently, I think I favor the idea to use STV_PROTECTED to mark symbols as non-interposable in a shared object. In ELF, it is unfortunate that a STV_DEFAULT symbol in a shared object is by default interposable. Another unfortunate thing is that toolchains don't annotate STV_DEFAULT symbols properly when some symbols are actually non-interposable:

  • -Bsymbolic -Bsymbolic-functions: all relevant symbols are non-interposable.
  • --dynamic-list family (other --dynamic-list-* options are mostly unused nowadays): they do a fine control what symbols are interposable.

For those exported non-interposable STV_DEFAULT symbols, annotating them as STV_PROTECTED is a great use of symbol visibility.

... As a result, I think a --symbol-visibility switch as described above in llvm-objcopy would be very useful. A --redefine-visibility switch could also be useful potentially, but is a little trickier to use, since it would require potentially being specified two or three times, whereas a --symbol-visibility switch would only need specifying once (and could also be used to change the default visibility of --add-symbol specified symbols, simplifying users' command-lines).

I think --symbol-visibility is better than --binary-symbol-visibility, but it may still be a bit confusing. How about --new-symbol-visibility? It currently should affect --add-symbol and -I binary.

I also think --redefine-visibility may be niche. We can add it when it is needed.

edd added a subscriber: edd.Aug 20 2019, 2:15 AM
chrisjackson retitled this revision from [llvm-objcopy] Allow the visibility of the start, end and size symbols created by --binary to be specified with --binary-symbol-visibility to [llvm-objcopy] Allow the visibility of the start, end and size symbols created by --binary to be specified with --news-symbol-visibility.Aug 28 2019, 9:24 AM
chrisjackson edited the summary of this revision. (Show Details)
chrisjackson retitled this revision from [llvm-objcopy] Allow the visibility of the start, end and size symbols created by --binary to be specified with --news-symbol-visibility to [llvm-objcopy] Allow the visibility of symbols created by --binary and --add-symbol to be specified with --new-symbol-visibility.
chrisjackson edited the summary of this revision. (Show Details)

The diff has been updated to apply the visibility to symbols created with --add-symbol where a visibility flag is not specified. The flag name has also been changed from --binary-symbol-visibility to --new-symbol-visibility

Thanks for the update. It looks better now. Will accept after a few comments are addressed.

llvm/tools/llvm-objcopy/CopyConfig.cpp
509
if (opt::Arg *A = InputArgs.getLastArg(OBJCOPY_new_symbol_visibility)) {
  ... StringSwitch<uint8_t>(A->getValue())
717

getValueOr(ELF::STV_DEFAULT)

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
760

getValueOr(ELF::STV_DEFAULT)

llvm/tools/llvm-objcopy/ELF/Object.h
897

Move NewSymbolVisibility after MemBuf [-Wreorder]

951

Move NewSymbolVisibility after MemBuf

MaskRay added inline comments.Aug 28 2019, 10:26 PM
llvm/tools/llvm-objcopy/CopyConfig.cpp
524

binary symbol visibility -> symbol visibility

Update to apply the corrections pointed out by @MaskRay .

MaskRay accepted this revision.Aug 29 2019, 6:59 PM

LGTM

llvm/tools/llvm-objcopy/ELF/Object.h
950

const uint8_t -> uint8_t

The parameter doesn't have to use const.

This revision is now accepted and ready to land.Aug 29 2019, 6:59 PM
chrisjackson closed this revision.EditedAug 30 2019, 3:27 AM

This was closed with revision r370458, commit fa1fe93789.