This is an archive of the discontinued LLVM Phabricator instance.

Perform symbol binding for .symver versioned symbols
ClosedPublic

Authored by tejohnson on Feb 28 2017, 7:14 PM.

Details

Summary

In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".

While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.

E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.

The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Feb 28 2017, 7:14 PM
pcc added inline comments.Feb 28 2017, 7:40 PM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

I don't think this is right. See ELFObjectWriter::executePostLayoutBinding: .symver symbols inherit their binding from the symbol they alias.

I think we may need something equivalent to the executePostLayoutBinding logic in ModuleSymbolTable::CollectAsmSymbols.

tejohnson added inline comments.Feb 28 2017, 9:43 PM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

Thanks for pointing me to that, I was having trouble finding good documentation on the .symver directive, but I see that if I make the aliased symbol local that gas makes the versioned symbol also local.

However, I'm not sure I agree that the handling in executePostLayoutBinding, which deals with .symver specifically, should be added to CollectAsmSymbols, which is doing more general handling of the symbols returned from the streamer. It seems like we should do this in the .symver handling here within the ELFAsmParser, and instead set the appropriate symbol attribute based on the symbol binding of the aliased Sym (instead of unconditionally setting to global as I have done here).

pcc added inline comments.Feb 28 2017, 10:16 PM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

The problem is that we cannot resolve the binding until we have seen the entire input because a later directive may change the binding. For example:

foo:
.symver foo, bar@@baz

.weak foo

In this case bar@@baz has weak binding but that is not known until the end of the input.

It may be that we can handle this within the parser but I don't think it would be right to do it entirely at the point where we see the directive.

tejohnson added inline comments.Mar 1 2017, 7:09 AM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

Good point.

Sigh, my test case expanded to include the local and weak cases hit https://bugs.llvm.org//show_bug.cgi?id=25679 for the local case. I think as Rafael said there we should relax this assertion, although I am not sure what the right conditions for relaxing this are - we no longer have the local aliased symbol defined anywhere so can't check that its binding was local.

For the original bug I am fixing here, getting this right unfortunately seems tricky - the binding of the aliased symbol could be set in the module asm via a directive, or as is probably more likely, in the IR, e.g.:

module asm ".symver io_cancel_local_0_4,io_cancel_local@@LIBAIO_0.4"

define internal i32 @io_cancel_local_0_4() {

ret i32 0

}

In that case, we don't have the information in the asm parser about the aliased symbol's binding. We don't even currently have this in CollectAsmSymbols, although I suppose we could pass in the Module. But it seems complex to get this right - we need to identify these .symver aliases, and get the binding of the aliased symbol, either from the asm or from the IR. Not sure of a cleaner way to do this right...

Ultimately the emitted object has the right bindings, because the binding is set on the alias in ELFObjectWriter::executePostLayoutBinding. But the issue is ensuring the symbol iterator in the InputFile skips the right symbols.

pcc added inline comments.Mar 1 2017, 10:44 AM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

Yes, this is tricky. I think we will have to change ModuleSymbolTable to keep track of the set of defined symbol names (and as part of that use the aliased symbol to compute the binding of the alias). I think that is something we'll need to do anyway to deduplicate symbols when we create the bitcode symbol table.

If we also keep track of which symbols are aliased with .symver we can inhibit internalization for those symbols, which would seem to fix pr25679.

tejohnson added inline comments.Mar 1 2017, 11:41 AM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

It's also tricky because CollectAsmSymbols is target agnostic, and the symver directive is specific to ELF. Even the symbol binding is only accessible via the ELF symbol type.

After looking at this some more, I think the most straightforward way is to use a callback from the parser that is defined in CollectAsmSymbols, that would a) add the aliased symbol to the llvm.compiler.used set to avoid internalization; and b) return the binding info based on the linkage type (assuming it is defined in the IR).

Then have a virtual method probably on the MCAsmParserExtension that can be invoked at the end of parsing and would be overridden for the ELFAsmParser to do the right thing for symver aliases (which would need to be tracked), and which uses the callback.

tejohnson added inline comments.Mar 2 2017, 12:48 PM
lib/MC/MCParser/ELFAsmParser.cpp
748 ↗(On Diff #90115)

I have fixes for these issues but they just need to be cleaned up a bit. I also discovered that the issue with the incorrect internalization was even easier to fix. We just needed to invoke the mechanism that updates the llvm.compiler.used with undefined asm symbols earlier for regular LTO - both in the new API and in the old legacy LTO. I'll clean that up and send it to fix that original bug first. Tomorrow though as I need to finish up something unrelated today...

tejohnson updated this revision to Diff 90707.Mar 6 2017, 8:06 AM

Rework to get symbol binding from aliased symbol.

tejohnson retitled this revision from Mark .symver versioned symbol as global to Perform symbol binding for .symver versioned symbols.Mar 6 2017, 8:08 AM
tejohnson edited the summary of this revision. (Show Details)
pcc edited edge metadata.Mar 6 2017, 2:18 PM

The parser doesn't seem like the correct place to handle this. If you introduce an EmitELFSymverDirective method on MCStreamer, and extend the AsmSymbol callback to CollectAsmSymbols with something that lets the caller know that this is a symver symbol and the name of the aliasee, I think you can move all handling of this directive into RecordStreamer and the AsmSymbol callback provided by ModuleSymbolTable.

include/llvm/Object/ModuleSymbolTable.h
55 ↗(On Diff #90707)

The signature change seems like a separately useful cleanup.

lib/Object/ModuleSymbolTable.cpp
112 ↗(On Diff #90707)

I don't think this is right, as Name may be mangled.

In D30485#693665, @pcc wrote:

The parser doesn't seem like the correct place to handle this.

Why? The ELFAsmParser already has .symver handling and is specific to ELF. The benefit of handling there is that all clients get the correct binding. My concern with doing this in the MCStreamer & RecordStreamer is that the RecordStreamer is currently format-agnostic. The MCStreamer has some format specific Emit* methods on it, but they are empty and overridden by the various format-specific object file writers derived via MCObjectStreamer.

If you introduce an EmitELFSymverDirective method on MCStreamer, and extend the AsmSymbol callback to CollectAsmSymbols with something that lets the caller know that this is a symver symbol and the name of the aliasee,

I think what would need to be done is to add this virtual method on the MCStreamer, which by default doesn't do anything, and only implement it for the derived RecordStreamer class. For RecordStreamer, it would need to save the mapping between the alias and aliasee. Then in CollectAsmSymbols, we'd have to walk this saved mapping, and get and apply the same symbol attribute to the alias as applied to its aliasee (either by invoking RecordStreamer::EmitSymbolAttribute, or by applying the same RecordStreamer::State to the aliasee when iterating through the RecordStreamer's Syymbols string map). We also need to get the symbol binding from the IR in the case where the aliasee is defined in IR.

I think you can move all handling of this directive into RecordStreamer and the AsmSymbol callback provided by ModuleSymbolTable.

Do you mean move it into the invocation of CollectAsmSymbols from ModuleSymbolTable::addModule (i.e. into it's definition of AsmSymbol)? This will only benefit that one client of CollectAsmSymbols (although that is the one that matters for this bug).

lib/Object/ModuleSymbolTable.cpp
112 ↗(On Diff #90707)

Won't the name in IR also be mangled?

pcc added a comment.Mar 7 2017, 2:18 PM
In D30485#693665, @pcc wrote:

The parser doesn't seem like the correct place to handle this.

Why? The ELFAsmParser already has .symver handling and is specific to ELF. The benefit of handling there is that all clients get the correct binding. My concern with doing this in the MCStreamer & RecordStreamer is that the RecordStreamer is currently format-agnostic. The MCStreamer has some format specific Emit* methods on it, but they are empty and overridden by the various format-specific object file writers derived via MCObjectStreamer.

The parser is supposed to be just a parser. The streamer is meant to control what to do with each directive. It is fine if it contains special handling for ELF because the RecordStreamer is supposed to be handling all directives that can introduce a symbol reference or definition in the object file, even those which happen to be specific to one object format (in fact, I suspect that we may need to override methods like BeginCOFFSymbolDef in the future).

If you introduce an EmitELFSymverDirective method on MCStreamer, and extend the AsmSymbol callback to CollectAsmSymbols with something that lets the caller know that this is a symver symbol and the name of the aliasee,

I think what would need to be done is to add this virtual method on the MCStreamer, which by default doesn't do anything, and only implement it for the derived RecordStreamer class. For RecordStreamer, it would need to save the mapping between the alias and aliasee. Then in CollectAsmSymbols, we'd have to walk this saved mapping, and get and apply the same symbol attribute to the alias as applied to its aliasee (either by invoking RecordStreamer::EmitSymbolAttribute, or by applying the same RecordStreamer::State to the aliasee when iterating through the RecordStreamer's Syymbols string map). We also need to get the symbol binding from the IR in the case where the aliasee is defined in IR.

Yes, you could even handle all non-symver symbols first and then handle the symver symbols by walking the partially computed ModuleSymbolTable. (I think it is fine if it uses a linear scan because module asm symbols are uncommon.)

I think you can move all handling of this directive into RecordStreamer and the AsmSymbol callback provided by ModuleSymbolTable.

Do you mean move it into the invocation of CollectAsmSymbols from ModuleSymbolTable::addModule (i.e. into it's definition of AsmSymbol)? This will only benefit that one client of CollectAsmSymbols (although that is the one that matters for this bug).

If a caller of CollectAsmSymbols needs 100% accurate binding information, it will effectively need to create a ModuleSymbolTable. I haven't looked very closely at what the other callers are doing though. It may be that they can get by with inaccurate bindings for symvers.

lib/Object/ModuleSymbolTable.cpp
112 ↗(On Diff #90707)

In ELF it could be something like @"\01foo". In that case the mangled name is foo.

In D30485#694931, @pcc wrote:
In D30485#693665, @pcc wrote:

The parser doesn't seem like the correct place to handle this.

Why? The ELFAsmParser already has .symver handling and is specific to ELF. The benefit of handling there is that all clients get the correct binding. My concern with doing this in the MCStreamer & RecordStreamer is that the RecordStreamer is currently format-agnostic. The MCStreamer has some format specific Emit* methods on it, but they are empty and overridden by the various format-specific object file writers derived via MCObjectStreamer.

The parser is supposed to be just a parser. The streamer is meant to control what to do with each directive. It is fine if it contains special handling for ELF because the RecordStreamer is supposed to be handling all directives that can introduce a symbol reference or definition in the object file, even those which happen to be specific to one object format (in fact, I suspect that we may need to override methods like BeginCOFFSymbolDef in the future).

Fair enough.

If you introduce an EmitELFSymverDirective method on MCStreamer, and extend the AsmSymbol callback to CollectAsmSymbols with something that lets the caller know that this is a symver symbol and the name of the aliasee,

I think what would need to be done is to add this virtual method on the MCStreamer, which by default doesn't do anything, and only implement it for the derived RecordStreamer class. For RecordStreamer, it would need to save the mapping between the alias and aliasee. Then in CollectAsmSymbols, we'd have to walk this saved mapping, and get and apply the same symbol attribute to the alias as applied to its aliasee (either by invoking RecordStreamer::EmitSymbolAttribute, or by applying the same RecordStreamer::State to the aliasee when iterating through the RecordStreamer's Syymbols string map). We also need to get the symbol binding from the IR in the case where the aliasee is defined in IR.

Yes, you could even handle all non-symver symbols first and then handle the symver symbols by walking the partially computed ModuleSymbolTable. (I think it is fine if it uses a linear scan because module asm symbols are uncommon.)

I changed to handling this in the RecordStreamer and CollectAsmSymbols.

I think you can move all handling of this directive into RecordStreamer and the AsmSymbol callback provided by ModuleSymbolTable.

Do you mean move it into the invocation of CollectAsmSymbols from ModuleSymbolTable::addModule (i.e. into it's definition of AsmSymbol)? This will only benefit that one client of CollectAsmSymbols (although that is the one that matters for this bug).

If a caller of CollectAsmSymbols needs 100% accurate binding information, it will effectively need to create a ModuleSymbolTable. I haven't looked very closely at what the other callers are doing though. It may be that they can get by with inaccurate bindings for symvers.

All the handling is in CollectAsmSymbols, none needed in the AsmSymbol callbacks.

lib/Object/ModuleSymbolTable.cpp
112 ↗(On Diff #90707)

Ah, wasn't familiar with cases like that. Since I need to go from the mangled name (in asm) to a potentially unmangled name (in IR), I ended up needing to precompute a map. Added a case like this to the test.

tejohnson updated this revision to Diff 91040.Mar 8 2017, 9:44 AM

Address review comments

pcc accepted this revision.Mar 8 2017, 1:30 PM

LGTM with a few nits.

lib/Object/ModuleSymbolTable.cpp
61 ↗(On Diff #91040)

if (Streamer.symverAliases().empty()) return;

74 ↗(On Diff #91040)

You can make the code a little simpler by putting all globals in the map.

lib/Object/RecordStreamer.h
26 ↗(On Diff #91040)

Can just be a vector of (alias, aliasee) pairs.

test/LTO/X86/symver-asm.ll
19 ↗(On Diff #91040)

You mean so they aren't DCE'd?

This revision is now accepted and ready to land.Mar 8 2017, 1:30 PM

Incidentally, this is no longer dependent on D30657, since the handling moved out of the ELFAsmParser. Will update this patch to remove the dependence, but that is still a good standalone fix.

lib/Object/ModuleSymbolTable.cpp
61 ↗(On Diff #91040)

Good idea, will add

74 ↗(On Diff #91040)

I was trying to reduce the size of the map we create, since in most cases we won't need an entry. I'll go ahead and remove the guard, the map creation shouldn't be common in the first place.

lib/Object/RecordStreamer.h
26 ↗(On Diff #91040)

I prefer to keep it this way so the processing in handleSymverAliases can stay the same (i.e. only look for the Aliasee in the IR once for all aliases).

test/LTO/X86/symver-asm.ll
19 ↗(On Diff #91040)

Right, or more specifically, not linked into the combined module. Will change to "DCE'd during module linking".

tejohnson updated this revision to Diff 91094.Mar 8 2017, 3:49 PM

Address review comments

This revision was automatically updated to reflect the committed changes.