This is an archive of the discontinued LLVM Phabricator instance.

[MC] .addrsig_sym: ignore unregistered symbols
ClosedPublic

Authored by MaskRay on Oct 10 2022, 8:47 PM.

Details

Summary

.addrsig_sym forces registering the symbol regardless whether it is otherwise
registered. This creates an undefined symbol which is inconvenient/undesired:

  • extern int x; void f() { (void)x; } has inconsistent behavior whether x is emitted as an undefined symbol. -O0 -faddrsig makes x undefined while other -O levels and -fno-addrsig eliminate the symbol.
  • In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration, the addrsig code emits a symbol while the symbol is otherwise unseen.

D135427 fixed a bug that a non-prevailing __cxx_global_var_init was
incorrectly retained. However, the IR declaration causes an undesired
.addrsig_sym __cxx_global_var_init. This can be addressed in a way similar
to D101512 (isTransitiveUsedByMetadataOnly) but the increased
OutStreamer->emitAddrsigSym(getSymbol(&GV)); complexity makes me nervous.
Just ignoring unregistered symbols circumvents the problem.

Diff Detail

Event Timeline

MaskRay created this revision.Oct 10 2022, 8:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2022, 8:47 PM
MaskRay requested review of this revision.Oct 10 2022, 8:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2022, 8:47 PM
MaskRay edited the summary of this revision. (Show Details)Oct 10 2022, 8:48 PM
MaskRay updated this revision to Diff 466694.Oct 10 2022, 9:14 PM

rebase after a Mach-O MC test

chapuni added inline comments.
llvm/test/MC/MachO/addrsig.s
2

It requires aarch64 target?

MaskRay updated this revision to Diff 466699.Oct 10 2022, 10:28 PM

rebase on # REQUIRES: aarch64-registered-target

MaskRay marked an inline comment as done.Oct 10 2022, 10:28 PM
MaskRay added inline comments.
llvm/test/MC/MachO/addrsig.s
2

Thanks. Fixed in a separate commit.

@pcc should also take a look. I'm not very familiar with addrsig but it looks like he added support for it in D47744, including the source line and test that are being deleted here.

  • extern int x; void f() { (void)x; } produces undefined x only in -O0 -faddrsig, other -O levels and -fno-addrsig don't produce the symbol.
  • In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration, the addrsig code emits a symbol while the symbol is otherwise unseen.

To make sure I understand the above properly, is it the case that both at "-O0 -faddrsig" and "-flto=thin -faddrsig" we get a reference to the symbol, but with "-O"x no ThinLTO we don't? I'm confused as to what was intended.

rnk added a comment.Oct 11 2022, 9:14 AM

If we have this, should we revert D101512 as a code simplification?

Please update the .addrsig docs available at https://llvm.org/docs/Extensions.html#machine-specific-assembly-syntax to document that symbols that are neither defined nor referenced are not recorded as address-significant.

Also, I'm starting to reconsider D135427, which made non-prevailing local comdat members available_externally. Why did you choose available_externally? That is an external linkage. If there are any references to the local linkage symbol, they will now become references to an external symbol with the same name, rather than becoming errors, or relocations against discarded sections. The native linker *discards* non-prevailing comdat members, so we should do that too in complex cases. Using available_externally is something we do to support optimization (inlining).

MaskRay updated this revision to Diff 466895.Oct 11 2022, 12:53 PM
MaskRay marked an inline comment as done.

Update Extensions.rst

MaskRay edited the summary of this revision. (Show Details)Oct 11 2022, 12:55 PM

@pcc should also take a look. I'm not very familiar with addrsig but it looks like he added support for it in D47744, including the source line and test that are being deleted here.

  • extern int x; void f() { (void)x; } produces undefined x only in -O0 -faddrsig, other -O levels and -fno-addrsig don't produce the symbol.
  • In ThinLTO, after a non-prevailing linkonce_odr definition is converted to available_externally, and then a declaration, the addrsig code emits a symbol while the symbol is otherwise unseen.

To make sure I understand the above properly, is it the case that both at "-O0 -faddrsig" and "-flto=thin -faddrsig" we get a reference to the symbol, but with "-O"x no ThinLTO we don't? I'm confused as to what was intended.

Clarified the description with this bullet point now:

* `extern int x; void f() { (void)x; }` has inconsistent behavior whether `x` is emitted as an undefined symbol.
  `-O0 -faddrsig` makes `x` undefined while other -O levels and -fno-addrsig eliminate the symbol.
MaskRay added a comment.EditedOct 11 2022, 1:04 PM

If we have this, should we revert D101512 as a code simplification?

It can be considered a follow-up if this new behavior appears stable in the main branch.

Please update the .addrsig docs available at https://llvm.org/docs/Extensions.html#machine-specific-assembly-syntax to document that symbols that are neither defined nor referenced are not recorded as address-significant.

Done.

Also, I'm starting to reconsider D135427, which made non-prevailing local comdat members available_externally. Why did you choose available_externally? That is an external linkage. If there are any references to the local linkage symbol, they will now become references to an external symbol with the same name, rather than becoming errors, or relocations against discarded sections. The native linker *discards* non-prevailing comdat members, so we should do that too in complex cases. Using available_externally is something we do to support optimization (inlining).

Because marking such a symbol in a non-prevailing COMDAT available_externally is the only available and pratical mechanism, considering that it may be referenced by symbols in the same COMDAT. The symbol cannot just be eliminated (without introducing a GC pass).

External references to local linkage symbols in a non-prevailing COMDAT group is disallowed. The generic ABI says "... References to this symbol table entry from outside the group are not allowed."
(ld.lld error: relocation refers to a symbol in a discarded section: )
This property means that local linkage symbols in COMDAT groups are essentially roots in a reference graph. I think D135427 treatment has sane semantics.

rnk accepted this revision.Oct 11 2022, 1:25 PM

I think this looks good, but make sure @tejohnson's concerns are addressed.

If we have this, should we revert D101512 as a code simplification?

It can be considered a follow-up if this new behavior appears stable in the main branch.

Sounds good

Please update the .addrsig docs available at https://llvm.org/docs/Extensions.html#machine-specific-assembly-syntax to document that symbols that are neither defined nor referenced are not recorded as address-significant.

Done.

Thanks!

Also, I'm starting to reconsider D135427, which made non-prevailing local comdat members available_externally. Why did you choose available_externally? That is an external linkage. If there are any references to the local linkage symbol, they will now become references to an external symbol with the same name, rather than becoming errors, or relocations against discarded sections. The native linker *discards* non-prevailing comdat members, so we should do that too in complex cases. Using available_externally is something we do to support optimization (inlining).

Because marking such a symbol in a non-prevailing COMDAT available_externally is the only available and pratical mechanism, considering that it may be referenced by symbols in the same COMDAT. The symbol cannot just be eliminated (without introducing a GC pass).

External references to local linkage symbols in a non-prevailing COMDAT group is disallowed. The generic ABI says "... References to this symbol table entry from outside the group are not allowed."
(ld.lld error: relocation refers to a symbol in a discarded section: )
This property means that local linkage symbols in COMDAT groups are essentially leaves in a reference graph. I think D135427 treatment has sane semantics.

I think what I'm saying is that yes, such references to local linkage symbols from outside of a comdat are not allowed. In a native link, they result in exactly that error message. How can we ensure that we issue an equivalent linker error in that case during ThinLTO? Anyway, we can discuss further on that review thread.

llvm/docs/Extensions.rst
380

I would elaborate: If sym is not otherwise referenced or defined anywhere else in the file, this directive is a no-op.

This revision is now accepted and ready to land.Oct 11 2022, 1:25 PM
MaskRay updated this revision to Diff 466933.Oct 11 2022, 2:54 PM

update doc

This revision was landed with ongoing or failed builds.Oct 11 2022, 3:07 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/MC/COFF/addrsig.s