Page MenuHomePhabricator

[MC][ELF] Make linked-to symbol name part of ELFSectionKey
ClosedPublic

Authored by MaskRay on Tue, Feb 4, 4:01 PM.

Details

Summary

https://bugs.llvm.org/show_bug.cgi?id=44775

This rule has been implemented by GNU as https://sourceware.org/ml/binutils/2020-02/msg00028.html (binutils >= 2.35)

It allows us to simplify

.section .foo,"o",foo,unique,0
.section .foo,"o",bar,unique,1  # different section

as:

.section .foo,"o",foo
.section .foo,"o",bar  # different section

We consider the two .foo different even if the linked-to symbols foo and bar
are defined in the same section. This is a deliberate choice so that we don't
need to know the section where foo and bar are defined beforehand.

Diff Detail

Event Timeline

MaskRay created this revision.Tue, Feb 4, 4:01 PM
Herald added a project: Restricted Project. · View Herald Transcript

Unit tests: unknown.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

MaskRay marked an inline comment as done.Tue, Feb 4, 4:12 PM
MaskRay added inline comments.
llvm/lib/MC/MCContext.cpp
322–323

As a note: this is only used by renaming .debug_* to .zdebug_*, which don't do SHF_LINK_ORDER magic. So using an empty linked-to symbol name will be fine.

As I understand it, .zdebug* are deprecated. ELF toolchain moves to SHF_COMPRESSED, so we don't need to add more support to .zdebug*.

It looks reasonable to me. Two minor nits are below, the rest looks probably fine.
Lets see what other people think.

llvm/include/llvm/MC/MCContext.h
217

Perhaps just:

if (UniqueID  != Other.UniqueID)
  return UniqueID < Other.UniqueID;
return LinkedToName < Other.LinkedToName;
llvm/lib/MC/MCContext.cpp
322–323

It deserves a comment in the code I think.

MaskRay updated this revision to Diff 242522.Wed, Feb 5, 12:42 AM
MaskRay marked 2 inline comments as done.

Add a comment

llvm/include/llvm/MC/MCContext.h
217

The order is so that it matches the field order. Name fields placing together looks better, so I want to stick with the current order.

I can do a follow-up change to use StringRef::compare for SectionName and GroupName.

grimar added inline comments.Wed, Feb 5, 12:52 AM
llvm/include/llvm/MC/MCContext.h
217

Up to you. Not a big problem anyways.

Unit tests: unknown.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

I think that this patch removes the need for uniquing sections for symbols with associated symbols which are explicitly assigned to a section name e.g. via attribute((section name)).

I have updated the comment in https://reviews.llvm.org/D72194 to say that. It would be a nice follow up change to remove the UniqueID added to sections for symbols with COMDATs/associated symbols that are explicitly assigned a section name.

llvm/include/llvm/MC/MCContext.h
199

Here start using "linked_to" terminology instead of "associated". Should we stick to just one?

This is a deliberate choice so that we don't need to know the section where foo and bar are defined beforehand.

Just want to make sure this is well defined with respect to the description in ELF.

SHF_LINK_ORDER
     This flag adds special ordering requirements for link editors. The requirements apply if the sh_link field of this section's header references another section (the linked-to section). If this section is combined with other sections in the output file, it must appear in the same relative order with respect to those sections, as the linked-to section appears with respect to sections the linked-to section is combined with.

A typical use of this flag is to build a table that references text or data sections in address order.

If we have two sections S1, S2, with a SHF_LINK_ORDER on the same section D, what is their ordering relationship between S1 and S2? If foo and bar are in the same section and we don't create two sections then we don't get this problem as we'd only have one S with a sh_link to D. I think that if the author is trying to use SHF_LINK_ORDER to build a table that references text or data sections in address order then they may get surprised if S1 and S2 are not in the order of foo and bar. True that if one section where generated they'd have to ensure that the contents of S where in the same relative order to foo and bar in D but at least they would know the linker couldn't alter it.

MaskRay marked an inline comment as done.Wed, Feb 5, 9:50 AM
MaskRay added inline comments.
llvm/include/llvm/MC/MCContext.h
199

"linked-to section" is used by the ELF spec. By analogy, I use "linked-to symbol" here (omitted "Sym" in "LinkedToName"). "linked-to" implies a directed edge and makes it clear its relation with "sh_link", while one can argue that "associated" means an undirected edge. I think we should rename MCSymbol related "Associated" to "LinkedToSym". The IR level "!associated" can be left unchanged. I can do that in a separate change.

This is a deliberate choice so that we don't need to know the section where foo and bar are defined beforehand.

Just want to make sure this is well defined with respect to the description in ELF.

SHF_LINK_ORDER
     This flag adds special ordering requirements for link editors. The requirements apply if the sh_link field of this section's header references another section (the linked-to section). If this section is combined with other sections in the output file, it must appear in the same relative order with respect to those sections, as the linked-to section appears with respect to sections the linked-to section is combined with.

A typical use of this flag is to build a table that references text or data sections in address order.

If we have two sections S1, S2, with a SHF_LINK_ORDER on the same section D, what is their ordering relationship between S1 and S2? If foo and bar are in the same section and we don't create two sections then we don't get this problem as we'd only have one S with a sh_link to D. I think that if the author is trying to use SHF_LINK_ORDER to build a table that references text or data sections in address order then they may get surprised if S1 and S2 are not in the order of foo and bar. True that if one section where generated they'd have to ensure that the contents of S where in the same relative order to foo and bar in D but at least they would know the linker couldn't alter it.

"Two sections with the same sh_link" this scenario is probably not clearly specified. I will expect a stable sort by sh_link. For the following part of the added test, after linking,

.section .foo,"o",@progbits,foo  # first in the output section .foo due to stable sort
.byte 0

.section .foo,"o",@progbits,bar  # second
.byte 1

This is what lld currently does. It will be nice to get a sign-up from GNU ld. An alternative choice is that users should not rely on the order. I'm inclined to define an order here.

MaskRay added a comment.EditedWed, Feb 5, 10:07 AM

Another thing is whether we should use sh_flags/sh_type to differentiate sections. In addition, whether we should use sh_entsize.
If yes, the implementation can be a bit clumsy. We may have to add many Elf*_Shdr fields to ELFSectionKey.
If no (GNU as behavior), we should add warnings. D73999 will do that.

The section group name is part of the key in both MC and GNU as. By analogy, SHF_LINK_ORDER+sh_link should probably be part of the key.

Field not specified by the .section directive (sh_addrline,sh_addr,sh_offset,sh_info) should definitely not be part of the key.

I have a slight preference for "no", though I don't currently have a particular good reason why "sh_entsize" should not be part of a key. Maybe we are fine with current .rodata.cst16 naming.

.section .foo,"o",@progbits,foo  # first in the output section .foo due to stable sort
.byte 0

.section .foo,"o",@progbits,bar  # second
.byte 1

This is what lld currently does. It will be nice to get a sign-up from GNU ld. An alternative choice is that users should not rely on the order. I'm inclined to define an order here.

I don't believe that assembler nor linker should guarantee any orders here.

Another thing is whether we should use sh_flags/sh_type to differentiate sections. In addition, whether we should use sh_entsize.
If yes, the implementation can be a bit clumsy. We may have to add many Elf*_Shdr fields to ELFSectionKey.
If no (GNU as behavior), we should add warnings. D73999 will do that.

The section group name is part of the key in both MC and GNU as. By analogy, SHF_LINK_ORDER+sh_link should probably be part of the key.

Field not specified by the .section directive (sh_addrline,sh_addr,sh_offset,sh_info) should definitely not be part of the key.

I have a slight preference for "no", though I don't currently have a particular good reason why "sh_entsize" should not be part of a key. Maybe we are fine with current .rodata.cst16 naming.

An alternative design to this patch is for the assembler to *never* create more than one section with a given name (apart from for COMDATs because that is established behaviour we can't change) unless the ,unique, N, extension is used. However, the assembler can give warnings as in D73999. This is really simple for users to understand - the rule is you have to choose a unique name for your sections.

llvm/include/llvm/MC/MCContext.h
199

SGTM.

MaskRay marked 2 inline comments as done.Wed, Feb 5, 11:54 AM
MaskRay added inline comments.
llvm/include/llvm/MC/MCContext.h
199

Created D74082 to rename Associated to LinkedToSym

MaskRay updated this revision to Diff 243453.Sun, Feb 9, 10:08 AM
MaskRay marked an inline comment as done.

Rebase after D74082 was committed

My current thoughts regarding (.section directives with the same name but different fields)
are similar to binutils' (eventual) state:

  • sh_flags or sh_type: warn
  • sh_link/group name: no warning. produce separate sections
  • sh_entsize due to SHF_MERGE: still controversial. I am slightly inclined to require ,unique
MaskRay added a subscriber: psmith.Thu, Feb 13, 9:13 PM

Address review comments.

I tend to agree with Alan Modra's arguement in
https://sourceware.org/ml/binutils/2020-02/msg00093.html

I don't think so. User assembly often gets section attributes wrong
or leaves them off entirely for special sections known to the
assembler. ie. the first .section .foo above is a built-in rather
than user input.

Add more folks for feedback.

Please see my binutils post. For a .section directive with the same name but a different field. If the field is:

  • sh_flags or sh_type: warn
  • sh_link due to SHF_LINK_ORDER: no warning. produce separate sections
  • sh_entsize due to SHF_MERGE: still controversial

I agree with warning for sh_flags, I think there is too much legacy code out there that would behave differently.
I agree with sh_link producing separate sections. In an ideal world no-one writes this by hand in assembly.
For sh_entsize SHF_MERGE, this is informing the linker that it can produce an optimisation, which it is permitted to ignore. I tend towards an error message if it is implementable as it is a strong message to fix the assembler. The next best thing is a warning then clearing SHF_MERGE and setting sh_entsize to 0. I don't think a warning on its own is safe.

Does this patch look good? The consensus is that we should make sh_link producing separate sections.

MaskRay updated this revision to Diff 244569.Thu, Feb 13, 9:38 PM

Update 4 __patchable_function_entries tests.

The case no-function-sections is very similar to function-sections case now.
A future clean-up can delete ,unique from the tests but that will require some interface changes.

MaskRay edited the summary of this revision. (Show Details)Thu, Feb 13, 9:42 PM

I'm happy with this patch, I think it is a fair reflection of the consensus, I'll let others approve when they have had a chance to read through in the US timezone.

This revision was not accepted when it landed; it landed in state Needs Review.Fri, Feb 14, 8:07 PM
This revision was automatically updated to reflect the committed changes.