Page MenuHomePhabricator

[MC][ELF] Ensure that mergeable globals with an explicit section are assigned to SHF_MERGE sections with compatible entsizes
Needs ReviewPublic

Authored by bd1976llvm on Jan 3 2020, 6:48 PM.

Details

Summary

Ensure that symbols that are explicitly assigned a section name are placed into a section with compatible entry sizes. This affects mergeable symbols and non-mergeable symbols that are explicitly assigned to a generic mergeable section name.

Design:

The basic mechanism is to create multiple sections with the same name, using LLVM's ", unique," assembly extension (I refer to such sections as unique or uniqued sections) if incompatible symbols are explicitly given the same section name.

The simplest way to implement this would be to put each symbol into its own uniqued section. However, this would increase the size of the generated object files and it would mean that the unique section extension, which is not supported by other assemblers, (although support is being implemented, https://sourceware.org/bugzilla/show_bug.cgi?id=25380) would be used when it is not required.

Instead the implementation:

  1. Avoids using uniqued sections where possible.
  2. Tries to create as few SHF_MERGE sections as possible.

The generic section for a given section name is considered mergeable if the name is a mergeable default section name (such as .debug_str), a mergeable implicit section name (such as .rodata.cst4), or MC has already created a mergeable generic section for the given section name in response to a section directive in assembler. Otherwise, non-mergeable symbols are assigned to the generic section for a given section name and mergeable symbols are assigned to uniqued sections.

Symbols with a an explicit assignment to a section name may already be put into uniqued sections to handle COMDATs. The COMDAT uniquing takes precedence; which can make the UniqueID in the generated assembly redundant. This is an outstanding problem with the uniqing mechanism and I have just accepted the redundancy in this patch - it isn't harmful, just confusing. Update: There will be a similar issue for associated symbols once https://reviews.llvm.org/D74006 lands.

Terminology:

  • default sections are those always created by MC initially, e.g. .text or .debug_str
  • implicit section are those created normally by the MC in response to the symbols that it encounters, i.e. in the absence of an explicit section name assignment on the symbol.
  • generic sections are those sections that will be referred to when a uniqueID is not supplied. Typically these are the first section of a given name to be created. Default sections are always generic.

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=43457.

See https://reviews.llvm.org/D68101 for previous discussions leading to this patch.

Diff Detail

Event Timeline

bd1976llvm created this revision.Jan 3 2020, 6:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 3 2020, 6:48 PM
rjmccall added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
665

This looks like it should work. I don't know the MC code design well enough to have an opinion about whether it's better to handle this here or in getELFSection.

701

Is this assertion intentionally allowing Section to be non-mergeable? I feel like this probably should guarantee to return a mergeable section if that's what was requested.

From https://reviews.llvm.org/D68101#1806135, I think a better approach would be to delay the emission of the "M" merge-able section flag unless all entries had the same size.

llvm/test/CodeGen/X86/explict-section-mergeable.ll
11
- ;; the same name
+ ;; the same name, but different entity sizes.
13

I don't understand the ,unique,1 part. Can you clarify?
https://sourceware.org/binutils/docs/as/Section.html

llvm/test/CodeGen/X86/explict-section-mergeable2.ll
10 ↗(On Diff #236160)

ditto.

llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h
17 ↗(On Diff #236160)

probably should include llvm/ADT/StringRef.h, too. IWYU.

bd1976llvm retitled this revision from WIP alternative approach for D68101 to [MC][ELF] Ensure that mergeable globals with an explicit section are assigned to SHF_MERGE sections with compatible entsizes.Jan 13 2020, 5:13 PM
bd1976llvm edited the summary of this revision. (Show Details)
bd1976llvm marked 4 inline comments as done.Jan 13 2020, 5:28 PM
bd1976llvm added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
665

It is better to handle this here, associated symbols are handled in a similar manner.

llvm/test/CodeGen/X86/explict-section-mergeable.ll
13

Hi Nick,

The "unique 1" here is an LLVM extension to allow for multiple sections with the same name to be created and referenced in assembly.
The GNU assembler only allows the creation of a single section for a given name.. if you try to create a second section with the same name as a previous section but different properties then the assembler instead switches the current section back to the original section and ignores the properties you specified.

Updated the diff to address review comments and expand test coverage.

rjmccall resigned from this revision.Jan 13 2020, 7:43 PM

Okay. This seems conceptually reasonable to me, but I don't feel qualified to review the code, sorry.

Thanks for the extensive tests. I might triple check the entity sizes to make sure they're correct, but the tests look pretty exhaustive to me and make sense.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
701

Right, shouldn't this be:

if (Flags & ELF::SHF_MERGE) {
  assert((Section->getFlags() & ELF::SHF_MERGE) &&
        "Merge-able section requested but not produced");
  assert(Section->getEntrySize() == getEntrySizeForKind(Kind) &&
        "Mismatched merge-able section entity sizes");
}

?

Also, it might be nice to hoist the multiple calls to getEntrySizeForKind(Kind) into one variable rather than have 3 calls at different scopes.

llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp
82

Are the ORC test changes related to this change?

This is a partial fix for

Is this still considered a partial fix? Seems pretty complete to me, otherwise what's still missing?

bd1976llvm marked an inline comment as done.Jan 14 2020, 12:07 PM
bd1976llvm added inline comments.
llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp
82

Yes. In the original test the user has explicitly assigned a 32bit integer into the .debug_str section (GV->setSection(".debug_str")). The .debug_str section is a special section reserved for the system with fixed properties. MC creates a .debug_str section internally before looking at any of the globals as a mergeable section containing ASCII strings. The attempt to assign the integer foo to .debug_str ends up creating two .debug_str sections with the second one being a mergeable section with its entsize compatible with 32bit integers with the patch.

I traced the history of the test and I believe that using an integer is simply a mistake and that the original unit test author should have put strings into .debug_str. So, I updated the unit tests to do that.

This is a partial fix for

Is this still considered a partial fix? Seems pretty complete to me, otherwise what's still missing?

Having thought about it I am happy with this being the full fix. I was of the opinion that we should also add diagnostics for when a symbol has been explicitly assigned to an incompatible section. For example putting a function in the data section (.data). However, you can easily do this at link time and no diagnostic will be emitted. So I am now of the opinion that no diagnostics are actually required and I am happy to close the bug once this patch has landed.

Having thought about it I am happy with this being the full fix

Me too. Please amend the patch description above to say so, rather than *partial fix*. Also, please address the two comments I've "bumped."

llvm/include/llvm/CodeGen/TargetLoweringObjectFileImpl.h
17 ↗(On Diff #236160)

bump.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
701

bump

llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp
82

Thanks for the context, please add it to the patch description above so that others have context when looking at git log. It may also be worthwhile to subscribe the tests' authors to this change, which I've done (@lhames ).

Thanks, I'll address the points that you have raised.
I tried updating the assert and rebasing the patch but the assert is now firing in the It tests. I will have to investigate this tomorrow and update the patch then.

lhames added inline comments.Jan 14 2020, 6:08 PM
llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp
82

This test just needed to put something into a debug section so that the section would be forced though the JIT memory manager. No care was taken with types. ;)

This fix looks good to me.

Is this worth merging into 10.0 if it lands after the branch?

MaskRay added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
657

We should check Ctx.getAsmInfo()->useIntegratedAssembler() and not use the linkage ",unique" for -no-integrated-as.

I created a feature request a few days ago https://sourceware.org/bugzilla/show_bug.cgi?id=25380

Is this worth merging into 10.0 if it lands after the branch?

Yes please.

bd1976llvm marked an inline comment as done.Jan 16 2020, 3:03 AM
bd1976llvm added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
657

@MaskRay - Thanks for the comments. This was my original design - see comments in https://reviews.llvm.org/D68101.

However, I now think that we should land this patch in its current state and file a bug for supporting -no-integrated-as properly. The ",unique" LLVM extension is used in other places e.g. for comdats (with no-unique-names) and for MD_associated. What do you think?

Also, I think that we should probably consider if the design of the ",unique" extension is correct. It seems a bit of a clumsy extension. Some ideas:

  1. Maybe it it would be better to have an index scheme to select between sections with the same name rather than being able to apply a unique number to any section.
  2. We could have an alias scheme so that you can use a unique name for a section in assembly but specify a real name when the section is emitted into an object file.
  3. I think it might be nicer to have a new section directive e.g. .unique_section to support multiple sections with the same name rather than the current unique syntax?

I rebased the patch and addressed the review comments. I also noticed that there was a problem with the patch when explicitly placing non-mergeable sections into one of the "default" mergeable sections (e.g. .debug_str). As these "default" mergeable sections were not being added to the map it wasn't possible to explicitly place symbols into these even if the symbols entsize was compatible (as they would be put into a uniqued section instead). This is not acceptable for these "default" sections since for some of these like e.g. .debug_str, current binutils consumers make the assumption that there can only be one of these sections. The fix for this is WIP and I haven't added LIT tests for it yet; but, I have put up the patch in WIP state to get comments on the approach taken.

llvm/include/llvm/MC/MCContext.h
273

are there getters/setters for this member? Might be better than exposing the raw member.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
102

can a range based for loop be used here?

106

i shadows another i from above, and also isn't used.

645

LLVM coding style is just to use unsigned (drop the optional int) as per the Flags variable above.

MaskRay added inline comments.Jan 18 2020, 2:35 PM
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
657

,unique,2 does not look bad to me. It allows you to use a unified syntax for .section and .pushsection (see ELFAsmParser::ParseSectionArguments). The unique number can be potentially useful if we ever support a syntax to reference name+unique_id (this use case can usually be accomplished by a temporary local label).

However, I now think that we should land this patch in its current state and file a bug for supporting -no-integrated-as properly.

I think you mean .c ---clang--> .s ---GNU as---> .o

I am concerned that clang -no-integrated-as is not so uncommon. For example, ClangBuiltLinux uses it (assembly support is still a bit lacking). Hope you can add a check.

MD_associated is a bit different. It has a strong requirement: the global variable is in a @llvm.compiler.used
It can still work without garbage collection semantics of SHF_LINK_ORDER.

Please chime in the binutils feature requests https://sourceware.org/bugzilla/show_bug.cgi?id=25380 https://sourceware.org/bugzilla/show_bug.cgi?id=25381 https://sourceware.org/ml/binutils/2019-11/msg00266.html :)

bd1976llvm marked 5 inline comments as done.

I have updated the patch. It now handles both "default" sections and "implicit" sections ("implicit" sections are those that are created by MC normally as it encounters symbols). I have improved the test coverage, organised the tests and improved the comments for the test cases. Sorry it took a while to update the patch.. this was a more difficult change than I had anticipated. I have tried a few different approaches, I am not entirely happy with the current implementation but it is difficult to simply further.

bd1976llvm marked 5 inline comments as done.Thu, Jan 23, 6:48 PM
bd1976llvm added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
657

I think you mean .c ---clang--> .s ---GNU as---> .o

Yes I did. My understanding was that this is the usecase for "clang -no-integrated-as". Are there other usecases?

I am concerned that clang -no-integrated-as is not so uncommon. For example, ClangBuiltLinux uses it (assembly support is still a bit lacking).
Hope you can add a check.

I can ad a check however the only fix that I have for -no-integrated-as is to remove the SHF_MERGE flag from sections that symbols have been explicitly assigned to as in the https://reviews.llvm.org/D68101 patch. Is this acceptable?

bd1976llvm updated this revision to Diff 242492.Tue, Feb 4, 6:32 PM
bd1976llvm edited the summary of this revision. (Show Details)

updated the patch and description

bd1976llvm edited the summary of this revision. (Show Details)Wed, Feb 5, 5:00 AM
bd1976llvm added reviewers: MaskRay, maskray0.
MaskRay added a comment.EditedWed, Feb 5, 10:55 PM

I saw this a couple of weeks ago. Apologies that I did not pay enough attention. (I complained that GNU as lacks features. Things are changing (I'm glad that my __patchable_function_entries complaints worked:) )

https://bugs.llvm.org/show_bug.cgi?id=43457: If we did two passes in AsmPrinter, we probably could decide sh_entsize in the first pass.
In the second pass, we emit the two global variables to the same section. In reality we have just one pass, so we need to decide the section when we are about to emit a global variable, without knowing whether there will be another section with a different sh_entsize.
This is the conundrum faced by the -filetype=asm backend.

For the -filetype=obj backend, it is simple: when there are conflicting sh_entsize fields => change sh_entsize to 0 (disabling merging).
In GNU ld and gold, if input sections are of different sh_entsize, no merge is performed.
lld performs limited merging (https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/merge-entsize2.s) but I doubt it has much value. (I noticed that explict-section-mergeable.ll output can crash lld due to a linked-to section being a SHF_MERGE section. I'll try fixing it tomorrow.)

If we can't afford doing two passes for -filetype=asm. Another choice is to update AsmParser. When two .section directives with different sh_entsize are parsed, consider them the same section and update sh_entsize to 0.
This solves a problem and is very simple. The assembler just does some tasks that the linker will eventually do, so I'll probably not call it a hack.
However, if we want to do that, we may need to ask GNU as maintainers.

I realized that there are several assembly syntax issues we need to discuss with GNU as maintainers. <del>We should start a thread.</del> Created https://sourceware.org/ml/binutils/2020-02/msg00091.html

llvm/lib/MC/MCContext.cpp
477

Instead of find(), you can use try_emplace. The return value will tell you whether a new element is inserted. If yes, you can increase the unique ID.

482
error: conditional expression is ambiguous; 'pair<[...], typename __decay_and_strip<unsigned int>::__type>' can be converted to 'pair<[...], typename __decay_and_strip<int>::__type>' and vice versa

0 -> 0u

llvm/test/CodeGen/X86/explict-section-mergeable.ll
6

There are a plethora of CHECK-NOT patterns which make the test difficult to read.

FileCheck %s --implicit-check-not=.section may be useful