This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Suppress GRP_COMDAT deduplication if the signature symbol is STB_LOCAL
AbandonedPublic

Authored by MaskRay on May 22 2021, 7:01 PM.

Details

Summary

The ELF spec says:

GRP_COMDAT: This is a COMDAT group. It may duplicate another COMDAT group in another object file, where duplication is defined as having the same group signature.

The wording "having the same group signature" is not clear. GNU ld/gold/LLD
simply use the symbol name as the deduplication key. Another interpretation is
that we should respect the regular symbol resolution rule and treat a STB_LOCAL
signature different from another object file with the same name.
I think the STB_LOCAL interpretation aligns with the ELF spirit.

Compilers don't produce local signature symbols. However, a local signature
symbol may be created by
objcopy --localize-hidden/--keep-global-symbol or IR internalization (if we
apply D53234). I feel that it is useful to suppress GRP_COMDAT deduplication
when the signature symbol is STB_LOCAL.

Close PR43094

Diff Detail

Event Timeline

MaskRay created this revision.May 22 2021, 7:01 PM
MaskRay requested review of this revision.May 22 2021, 7:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2021, 7:01 PM
MaskRay updated this revision to Diff 347227.May 22 2021, 7:03 PM

fix comment

MaskRay edited the summary of this revision. (Show Details)May 22 2021, 7:05 PM

Whilst I see where you're coming from, I think you've missed out an important part fo the spec's wording:

The name of a symbol from one of the containing object's symbol tables provides a signature for the section group.

The key thing here is that it's the name that forms the signature, not the symbol, which I think means that your comment "The wording "having the same group signature" is not clear" is not correct. It seems pretty clear to me that the name is the thing that's important and nothing else, read as written, so strict reading of the spec says two same-named STB_LOCAL symbols acting as group signatures should be considered the same for deduplication purposes.

We also need to be confident that something isn't going to break whilst making this change. I think it's possible to craft something where this would potentially be the case, although I acknowledge it's a little contrived, so the behaviour change may be acceptable:

# test.s
.section .a,"axG",@progbits,a,comdat
a:
.local a
.hidden a
    ret

# test2.s
.section .a,"ax",@progbits
a:
.weak a
.protected a
    nop
    ret

In this example, the current behaviour when linking as a shared object always produce a protected dynamic symbol for a, but it's marked as undefined if test.o is linked first, whereas it's defined if test2.o was first in link order. If you delete the comdat aspects of the example, a will always be defined in the dynamic symbol table. This change in behaviour could theoretically impact dynamic symbol resolution, if I'm not mistaken, which could be a problem.

That all being said, I do see your point, and there is a good chance STB_LOCAL symbols were not considered when this part of the spec was written. I'd therefore not oppose a discussion on the gABI list suggesting the change. I don't think acting without wider discussion is a good idea though (but could be persuaded otherwise).

lld/test/ELF/comdat-local.s
15

Aside: @thopre, it would be nice if FileCheck would allow this pattern without the need to name the variable something ("_" in this case). The ideal format would be [[#%x]] indicating match any hex format value, much like [[#]] matches any default-format (i.e. decimal) number.

thopre added inline comments.May 24 2021, 2:29 AM
lld/test/ELF/comdat-local.s
15

It's already possible:

% cat chk 
CHECK: FOO[[#%X,]]R
% echo "FOOBAR" | ./bin/FileCheck chk; echo $?
0

I agree with James that the way that the spec is written favour the name of the symbol as the key regardless of whether the symbol is local or not. In particular: "The referenced signature symbol is not restricted. Its containing symbol table section need not be a member of the group, for example."

I'm also sympathetic to the objcopy problem. In our own proprietary tools I can remember disabling a lot of symbol transformations on our equivalent tool to prevent COMDAT semantics from being broken. A particular favourite with STB_LOCAL signature symbols was the transformation to remove all local symbol names as they are not usually significant. Suddenly all the COMDAT groups became the same leading to very strange results. Personally I'd have preferred that ELF had said that COMDAT signature symbols were not allowed to be STB_LOCAL.

I raised a Windows PE-COFF example on https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc/m/fc49QKuNBwAJ
In PE-COFF, a COMDAT with a non-external selection symbol does not deduplicate.

I.e. if we disable STB_LOCAL deduplication for COMDAT, compilers can treat ELF and PE-COFF in the same way.
Otherwise, compiler internalization needs to rename the signature specifically for ELF.

You may look at D103043. llvm/test/Transforms/Internalize/comdat.ll is an example where the ELF behavior is unfortunate.

PE-COFF COMDAT is unfortunate in many aspects (https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html) but is better than ELF in this aspect:)

Just one thing I notieced In D103043

Beside the comdat any deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

Do you actually need a COMDAT group in all those circumstances, i.e. do you need one group to prevail if there are multiple groups present with the same signature? Is one way to think about this:

  • I can permit multiple instances of the same group in the object, the one-definition rule does not need to hold for this group. In this case don't set GRP_COMDAT for the group, just use a non COMDAT group.
  • I need one COMDAT group to prevail but only within a single object file (STB_LOCAL signature symbol). In this case the object producer (compiler etc.) uniques the group, emits just one non COMDAT group, we achieve the same effect.

In either case, is the ELF behaviour preventing something useful, broken, or just being inconvenient?

While I agree that ELF could have been written differently, we have what has been written and if we deviate from that we risk compatibility problems. Compatibility even to an imperfect spec does have value, I'd personally prefer to stick to what is written at least in the default case. Do we know what the binutils people think? I'd be a lot more comfortable if there is agreement between the tools. Failing that then there is the possibility of a command line option or even a GRP_MASKOS flag to signify that the group should be treated differently.

Just one thing I notieced In D103043

Beside the comdat any deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

Do you actually need a COMDAT group in all those circumstances, i.e. do you need one group to prevail if there are multiple groups present with the same signature? Is one way to think about this:

  • I can permit multiple instances of the same group in the object, the one-definition rule does not need to hold for this group. In this case don't set GRP_COMDAT for the group, just use a non COMDAT group.
  • I need one COMDAT group to prevail but only within a single object file (STB_LOCAL signature symbol). In this case the object producer (compiler etc.) uniques the group, emits just one non COMDAT group, we achieve the same effect.

In either case, is the ELF behaviour preventing something useful, broken, or just being inconvenient?

While I agree that ELF could have been written differently, we have what has been written and if we deviate from that we risk compatibility problems. Compatibility even to an imperfect spec does have value, I'd personally prefer to stick to what is written at least in the default case. Do we know what the binutils people think? I'd be a lot more comfortable if there is agreement between the tools. Failing that then there is the possibility of a command line option or even a GRP_MASKOS flag to signify that the group should be treated differently.

The Internalize case can use comdat noduplicates. I somehow forgot that it is available in ELF now.
With that, I think making a special case when we already support zero flag is not useful.
All of GNU ld, gold, and ld.lld support zero flag section group for no-deduplication.

I'll file a ticket about objcopy --localize-hidden behavior.

thopre added inline comments.Jun 3 2021, 4:27 AM
lld/test/ELF/comdat-local.s
15–17
MaskRay abandoned this revision.Jun 7 2021, 1:25 PM

See https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc
I'll abandon this patch but add test coverage. Here is the test I want to add:

comdat-local.s

# REQUIRES: x86
## COMDAT groups are deduplicated by the name of the signature symbol.
## The local/global status is not part of the equation.

# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o
# RUN: ld.lld %t.o %t.o -o %t
# RUN: llvm-readelf -s -x .zero -x .comdat %t | FileCheck %s

# CHECK:      NOTYPE LOCAL DEFAULT [[#A:]] zero
# CHECK-NEXT: NOTYPE LOCAL DEFAULT [[#]]   comdat
# CHECK-NEXT: NOTYPE LOCAL DEFAULT [[#A]]  zero
# CHECK-NOT:  {{.}}

## Non-GRP_COMDAT groups are never deduplicated.
# CHECK:      Hex dump of section '.zero':
# CHECK-NEXT: [[#%x,]] 0202

## GRP_COMDAT groups are deduplicated.
# CHECK:      Hex dump of section '.comdat':
# CHECK-NEXT: [[#%x,]] 01 .{{$}}

.section .zero,"aG",@progbits,zero
zero:
  .byte 2

.section .comdat,"aG",@progbits,comdat,comdat
comdat:
  .byte 1

Test suggestion looks reasonable to me, although I might suggest one slight tweak, namely to use --implicit-check-not rather than CEHCK-NOT after the symbol check. In the latter case, there's nothing stopping the symbol appearing before the set that you are checking.