This is an archive of the discontinued LLVM Phabricator instance.

Add GNU attribute 'retain'
AbandonedPublic

Authored by MaskRay on Feb 16 2021, 10:36 PM.

Details

Summary

For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
__attribute__((retain)) function/variable, if the configure-time GNU as
supports SHF_GNU_RETAIN (binutils 2.36 or above on Linux or FreeBSD).

'used' and 'retain' are orthogonal on ELF targets: 'used' does not
prevent the section from being discarded by ld --gc-sections.
(
On Windows, 'used' is coarse-grained. An attribute on a variable(function)
causes all variables(functions) in the object file to be retained.
)

As an example, to notify the fuzzer engine "American Fuzzy Lop" about persistent mode,
the user may define an unreferenced variable:

__attribute__((used, retain, section(".data.PERSIST_SIG")))
static char AFL_PERSISTENT[] = "##SIG_AFL_PERSISTENT##";

Without 'used', the definition can be discarded by the compiler. Without
'retain', the definition can be discarded by ld --gc-sections.

(
Before 'retain', previous ELF solutions require inline asm or linker tricks, e.g.
asm volatile(".reloc 0, R_X86_64_NONE, .data.DEFER_SIG"); (architecture dependent)
or define a non-local symbol in the section and use ld -u.
There was no elegant source-level solution.
)

Because 'used' and 'retain' will usually be used together, we
deliberately allow 'retain' on non-ELF targets to avoid -Wunknown-attributes.
We don't want users to write code dispatching on binary formats.

This patch sets !retain metadata on RetainAttr definitions. The backend is
responsible for setting SHF_GNU_RETAIN for ELF (if integrated assembler or
-fbinutils-version=2.36 or newer) and ignores the metadata for other binary
formats.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 16 2021, 10:36 PM
MaskRay requested review of this revision.Feb 16 2021, 10:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2021, 10:36 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MaskRay edited the summary of this revision. (Show Details)Feb 16 2021, 10:49 PM
MaskRay edited the summary of this revision. (Show Details)Feb 16 2021, 10:50 PM
MaskRay added reviewers: rjmccall, rsmith, phosek, rnk.
phosek added inline comments.Feb 17 2021, 12:10 AM
clang/lib/CodeGen/CodeGenModule.cpp
2074

Would it be possible to also include Fuchsia?

MaskRay updated this revision to Diff 324504.Feb 17 2021, 7:45 PM
MaskRay retitled this revision from CodeGen: Set !retain metadata on UsedAttr definitions for Linux/FreeBSD to CodeGen: Set !retain metadata on UsedAttr definitions for FreeBSD/Fuchsia/Linux.

Add Fuchsia

arichardson added inline comments.Feb 18 2021, 1:10 AM
clang/lib/CodeGen/CodeGenModule.cpp
2074

Why does the IR metadata depend on the OS? Shouldn't this decision happen later? Alternatively maybe the check should be for ELF?

MaskRay updated this revision to Diff 325233.Feb 20 2021, 12:50 PM
MaskRay retitled this revision from CodeGen: Set !retain metadata on UsedAttr definitions for FreeBSD/Fuchsia/Linux to Add GNU attribute 'retain'.
MaskRay edited the summary of this revision. (Show Details)

Add 'retain' instead (latest GCC resolution)

MaskRay marked 2 inline comments as done.Feb 20 2021, 12:50 PM
phosek accepted this revision.Feb 20 2021, 5:06 PM

LGTM

clang/test/CodeGen/attr-retain.c
12

There are no FileCheck --check-prefixes=NORETAIN invocations so this is unused.

22

Would it be possible to also include negative check for g2?

This revision is now accepted and ready to land.Feb 20 2021, 5:06 PM
aaron.ballman added inline comments.Feb 21 2021, 8:20 AM
clang/include/clang/Basic/Attr.td
2655

Should this be a target-specific attribute as it only has effects on ELF targets?

clang/lib/Sema/SemaDecl.cpp
2834

I realize you're following the same pattern as the used attribute, but.. why is this code even necessary? The attributes are already marked as being inherited in Attr.td, so I'd not expect to need either of these. (If you don't know the answer off the top of your head, that's fine -- I can investigate on my own.)

clang/test/Sema/attr-retain.c
4

Oof, this is not a particularly friendly diagnostic (there's no mention of *why* the attribute is ignored, so it's hard to know how to respond to the diagnostic as a user).

9

Can you also add a test case showing the attribute does not accept any arguments?

MaskRay updated this revision to Diff 325336.Feb 21 2021, 1:00 PM
MaskRay marked 5 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Comments

clang/include/clang/Basic/Attr.td
2655

As I understand it, GCC retain is not warned on unsupported targets.

Regardless of GCC's choice, I think not having a warning: unknown attribute 'retain' ignored [-Wunknown-attributes] diagnostic makes sense. retain will be usually used together with used. In Clang, used already has "retain" semantics on macOS and Windows (I don't know what they do on GCC; GCC folks want orthogonality for ELF and I agree). If retain is silently ignored on macOS and Windows, then users don't need to condition compile for different targets.

clang/lib/Sema/SemaDecl.cpp
2834

Without setInherited, attr-decl-after-definition.c will fail. (I don't know why..) So I will keep the code but add a test case using the variable tentative.

clang/test/CodeGen/attr-retain.c
22
// CHECK-NEXT: @g1 ={{.*}} global i32 {{.*}} !retain ![[#EMPTY]]
// CHECK-NEXT: @g3 = internal global i32 {{.*}} !retain ![[#EMPTY]]

excludes accidental g2.

clang/test/Sema/attr-retain.c
4

I created the test based on the existing attr-used.c. Let me figure out how to make the diagnostic friendlier...

MaskRay marked an inline comment as done.Feb 21 2021, 1:11 PM

GCC 11 hasn't been released yet, so can we still engage with GCC about the semantics of this attribute? This is a very low-level attribute, and I don't understand why it was made separate instead of just having used add the appropriate section flag on targets that support it.

MaskRay added a comment.EditedFeb 21 2021, 7:02 PM

GCC 11 hasn't been released yet, so can we still engage with GCC about the semantics of this attribute? This is a very low-level attribute, and I don't understand why it was made separate instead of just having used add the appropriate section flag on targets that support it.

GCC did overload used with the linker garbage collection semantics. Then after discussions (e.g. https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565478.html) they decided to pick the earliest suggestion: add retain.

This makes sense to me: used can emit a function which is only used by inline assembly, but the user intention can still be that the whole thing can be GCable.

IIUC macOS's always enabled .subsections_via_symbols means always -ffunction-sections/-fdata-sections, so there is no GC preciseness concern.
However, the current Windows behavior (.drectve include a linker option to force retaining the whole section) can unnecessarily retain the full section, in -fno-function-sections or -fno-data-sections mode.

GCC 11 hasn't been released yet, so can we still engage with GCC about the semantics of this attribute? This is a very low-level attribute, and I don't understand why it was made separate instead of just having used add the appropriate section flag on targets that support it.

GCC did overload used with the linker garbage collection semantics. Then after discussions (e.g. https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565478.html) they decided to pick the earliest suggestion: add retain.

This makes sense to me: used can emit a function which is only used by inline assembly, but the user intention can still be that the whole thing can be GCable.

There's definitely some merit to distinguishing the cases here — used doesn't just force the object to be emitted, it also forces the compiler to treat it as if there might be accesses to the variable / function that it can't see. Thus, for example, a non-const used variable has to be assumed to be potentially mutated in arbitrary ways. I guess my concern is that this new attribute still doesn't feel like it covers the use cases very well, especially because used does have long-established semantics of affecting the linker on non-ELF targets. Basically, to get "full strength" semantics that affect the compiler and linker on all targets, you have to add a new attribute; to get "partial strength" semantics that only affect the compiler, you can use the old attribute on ELF, but there's no way to spell it on non-ELF; and if you wanted to get "weak" semantics (e.g. to force the symbol to be preserved through linking without broadly disabling the optimizer), there's no way to spell that anywhere. Seems like if we're going to the trouble of adding an attribute, that's not the situation we want to end up in.

IIUC macOS's always enabled .subsections_via_symbols means always -ffunction-sections/-fdata-sections, so there is no GC preciseness concern.

Yes, the MachO linker will aggressively dead-strip, and used prevents that.

However, the current Windows behavior (.drectve include a linker option to force retaining the whole section) can unnecessarily retain the full section, in -fno-function-sections or -fno-data-sections mode.

But that Windows behavior is a limitation of the format rather than an intent to provide different semantics on different targets, I'd say.

MaskRay added a comment.EditedFeb 21 2021, 7:55 PM

GCC 11 hasn't been released yet, so can we still engage with GCC about the semantics of this attribute? This is a very low-level attribute, and I don't understand why it was made separate instead of just having used add the appropriate section flag on targets that support it.

GCC did overload used with the linker garbage collection semantics. Then after discussions (e.g. https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565478.html) they decided to pick the earliest suggestion: add retain.

This makes sense to me: used can emit a function which is only used by inline assembly, but the user intention can still be that the whole thing can be GCable.

There's definitely some merit to distinguishing the cases here — used doesn't just force the object to be emitted, it also forces the compiler to treat it as if there might be accesses to the variable / function that it can't see. Thus, for example, a non-const used variable has to be assumed to be potentially mutated in arbitrary ways. I guess my concern is that this new attribute still doesn't feel like it covers the use cases very well, especially because used does have long-established semantics of affecting the linker on non-ELF targets. Basically, to get "full strength" semantics that affect the compiler and linker on all targets, you have to add a new attribute;

Yes.

to get "partial strength" semantics that only affect the compiler, you can use the old attribute on ELF, but there's no way to spell it on non-ELF; and if you wanted to get "weak" semantics (e.g. to force the symbol to be preserved through linking without broadly disabling the optimizer), there's no way to spell that anywhere.

The semantics can be represented on non-ELF, as long as the binary format supports multiple sections of the same name.

Garbage collection is done at section level on both ELF and PE-COFF. On ELF, GCC developers decided that a single used making the monolithic .data/.text retained loses GC preciseness and is not a good tradeoff. (I agree)

So they let the used object be placed in a separate section. There may be multiple sections of the same name, even in -fno-function-sections -fno-data-sections mode.

.section .text,"ax",@progbits
...
.section .text,"axR",@progbits,unique,1
...

This uncovered a Linux kernel problem (I'd say it is that the Linux kernel's linker scripts relies on too much on a not-entirely-guaranteed behavior) http://gcc.gnu.org/PR99113
Their later discussion reconsidered the original retain idea as it made for better separation of concerns.

If PE-COFF wants to pick up retain, it can do that for external linkage symbols: let retain create a separate section and add /INCLUDE:foo to .drectve.
For an internal linkage symbol, I think that is likely non-representable in the binary format. (maybe rnk can correct me:) )
If PE-COFF decides to implement retain, perhaps it should revisit whether used should still set /INCLUDE:foo, to make things more orthogonal.

For macOS, it can make retain pick up the N_NO_DEAD_STRIP behavior if it wants to match ELF.
Though the current uses already retains the subsection.

Seems like if we're going to the trouble of adding an attribute, that's not the situation we want to end up in.

IIUC macOS's always enabled .subsections_via_symbols means always -ffunction-sections/-fdata-sections, so there is no GC preciseness concern.

Yes, the MachO linker will aggressively dead-strip, and used prevents that.

However, the current Windows behavior (.drectve include a linker option to force retaining the whole section) can unnecessarily retain the full section, in -fno-function-sections or -fno-data-sections mode.

But that Windows behavior is a limitation of the format rather than an intent to provide different semantics on different targets, I'd say.

It is a limitation for internal symbols. But for external symbols it can fully match ELF.

GCC 11 hasn't been released yet, so can we still engage with GCC about the semantics of this attribute? This is a very low-level attribute, and I don't understand why it was made separate instead of just having used add the appropriate section flag on targets that support it.

GCC did overload used with the linker garbage collection semantics. Then after discussions (e.g. https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565478.html) they decided to pick the earliest suggestion: add retain.

This makes sense to me: used can emit a function which is only used by inline assembly, but the user intention can still be that the whole thing can be GCable.

There's definitely some merit to distinguishing the cases here — used doesn't just force the object to be emitted, it also forces the compiler to treat it as if there might be accesses to the variable / function that it can't see. Thus, for example, a non-const used variable has to be assumed to be potentially mutated in arbitrary ways. I guess my concern is that this new attribute still doesn't feel like it covers the use cases very well, especially because used does have long-established semantics of affecting the linker on non-ELF targets. Basically, to get "full strength" semantics that affect the compiler and linker on all targets, you have to add a new attribute;

Yes.

to get "partial strength" semantics that only affect the compiler, you can use the old attribute on ELF, but there's no way to spell it on non-ELF; and if you wanted to get "weak" semantics (e.g. to force the symbol to be preserved through linking without broadly disabling the optimizer), there's no way to spell that anywhere.

The semantics can be represented on non-ELF, as long as the binary format supports multiple sections of the same name.

My concern is less whether the semantics are implementable and more whether there's a way of requesting them in the source language. attribute((used)) is expected to prevent link-time stripping on Mach-O and PE/COFF. We can't just change the long-accepted semantics of the existing attribute after something like 20 years; this is documented behavior.

That said, I can understand why GCC is reluctant to make attribute((used)) alone start preventing the linker from stripping symbols that it's been able to strip for years. So I guess really the attribute just has to be understood to have a different meaning on different targets because of this historical divergence. This seems like something we need to document in the manual.

If we wanted to implement the PE/COFF precision improvement, so that used only affected one specific declaration, that seems like something we could probably reasonably do, since that behavior is not documented (and may be inconsistent between linkages anyway, if I understand you right).

GCC 11 hasn't been released yet, so can we still engage with GCC about the semantics of this attribute? This is a very low-level attribute, and I don't understand why it was made separate instead of just having used add the appropriate section flag on targets that support it.

GCC did overload used with the linker garbage collection semantics. Then after discussions (e.g. https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565478.html) they decided to pick the earliest suggestion: add retain.

This makes sense to me: used can emit a function which is only used by inline assembly, but the user intention can still be that the whole thing can be GCable.

There's definitely some merit to distinguishing the cases here — used doesn't just force the object to be emitted, it also forces the compiler to treat it as if there might be accesses to the variable / function that it can't see. Thus, for example, a non-const used variable has to be assumed to be potentially mutated in arbitrary ways. I guess my concern is that this new attribute still doesn't feel like it covers the use cases very well, especially because used does have long-established semantics of affecting the linker on non-ELF targets. Basically, to get "full strength" semantics that affect the compiler and linker on all targets, you have to add a new attribute;

Yes.

to get "partial strength" semantics that only affect the compiler, you can use the old attribute on ELF, but there's no way to spell it on non-ELF; and if you wanted to get "weak" semantics (e.g. to force the symbol to be preserved through linking without broadly disabling the optimizer), there's no way to spell that anywhere.

The semantics can be represented on non-ELF, as long as the binary format supports multiple sections of the same name.

My concern is less whether the semantics are implementable and more whether there's a way of requesting them in the source language. attribute((used)) is expected to prevent link-time stripping on Mach-O and PE/COFF. We can't just change the long-accepted semantics of the existing attribute after something like 20 years; this is documented behavior.

That said, I can understand why GCC is reluctant to make attribute((used)) alone start preventing the linker from stripping symbols that it's been able to strip for years. So I guess really the attribute just has to be understood to have a different meaning on different targets because of this historical divergence. This seems like something we need to document in the manual.

I have documented the behaviors in clang/include/clang/Basic/AttrDocs.td. Do you have suggestions on the wording?

If we wanted to implement the PE/COFF precision improvement, so that used only affected one specific declaration, that seems like something we could probably reasonably do, since that behavior is not documented (and may be inconsistent between linkages anyway, if I understand you right).

I have documented the behaviors in clang/include/clang/Basic/AttrDocs.td. Do you have suggestions on the wording?

Included; please let me know what you think.

clang/include/clang/Basic/AttrDocs.td
63–76
83–91
aaron.ballman added inline comments.Feb 22 2021, 4:50 AM
clang/include/clang/Basic/Attr.td
2655

The other side of that is a user who writes only retain and expects it to do something when it's actually being silently ignored. While they may usually use it in conjunction with used, I'm more worried about the situation where the user is possibly confused.

clang/lib/Sema/SemaDecl.cpp
2834

Thanks, if I get the chance, I'll try to look into why this code is necessary (and make adjustments if needed).

MaskRay updated this revision to Diff 325486.Feb 22 2021, 10:29 AM
MaskRay marked 2 inline comments as done.

incorporate rjmccall's doc suggestion

clang/include/clang/Basic/Attr.td
2655

retain without used can be used on some external linkage definitions, as well as internal linkage definitions which are referenced by live sections. I agree there could be some confusion, but hope with mccall's suggestion (thanks) the documentation is clear.

aaron.ballman added inline comments.Feb 22 2021, 10:46 AM
clang/include/clang/Basic/Attr.td
2655

If retain without used is an expected usage pattern, then I think we need a diagnostic when we ignore the retain attribute. I don't think it is reasonable to expect users to read the documentation because they won't know that they've misused the attribute when we silently ignore it.

Alternatively, would it be possible to make retain useful on all targets? e.g., when retain is used by itself on a declaration compiled for macOS or Windows, the backend does whatever it would normally do for used?

MaskRay updated this revision to Diff 325499.Feb 22 2021, 11:08 AM

Add tests that Sema discarded unused functions are warned (nor different from the case without 'retain')

MaskRay added inline comments.Feb 22 2021, 11:13 AM
clang/include/clang/Basic/Attr.td
2655

There is the normal behavior: __attribute__((retain)) static void f1() {} // expected-warning {{unused function 'f1'}}.

Sema.cpp:1227 has the unused diagnostics. There are already many different versions of diagnostics. Do you suggest another diagnostic line for used is needed? If you think so, I'll need to figure out how to do that...

Added some tests leveraging the existing diagnostic code.

rjmccall added inline comments.Feb 22 2021, 11:17 AM
clang/include/clang/Basic/Attr.td
2655

We could definitely support retain-without-used with the ELF semantics on all targets if we think they're useful.

MaskRay added inline comments.
clang/include/clang/Basic/Attr.td
2655

This is the case where LangRef definition reflects the state on Mach-O but not on ELF (and not on PE-COFF before 2018-01).

https://reviews.llvm.org/rG99f479abcf2c9b36daad04eb91cd0aafa659bb1d (CC @compnerd) (2018-01) is the commit. We could let !retain (if we choose to use !retain as the representation) lower to similar /INCLUDE: directives in the .drectve section.

aaron.ballman added inline comments.Feb 22 2021, 11:50 AM
clang/include/clang/Basic/Attr.td
2655

I was thinking more along the lines of:

def Retain : InheritableAttr, TargetSpecificAttr<???> {
  ...
}

so that on targets where retain is ignored, the user gets an unknown attribute diagnostic as with other target-specific attributes. However, I see now that the attribute being passed along to the backend in all situations and so it isn't really a target-specific attribute in the same way as MSNoVTable (etc) is. It's not that retain isn't sensible on those targets, it's that different backends will want to handle the attribute differently and that might include it being a harmless noop. Is that a correct understanding? If so, then I don't think we need to make it target-specific or otherwise diagnose it.

MaskRay marked an inline comment as done.Feb 22 2021, 12:11 PM
MaskRay added inline comments.
clang/include/clang/Basic/Attr.td
2655

However, I see now that the attribute being passed along to the backend in all situations and so it isn't really a target-specific attribute in the same way as MSNoVTable (etc) is.

Yes. It depends on whether the backend can translate !retain into something affecting linker garbage collection behavior.
On ELF, it works via SHF_GNU_RETAIN (requires bleeding-edge toolchain).
On other binary formats, it is currently a no-op. Further notes:

On COFF, I can make !retain work by using /INCLUDE: for non-local-LLVM-linkage definitions.
On macOS, the maintainer can decide whether it is useful to set the S_ATTR_NO_DEAD_STRIP symbol attribute for !retain.

If so, then I don't think we need to make it target-specific or otherwise diagnose it.

Thanks! Marks as resolved.

aaron.ballman accepted this revision.Feb 22 2021, 12:21 PM

LGTM aside from some minor improvements to the documentation.

clang/include/clang/Basic/Attr.td
2655

Thank you for your patience while I figured this out. :-)

clang/include/clang/Basic/AttrDocs.td
63
85

Yeah, both COFF and Mach-O have longstanding ways to protect linker dead-stripping, and the compiler already has to manually trigger them on those targets for used, so it's certainly implementable to also trigger them for retain-without-used. I just don't think it's a very good feature. It seems to me that the use cases of retain-without-used all basically boil down to "Definition A relies on Definition B in some way that isn't just a symbol reference." retain-without-used is at best a very rough way of achieving that underlying goal of forcing Definition B to be emitted if something requires Definition A, and what we really want is a way to express that dependence so that the compiler/linker can still strip Definition B if Definition A is also stripped.

Also in this space, I think we still don't have a way to express llvm.compiler_used in the source language, i.e. "feel free to dead-strip this if it's never referenced, but if it is referenced, there will be funny uses of it that the compiler can't reason about".

rnk added a comment.Feb 23 2021, 10:06 AM

It should be possible to make retain work on COFF for internal linkage things by emitting a non-comdat section, even when /Gy / -ffunction/data-sections are enabled. That's complicated if the thing being retained is in a comdat group, but I think it's doable. If the group is being retained and the leader is internal, then all of the sections in the group should be retained by GC, and none of them need to be marked comdat, and they won't participate in GC.

probinson added inline comments.
clang/include/clang/Basic/AttrDocs.td
98

As a user, this seems like a complicated thing to figure out.
Also it seems to understate the case; if you have a linker that does understand SHF_GNU_RETAIN, then you *have* to use attribute retain to get the effect you want; otherwise your code breaks, probably silently.
And it looks like if you happen to combine an older compiler with a new-enough linker, you can never get it to do what you want.

I hope I'm wrong about these things, but that's what the docs suggest.

MaskRay marked an inline comment as done.Feb 23 2021, 11:14 AM
MaskRay added inline comments.
clang/include/clang/Basic/AttrDocs.td
98

Also it seems to understate the case; if you have a linker that does understand SHF_GNU_RETAIN, then you *have* to use attribute retain to get the effect you want; otherwise your code breaks, probably silently.

For ELF, used does not retain the entity, regardless of this patch.

This means on https://llvm.org/docs/LangRef.html#the-llvm-used-global-variable , the "linker" part has been false for ELF, and had been false for COFF until 2018-01 (https://reviews.llvm.org/rG99f479abcf2c9b36daad04eb91cd0aafa659bb1d).

If a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see (which is why they have to be named).

llvm.compiler.used's description has

This is a rare construct that should only be used in rare circumstances, and should not be exposed to source languages.

For ELF, used does not retain the entity, regardless of this patch.

That statement does not match my experience.

Note that there are two separate variables documented in the LangRef: llvm.used (which corresponds to source attribute used and is documented as affecting linker behavior) and llvm.compiler.used (which is a rare construct with no corresponding source attribute, and does not affect linker behavior). So, according to LangRef, source attribute used maps to llvm.used which is documented as affecting linker behavior, and that's exactly the behavior I am seeing in my project.

But from a user perspective, the IR representation is irrelevant, I don't actually care what the description of llvm.used says. I put attribute used on my data, and it is still there in the final executable. That's how things have worked for a long time, and I am concerned that you are about to break it. Note that I am not using groups or comdats or anything else funky; I'm declaring static variables in a custom section, adding attribute used and the necessary __start_/__stop_ symbols, and everything works exactly as I want.

Maybe there is a different patch that more directly relates to my concern, and probably we should be having this conversation on the llvm-dev thread instead of in a patch that not many people are paying attention to. I'm very happy to move the discussion elsewhere if that would help.

MaskRay added a comment.EditedFeb 23 2021, 1:18 PM

For ELF, used does not retain the entity, regardless of this patch.

That statement does not match my experience.

You probably used C identifier name sections with __attribute__((used, section("foo"))).
used prevents clang/GCC from deleting unused definitions, but does not prevent GNU ld/gold/LLD from garbage collecting them.
The __start_foo reference from a live section retains all input sections foo: https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order#c-identifier-name-sections

Note that there are two separate variables documented in the LangRef: llvm.used (which corresponds to source attribute used and is documented as affecting linker behavior) and llvm.compiler.used (which is a rare construct with no corresponding source attribute, and does not affect linker behavior). So, according to LangRef, source attribute used maps to llvm.used which is documented as affecting linker behavior, and that's exactly the behavior I am seeing in my project.

As the description of this patch suggests, llvm.usedhas linker behavior effects on macOS with all symbols, and Windows with non-local-linkage symbols, but no effect on ELF.
The behavior you observed is due to __start_/__stop_ references from live input sections. (D96914 https://lists.llvm.org/pipermail/llvm-dev/2021-February/148682.html)

But from a user perspective, the IR representation is irrelevant, I don't actually care what the description of llvm.used says. I put attribute used on my data, and it is still there in the final executable. That's how things have worked for a long time, and I am concerned that you are about to break it.

You probably meant the LLD patch D96914. It does not change the default now.

Note that I am not using groups or comdats or anything else funky; I'm declaring static variables in a custom section, adding attribute used and the necessary __start_/__stop_ symbols, and everything works exactly as I want.

If you use a non-C-identifier-name section, then __start_ does not work. __attribute__((used)) does not retain it on ELF.

Maybe there is a different patch that more directly relates to my concern, and probably we should be having this conversation on the llvm-dev thread instead of in a patch that not many people are paying attention to. I'm very happy to move the discussion elsewhere if that would help.

So GCC folks' opinions is that used is orthogonal to retain. From this viewpoint, __attribute__((used)) will be better represented as llvm.compiler.used, but LangRef currently discourages this usage.
I am a bit busy currently, but if you move the discussion to llvm-dev, I'll be happy to discuss it there.

Aha; attribute used *by itself* is not sufficient to preserve sections in the output. But the __start_/__stop_ symbols implicitly create a reference to each of the named sections, and that implicit reference can preserve them in the output (assuming gc roots etc).
So, the idea is that attribute retain can be used *instead* of the __start_/__stop_ symbols, to preserve sections in the output (with the advantage that it will work even for sections that do not have a C-identifier name).

Thanks for helping me understand this from a user perspective. That will be important when you go to write the release note for this new attribute.

MaskRay added a comment.EditedFeb 24 2021, 9:28 AM

We have two implementation choices.

  1. Lift the source language restriction (it is currently discouraged) on llvm.compiler.used, let llvm.used use SHF_GNU_RETAIN on ELF, and change __attribute__((used)) to use llvm.compiler.used on ELF.
  2. Don't touch the backend semantics of llvm.used or llvm.compiler.used. Add a metadata !retain and let __attribute__((retain)) use that. This is what has been implemented in D96837 and this patch.

I posted https://lists.llvm.org/pipermail/llvm-dev/2021-February/148760.html (llvm-dev & cfe-dev) (my first message mentions that I lean to 1. I actually lean to 2.)

Seems that whatever we choose, llvm.used and llvm.compiler.used's semantics in LangRef may need further clarification.

MaskRay abandoned this revision.Feb 24 2021, 11:43 PM

Thanks folks for review. Folks are more happy with approach 1 on https://lists.llvm.org/pipermail/llvm-dev/2021-February/148760.html , so I am abandoning this.

I have copied the documentation and tests to D97447.