This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Add -z start-stop-gc to let __start_/__stop_ not retain C identifier name sections
ClosedPublic

Authored by MaskRay on Feb 17 2021, 3:59 PM.

Details

Summary

For one metadata section usage, each text section references a metadata section.
The metadata sections have a C identifier name to allow the runtime to collect them via __start_/__stop_ symbols.

Since __start_/__stop_ references are always present from live sections, the
C identifier name sections appear like GC roots, which means they cannot be
discarded by ld --gc-sections.
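
As an illustration of the pattern described above, here is a minimal sketch of a metadata section collected through __start_/__stop_ symbols. The section name my_handlers, the struct layout, and the REGISTER_HANDLER macro are hypothetical names chosen for the example and are not taken from this patch.

```c
#include <stddef.h>

/* One entry of a hypothetical metadata section named "my_handlers". */
struct handler {
    const char *name;
    int value;
};

/* Place an entry in the section "my_handlers". Because the section name is
   a valid C identifier, ELF linkers define __start_my_handlers and
   __stop_my_handlers at the bounds of the output section. "used" only stops
   the compiler from dropping the entry; it does not affect linker GC. */
#define REGISTER_HANDLER(ident, val)                               \
    static const struct handler handler_##ident                    \
        __attribute__((used, section("my_handlers"))) = {#ident, (val)}

REGISTER_HANDLER(alpha, 1);
REGISTER_HANDLER(beta, 2);
REGISTER_HANDLER(gamma, 3);

/* Bounds symbols provided by the linker; the program never defines them. */
extern const struct handler __start_my_handlers[];
extern const struct handler __stop_my_handlers[];

/* Number of entries the linker placed in the section. */
size_t handler_count(void) {
    return (size_t)(__stop_my_handlers - __start_my_handlers);
}

/* Walk the section the way a runtime would collect its metadata. */
int handler_sum(void) {
    int sum = 0;
    for (const struct handler *h = __start_my_handlers;
         h != __stop_my_handlers; ++h)
        sum += h->value;
    return sum;
}
```

Any live function that calls handler_count() or handler_sum() references the bounds symbols, which is the reference that makes every my_handlers input section look like a GC root under the traditional rule.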

To make such sections GCable, either SHF_LINK_ORDER or a section group is needed.

SHF_LINK_ORDER is not suitable because the references can be inlined into other functions
(See D97430:
Function A (in the section .text.A) references its __sancov_guard section.
Function B inlines A (so now .text.B references __sancov_guard - this is invalid with the semantics of SHF_LINK_ORDER).

In the linking stage,
if .text.A gets discarded, and __sancov_guard is retained via the reference from .text.B,
the output will be invalid because __sancov_guard references the discarded .text.A.
LLD errors "sh_link points to discarded section".
)

A section group has size overhead, and is cumbersome when there is just one metadata section.

Add -z start-stop-gc to drop the "__start_/__stop_ references retain
non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections" rule.
We reserve the right to switch the default in the future.
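
The effect of dropping the rule can be sketched with a toy mark phase. This is not lld's implementation: struct toy_section, mark, and count_live are hypothetical names, and the model assumes no section uses SHF_LINK_ORDER or a section group.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 4

/* Toy model of an input section; not lld's data structure. */
struct toy_section {
    const char *name;            /* section name */
    int refs[MAX_REFS];          /* indices of sections reached by relocations */
    int nrefs;                   /* number of valid entries in refs */
    const char *start_stop_ref;  /* X if it references __start_X/__stop_X */
    bool live;
};

static void mark(struct toy_section *secs, int n, int idx, bool start_stop_gc) {
    if (secs[idx].live)
        return;
    secs[idx].live = true;
    for (int i = 0; i < secs[idx].nrefs; ++i)
        mark(secs, n, secs[idx].refs[i], start_stop_gc);
    /* Traditional rule: a __start_X/__stop_X reference from a live section
       retains every section named X. With start_stop_gc it is skipped. */
    if (secs[idx].start_stop_ref != NULL && !start_stop_gc) {
        for (int j = 0; j < n; ++j)
            if (strcmp(secs[j].name, secs[idx].start_stop_ref) == 0)
                mark(secs, n, j, start_stop_gc);
    }
}

/* Mark from the single root (.text.main) and count the live sections. */
int count_live(bool start_stop_gc) {
    struct toy_section secs[] = {
        {".text.main",    {1, 2}, 2, NULL,   false}, /* GC root */
        {".text.used",    {4},    1, NULL,   false}, /* references its metadata */
        {".text.runtime", {0},    0, "meta", false}, /* walks __start_meta..__stop_meta */
        {".text.unused",  {5},    1, NULL,   false}, /* never referenced */
        {"meta",          {0},    0, NULL,   false}, /* metadata of .text.used */
        {"meta",          {0},    0, NULL,   false}, /* metadata of .text.unused */
    };
    int n = (int)(sizeof secs / sizeof secs[0]);
    mark(secs, n, 0, start_stop_gc);
    int live = 0;
    for (int i = 0; i < n; ++i)
        live += secs[i].live ? 1 : 0;
    return live;
}
```

In this model the metadata section of .text.unused survives only under the traditional rule. With the rule dropped, it is discarded together with the function it describes, while the metadata of .text.used stays live through the ordinary relocation from its text section.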

Diff Detail

Event Timeline

MaskRay created this revision.Feb 17 2021, 3:59 PM
MaskRay requested review of this revision.Feb 17 2021, 3:59 PM
Herald added a project: Restricted Project.Feb 17 2021, 3:59 PM
MaskRay updated this revision to Diff 324460.Feb 17 2021, 4:11 PM

add a test

Personally, I'm really not a fan of hardcoding/special casing names of internal symbols from libclang_rt.profile and I'd prefer a more generic solution like SHF_GNU_RETAIN.

I understand that there would be an issue when you try to link libclang_rt.profile that was compiled by an older GCC/Clang (which doesn't support SHF_GNU_RETAIN) using the latest lld, but I'm not sure if that's a use case we want to support. This was discussed several times in the past, and the conclusion has always been that you should either use the in-tree Clang to build runtimes, or that you should build runtimes separately from the compiler in which case you need to configure the build appropriately.

We could introduce an additional option in the compiler-rt profile build for that purpose which would enable/disable the use of the .data.PERSIST_SIG trick (used by AFL) as one potential solution for older compilers.

MaskRay updated this revision to Diff 324539.Feb 17 2021, 11:41 PM
MaskRay edited the summary of this revision.

Drop __llvm_prf_* workaround

MaskRay added inline comments.Feb 17 2021, 11:44 PM
lld/test/ELF/gc-sections-metadata-startstop.s
22

Will delete this stray comment.

phosek added inline comments.Feb 17 2021, 11:44 PM
lld/test/ELF/gc-sections-metadata-startstop.s
22–30

Leftover from previous version?

phosek accepted this revision.Feb 18 2021, 11:41 AM

LGTM (% stale comment), thanks for figuring out the solution for __llvm_prf_*.

This revision is now accepted and ready to land.Feb 18 2021, 11:41 AM
MaskRay edited the summary of this revision.Feb 19 2021, 3:32 PM
jrtc27 requested changes to this revision.Feb 19 2021, 5:30 PM
jrtc27 added a subscriber: jrtc27.

FreeBSD uses linker sets extensively. Do not remove this, you will break FreeBSD, both the kernel and userspace.

This revision now requires changes to proceed.Feb 19 2021, 5:30 PM

Please can you ensure that this is tested with some Objective-C code compiled with -fobjc-runtime=gnustep-2.0? If I am reading the intention correctly, it may result in all of the Objective-C code being dropped from the final link.

These types of reference can be difficult as it is difficult to know what the user's intention with respect to gc_root is. I think that the majority of programs don't depend on all sections being marked live but there are definitely programs out there that do. It may not be easy to migrate all these programs to use alternative means such as SHF_RETAIN. Would it be useful to make this a selectable option for users with lots of existing code that depends on these sections being marked live?

As an aside, in Arm's proprietary linker we have had some experience with trying to come up with a more complex way of handling these types of reference as we found that keeping everything was too restrictive for the use case we had for .ARM.exidx sections. We wanted to always generate unwind tables (.ARM.exidx), but to be able to remove all the .ARM.exidx sections and all the exception handling code from the library that processes them if no call to throw was found in the program; in effect a semi-automatic -fno-exceptions. A key part of that was only referencing the .ARM.exidx sections via the equivalent of _start and _stop sections. Rather than special case this we wanted to come up with a general mechanism.

I wish I could say we came up with a good automatic solution but we couldn't find anything particularly justifiable. Our thought was to mark sections that were reachable only via a chain of dependencies (relocations or link-order) from _start and _stop symbols as weakly used. Sections reachable from a GC root were strongly used. A weakly used section could be turned into strongly used if one of the weakly used sections had a dependency on a strongly used section defined in the same object file as the weakly used section (the last bit was in effect a hack as we found the gc ineffective without it). This worked reasonably well for .ARM.exidx sections. However while it was intended to generalise to other use cases, such as throwing away unused C++ static constructors, we found this broke too many programs so the whole mechanism only got used for exceptions.

lld/test/ELF/startstop-gccollect.s
8 ↗(On Diff #324539)

Presumably you'd need to update the comment if the change went through.

MaskRay updated this revision to Diff 325701.Feb 23 2021, 12:46 AM
MaskRay marked an inline comment as done.
MaskRay retitled this revision from [ELF] Don't let __start_/__stop_ retain C identifier name sections to [ELF] Add -z start-stop-gc to let __start_/__stop_ not retain C identifier name sections.
MaskRay edited the summary of this revision.

Add -z start-stop-gc

It does not link to a usage requiring the rule __start_/__stop_ references retain non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.

FreeBSD uses linker sets extensively. Do not remove this, you will break FreeBSD, both the kernel and userspace.

I think I'd echo bjk's words: "'that breaks linker sets entirely' seems like something that would benefit from a paragraph or two of additional exposition". Perhaps @dim can help discuss this on a FreeBSD mailing list (I am not subscribed to them).

Please can you ensure that this is tested with some Objective-C code compiled with -fobjc-runtime=gnustep-2.0? If I am reading the intention correctly, it may result in all of the Objective-C code being dropped from the final link.

I know almost nothing about Objective-C. Can you name the section which could be problematic?

% clang -ffunction-sections -fobjc-runtime=gnustep-2.0 -isystem /usr/include/GNUstep a.m
In file included from a.m:1:
/usr/include/GNUstep/Foundation/Foundation.h:31:9: fatal error: 'objc/objc.h' file not found
#import <objc/objc.h>
        ^~~~~~~~~~~~~
1 error generated.

Thanks for the update. I'm happy for this to be an option that defaults to off. That won't break any existing code that is depending on the behavior.

jrtc27 accepted this revision.Feb 24 2021, 3:07 PM

Thanks, having it opt-in for the times it's needed is fine by me (although anyone opting in should of course be careful that it really is safe). Possibly worth a note in the manpage that it's not always safe?

This revision is now accepted and ready to land.Feb 24 2021, 3:07 PM
MaskRay edited the summary of this revision.Feb 25 2021, 3:44 PM
MaskRay edited the summary of this revision.Feb 25 2021, 3:47 PM

Please can you ensure that this is tested with some Objective-C code compiled with -fobjc-runtime=gnustep-2.0? If I am reading the intention correctly, it may result in all of the Objective-C code being dropped from the final link.

I know almost nothing about Objective-C. Can you name the section which could be problematic?

% clang -ffunction-sections -fobjc-runtime=gnustep-2.0 -isystem /usr/include/GNUstep a.m
In file included from a.m:1:
/usr/include/GNUstep/Foundation/Foundation.h:31:9: fatal error: 'objc/objc.h' file not found
#import <objc/objc.h>
        ^~~~~~~~~~~~~
1 error generated.

Answering this: there are some llvm.used usages for ObjC (D97446). D97448 will handle them.

Thanks, having it opt-in for the times it's needed is fine by me (although anyone opting in should of course be careful that it really is safe). Possibly worth a note in the manpage that it's not always safe?

My approval for this change was conditional on it being opt-in. In 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 you flipped the default, committing without review. As anticipated, this has broken things in FreeBSD. I don't know the extent of it, @dim can perhaps comment, but I am not impressed by that commit after I specifically told you this had implications for FreeBSD's use of linker sets.

MaskRay added a comment.EditedSep 7 2021, 12:58 PM

Thanks, having it opt-in for the times it's needed is fine by me (although anyone opting in should of course be careful that it really is safe). Possibly worth a note in the manpage that it's not always safe?

My approval for this change was conditional on it being opt-in. In 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 you flipped the default, committing without review. As anticipated, this has broken things in FreeBSD. I don't know the extent of it, @dim can perhaps comment, but I am not impressed by that commit after I specifically told you this had implications for FreeBSD's use of linker sets.

On March 1, 2021, I sent a message with ID "CAN30aBHe1YLGxMQv6s2x1BKp7Osg_kUHoVCKFAJST+=gMJY39g@mail.gmail.com" to inform emaste and dim.
@emaste replied: "However, if GNU ld / gold / lld default to GCing things we could set the option to reenable it in kernel/userland Makefiles."

If FreeBSD's use of linker sets has any problems, I think they'll happily adapt.

hvdijk added a subscriber: hvdijk.EditedOct 30 2021, 1:49 PM

That followup 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 gives off the impression that aside from systemd, nothing else would be broken by the change. That is incorrect, enough time has passed since the traditional GNU ld behaviour was ubiquitous that more software has started relying on the new GNU ld behaviour, and lld 13 breaks at least NetworkManager too. (I have reported this to NetworkManager.) What's the right thing to do here? At the very least, I think the release notes simply listing this as an improvement rather than as a breaking change is not enough info for users to beware that their code may need updating, but depending on whether the breakage affects enough other software, maybe it should be reverted on the 13.x branch as well?

It's worse. While trying to patch NetworkManager to add __attribute__((retain)), I found that __has_attribute(__retain__) is effectively broken in GCC, which you already knew about: you reported it to GCC (bug 99587). The switch really should have been left alone until that was fixed first: flipping the switch has left us in a state where code is broken and needs modification to work, but the way we should modify the code for clang, the way we generally recommend in documentation, breaks the code for GCC.
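
For readers unfamiliar with the attribute, the kind of change under discussion might look like the following sketch. The macro SET_ENTRY_ATTRS and the section name nm_set are hypothetical and are not NetworkManager's code. The feature test is exactly the part reported above as unreliable on GCC, so treat the #if as an assumption rather than a robust check.

```c
#include <stddef.h>

/* Compilers without __has_attribute are treated as lacking "retain". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* "used" keeps the compiler from dropping the entry; "retain" additionally
   asks the linker (via SHF_GNU_RETAIN) to keep it under --gc-sections.
   Assumption: the feature test reports support accurately, which the
   comment above says is not always the case with GCC. */
#if __has_attribute(retain)
#define SET_ENTRY_ATTRS(sec) __attribute__((used, retain, section(sec)))
#else
#define SET_ENTRY_ATTRS(sec) __attribute__((used, section(sec)))
#endif

static const int entry_a SET_ENTRY_ATTRS("nm_set") = 10;
static const int entry_b SET_ENTRY_ATTRS("nm_set") = 32;

/* Linker-defined bounds of the "nm_set" output section. */
extern const int __start_nm_set[];
extern const int __stop_nm_set[];

/* Sum every entry the linker kept in the section. */
int nm_set_sum(void) {
    int sum = 0;
    for (const int *p = __start_nm_set; p != __stop_nm_set; ++p)
        sum += *p;
    return sum;
}
```

When the attribute is reported as available but the toolchain cannot actually emit SHF_GNU_RETAIN, the entries still land in the section but get no protection from linker GC, which is the failure mode the comment describes.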

MaskRay added a comment.EditedOct 30 2021, 4:41 PM

That followup 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 gives off the impression that aside from systemd, nothing else would be broken by the change. That is incorrect, enough time has passed since the traditional GNU ld behaviour was ubiquitous that more software has started relying on the new GNU ld behaviour, and lld 13 breaks at least NetworkManager too. (I have reported this to NetworkManager.) What's the right thing to do here?

Thanks for reporting the bug to NetworkManager.

At the very least, I think the release notes simply listing this as an improvement rather than as a breaking change is not enough info for users to beware that their code may need updating, but depending on whether the breakage affects enough other software, maybe it should be reverted on the 13.x branch as well?

No, I don't think the additional report from NetworkManager is sufficient to justify restoring the GNU ld buggy behavior.
It's not that many projects depend on the unfortunate behavior and I don't think restoring the behavior now helps any project.
Postponing the transition would just cause pain to users of various metadata sections (LLVM PGO).

NetworkManager can either fix their code or not use -Wl,--gc-sections.

jrtc27 added a comment.EditedNov 2 2021, 1:43 PM

This also breaks LDC, which uses __start___minfo/__stop___minfo and enables --gc-sections.

jrtc27 added a comment.Nov 2 2021, 1:45 PM

That followup 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 gives off the impression that aside from systemd, nothing else would be broken by the change. That is incorrect, enough time has passed since the traditional GNU ld behaviour was ubiquitous that more software has started relying on the new GNU ld behaviour, and lld 13 breaks at least NetworkManager too. (I have reported this to NetworkManager.) What's the right thing to do here?

Thanks for reporting the bug to NetworkManager.

At the very least, I think the release notes simply listing this as an improvement rather than as a breaking change is not enough info for users to beware that their code may need updating, but depending on whether the breakage affects enough other software, maybe it should be reverted on the 13.x branch as well?

No, I don't think the additional report from NetworkManager is sufficient to justify restoring the GNU ld buggy behavior.

It wasn't a bug, it was a feature that you broke.

It's not that many projects depend on the unfortunate behavior and I don't think restoring the behavior now helps any project.

It helps all the projects that are broken by this feature they rely on being removed, with no easy way to work around it in some cases.

Postponing the transition would just cause pain to users of various metadata sections (LLVM PGO).

Or you could fix those to be GC-able via some other means.

NetworkManager can either fix their code or not use -Wl,--gc-sections.

emaste added a comment.Nov 2 2021, 1:55 PM

@emaste replied: "However, if GNU ld / gold / lld default to GCing things we could set the option to reenable it in kernel/userland Makefiles."

Right, we can accommodate a changed default relatively easily for the FreeBSD base system. It is more work for the ports tree and quite unfortunate if different linkers behave differently.

MaskRay added a comment.EditedNov 2 2021, 8:01 PM

That followup 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 gives off the impression that aside from systemd, nothing else would be broken by the change. That is incorrect, enough time has passed since the traditional GNU ld behaviour was ubiquitous that more software has started relying on the new GNU ld behaviour, and lld 13 breaks at least NetworkManager too. (I have reported this to NetworkManager.) What's the right thing to do here?

Thanks for reporting the bug to NetworkManager.

At the very least, I think the release notes simply listing this as an improvement rather than as a breaking change is not enough info for users to beware that their code may need updating, but depending on whether the breakage affects enough other software, maybe it should be reverted on the 13.x branch as well?

No, I don't think the additional report from NetworkManager is sufficient to justify restoring the GNU ld buggy behavior.

It wasn't a bug, it was a feature that you broke.

It's not that many projects depend on the unfortunate behavior and I don't think restoring the behavior now helps any project.

It helps all the projects that are broken by this feature they rely on being removed, with no easy way to work around it in some cases.

-Wl,-z,nostart-stop-gc is an easy workaround.
The option is even supported by newer GNU ld, though it hasn't switched the default yet.

Postponing the transition would just cause pain to users of various metadata sections (LLVM PGO).

Or you could fix those to be GC-able via some other means.

It cannot. The very problem of the 2015-10 GNU ld behavior is that it essentially makes all metadata sections using C identifier section names not GCable.
It is a built-in rule which cannot be overridden.

We could add ad-hoc rules like SHF_GROUP sections could still be GCed. That would help some metadata sections but not others which do not use section groups.
Section groups have high costs (sizeof(Elf64_Shdr) = 64) and may be too expensive to enable.
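
The 64-byte figure can be checked with a small sketch. The struct below mirrors the field layout of Elf64_Shdr but is declared locally (as toy_elf64_shdr, a name made up for this example) so the block does not depend on a system <elf.h>. The overhead estimate is an assumption that counts only the SHT_GROUP section header and its contents, ignoring any symbol table or string table growth.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field widths and order as Elf64_Shdr in the ELF-64 object format. */
struct toy_elf64_shdr {
    uint32_t sh_name;
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
};

_Static_assert(sizeof(struct toy_elf64_shdr) == 64,
               "an ELF64 section header occupies 64 bytes");

/* Rough per-group cost in a relocatable object: one extra section header
   for the SHT_GROUP section, plus its contents (one 32-bit flag word and
   one 32-bit section index per member). */
size_t group_overhead_bytes(size_t members) {
    return sizeof(struct toy_elf64_shdr) + sizeof(uint32_t) * (1 + members);
}
```

Under these assumptions a group wrapping a single metadata section costs 72 bytes per group in the object file, which adds up when every function carries its own metadata section.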

LDC is even less of a problem because it already needs many changes when importing LLVM changes.
Well, I have fixed some LLVM upgrade issues for it when I was still using D.

These projects rely on the post-2015-10 GNU ld behavior. While I feel sympathy for them, and I think some things could have been done better (though I heard that for Chrome OS folks this was quite a smooth transition), the one-shot pain is probably still acceptable.
The longer we wait, the more problems there could be, and the more painful it would be to flip the switch.
I think at this point we have mostly passed the finite window of time where problems could have been detected.
Finally I appreciate that FreeBSD folks can find and report problems in packages.
I am also happy to be CCed if some packages need upstream communication.

NetworkManager can either fix their code or not use -Wl,--gc-sections.

jrtc27 added a comment.Nov 2 2021, 8:11 PM

That followup 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 gives off the impression that aside from systemd, nothing else would be broken by the change. That is incorrect, enough time has passed since the traditional GNU ld behaviour was ubiquitous that more software has started relying on the new GNU ld behaviour, and lld 13 breaks at least NetworkManager too. (I have reported this to NetworkManager.) What's the right thing to do here?

Thanks for reporting the bug to NetworkManager.

At the very least, I think the release notes simply listing this as an improvement rather than as a breaking change is not enough info for users to beware that their code may need updating, but depending on whether the breakage affects enough other software, maybe it should be reverted on the 13.x branch as well?

No, I don't think the additional report from NetworkManager is sufficient to justify restoring the GNU ld buggy behavior.

It wasn't a bug, it was a feature that you broke.

It's not that many projects depend on the unfortunate behavior and I don't think restoring the behavior now helps any project.

It helps all the projects that are broken by this feature they rely on being removed, with no easy way to work around it in some cases.

-Wl,-z,nostart-stop-gc is an easy workaround.

Not if you're in code that doesn't know what linker, let alone the version of it, is being used. If you don't use it for new LLD you break because your sections are bogusly GC'ed. If you use it for old LLD or BFD you break because the option doesn't exist and gives an error. Even Clang's driver can't know if the option is supported, so how do you expect other projects using similar code (like LDC) to do so? Just because you can detect the linker at toolchain build time doesn't mean you know what linker will be used at toolchain run time.

Postponing the transition would just cause pain to users of various metadata sections (LLVM PGO).

Or you could fix those to be GC-able via some other means.

It cannot. The very problem of the 2015-10 GNU ld behavior is that it essentially makes all metadata sections using C identifier section names not GCable.
It is a built-in rule which cannot be overridden.

Sure you can. If you can add SHF_GNU_RETAIN, you can add SHF_GNU_IT_IS_SAFE_TO_GC_ME. Or you can do a proper transition (see below).

We could add ad-hoc rules like SHF_GROUP sections could still be GCed. That would help some metadata sections but not others which do not use section groups.
Section groups have high costs (sizeof(Elf64_Shdr) = 64) and may be too expensive to enable.

LDC is even less of a problem because it already needs many changes when importing LLVM changes.
Well, I have fixed some LLVM upgrade issues for it when I was still using D.

That is irrelevant. This is not about what libLLVM version it links against. This is about what the system linker is, which could be BFD, gold or LLD, and has zero connection to the libLLVM version used.

These projects rely on the post-2015-10 GNU ld behavior. While I feel sympathy for them, I think they should take the one-shot pain.
The longer we wait, the more problems there could be, and the more difficult it would be to flip the switch.

This is not how you do a transition. You do a transition by adding the option, then _several years later_ flipping the default so that it can just be assumed to exist. You _can't_ add the option and flip the default _in the same release cycle_, that leads to a mess.

From Chrome OS folks I heard that the transition was still quite smooth.
I think at this point we have mostly passed the finite window of time where problems could have been detected.

ChromeOS is hardly a distribution shipping tens of thousands of packages. It's an OS that's just Chrome, maybe a bit more by now but still fundamentally nothing in comparison to a full FreeBSD or Linux distro.

jrtc27 added a comment.Nov 2 2021, 8:13 PM

And, of course, all of this completely neglects the fact that it ends up entirely breaking static linking, where the .a has no control over whether it gets linked correctly or not. It is now impossible to reliably use linker sets in a static library.

jrtc27 added a subscriber: tstellar.Nov 2 2021, 8:24 PM

I am therefore requesting that 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 be reverted on both main and release/13.x due to breakage across multiple projects in ways that cannot reasonably be fixed in a robust manner, and that it remain reverted until a transition plan is put forward and agreed upon, including with GNU ld developers if the plan involves ELF extensions.

Cc @tstellar due to the new option and changed default both first appearing in the 13 release

@jrtc27 Can you file a bug for this?

@jrtc27 Can you file a bug for this?

@tstellar I think @jrtc27 is overreacting on this issue. I'd be strongly in favor of keeping the status quo.

Reverting would just do more harm. Newer instrumentation technology will suffer more from the unneeded size bloat.

ChromeOS is hardly a distribution shipping tens of thousands of packages. It's an OS that's just Chrome, maybe a bit more by now but still fundamentally nothing in comparison to a full FreeBSD or Linux distro.

It ships more than 12k packages, probably more.

I'd appreciate more official FreeBSD folks doing an analysis, rather than making claims just based on anecdotal cases from LDC.

@jrtc27 Can you file a bug for this?

@tstellar I think @jrtc27 is overreacting on this issue. I'd be strongly in favor of keeping the status quo.

Reverting would just do more harm. Newer instrumentation technology will suffer more from the unneeded size bloat.

The bug is for tracking purposes, it doesn't mean that we've made a decision one way or the other.

MaskRay added a comment.EditedNov 2 2021, 8:51 PM

Not if you're in code that doesn't know what linker, let alone the version of it, is being used. If you don't use it for new LLD you break because your sections are bogusly GC'ed. If you use it for old LLD or BFD you break because the option doesn't exist and gives an error. Even Clang's driver can't know if the option is supported, so how do you expect other projects using similar code (like LDC) to do so? Just because you can detect the linker at toolchain build time doesn't mean you know what linker will be used at toolchain run time.

I don't understand why this is a problem.
The code doesn't need to dispatch on different behaviors.
It just needs to be written in a way portable to pre-2015-10 GNU ld and current LLD.

If it needs time for a transition, add -Wl,-z,nostart-stop-gc via configure-time detection.

Sure you can. If you can add SHF_GNU_RETAIN, you can add SHF_GNU_IT_IS_SAFE_TO_GC_ME. Or you can do a proper transition (see below).

Since there is a section flag with the positive semantics, there is zero chance the flag with the negative semantics would be accepted.
The name also looks ironic, rather than like a real proposal.

That is irrelevant. This is not about what libLLVM version it links against. This is about what the system linker is, which could be BFD, gold or LLD, and has zero connection to the libLLVM version used.

My point is that every time they upgrade llvm-project, they already need to deal with changes.
As experienced toolchain developers they are in a better position to detect and fix the issues.
If you find a https://github.com/ldc-developers/ldc issue, the more productive way is to open an issue there.

This is not how you do a transition. You do a transition by adding the option, then _several years later_ flipping the default so that it can just be assumed to exist. You _can't_ add the option and flip the default _in the same release cycle_, that leads to a mess.

Not for this case. For this case fixing early can make sure the whole ecosystem takes the least pain. Waiting longer would create a larger disconnect between metadata section users and packages potentially relying on the unfortunate GNU ld behavior.
FWIW I still don't see why it is a mess. It is an easy change: dispatch on the availability of the option or (slightly worse) on the LLD version.
As I said, the window of time with potential problems is finite, and evidence from some large LLD adopters (Android, Chrome OS, Fuchsia, Sony, Meta, Alphabet) suggests that we've mostly passed the window.

jrtc27 added a comment.Nov 2 2021, 9:14 PM

Not if you're in code that doesn't know what linker, let alone the version of it, is being used. If you don't use it for new LLD you break because your sections are bogusly GC'ed. If you use it for old LLD or BFD you break because the option doesn't exist and gives an error. Even Clang's driver can't know if the option is supported, so how do you expect other projects using similar code (like LDC) to do so? Just because you can detect the linker at toolchain build time doesn't mean you know what linker will be used at toolchain run time.

I don't understand why this is a problem.
The code doesn't need to dispatch on different behaviors.
It just needs to be written in a way portable to pre-2015-10 GNU ld and current LLD.

If it needs time for a transition, add -Wl,-z,nostart-stop-gc via configure-time detection.

Which breaks if I build with LLD 12 around and then update to LLD 13. This is the exact same reason why Clang doesn't do configure-time detection for the exact linker version that will be used at run time (I forget if it even does any, but if so it's just taken to be a baseline, which is of no use here if it's pre-13).

Sure you can. If you can add SHF_GNU_RETAIN, you can add SHF_GNU_IT_IS_SAFE_TO_GC_ME. Or you can do a proper transition (see below).

Since there is a section flag with the positive semantics, there is zero chance the flag with the negative semantics would be accepted.
The name also looks ironic, rather than like a real proposal.

Yes, the name is obviously not what you'd use. It was deliberately chosen to be extremely clear what it meant. Though I don't see what the problem with the semantics would be, force-yes/force-no/default is a pretty standard tri-state all over the place, having both flags would just encode that (albeit wasting the fourth possible state, but don't see how you can avoid that unless there's some other useful option).
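
The tri-state mentioned here can be sketched as a decoding of two independent flag bits. Everything in this block is hypothetical: the TOY_FLAG_* bits are not real ELF section flags, and treating "both bits set" as retain is an assumption made for the example, since the comment leaves that fourth state unspecified.

```c
#include <stdbool.h>

/* Toy flag bits; these are NOT real ELF section flag values. */
enum {
    TOY_FLAG_RETAIN     = 1 << 0, /* stands in for a "must keep" flag */
    TOY_FLAG_SAFE_TO_GC = 1 << 1  /* stands in for the proposed opposite */
};

enum gc_policy {
    GC_POLICY_DEFAULT,      /* neither flag: follow the linker's default */
    GC_POLICY_FORCE_RETAIN, /* section must be kept */
    GC_POLICY_FORCE_GC      /* section may be discarded if unreferenced */
};

/* Decode the two bits into the tri-state. If both are set, retain wins
   (an assumption: the conservative reading of the otherwise unused state). */
enum gc_policy policy_from_flags(unsigned flags) {
    if (flags & TOY_FLAG_RETAIN)
        return GC_POLICY_FORCE_RETAIN;
    if (flags & TOY_FLAG_SAFE_TO_GC)
        return GC_POLICY_FORCE_GC;
    return GC_POLICY_DEFAULT;
}

/* Would a __start_/__stop_ reference keep this section alive? */
bool start_stop_retains(unsigned flags, bool linker_default_retains) {
    switch (policy_from_flags(flags)) {
    case GC_POLICY_FORCE_RETAIN:
        return true;
    case GC_POLICY_FORCE_GC:
        return false;
    default:
        return linker_default_retains;
    }
}
```

With such an encoding the linker default only matters for unflagged sections, so a default could in principle change without affecting objects that state their intent explicitly.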

That is irrelevant. This is not about what libLLVM version it links against. This is about what the system linker is, which could be BFD, gold or LLD, and has zero connection to the libLLVM version used.

My point is that every time they upgrade llvm-project, they already need to deal with changes.
As experienced toolchain developers they are in a better position to detect and fix the issues.
If you find a https://github.com/ldc-developers/ldc issue, the more productive way is to open an issue there.

Re-read what I said. "Every time they upgrade llvm-project" is irrelevant. It could be built against libLLVM13 or libLLVM3, it doesn't matter, that has zero bearing on what version /usr/bin/ld.lld is.

This is not how you do a transition. You do a transition by adding the option, then _several years later_ flipping the default so that it can just be assumed to exist. You _can't_ add the option and flip the default _in the same release cycle_, that leads to a mess.

Not for this case. For this case fixing early can make sure the whole ecosystem takes the least pain. Waiting longer would create a larger disconnect between metadata section users and packages potentially relying on the unfortunate GNU ld behavior.

If it's so painful, why has metadata been using the scheme it does all these years? Surely it should've just picked a better design that was more efficient back then, and more efficient even today with GNU ld that still has the traditional behaviour? This really feels to me like solving the problem from the wrong end; you can't just come in and declare an old thing broken just so your new thing that you chose to work this way rather than some other way works better, you're supposed to design things appropriately based on the constraints that exist.

FWIW I still don't see why it is a mess. It is an easy change: dispatch on the availability of the option or (slightly worse) on the LLD version.

If it's so easy then please provide a robust patch for LDC that's acceptable to upstream and doesn't rely on brittle configure-time detection of LLD that ties it to the exact version of LLD in the environment it was built.

As I said, the window of time with potential problems is finite, and evidence from some large LLD adopters (Android, Chrome OS, Fuchsia, Sony, Meta, Alphabet) suggests that we've mostly passed the window.

Yet here we are finding another issue just today, before 13.0.1 is even out.

MaskRay added a comment.EditedNov 2 2021, 10:57 PM

Not if you're in code that doesn't know what linker, let alone the version of it, is being used. If you don't use it for new LLD you break because your sections are bogusly GC'ed. If you use it for old LLD or BFD you break because the option doesn't exist and gives an error. Even Clang's driver can't know if the option is supported, so how do you expect other projects using similar code (like LDC) to do so? Just because you can detect the linker at toolchain build time doesn't mean you know what linker will be used at toolchain run time.

I don't understand why this is a problem.
The code doesn't need to dispatch on different behaviors.
It just needs to be written in a way portable to pre-2015-10 GNU ld and current LLD.

If it needs time for a transition, add -Wl,-z,nostart-stop-gc via configure-time detection.

Which breaks if I build with LLD 12 around and then update to LLD 13. This is the exact same reason why Clang doesn't do configure-time detection for the exact linker version that will be used at run time (I forget if it even does any, but if so it's just taken to be a baseline, which is of no use here if it's pre-13).

It's a pain either way. For metadata section users, for Linux, we probably could not flip the default as the default needs to work with very old GNU ld (typically 5+ years old).
One choice might be to special-case -fuse-ld=lld. Some groups even use lld as their default ld and may not enjoy the size saving.
The inconsistency between -fuse-ld=lld and others also has a cost.

Sure you can. If you can add SHF_GNU_RETAIN, you can add SHF_GNU_IT_IS_SAFE_TO_GC_ME. Or you can do a proper transition (see below).

Since there is a section flag with the positive semantics, there is zero chance the flag with the negative semantics would be accepted.
The name also looks ironic rather than like a real proposal.

Yes, the name is obviously not what you'd use. It was deliberately chosen to be extremely clear what it meant. Though I don't see what the problem with the semantics would be, force-yes/force-no/default is a pretty standard tri-state all over the place, having both flags would just encode that (albeit wasting the fourth possible state, but don't see how you can avoid that unless there's some other useful option).

That is irrelevant. This is not about what libLLVM version it links against. This is about what the system linker is, which could be BFD, gold or LLD, and has zero connection to the libLLVM version used.

My point is that every time they upgrade llvm-project, they already need to deal with changes.
As experienced toolchain developers, they are in a better position to detect and fix the issues.
If you find an issue in https://github.com/ldc-developers/ldc, the more productive way is to open an issue there.

Re-read what I said. "Every time they upgrade llvm-project" is irrelevant. It could be built against libLLVM13 or libLLVM3, it doesn't matter, that has zero bearing on what version /usr/bin/ld.lld is.

Well, my point stands. The ldc developers are in a better position to fix the problem. They have even contacted me to investigate some LTO problems, so they know where to get help if needed.

This is not how you do a transition. You do a transition by adding the option, then _several years later_ flipping the default so that it can just be assumed to exist. You _can't_ add the option and flip the default _in the same release cycle_, that leads to a mess.

Not for this case. For this case, fixing early makes sure the whole ecosystem takes the least pain. Waiting longer would create a larger disconnect between metadata section users and packages potentially relying on the unfortunate GNU ld behavior.

If it's so painful, why has metadata been using the scheme it does all these years? Surely it should've just picked a better design that was more efficient back then, and more efficient even today with GNU ld that still has the traditional behaviour? This really feels to me like solving the problem from the wrong end; you can't just come in and declare an old thing broken just so your new thing that you chose to work this way rather than some other way works better, you're supposed to design things appropriately based on the constraints that exist.

Well, the encapsulation symbol design is quite good. The only unfortunate thing was that GNU ld somehow broke it.
I think the current model can serve us for many years from now on.

FWIW I still don't see why it is a mess. It is an easy change dispatching on the availability of the option, (slightly worse) dispatching on LLD version.

If it's so easy then please provide a robust patch for LDC that's acceptable to upstream and doesn't rely on brittle configure-time detection of LLD that ties it to the exact version of LLD in the environment it was built.

I opened https://github.com/ldc-developers/ldc/issues/3861
If you think there is an issue, please comment there.

It seems that ldc places __minfo global variables in llvm.used, so what's the problem? Using llvm-project<13 library with ld.lld>=13.0.0?

As I said, the window of time with potential problems is finite, and evidence from some large LLD adopters (Android, Chrome OS, Fuchsia, Sony, Meta, Alphabet) suggests that we've mostly passed the window.

Yet here we are finding another issue just today, before 13.0.1 is even out.

jrtc27 added a comment.Nov 4 2021, 2:44 PM

Not if you're in code that doesn't know what linker, let alone the version of it, is being used. If you don't use it for new LLD you break because your sections are bogusly GC'ed. If you use it for old LLD or BFD you break because the option doesn't exist and gives an error. Even Clang's driver can't know if the option is supported, so how do you expect other projects using similar code (like LDC) to do so? Just because you can detect the linker at toolchain build time doesn't mean you know what linker will be used at toolchain run time.

I don't understand why this is a problem.
The code doesn't need to dispatch on different behaviors.
It just needs to be written in a way portable to pre-2015-10 GNU ld and current LLD.

If it needs time for a transition, use -Wl,-z,nostart-stop-gc behind a configure-time detection.

Which breaks if I build with LLD 12 around and then update to LLD 13. This is the exact same reason why Clang doesn't do configure-time detection for the exact linker version that will be used at run time (I forget if it even does any, but if so it's just taken to be a baseline, which is of no use here if it's pre-13).

It's a pain either way. For metadata section users, for Linux, we probably could not flip the default as the default needs to work with very old GNU ld (typically 5+ years old).
One choice might be to special-case -fuse-ld=lld. Some groups even use lld as their default ld and may not enjoy the size saving.
The inconsistency between -fuse-ld=lld and others also has a cost.

Sure you can. If you can add SHF_GNU_RETAIN, you can add SHF_GNU_IT_IS_SAFE_TO_GC_ME. Or you can do a proper transition (see below).

Since there is a section flag with the positive semantics, there is zero chance the flag with the negative semantics would be accepted.
The name also looks ironic rather than like a real proposal.

Yes, the name is obviously not what you'd use. It was deliberately chosen to be extremely clear what it meant. Though I don't see what the problem with the semantics would be, force-yes/force-no/default is a pretty standard tri-state all over the place, having both flags would just encode that (albeit wasting the fourth possible state, but don't see how you can avoid that unless there's some other useful option).

That is irrelevant. This is not about what libLLVM version it links against. This is about what the system linker is, which could be BFD, gold or LLD, and has zero connection to the libLLVM version used.

My point is that every time they upgrade llvm-project, they already need to deal with changes.
As experienced toolchain developers, they are in a better position to detect and fix the issues.
If you find an issue in https://github.com/ldc-developers/ldc, the more productive way is to open an issue there.

Re-read what I said. "Every time they upgrade llvm-project" is irrelevant. It could be built against libLLVM13 or libLLVM3, it doesn't matter, that has zero bearing on what version /usr/bin/ld.lld is.

Well, my point stands. The ldc developers are in a better position to fix the problem. They have even contacted me to investigate some LTO problems, so they know where to get help if needed.

This is not how you do a transition. You do a transition by adding the option, then _several years later_ flipping the default so that it can just be assumed to exist. You _can't_ add the option and flip the default _in the same release cycle_, that leads to a mess.

Not for this case. For this case, fixing early makes sure the whole ecosystem takes the least pain. Waiting longer would create a larger disconnect between metadata section users and packages potentially relying on the unfortunate GNU ld behavior.

If it's so painful, why has metadata been using the scheme it does all these years? Surely it should've just picked a better design that was more efficient back then, and more efficient even today with GNU ld that still has the traditional behaviour? This really feels to me like solving the problem from the wrong end; you can't just come in and declare an old thing broken just so your new thing that you chose to work this way rather than some other way works better, you're supposed to design things appropriately based on the constraints that exist.

Well, the encapsulation symbol design is quite good. The only unfortunate thing was that GNU ld somehow broke it.
I think the current model can serve us for many years from now on.

There's nothing fundamentally wrong with either model. The issue is when you change what the model is without providing a long enough transition period.

FWIW I still don't see why it is a mess. It is an easy change dispatching on the availability of the option, (slightly worse) dispatching on LLD version.

If it's so easy then please provide a robust patch for LDC that's acceptable to upstream and doesn't rely on brittle configure-time detection of LLD that ties it to the exact version of LLD in the environment it was built.

I opened https://github.com/ldc-developers/ldc/issues/3861
If you think there is an issue, please comment there.

It seems that ldc places __minfo global variables in llvm.used, so what's the problem? Using llvm-project<13 library with ld.lld>=13.0.0?

Yes. The version of libLLVM used by LDC, and the version of LLD, are completely independent; I wouldn't even be surprised if they're different major versions more often than they're the same major version.

MaskRay added a comment.EditedNov 4 2021, 2:50 PM

The issue is when you change what the model is without providing a long enough transition period.

I mentioned there would be more problems doing it that way.


Mach-O ld64 uses the same model as current ld.lld -z start-stop-gc.
The way ldc uses __start___minfo is also incompatible with GNU ld 2015-10.
There are several ways to make FreeBSD work.

Finally, as I mentioned in the ldc issue, combining llvm-project<13.0.0 with ld.lld>=13.0.0 is not a supported way of using LTO (probably weaker than "unsupported", but they cannot complain if LTO doesn't work).

jrtc27 added a comment.Nov 4 2021, 3:03 PM

Mach-O ld64 uses the same model as current ld.lld -z start-stop-gc.

This isn't relevant. I am not objecting to the semantics, I am objecting to the *timeline* for the *change* in semantics. Mach-O has always (I assume) had those semantics, and had ways to deal with them. The ways to deal with them for ELF and the change in semantics happened back-to-back, rather than introducing the mechanism for explicitly retaining the sections but keeping the old default for several years so that when you come time to change the default there are no issues with assuming the mechanism to retain the sections exists. I feel like a broken record repeating this but you don't seem to acknowledge this as being my complaint so I have to keep telling you why your characterisation of my objection is inaccurate.

The way ldc uses __start___minfo is also incompatible with GNU ld 2015-10.

Which was deemed a bug in GNU ld, and got fixed. There have been bugs in its --gc-sections implementation over the years that got fixed, that doesn't mean everything that didn't used to work is regarded as something you shouldn't do.

There are several ways to make FreeBSD work.

Finally, as I mentioned in the ldc issue, combining llvm-project<13.0.0 with ld.lld>=13.0.0 is not a supported way of using LTO (probably weaker than "unsupported").

I don't know why you're talking about LTO all of a sudden, we've been talking about using ld.lld --gc-sections on plain already-compiled-to-machine-code .o files, no LTO in sight (well, GC'ing sections is _technically_ optimisation done at link time, but so are all manner of things linkers do, and they're not what people mean by LTO).

MaskRay added a comment.EditedNov 4 2021, 4:20 PM

Mach-O ld64 uses the same model as current ld.lld -z start-stop-gc.

This isn't relevant. I am not objecting to the semantics, I am objecting to the *timeline* for the *change* in semantics. Mach-O has always (I assume) had those semantics, and had ways to deal with them. The ways to deal with them for ELF and the change in semantics happened back-to-back, rather than introducing the mechanism for explicitly retaining the sections but keeping the old default for several years so that when you come time to change the default there are no issues with assuming the mechanism to retain the sections exists. I feel like a broken record repeating this but you don't seem to acknowledge this as being my complaint so I have to keep telling you why your characterisation of my objection is inaccurate.

I kept repeating because FreeBSD's ldc usage is on the edge of what I consider supported interfaces.
In other words, even if it is supported, I think it barely is.

This is a version lockstep question. How well can an LLD of version X support llvm-project of version Y?

When X is smaller than Y, it is largely considered "may not work". That's why asan/etc can use SHF_LINK_ORDER before the features are well ready. LTO for this case definitely can't work.

When X is larger than Y, for regular object files, this is "usually should work unless Y is too small".
When X is compiler-rt instead of LLD, this is outright "unsupported". We have such a large expectation on the linker because that is our expectation for ABI and how compilers/linkers collaborate.
The ldc usage somewhat sits between regular object file and LTO to me, even though it does just plain regular object file generation.

That said, I agree the state is unfortunate.
I failed to consider the llvm-project<13 and LLD>=13.0.0 situation and I apologize for the trouble.
However, the blast radius is small:

  • llvm-project<13 and LLD>=13.0.0 is uncommon
  • C identifier name sections usage is minor
  • ldc relies on the newish (unfortunate) GNU ld behavior which is not even a traditional behavior.
  • ldc has existing code disabling --gc-sections for PGO.
  • ldc has LDC_WITH_LLD. If it wants to use LTO, bundling lld and probably considering LDC_WITH_LLD is the way forward.
  • In https://github.com/ldc-developers/ldc/issues/3861, I have listed many ways FreeBSD folks can move forward.

Again, the window of time with potential problems is finite, and evidence from some large LLD adopters shows that we've mostly moved outside that period.
With these large adopters using the new behavior (actually the traditional behavior and Mach-O's behavior), we can mostly ensure that such brittle dependencies on our "weakly supported interfaces" are very unlikely to arise in the future.

The way ldc uses __start___minfo is also incompatible with GNU ld 2015-10.

Which was deemed a bug in GNU ld, and got fixed. There have been bugs in its --gc-sections implementation over the years that got fixed, that doesn't mean everything that didn't used to work is regarded as something you shouldn't do.

There are several ways to make FreeBSD work.

Finally, as I mentioned in the ldc issue, combining llvm-project<13.0.0 with ld.lld>=13.0.0 is not a supported way of using LTO (probably weaker than "unsupported").

I don't know why you're talking about LTO all of a sudden, we've been talking about using ld.lld --gc-sections on plain already-compiled-to-machine-code .o files, no LTO in sight (well, GC'ing sections is _technically_ optimisation done at link time, but so are all manner of things linkers do, and they're not what people mean by LTO).

hvdijk added a comment.Nov 4 2021, 4:31 PM

This is a version lockstep question. How well can an LLD of version X support llvm-project of version Y?

There is also the question of how well LLD works together with GCC, specifically the most recently released LLD together with the most recently released GCC. This used to work, this is something I use myself, and right now, this is broken with no good way for code to work around it. When I submitted my MR for NetworkManager, I left this broken, I only fixed the clang+LLD case, and I highly suspect the fact that GCC+LLD is now broken will be considered by others to reflect worse on LLD than on GCC.

@tstellar asked for a bug to be created where this can be discussed further. @jrtc27, did you create one already? If not, would you prefer I do so?

jrtc27 added a comment.Nov 4 2021, 4:35 PM

Mach-O ld64 uses the same model as current ld.lld -z start-stop-gc.

This isn't relevant. I am not objecting to the semantics, I am objecting to the *timeline* for the *change* in semantics. Mach-O has always (I assume) had those semantics, and had ways to deal with them. The ways to deal with them for ELF and the change in semantics happened back-to-back, rather than introducing the mechanism for explicitly retaining the sections but keeping the old default for several years so that when you come time to change the default there are no issues with assuming the mechanism to retain the sections exists. I feel like a broken record repeating this but you don't seem to acknowledge this as being my complaint so I have to keep telling you why your characterisation of my objection is inaccurate.

I kept repeating because FreeBSD's ldc usage is on the edge of what I consider supported interfaces.
In other words, even if it is supported, I think it barely is.

This is a version lockstep question. How well can an LLD of version X support llvm-project of version Y?

When X is smaller than Y, it is largely considered "may not work". That's why asan/etc can use SHF_LINK_ORDER before the features are well ready. LTO for this case definitely can't work.

When X is larger than Y, for regular object files, this is "usually should work unless Y is too small".
When X is compiler-rt instead of LLD, this is outright "unsupported". We have such a large expectation on the linker because that is our expectation for ABI and how compilers/linkers collaborate.
The ldc usage somewhat sits between regular object file and LTO to me, even though it does just plain regular object file generation.

I agree that I failed to consider the llvm-project<13 and LLD>=13.0.0 situation and I apologize for the trouble.
That said, I think the blast radius is small:

  • llvm-project<13 and LLD>=13.0.0 is uncommon
  • C identifier name sections usage is minor
  • ldc uses newish GNU ld feature which is not a traditional behavior.
  • ldc has existing code disabling --gc-sections for PGO.
  • ldc has LDC_WITH_LLD. If it wants to use LTO, bundling lld and probably considering LLD is the way forward.
  • In https://github.com/ldc-developers/ldc/issues/3861, I have listed many ways FreeBSD folks can move forward.

Again, the window of time with potential problems is finite, and evidence from some large LLD adopters shows that we've mostly moved outside that period.
With these large adopters using the new behavior (actually the traditional behavior and Mach-O's behavior), we can mostly ensure that such brittle dependencies on our "weakly supported interfaces" are very unlikely to arise in the future.

I do not think your opinion is shared by other LLVM developers. It's extremely common for compilers to use an older version of libLLVM; look at Rust, GHC, LDC, Gollvm, they all tend to lag in updating to newer libLLVM versions. LLD and LLVM live in the same repository under the llvm-project umbrella, but there is absolutely no reason why they should be tied to one another; the interface between .o files and the linker is supposed to be stable. Moreover, the fact that LDC uses libLLVM is an implementation detail (albeit in the name); it really should not matter how the compiler generates object files, it could be via libLLVM, libgccjit, printing textual assembly fed into the system assembler or some hand-rolled code emitter, it does not change the situation that changing the semantics of the linker breaks any existing compilers, whatever way they generate code. All of those cases are broken unless the projects in question have (a) updated to use SHF_GNU_RETAIN and/or -z nostart-stop-gc and (b) dropped compatibility with pre-2020/1 linkers due to using SHF_GNU_RETAIN and/or -z nostart-stop-gc (or added ugly code to do run-time detection of the linker that has previously not been required due to actually maintaining a stable interface between compiler and linker, only breaking that interface in a very gradual, carefully-planned manner). So no, this argument is highly flawed.
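
For readers less familiar with the mechanism being argued over, here is a minimal C sketch of a C identifier name section collected through `__start_`/`__stop_` symbols. The section name `my_meta`, the `meta_entry` type, and the helper names are hypothetical. The `retain` attribute (which requests SHF_GNU_RETAIN) is applied only when the compiler reports support for it; that guard is an assumption about how a project might stay buildable with older toolchains, not a fix anyone in this thread proposed.

```c
#include <stddef.h>

/* Hypothetical metadata record; the name and layout are illustrative only. */
struct meta_entry {
    const char *name;
    int id;
};

/* Use `retain` (SHF_GNU_RETAIN) only if the compiler knows the attribute.
   Older compilers fall back to `used` alone, which keeps the variable in
   the object file but does not by itself stop a linker from discarding it. */
#if defined(__has_attribute)
#  if __has_attribute(retain)
#    define META_RETAIN __attribute__((retain))
#  endif
#endif
#ifndef META_RETAIN
#  define META_RETAIN
#endif

/* Place one record in the C identifier name section "my_meta". */
#define META_ENTRY(ident, num)                              \
    static const struct meta_entry meta_##ident             \
        __attribute__((used, section("my_meta"))) META_RETAIN = { #ident, num }

META_ENTRY(alpha, 1);
META_ENTRY(beta, 2);

/* ELF linkers define these bounds for sections whose names are valid
   C identifiers. */
extern const struct meta_entry __start_my_meta[];
extern const struct meta_entry __stop_my_meta[];

/* Number of records the linker kept in "my_meta". */
size_t meta_count(void) {
    return (size_t)(__stop_my_meta - __start_my_meta);
}

/* Walk the section from start to stop and sum the ids. */
int meta_id_sum(void) {
    int sum = 0;
    for (const struct meta_entry *e = __start_my_meta; e < __stop_my_meta; ++e)
        sum += e->id;
    return sum;
}
```

As I understand the semantics described in this review, with --gc-sections and -z start-stop-gc the references to `__start_my_meta` in these helpers no longer keep `my_meta` alive, so entries need `retain` or another reference from live code; with -z nostart-stop-gc the `__start_`/`__stop_` references alone keep them.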

jrtc27 added a comment.Nov 4 2021, 4:36 PM

This is a version lockstep question. How well can an LLD of version X support llvm-project of version Y?

There is also the question of how well LLD works together with GCC, specifically the most recently released LLD together with the most recently released GCC. This used to work, this is something I use myself, and right now, this is broken with no good way for code to work around it. When I submitted my MR for NetworkManager, I left this broken, I only fixed the clang+LLD case, and I highly suspect the fact that GCC+LLD is now broken will be considered by others to reflect worse on LLD than on GCC.

@tstellar asked for a bug to be created where this can be discussed further. @jrtc27, did you create one already? If not, would you prefer I do so?

Yes, https://bugs.llvm.org/show_bug.cgi?id=52384, marked as blocking for 13.0.1 so a decision is made one way or the other before we unleash the "real" release on the world.

hvdijk added a comment.Nov 4 2021, 4:50 PM

Yes, https://bugs.llvm.org/show_bug.cgi?id=52384, marked as blocking for 13.0.1 so a decision is made one way or the other before we unleash the "real" release on the world.

Thanks, appreciated. Whatever is decided for the 13.x branch, the earlier the release candidate it can be decided for, the better, in my opinion, so that if this problem is seen in more projects, we can hint whether incomplete fixes will need to be applied, or whether they can wait to update their code until a better update path is available.

I completely agree with @jrtc27 on this, specifically:

This is a version lockstep question. How well can an LLD of version X support llvm-project of version Y?

There is *no* dependency between the version of any consumer of the LLVM CodeGen libraries and a specific linker version, LLVM or otherwise, except in three cases:

  • A specific linker version may be unsupported because it has bugs.
  • A specific linker version may be the minimum supported version for an opt-in feature in the compiler.
  • If doing LTO, LLD must be no older than the compiler (this one falls out from the bitcode compatibility guarantees).

In particular, using LLVM in your compiler front-end does not impose a requirement to use LLD as your system linker. Historically this was because LLD didn't exist but now it remains an important use case. Typically, the linker is a global system component for a given target, whereas language front ends may or may not be. If LLD had to be version locked with clang, rustc, ldc, and every other LLVM-based compiler front end then it would be almost impossible for an LLVM-based compiler ecosystem to exist. It must be possible to upgrade compilers and link modules generated by the system's C/C++ compiler (which may or may not be clang) using the system's linker (which may or may not be LLD).

What are the downsides of reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 ?

MaskRay added a comment.EditedNov 8 2021, 9:51 AM

What are the downsides of reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 ?

@tstellar Various metadata section users will get affected because they will see unneeded size increase.
Such users typically use Clang solely and use the main branch, so if you limit the revert to the release/13.x branch, that'll be fine.
Another disadvantage is that more software can incorrectly rely on the -z nostart-stop-gc behavior before LLD 14.0.0 comes out. There is a risk of having to deal with more software for LLD 14.0.0.

People objecting here neglected the point that the GNU ld behavior was also new in 2015-10.
Its traditional behavior was the same as the -z start-stop-gc default.
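
To make the size argument concrete, here is a hedged C sketch of the kind of metadata-section usage under discussion: each function touches its own entry in a C identifier name section, and a runtime helper walks the section via `__start_`/`__stop_`. The names `my_counters`, `parse`, `emit`, and `total_hits` are hypothetical. Hand-written C like this places all entries from one translation unit into a single input section, so per-function discarding would additionally need the compiler instrumentation to emit separate sections, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical per-function counter stored in the section "my_counters".
   `used` stops the compiler from dropping a variable that is only written. */
#define COUNTER(ident) \
    static unsigned long ident __attribute__((used, section("my_counters")))

COUNTER(hits_parse);
COUNTER(hits_emit);

/* Section bounds defined by the linker for C identifier name sections. */
extern unsigned long __start_my_counters[];
extern unsigned long __stop_my_counters[];

/* Each function increments its own counter, so the counter is referenced
   from the code of the function it describes. */
void parse(void) { ++hits_parse; }
void emit(void)  { ++hits_emit; }

/* Runtime side: sum every counter the linker kept in the section. */
unsigned long total_hits(void) {
    unsigned long total = 0;
    for (unsigned long *c = __start_my_counters; c < __stop_my_counters; ++c)
        total += *c;
    return total;
}
```

Under -z start-stop-gc, a `my_counters` input section is kept only if something live references it (here, the functions that increment the counters); under -z nostart-stop-gc, the `__start_my_counters` reference in `total_hits()` alone keeps every `my_counters` input section, which is the size increase referred to above.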

@theraven My point is based on many conditions: the adoption of the C identifier name section, how likely the 2015-10 GNU ld behavior may be exploited by newer software, the GNU ld traditional behavior, Mach-O ld64 behavior, how likely people use llvm-project<13.0.0 library with 13.0.0 system linker, whether the symptom has good discoverability and workarounds, etc.
It's not that I think we should have an unneeded "version lockstep".

That said, I have spent many hours on this that I could have used for more productive things. I have also grown tired of some hostile comments.

@tstellar if the revert is release/13.x only, I am fine.
We just take the risk of dealing with more software relying on the new GNU ld behavior when LLD 14.0.0 comes out.

hvdijk added a comment.Nov 8 2021, 4:28 PM

What are the downsides of reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 ?

@tstellar Various metadata section users will get affected because they will see unneeded size increase.
Such users typically use Clang solely and use the main branch, so if you limit the revert to the release/13.x branch, that'll be fine.

If we revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 on main as well, such users will be able to explicitly add -z start-stop-gc to their linker options to reduce the size again; that change would stay in. In my opinion, adding that option is something that would be good for them to do regardless of which direction LLD takes, because they cannot rely on it being the default as long as other linkers are in widespread use where that is not the default. As far as I can tell, GNU have not changed their default, they default to -z nostart-stop-gc, and in your report to them (https://sourceware.org/bugzilla/show_bug.cgi?id=27451) you suggested that they may never want to change that default.

Another disadvantage is that more software can incorrectly rely on the -z nostart-stop-gc behavior before LLD 14.0.0 comes out. There is a risk of having to deal with more software for LLD 14.0.0.

As has been mentioned earlier, whether such use is incorrect is one of the things not everybody here agrees on, but otherwise this is true, there is a chance that more software will be released that will require changes to work as intended under -z start-stop-gc.

People objecting here neglected the point that the GNU ld behavior was also new in 2015-10.
Its traditional behavior was the same as the -z start-stop-gc default.

The original GNU ld behaviour was regarded by them as a bug, it broke software, and fixing that bug was a relatively safe change: *not* removing sections when they could be removed tends to only increase size, doing no harm otherwise.

https://sourceware.org/bugzilla/show_bug.cgi?id=11133
https://sourceware.org/bugzilla/show_bug.cgi?id=19161
https://sourceware.org/bugzilla/show_bug.cgi?id=19167

You may disagree on whether they were correct to classify it as a bug, but given that their developers considered it as such, it seems perfectly reasonable to me for other developers to take them at their word that the new behaviour was intentional and something users can rely on.

@theraven My point is based on many conditions: the adoption of the C identifier name section, how likely the 2015-10 GNU ld behavior may be exploited by newer software, the GNU ld traditional behavior, Mach-O ld64 behavior, how likely people use llvm-project<13.0.0 library with 13.0.0 system linker, whether the symptom has good discoverability and workarounds, etc.
It's not that I think we should have an unneeded "version lockstep".

That said, I have spent many hours on this that I could have used for more productive things. I have also grown tired of some hostile comments.

Please understand that I have no interest in having such a long discussion myself. But, we're in a lousy situation where things are broken, I saw no good way out of that other than by reverting your commit, and although your comments suggest you do, I am not seeing how, so the only choices I have are just reverting your change unilaterally (which is obviously a bad idea) or continuing this discussion.

As for the tone, if this is also referring to me and this is something you would like to discuss, feel free to reach out privately.

Note that I now do see a possible third option though, depending on whether other people are willing to help: we could make the default linker behaviour a CMake option defaulting to the current GNU behaviour, and once we have that CMake option and get it into a release, work with a distro to do a mass rebuild with LLD with the default flipped, see what is still broken and take it from there.

MaskRay added a comment.EditedNov 8 2021, 5:01 PM

What are the downsides of reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 ?

@tstellar Various metadata section users will get affected because they will see unneeded size increase.
Such users typically use Clang solely and use the main branch, so if you limit the revert to the release/13.x branch, that'll be fine.

If we revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 on main as well, such users will be able to explicitly add -z start-stop-gc to their linker options to reduce the size again; that change would stay in. In my opinion, adding that option is something that would be good for them to do regardless of which direction LLD takes, because they cannot rely on it being the default as long as other linkers are in widespread use where that is not the default. As far as I can tell, GNU have not changed their default, they default to -z nostart-stop-gc, and in your report to them (https://sourceware.org/bugzilla/show_bug.cgi?id=27451) you suggested that they may never want to change that default.

The revert-on-main choice would place the burden on the wrong side. Many llvm-project users solely use LLD.
"Making LLD links better while making GNU ld correct but not optimal" is better than "making LLD links suffer the same way as GNU ld>=2015-10".

Another disadvantage is that more software can incorrectly rely on the -z nostart-stop-gc behavior before LLD 14.0.0 comes out. There is a risk of having to deal with more software for LLD 14.0.0.

As has been mentioned earlier, whether such use is incorrect is one of the things not everybody here agrees on, but otherwise this is true, there is a chance that more software will be released that will require changes to work as intended under -z start-stop-gc.

People objecting here neglected the point that the GNU ld behavior was also new in 2015-10.
Its traditional behavior was the same as the -z start-stop-gc default.

The original GNU ld behaviour was regarded by them as a bug, it broke software, and fixing that bug was a relatively safe change: *not* removing sections when they could be removed tends to only increase size, doing no harm otherwise.

https://sourceware.org/bugzilla/show_bug.cgi?id=11133
https://sourceware.org/bugzilla/show_bug.cgi?id=19161
https://sourceware.org/bugzilla/show_bug.cgi?id=19167

You may disagree on whether they were correct to classify it as a bug, but given that their developers considered it as such, it seems perfectly reasonable to me for other developers to take them at their word that the new behaviour was intentional and something users can rely on.

I didn't want to reference binutils folks' names directly :(
Alan Modra considered the traditional GNU ld behavior correct in 2011.
One user who was involved in the discussion made gold use the -z nostart-stop-gc behavior but did not think hard about potential size problems.
HJ Lu went ahead and implemented the -z nostart-stop-gc behavior but he did not notice that the conservative behavior could make metadata section GC unfortunate and (eventually) enabled the current -z nostart-stop-gc behavior in 2015-10 (there was an attempt in 2011 but that did not appear to work).

If we take into account the ld64 behavior, it is probably more obvious that the 2015-10 GNU ld change was problematic.

@theraven My point is based on many conditions: the adoption of the C identifier name section, how likely the 2015-10 GNU ld behavior is to be exploited by newer software, the GNU ld traditional behavior, Mach-O ld64 behavior, how likely people are to use an llvm-project<13.0.0 library with a 13.0.0 system linker, whether the symptom has good discoverability and workarounds, etc.
It's not that I think we should have an unneeded "version lockstep".

That said, I have spent many hours on this that I could have used for more productive things. I have also grown tired of some hostile comments.

Please understand that I have no interest in having such a long discussion myself. But, we're in a lousy situation where things are broken, I saw no good way out of that other than by reverting your commit, and although your comments suggest you do, I am not seeing how, so the only choices I have are just reverting your change unilaterally (which is obviously a bad idea) or continuing this discussion.

As for the tone, if this is also referring to me and this is something you would like to discuss, feel free to reach out privately.

Note that I now do see a possible third option though, depending on whether other people are willing to help: we could make the default linker behaviour a CMake option defaulting to the current GNU behaviour, and once we have that CMake option and get it into a release, work with a distro to do a mass rebuild with LLD with the default flipped, see what is still broken and take it from there.

This is an option, but I have mentioned many times that I think the blast radius is small, and adding more complexity would probably be difficult to justify.
That would make some folks (whom I fail to convince) happy.
If making such a change specifically for FreeBSD can help me, I am open to helping them, but I think the change is so trivial that they can handle it by themselves.

I have asked many folks and believe that GCC plus system LLD is very uncommon.
You seemed to imply that your distro (which you didn't name) uses such a configuration.
Even so, I think the chance that there is a problem is low.

I can afford to add such a CMake option, but I'd be reluctant, as I think the utility of such an option would be low.

There is a fourth option. FreeBSD can cherry pick 47c5576d7d586624c38f76bd3168e05f6ef1f838 to their 12.0.* package.
llvm.used will use SHF_GNU_RETAIN and it will work with current -z start-stop-gc behavior.
That will fix ldc and any LLVM IR producers folks might be concerned with. (Again: I think the utility would be low).
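To illustrate why SHF_GNU_RETAIN sidesteps the whole question, here is a toy Python model of the mark phase of section GC. It is only a sketch, not LLD's code: `Section`, `live_sections`, and the `retain` field (standing in for SHF_GNU_RETAIN) are names made up for this example. A retained section is treated as a GC root, so it survives no matter how `__start_`/`__stop_` references are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    refs: list = field(default_factory=list)  # names of sections this one references
    retain: bool = False                      # stands in for SHF_GNU_RETAIN


def live_sections(sections, roots):
    """Return the names of sections reachable from the roots or from retained sections."""
    by_name = {s.name: s for s in sections}
    # Retained sections are extra GC roots, independent of any __start_/__stop_ rule.
    work = list(roots) + [s.name for s in sections if s.retain]
    live = set()
    while work:
        name = work.pop()
        if name in live:
            continue
        live.add(name)
        work.extend(by_name[name].refs)
    return live


example = [
    Section(".text.main"),
    Section("my_meta", retain=True),  # e.g. emitted for an llvm.used global
    Section("other_meta"),            # unreferenced and not retained
]
print(sorted(live_sections(example, [".text.main"])))
```

In this model `my_meta` stays live even though nothing references it, while `other_meta` is collected.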

To me it seems like this should be reverted in main so that we can come up with a transition plan for the new functionality that everyone can agree on, but I also think adding a CMake option to control the behavior would be OK as long as the default was the old behavior.

@MaskRay I understand your motivation for keeping this change, but I think lld is too popular at this point to make this kind of change without a longer transition plan for users to adjust. I am also concerned about the possibility that this would break gcc + lld, and it would be nice to have more testing here (I can help with this if you want since this is a configuration I care about).

MaskRay added a comment. Edited Nov 8 2021, 10:15 PM

To me it seems like this should be reverted in main so that we can come up with a transition plan for the new functionality that everyone can agree on, but I also think adding a CMake option to control the behavior would be OK as long as the default was the old behavior.

@MaskRay I understand your motivation for keeping this change, but I think lld is too popular at this point to make this kind of change without a longer transition plan for users to adjust. I am also concerned about the possibility that this would break gcc + lld, and it would be nice to have more testing here (I can help with this if you want since this is a configuration I care about).

@tstellar The point I repeated multiple times is that I think the blast radius is very small.
People kept ignoring that there isn't much software relying on the GNU ld>=2015-10 behavior.
One (but not the only) argument that this is rare is: projects with gcc+lld problems have problems with gcc+GNU ld<2015-10, too.
The evidence from many big groups adopting rolling-released LLD is another major argument.

Surely I want to make LLD work for more components of GNU toolchain (please find the various GCC fixes I have made and the glibc 2.35 with LLD 13 work I did).
I tried to be impartial with the information I collected. See all the points I made to theraven.

Some people exaggerated the problem.
They found ldc and NetworkManager and tried to generalize that to more things.
Sorry, it doesn't generalize that way.

I would even say (for my previous message): if you wanted to revert release/13.x,
show evidence that sufficient software has a gcc+LLD 13.x regression (compared with LLD 12.x) (because I don't think so).
I didn't say that but rather gave a green light for a release/13.x revert to pull myself out of the unnecessary debate.

For main, a revert-on-main would be a BIG regression on Clang+LLD metadata section usability.
Please don't do that.

If people want to add a CMake option on release/13.x, that works fine with me,
too, if distro people are just so fond of adding more complexity to the upstream
to avoid any even temporary local patch they may carry.

@tstellar If you can test GCC+LLD 12.x/13.x on Fedora, that'd certainly be nice.
We can ask some Gentoo people for help, too. Some people may want to use LDFLAGS=-fuse-ld=lld for non-GCC-LTO packages.

(I had an Xfinity network problem so might be slow to reply.)

MaskRay added a comment. Edited Nov 9 2021, 1:24 AM

@tstellar I forgot one point: the said breakage (for some software not working with GNU ld<2015-10) requires -fdata-sections -Wl,--gc-sections, which isn't common among distros.
The symptom is obvious: error: undefined symbol: __start_XXX. The fix is straightforward: remove -Wl,--gc-sections or use -z nostart-stop-gc.
So now I am not even sure it makes sense to revert it for release/13.x.
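A toy Python model of the two behaviors may make the symptom easier to picture. This is a sketch under simplifying assumptions, not how LLD is implemented: `link` and its dictionary input are invented for illustration, every section name stands for one input section, and a reference spelled `__start_foo` or `__stop_foo` is treated as a reference to the encapsulation symbols of a C identifier name section `foo`.

```python
import re

# Matches __start_<name> / __stop_<name> where <name> is a valid C identifier.
START_STOP = re.compile(r"__(?:start|stop)_([A-Za-z_][A-Za-z0-9_]*)\Z")


def link(sections, roots, start_stop_gc):
    """sections maps a section name to the list of names it references.

    Returns (live, missing): the set of live sections, and the sorted names of
    sections whose __start_/__stop_ symbols are referenced from live code
    although the section itself was discarded.
    """
    live, work, wanted = set(), list(roots), set()
    while work:
        name = work.pop()
        if name in live:
            continue
        live.add(name)
        for ref in sections[name]:
            m = START_STOP.match(ref)
            if m is None:
                work.append(ref)           # ordinary reference: keeps the target alive
                continue
            wanted.add(m.group(1))
            # -z nostart-stop-gc: the __start_/__stop_ reference retains the section.
            if not start_stop_gc and m.group(1) in sections:
                work.append(m.group(1))
    missing = sorted(s for s in wanted if s not in live)
    return live, missing


objs = {
    ".text.main": ["__start_meta", "__stop_meta"],
    ".text.unused": ["meta"],  # the only direct user of meta is itself dead
    "meta": [],
}
print(link(objs, [".text.main"], start_stop_gc=False))
print(link(objs, [".text.main"], start_stop_gc=True))
```

With `start_stop_gc=True` the model reports `meta` as missing, which corresponds to the "undefined symbol: __start_XXX" error; if any live section referenced `meta` directly, it would stay live under both settings.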

I try to be impartial and try to think of many factors. That is it.

I am asking around among Gentoo folks. One Clang+ld.lld user told me that they need to use GCC for many packages but they haven't found anything bad with ld.lld 13.0.0.

@MaskRay To be clear, I'm talking about reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 not this patch. Again, I understand your motivations for keeping the default as-is, but we don't have consensus on this change and the policy for this project is to revert until a consensus can be reached. Also, the fact that this patch was approved on the condition that the default would stay the same, but then the default was changed without discussion is a pretty strong reason to revert. Can we please revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 in trunk and continue the discussions in this thread?

@jrtc27 @hvdijk Do you have a suggested timeline for transitioning to the new default?

jrtc27 added a comment. Edited Nov 12 2021, 11:06 AM

@MaskRay To be clear, I'm talking about reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 not this patch. Again, I understand your motivations for keeping the default as-is, but we don't have consensus on this change and the policy for this project is to revert until a consensus can be reached. Also, the fact that this patch was approved on the condition that the default would stay the same, but then the default was changed without discussion is a pretty strong reason to revert. Can we please revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 in trunk and continue the discussions in this thread?

@jrtc27 @hvdijk Do you have a suggested timeline for transitioning to the new default?

My instinct would be something like a couple of years from the date the new flags first appeared in an LLVM release, given that's the rough release cycle length for many Linux distros, so targeting LLVM 17 assuming the current cadence continues; anything less than that and your affected software won't be able to support LLD on even the latest release of some distros.

@jrtc27 @hvdijk Do you have a suggested timeline for transitioning to the new default?

From what I have seen, I'd be happy with the change once a GCC version that handles __has_attribute properly is out. That way, there might still be breakage, but there'll be a clear path forwards on getting code fixed. I am not aware of GCC's release plans, but I'd like to think it should probably be possible to get that fixed in time for GCC 11.3, which I would expect to be released in time to flip the default in LLVM 14.

I haven't had to deal with the same things as @jrtc27 though; I am not suggesting that those do not warrant a longer transition time.

MaskRay added a comment. Edited Nov 12 2021, 11:33 AM

@MaskRay To be clear, I'm talking about reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 not this patch. Again, I understand your motivations for keeping the default as-is, but we don't have consensus on this change and the policy for this project is to revert until a consensus can be reached. Also, the fact that this patch was approved on the condition that the default would stay the same, but then the default was changed without discussion is a pretty strong reason to revert. Can we please revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 in trunk and continue the discussions in this thread?

@jrtc27 @hvdijk Do you have a suggested timeline for transitioning to the new default?

@tstellar I know you are talking about 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619, but I am afraid you may have missed some important discussions, e.g. https://bugs.llvm.org/show_bug.cgi?id=52384#c1
https://reviews.llvm.org/D96914#3115988 and https://reviews.llvm.org/D96914#3117455 .
I'd appreciate it if others could read them first.
In particular, I have mentioned that this default switch has been discussed with several groups, in the absence of jrtc27's agreement (who isn't a regular reviewer for lld/ELF code).

I don't think I necessarily agree with "revert until a consensus can be reached" if it goes beyond what is reasonable. The LLVM 17 release schedule proposed by @jrtc27 would just cause more trouble for FreeBSD if more new software comes to rely on the unfortunate GNU ld>=2015-10 behavior, even if it might make a few LLVM IR code generators (ldc, which has a pending path out by its maintainer: https://github.com/ldc-developers/ldc/issues/3861) happy.
I have repeatedly mentioned that delaying the switch will just cause more problems for distros like FreeBSD. I hope you can seek advice from @emaste and @dim on FreeBSD matters.
Let me emphasize again that distro-default --gc-sections isn't a common thing.
Well, I have asked some Gentoo Linux folks for help testing this configuration.
Their work will benefit all other distros.

On one hand, I am glad that LLD is so widely used; on the other hand, it may not be possible for a decision to make every group happy, and we need to make the right call.
Please don't overly emphasize a very rare thing just to make all other important use cases suffer.


I can accept @hvdijk's "enabling -z start-stop-gc behavior for LLD 14" timeline, which only means a revert on release/13.x, without touching main.

@MaskRay To be clear, I'm talking about reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 not this patch. Again, I understand your motivations for keeping the default as-is, but we don't have consensus on this change and the policy for this project is to revert until a consensus can be reached. Also, the fact that this patch was approved on the condition that the default would stay the same, but then the default was changed without discussion is a pretty strong reason to revert. Can we please revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 in trunk and continue the discussions in this thread?

@jrtc27 @hvdijk Do you have a suggested timeline for transitioning to the new default?

@tstellar I know you are talking about 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619, but I am afraid you may have missed some important discussions, e.g. https://bugs.llvm.org/show_bug.cgi?id=52384#c1
https://reviews.llvm.org/D96914#3115988 and https://reviews.llvm.org/D96914#3117455 .
I'd appreciate it if others could read them first.
In particular, I have mentioned that this default switch has been discussed with several groups, in the absence of jrtc27's agreement (who isn't a regular reviewer for lld/ELF code).

I'm a regular contributor, pretty familiar with the code base, knowledgeable about ELF and reviewed this specific patch. There is no need to attempt to undermine me like that.

I don't think I necessarily agree with "revert until a consensus can be reached" if it goes beyond what is reasonable. The LLVM 17 release schedule proposed by @jrtc27 would just cause more trouble for FreeBSD if more new software comes to rely on the unfortunate GNU ld>=2015-10 behavior,

So spit out a warning that the default is going to change if we detect --gc-sections + start_/stop_ symbols + no explicit -z (no)start-stop-gc? That's probably a good idea to do regardless.
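As a sketch of the condition being proposed, the check could look roughly like the Python below. This is purely illustrative: `should_warn`, `z_options`, and `gc_enabled` are hypothetical helpers, not LLD functions, and real option parsing in LLD is more involved than scanning a list of strings.

```python
def z_options(args):
    """Collect -z keywords given either as '-z kw' or as '-zkw'."""
    out = []
    it = iter(args)
    for a in it:
        if a == "-z":
            kw = next(it, None)
            if kw is not None:
                out.append(kw)
        elif a.startswith("-z"):
            out.append(a[2:])
    return out


def gc_enabled(args):
    """The last of --gc-sections / --no-gc-sections wins; the default is off."""
    enabled = False
    for a in args:
        if a == "--gc-sections":
            enabled = True
        elif a == "--no-gc-sections":
            enabled = False
    return enabled


def should_warn(args, referenced_symbols):
    """Warn only if GC is on, __start_/__stop_ symbols are used, and the user
    did not pick -z start-stop-gc or -z nostart-stop-gc explicitly."""
    explicit = any(z in ("start-stop-gc", "nostart-stop-gc") for z in z_options(args))
    uses = any(s.startswith(("__start_", "__stop_")) for s in referenced_symbols)
    return gc_enabled(args) and uses and not explicit


print(should_warn(["--gc-sections"], ["__start_meta"]))
print(should_warn(["--gc-sections", "-z", "nostart-stop-gc"], ["__start_meta"]))
```

Passing either -z option explicitly silences the warning in this sketch, which is the opt-in/opt-out escape hatch implied by the proposal.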

even if it might make a few LLVM IR code generators (ldc, which has a pending path out by its maintainer: https://github.com/ldc-developers/ldc/issues/3861) happy.

Again, even if newer versions of LDC get "fixed" (which they will anyway by virtue of using LLVM 13 for CodeGen), this does not "fix" older versions of LDC that exist in the wild.

I have repeatedly mentioned that delaying the switch will just cause more problems for distros like FreeBSD. I hope you can seek advice from @emaste and @dim on FreeBSD matters.
Let me emphasize again that distro-default --gc-sections isn't a common thing.

We don't have a distro-default --gc-sections. The software affected is software that uses --gc-sections, or static libraries that are linked into binaries from a different source that use it.

On one hand, I am glad that LLD is so widely used; on the other hand, it may not be possible for a decision to make every group happy, and we need to make the right call.
Please don't overly emphasize a very rare thing just to make all other important use cases suffer.

I would argue both are currently important, and that not breaking existing commonplace software is more important than optimising for a newer use case.

I can accept @hvdijk's "enabling -z start-stop-gc behavior for LLD 14" timeline, which only means a revert on release/13.x, without touching main.

I can accept @hvdijk's "enabling -z start-stop-gc behavior for LLD 14" timeline, which only means a revert on release/13.x, without touching main.

Sorry for being unclear, I meant reverting it on both 13.x and main, fixing GCC, waiting for a release (but communicating with GCC devs on their plans and making sure the wait doesn't take unreasonably long), and then re-enabling on main. Ideally that would result in re-enabling in time for LLVM 14, and I'd certainly try to get that done, but it's not a hard guarantee.

So spit out a warning that the default is going to change if we detect --gc-sections + start_/stop_ symbols + no explicit -z (no)start-stop-gc? That's probably a good idea to do regardless.

That's a good idea when dealing with older compilers, but when up-to-date compilers are used, it would be nice if no warnings are emitted, as with up-to-date compilers in my opinion it's the code that should be changed as needed rather than the invocation. Is there a reliable way to tell that an object file was created using SHF_GNU_RETAIN-aware tools, regardless of whether the object file actually uses SHF_GNU_RETAIN?

So spit out a warning that the default is going to change if we detect --gc-sections + start_/stop_ symbols + no explicit -z (no)start-stop-gc? That's probably a good idea to do regardless.

That's a good idea when dealing with older compilers, but when up-to-date compilers are used, it would be nice if no warnings are emitted, as with up-to-date compilers in my opinion it's the code that should be changed as needed rather than the invocation. Is there a reliable way to tell that an object file was created using SHF_GNU_RETAIN-aware tools, regardless of whether the object file actually uses SHF_GNU_RETAIN?

What the warning suggests as a fix is entirely TBD; it can suggest any and all of source, codegen and linker flag changes.

What the warning suggests as a fix is entirely TBD; it can suggest any and all of source, codegen and linker flag changes.

I meant that implementing what you proposed would result in a warning in cases where my thinking is there should be no warning at all, so I was hoping there would be a way of detecting those cases and suppressing the warning for them. I wasn't thinking about what the warning would say.

What the warning suggests as a fix is entirely TBD; it can suggest any and all of source, codegen and linker flag changes.

I meant that implementing what you proposed would result in a warning in cases where my thinking is there should be no warning at all, so I was hoping there would be a way of detecting those cases and suppressing the warning for them. I wasn't thinking about what the warning would say.

Oh you mean the case where people do want things that look like linker sets to be GC'ed. Those cases could always explicitly opt in, or just ignore the warnings. I don't think there's a good answer to that (otherwise we'd use that exact same logic to determine which behaviour to use in the first place...).

@MaskRay To be clear, I'm talking about reverting 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 not this patch. Again, I understand your motivations for keeping the default as-is, but we don't have consensus on this change and the policy for this project is to revert until a consensus can be reached. Also, the fact that this patch was approved on the condition that the default would stay the same, but then the default was changed without discussion is a pretty strong reason to revert. Can we please revert 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 in trunk and continue the discussions in this thread?

@jrtc27 @hvdijk Do you have a suggested timeline for transitioning to the new default?

@tstellar I know you are talking about 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619, but I am afraid you may have missed some important discussions, e.g. https://bugs.llvm.org/show_bug.cgi?id=52384#c1
https://reviews.llvm.org/D96914#3115988 and https://reviews.llvm.org/D96914#3117455 .
I'd appreciate it if others could read them first.
In particular, I have mentioned that this default switch has been discussed with several groups, in the absence of jrtc27's agreement (who isn't a regular reviewer for lld/ELF code).

I'm a regular contributor, pretty familiar with the code base, knowledgeable about ELF and reviewed this specific patch. There is no need to attempt to undermine me like that.

I don't think I necessarily agree with "revert until a consensus can be reached" if it goes beyond what is reasonable. The LLVM 17 release schedule proposed by @jrtc27 would just cause more trouble for FreeBSD if more new software comes to rely on the unfortunate GNU ld>=2015-10 behavior,

So spit out a warning that the default is going to change if we detect --gc-sections + start_/stop_ symbols + no explicit -z (no)start-stop-gc? That's probably a good idea to do regardless.

even if it might make a few LLVM IR code generators (ldc, which has a pending path out by its maintainer: https://github.com/ldc-developers/ldc/issues/3861) happy.

Again, even if newer versions of LDC get "fixed" (which they will anyway by virtue of using LLVM 13 for CodeGen), this does not "fix" older versions of LDC that exist in the wild.

I have repeatedly mentioned that delaying the switch will just cause more problems for distros like FreeBSD. I hope you can seek advice from @emaste and @dim on FreeBSD matters.
Let me emphasize again that distro-default --gc-sections isn't a common thing.

We don't have a distro-default --gc-sections. The software affected is software that uses --gc-sections, or static libraries that are linked into binaries from a different source that use it.

On one hand, I am glad that LLD is so widely used; on the other hand, it may not be possible for a decision to make every group happy, and we need to make the right call.
Please don't overly emphasize a very rare thing just to make all other important use cases suffer.

I would argue both are currently important, and that not breaking existing commonplace software is more important than optimising for a newer use case.

This isn't a new use case. GNU ld's -z start-stop-gc behavior (even if it did not have the option) was traditional and had been there for a very long time, until the 2015-10 commit moved it to -z nostart-stop-gc.

Upgrading the compiler -> more aggressive optimizations exploiting UB -> software has to fix its UB. This has been pretty common.
If you count them, cases where software needs to adapt after a compiler upgrade are very common.
To prove that it's an important use case, please show more evidence that much software will break. (Also, since --gc-sections isn't distro default, we don't yet know how much doesn't even work with GNU ld's --gc-sections.)

(I shall note that declaring reserved identifiers __start_ is UB in the first place. Well, the compiler has just always been permissive.)

The traditional behavior also matches ld64's section$start$xxx$yyy behavior.

So spit out a warning that the default is going to change if we detect --gc-sections + start_/stop_ symbols + no explicit -z (no)start-stop-gc? That's probably a good idea to do regardless.

That's a good idea when dealing with older compilers, but when up-to-date compilers are used, it would be nice if no warnings are emitted, as with up-to-date compilers in my opinion it's the code that should be changed as needed rather than the invocation. Is there a reliable way to tell that an object file was created using SHF_GNU_RETAIN-aware tools, regardless of whether the object file actually uses SHF_GNU_RETAIN?

In the FreeBSD llvm-project<13.0.0 library plus LLD 13.0.0 use case, unfortunately there is no portable and reliable way.

A warning would be incompatible with all existing and new instrumentations which do want GC.

(I shall note that declaring reserved identifiers __start_ is UB in the first place. Well, the compiler has just always been permissive.)

It's UB from a C and C++ standards POV, it's not UB from an LLVM POV. It's the LLVM POV that's relevant here, C and C++ UB that falls under a supported LLVM extension needs to behave as specified by that extension. Changes to the behaviour of that extension can be made (both in spec and in code), sure, but the fact that they are outside of the scope of the C and C++ standards doesn't automatically make them okay. If I were to commit a patch to clang to issue a hard error whenever an identifier starting with __ is used outside of a standard library header, I would expect that patch to be reverted promptly, and rightly so.

(I shall note that declaring reserved identifiers __start_ is UB in the first place. Well, the compiler has just always been permissive.)

It's UB from a C and C++ standards POV, it's not UB from an LLVM POV. It's the LLVM POV that's relevant here, C and C++ UB that falls under a supported LLVM extension needs to behave as specified by that extension. Changes to the behaviour of that extension can be made (both in spec and in code), sure, but the fact that they are outside of the scope of the C and C++ standards doesn't automatically make them okay. If I were to commit a patch to clang to issue a hard error whenever an identifier starting with __ is used outside of a standard library header, I would expect that patch to be reverted promptly, and rightly so.

-Wreserved-identifier which is not in -Wall/-Wextra catches exactly this.

-Wreserved-identifier which is not in -Wall/-Wextra catches exactly this.

I wrote "[...] a patch to clang to issue a hard error whenever [...]" for a reason. -Wreserved-identifier does not do that, and I hope that you agree it would be absurd to change it to be enabled by other flags (except -Weverything), let alone promoted to an error, no matter what the C and C++ standards say: that would break things that we make work by design and we want to keep working, things that from LLVM POV are valid, including use of __start_*. (It also does not catch exactly the same cases that I described, but the details are irrelevant to this issue.)

We need to come to some kind of a consensus here. I think since this patch was approved on the condition that the default not change, we really need to revert the new default behavior. Here is my proposal:

  1. Add a CMake option to allow users to opt-in to the new behavior, but the default value for this option should be the old behavior.
  2. Start a thread on llvm-dev to discuss this change and a possible migration plan.
  3. Make the same change in release/13.x

@MaskRay I know this is an important feature, but we must follow LLVM process here. 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 was committed without review and with @jrtc27 nak'ing it in the discussion of D96914. I think the CMake option is a good compromise. Are you able to make this change?

MaskRay added a comment. Edited Nov 18 2021, 12:49 PM

We need to come to some kind of a consensus here. I think since this patch was approved on the condition that the default not change, we really need to revert the new default behavior. Here is my proposal:

  1. Add a CMake option to allow users to opt-in to the new behavior, but the default value for this option should be the old behavior.
  2. Start a thread on llvm-dev to discuss this change and a possible migration plan.
  3. Make the same change in release/13.x

@MaskRay I know this is an important feature, but we must follow LLVM process here. 6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 was committed without review and with @jrtc27 nak'ing it in the discussion of D96914. I think the CMake option is a good compromise. Are you able to make this change?

A CMake option opting in to the -z nostart-stop-gc behavior is fine. I created D114186.

I have to say I am doing this reluctantly, as it adds complexity. But many people don't listen.

6d2d3bd0a61f5fc7fd9f61f48bc30e9ca77cc619 is not new behavior at all. It is what GNU ld<2015-10 does.
See https://reviews.llvm.org/D96914#3128046 and my previous comments.