This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][crt] Simple crtbegin and crtend implementation
ClosedPublic

Authored by phosek on Jan 16 2017, 6:32 PM.

Details

Summary

Clang relies on existence of certain symbols that are normally
provided by crtbegin.o/crtend.o. However, LLVM does not currently
provide implementation of these files, instead relying on either
libgcc or implementations provided as part of the system.

This change provides an initial implementation of crtbegin.o/crtend.o
that can be used on system that don't provide crtbegin.o/crtend.o as
part of their C library.

Diff Detail

Event Timeline

phosek created this revision.Jan 16 2017, 6:32 PM

AFAIK CMake doesn't support producing object files as final artifacts so I couldn't use the existing functions and I opted to use the custom command instead. If you know of a better solution, I'd be happy to change the implementation.

phosek edited the summary of this revision. (Show Details)Jan 16 2017, 6:34 PM

I don't think that this really belongs in compiler-rt, this is really libc specific. For example, what about hooking into the Java class registration (see glibc's JV section)?

On Linux, compiler supplies crtbegin.o and crtend.o (in case of GCC, they're part of libgcc), while the C library supplies crti.o and crtn.o. In case of existing Linux distributions, crtbegin.o and crtend.o always come from libgcc, but if we wanted to build a Clang based distribution, we need get crtbegin.o and crtend.o from somewhere else. Given the structure of LLVM project, compiler-rt seems like the right place. On other systems like Darwin where crtbegin.o and crtend.o are not used at all or NetBSD or OpenBSD where crtbegin.o and crtend.o are supplied by the C library these aren't needed which is why restricted this component to Linux for now.

beanz edited edge metadata.

Looping in Brad King and Stephen Kelly for CMake advice.

Brad & Stephen, for context we need to do something very odd that is probably only ever done by low-level runtime projects. We need to install object files. My first thought for handling this was to create an object library and a normal install action, but that doesn't work because CMake provides the error:

install TARGETS given OBJECT library "foo" which may not be installed.

My second thought to handle this was using the install(FILES ...) form with the TARGET_OBJECTS generator expression, but generator expressions aren't supported in that context. I suspect that what @phosek has written here is probably what we need to do with current versions of CMake, but any guidance from the CMake community would be greatly appreciated. Also, a better long-term solution would be appreciated too.

Thanks,
-Chris

joerg added a subscriber: joerg.Jan 17 2017, 2:23 PM

I fully agree that this doesn't belong into compiler-rt. Even then, it should at least be a working copy i.e. as found in NetBSD. This change is just adding a lot of complexity with negative net gain.

beanz added inline comments.Jan 17 2017, 2:23 PM
lib/crt/CMakeLists.txt
15 ↗(On Diff #84620)

Unless I'm missing something it looks like you're missing the actual install(...) call. :-)

beanz added a comment.Jan 17 2017, 3:15 PM

@joerg I completely disagree with you. Since libgcc provides those files for some platforms compiler-rt should as well.

I believe that one of the project goals of compiler-rt is to allow Clang + Compiler-RT to replace GCC + libGCC. We can't do that without these object files being produced, so I believe this complexity is necessary and justified.

loic.yhuel added inline comments.
lib/crt/CMakeLists.txt
9 ↗(On Diff #84620)

Perhaps we should have a "foreach (arch ${CRT_SUPPORTED_ARCH})" loop, and use ${TARGET_${arch}_CFLAGS}.
It would allow to add x86/arm/armhf.

11 ↗(On Diff #84620)

It's ${CMAKE_C_FLAGS}.

We probably should use something like :

string(TOUPPER ${CMAKE_BUILD_TYPE} CONFIG)
set(CFLAGS "${CMAKE_C_FLAGS} ${CMAKE_C_FLAGS_${CONFIG}}")
separate_arguments(CFLAGS)
joerg added a comment.Feb 1 2017, 12:42 PM

I still say the patch in this form is doing a lot more harm than good. Without a functional version of crtbegin.c/crtend.o, I absolutely object to this change.

I still say the patch in this form is doing a lot more harm than good. Without a functional version of crtbegin.c/crtend.o, I absolutely object to this change.

Can you elaborate on what do you mean by functional? In the current form, it's minimal but it's sufficient for systems like Linux. I'm happy to extend it to handle other aspects like .ctors/.dtors, but I thought it'd be better to handle it incrementally.

joerg added a comment.Feb 1 2017, 1:23 PM

It is functional only if a number of number of run time choices are made in the right way. That's exactly the reason why I am against shipping those files in first place: if you pick the wrong, you get a nicely linking binary that is magically broken much much later.

The choices available to the run time are:
(1) Should EH frame registration be attemped?
(2) Who is responsible for iterating .ctor/.dtor?
(3) Is __cxa_atexit supported by the platform?
(4) Are Java types supported?

The current version is noticably wrong for (3) on most modern systems, glibc include. This would prevent global dtors and the like in shared libraries be skipped.
Getting (1) wrong can break reasonable caching in libunwind, so it is also quite important.
Getting (2) wrong won't matter on newer glibc distros, but there are still enough systems that don't use INIT_ARRAY/FINI_ARRAY on non-EABI-ARM.

(4) is likely not relevant anymore.

phosek added a comment.Feb 2 2017, 1:58 PM

It is functional only if a number of number of run time choices are made in the right way. That's exactly the reason why I am against shipping those files in first place: if you pick the wrong, you get a nicely linking binary that is magically broken much much later.

The choices available to the run time are:
(1) Should EH frame registration be attemped?
(2) Who is responsible for iterating .ctor/.dtor?
(3) Is __cxa_atexit supported by the platform?
(4) Are Java types supported?

The current version is noticably wrong for (3) on most modern systems, glibc include. This would prevent global dtors and the like in shared libraries be skipped.
Getting (1) wrong can break reasonable caching in libunwind, so it is also quite important.
Getting (2) wrong won't matter on newer glibc distros, but there are still enough systems that don't use INIT_ARRAY/FINI_ARRAY on non-EABI-ARM.

(4) is likely not relevant anymore.

I'd definitely like to address (1), (2) and (3) which are needed on Linux. Right now, this is really critical for Fuchsia where we cannot use libgcc.a because of license and we need at least a minimal crtbegin.o/crtend.o. The current implementation is sufficient for our use case. Would it be acceptable to you if I restrict the current implementation only to Fuchsia to unblock us, and then continue working on this to address (1), (2) and (3)?

phosek updated this revision to Diff 86914.Feb 2 2017, 4:05 PM
phosek marked 2 inline comments as done.
phosek added inline comments.Feb 2 2017, 4:30 PM
lib/crt/CMakeLists.txt
15 ↗(On Diff #84620)

I've fixed that, as well as bunch of other issues that I revealed while testing this change, but this function started to resemble add_compiler_rt_runtime a lot, so I instead tried to modify add_compiler_rt_runtime to support OBJECTas an output type and it turned out to be pretty fairly straightforward. Let me know what you think about that approach.

phosek updated this revision to Diff 87328.Feb 6 2017, 3:48 PM
phosek added a comment.Feb 6 2017, 3:57 PM

We discussed this with @mcgrathr and came up with a possible alternative solution: Currently the only bit of crtbegin.o we're missing is the __dso_handle symbol because Clang generates a reference to it when compiling C++ code. That symbol has been historically defined in crtbegin.o, but there's no reason it has to be defined there. The only requirement is that this symbol is defined in a statically linked archive/object. That means that we could alternatively define this symbol in the builtins library; since this library is always linked as static, it satisfies the requirement, and it also means that if crtbegin.o is being linked in e.g. on Linux, linker will use the symbol from crtbegin.o and ignore the one defined in the builtins library.

We discussed this with @mcgrathr and came up with a possible alternative solution: Currently the only bit of crtbegin.o we're missing is the __dso_handle symbol because Clang generates a reference to it when compiling C++ code.

So only Fuchsia would be supported, Linux would still need external ctrbegin.o/crtend.o (from gcc by default, but for projects like https://blogs.gentoo.org/gsoc2016-native-clang/ there still won't be any good solution).

Btw, do you have a crtfastmath.o on Fuchsia ? It's optionnal for clang, but if it is missing performance could be slower than Linux for programs using -ffast-math.

phosek added a comment.Feb 7 2017, 7:06 PM

We don't use -ffast-math anywhere in Fuchsia at the moment (at least I haven't found any use cases) so we haven't needed crtfastmath.o yet, but we might start using it in the future in which case we'll probably need to write our own implementation and same as in the case of crtbegin.o and crtend.o I believe the right place for it would be compiler-rt.

joerg added a comment.Feb 8 2017, 5:28 AM

So only Fuchsia would be supported, Linux would still need external ctrbegin.o/crtend.o (from gcc by default, but for projects like https://blogs.gentoo.org/gsoc2016-native-clang/ there still won't be any good solution).

Btw, do you have a crtfastmath.o on Fuchsia ? It's optionnal for clang, but if it is missing performance could be slower than Linux for programs using -ffast-math.

I'd say start lobbying your libc implementation to provide a matching set. crtbegin.o/crtend.o have little to nothing
to do with the compiler and almost everything to do with choices of the dynamic linker and libc. Just because GCC
shipped them due to various political reasons, least of all coordination with vendors, doesn't mean that it is the correct
way to do it.

A good chunk of why I said that this implementation is not generally appropiate still applies.

As I mentioned before, and as @nbjoerg stated, this really belongs in your libc implementation. Since you stated that this is specifically for fuschia and you don't intend to account for the myriad of libc implementations and hosts, wouldn't it male sense to keep this in the fuschia repositories?

mcgrathr edited edge metadata.Feb 8 2017, 3:08 PM

The comments about this being part of libc implementation are exactly
wrong. There is little or nothing in crtbegin/crtend that has anything
to do with libc. It's entirely to do with what the compiler emits, what
the linker does, and what other runtime code (not libc) you are using:

(1) Should EH frame registration be attemped?

This has entirely to do with the ABI contract between the
compiler-generated unwind info and the unwinder runtime library.
The unwinder runtime is not part of libc.

In GCC, the unwinder is provided as part of libgcc.
Elsewhere, you have an implementation like libunwind or whatever.

If the compiler driver tells the linker to generate PT_GNU_EH_FRAME
(--eh-frame-hdr switch) and the unwinder runtime knows how to find all
the modules and their PT_GNU_EH_FRAME segments, then that's all you
need. If the unwinder runtime needs some explicit registration calls at
startup, that's part of the unwinder library ABI. libc is not involved.

On Linux systems, usually the unwinder runtime being used is like the
libgcc one, so it works via PT_GNU_EH_FRAME for dynamic linking but
requires calls to its registration API functions for static linking.
The details of that depend on the unwinder runtime alone.

(In some configurations, glibc does export an unwinder ABI. That is
solely for backward compatibility with old binaries that were linked
against much older GCC versions of crtbegin/crtend/libgcc. Since things
linked today aren't linked with those old GCC artifacts, no binary you
produce today will or should ever use glibc's unwinder ABI.)

(2) Who is responsible for iterating .ctor/.dtor?

This has entirely to do with what the compiler emits and what the linker
does. If the compiler emits .init_array/.fini_array directly, there is
no need for anything else. If the compiler emits .ctors/.dtors and the
linker folds those into .init_array/.fini_array, there is no need for
anything else. If the compiler emits .ctors/.dtors and the linker emits
them unchanged, then some code to iterate them is required. In none of
these cases is libc relevant.

(3) Is __cxa_atexit supported by the platform?

The C++ front-end emits calls to cxa_atexit and references to
&
dso_handle. If the C++ front-end is going to do that, then there
needs to be a definition of dso_handle somewhere. The name
dso_handle is not part of the ABI contract with the C library or
anything else. It's entirely a name chosen by the C++ front-end.

On modularity principles, __dso_handle really belongs in libc++abi.
However, for implementation reasons it needs to be included statically
into each final link (whether executable or shared library), even when
libc++abi is found entirely in a shared library. So libc++abi, and the
compiler driver rules to link it in, would need to provide a static
library (or .o file) to the link in addition to the shared library,
which they don't do today. So it's simpler on the implementation side
to put this into some existing .a or .o file that is always statically
linked. crtbegin.o is where GCC defines it, but there's no principled
reason for it to be there. It might as well be in an archive library
instead, so it's omitted from links that don't actually use it (such as
ones with no C++ or no destructors). Any existing archive library that
is always statically linked in every link would do, e.g. clang_rt-builtins.

(4) Are Java types supported?

This has to do with things that the old GCC Java front-end (GCJ) emits.
If you don't use that Java front-end, you don't need them. In no case
does anything about your Java support choices have anything to do with
the C library.

joerg requested changes to this revision.Feb 8 2017, 3:35 PM

You are wrong on (1), (2) and (3). But I am tired of this discussion. My objection stands. This is not a functional implementation for anything but a very narrow focus. There are much more complete implementations around. This creates more problems than it solves.

This revision now requires changes to proceed.Feb 8 2017, 3:35 PM
phosek updated this revision to Diff 92748.Mar 22 2017, 5:39 PM
phosek edited edge metadata.
phosek edited the summary of this revision. (Show Details)

I have expanded the implementation to also handle EH frame registration and iterating over .ctors/.dtors (including the necessary CMake change to detect whether the target platform uses .init_array/.fini_array or .ctors/.dtors).

washley added a subscriber: washley.Jun 6 2017, 8:11 AM
This revision now requires changes to proceed.Jun 6 2017, 8:11 AM
pawels added a subscriber: pawels.Jan 7 2018, 4:48 AM
emaste added a subscriber: emaste.Oct 14 2018, 1:36 PM
emaste added inline comments.Oct 14 2018, 1:52 PM
lib/crt/crtbegin.c
5–6 ↗(On Diff #92748)

As this file would be linked into all executables produced by the toolchain, should it be an exception to the standard LLVM MIT/UIUC license - i.e., 0-clause BSD or CC0?

https://tldrlegal.com/license/bsd-0-clause-license
https://creativecommons.org/share-your-work/public-domain/cc0/

phosek added inline comments.Oct 14 2018, 5:44 PM
lib/crt/crtbegin.c
5–6 ↗(On Diff #92748)

I'd be fine with that.

emaste added a subscriber: markj.Oct 15 2018, 8:22 AM
beanz added inline comments.
lib/crt/crtbegin.c
5–6 ↗(On Diff #92748)

I've asked @chandlerc to chime in either himself, or someone from the foundation. My understanding of the way we handle licensing in LLVM is that only the runtime libraries are dual licensed UIUC & MIT. The reason being that the MIT license doesn't have the same attribution requirements as UIUC. LLVM, Clang, LLDB, LLD, etc. are all UIUC licensed only.

Since all of compiler-rt is under the UIUC/MIT dual license, I would be very concerned if this file needed different licensing since lib/builtins all get embedded in binaries in the same way, and are under that same license configuration.

emaste added inline comments.Oct 15 2018, 3:29 PM
lib/crt/crtbegin.c
5–6 ↗(On Diff #92748)

Thanks, and fair enough.

The reason being that the MIT license doesn't have the same attribution requirements

I agree with you there's no need for this file to be licensed differently from the rest of the runtime components, and agree with the goal of avoiding attribution requirements on runtime components.

That said, a quick search turns up lots of opinions on MIT attribution and binary distribution from non-lawyers (and I am, of course, not a lawyer).

So, I think there's nothing to do for this file in isolation, but we should ensure the runtime components have the expected requirement (i.e., none) for binaries.

(I seem to recall being involved in a previous discussion of this with @chandlerc, but now cannot find any evidence of that discussion.)

chandlerc added inline comments.Oct 30 2018, 5:43 PM
lib/crt/crtbegin.c
5–6 ↗(On Diff #92748)

We should not add more license diversity here. The current license structure of compiler-rt was an attempt to provide a workable license for runtime libraries in LLVM and they should continue to be used for this component of the runtime libraries.

There are many problems with the license structure we ended up with (see Chris's original emails), and indeed we are working on relicensing all of LLVM to address these concerns. We have worked *very* hard with lawyers to make sure the new license is an excellent choice for runtimes. I would encourage any further efforts to go into the new license, and I think being consistent with the rest of compiler-rt is fine in the interim.

Hi Joerg,

You are wrong on (1), (2) and (3). But I am tired of this discussion. My objection stands. This is not a functional implementation for anything but a very narrow focus. There are much more complete implementations around. This creates more problems than it solves.

This is not a productive source of discussion. What path forward would you like here - saying that external projects have more complete implementations is also not a workable path forward. In particular, this could be a very good solution for platforms that aren't yours and provide a path forward for other platforms.

If you could be specific: what are you asking for this patch to do differently?

Thanks!

-eric

In addition to what Eric said, I just want to pull back up an old comment here:

@joerg I completely disagree with you. Since libgcc provides those files for some platforms compiler-rt should as well.

I believe that one of the project goals of compiler-rt is to allow Clang + Compiler-RT to replace GCC + libGCC. We can't do that without these object files being produced, so I believe this complexity is necessary and justified.

Regardless of what happens for Fuschia, we have a real platform (Linux) with a libc that does *not* provide these bits and which expects the compiler to provide these bits. I would like LLVM to be an effective toolchain on that platform and provide the missing pieces here. Changing libc in this case isn't realistic -- these are existing platforms that may not be updating their libc.

If Fuschia wants to follow a similar design pattern (putting the CRT bits into the toolchain rather than the libc) I have no particular opinion on that being the "right" or "wrong" design. But it doesn't seem to be much of a burden on LLVM (if a burden at all) given that I think we will have to provide these components to support Linux without relying on libgcc.

dxf added a subscriber: dxf.Nov 1 2018, 2:23 PM
phosek updated this revision to Diff 176228.Nov 30 2018, 5:14 PM
phosek edited the summary of this revision. (Show Details)

I've rebased and updated the change. I've also included the Clang driver changes necessary to use this, but this can be landed in a separate followup change.

Herald added subscribers: Restricted Project, atanasyan, sdardis, srhines. · View Herald TranscriptNov 30 2018, 5:14 PM

Looks like this stalled with no feedback outside of Chandler and I.

Joerg: We asked some very specific questions to which there don't appear to be answers coming from you and this patch doesn't affect any other target negatively here. Unless you come up with a compelling argument not to commit this other than "some platforms have support for this in other places" I think we need to commit this.

I'm going to ack in a week otherwise. If you feel I've overstepped here please let me know, but shutting down discussion doesn't seem to be productive here and I think Petr would like to make forward progress on support - and I think the project as a whole is better served by being able to serve as a full system compiler replacement; this is what compiler-rt is for ultimately anyhow.

Thanks!

krytarowski added a subscriber: krytarowski.EditedDec 31 2018, 3:31 PM

Is there intention to support Clang/LLVM by GLIBC/GNU developers? Unlikely (knowing FSF track of reactions against LLVM projects). There are no gcc/libgcc licensing concerns for software hosted on GLIBC (LGPLv3) with libgcc. There already does exist a functional implementation for this OS. There is no need to treat it as an issue on the Clang/LLVM side (maybe except ambitions).

In my humble opinion we shall not expand the 'vendor lock' approach of libgcc/GNU to other platforms.

There are other BSD-licensed implementations and Fuchsia can borrow the code freely.

@joerg mentioned technical issues. I would add that this kind of software is implemented in rather highly OS-specific and ideally compiler-independent way. In my opinion it belongs to libc.

Is there intention to support Clang/LLVM by GLIBC/GNU developers? Unlikely (knowing FSF track of reactions against LLVM projects). There are no gcc/libgcc licensing concerns for software hosted on GLIBC (LGPLv3) with libgcc. There already does exist a functional implementation for this OS. There is no need to treat it as an issue on the Clang/LLVM side (maybe except ambitions).

In my humble opinion we shall not expand the 'vendor lock' approach of libgcc/GNU to other platforms.

There are other BSD-licensed implementations and Fuchsia can borrow the code freely.

@joerg mentioned technical issues. I would add this kind of software is implemented in rather highly OS-specific and ideally compiler-independent. In my opinion it belongs to libc.

I think there's a lot of confusion around this change since the discussion has been dragging for so many months, and some of it has been only carried only on the mailing list and not on Phabricator, so I'll try to summarize the state.

We don't need crtbegin.o/crtend.o on Fuchsia. Since we're building a new OS from scratch, we have a luxury of changing the design of various aspects of our system as needed and after the initial discussion on this change, we've changed the design of Fuchsia to avoid the need for crtbegin.o/crtend.o altogether.

However, we don't have the same luxury when it comes to Linux. None of the popular Linux libc implementations (I looked at glibc, musl, newlib, uclibc and dietlibc) provide crtbegin.o/crtend.o aside from Bionic. This is how crtbegin.o/crtend.o is handled on Linux. You may disagree with it, but I don't think this is the place to argue about that design. If we could go back in time, we could argue for a different design, but that's simply not feasible. There are already vendors that build Linux, including glibc, with Clang. We're currently building an entire Linux system using Clang but we still need to pull in libgcc for crtbegin.o/crtend.o, so this isn't an ambition, it's the last remaining blocker.

My goal is to ensure that Clang supports Linux without relying on libgcc as it does today, but I'm not trying to replace crtbegin.o/crtend.o on other systems. If you look at compiler-rt/cmake/config-ix.cmake in this patch, it's restricted only to Linux. This is definitely not the only OS-specific bit in compiler-rt, there are other runtimes that are only supported on a single platform.

AFAIK @joerg's comments regarding technical issues were referring to the initial version of my patch which was very limited and doesn't reflect current version. I believe that the current version should address his comments and I've asked multiple times for additional feedback in case it doesn't, but I never got any. If there are other technical issues you or anyone else is aware of, please let me know and I'd be happy to address them.

krytarowski added a comment.EditedDec 31 2018, 4:21 PM

We're currently building an entire Linux system using Clang but we still need to pull in libgcc for crtbegin.o/crtend.o, so this isn't an ambition, it's the last remaining blocker.

What are the technical reasons for this to reimplement it in Clang instead of pulling it from libgcc?

We're currently building an entire Linux system using Clang but we still need to pull in libgcc for crtbegin.o/crtend.o, so this isn't an ambition, it's the last remaining blocker.

What are the technical reasons for this to reimplement it in Clang instead of pulling it from libgcc?

GCC's build system is quite monolithic and fragile and AFAIK it's not possible to build just crtbegin.o/crtend.o or libgcc without building the rest of GCC. crtbegin.o/crtend.o is produced by building crtstuff.c multiple times with different flags so building it outside of GCC's build system would mean replicating portion of GCC's build system and then keeping it up-to-date. I'm not saying that it's not possible to do this, but I'd like to be able to produce a fully working toolchain entirely from LLVM's build system, and neither of the two options outlined above get us any closer to that goal.

OK, if the GNU users intend to maintain copies of rather equivalent library in each compiler it's up to them/you to maintain it.. and keep tracking of future libgcc changes.

In order to prevent confusion I would add bold statements that this is for GNU only and not needed neither recommended for others.

Is there intention to support Clang/LLVM by GLIBC/GNU developers? Unlikely (knowing FSF track of reactions against LLVM projects). There are no gcc/libgcc licensing concerns for software hosted on GLIBC (LGPLv3) with libgcc. There already does exist a functional implementation for this OS. There is no need to treat it as an issue on the Clang/LLVM side (maybe except ambitions).

Yes. That intention is there and they've been receptive of patches.

In my humble opinion we shall not expand the 'vendor lock' approach of libgcc/GNU to other platforms.

There are other BSD-licensed implementations and Fuchsia can borrow the code freely.

This is irrelevant.

@joerg mentioned technical issues. I would add that this kind of software is implemented in rather highly OS-specific and ideally compiler-independent way. In my opinion it belongs to libc.

That's fine, it's just not the state of the world and we should support compiling appropriately.

Thanks.

-eric

krytarowski added a comment.EditedDec 31 2018, 7:03 PM

This is irrelevant.

It was explained clear that Fuchsia won't follow it.

That's fine, it's just not the state of the world and we should support compiling appropriately.

I don't want to pick a fight, but there can be used exactly same argument to defend pulling libgcc here.

But the final word is rather to platform maintainers.

echristo accepted this revision.Dec 31 2018, 7:08 PM

That's fine, it's just not the state of the world and we should support compiling appropriately.

I don't want to pick a fight, but there can be used exactly same argument to defend pulling libgcc here.

You have been picking a fight. And no, it's not the same argument. Please desist.

FWIW, I completely support landing this. I think the opposition is essentially "it would be better to not design a libc in that way" which is certainly a valid opinion, but given the existence of such libc designs widely deployed and widely used, it seems very good to lay groundwork for LLVM to support them.

I also think we should *not* be pointing at GNU or any other project here, and instead simply state that these exist to support using LLVM in conjunction with systems whose libc chooses this particular design pattern.

I don't think LLVM should be in the business of trying to cast some kind of "you should feel bad" judgement on such systems. If someone in the LLVM community wants to develop a libc as part of LLVM (which I would quite like to see), then that is the place to debate libc design decisions, and not anywhere else.

So, that said, does anyone have comments on the code itself? If no one is up to commenting on the code and giving Petr direct feedback there, I'll just LGTM this because I think it is long (looooooong) overdue.

Lastly, to address specific questions here:

That's fine, it's just not the state of the world and we should support compiling appropriately.

I don't want to pick a fight, but there can be used exactly same argument to defend pulling libgcc here.

That doesn't make sense to me -- the LLVM project has lots of reasons why it doesn't pull GPL-ed code into its tree.

But yes, we are going to be in the business of being 100% and fully compatible with libgcc in order to effectively support platforms based on libcs that follow the design pattern of glibc, certainly including platforms that use glibc. Is that work? Yes. Should the project do it? Absolutely yes. Just like we work hard to be compatible with the gcc commandline and other layers of compatibility, we should work to support these platforms well.

FWIW, I completely support landing this. I think the opposition is essentially "it would be better to not design a libc in that way" which is certainly a valid opinion, but given the existence of such libc designs widely deployed and widely used, it seems very good to lay groundwork for LLVM to support them.

I also think we should *not* be pointing at GNU or any other project here, and instead simply state that these exist to support using LLVM in conjunction with systems whose libc chooses this particular design pattern.

I don't think LLVM should be in the business of trying to cast some kind of "you should feel bad" judgement on such systems. If someone in the LLVM community wants to develop a libc as part of LLVM (which I would quite like to see), then that is the place to debate libc design decisions, and not anywhere else.

So, that said, does anyone have comments on the code itself? If no one is up to commenting on the code and giving Petr direct feedback there, I'll just LGTM this because I think it is long (looooooong) overdue.

Lastly, to address specific questions here:

That's fine, it's just not the state of the world and we should support compiling appropriately.

I don't want to pick a fight, but there can be used exactly same argument to defend pulling libgcc here.

That doesn't make sense to me -- the LLVM project has lots of reasons why it doesn't pull GPL-ed code into its tree.

But yes, we are going to be in the business of being 100% and fully compatible with libgcc in order to effectively support platforms based on libcs that follow the design pattern of glibc, certainly including platforms that use glibc. Is that work? Yes. Should the project do it? Absolutely yes. Just like we work hard to be compatible with the gcc commandline and other layers of compatibility, we should work to support these platforms well.

Definitely agree with the points here, would be nice to have an independent replacement for libgcc, with Csu being a major part of it and lay groundwork for further efforts to have an alternative compiler runtime that is suitable for general purpose use without being reliant on GNU-provided libraries or toolchain components.

(FWIW, I think what this patch is really waiting on at this point is a genuinely detailed code review of the code... But if someone has done that already and I just missed it in all the other emails, please speak up with a fresh LGTM!)

joerg added a comment.Jan 15 2019, 8:16 AM

For the clang side, I don't understand why Driver::GetFilePath is not good enough. This shouldn't need all toolchain changes at all.

For the compiler-rt side:
(1) I would drop the const for .ctor/dtor, since the content has to be writeable anyway for function pointer relocations.
(2) I'm trying to remember what the alignment rules for .eh_frame are. There seem to be different approaches here in the wild.
(3) .ctor/.dtor should be using pointer types, not long. sizeof(void (*)(*)) == sizeof(long) is a very questionable assumption.
(4) CTOR_END and DTOR_END are unused and will just be dropped.
(5) The logic around CRT_HAS_INITFINI_ARRAY is strange. If it is not set, do_init and do_fini must be called from .init/.fini with appropiate assembler code. Setting .init_array/.fini_array can't work in that case.
(6) Putting read-only content into .eh_frame can be a problem on MIPS for many GNU ld versions. It used to bitch about mixing read-only and read-write sections and the default DWARF encoding required writable .eh_frame.

dankm added a subscriber: dankm.Jan 16 2019, 1:55 PM
E5ten added a subscriber: E5ten.Jan 20 2019, 5:16 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 6 2019, 7:30 PM
phosek updated this revision to Diff 188093.Feb 24 2019, 2:37 PM

For the clang side, I don't understand why Driver::GetFilePath is not good enough. This shouldn't need all toolchain changes at all.

I split the driver changes from this patch to make it smaller.

The reason why Driver::GetFilePath is not enough is naming. These files are installed to lib/clang/<version>/lib/linux and there can be versions for multiple architectures side-by-side so I cannot use just crtbegin.o. compiler-rt CMake's build would derive e.g. clang_rt.crtbegin-x86_64.o as the file name, and ToolChain::getCompilerRT contains the logic to construct that name based on the target.

For the compiler-rt side:
(1) I would drop the const for .ctor/dtor, since the content has to be writeable anyway for function pointer relocations.

Done

(2) I'm trying to remember what the alignment rules for .eh_frame are. There seem to be different approaches here in the wild.

I went with with sizeof(void *) which is what libgcc's implementation seem to be using.

(3) .ctor/.dtor should be using pointer types, not long. sizeof(void (*)(*)) == sizeof(long) is a very questionable assumption.

Done

(4) CTOR_END and DTOR_END are unused and will just be dropped.

They're used now.

(5) The logic around CRT_HAS_INITFINI_ARRAY is strange. If it is not set, do_init and do_fini must be called from .init/.fini with appropiate assembler code. Setting .init_array/.fini_array can't work in that case.

Done

(6) Putting read-only content into .eh_frame can be a problem on MIPS for many GNU ld versions. It used to bitch about mixing read-only and read-write sections and the default DWARF encoding required writable .eh_frame.

phosek updated this revision to Diff 188094.Feb 24 2019, 2:56 PM
matthewbauer marked an inline comment as done.Feb 26 2019, 12:26 PM
matthewbauer added a subscriber: matthewbauer.
matthewbauer added inline comments.
compiler-rt/lib/crt/crtend.c
12

inttypes.h is provided by the libc while stdint.h is provided by clang. Is it possible to use stdint.h here instead?

phosek updated this revision to Diff 188496.Feb 26 2019, 8:05 PM
MaskRay added inline comments.Feb 27 2019, 1:05 AM
compiler-rt/lib/crt/crtbegin.c
18

Should there be a #ifdef macro to exclude __dso_handle? glibc libc.a also provides __dso_handle.

22

Remove , visibility("hidden")

The visibility attribute is ignored for static variables and GCC would emit a -Wattributes warning.

Another thing worth mentioning, either in this patch or in a follow-up, crtbegin.o

compiler-rt/lib/crt/CMakeLists.txt
82

Just to ask if my understanding is correct.

clang_rt.crtbegin.o: replacement of crtbegin.o
(-fno-PIC can be used here as some toolchains default to -fPIC or -fPIE, but it doesn't matter much here)

clang_rt.crtbegin_shared.o: replacement of crtbeginS.o crtbeginT.o

I was trying to build compiler-rt with WebAssembly and forgot I still had this patched applied. But it looks like it breaks when doing this and things work if you disble crt. You probably just don't want to build crt with WebAssembly targets?

I was trying to build compiler-rt with WebAssembly and forgot I still had this patched applied. But it looks like it breaks when doing this and things work if you disble crt. You probably just don't want to build crt with WebAssembly targets?

Does WebAssembly build set OS_NAME to "Linux" and architecture to one of x86, x86-64, ARM and ARM64? See config-ix.cmake, specifically this line if (CRT_SUPPORTED_ARCH AND OS_NAME MATCHES "Linux") should ensure that this is only built on Linux and set(ALL_CRT_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64}) should ensure that it's only built for x86, x86-64, ARM and ARM64.

phosek updated this revision to Diff 188850.Feb 28 2019, 11:37 PM
phosek marked 3 inline comments as done.
phosek added inline comments.
compiler-rt/lib/crt/CMakeLists.txt
82

Correct, crtbeginS.o and crtbeginT.o don't match the naming conventions of other compiler-rt artifacts, so I came up with this scheme which should be a better fit for compiler-rt. I'm open to other suggestions though.

compiler-rt/lib/crt/crtbegin.c
18

I don't think this should be conditional, other crtbegin.o implementations also define this symbol unconditionally. It should be also usable independently of the C library being use (i.e. it should work with both glibc and musl).

I was trying to build compiler-rt with WebAssembly and forgot I still had this patched applied. But it looks like it breaks when doing this and things work if you disble crt. You probably just don't want to build crt with WebAssembly targets?

Does WebAssembly build set OS_NAME to "Linux" and architecture to one of x86, x86-64, ARM and ARM64? See config-ix.cmake, specifically this line if (CRT_SUPPORTED_ARCH AND OS_NAME MATCHES "Linux") should ensure that this is only built on Linux and set(ALL_CRT_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64}) should ensure that it's only built for x86, x86-64, ARM and ARM64.

Ah ok, yeah this was my issue. CMake defaults to CMAKE_HOST_SYSTEM_NAME when no CMAKE_SYSTEM_NAME is provided. So it assumed I was targeting "wasm32-unknown-linux-gnu" not "wasm32-unknown-unknown-wasm".

MaskRay accepted this revision.Mar 3 2019, 8:17 PM
MaskRay added inline comments.
compiler-rt/lib/crt/CMakeLists.txt
82

Since crtbeginT.o is used by -static (gcc -dumpspecs: %{static:crtbeginT.o%s; shared|!no-pie:crtbeginS.o%s; :crtbegin.o%s}), crtbegin_shared may not be a good name. But I don't know of a better name, either...

Other than this, other parts look great! Thank you for doing this.

phosek marked an inline comment as done.Mar 5 2019, 6:58 PM
phosek added inline comments.
compiler-rt/lib/crt/CMakeLists.txt
82

Android seems t be using crtbegin_dynamic.o but I don't know if that's much better?

MaskRay added inline comments.Mar 5 2019, 8:23 PM
compiler-rt/lib/crt/CMakeLists.txt
82

Given a second thought. crtbegin_shared.o looks good to me now. I propose one change: adding -fPIC to the crtbegin.o rule.

While I still don't understand why ligcc seems to define void *__dso_handle = &dso_handle; in crtbeginT.o, I cannot find a relocation in my /usr/lib/gcc/x86_64-linux-gnu/8/crtbeginT.o, not sure if it has other rule to remove that...

# https://github.com/gcc-mirror/gcc/blob/master/libgcc/Makefile.in
crtbeginT$(objext): $(srcdir)/crtstuff.c
	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN -DCRTSTUFFT_O

// https://github.com/gcc-mirror/gcc/blob/master/libgcc/crtstuff.c
#ifdef CRTSTUFFS_O
void *__dso_handle = &__dso_handle; // crtbeginS.o crtbeginT.o
#else
void *__dso_handle = 0; // crtbegin.o
#endif

If we unify 2 of the 3, I think unifying crtbegin.o and crtbeginT.o rather than unifying crtbeginS.o crtbeginT.o makes more sense, but we'd need -fPIC for crtbegin.o (-static -pie, or -fPIC object files going into -static link). This resembles the status I observed from my Linux crtbegin*.o and https://svnweb.freebsd.org/base/head/lib/csu/common/crtbegin.c?view=markup

I even think unifying all 3 is possible. __dso_handle doesn't need to be 0 if all modules (including LD_PRELOAD DSOs) are unloaded before the main executable. There would be no difference if a nullptr or another address is used as the __cxa_finalize argument in that case. There may be cases I missed, so the current approach (unifying 2) is good enough :)

phosek updated this revision to Diff 189526.Mar 6 2019, 9:31 AM
phosek marked an inline comment as done.
MaskRay accepted this revision.EditedMar 6 2019, 6:04 PM

The current division (crtbegin_shared.o (-shared or -pie) crtbegin.o (others)) resembles what NetBSD and OpenBSD do. They don't have a separate crtbeginT.o. I don't understand why in a 2012 change FreeBSD added crtbeginT.o...

I casually asked dalias to confirm my thoughts. He thinks crtbegin.o belongs to the compiler runtime side rather than libc ("its purely a mechanism for weird language runtimes the compiler supports; libc has no use for it or relation to it").

He also agrees that unifying all 3 crtbegin{,S,T}.o should be possible ("there is no reason for any version but the pic-compatible (S) one to exist. others just make logic of which to link complex for no reason").

(((That could expose some bugs in existing applications depending on bug behaviors, though: I believe glibc still has the bug (or at least an unexpected behavior) that for a -pie -fPIE executable, __cxa_finalizer(main's __dso_handle) is called before __cxa_finalizer(a depending DSO's __dso_handle). Making void *__dso_handle = &__dso_handle for non -pie non -shared builds may reveal some bugs..)))
(((We are better than libgcc here. Its crtbeginT.o doesn't allow clang -static -shared a.c -o a.so why we do! https://bugzilla.redhat.com/show_bug.cgi?id=214465)))

Now I more firmly believe that this patch moves towards the correct direction to make compiler-rt a better replacement of libgcc while also paying back some tech debt. Still LG!

phosek added a comment.Mar 7 2019, 4:14 PM

@joerg is it okay with you if I go ahead and land this?

E5ten added inline comments.Mar 9 2019, 1:24 PM
compiler-rt/lib/crt/CMakeLists.txt
60

This causes a cmake error for me: string sub-command FIND requires 3 or 4 parameters., I guess because one of the 2 variables there are empty, quoting the 2 variables fixes the issue for me.

phosek updated this revision to Diff 190046.Mar 11 2019, 12:56 AM
phosek marked an inline comment as done.
joerg accepted this revision.Mar 11 2019, 2:18 PM

LGTM from my perspective at least.

This revision is now accepted and ready to land.Mar 11 2019, 2:18 PM
brad.king resigned from this revision.Mar 12 2019, 5:36 AM
phillipjohnston marked an inline comment as done.Mar 21 2019, 12:36 PM
phillipjohnston added a subscriber: phillipjohnston.
phillipjohnston added inline comments.
compiler-rt/lib/crt/crtbegin.c
22

Is there an alternative to a zero-length array here?

phosek updated this revision to Diff 193731.Apr 4 2019, 8:44 AM
phosek updated this revision to Diff 193830.Apr 4 2019, 8:34 PM
phosek marked an inline comment as done.
phosek added inline comments.
compiler-rt/lib/crt/crtbegin.c
22

Alternative would be to use assembly but that's less portable and I'm not sure if we would gain anything by doing so.

Looks like this needs to be updated with the new license headers.

phosek updated this revision to Diff 195285.Apr 15 2019, 6:35 PM

Looks like this needs to be updated with the new license headers.

Done

niosHD added a subscriber: niosHD.Apr 18 2019, 8:30 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.May 10 2019, 7:10 AM
thakis added inline comments.
compiler-rt/trunk/lib/crt/crtbegin.c
7 ↗(On Diff #197367)

gcc warns about this:

/b/swarming/w/ir/k/src/third_party/llvm/projects/compiler-rt/lib/crt/crtend.c:1:1: warning: C++ style comments are not allowed in ISO C90 [enabled by default]

Could we either use /**/ comments or pass -std=c99 for this file?