This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Implicitly include LLVM libc headers for the GPU
ClosedPublic

Authored by jhuber6 on Mar 27 2023, 8:11 AM.

Details

Summary

There is currently work to support basic libc functionality on the
GPU. Some basic information about the projects can be found at
https://libc.llvm.org/gpu_mode.html. Typically, including the system
headers on the GPU will result in an error. For this reason the LLVM
libc project will generate its own headers that can be used with the
GPU.

The problem is that these headers will use the same name as the system headers.
For that reason, D146970 places them in the llvm-libc subfolder. In order to
still pick these files up, this patch adds changes in clang to default to
searching this directory when targeting the GPU. This lets offloading languages
such as OpenMP use the system string.h when compiling for the host and then
the LLVM libc string.h when targeting the GPU.
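
In rough terms, the change amounts to appending this directory as the first C system include path whenever the driver targets a GPU. A minimal sketch of that logic, with illustrative option guards, paths, and variable names rather than the patch's actual code:

    // Sketch only: when cross-compiling for NVPTX or AMDGCN, pass
    // <install prefix>/include/llvm-libc as the first -c-isystem path so the
    // GPU libc headers shadow the system headers.
    if (!Args.hasArg(options::OPT_nostdinc) &&
        (getToolChain().getTriple().isNVPTX() ||
         getToolChain().getTriple().isAMDGCN())) {
      SmallString<128> LibCPath(llvm::sys::path::parent_path(D.Dir)); // bin -> prefix
      llvm::sys::path::append(LibCPath, "include", "llvm-libc");
      CmdArgs.push_back("-c-isystem");
      CmdArgs.push_back(Args.MakeArgString(LibCPath));
    }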

Depends on D146970

Diff Detail

Event Timeline

jhuber6 created this revision. Mar 27 2023, 8:11 AM
Herald added a project: Restricted Project. · View Herald Transcript · Mar 27 2023, 8:11 AM
jhuber6 requested review of this revision. Mar 27 2023, 8:11 AM
Herald added a project: Restricted Project. · View Herald Transcript · Mar 27 2023, 8:11 AM

I'm not sure if there's a better way to provide these headers. Like if we let the libc project output to the Clang resource directory or some other neatly nested directory. Right now this just picks up bin/clang/../include/llvm-libc.

tra added inline comments.Mar 27 2023, 10:37 AM
clang/lib/Driver/ToolChains/Clang.cpp
1212

Ensuring the right include order will be tricky. Interaction between the headers provided by llvm-libc vs the system headers will be interesting if we end up mixing the headers. It may be less problematic compared to the C++ standard library, but I doubt that mixing two implementations would work well here, either.

So, the major question I have -- does adding an include path here guarantee that we're not picking up the host headers? I do not see any changes that would exclude system headers from the include paths.
If we do have include paths leading to both llvm-libc and the host headers, what's expected to happen if user code ends up including a header that's not present in llvm-libc? Is that possible?

clang/test/Driver/gpu-libc-headers.c
15

I think here we want to test for not just the presence of include/llvm-libc, but also that it's in the correct position relative to other include paths.

We may want something similar to what we have in clang/test/Driver/hip-include-path.hip

jhuber6 added inline comments.Mar 27 2023, 11:07 AM
clang/lib/Driver/ToolChains/Clang.cpp
1212

Right now I'm just relying on the expectation that, since this will be the first c-isystem path set, it will pull in these headers first if they exist. It's not captured by the tests, but compiling with -v shows this path being searched first in my experience. So, theoretically, if there is an implementation of said header in this location, it will be picked up before anything else. Otherwise it'll just search the other standard locations.

clang/test/Driver/gpu-libc-headers.c
15

Yeah, I wasn't sure if there was a good way to guarantee a certain path since those can change based on the system. Maybe --sysroot?
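
For illustration, a driver test along those lines might look roughly like the following; the target, flags, and check line are hypothetical, not the committed gpu-libc-headers.c:

    // RUN: %clang -### --target=amdgcn-amd-amdhsa -mcpu=gfx906 -nogpulib %s 2>&1 \
    // RUN:   | FileCheck %s
    //
    // Verify the GPU libc directory is forwarded to cc1 as a C system include.
    // A stronger variant, like clang/test/Driver/hip-include-path.hip, would
    // also match the neighboring include flags so the relative order is pinned.
    // CHECK: "-c-isystem" "{{.*}}include{{/|\\\\}}llvm-libc"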

tra added a subscriber: echristo.Mar 27 2023, 11:28 AM
tra added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
1212

I think this will be a problem. We're cross-compiling here, and for that to work reliably we need to make sure that only target headers are in effect. The problem is that we likely just do not have a sufficiently complete set of headers for the GPU. Do we? I have no idea what exactly llvm-libc provides and whether it is sufficient for normal user code to cross-compile for a GPU.

It would be interesting to try to compile some C++ code which would include commonly used, but generally target-agnostic, headers like <vector>, <complex>, <algorithm>, etc., and check whether we end up pulling in any system headers. Then check what happens if we do not have system headers available at all.
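
The probe being suggested is essentially a small translation unit that pulls in a few target-agnostic standard headers and is cross-compiled for the GPU; something like the hypothetical file below, inspected with -H to see which headers actually get reached:

    // probe.cpp -- hypothetical experiment, e.g.:
    //   clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx906 -nogpulib \
    //           -fsyntax-only -H probe.cpp
    // The -H output lists every header pulled in, showing whether host system
    // headers end up in the include chain.
    #include <algorithm>
    #include <complex>
    #include <vector>

    int probe() {
      std::vector<int> v{3, 1, 2};
      std::sort(v.begin(), v.end());
      std::complex<float> c(1.0f, 2.0f);
      return v.front() + static_cast<int>(c.real());
    }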

clang/test/Driver/gpu-libc-headers.c
15

I do not have a good answer.

@echristo -- when we need to cross-compile for some target, who/where/how tells clang where to get target-specific headers?

jhuber6 added inline comments.Mar 27 2023, 11:31 AM
clang/lib/Driver/ToolChains/Clang.cpp
1212

No, it's definitely not complete. Some headers work on the GPU; most break in some way or another. The only ones llvm-libc currently provides are string.h and ctype.h. But I figured this wouldn't be a problem, since it would just go to the system headers anyway if we didn't provide them. So we are merely replacing "maybe broken" with "probably works".

I was talking with Johannes and he brings up other issues about the idea of host-device compatibility between these headers. Fundamentally, right now libc generates its own headers and needs to do so in order to function. But there can be a problem when migrating data between the host and the device if the headers on the host differ somewhat from those on the device. I'm not sure what a good overall solution to that problem is.

sivachandra added inline comments.Mar 27 2023, 11:42 AM
clang/lib/Driver/ToolChains/Clang.cpp
1212

Normally, one should not expect target and host headers to be compatible. So, if you are building for the host, you should use host headers, and if you are building for the target, you should use target headers. Does the general GPU build not follow this approach? Maybe there are some common headers, but I do not expect them to be from the standard libraries.

tra added inline comments.Mar 27 2023, 12:03 PM
clang/lib/Driver/ToolChains/Clang.cpp
1212

We can generally assume that the GPU and the host do have largely identical types. At least the subset of the types we expect to exchange between host and GPU.
CUDA compilation cheats and allows the host to provide most of the headers, with clang and the CUDA SDK providing a subset of GPU-side overloads. This way, GPU-side functions may implicitly rely on the code nominally provided for the host by the host headers, but when we need to codegen it, we can only do so for the subset that's actually compilable for the GPU -- either constexpr functions, lambdas, or __device__ overloads provided by us.

Standalone compilation does not have the benefit of having the cake and being able to eat it. It has to be all or nothing, as we do not have the ability to separate the host and GPU code we pull in via headers, nor can we provide GPU-side overloads. In a way, injecting the llvm-libc path is a crude attempt to do that by providing GPU-side implementations before we get to include host-side headers that would provide the host versions. While it may sometimes work, I do not think it's a robust solution.

The only sound(-ish) ways out I see are:

  • implement a sufficiently complete set of headers for the GPU.
  • provide GPU-side overloads which would allow us to pull in system headers and augment them with GPU-specific implementations when necessary.

The former is quite a bit of work. The latter is easier, but gets us away from "it's just a plain C compilation, just for a GPU", as we'll grow yet another C dialect and the libc headers will need to have additional glue to mark functions as "for GPU".

JonChesterfield added a comment. Edited Mar 27 2023, 12:17 PM

Can we default to freestanding on, or just document that freestanding is a good idea, instead of hacking with include behaviour directly?

This lets offloading languages such as OpenMP use the system string.h when compiling for the host and then the LLVM libc string.h when targeting the GPU.

Is that a feature? We've historically insisted on the same libc on both and hacked around glibc not knowing the architecture

sivachandra added inline comments.Mar 27 2023, 12:24 PM
clang/lib/Driver/ToolChains/Clang.cpp
1212

We can generally assume that the GPU and the host do have largely identical types. At least the subset of the types we expect to exchange between host and GPU.

Types used for communication have to be of similar topology, I would think, and putting them in a common header is expected. But would standard library types have to match the host's?

CUDA compilation cheats and allows the host to provide most of the headers, with clang and the CUDA SDK providing a subset of GPU-side overloads. This way, GPU-side functions may implicitly rely on the code nominally provided for the host by the host headers, but when we need to codegen it, we can only do so for the subset that's actually compilable for the GPU -- either constexpr functions, lambdas, or __device__ overloads provided by us.

In a way, injecting the llvm-libc path is a crude attempt to do that by providing GPU-side implementations before we get to include host-side headers that would provide the host versions. While it may sometimes work, I do not think it's a robust solution.

Will it still be considered crude if the new path being proposed here contains only standard library headers?

The only sound(-ish) ways out I see are:

  • implement a sufficiently complete set of headers for the GPU.

What does "sufficiently complete" mean? Are you referring to an incomplete header file or an incomplete set of header files? Assuming it is the latter, is it problematic if we fallback to host-headers (or headers which are currently used for the GPU build)?

tschuett added a comment. Edited Mar 27 2023, 12:33 PM

Could you hide the amdgpu and nvptx libc headers somewhere under clang -print-resource-dir, in two different directories? One for AMD, one for NVPTX.

Could you hide the amdgpu and nvptx libc headers somewhere under clang -print-resource-dir, in two different directories? One for AMD, one for NVPTX.

So, right now this header is installed by the libc project. Is there a good way to communicate the resource directory as an install target? Also, I think just calling it gpu would be sufficient; I'd like these to be common between the GPUs.

jhuber6 added inline comments.Mar 27 2023, 1:28 PM
clang/lib/Driver/ToolChains/Clang.cpp
1212
  • implement a sufficiently complete set of headers for the GPU.

This is what I was going for: we slowly implement standard C headers and provide them as a list of functions and interfaces that are expected to work on the GPU. Otherwise we default to the support we already have. Given that LLVM libc is already aiming to provide its own complete set of headers, we should be able to reuse a lot of work to this effect.

  • provide GPU-side overloads which would allow us to pull in system headers and augment them with GPU-specific implementations when necessary.

I don't think this is a robust solution to implementing a libc. The problem is that the system headers can pull in implementation details that we don't necessarily want to recreate -- things like functions or symbols that aren't defined in our libc and that we have no way of wrapping if we use the system header. I think the only reasonable way is to try to keep them completely separate and add some glue to the generated headers for the GPU as needed.

tra added inline comments.Mar 27 2023, 1:57 PM
clang/lib/Driver/ToolChains/Clang.cpp
1212

would standard library types have to match the host's?

That's probably getting us into a gray area. Ideally, we want the host and GPU to "just work" with each other.
In practice we can't always make it work as such types may contain host or GPU pointers and therefore may not always be functional on the other side.

Will it still be considered crude

Perhaps "crude" was not the best choice of a word. Not sure what would be a better choice, though.

if the new path being proposed here contains only standard library headers?

If that's the only path providing the standard headers (i.e. no other host headers are involved), it would be fine.
Or, if the headers involved contain only the well-defined set of functions in both libc and the system headers, and are guaranteed to be seamlessly interchangeable, that would be fine, too. "Fine" as in -- libc files will replace the subset of the host APIs, but whether the remaining host headers are compilable for the GPU is still up in the air.

The point is that include path injection is unlikely to address the fundamental issue that we're using host includes during cross-compilation. We happen to be able to assume that the types on the host and the GPU are expected to be identical, but that's not necessarily sufficient. E.g. host headers may use inline assembly for the host CPU.

What does "sufficiently complete" mean?

An incomplete set of header files.

is it problematic if we fallback to host-headers (or headers which are currently used for the GPU build)?

It depends. Some code will compile fine. Other code will parse OK, but we would not be able to codegen it (e.g. it calls some host-only function). Some would fail to parse (e.g. something using inline asm for the host CPU, or relying on something that's been ifdef'ed out because we're not targeting the host CPU).

The approach we used with CUDA was to allow use of a subset of the host headers (constexpr functions, lambdas) and selectively provide __device__ overloads to allow a subset of the standard C++ libraries to work. It's largely limited to math functions and <complex>. This approach is also problematic, as it depends on the implementation details of the standard library, so it's prone to breaking.

libc has the benefit of relatively constrained API (compared to libc++), but has a disadvantage that there's no easy way to provide GPU overloads, other than via include path.

I guess if we only replace standard math functions, include path injection may work well enough. That said, I still don't think it's a good general approach, as it assumes that the host headers would compile sensibly for a GPU they were never intended to be used with.
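
For context, the CUDA wrapper-header trick mentioned above boils down to clang-provided headers declaring __device__ overloads alongside the host declarations; a simplified sketch of the idea, not the actual contents of __clang_cuda_cmath.h:

    // Simplified sketch of the CUDA overlay idea: the host <cmath> declares the
    // CPU functions, and a compiler-provided wrapper adds __device__ overloads
    // so the same call resolves to a GPU implementation in device code.
    #include <cmath>

    __device__ inline float sin(float __x) { return __builtin_sinf(__x); }
    __device__ inline double sin(double __x) { return __builtin_sin(__x); }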

This lets offloading languages such as OpenMP use the system string.h when compiling for the host and then the LLVM libc string.h when targeting the GPU.

How do we avoid ABI issues when the two headers get sufficiently out of sync? (In general, I'm pretty surprised to hear we would want different headers for the GPU and the system -- does this affect conformance requirements from the C standard?)

Excuse my ignorance on this point, but is llvm-libc intended to work with any compiler other than Clang? (e.g., will other compilers need to do this dance as well?)

This lets offloading languages such as OpenMP use the system string.h when compiling for the host and then the LLVM libc string.h when targeting the GPU.

How do we avoid ABI issues when the two headers get sufficiently out of sync? (In general, I'm pretty surprised to hear we would want different headers for the GPU and the system -- does this affect conformance requirements from the C standard?)

I'm not entirely sure if there's a good method. I think no matter what we do we'll need to implement some kind of 'glue'. I think most should be fine if we go by the C-standard. We expect pointer sizes and everything to be compatible at least.

Excuse my ignorance on this point, but is llvm-libc intended to work with any compiler other than Clang? (e.g., will other compilers need to do this dance as well?)

Right now the GPU target I'm working on can only be built with clang and it will be that way for the foreseeable future.

jdoerfert added a comment. Edited Mar 27 2023, 4:49 PM

I said this before, many times:

We don't want to have different host and device libraries that are incompatible.
Effectively, what we really want, is the host environment to just work on the GPU.
That includes extensions in the host headers, macros, taking the address of stuff, etc.
This became clear when we made (c)math.h available on the GPU (for OpenMP).

For most of libc, we might get away with custom GPU headers but eventually it will break "expected to work" user code, at the latest when we arrive at libc++.
A user can, right now, map a std::vector from the host to the device, and, assuming they properly did the deep copy, it will work.
If we end up with different sizes, alignments, layouts, this will not only break, but we will also break any structure that depends on those sizes, e.g., mapping an object with a std::map inside even if it is not accessed will cause problems.

In addition, systems are never "vanilla". We want to include the system headers to get the extensions users might rely on. Providing only alternative headers even breaks working code (in the OpenMP case), e.g., when we auto-translate definitions in the header to the device (not a CUDA thing, I think).

I strongly suggest including our GPU headers first; in them we set up the overlays for the system headers, and then we include the system versions.
This works for (c)math.h, complex, and other parts of libc and libc++ already, even though we don't ship them as libraries.
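
Concretely, the overlay shape being suggested is a GPU header that declares the GPU implementations first and then chains to the system header; a minimal hypothetical sketch for <ctype.h>:

    /* Hypothetical GPU overlay for <ctype.h>: declare the GPU libc entry points
       first, then pull in the host header so its extensions stay visible. */
    #ifndef __GPU_CTYPE_OVERLAY_H
    #define __GPU_CTYPE_OVERLAY_H

    int isalnum(int c); /* provided by the GPU libc at link time */
    int isalpha(int c);
    int isdigit(int c);

    /* Chain to the host's <ctype.h> for everything else (macros, extensions). */
    #include_next <ctype.h>

    #endif /* __GPU_CTYPE_OVERLAY_H */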

For most of libc, we might get away with custom GPU headers but eventually it will break "expected to work" user code, at the latest when we arrive at libc++.
A user can, right now, map a std::vector from the host to the device, and, assuming they properly did the deep copy, it will work.

I do not have any strong opinions about how things ought to work. However, ISTM that the above is assuming that the type topology on the host matches the one on the GPU. Not sure if that is really an assumption or a restriction or a requirement. Maybe the host/device compatibility ensures this; I do not know, I know almost nothing about GPUs. But in general, I would expect a host <=> device communication channel to be a curated one. As in, the communication model can understand and serialize/deserialize only a small set of primitive types and compound types crafted from these primitive types.

I said this before, many times:

We don't want to have different host and device libraries that are incompatible.
Effectively, what we really want, is the host environment to just work on the GPU.
That includes extensions in the host headers, macros, taking the address of stuff, etc.
This became clear when we made (c)math.h available on the GPU (for OpenMP).

The problem is that we cannot control the system headers; they are not expected to work with llvm-libc. For example: the GNU ctype.h includes features.h, which will attempt to include the 32-bit stubs file because the GPU is not a recognized target on the host. If you work around that, like we do in OpenMP, then you will realize that isalnum is actually a macro wrapping __isctype, which references an external table called __ctype_b_loc that isn't defined in the C standard. So now we have a header that causes isalnum to no longer call the implementation in LLVM's libc, and it also fails at link time because there is no definition of __ctype_b_loc in LLVM's libc. What is the solution here? Do we implement libc in LLVM with a workaround for every internal implementation detail in the GNU libc?
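
To make that failure mode concrete, the glibc expansion in question looks roughly like this (paraphrased, not glibc's exact header text):

    /* Roughly what glibc's <ctype.h> does: isalnum becomes a macro over an
       internal classification table, so the call never reaches a libc function
       and the object file ends up referencing __ctype_b_loc instead. */
    extern const unsigned short int **__ctype_b_loc(void);

    #define __isctype(c, type) \
      ((*__ctype_b_loc())[(int)(c)] & (unsigned short int)(type))
    #define isalnum(c) __isctype((c), _ISalnum)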

For most of libc, we might get away with custom GPU headers but eventually it will break "expected to work" user code, at the latest when we arrive at libc++.
A user can, right now, map a std::vector from the host to the device, and, assuming they properly did the deep copy, it will work.
If we end up with different sizes, alignments, layouts, this will not only break, but we will also break any structure that depends on those sizes, e.g., mapping an object with a std::map inside even if it is not accessed will cause problems.

In addition, systems are never "vanilla". We want to include the system headers to get the extensions users might rely on. Providing only alternative headers even breaks working code (in the OpenMP case), e.g., when we auto-translate definitions in the header to the device (not a CUDA thing, I think).

Using custom generated headers is the only approach that is guaranteed to actually work when we compile this. We cannot sanely implement a library using headers unique to another implementation targeting an entirely different machine; we will endlessly be chasing implementation details like the above. This works in OpenMP currently because we've chosen a handful of headers that this doesn't completely break for.

I strongly suggest including our GPU headers first; in them we set up the overlays for the system headers, and then we include the system versions.
This works for (c)math.h, complex, and other parts of libc and libc++ already, even though we don't ship them as libraries.

The wrapper approach works fine for the ones we've selected. And in the GPU libc we could generate our own headers that have #include_next in them if we verify that it works for that header. I think in general though, we need to work with custom headers first, and implement a set of features we know to work.

@sivachandra Host and device should agree on all ABI-related properties of types or things go bad, fast. That has to be a given for things to work, and from there, users expect types to be copyable and usable on both sides.

@jhuber6 There are things we might need to define, sure. However, we can actually do that. If we do not use the host headers we can never support various things, no matter how much we try and want to. My example with unused members inside of classes is one that is critical. We cannot tell people they can't use their objects because inside the type is a subtype with a subtype with a subtype that has a different alignment/padding/size/layout on the host and device, so every offset (read access) on one side will fail. We worked around issues in various headers already; this is expected to happen again as we include more host headers, but all of those problems are solvable by us, without requiring the user to modify arbitrarily complex parts of an existing codebase.
So far I've heard we need to define a __ctype_b_loc, and maybe a few other pointers like that. For one, that is doable; e.g., we also provide the right hooks to work with libgomp and other external libraries. The entire macro mess is actually a reason to use the host code, since the user expects consistency. If it is a function here and a macro there, constexpr here and not constexpr there, etc., that will cause problems for people, and eventually us. Declaring a few compatibility pointers and functions seems a small price to pay.

This lets offloading languages such as OpenMP use the system string.h when compiling for the host and then the LLVM libc string.h when targeting the GPU.

How do we avoid ABI issues when the two headers get sufficiently out of sync? (In general, I'm pretty surprised to hear we would want different headers for the GPU and the system -- does this affect conformance requirements from the C standard?)

I'm not entirely sure if there's a good method. I think no matter what we do we'll need to implement some kind of 'glue'. I think most should be fine if we go by the C-standard. We expect pointer sizes and everything to be compatible at least.

Hmmm, I've had experience with SYCL as to how it goes when you have differences between host and device; those kinds of bugs are incredibly hard to track down. Pointer sizes being compatible is a good start, but you need integer widths, floating-point formats, structure layout decisions, macro definitions, etc. to all be the same as well. Having only one set of headers that can be used helps users avoid these sorts of problems.

Excuse my ignorance on this point, but is llvm-libc intended to work with any compiler other than Clang? (e.g., will other compilers need to do this dance as well?)

Right now the GPU target I'm working on can only be built with clang and it will be that way for the foreseeable future.

So we're comfortable painting ourselves into a corner where llvm-libc is only usable with Clang, depending on the target?

Hmmm, I've had experience with SYCL as to how it goes when you have differences between host and device; those kinds of bugs are incredibly hard to track down. Pointer sizes being compatible is a good start, but you need integer widths, floating-point formats, structure layout decisions, macro definitions, etc. to all be the same as well. Having only one set of headers that can be used helps users avoid these sorts of problems.

The problem is that we are trying to implement an actual library here. It is, in my opinion, completely unreasonable to try to implement a library based off of another implementation's specification. What you are suggesting is that we implement a GPU library that copies every internal implementation detail that GNU has for that platform. So, let's just copy-paste their headers into our LLVM libc and make sure we copy all of their implementations too. Now what if someone wants to use musl instead? Do we copy that one as well and have everything surrounded by ifdefs? Do we just implement some meta libc that is compatible with every other libc? This is not going to create a usable library, and as the person who would presumably need to write it, I'm not going to spend my time copying other libc headers.

We need to provide fully-custom headers, if this fully-custom header uses #include_next after we've verified that it doesn't break, that's fine. I'm not particularly concerned if a macro or function is undefined between the CPU and GPU. The important point is that any symbol or macro we provide in the GPU's headers has an implementation that is expected to be compatible with the host. It's understandable if the macros and functions map to something slightly different, as long as it does what we say it does.

So we're comfortable painting ourselves into a corner where llvm-libc is only usable with Clang, depending on the target?

There might be somewhat of a misunderstanding here, I'm talking about the GPU implementation of libc using LLVM's libc. Expecting a specific toolchain is standard procedure for every single other offloading language. It's how we build ROCm device libraries, CUDA device libraries, the OpenMP device runtime, etc. LLVM's libc project is perfectly fine being compiled with gcc, but the GPU is such a special case we don't have that luxury and need to use clang. This is the same approach we do for OpenMP already.

Hmmm, I've had experience with SYCL as to how it goes when you have differences between host and device; those kinds of bugs are incredibly hard to track down. Pointer sizes being compatible is a good start, but you need integer widths, floating-point formats, structure layout decisions, macro definitions, etc. to all be the same as well. Having only one set of headers that can be used helps users avoid these sorts of problems.

The problem is that we are trying to implement an actual library here. It is, in my opinion, completely unreasonable to try to implement a library based off of another implementation's specification.

I am not asking you to implement a library based off another implementation's specification. I am relaying implementation experience with the design you've chosen for your implementation and how well it's worked in other, related projects. Given that two different technologies have both run into this same problem, I think the llvm-libc folks should carefully consider the design decisions here. If it turns out this is the best way forward, that's fine.

What you are suggesting is that we implement a GPU library that copies every internal implementation detail that GNU has for that platform. So, let's just copy-paste their headers into our LLVM libc and make sure we copy all of their implementations too. Now what if someone wants to use musl instead? Do we copy that one as well and have everything surrounded by ifdefs? Do we just implement some meta libc that is compatible with every other libc? This is not going to create a usable library, and as the person who would presumably need to write it, I'm not going to spend my time copying other libc headers.

I'm not asking you to copy other libc headers. I'm pointing out that having two separate headers, one for host and one for device, is a recipe for problems in practice because these two will invariably get out of sync in really fascinating ways that are extremely hard for people to debug. But maybe there's a misunderstanding here: I am assuming we consider it to be unsupported to use glibc/musl/etc on the host and llvm-libc on the device, but maybe that's a faulty assumption.

The important point is that any symbol or macro we provide in the GPU's headers has an implementation that is expected to be compatible with the host. It's understandable if the macros and functions map to something slightly different, as long as it does what we say it does.

So we're comfortable painting ourselves into a corner where llvm-libc is only usable with Clang, depending on the target?

There might be somewhat of a misunderstanding here, I'm talking about the GPU implementation of libc using LLVM's libc. Expecting a specific toolchain is standard procedure for every single other offloading language. It's how we build ROCm device libraries, CUDA device libraries, the OpenMP device runtime, etc. LLVM's libc project is perfectly fine being compiled with gcc, but the GPU is such a special case we don't have that luxury and need to use clang. This is the same approach we do for OpenMP already.

Ah, yes this was a misunderstanding then, sorry for that.

I am not asking you to implement a library based off another implementation's specification. I am relaying implementation experience with the design you've chosen for your implementation and how well it's worked in other, related projects. Given that two different technologies have both run into this same problem, I think the llvm-libc folks should carefully consider the design decisions here. If it turns out this is the best way forward, that's fine.

Sorry, that was more directed at Johannes. This is definitely a hard problem. Each approach has certain benefits, but I think keeping the headers synced like we do in OpenMP has mainly worked thus far because we don't have any actual implementations of most of it. If we want to provide a library, I don't think there's a reasonable way to implement it as a unified header unless we control the system header as well. I'm hoping that libc offers a sufficiently small surface that we should be able to provide functionality that's expected for both. And in some cases it should be fine to share existing headers, but it shouldn't be the expected route, is all I'm saying.

I'm not asking you to copy other libc headers. I'm pointing out that having two separate headers, one for host and one for device, is a recipe for problems in practice because these two will invariably get out of sync in really fascinating ways that are extremely hard for people to debug. But maybe there's a misunderstanding here: I am assuming we consider it to be unsupported to use glibc/musl/etc on the host and llvm-libc on the device, but maybe that's a faulty assumption.

We can't do that for the time being, since LLVM's libc is still in development. It's not a sufficient replacement for the host libc at this point. It may be an interesting point to get to in the future, it would make it much easier to keep things in sync for sure. It may be easier to stipulate something like that with libc++ when we get to that point since libc++ is more complete as far as I'm aware.

I am not asking you to implement a library based off another implementation's specification. I am relaying implementation experience with the design you've chosen for your implementation and how well it's worked in other, related projects. Given that two different technologies have both run into this same problem, I think the llvm-libc folks should carefully consider the design decisions here. If it turns out this is the best way forward, that's fine.

Sorry, that was more directed at Johannes. This is definitely a hard problem. Each approach has certain benefits, but I think keeping the headers synced like we do in OpenMP has mainly worked thus far because we don't have any actual implementations of most of it. If we want to provide a library, I don't think there's a reasonable way to implement it as a unified header unless we control the system header as well. I'm hoping that libc offers a sufficiently small surface that we should be able to provide functionality that's expected for both. And in some cases it should be fine to share existing headers, but it shouldn't be the expected route, is all I'm saying.

I'm not asking you to copy other libc headers. I'm pointing out that having two separate headers, one for host and one for device, is a recipe for problems in practice because these two will invariably get out of sync in really fascinating ways that are extremely hard for people to debug. But maybe there's a misunderstanding here: I am assuming we consider it to be unsupported to use glibc/musl/etc on the host and llvm-libc on the device, but maybe that's a faulty assumption.

We can't do that for the time being, since LLVM's libc is still in development. It's not a sufficient replacement for the host libc at this point. It may be an interesting point to get to in the future, it would make it much easier to keep things in sync for sure. It may be easier to stipulate something like that with libc++ when we get to that point since libc++ is more complete as far as I'm aware.

That's an unexpected approach (probably discussed elsewhere and I just missed it!) fraught with even larger concerns than this particular patch. Intel has done something similar with *just* <math.h> and it requires quite a bit of maintenance to work by falling back to the system CRT. I won't block this patch any longer on my concerns because it sounds like this patch is heading in the expected direction for llvm-libc, but this design approach in general makes me wonder just how much we're testing against all the various versions of MSVC CRT, musl, MKL, newlib, etc. There's like 10 or so C standard library implementations that I know of (and more that are proprietary) and each one has different versions that have been released, so hopefully we're not just focusing on falling back to glibc.

tra added a comment.Mar 28 2023, 11:06 AM

I'm OK with injecting the path *now* with an understanding that it's a short-term "happens to work" way to move forward while we're working on a better solution.

I'm OK with injecting the path *now* with an understanding that it's a short-term "happens to work" way to move forward while we're working on a better solution.

So, the proposed path forward is this: we have libc generate its own headers so we can have a base implementation. We create these headers with the intention that they provide a full interface between the host and device. This might mean curating some differences based on whatever the host does, or just making sure we choose sizes that are compatible. So these headers are the expected interface to the libc implementations we support, and we ensure that things match between the host and device by only providing interfaces we've verified somehow.

jhuber6 updated this revision to Diff 509090.Mar 28 2023, 11:49 AM

Changing to use the gpu-none-llvm subfolder name that @sivachandra recommended. Also adding a --sysroot argument to show that this include path shows up first.

tra accepted this revision.Apr 3 2023, 10:04 AM
tra added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
1160–1161

Please add a TODO with some details outlining what it's supposed to do, the issues we've discussed, and that this is intended to be a temporary solution (famous last words, I know).

This revision is now accepted and ready to land.Apr 3 2023, 10:04 AM
jhuber6 added inline comments.Apr 3 2023, 10:06 AM
clang/lib/Driver/ToolChains/Clang.cpp
1160–1161

I'm not sure if we should consider this temporary; we simply need to ensure that the headers in this directory are compatible with the host environment somehow.

This revision was landed with ongoing or failed builds.Apr 3 2023, 10:57 AM
This revision was automatically updated to reflect the committed changes.
tra added inline comments.Apr 3 2023, 11:18 AM
clang/lib/Driver/ToolChains/Clang.cpp
1160–1161

we simply need to ensure that the headers in this directory are compatible with the host environment somehow.

The problem is that you probably do not have the *correct* host environment to be compatible with, yet. You do need to compile the host headers with the host-specific macros defined, and that will make some of the code in them uncompilable for NVPTX (e.g. due to inline asm or unavailable builtins).

Nor do we have a long-term solution for the "compatible, somehow" part, as we're replacing part of the host headers, the implementation of which we do not control.

We can drop the 'temporary' part, but I think the story here is far from over, so a prominent TODO is needed.