- There are several scenarios where compilation needs to stop before the backend, such as -E and -fsyntax-only, and even more when developers want to diagnose the output of different phases. In these cases, the offload bundler is not yet required, and it is not valid to run it, since the output from the device-side compilation is not ready. As a result, it is assumed that if the final phase is before the backend, these compilations are host-only. If developers need the corresponding output of those phases from the device-side compilation, --cuda-device-only needs to be passed to the compiler.
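For illustration, a rough sketch of the intended usage under this change (the file name and GPU arch below are made up):

$ clang -E x.cu                                                    # preprocesses the host side only
$ clang -fsyntax-only x.cu                                         # checks the host side only
$ clang -E --cuda-device-only --cuda-gpu-arch=sm_30 x.cu           # explicitly ask for the device-side output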
I'm fine with this for -E/-M,
I would argue that with -fsyntax-only we want to know whether our source code, which is common for all sub-compilations, has syntactic errors.
The way we compile HIP & CUDA sources, some of the errors will only be reported on one side of the compilation.
So, in order to make sure there are no syntax errors, we need to perform *all* sub-compilations with -fsyntax-only.
E.g. it would be rather surprising to see the compilation succeed with -fsyntax-only, but then fail with a syntax error somewhere on the device side during a real compilation.
For most compilation tools, a single input and a single output are expected. Without assuming that -fsyntax-only alone means host-only compilation, syntax checking runs at least twice. The result may be misleading, and there are clang-based tools (like clang-tidy) that may have no easy way to be run. To check the device-side compilation's syntax, we can still ask for it explicitly by specifying --cuda-device-only.
I believe the driver will not run subsequent jobs if one of the device compilations fails. You may see duplicate warnings from multiple stages, but overall the error handling works in a fairly predictable way now.
The result may be misleading
Potentially repeated warnings are still *correct*, while omitting an error is not, IMO. I believe this did come up before and we had to make some changes to the driver to keep the host compilation working even when device-side compilations produce no output.
To think of it, I'm starting to doubt that this patch is an improvement for -M either. You will get the dependencies for the host, but they are not necessarily the same as the dependencies for the device-side compilation. Producing a partial list of dependencies would be potentially incorrect. IMO we do need dependency info from all sub-compilations.
Perhaps we should limit the scope of this patch to -E only for now?
and there are clang-based tools (like clang-tidy) that may have no easy way to be run.
Tooling does get surprised by the multiple jobs created by CUDA compilation.
The workaround is to pass --cuda-host-only. Passing an extra flag is usually not a showstopper (at least it was not in the cases I had to deal with at work, and we have a fair number of clang-derived tools). Usually, in order to get a correct CUDA compilation in this scenario, you will also need to tell the tool where to find CUDA's headers, so the mechanism for passing additional options already exists.
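For example, something along these lines is usually enough for a standalone tool run (the file name, CUDA path, and GPU arch here are made up):

$ clang-tidy kernel.cu -- --cuda-host-only --cuda-path=/usr/local/cuda --cuda-gpu-arch=sm_30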
If that's still a problem, then we should change the tooling infrastructure to use host-only compilation for HIP and CUDA by default.
To check the device-side compilation's syntax, we can still ask for it explicitly by specifying --cuda-device-only.
Yes, that could be done. However, as I've mentioned above, the same argument applies to tooling and --cuda-host-only, so there's no advantage here. IMO the most common use case should be the default, so it is clang itself that should keep working correctly without having to add extra flags.
Also, this patch makes it impossible to run -fsyntax-only on *all* sub-compilations at once. I will have to run clang -fsyntax-only multiple times -- once for the host and once for each device.
I do want to check all sub-compilations with -fsyntax-only on a regular basis (e.g. when running creduce on a cuda source), so having to do that for each sub-compilation separately does not look like an improvement to me.
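Concretely, instead of a single invocation that checks everything, I'd have to run something like (arches and file name made up):

$ clang -fsyntax-only --cuda-host-only x.cu
$ clang -fsyntax-only --cuda-device-only --cuda-gpu-arch=sm_30 x.cu
$ clang -fsyntax-only --cuda-device-only --cuda-gpu-arch=sm_70 x.cu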
It still runs and gives the same error (if that error applies to both sides) at least twice if you just specify -fsyntax-only or -E. That won't happen with the regular compilation option (-c) due to the additional device-side dependencies added.
The error itself should, in fact, be clear enough; the most confusing part is the diagnostic messages and suggestions from clang, as the host- and device-side compilations are quite different, and in particular the error messages may be mixed with the normal output from the other side.
The result may be misleading
Potentially repeated warnings are still *correct*, while omitting an error is not, IMO. I believe this did come up before and we had to make some changes to the driver to keep the host compilation working even when device-side compilations produce no output.
To think of it, I'm starting to doubt that this patch is an improvement for -M either. You will get the dependencies for the host, but they are not necessarily the same as the dependencies for the device-side compilation. Producing a partial list of dependencies would be potentially incorrect. IMO we do need dependency info from all sub-compilations.
Even without this patch, -M, or more specifically -MD, is already broken, as we just run the dependency-generation action twice, once for each side. The latter run overwrites the former's *.d file. We need special handling of -M to match nvcc.
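A rough illustration of the overwrite (the derived .d name and the ordering are assumptions, not verified output):

$ clang -c -MD --cuda-gpu-arch=sm_30 x.cu -o x.o
# the host sub-compilation writes x.d, then the sm_30 sub-compilation writes x.d again,
# so only one side's dependency list survives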
Perhaps we should limit the scope of this patch to -E only for now?
I just found that nvcc's -E returns the output of the device-side compilation for the first GPU arch specified. Anyway, whether to match that behavior is another question.
and there are clang-based tools (like clang-tidy) that may have no easy way to be run.
Tooling does get surprised by the multiple jobs created by CUDA compilation.
The workaround is to pass --cuda-host-only. Passing an extra flag is usually not a showstopper (at least it was not in the cases I had to deal with at work, and we have a fair number of clang-derived tools). Usually, in order to get a correct CUDA compilation in this scenario, you will also need to tell the tool where to find CUDA's headers, so the mechanism for passing additional options already exists.
but some tools, like clang-tidy, may find it difficult to insert that option properly. Say, clang-tidy -p is supposed to read the compilation database generated by cmake or meta-build systems and perform additional checks on the sources; adding that option may inevitably make them CUDA/HIP aware.
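For instance, with a compilation database there is no obvious place to inject that flag (the entry below is made up):

$ cat compile_commands.json
[
  { "directory": "/build",
    "command": "clang++ -c --cuda-gpu-arch=sm_30 /src/kernel.cu -o kernel.o",
    "file": "/src/kernel.cu" }
]
$ clang-tidy -p /build /src/kernel.cu   # replays the command from the database as-is;
                                        # --cuda-host-only is not part of it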
If that's still a problem, then we should change the tooling infrastructure to use host-only compilation for HIP and CUDA by default.
That's an option I was looking into as well. But, generally speaking, we need a clear definition of the expected output for options like -M, -MD, -E, -fsyntax-only, -S -emit-llvm, and even -S (for HIP only).
To check the device-side compilation's syntax, we can still ask for it explicitly by specifying --cuda-device-only.
Yes, that could be done. However, as I've mentioned above, the same argument applies to tooling and --cuda-host-only, so there's no advantage here. IMO the most common use case should be the default, so it is clang itself that should keep working correctly without having to add extra flags.
Also, this patch makes it impossible to run -fsyntax-only on *all* sub-compilations at once. I will have to run clang -fsyntax-only multiple times -- once for the host and once for each device.
We do have another option, --cuda-compile-host-device, to explicitly ask for both host- and device-side compilations.
I do want to check all sub-compilations with -fsyntax-only on a regular basis (e.g. when running creduce on a cuda source), so having to do that for each sub-compilation separately does not look like an improvement to me.
TL;DR:
+1 to formalizing how we want -M*/-E/-S/-emit-llvm/-fsyntax-only to behave.
OK with -M/-E/-S defaulting to host, and erroring out if applied to multiple sub-compilations.
I'm still convinced that the tooling issue with multiple subcompilations is orthogonal to this change and should be handled in libclang and that -fsyntax-only should not default to one sub-compilation.
See the details below.
Are you talking about the behavior with or without this patch?
Without the patch I do see the driver stopping as soon as one of the subcompilations errors out and it needs *all* subcompilations to succeed in order for the whole invocation to succeed.
E.g.: err.cu:
common error;

#if __CUDA_ARCH__
__device__ void foo() {
  device error;
}
#else
__host__ void bar() {
  host error;
}
#endif
$ bin/clang --cuda-path=$HOME/local/cuda-10.0 --cuda-gpu-arch=sm_30 -fsyntax-only err.cu
err.cu:1:1: error: unknown type name 'common'
common error;
^
err.cu:9:3: error: unknown type name 'host'
  host error;
  ^
2 errors generated when compiling for host.
If I comment out the host and common errors, then clang finishes host compilation, moves on to device-side compilation and reports the error there:
err.cu:5:3: error: unknown type name 'device'; did you mean 'CUdevice'?
  device error;
  ^~~~~~
  CUdevice
/usr/local/google/home/tra/local/cuda-10.0/include/cuda.h:252:13: note: 'CUdevice' declared here
typedef int CUdevice;                         /**< CUDA device */
            ^
1 error generated when compiling for sm_30.
Again, it reports that there's an error on the device side. My understanding is that with this patch clang would've succeeded, which is, IMO, incorrect.
The error itself should, in fact, be clear enough; the most confusing part is the diagnostic messages and suggestions from clang, as the host- and device-side compilations are quite different, and in particular the error messages may be mixed with the normal output from the other side.
Mixing errors and output will be confusing no matter what you do. That's why diags go to stderr by default, so you can separate them from the regular output.
As far as errors go, clang clearly labels which side of the compilation produced them:
1 error generated when compiling for sm_30.
To think of it, I'm starting to doubt that this patch is an improvement for -M either. You will get the dependencies for the host, but they are not necessarily the same as the dependencies for the device-side compilation. Producing a partial list of dependencies would be potentially incorrect. IMO we do need dependency info from all sub-compilations.
Even without this patch, -M, or more specifically -MD, is already broken, as we just run the dependency-generation action twice, once for each side. The latter run overwrites the former's *.d file. We need special handling of -M to match nvcc.
This sounds like a bug to me. We should've refused to write multiple files -- similar to how we refuse to preprocess if we have more than one sub-compilation.
OK. In this sense, defaulting to host-only here would be a marginal improvement. Until we can produce complete dependency info for all sub-compilations, letting the user specifically select one (with host being the default) is probably the best choice we can make at the moment.
Perhaps we should limit the scope of this patch to -E only for now?
I just found that nvcc's -E returns the output of the device-side compilation for the first GPU arch specified. Anyway, whether to match that behavior is another question.
NVCC is not a good guideline here. Considering that they do source code transformation, it's not quite clear what exactly -E means for NVCC at all.
Defaulting to host-side -E would probably be fine as long as the user can override it with --cuda-device-only.
but some tools, like clang-tidy, may find it difficult to insert that option properly. Say, clang-tidy -p is supposed to read the compilation database generated by cmake or meta-build systems and perform additional checks on the sources; adding that option may inevitably make them CUDA/HIP aware.
I'm not sure I understand your argument here. Clang driver does create multiple sub-compilations for CUDA/HIP. That's the reality. The tools that use the library commonly assume that 1 compilation == 1 clang action, which is not valid for CUDA. Our options are:
- Let the user specify which sub-compilation they want. This requires passing extra flags and complicates the tools' use, especially for tools that rely on using the exact flags used during the normal build.
- Teach the tools to deal with multiple actions. This would require changing all the tools. Probably the best way long-term, but it does not help much now.
- Let libclang pick a sensible default, but allow the tool to override it if it needs to. This would be the preferred way, IMO. It keeps current tools working and allows future tools to dig into all sub-compilations if they wish.
- Let the clang driver guess that the user only needs one sub-compilation (e.g. assuming that tools would commonly use -fsyntax-only). This is, IMO, not the right place to deal with tooling issues, and it results in what I think is incorrect/unexpected output for normal use of -fsyntax-only in clang.
If that's still a problem, then we should change the tooling infrastructure to use host-only compilation for HIP and CUDA by default.
That's an option I was looking into as well. But, generally speaking, we need a clear definition of the expected output for options like -M, -MD, -E, -fsyntax-only, -S -emit-llvm, and even -S (for HIP only).
I think we have a decent understanding of what we want (even if clang does not quite do the right thing ATM, as you've pointed out above regarding -M).
- -M, -MD, -E, -S -emit-llvm, even -S
Require an unambiguous single sub-compilation. It could either be explicitly specified by the user (--cuda-host-only / --cuda-device-only with a single GPU arch), or have a reasonable default (host?).
- -fsyntax-only
Should succeed only if all sub-compilations succeed. Error reporting stops after one of the sub-compilations fails, with a clear indication of which side produced the errors.
Does this make sense?
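To make the above concrete, a rough sketch of the proposed behavior (arches and file name made up; this is not current clang output):

# single-output options: require one unambiguous sub-compilation
$ clang -S --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_70 x.cu        # ambiguous -> should be an error
$ clang -S --cuda-host-only x.cu                                   # OK: host only
$ clang -S --cuda-device-only --cuda-gpu-arch=sm_30 x.cu           # OK: exactly one device sub-compilation
# -fsyntax-only: check all sub-compilations, fail if any of them fails
$ clang -fsyntax-only --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_70 x.cu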
Also, this patch makes it impossible to run -fsyntax-only on *all* sub-compilations at once. I will have to run clang -fsyntax-only multiple times -- once for the host and once for each device.
We do have another option, --cuda-compile-host-device, to explicitly ask for both host- and device-side compilations.
Does it work with this patch? IIRC, --cuda-compile-host-device would only override the --cuda-host-only/--cuda-device-only flags, and with this patch we'd still end up with the host compilation only.
I am starting to fix the clang tooling issues found in both the driver (https://reviews.llvm.org/D68652) and the tooling parts. I will submit the tooling part for review soon. After that, we may nail down the details of the expected output from -M*/-E, etc.