This patch fixes an issue in which the __clang_cuda_cmath.h header was being included even when neither cmath nor math.h was included.
Details
- Reviewers: jdoerfert, ABataev, hfinkel, caomhin, tra
- Commits:
  - rZORGc2401f8391f0: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
  - rZORGe082bc297abe: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
  - rGc2401f8391f0: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
  - rGe082bc297abe: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
  - rG946957189d6b: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
  - rL360626: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
  - rC360626: [OpenMP][Clang][BugFix] Split declares and math functions inclusion.
Diff Detail
- Repository: rC Clang
Event Timeline
This always includes the declare file but not the define file, correct?
Could we have 4 tests that are compiled in target mode with:

```
// with and without math.h/cmath (clang/clang++)
#include <math.h>
long abs(long __i) { return (__i < 0 ? -__i : __i); }
```
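For illustration, one of the four variants might look roughly like the following; the file name and compile line are assumptions, not part of this review:

```
// A rough sketch of one of the four requested variants: user code that
// defines its own abs(long), compiled in target mode with something like
//   clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda abs_overload.cpp
// The other variants drop the math.h include and/or compile as C with clang.
#include <math.h>

long abs(long __i) { return (__i < 0 ? -__i : __i); }

int main() { return (int)abs(-3L) - 3; }  // expect exit code 0
```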
lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h:18
I thought the "not defining abs" in __clang_cuda_math_forward_declares.h was the solution?
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
I'm not quite sure what the idea is here. It may be worth adding a comment. It could also be expressed somewhat more simply:

```
#if !(defined(_OPENMP) && defined(__cplusplus))
...
#endif
```
lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h:11
You may want to add include guards. I'd also make inclusion of the file in a non-OpenMP compilation an error, if it makes sense for OpenMP. It does for CUDA.
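A minimal sketch of what that could look like, assuming the guard-macro name and error text (the actual header may differ):

```
#ifndef __CLANG_OPENMP_MATH_DECLARES_H__
#define __CLANG_OPENMP_MATH_DECLARES_H__

/* Refuse to be included in anything other than an OpenMP compilation. */
#ifndef _OPENMP
#error "This file is for OpenMP compilation only."
#endif

/* ... forward declarations of the device math functions go here ... */

#endif /* __CLANG_OPENMP_MATH_DECLARES_H__ */
```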
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
When these two function definitions are here or in the __clang_cuda_cmath.h header, I get the following error (adapted for the __clang_cuda_cmath.h case):

```
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/cstdlib:166:3: error: declaration conflicts with target of using declaration already in scope
  abs(long __i) { return __builtin_labs(__i); }
  ^
/autofs/home/gbercea/patch-compiler/obj-release/lib/clang/9.0.0/include/__clang_cuda_cmath.h:40:17: note: target of using declaration
__DEVICE__ long abs(long __n) { return ::labs(__n); }
                ^
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/cstdlib:122:11: note: using declaration
  using ::abs;
          ^
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/cstdlib:174:3: error: declaration conflicts with target of using declaration already in scope
  abs(long long __x) { return __builtin_llabs (__x); }
  ^
/autofs/home/gbercea/patch-compiler/obj-release/lib/clang/9.0.0/include/__clang_cuda_cmath.h:39:22: note: target of using declaration
__DEVICE__ long long abs(long long __n) { return ::llabs(__n); }
                     ^
/usr/lib/gcc/ppc64le-redhat-linux/4.8.5/../../../../include/c++/4.8.5/cstdlib:122:11: note: using declaration
  using ::abs;
```
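For reference, the diagnostic does not need the wrapper headers at all; a plain host C++ sketch along the following lines (the names are stand-ins, not the real declarations) runs into the same rule:

```
// Once ::abs(long) has been pulled into namespace std by a using-declaration,
// a direct declaration of std::abs(long) is ill-formed. That is what
// libstdc++'s <cstdlib> runs into when __clang_cuda_cmath.h has already
// declared a plain (non-__device__) ::abs(long).
long abs(long __n);    // stands in for __clang_cuda_cmath.h's declaration

namespace std {
using ::abs;           // stands in for <cstdlib>'s using ::abs;
long abs(long __i);    // stands in for libstdc++'s std::abs(long): error here
} // namespace std
```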
Last issue I have (in addition to the check @tra suggested) is the order in which we include math.h and cstdlib. Can you flip it in one of the test cases?
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
Long story short: we currently cannot use the overload trick through __device__, and therefore *replace* (rather than augment) the host math headers with the CUDA versions, which unfortunately mix std math functions with other functions that we don't want/need.
lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h:11
That is something we should be able to do (error out if _OPENMP is not defined, I mean).
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
This doesn't seem to be happening in the CUDA case. My suspicion is that it's because of the device attribute.
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
It looks like, until OpenMP supports some sort of target-based overloading, this will not play nicely with libstdc++.
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
Correct. __device__ functions overload whatever (implicitly __host__) functions are declared by the standard library, so they coexist without problems. Usually. Host/device implementation nuances are still observable.
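For readers following along, a minimal CUDA sketch of that overload trick (an illustration, not the actual wrapper headers); it relies on clang's target-based overloading, which nvcc does not accept:

```
#include <cstdio>

// Host and device overloads with the same signature coexist; clang picks the
// candidate that matches the side of the compilation. This is the mechanism
// that lets the CUDA wrapper headers add __device__ math overloads next to
// the host library's declarations instead of replacing them.
__host__ long twice(long n) { return 2 * n; }
__device__ long twice(long n) { return n + n; }

__global__ void kernel(long *out) {
  *out = twice(21);                 // resolves to the __device__ overload
}

int main() {
  std::printf("%ld\n", twice(21));  // resolves to the __host__ overload
  return 0;
}
```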
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
Just did a few quick tests with libstdc++ and it was all good.
lib/Headers/__clang_cuda_math_forward_declares.h:30–37
How about libc++? The idea is to make sure the change works with both libraries.
Two small changes and then it is fine with me. @tra?
- We need to use an ifdef to not define clock (see the sketch after this list).
- Can you switch the include order in test/Headers/nvptx_device_math_functions.cpp?
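A hypothetical illustration of the clock point; the guard macro and the declaration below are assumptions rather than the actual patch:

```
#include <time.h>   /* for clock_t in this standalone sketch */

/* Only declare the device-side clock() for a CUDA compilation, so that an
   OpenMP compilation reusing this header does not collide with the host's
   clock() from <time.h>. */
#ifdef __CUDA__
__device__ clock_t clock();
#endif
```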
P.S. I'm currently at the OpenMP standard meeting to get the OpenMP variants fixed.
Once done, we should prioritize the implementation.
Excluding non-math functions in the CUDA headers is not perfect...
As soon as libc++ is used, the limits header included at __clang_cuda_cmath.h:15 is not found:

```
__clang_cuda_cmath.h:15:10: fatal error: 'limits' file not found
#include <limits>
```

Not even CUDA works, actually, so I'm not sure what the best answer to this problem is.
Could you give me more details on how you got this error?
If this change breaks CUDA compilation with libc++, that's going to be a problem. Currently, CUDA and the clang headers we ship do work with both libc++ and a few versions of libstdc++.
E.g.: http://lab.llvm.org:8011/builders/clang-cuda-build/builds/33364/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio
It's an error on my side: I don't have libc++ installed, so trying to use it comes up with header-not-found errors.
LGTM for CUDA. I'll leave the question of testing with libc++ to someone more familiar with OpenMP.
LGTM from my side. I don't have strong feelings about testing libc++ now, though it is probably a good idea to have such a testbed.
I agree this should not interfere with CUDA (at least that is our intention).
P.S.
A ticket to add the OpenMP variant support we need (similar to __device__ in CUDA) will be written by Tom Scogland and me, for a first vote already at this meeting.
That is what we will then need to implement.