Enumerate all known GPU variants during libdevice detection instead of
Please add some more details to the bug description. This change makes clangd work when the compilation database contains CUDA sources compiled with nvcc. NVCC uses different options that need to be translated properly. This patch only deals with recognizing the sources as CUDA; it does not handle the compiler options. While not perfect, it's still a useful improvement.
Fri, Apr 3
LGTM for NVPTX tests.
Thu, Apr 2
Wed, Apr 1
We do have a problem. With your patch I see a lot of errors about function redefinitions conflicting with the ones in CUDA's math_functions.hpp:
The other macro already uses an unqualified lookup, and the qualified one
will cause problems in the OpenMP overlay.
We'll need to make sure that all of these new functions are available in all supported CUDA versions.
E.g. acoshf does not seem to be present in CUDA-9.
Tue, Mar 31
LGTM for NVPTX changes.
Mon, Mar 30
+ @echristo who OK'ed the idea conditional on the actual patch. :-)
Looks OK to me, but I'm not that familiar with the legalizer.
Fri, Mar 27
Reopened for further work
Would it be possible to update the old review with the new diff? It would make it easier to see the incremental changes you've made. If the old review can be reopened, that would be great, as it would keep all relevant info in one place; I'm fine doing the review here, too, if Phabricator does not let you do it.
Looks like the change breaks compilation for us:
Thu, Mar 26
LGTM. Next step is to figure out what various __nv_tex_surf_handler(<string>...) maps to for various strings (there are ~110 of them in CUDA-10.2) and implement its replacement. I think we should be able to do it in the wrapper file.
Wed, Mar 25
Mon, Mar 23
Fri, Mar 20
Does it handle options with values? E.g., if I want to pass -mframe-pointer=none? I vaguely recall that the current -Xarch_* implementation had some limitations.
It may be worth adding a test for that.
Thu, Mar 19
I believe LLVM does have nvvm.texsurf.handle implemented: https://github.com/llvm/llvm-project/blob/d9972f848294b06807c8764615852ba2bc1e8a74/llvm/include/llvm/IR/IntrinsicsNVVM.td#L1150
Note that the clang-based implementation needs to define the texture fetch functions itself, as they cannot be reused from the CUDA SDK. That part is enclosed in #if defined(__clang__).
+1 for refactoring, but what's the long-term plan?
A long time ago, echristo@ and I had a vague idea of changing clang's option parsing to allow something like -Xarch_host <host-only args> -Xarch_device <args for all GPU compilations...> -Xarch=<target> <options for <target> only...>
Wed, Mar 18
Mon, Mar 16
@hans -- this should be cherry-picked into 10 if it's not too late yet.
Thu, Mar 12
Wed, Mar 11
LGTM, but I'm not familiar with the details of clang-cl. I've added Reid who'd have a better idea.
clang-formatted the changes.
Added a test.
This patch should fix it: https://reviews.llvm.org/D76030
Tue, Mar 10
Mon, Mar 9
While such reordering may be beneficial in general, GEP(ASC(x)) is not always equivalent to ASC(GEP(x)), so we can't just blindly do it. I believe this has been discussed a few times in the past, though I can't find the relevant emails now.
This looks like an elaborate way to achieve the effect of -Wno-unknown-cuda-version. :-)
I'm not sure that's a problem worth solving.
A few nits. LGTM otherwise.
A couple of nits below. LGTM for the CUDA headers otherwise.
Feb 27 2020
Feb 25 2020
Feb 19 2020
Feb 18 2020
Nice! Thank you for making these changes.
Feb 14 2020
Feb 13 2020
It's a bit premature to call CUDA-10.2 supported. We can compile using it, but clang/LLVM has no support for the new things introduced by CUDA-10.2.
E.g. CUDA-10.2 introduces a new PTX version with new instructions (and matching clang builtins).
Feb 12 2020
Minor test nits. LGTM otherwise.
I'm curious what the generated PTX for the function looks like before/after the patch.
Feb 11 2020
Tensorflow folks report that the __try() here does not compile on Windows:
ERROR: T:/tmp/nsz6drem/external/llvm-project/llvm/BUILD:3716:1: C++ compilation of rule '@llvm-project//llvm:support' failed (Exit 2) external/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp(218): error C2712: Cannot use __try in functions that require object unwinding
Feb 10 2020
LGTM to unblock our build/integration.
LGTM, but wait a bit before landing in case @nhaehnle has concerns.
Feb 4 2020
Thank you for adding the escape hatch option.
A better description for the change would be helpful.
On one hand the change makes sense to me and fits well with what we've done so far.
On the other hand, I worry that this is likely to break things.
We have sort of been implicitly relying on not having the macros related to advanced CPU features defined on the device side, which typically results in device-side compilation seeing a more portable/simpler version of the code.
Defining the macros that enable more host-side CPU-specific code may trigger interesting compatibility issues. Postponed diagnostics, combined with ignoring (some) errors in wrong-side-only code, should probably deal with most of them, but I suspect we'll see new interesting failure cases. We won't know until we try.
Jan 29 2020
LGTM for CUDA.
Jan 28 2020
@hans: that's another candidate for a 10.x cherry-pick, if you're OK with it.
Jan 27 2020
Use std::string instead of Twine, which can't be stored.