Thu, Jun 3
LGTM for CUDA.
Tue, Jun 1
I'm done with testing. The patch does not seem to break anything obvious. TensorFlow builds and works.
LGTM. I would like to test the patch on our code first, so please wait a bit before landing it. I should have the results tomorrow.
Context for the changes in this patch: https://reviews.llvm.org/D97340#2775477
Wed, May 26
Overall looks good, though I've got one more question.
Fri, May 21
Check only _GLIBCXX_RELEASE.
Fixed typo in push/pop macro name.
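For context, _GLIBCXX_RELEASE is libstdc++'s major-version macro (available since GCC 7), so checking it alone avoids relying on __GLIBCXX__ datestamps. A minimal sketch of the pattern these two updates describe, with the macro name, header, and version threshold purely illustrative:

  #if defined(_GLIBCXX_RELEASE) && _GLIBCXX_RELEASE >= 11
  // Save the conflicting macro, neutralize it for the duration of the
  // include, then restore it. The names below are hypothetical.
  #pragma push_macro("__conflicting_macro")
  #undef __conflicting_macro
  #include <some_libstdcxx_header>
  #pragma pop_macro("__conflicting_macro")
  #endif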
Thu, May 20
I've verified that TensorFlow still builds with this patch and that the patch does fix the regressions we've seen.
If you could land this patch soon, that would be appreciated.
Wed, May 19
This patch does not appear to fix the second regression introduced by D102237.
Tentative LGTM as we need it to fix the regression soon.
Tue, May 18
Here's a slightly simpler reproducer: https://godbolt.org/z/rW6P9e37s
Here's one example reproducer: https://godbolt.org/z/77M596W89
It's rather hairy, but should be usable for further debugging.
Sam, this patch has apparently triggered some unwanted side effects. I'm still reducing the failures to something that could be used for debugging, but the rough symptoms are:
Mon, May 17
I'll land this patch along with D100124.
LGTM with a test nit.
Fri, May 14
In effect, this patch applies __host__ __device__ to a subset of the standard library headers and to whatever headers *they* happen to include. While it may happen to work, I'm not at all confident that it doesn't create interesting issues.
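For reference, a minimal sketch of the mechanism being described, assuming the marking is done with clang's existing force_cuda_host_device pragma (the actual patch may use a different vehicle):

  // Every function declared between the two pragmas, including anything
  // declared by headers that <utility> itself pulls in, is implicitly
  // treated as __host__ __device__.
  #pragma clang force_cuda_host_device begin
  #include <utility>
  #pragma clang force_cuda_host_device end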
May 11 2021
LGTM in general.
Perhaps it would make sense to combine this patch with D102237, as both patches change the same code for the same reason, just for slightly different kinds of variables.
LGTM for CUDA. This matches the intent of deferred diags -- we only emit them if we get to generate the code for the sources that triggered them, so they should not show up for the false constexpr branches.
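For illustration, a minimal sketch of that behavior (function names hypothetical):

  int host_only_fn(); // host-only: no __device__ counterpart

  __device__ int dev_fn() {
    if constexpr (false) {
      // A wrong-side call, but codegen never reaches this discarded
      // branch, so its deferred diagnostic is never emitted.
      return host_only_fn();
    }
    return 0;
  }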
LGTM with a few nits.
Apr 30 2021
What will happen with this patch in the following scenarios:
- --offload-arch=A -S -o out.s
- --offload-arch=A --offload-arch=B -S -o out.s
CUDA compilation currently errors out if -o is used when more than one output would be produced.
% bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2 zz.cu -c -E
# ... preprocessed output from the host and 2 GPU compilations is printed out
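And, for comparison, a sketch of the multiple-outputs case with -o from the scenarios above (the exact diagnostic wording is from memory and may differ):

  % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2 zz.cu -S -o out.s
  clang: error: cannot specify -o when generating multiple output files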
Apr 26 2021
A few nits. LGTM overall.
Apr 22 2021
General style nit: the script's variable quoting is very inconsistent; variables are quoted in some places but not others.
Apr 21 2021
Do you know if any existing code already uses the __nvvm_* builtins for cp.async? In other words, does nvcc already provide them, or are they something we're free to name as we wish?
I do not see any relevant intrinsics mentioned in the NVVM IR spec (https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html), and I don't think NVCC's builtins are publicly documented anywhere.
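For context, a hedged sketch of roughly what such a builtin would lower to on sm_80+, written with inline PTX since the builtin names are still under discussion (the helper names are hypothetical; __cvta_generic_to_shared is assumed to come from the CUDA headers):

  // Asynchronously copy 4 bytes from global to shared memory.
  __device__ void cp_async_4(void *smem_dst, const void *gmem_src) {
    // cp.async requires a shared-state-space address for the destination.
    unsigned dst = (unsigned)__cvta_generic_to_shared(smem_dst);
    asm volatile("cp.async.ca.shared.global [%0], [%1], 4;"
                 :: "r"(dst), "l"(gmem_src)
                 : "memory");
  }

  // Block until all previously issued cp.async copies have completed.
  __device__ void cp_async_wait_all() {
    asm volatile("cp.async.wait_all;" ::: "memory");
  }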
The planned new option for offloading will be a more generic solution; however, I expect it will take time to develop and be adopted.
LGTM. I'll leave the patch approval to @gkistanova .
Apr 20 2021
LGTM overall, modulo a few test and naming nits.
Apr 19 2021
Small test nit. LGTM otherwise.
Apr 16 2021
Overall the patch looks good. We may still need to tweak intrinsic properties later, but this is a good starting point.