
Mon, Aug 12

tra added inline comments to D25845: [CUDA] Ignore implicit target attributes during function template instantiation..
Mon, Aug 12, 11:18 AM · Restricted Project

Thu, Aug 1

tra accepted D65612: Handle casts changing pointer size in the vectorizer.
Thu, Aug 1, 5:03 PM · Restricted Project
tra added inline comments to D65612: Handle casts changing pointer size in the vectorizer.
Thu, Aug 1, 4:27 PM · Restricted Project
tra accepted D65600: Relax load store vectorizer pointer strip checks.
Thu, Aug 1, 3:04 PM · Restricted Project
tra added inline comments to D65600: Relax load store vectorizer pointer strip checks.
Thu, Aug 1, 2:33 PM · Restricted Project
tra added a comment to D65600: Relax load store vectorizer pointer strip checks.

It looks to me that the root cause is in

Thu, Aug 1, 1:42 PM · Restricted Project

Tue, Jul 30

tra accepted D65468: [NVPTX] Fix PR41651.
Tue, Jul 30, 12:47 PM · Restricted Project
tra added inline comments to D65468: [NVPTX] Fix PR41651.
Tue, Jul 30, 12:27 PM · Restricted Project
tra added a reviewer for D65468: [NVPTX] Fix PR41651: tra.
Tue, Jul 30, 12:21 PM · Restricted Project

Jul 18 2019

tra closed D64588: Minor cleanup..

Landed in r365913

Jul 18 2019, 9:39 AM · Restricted Project

Jul 12 2019

tra committed rGd9963b627ad2: Minor cleanup. (authored by tra).
Minor cleanup.
Jul 12 2019, 9:14 AM
tra committed rL365913: Minor cleanup..
Minor cleanup.
Jul 12 2019, 9:14 AM

Jul 11 2019

tra updated the summary of D64588: Minor cleanup..
Jul 11 2019, 2:22 PM · Restricted Project
tra created D64588: Minor cleanup..
Jul 11 2019, 1:32 PM · Restricted Project

Jul 10 2019

tra accepted D64364: [HIP] Add GPU arch gfx1010, gfx1011, and gfx1012.
Jul 10 2019, 3:42 PM · Restricted Project

Jun 26 2019

tra added inline comments to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
Jun 26 2019, 8:52 AM · Restricted Project

Jun 25 2019

tra accepted D62738: [HIP] Support attribute hip_pinned_shadow.

LGTM. Thank you!

Jun 25 2019, 1:35 PM · Restricted Project
tra added a comment to D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG)..

Hi Sam,
The compiler generates metadata for the first 48 bytes. I compiled a sample code and verified it. The backend does nothing for the extra bytes now.
I will soon submit the backend patch to generate the new metadata.

Jun 25 2019, 10:24 AM · Restricted Project, Restricted Project

Jun 24 2019

tra added inline comments to D63209: [AMDGPU] gfx1010 wave32 clang support.
Jun 24 2019, 2:41 PM · Restricted Project
tra added inline comments to D62738: [HIP] Support attribute hip_pinned_shadow.
Jun 24 2019, 10:39 AM · Restricted Project

Jun 19 2019

tra accepted D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions..
Jun 19 2019, 12:32 PM · Restricted Project, Restricted Project
tra added a comment to D62738: [HIP] Support attribute hip_pinned_shadow.

Overall looks good. Thank you for making the change.

Jun 19 2019, 11:05 AM · Restricted Project
tra added a comment to D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions..

SGTM in principle. Folding the stubs would be bad as their addresses are implicitly used to identify the kernels to launch.

Jun 19 2019, 10:39 AM · Restricted Project, Restricted Project

Jun 17 2019

tra added a comment to D63097: [cmake] Don't add Support/Testing library if tests are not included.

LGTM, but I'm not familiar with the use of the stuff under Testing/. I've added @zturner who did some work there.

Jun 17 2019, 12:16 PM · Restricted Project
tra added a reviewer for D63097: [cmake] Don't add Support/Testing library if tests are not included: zturner.
Jun 17 2019, 12:16 PM · Restricted Project
tra added inline comments to D63097: [cmake] Don't add Support/Testing library if tests are not included.
Jun 17 2019, 10:42 AM · Restricted Project

Jun 14 2019

tra accepted D63335: [HIP] Add the interface deriving the stub name of device kernels..

LGTM. This is a cleaner way to provide stub name tweaks.

Jun 14 2019, 3:19 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

I think the debugger does have sufficient information to deal with this, and that would be the right place to address the issue.

Em, I did push the latter as well, :(. OK, I will simplify the patch so it does not change any functionality but moves the calculation of the device name into a common interface, so that a vendor could adjust it internally with minimal change. OK?

Jun 14 2019, 3:05 PM · Restricted Project, Restricted Project
tra requested changes to D63335: [HIP] Add the interface deriving the stub name of device kernels..

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have different names now. I don't quite get why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

But after demangling, the debugger could still match both, as they are almost identical except for the final variants, like clone. The debugger will set all locations matching the specified kernel name.

Jun 14 2019, 2:29 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

Is it OK for us to mangle __device_stub__ as a nested name into the original one? Say, we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv

and

$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv
__device_stub__(kernelfunc<int>, void, void)
Jun 14 2019, 2:19 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..
In D63335#1543854, @tra wrote:

It's requested by the debugger people. They don't want the host-side stub to match the device-side kernel function name; the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient, and how having a prefix instead helps. Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

It's based on what the debugger people told me: with ".stub", the debugger could still match the stub against the original device kernel, even though it finds both of them. They want to match the original one only and leave the stub intentionally unmatched.

Jun 14 2019, 2:15 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

It's requested by the debugger people. They don't want the host-side stub to match the device-side kernel function name; the previous scheme cannot prevent that.

Jun 14 2019, 10:12 AM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

Is there a particular reason you need to switch to this naming scheme?

Jun 14 2019, 9:55 AM · Restricted Project, Restricted Project

Jun 13 2019

tra added a comment to D63210: gn build: Add NVPTX target.

Yay! Thank you. Lack of this has stopped me when I tried gn last time. I'll try again once this patch lands.

Jun 13 2019, 11:01 AM · Restricted Project
tra added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1542042, @tra wrote:

C++ volatile will give you that. You also rely on atomicity. C++ volatile does not guarantee that, even if NVPTX does happen to. It's a mere coincidence. What if next PTX revision provides a better way to implement C++ volatile without providing atomicity? Then your code will all of a sudden break in new and exciting ways.

I think it's completely healthy and fair to be skeptical that we'll do the right thing by you, but FWIW the changes I want to make to CUDA volatile after we ship std::atomic<T> go in the other direction. And the legacy requirements are pretty entrenched here, too.

Jun 13 2019, 10:45 AM · Restricted Project
tra added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1541969, @tra wrote:

@reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.

Do I understand it correctly that the argument is about using C++ volatile with the assumption that those will map to ld.volatile/st.volatile, which happen to provide sufficiently strong guarantees to be equivalent of LLVM's atomic monotonic operations?

If that's the case, then I would agree that it's an implementation detail one should not be relying on. If atomicity is needed, we should figure out a way to express that in C++.

In practice, the standard C++ library support in CUDA is somewhat limited. I don't think atomic<> works, so volatile might be a 'happens to work' short-term workaround. If that's the case, then there should be a comment describing what's going on here and a TODO to fix it when better support for atomics on GPU is available.

Another option would be to use atomic*() functions provided by CUDA, but those do have limited functionality on older GPUs.

Yet another alternative is to explicitly use ld.volatile/st.volatile from inline assembly. I don't think we have it plumbed as a clang builtin.

Artem, thanks for your comment. But we need not only atomicity; we also need to instruct the ptxas compiler not to optimize accesses to parallelLevel.

Jun 13 2019, 10:08 AM · Restricted Project
tra added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

@reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.

Jun 13 2019, 9:15 AM · Restricted Project

Jun 11 2019

tra added a comment to D63164: [HIP] Add option to force lambda nameing following ODR in HIP/CUDA..

So, in short, what you're saying is that a lambda type may leak into the mangled name of a __global__ function, and we need to ensure that the mangled name is identical for both host and device, hence the need for consistent naming of lambdas.

Jun 11 2019, 2:55 PM · Restricted Project
tra added a reviewer for D63164: [HIP] Add option to force lambda nameing following ODR in HIP/CUDA.: rsmith.
Jun 11 2019, 2:50 PM · Restricted Project
tra added a reviewer for D62738: [HIP] Support attribute hip_pinned_shadow: jlebar.
Jun 11 2019, 2:23 PM · Restricted Project
tra added a comment to D62738: [HIP] Support attribute hip_pinned_shadow.

So, the only thing this patch appears to do is make everything with this attribute uninitialized on the device side and give it protected visibility.
If I understand it correctly, you're using the attribute in order to construct something that's sort of the opposite of the currently used device vars with host-side shadows. Only now the real variable lives on the host side, and it's the device side that gets the 'shadow' copy. Do I understand it correctly?

Jun 11 2019, 2:22 PM · Restricted Project
tra added a comment to D62738: [HIP] Support attribute hip_pinned_shadow.

Syntactically the patch looks OK to me, but I think the purpose and meaning of the builtin type should be documented in more details. Based on this patch alone it's not clear at all what it's supposed to be used for and how.

Jun 11 2019, 11:33 AM · Restricted Project

Jun 10 2019

tra accepted D63029: [CUDA] Fix grep pattern in cuda-types.cu.
Jun 10 2019, 9:19 AM · Restricted Project, Restricted Project

Jun 7 2019

tra added a reviewer for D63029: [CUDA] Fix grep pattern in cuda-types.cu: tra.
Jun 7 2019, 3:15 PM · Restricted Project, Restricted Project
tra added inline comments to D63029: [CUDA] Fix grep pattern in cuda-types.cu.
Jun 7 2019, 3:11 PM · Restricted Project, Restricted Project

May 29 2019

tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

I think static __device__ globals would fall into the same category -- nominally they should not be visible outside the device-side object file, but in practice we do need to make them visible from the host side of the same TU.

Are you sure nvcc supports accessing static __device__ variables in host code? That would be expensive to implement.

May 29 2019, 1:56 PM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

That assumes the variable is not declared static; that's also the motivation of this patch.

May 29 2019, 12:31 PM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

NVCC also allows that: https://godbolt.org/z/t78RvM

BTW, the code posted looks quite weird to me. How could the code make sense by returning a pointer to the device variable, or a pointer to the shadow host variable?

May 29 2019, 12:12 PM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

Note for the future -- it would be great if we could finish discussing the patch before landing it.
I would still like to see the host-side test.

May 29 2019, 10:29 AM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

Thanks, but then a static __device__ variable won't have a shadow on the host anymore.

May 29 2019, 10:29 AM · Restricted Project
tra accepted D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..
May 29 2019, 10:13 AM · Restricted Project

May 28 2019

tra accepted D62483: [CUDA][HIP] Emit dependent libs for host only.
May 28 2019, 9:59 AM · Restricted Project

May 17 2019

tra accepted D62046: [OpenMP][bugfix] Add missing math functions variants for log and abs..

I'd add a comment with a brief explanation for the const variant and a TODO() to remove it.

May 17 2019, 9:32 AM · Restricted Project

May 16 2019

tra added inline comments to D62046: [OpenMP][bugfix] Add missing math functions variants for log and abs..
May 16 2019, 5:03 PM · Restricted Project

May 15 2019

tra added a comment to D61949: [OpenMP][bugfix] Fix issues with C++ 17 compilation when handling math functions.

LGTM.

May 15 2019, 12:12 PM · Restricted Project
tra added inline comments to D61949: [OpenMP][bugfix] Fix issues with C++ 17 compilation when handling math functions.
May 15 2019, 9:43 AM · Restricted Project

May 13 2019

tra accepted D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..

This won't affect CUDA in any way; everything we have added is OpenMP-specific.

May 13 2019, 2:32 PM · Restricted Project
tra added a comment to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..

As soon as libc++ is used, the limits header included in __clang_cuda_cmath.h:15 is not found:

__clang_cuda_cmath.h:15:10: fatal error: 'limits' file not found
#include <limits>

Not even CUDA works, actually, so I'm not sure what the best answer to this problem is.

May 13 2019, 1:06 PM · Restricted Project
tra added a comment to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..

Two small changes and then it is fine with me. @tra ?

May 13 2019, 10:32 AM · Restricted Project

May 10 2019

tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 2:37 PM · Restricted Project
tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 2:01 PM · Restricted Project
tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 1:53 PM · Restricted Project
tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 1:24 PM · Restricted Project

May 3 2019

tra committed rL359928: [CUDA buildbot] tell libunwind where to find libcxx..
[CUDA buildbot] tell libunwind where to find libcxx.
May 3 2019, 1:38 PM
tra accepted D61474: [CUDA][Clang][Bugfix] Add missing CUDA 9.2 case.
May 3 2019, 9:38 AM · Restricted Project
tra added inline comments to D61458: [hip] Relax CUDA call restriction within `decltype` context..
May 3 2019, 9:24 AM · Restricted Project

May 2 2019

tra committed rG4cbb23502612: [CUDA] Do not pass deprecated option fo fatbinary (authored by tra).
[CUDA] Do not pass deprecated option fo fatbinary
May 2 2019, 3:36 PM
tra committed rL359838: [CUDA] Do not pass deprecated option fo fatbinary.
[CUDA] Do not pass deprecated option fo fatbinary
May 2 2019, 3:35 PM
tra committed rC359838: [CUDA] Do not pass deprecated option fo fatbinary.
[CUDA] Do not pass deprecated option fo fatbinary
May 2 2019, 3:35 PM
tra closed D61470: [CUDA] Do not pass deprecated option fo fatbinary.
May 2 2019, 3:35 PM · Restricted Project
tra created D61470: [CUDA] Do not pass deprecated option fo fatbinary.
May 2 2019, 3:03 PM · Restricted Project
tra added inline comments to D61458: [hip] Relax CUDA call restriction within `decltype` context..
May 2 2019, 2:27 PM · Restricted Project
tra added a comment to D61458: [hip] Relax CUDA call restriction within `decltype` context..
In D61458#1488523, @tra wrote:

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

Good point. Do we have a dedicated context for sizeof? That would make the checking easier.

May 2 2019, 2:02 PM · Restricted Project
tra added a comment to D61458: [hip] Relax CUDA call restriction within `decltype` context..

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

May 2 2019, 1:37 PM · Restricted Project
tra added a reviewer for D61458: [hip] Relax CUDA call restriction within `decltype` context.: jlebar.
May 2 2019, 1:35 PM · Restricted Project
tra accepted D61399: [OpenMP][Clang] Support for target math functions.
May 2 2019, 10:45 AM · Restricted Project

May 1 2019

tra added a comment to D61396: [hip] Fix ambiguity from `>>>` of CUDA..

LGTM, but I've added @rsmith who is way more familiar with this code.

May 1 2019, 1:22 PM · Restricted Project, Restricted Project
tra added a reviewer for D61396: [hip] Fix ambiguity from `>>>` of CUDA.: rsmith.
May 1 2019, 1:16 PM · Restricted Project, Restricted Project

Apr 30 2019

tra added a comment to D60907: [OpenMP] Add math functions support in OpenMP offloading.

I actually don't want to preinclude anything, and my arguments are (mostly) for the OpenMP offloading code path, not necessarily CUDA.
Maybe to clarify, what I want is:

  1. Make sure the clang/Headers/math.h is found first if math.h is included.
  2. Use a scheme similar to the one described https://reviews.llvm.org/D47849#1483653 in clang/Headers/math.h
  3. Only add math.h function overloads in our math.h. <- This is debatable
Apr 30 2019, 11:42 AM · Restricted Project
tra added a comment to D60907: [OpenMP] Add math functions support in OpenMP offloading.

+1 to Hal's comments.

Apr 30 2019, 10:33 AM · Restricted Project

Apr 29 2019

tra added a comment to D61274: [Sema][AST] Explicit visibility for OpenCL/CUDA kernels/variables.

A kernel function in CUDA is actually two different functions. One is the real kernel we compile for the GPU; the other is a host-side stub that launches the device-side kernel.

Apr 29 2019, 2:21 PM · Restricted Project

Apr 25 2019

tra committed rG5fe85a003f6b: [CUDA] Implemented _[bi]mma* builtins. (authored by tra).
[CUDA] Implemented _[bi]mma* builtins.
Apr 25 2019, 3:28 PM
tra committed rG16737538f4fc: PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32. (authored by tra).
PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32.
Apr 25 2019, 3:28 PM
tra committed rG8d825b38ed2c: [NVPTX] generate correct MMA instruction mnemonics with PTX63+. (authored by tra).
[NVPTX] generate correct MMA instruction mnemonics with PTX63+.
Apr 25 2019, 3:27 PM
tra committed rG7ecd82ce19ae: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC. (authored by tra).
[NVPTX] Refactor generation of MMA intrinsics and instructions. NFC.
Apr 25 2019, 3:27 PM
tra committed rL359248: [CUDA] Implemented _[bi]mma* builtins..
[CUDA] Implemented _[bi]mma* builtins.
Apr 25 2019, 3:27 PM
tra committed rC359248: [CUDA] Implemented _[bi]mma* builtins..
[CUDA] Implemented _[bi]mma* builtins.
Apr 25 2019, 3:27 PM
tra closed D60279: [CUDA] Implemented _[bi]mma* builtins..
Apr 25 2019, 3:27 PM · Restricted Project, Restricted Project
tra committed rL359247: PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32..
PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32.
Apr 25 2019, 3:27 PM
tra closed D60015: [NVPTX] Added intrinsics/instructions for MMA ops on (sub-)integers.
Apr 25 2019, 3:26 PM · Restricted Project
tra committed rL359246: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..
[NVPTX] generate correct MMA instruction mnemonics with PTX63+.
Apr 25 2019, 3:26 PM
tra closed D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..
Apr 25 2019, 3:26 PM · Restricted Project
tra committed rL359245: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
[NVPTX] Refactor generation of MMA intrinsics and instructions. NFC.
Apr 25 2019, 3:26 PM
tra closed D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
Apr 25 2019, 3:26 PM · Restricted Project

Apr 22 2019

tra added a comment to D60985: Fix compatability for cuda sm_75.

FYI, I have an almost-ready set of patches to implement the missing bits of sm_75 support, including this change:
https://reviews.llvm.org/D60279

Apr 22 2019, 4:32 PM · Restricted Project

Apr 17 2019

tra accepted D60818: [CUDA][Windows] restrict long double device functions declarations to Windows.

LGTM. Thank you for cleaning this up.

Apr 17 2019, 9:24 AM · Restricted Project

Apr 15 2019

tra added a comment to D60727: [NVPTXAsmPrinter] clean up dead code. NFC.

Nick: Can you do some archaeology on the original patch and find out if there was supposed to be something supported here?

Art: Any thoughts?

Apr 15 2019, 12:06 PM · Restricted Project

Apr 12 2019

tra updated subscribers of D60620: [HIP] Support -offloading-target-id.

@arsenm Matt, FYI, this patch seems to be a continuation of D59863 you've commented on.

Apr 12 2019, 10:28 AM
tra updated subscribers of D60620: [HIP] Support -offloading-target-id.

It looks like you are solving two problems here:
a) you want to create multiple device passes for the same GPU, but with different options;
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes a {gpu, options} tuple into the --offloading-target-id variant.
Is that correct?

Apr 12 2019, 10:22 AM

Apr 10 2019

tra accepted D60513: [HIP] Use -mlink-builtin-bitcode to link device library.
Apr 10 2019, 12:01 PM · Restricted Project