Page MenuHomePhabricator

tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (245 w, 3 d)

Recent Activity

Fri, Sep 20

tra added a comment to D67509: [CUDA][HIP] Fix hostness of defaulted constructor.

Looks like CUDA test-suite is triggering the assertion added by this patch:

Fri, Sep 20, 2:55 PM · Restricted Project

Thu, Sep 19

tra accepted D67509: [CUDA][HIP] Fix hostness of defaulted constructor.

LGTM. Thank you!

Thu, Sep 19, 2:34 PM · Restricted Project
tra added inline comments to D67509: [CUDA][HIP] Fix hostness of defaulted constructor.
Thu, Sep 19, 2:06 PM · Restricted Project
tra added inline comments to D67509: [CUDA][HIP] Fix hostness of defaulted constructor.
Thu, Sep 19, 12:15 PM · Restricted Project
tra accepted D67509: [CUDA][HIP] Fix hostness of defaulted constructor.
Thu, Sep 19, 10:26 AM · Restricted Project
tra added inline comments to D67509: [CUDA][HIP] Fix hostness of defaulted constructor.
Thu, Sep 19, 9:11 AM · Restricted Project

Wed, Sep 18

tra accepted D67730: [CUDA][HIP] Fix typo in `BestViableFunction`.
Wed, Sep 18, 3:38 PM · Restricted Project, Restricted Project
tra added a comment to D67730: [CUDA][HIP] Fix typo in `BestViableFunction`.

LGTM. You may want to wait a bit for Justin's feedback, in case he has some concerns.

Wed, Sep 18, 3:38 PM · Restricted Project, Restricted Project
tra added a reviewer for D67730: [CUDA][HIP] Fix typo in `BestViableFunction`: jlebar.
Wed, Sep 18, 3:37 PM · Restricted Project, Restricted Project

Fri, Sep 13

tra added inline comments to D67487: [CodeEmitter] Support instruction widths > 64 bits.
Fri, Sep 13, 9:46 AM · Restricted Project
tra added inline comments to D67414: [AST] Treat "inline gnu_inline" the same way as "extern inline gnu_inline" in C++ mode.
Fri, Sep 13, 9:26 AM · Restricted Project

Thu, Sep 12

tra added a comment to D67509: [CUDA][HIP] Fix hostness of defaulted constructor.

Example of the actual error produced by clang: https://godbolt.org/z/Dl1FfC

Thu, Sep 12, 12:20 PM · Restricted Project
tra added a reviewer for D67509: [CUDA][HIP] Fix hostness of defaulted constructor: jlebar.
Thu, Sep 12, 12:04 PM · Restricted Project

Tue, Sep 3

tra added inline comments to D67130: [NVPTX] Add activemask intrinsic..
Tue, Sep 3, 3:55 PM · Restricted Project
tra committed rGce94ec661f53: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+ (authored by tra).
[CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+
Tue, Sep 3, 10:34 AM
tra committed rL370793: Request commit access for tra.
Request commit access for tra
Tue, Sep 3, 10:34 AM
tra committed rL370792: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+.
[CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+
Tue, Sep 3, 10:34 AM
tra closed D66665: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+.
Tue, Sep 3, 10:34 AM · Restricted Project

Aug 23 2019

tra created D66665: [CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+.
Aug 23 2019, 10:16 AM · Restricted Project
tra committed rG4c9d2ab145ad: Fixed a typo. (authored by tra).
Fixed a typo.
Aug 23 2019, 9:26 AM
tra committed rL369777: Fixed a typo..
Fixed a typo.
Aug 23 2019, 9:25 AM
tra added inline comments to D25845: [CUDA] Ignore implicit target attributes during function template instantiation. .
Aug 23 2019, 9:25 AM · Restricted Project

Aug 12 2019

tra added inline comments to D25845: [CUDA] Ignore implicit target attributes during function template instantiation. .
Aug 12 2019, 11:18 AM · Restricted Project

Aug 1 2019

tra accepted D65612: Handle casts changing pointer size in the vectorizer.
Aug 1 2019, 5:03 PM · Restricted Project
tra added inline comments to D65612: Handle casts changing pointer size in the vectorizer.
Aug 1 2019, 4:27 PM · Restricted Project
tra accepted D65600: Relax load store vectorizer pointer strip checks.
Aug 1 2019, 3:04 PM · Restricted Project
tra added inline comments to D65600: Relax load store vectorizer pointer strip checks.
Aug 1 2019, 2:33 PM · Restricted Project
tra added a comment to D65600: Relax load store vectorizer pointer strip checks.

It looks to me that the root cause is in

Aug 1 2019, 1:42 PM · Restricted Project

Jul 30 2019

tra accepted D65468: [NVPTX] Fix PR41651.
Jul 30 2019, 12:47 PM · Restricted Project
tra added inline comments to D65468: [NVPTX] Fix PR41651.
Jul 30 2019, 12:27 PM · Restricted Project
tra added a reviewer for D65468: [NVPTX] Fix PR41651: tra.
Jul 30 2019, 12:21 PM · Restricted Project

Jul 18 2019

tra closed D64588: Minor cleanup..

Landed ir r365913

Jul 18 2019, 9:39 AM · Restricted Project

Jul 12 2019

tra committed rGd9963b627ad2: Minor cleanup. (authored by tra).
Minor cleanup.
Jul 12 2019, 9:14 AM
tra committed rL365913: Minor cleanup..
Minor cleanup.
Jul 12 2019, 9:14 AM

Jul 11 2019

tra updated the summary of D64588: Minor cleanup..
Jul 11 2019, 2:22 PM · Restricted Project
tra created D64588: Minor cleanup..
Jul 11 2019, 1:32 PM · Restricted Project

Jul 10 2019

tra accepted D64364: [HIP] Add GPU arch gfx1010, gfx1011, and gfx1012.
Jul 10 2019, 3:42 PM · Restricted Project

Jun 26 2019

tra added inline comments to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
Jun 26 2019, 8:52 AM · Restricted Project

Jun 25 2019

tra accepted D62738: [HIP] Support attribute hip_pinned_shadow.

LGTM. Thank you!

Jun 25 2019, 1:35 PM · Restricted Project
tra added a comment to D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG)..

Hi Sam,
The compiler generates metadata for the first 48 bytes. I compiled a sample code and verified it. The backend does nothing for the extra bytes now.
I will soon submit the backend patch to generate the new metadata.

Jun 25 2019, 10:24 AM · Restricted Project, Restricted Project

Jun 24 2019

tra added inline comments to D63209: [AMDGPU] gfx1010 wave32 clang support.
Jun 24 2019, 2:41 PM · Restricted Project
tra added inline comments to D62738: [HIP] Support attribute hip_pinned_shadow.
Jun 24 2019, 10:39 AM · Restricted Project

Jun 19 2019

tra accepted D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions..
Jun 19 2019, 12:32 PM · Restricted Project, Restricted Project
tra added a comment to D62738: [HIP] Support attribute hip_pinned_shadow.

Overall looks good. Thank you for making the change.

Jun 19 2019, 11:05 AM · Restricted Project
tra added a comment to D63277: [CUDA][HIP] Don't set "comdat" attribute for CUDA device stub functions..

SGTM in principle. Folding the stubs would be bad as their addresses are implicitly used to identify the kernels to launch.

Jun 19 2019, 10:39 AM · Restricted Project, Restricted Project

Jun 17 2019

tra added a comment to D63097: [cmake] Don't add Support/Testing library if tests are not included.

LGTM, but I'm not familiar with the use of the stuff under Testing/. I've added @zturner who did some work there.

Jun 17 2019, 12:16 PM · Restricted Project
tra added a reviewer for D63097: [cmake] Don't add Support/Testing library if tests are not included: zturner.
Jun 17 2019, 12:16 PM · Restricted Project
tra added inline comments to D63097: [cmake] Don't add Support/Testing library if tests are not included.
Jun 17 2019, 10:42 AM · Restricted Project

Jun 14 2019

tra accepted D63335: [HIP] Add the interface deriving the stub name of device kernels..

LGTM. This is a cleaner way to provide stub name tweaks.

Jun 14 2019, 3:19 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

I think debugger does have sufficient information to deal with this and that would be the right place to deal with the issue.

em, I did push the later as well, :(. OK, I will simplify the patch to change any functionality but move the calculation of device name into a common interface. So that, vendor could adjust that internally with minimal change. OK?

Jun 14 2019, 3:05 PM · Restricted Project, Restricted Project
tra requested changes to D63335: [HIP] Add the interface deriving the stub name of device kernels..

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

But, after unmangling, debugger still could match both as they are almost identical excep the final variants, like clone. The debugger will set all locations matching that specified kernel name.

Jun 14 2019, 2:29 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

Is it OK for us to mangle __device_stub __ as the nested name into the original one, says, we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv

and

$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv
__device_stub__(kernelfunc<int>, void, void)
Jun 14 2019, 2:19 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..
In D63335#1543854, @tra wrote:

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

it's based on debugger people told me, with ".stub", the debugger still could find it match the original device kernel even though it could find both of them. But, they want to match the original one only and leave the stub one intentionally unmatched.

Jun 14 2019, 2:15 PM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

Jun 14 2019, 10:12 AM · Restricted Project, Restricted Project
tra added a comment to D63335: [HIP] Add the interface deriving the stub name of device kernels..

Is there particular reason you need to switch to this naming scheme?

Jun 14 2019, 9:55 AM · Restricted Project, Restricted Project

Jun 13 2019

tra added a comment to D63210: gn build: Add NVPTX target.

Yay! Thank you. Lack of this has stopped me when I tried gn last time. I'll try again once this patch lands.

Jun 13 2019, 11:01 AM · Restricted Project
tra added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1542042, @tra wrote:

C++ volatile will give you that. You also rely on atomicity. C++ volatile does not guarantee that, even if NVPTX does happen to. It's a mere coincidence. What if next PTX revision provides a better way to implement C++ volatile without providing atomicity? Then your code will all of a sudden break in new and exciting ways.

I think it's completely healthy and fair to be skeptical that we'll do the right thing by you, but FWIW the changes I want to make to CUDA volatile after we ship std::atomic<T> go in the other direction. And the legacy requirements are pretty entrenched here, too.

Jun 13 2019, 10:45 AM · Restricted Project
tra added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1541969, @tra wrote:

@reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.

Do I understand it correctly that the argument is about using C++ volatile with the assumption that those will map to ld.volatile/st.volatile, which happen to provide sufficiently strong guarantees to be equivalent of LLVM's atomic monotonic operations?

If that's the case, then I would agree that it's an implementation detail one should not be relying on. If atomicity is needed, we should figure out a way to express that in C++.

In practice, the standard C++ library support in cuda is somewhat limited. I don't think atomic<> works, so volatile might be a 'happens to work' short-term workaround. If that's the case, then there should a comment describing what's going on here and TODO to fix it when better support for atomics on GPU is available.

Another option would be to use atomic*() functions provided by CUDA, but those do have limited functionality on older GPUs.

Yet another alternative is to explicitly use ld.volatile/st.volatile from inline assembly. I don't think we have it plumbed as clang builtin.

Artem, thanks for your comment. But we need not only atomicity, bht also we need to command the ptxas compiler to not optimize accesses to parallelLevel.

Jun 13 2019, 10:08 AM · Restricted Project
tra added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

@reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.

Jun 13 2019, 9:15 AM · Restricted Project

Jun 11 2019

tra added a comment to D63164: [HIP] Add option to force lambda nameing following ODR in HIP/CUDA..

So, in short, what you're saying is that lambda type may leak into the mangled name of a __global__ function and ne need to ensure that the mangled name is identical for both host and device, hence the need for consistent naming of lambdas.

Jun 11 2019, 2:55 PM · Restricted Project
tra added a reviewer for D63164: [HIP] Add option to force lambda nameing following ODR in HIP/CUDA.: rsmith.
Jun 11 2019, 2:50 PM · Restricted Project
tra added a reviewer for D62738: [HIP] Support attribute hip_pinned_shadow: jlebar.
Jun 11 2019, 2:23 PM · Restricted Project
tra added a comment to D62738: [HIP] Support attribute hip_pinned_shadow.

So, the only thing this patch appears to do is make everything with this attribute uninitialized on device side and give protected visibility.
If I understand it correctly, you're using the attribute in order to construct something that's sort of opposite of the currently used device vars with host-side shadows. Only now the real variable lives on the host side and it's device side that gets the 'shadow' copy. Do I understand it correctly?

Jun 11 2019, 2:22 PM · Restricted Project
tra added a comment to D62738: [HIP] Support attribute hip_pinned_shadow.

Syntactically the patch looks OK to me, but I think the purpose and meaning of the builtin type should be documented in more details. Based on this patch alone it's not clear at all what it's supposed to be used for and how.

Jun 11 2019, 11:33 AM · Restricted Project

Jun 10 2019

tra accepted D63029: [CUDA] Fix grep pattern in cuda-types.cu.
Jun 10 2019, 9:19 AM · Restricted Project, Restricted Project

Jun 7 2019

tra added a reviewer for D63029: [CUDA] Fix grep pattern in cuda-types.cu: tra.
Jun 7 2019, 3:15 PM · Restricted Project, Restricted Project
tra added inline comments to D63029: [CUDA] Fix grep pattern in cuda-types.cu.
Jun 7 2019, 3:11 PM · Restricted Project, Restricted Project

May 29 2019

tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

I think static __device__ globals would fall into the same category -- nominally they should not be visible outside of device-side object file, but in practice we do need to make them visible from the host side of the same TU.

Are you sure nvcc support accessing static __device__ variables in host code? That would be expensive to implement.

May 29 2019, 1:56 PM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

that should assume that variable is not declared with static. that's also the motivation of this patch.

May 29 2019, 12:31 PM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

NVCC also allows that: https://godbolt.org/z/t78RvM

BTW, that code posted looks quite weird to me, how the code could make sense by return a pointer of device variable? or a pointer of shadow host variable?

May 29 2019, 12:12 PM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

Note for the future -- it would be great if we could finish discussing the patch before landing it.
I would still like to see the host-side test.

May 29 2019, 10:29 AM · Restricted Project
tra added a comment to D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..

thanks, but that static __device__ variable won't have shadow in host anymore.

May 29 2019, 10:29 AM · Restricted Project
tra accepted D62603: [CUDA][HIP] Skip setting `externally_initialized` for static device variables..
May 29 2019, 10:13 AM · Restricted Project

May 28 2019

tra accepted D62483: [CUDA][HIP] Emit dependent libs for host only.
May 28 2019, 9:59 AM · Restricted Project

May 17 2019

tra accepted D62046: [OpenMP][bugfix] Add missing math functions variants for log and abs..

I'd add a comment with a brief explanation for the const variant and a TODO() to remove it.

May 17 2019, 9:32 AM · Restricted Project

May 16 2019

tra added inline comments to D62046: [OpenMP][bugfix] Add missing math functions variants for log and abs..
May 16 2019, 5:03 PM · Restricted Project

May 15 2019

tra added a comment to D61949: [OpenMP][bugfix] Fix issues with C++ 17 compilation when handling math functions.

LGTM.

May 15 2019, 12:12 PM · Restricted Project
tra added inline comments to D61949: [OpenMP][bugfix] Fix issues with C++ 17 compilation when handling math functions.
May 15 2019, 9:43 AM · Restricted Project

May 13 2019

tra accepted D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..

This won't affect CUDA in any way, all we have added is OpenMP specific.

May 13 2019, 2:32 PM · Restricted Project
tra added a comment to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..

As soon as libc++ the limits header included in

__clang_cuda_cmath.h:15
``` is not found:

__clang_cuda_cmath.h:15:10: fatal error: 'limits' file not found
#include <limits>

Not even CUDA works actually so I'm not sure what the best answer to this problem is.
May 13 2019, 1:06 PM · Restricted Project
tra added a comment to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..

Two small changes and then it is fine with me. @tra ?

May 13 2019, 10:32 AM · Restricted Project

May 10 2019

tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 2:37 PM · Restricted Project
tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 2:01 PM · Restricted Project
tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 1:53 PM · Restricted Project
tra added inline comments to D61765: [OpenMP][Clang][BugFix] Split declares and math functions inclusion..
May 10 2019, 1:24 PM · Restricted Project

May 3 2019

tra committed rL359928: [CUDA buildbot] tell libunwind where to find libcxx..
[CUDA buildbot] tell libunwind where to find libcxx.
May 3 2019, 1:38 PM
tra accepted D61474: [CUDA][Clang][Bugfix] Add missing CUDA 9.2 case.
May 3 2019, 9:38 AM · Restricted Project
tra added inline comments to D61458: [hip] Relax CUDA call restriction within `decltype` context..
May 3 2019, 9:24 AM · Restricted Project

May 2 2019

tra committed rG4cbb23502612: [CUDA] Do not pass deprecated option fo fatbinary (authored by tra).
[CUDA] Do not pass deprecated option fo fatbinary
May 2 2019, 3:36 PM
tra committed rL359838: [CUDA] Do not pass deprecated option fo fatbinary.
[CUDA] Do not pass deprecated option fo fatbinary
May 2 2019, 3:35 PM
tra committed rC359838: [CUDA] Do not pass deprecated option fo fatbinary.
[CUDA] Do not pass deprecated option fo fatbinary
May 2 2019, 3:35 PM
tra closed D61470: [CUDA] Do not pass deprecated option fo fatbinary.
May 2 2019, 3:35 PM · Restricted Project
tra created D61470: [CUDA] Do not pass deprecated option fo fatbinary.
May 2 2019, 3:03 PM · Restricted Project
tra added inline comments to D61458: [hip] Relax CUDA call restriction within `decltype` context..
May 2 2019, 2:27 PM · Restricted Project
tra added a comment to D61458: [hip] Relax CUDA call restriction within `decltype` context..
In D61458#1488523, @tra wrote:

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

good point, do we have a dedicated context for sizeof? that make the checking easier.

May 2 2019, 2:02 PM · Restricted Project
tra added a comment to D61458: [hip] Relax CUDA call restriction within `decltype` context..

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

May 2 2019, 1:37 PM · Restricted Project
tra added a reviewer for D61458: [hip] Relax CUDA call restriction within `decltype` context.: jlebar.
May 2 2019, 1:35 PM · Restricted Project
tra accepted D61399: [OpenMP][Clang] Support for target math functions.
May 2 2019, 10:45 AM · Restricted Project

May 1 2019

tra added a comment to D61396: [hip] Fix ambiguity from `>>>` of CUDA..

LGTM, but I've added @rsmith who is way more familiar with this code.

May 1 2019, 1:22 PM · Restricted Project, Restricted Project
tra added a reviewer for D61396: [hip] Fix ambiguity from `>>>` of CUDA.: rsmith.
May 1 2019, 1:16 PM · Restricted Project, Restricted Project