tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (145 w, 3 d)

Recent Activity

Tue, Oct 17

tra created D39026: [NVPTX] allow address space inference for volatile loads/stores.
Tue, Oct 17, 5:25 PM
tra requested changes to D39005: [OpenMP] Clean up variable and function names for NVPTX backend.

Justin is right. I completely forgot about this. :-/
Hal offered a possible solution: https://reviews.llvm.org/D17738#661115

Tue, Oct 17, 11:03 AM
tra accepted D39005: [OpenMP] Clean up variable and function names for NVPTX backend.
Tue, Oct 17, 9:18 AM

Mon, Oct 16

tra added a comment to D38978: [OpenMP] Enable the lowering of implicitly shared variables in OpenMP GPU-offloaded target regions to the GPU shared memory.

Please add tests for the cases where such local->shared conversion should and should not happen.
I would appreciate it if you could add details on what exactly your passes are supposed to move to shared memory.

Mon, Oct 16, 5:06 PM
tra accepted D38883: [CMake][OpenMP] Customize default offloading arch.
Mon, Oct 16, 1:48 PM

Fri, Oct 13

tra accepted D38901: [CUDA] Require libdevice only if needed.

Looks good. Thank you.

Fri, Oct 13, 1:37 PM
tra added inline comments to D38883: [CMake][OpenMP] Customize default offloading arch.
Fri, Oct 13, 1:13 PM
tra added a comment to D38901: [CUDA] Require libdevice only if needed.

The change could use a test.

Fri, Oct 13, 1:01 PM
tra added inline comments to D38883: [CMake][OpenMP] Customize default offloading arch.
Fri, Oct 13, 11:25 AM
tra added a reviewer for D38883: [CMake][OpenMP] Customize default offloading arch: tra.
Fri, Oct 13, 11:19 AM
tra added inline comments to D38883: [CMake][OpenMP] Customize default offloading arch.
Fri, Oct 13, 11:13 AM
tra added inline comments to D38883: [CMake][OpenMP] Customize default offloading arch.
Fri, Oct 13, 11:11 AM
tra added inline comments to D38883: [CMake][OpenMP] Customize default offloading arch.
Fri, Oct 13, 10:52 AM
tra added inline comments to D38883: [CMake][OpenMP] Customize default offloading arch.
Fri, Oct 13, 9:29 AM

Thu, Oct 12

tra committed rL315624: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions on sm_70.
Thu, Oct 12, 2:32 PM
tra closed D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70 by committing rL315624: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions on sm_70.
Thu, Oct 12, 2:32 PM
tra committed rL315601: [NVPTX] Implemented wmma intrinsics and instructions.
Thu, Oct 12, 11:28 AM
tra closed D38645: [NVPTX] Implemented wmma intrinsics and instructions. by committing rL315601: [NVPTX] Implemented wmma intrinsics and instructions.
Thu, Oct 12, 11:28 AM
tra committed rL315598: [TableGen] Allow intrinsics to have up to 8 return values.
Thu, Oct 12, 10:40 AM
tra closed D38633: [TableGen] Allow intrinsics to have up to 8 return values. by committing rL315598: [TableGen] Allow intrinsics to have up to 8 return values.
Thu, Oct 12, 10:40 AM

Wed, Oct 11

tra updated the diff for D38645: [NVPTX] Implemented wmma intrinsics and instructions.

Added an explanation for WMMA_VARIANT macro and related enum.

Wed, Oct 11, 2:21 PM
tra added a comment to D38645: [NVPTX] Implemented wmma intrinsics and instructions.

We took this approach to reduce the number of intrinsic functions that opt and code-gen have to deal with; for example, we have one ld_a_f16 instead of 12.

Wed, Oct 11, 1:58 PM
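To illustrate the trade-off in the comment above, here is a minimal Python sketch. It assumes, hypothetically, that the 12 variants of the f16 A-fragment load come from 2 layouts (row/col) × 3 address spaces (generic/global/shared) × with/without an explicit stride; the variant names and the `lower_ld_a_f16` helper are invented for illustration and are not the names used in D38645. One generic intrinsic carries the variant as immediate operands, and code-gen decodes them into a specific instruction.

```python
from itertools import product

# Hypothetical variant dimensions for an f16 "load A fragment" operation.
LAYOUTS = ("row", "col")
ADDRESS_SPACES = ("generic", "global", "shared")
STRIDED = (False, True)


def lower_ld_a_f16(layout, space, strided):
    """Decode the immediate operands of one generic 'ld_a_f16' call into
    the (hypothetical) name of the specific instruction variant to emit."""
    if layout not in LAYOUTS:
        raise ValueError(f"bad layout: {layout!r}")
    if space not in ADDRESS_SPACES:
        raise ValueError(f"bad address space: {space!r}")
    suffix = ".stride" if strided else ""
    return f"ld_a_f16.{layout}.{space}{suffix}"


def per_variant_intrinsic_names():
    """Names we would need if every variant were its own intrinsic."""
    return [lower_ld_a_f16(*v) for v in product(LAYOUTS, ADDRESS_SPACES, STRIDED)]


print(len(per_variant_intrinsic_names()))     # -> 12
print(lower_ld_a_f16("col", "shared", True))  # -> ld_a_f16.col.shared.stride
```

Keeping the variant in immediate operands means IR-level passes see a single intrinsic; the cost is that decoding and validating the variant moves into code-gen.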
tra updated the diff for D38645: [NVPTX] Implemented wmma intrinsics and instructions.

Fixed text alignment in IntrinsicsNVVM.td

Wed, Oct 11, 1:41 PM
tra added inline comments to D38645: [NVPTX] Implemented wmma intrinsics and instructions.
Wed, Oct 11, 12:00 PM
tra updated the diff for D38645: [NVPTX] Implemented wmma intrinsics and instructions.

Addressed Justin's comments.

Wed, Oct 11, 11:59 AM
tra added inline comments to D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70.
Wed, Oct 11, 10:13 AM
tra updated the diff for D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70.

Addressed Justin's comments.

Wed, Oct 11, 10:12 AM
tra updated the diff for D38633: [TableGen] Allow intrinsics to have up to 8 return values.

Replaced the fixed-size array with SmallVector<>. That is one less place where a buffer overflow can creep in when the struct encoding changes.

Wed, Oct 11, 9:40 AM

Tue, Oct 10

tra updated the diff for D38633: [TableGen] Allow intrinsics to have up to 8 return values.

Ugh. Make sure the static array that keeps per-field info is also sized properly.

Tue, Oct 10, 5:14 PM
tra added a dependency for D38645: [NVPTX] Implemented wmma intrinsics and instructions.: D38633: [TableGen] Allow intrinsics to have up to 8 return values.
Tue, Oct 10, 9:54 AM
tra added a dependent revision for D38633: [TableGen] Allow intrinsics to have up to 8 return values.: D38645: [NVPTX] Implemented wmma intrinsics and instructions.
Tue, Oct 10, 9:54 AM
tra added a dependent revision for D38645: [NVPTX] Implemented wmma intrinsics and instructions.: D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70.
Tue, Oct 10, 9:54 AM
tra added a dependency for D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70: D38645: [NVPTX] Implemented wmma intrinsics and instructions.
Tue, Oct 10, 9:54 AM
tra created D38742: [CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70.
Tue, Oct 10, 9:53 AM

Mon, Oct 9

tra updated the diff for D38645: [NVPTX] Implemented wmma intrinsics and instructions.

Changed the names of MMA intrinsics and instructions to use the <typeD>.<typeC> order, matching the nomenclature used in CUDA headers.

Mon, Oct 9, 7:16 PM
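The <typeD>.<typeC> ordering from the comment above can be sketched as a tiny name builder: the result type D comes first, then the input accumulator type C. Only that ordering comes from the comment; the `mma.m16n16k16` prefix, the helper names, and the assumption that both accumulators may be f16 or f32 are illustrative.

```python
from itertools import product

# Assumed element types for the C (input accumulator) and D (result) matrices.
ACC_TYPES = ("f16", "f32")


def mma_suffix(type_d, type_c):
    """Build a name suffix in <typeD>.<typeC> order (result type first)."""
    for t in (type_d, type_c):
        if t not in ACC_TYPES:
            raise ValueError(f"unsupported accumulator type: {t!r}")
    return f"{type_d}.{type_c}"


def all_mma_names(prefix="mma.m16n16k16"):
    """Hypothetical names for every (D, C) combination."""
    return [f"{prefix}.{mma_suffix(d, c)}" for d, c in product(ACC_TYPES, ACC_TYPES)]


# D = f32 result accumulated from a C = f16 input.
print(mma_suffix("f32", "f16"))  # -> f32.f16
```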
tra updated the diff for D38633: [TableGen] Allow intrinsics to have up to 8 return values.

Fixed another place where we're checking the number of fields in a struct.

Mon, Oct 9, 5:20 PM
tra updated the diff for D38645: [NVPTX] Implemented wmma intrinsics and instructions.

No need for Optional<> in getWmmaLdVariant().

Mon, Oct 9, 5:18 PM

Fri, Oct 6

tra created D38645: [NVPTX] Implemented wmma intrinsics and instructions.
Fri, Oct 6, 1:51 PM
tra added inline comments to D38633: [TableGen] Allow intrinsics to have up to 8 return values.
Fri, Oct 6, 1:01 PM
tra added reviewers for D38633: [TableGen] Allow intrinsics to have up to 8 return values.: jlebar, rnk.
Fri, Oct 6, 9:44 AM
tra created D38633: [TableGen] Allow intrinsics to have up to 8 return values.
Fri, Oct 6, 9:37 AM

Wed, Sep 27

tra committed rL314334: [CUDA] Work around conflicting function definitions in CUDA-9 headers.
Wed, Sep 27, 12:08 PM
tra closed D38326: [CUDA] Work around conflicting function definitions in CUDA-9 headers. by committing rL314334: [CUDA] Work around conflicting function definitions in CUDA-9 headers.
Wed, Sep 27, 12:08 PM
tra created D38326: [CUDA] Work around conflicting function definitions in CUDA-9 headers.
Wed, Sep 27, 12:01 PM

Tue, Sep 26

tra committed rL314223: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Tue, Sep 26, 3:34 PM
tra closed D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.

Landed with fix in r314223.

Tue, Sep 26, 3:30 PM
tra updated the diff for D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.

Added missing return. Tests pass now.

Tue, Sep 26, 3:30 PM
tra reopened D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Tue, Sep 26, 3:29 PM

Mon, Sep 25

tra committed rL314135: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Mon, Sep 25, 11:56 AM
tra closed D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. by committing rL314135: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Mon, Sep 25, 11:55 AM
tra added inline comments to D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Mon, Sep 25, 11:30 AM
tra updated the diff for D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.

Addressed Justin's comments.

Mon, Sep 25, 11:30 AM
tra committed rL314129: [CUDA] Fix names of __nvvm_vote* intrinsics.
Mon, Sep 25, 10:57 AM
tra closed D38188: [CUDA] Fix names of __nvvm_vote* intrinsics. by committing rL314129: [CUDA] Fix names of __nvvm_vote* intrinsics.
Mon, Sep 25, 10:57 AM
tra added a comment to D38188: [CUDA] Fix names of __nvvm_vote* intrinsics.

Should we add tests to the test-suite? Or are these already caught by our existing tests?

Mon, Sep 25, 10:55 AM
tra created D38191: [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
Mon, Sep 25, 5:02 AM
tra created D38188: [CUDA] Fix names of __nvvm_vote* intrinsics.
Mon, Sep 25, 5:02 AM

Sep 21 2017

tra committed rL313899: [CUDA] Fixed order of words in the names of shfl builtins.
Sep 21 2017, 11:48 AM
tra closed D38147: [CUDA] Fixed order of words in the names of shfl builtins. by committing rL313899: [CUDA] Fixed order of words in the names of shfl builtins.
Sep 21 2017, 11:48 AM
tra committed rL313898: [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync}…
Sep 21 2017, 11:46 AM
tra closed D38148: [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins. by committing rL313898: [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync}…
Sep 21 2017, 11:46 AM
tra updated the diff for D38148: [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.

Fixed a typo in one test.

Sep 21 2017, 11:43 AM
tra created D38148: [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.
Sep 21 2017, 11:33 AM
tra created D38147: [CUDA] Fixed order of words in the names of shfl builtins.
Sep 21 2017, 11:10 AM

Sep 20 2017

tra committed rL313820: [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Sep 20 2017, 2:24 PM
tra closed D38090: [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. by committing rL313820: [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Sep 20 2017, 2:24 PM
tra updated the diff for D38090: [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.

Addressed Justin's comments.

Sep 20 2017, 11:54 AM
tra created D38090: [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Sep 20 2017, 11:21 AM
tra added inline comments to D38040: [OpenMP] Add an additional test for D34888.
Sep 20 2017, 9:41 AM
tra accepted D37913: [OpenMP] Enable the existing nocudalib flag for OpenMP offloading toolchain.

One small nit. LGTM otherwise.

Sep 20 2017, 9:37 AM
tra accepted D37912: [OpenMP] Bugfix: output file name drops the absolute path where full path is needed.
Sep 20 2017, 9:28 AM

Sep 19 2017

tra added a comment to D38040: [OpenMP] Add an additional test for D34888.

LGTM in general.

Sep 19 2017, 10:31 AM
tra added inline comments to D37912: [OpenMP] Bugfix: output file name drops the absolute path where full path is needed.
Sep 19 2017, 10:26 AM
tra added inline comments to D37913: [OpenMP] Enable the existing nocudalib flag for OpenMP offloading toolchain.
Sep 19 2017, 10:16 AM

Sep 18 2017

tra accepted D37914: [OpenMP] Don't throw cudalib not found error if only front-end is required.
Sep 18 2017, 12:07 PM

Sep 15 2017

tra added a comment to D37914: [OpenMP] Don't throw cudalib not found error if only front-end is required.

I'm not particularly familiar with OpenMP internals. Could you elaborate on why libdevice would not be needed with -c for the OpenMP case?
Is that because it would only apply to the host compilation and nothing will be compiled for the OpenMP targets?
Does OpenMP allow separate compilation for the target (i.e., something similar to what --cuda-device-only does)?

Sep 15 2017, 3:20 PM
tra added a comment to D37914: [OpenMP] Don't throw cudalib not found error if only front-end is required.

Now we just need a test case to make sure this works as intended.

Sep 15 2017, 1:48 PM
tra added a comment to D37914: [OpenMP] Don't throw cudalib not found error if only front-end is required.

BTW, at least for CUDA compilation, '-c' would still need libdevice, because the device side compiles PTX to SASS and needs all the symbols the PTX may refer to.
Would that not be the case for OpenMP compilation, too?

Sep 15 2017, 12:09 PM
tra added a comment to D37912: [OpenMP] Bugfix: output file name drops the absolute path where full path is needed.
In D37912#872294, @tra wrote:

Shouldn't this temp .cubin file go into the temporary directory, as opposed to the same directory as the input file?

That is indeed the intention. The filename already contains "/tmp/"; I just make sure it doesn't get dropped.

Sep 15 2017, 12:02 PM
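A minimal Python sketch of the failure mode discussed in this thread, assuming (hypothetically) that the output name was derived from the temporary file's base name only, so the "/tmp/" directory was lost and the .cubin ended up relative to the current working directory. The helper names and the temp file name are invented; the real change is in clang's C++ driver code.

```python
import posixpath


def cubin_name_buggy(asm_path):
    # Only the base name survives, so the result is relative to the cwd.
    stem = posixpath.splitext(posixpath.basename(asm_path))[0]
    return stem + ".cubin"


def cubin_name_fixed(asm_path):
    # Swap the extension but keep the directory part of the temp path.
    stem = posixpath.splitext(asm_path)[0]
    return stem + ".cubin"


tmp = "/tmp/kernel-3f2a1b.s"  # hypothetical temporary PTX assembly file
print(cubin_name_buggy(tmp))  # -> kernel-3f2a1b.cubin
print(cubin_name_fixed(tmp))  # -> /tmp/kernel-3f2a1b.cubin
```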
tra added inline comments to D37914: [OpenMP] Don't throw cudalib not found error if only front-end is required.
Sep 15 2017, 11:47 AM
tra added a comment to D37912: [OpenMP] Bugfix: output file name drops the absolute path where full path is needed.

Shouldn't this temp .cubin file go into the temporary directory, as opposed to the same directory as the input file?

Sep 15 2017, 11:41 AM
tra committed rL313369: [CUDA] Work around a new quirk in CUDA9 headers.
Sep 15 2017, 10:32 AM
tra closed D37906: [CUDA] Work around a new quirk in CUDA9 headers. by committing rL313369: [CUDA] Work around a new quirk in CUDA9 headers.
Sep 15 2017, 10:32 AM
tra added a comment to D37906: [CUDA] Work around a new quirk in CUDA9 headers.

I don't think we really care why they do it for nvcc.
My understanding is that nvcc needs to avoid name clashes between its own implementations of functions and the ones that come from the host headers, which is why it has to tread really carefully around host includes.

Sep 15 2017, 9:53 AM
tra created D37906: [CUDA] Work around a new quirk in CUDA9 headers.
Sep 15 2017, 9:33 AM

Sep 7 2017

tra committed rL312734: [CUDA] Added rudimentary support for CUDA-9 and sm_70.
Sep 7 2017, 11:16 AM
tra closed D37576: [CUDA] Added rudimentary support for CUDA-9 and sm_70. by committing rL312734: [CUDA] Added rudimentary support for CUDA-9 and sm_70.
Sep 7 2017, 11:16 AM
tra updated the diff for D37576: [CUDA] Added rudimentary support for CUDA-9 and sm_70.

Added tests for sm_70 support.

Sep 7 2017, 10:31 AM
tra created D37576: [CUDA] Added rudimentary support for CUDA-9 and sm_70.
Sep 7 2017, 10:14 AM
tra accepted D37548: [CUDA] When compilation fails, print the compilation mode.

Nice.

Sep 7 2017, 9:27 AM

Sep 6 2017

tra accepted D37540: [CUDA] Tests for device-side overloads of non-placement new/delete.
Sep 6 2017, 5:21 PM
tra accepted D37539: [CUDA] Add device overloads for non-placement new/delete.
Sep 6 2017, 5:20 PM

Jul 24 2017

tra added inline comments to D35703: [GPGPU] Add support for NVIDIA libdevice.
Jul 24 2017, 3:51 PM

Jul 20 2017

tra added inline comments to D35703: [GPGPU] Add support for NVIDIA libdevice.
Jul 20 2017, 4:01 PM
tra committed rL308675: [NVPTX] Add lowering of i128 params.
Jul 20 2017, 2:17 PM
tra closed D34555: [NVPTX] Add lowering of i128 params. by committing rL308675: [NVPTX] Add lowering of i128 params.
Jul 20 2017, 2:17 PM
tra committed rL308671: Changed EOL back to LF. NFC.
Jul 20 2017, 1:58 PM
tra accepted D34555: [NVPTX] Add lowering of i128 params.
Jul 20 2017, 11:37 AM

Jul 7 2017

tra requested changes to D34555: [NVPTX] Add lowering of i128 params.

Is it safe to just change the data layout on the clang side? Won't that break backward compatibility?

Jul 7 2017, 9:29 AM

Jul 6 2017

tra accepted D34555: [NVPTX] Add lowering of i128 params.

Nice. With alignment=16 we also benefit from 128-bit loads/stores.

Jul 6 2017, 9:37 AM
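The arithmetic behind the alignment remark can be sketched as follows, assuming that the widest single PTX memory access is 128 bits (e.g. a two-element vector of 64-bit values) and that a vector access must be aligned to its full size. Under those assumptions, a 16-byte i128 aligned to 8 bytes needs two 64-bit accesses, while alignment=16 allows a single 128-bit one. The helpers are hypothetical and are not NVPTX backend code.

```python
def widest_access_bytes(size_bytes, align_bytes, max_access_bytes=16):
    """Largest power-of-two access width that divides both the alignment
    and the object size, capped at the widest supported access."""
    width = max_access_bytes
    while width > 1 and (align_bytes % width or size_bytes % width):
        width //= 2
    return width


def accesses_needed(size_bytes, align_bytes):
    """How many loads (or stores) it takes to move the whole object."""
    return size_bytes // widest_access_bytes(size_bytes, align_bytes)


I128_SIZE = 16  # bytes
for align in (8, 16):
    w = widest_access_bytes(I128_SIZE, align)
    print(f"align={align}: {accesses_needed(I128_SIZE, align)} x {w * 8}-bit")
# align=8: 2 x 64-bit
# align=16: 1 x 128-bit
```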