Page MenuHomePhabricator

tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (335 w, 1 d)

Recent Activity

Mon, Jun 7

tra accepted D103835: [CUDA][HIP] Fix store of vtbl in ctor.
Mon, Jun 7, 12:40 PM · Restricted Project
tra accepted D101630: [HIP] Add --gpu-bundle-output.
Mon, Jun 7, 10:12 AM · Restricted Project

Fri, Jun 4

tra added inline comments to D103108: [CUDA][HIP] Promote const variables to constant.
Fri, Jun 4, 3:55 PM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

But how do we control emitting LLVM IR with or without bundle? -emit-llvm -emit-gpu-object or -emit-llvm -emit-gpu-bundle? -emit-* is usually for specifying a specific file type.

Fri, Jun 4, 10:30 AM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

For sure we will need -fgpu-bundle-device-output to control bundling of intermediate files. Then adding -emit-gpu-object and -emit-gpu-bundle may be redundant and can cause confusion. What if users specify -c -fgpu-bundle-device-output -emit-gpu-object or -c -fno-gpu-bundle-device-output -emit-gpu-bundle? To me a single option -fgpu-bundle-device-output to control all device output seems cleaner.

Fri, Jun 4, 9:57 AM · Restricted Project

Thu, Jun 3

tra accepted D103658: CUDA/HIP: Change device-use-host-var.cu's NOT "external" check to include "addrspace".
Thu, Jun 3, 5:28 PM · Restricted Project
tra added inline comments to D103658: CUDA/HIP: Change device-use-host-var.cu's NOT "external" check to include "addrspace".
Thu, Jun 3, 5:03 PM · Restricted Project
tra accepted D103579: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch.

LGTM for CUDA.

Thu, Jun 3, 1:11 PM · Restricted Project
tra added inline comments to D103579: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch.
Thu, Jun 3, 11:44 AM · Restricted Project
tra added inline comments to D103579: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch.
Thu, Jun 3, 10:42 AM · Restricted Project
tra accepted D103563: [HIP] Fix amdgcn builtin for long type.

Still LGTM.

Thu, Jun 3, 9:56 AM · Restricted Project

Wed, Jun 2

tra accepted D103563: [HIP] Fix amdgcn builtin for long type.
Wed, Jun 2, 3:34 PM · Restricted Project

Tue, Jun 1

tra accepted D103108: [CUDA][HIP] Promote const variables to constant.

I'm done with testing. The patch does not seem to break anything obvious. Tensorflow builds and works.

Tue, Jun 1, 2:46 PM · Restricted Project
tra accepted D103221: [CUDA][HIP] Change default lang std to c++14.
Tue, Jun 1, 2:26 PM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

I think for intermediate outputs e.g. preprocessor expansion, IR, and assembly, probably it makes sense not to bundle by default.

Tue, Jun 1, 2:24 PM · Restricted Project
tra added inline comments to D103221: [CUDA][HIP] Change default lang std to c++14.
Tue, Jun 1, 12:51 PM · Restricted Project
tra added a comment to D103108: [CUDA][HIP] Promote const variables to constant.

LGTM. I would like to test the patch on our code first. Please wait a bit before landing the patch. I should be able to have the results tomorrow.

Tue, Jun 1, 12:40 PM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

How does nvcc --genco behave when there are multiple GPU arch's? Does it output a fat binary containing multiple ISA's? Also, does it support device-only compilation for intermediate outputs?

Tue, Jun 1, 11:57 AM · Restricted Project
tra accepted D103281: [HIP] Fix spack HIP device lib detection.

Context for the changes in this patch: https://reviews.llvm.org/D97340#2775477

Tue, Jun 1, 11:36 AM · Restricted Project
tra added inline comments to D103221: [CUDA][HIP] Change default lang std to c++14.
Tue, Jun 1, 8:55 AM · Restricted Project

Thu, May 27

tra accepted D103221: [CUDA][HIP] Change default lang std to c++14.
Thu, May 27, 10:31 AM · Restricted Project

Wed, May 26

tra added a comment to D103108: [CUDA][HIP] Promote const variables to constant.

Overall looks good, though I've got one more question.

Wed, May 26, 2:37 PM · Restricted Project
tra added inline comments to D103108: [CUDA][HIP] Promote const variables to constant.
Wed, May 26, 10:50 AM · Restricted Project

Tue, May 25

tra accepted D102975: [HIP] Check compatibility of -fgpu-sanitize with offload arch.
Tue, May 25, 2:00 PM · Restricted Project

Mon, May 24

tra committed rG9a75c06cd9d9: [CUDA] Work around compatibility issue with libstdc++ 11.1.0 (authored by tra).
[CUDA] Work around compatibility issue with libstdc++ 11.1.0
Mon, May 24, 11:07 AM
tra closed D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.
Mon, May 24, 11:07 AM · Restricted Project
tra added inline comments to D102975: [HIP] Check compatibility of -fgpu-sanitize with offload arch.
Mon, May 24, 10:56 AM · Restricted Project
tra updated subscribers of D101630: [HIP] Add --gpu-bundle-output.

How about this:
If the user explicitly specified --cuda-host-only or --cuda-device-only, then by default only allow producing the natural output format, unless a bundled output is requested by an option. This should keep existing users working.
If the compilation is done without explicitly requested sub-compilation(s), then bundle the output by default. This should keep the GPU-unaware tools like ccache happy as they would always get the single output they expect.

WDYT?

--cuda-host-only always have one output, therefore there is no point of bundle its output. We only need to decide the proper behavior of --cuda-device-only.

Mon, May 24, 10:41 AM · Restricted Project

Fri, May 21

tra added inline comments to D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.
Fri, May 21, 4:26 PM · Restricted Project
tra added a comment to D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.

You can't use __GLIBCXX__ this way. It will be different for different snapshots from the gcc-11 branch. Some distros are already shipping gcc-11 snapshots with later dates.

I would just check RELEASE == 11. If __failed_assertion is present, you'll rename it. If it's not present, nothing gets renamed but it works anyway.

Fri, May 21, 3:07 PM · Restricted Project
tra updated the diff for D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.

Check only _GLIBCXX_RELEASE

Fri, May 21, 3:01 PM · Restricted Project
tra added inline comments to D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.
Fri, May 21, 12:01 PM · Restricted Project
tra updated the diff for D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.

Fixed typo in push/pop macro name.

Fri, May 21, 12:01 PM · Restricted Project
tra added inline comments to D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.
Fri, May 21, 11:58 AM · Restricted Project
tra requested review of D102936: [CUDA] Work around compatibility issue with libstdc++ 11.1.0.
Fri, May 21, 11:00 AM · Restricted Project

Thu, May 20

tra accepted D102801: [CUDA][HIP] Fix device variables used by host.

I've verified that Tensorflow still builds with this patch and that the patch does fix the regressions we've seen.
If you could land this patch soon, that would be appreciated.

Thu, May 20, 1:31 PM · Restricted Project
tra added a comment to D102801: [CUDA][HIP] Fix device variables used by host.

In the updated patch I have a simpler solution which is easier to explain to the users. Basically we classify variables by how they are emitted: device side only, host side only, both sides as different entities (e.g. default constexpr var), and both sides as unified entity (e.g. managed var). For variables emitted on both sides as separate entities, we have limited knowledge and we limit what we can do for them. I think users should understand the compiler's limitation in such cases. And they can easily workaround that by making the variable explicitly device variable.

Thu, May 20, 11:25 AM · Restricted Project

Wed, May 19

tra added a comment to D102801: [CUDA][HIP] Fix device variables used by host.

This patch does not appear to fix the second regression introduced by the D102237.

Wed, May 19, 4:32 PM · Restricted Project
tra added a comment to D102237: [CUDA][HIP] Fix non-ODR-use of static device variable.

Here's a slightly simpler reproducer: https://godbolt.org/z/rW6P9e37s

I have a fix for this: https://reviews.llvm.org/D102801

Wed, May 19, 4:25 PM · Restricted Project
tra added a reviewer for D102801: [CUDA][HIP] Fix device variables used by host: rsmith.

Tentative LGTM as we need it to fix the regression soon.

Wed, May 19, 1:59 PM · Restricted Project
tra committed rT5a7fde03206e: [test-suite,CUDA] Unbreak the test of new/delete with C++98 (authored by tra).
[test-suite,CUDA] Unbreak the test of new/delete with C++98
Wed, May 19, 9:53 AM
tra committed rTea321f7241a3: [test-suite, CUDA] Work around build break in CUDA tests. (authored by tra).
[test-suite, CUDA] Work around build break in CUDA tests.
Wed, May 19, 9:45 AM

Tue, May 18

tra added a comment to D102237: [CUDA][HIP] Fix non-ODR-use of static device variable.

Here's a slightly simpler reproducer: https://godbolt.org/z/rW6P9e37s

Tue, May 18, 5:44 PM · Restricted Project
tra added a comment to D102237: [CUDA][HIP] Fix non-ODR-use of static device variable.

Here's one example reproducer: https://godbolt.org/z/77M596W89
It's rather hairy, but should be usable for further debugging.

Tue, May 18, 4:50 PM · Restricted Project
tra added a comment to D102237: [CUDA][HIP] Fix non-ODR-use of static device variable.
Tue, May 18, 2:42 PM · Restricted Project
tra added a comment to D102237: [CUDA][HIP] Fix non-ODR-use of static device variable.

Sam, this patch has apparently triggered some unwanted side effects. I'm still reducing the failures to something that could be used for debugging, but the rough symptoms are:

Tue, May 18, 1:56 PM · Restricted Project

Mon, May 17

tra committed rGf226e28a880f: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync… (authored by steffenlarsen).
[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync…
Mon, May 17, 9:48 AM
tra committed rG02c2468864bb: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async… (authored by nyalloc).
[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async…
Mon, May 17, 9:47 AM
tra closed D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.
Mon, May 17, 9:47 AM · Restricted Project, Restricted Project
tra closed D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions.
Mon, May 17, 9:47 AM · Restricted Project, Restricted Project
tra added a comment to D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions.

I'll land this patch along with D100124

Mon, May 17, 9:40 AM · Restricted Project, Restricted Project
tra accepted D102556: [HIP] Fix spack detection.

LGTM with a test nit.

Mon, May 17, 9:13 AM · Restricted Project

Fri, May 14

tra added a comment to D102306: Add gfx1034.

Is there documentation for the mapping from product names to gfx numbers?

See the Processors section of https://llvm.org/docs/AMDGPUUsage.html

Fri, May 14, 2:31 PM · Restricted Project, Restricted Project, Restricted Project
tra updated subscribers of D102507: [HIP] Support <functional> in device code.

In effect this patch applies __host__ __device__ to a subset of the standard library headers and whatever headers *they* happen to include. While it may happen to work, I'm not at all confident that it does not create interesting issues.

Fri, May 14, 9:54 AM

May 13 2021

tra added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

@tra Thanks a ton for the review! This is my first LLVM patch so I only know as much as the Code Review documentation tells me. Is there a process for chasing up additional reviews?

May 13 2021, 9:57 AM · Restricted Project, Restricted Project

May 12 2021

tra accepted D102270: [CUDA][HIP] Fix device template variables.
May 12 2021, 10:07 AM

May 11 2021

tra accepted D102270: [CUDA][HIP] Fix device template variables.

LGTM in general.
Perhaps it would make sense to combine this patch with D102237 as both patches are changing the same code for the same reason, just for slightly different kinds of variables.

May 11 2021, 12:41 PM
tra accepted D102251: Suppress Deferred Diagnostics in discarded statements..

LGTM for CUDA. This matches the intent of deferred diags -- we only emit them if we get to generate the code for the sources that triggered them, so they should not show up for the false constexpr branches.

May 11 2021, 10:38 AM · Restricted Project
tra accepted D102237: [CUDA][HIP] Fix non-ODR-use of static device variable.

LGTM with few nits.

May 11 2021, 10:37 AM · Restricted Project

May 10 2021

tra added inline comments to D97598: [NFC][AMDGPU] Document the AMDGPU target feature defaults.
May 10 2021, 2:28 PM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

[snip] it is the convention for compiler to have one output.
The compilation is like a pipeline. If we break it into stages, users would expect to use the output from one stage as input for the next stage. This is possible only if there is one output.
Also, when users do not want the output to be bundled, it is usually for debugging or special purposes. Users need to know the naming convention of the multiple outputs. I think it is justifiable to enable this by an option.

May 10 2021, 11:50 AM · Restricted Project

May 4 2021

tra added inline comments to D101630: [HIP] Add --gpu-bundle-output.
May 4 2021, 9:29 AM · Restricted Project

May 3 2021

tra updated subscribers of D101630: [HIP] Add --gpu-bundle-output.

How about an option -fhip-bundle-device-output. If it is on, device output is bundled no matter how many GPU arch there are. By default it is on.

May 3 2021, 10:22 AM · Restricted Project

Apr 30 2021

tra accepted D101654: [HIP] Fix device lib selection.
Apr 30 2021, 12:35 PM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

What will happen with this patch in the following scenarios:

  • --offload_arch=A -S -o out.s
  • --offload_arch=A --offload-arch=B -S -o out.s
Apr 30 2021, 11:35 AM · Restricted Project
tra added a comment to D101630: [HIP] Add --gpu-bundle-output.

CUDA compilation currently errors out if -o is used when more than one output would be produced.
E.g.

% bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c -E 
#... preprocessed output from host and 2 GPU compilations is printed out
Apr 30 2021, 9:35 AM · Restricted Project

Apr 29 2021

tra added a comment to D100060: [zorg] Add HIP builder script.

@tra have you seen this exception in your CUDA buildbots before?
https://lab.llvm.org/staging/#/builders/152/builds/3

Apr 29 2021, 3:20 PM
tra accepted D101575: [zorg] Fix AnnotatedBuilder.py extra_args bug.

LGTM.

Apr 29 2021, 3:15 PM

Apr 26 2021

tra added inline comments to D100404: Add Global support for #pragma clang attributes.
Apr 26 2021, 10:54 AM
tra accepted D98650: [NVPTX] Enable lowering of atomics on local memory.

Few nits. LGTM overall.

Apr 26 2021, 10:22 AM · Restricted Project

Apr 22 2021

tra accepted D100794: [HIP] Support overloaded math functions for hipRTC.
Apr 22 2021, 2:53 PM · Restricted Project
tra accepted D101106: [HIP] Fix overloaded function for _Float16.
Apr 22 2021, 2:46 PM
tra added inline comments to D100060: [zorg] Add HIP builder script.
Apr 22 2021, 1:14 PM
tra added a comment to D99997: [test-suite] Add HIP Tests to External.

Tagging @jdoerfert as HIP CI may be of interest. I wonder if an openmp directory added here could be persuaded to run on nvptx or amdgpu hardware, based on what happens to be in the bot.

Apr 22 2021, 1:03 PM
tra accepted D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

Do you know if any existing code already uses the __nvvm_* builtins for cp.async? In other words, does nvcc provide them already or is it something we're free to name as we wish? I do not see any relevant intrinsics mentioned in NVVM IR spec: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html and I don't think NVCC's builtins are publicly documented anywhere.

I don't know of any yet. We will be using these in the relatively near future, but we can still change them no problem. However, the intrinsic and builtin naming for NVVM and NVPTX seems a bit inconsistent so it may be a long discussion (or maybe not.)

Apr 22 2021, 10:19 AM · Restricted Project, Restricted Project
tra accepted D100060: [zorg] Add HIP builder script.

General style nit: the script has very inconsistent quoting for the variables. They are quoted in some places but not others.

Apr 22 2021, 9:54 AM

Apr 21 2021

tra accepted D99233: [HIP] Add option -fgpu-inline-threshold.
Apr 21 2021, 1:03 PM · Restricted Project
tra added a comment to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.

Do you know if any existing code already uses the __nvvm_* builtins for cp.async? In other words, does nvcc provide them already or is it something we're free to name as we wish?
I do not see any relevant intrinsics mentioned in NVVM IR spec: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html and I don't think NVCC's builtins are publicly documented anywhere.

Apr 21 2021, 11:37 AM · Restricted Project, Restricted Project
tra added a comment to D99233: [HIP] Add option -fgpu-inline-threshold.

The planned new option for offloading will be a more generic solution, however, I expect it will take time to develop and be adopted.

Apr 21 2021, 11:04 AM · Restricted Project
tra added a comment to D100666: [zorg] Support other relative scripts in AnnotatedBuilder.

LGTM. I'll leave the patch approval to @gkistanova .

Apr 21 2021, 10:26 AM

Apr 20 2021

tra accepted D100794: [HIP] Support overloaded math functions for hipRTC.

LGTM.

Apr 20 2021, 4:27 PM · Restricted Project
tra added inline comments to D100666: [zorg] Support other relative scripts in AnnotatedBuilder.
Apr 20 2021, 1:58 PM
tra added inline comments to D100794: [HIP] Support overloaded math functions for hipRTC.
Apr 20 2021, 11:43 AM · Restricted Project
tra added inline comments to D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions.
Apr 20 2021, 10:52 AM · Restricted Project, Restricted Project
tra added inline comments to D100666: [zorg] Support other relative scripts in AnnotatedBuilder.
Apr 20 2021, 10:44 AM
tra accepted D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions.

LGTM overall, modulo few test and naming nits.

Apr 20 2021, 10:02 AM · Restricted Project, Restricted Project
tra added inline comments to D100060: [zorg] Add HIP builder script.
Apr 20 2021, 9:49 AM
tra added inline comments to D100666: [zorg] Support other relative scripts in AnnotatedBuilder.
Apr 20 2021, 9:41 AM
tra added a comment to D100404: Add Global support for #pragma clang attributes.

Sure, it will work for me, though FWIW I think it's the worse option of the 3 potential solutions, and I'll be puzzled if we end with that UI.
Can anyone else speak up and state your opinions, please?

I agree with @palves that this seems the least consistent of the options. Is there an objection to to the more consistent and more intuitive alternative?

Apr 20 2021, 9:34 AM

Apr 19 2021

tra updated subscribers of D100794: [HIP] Support overloaded math functions for hipRTC.

LGTM overall.

Apr 19 2021, 2:58 PM · Restricted Project
tra accepted D98193: [CUDA][HIP] Allow non-ODR use of host var in device.
Apr 19 2021, 10:52 AM · Restricted Project
tra accepted D98193: [CUDA][HIP] Allow non-ODR use of host var in device.

Small test nit. LGTM otherwise.

Apr 19 2021, 9:26 AM · Restricted Project

Apr 16 2021

tra added inline comments to D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload.
Apr 16 2021, 2:36 PM · Restricted Project
tra committed rGeaa9ef075d9b: [CUDA, FDO] Filter out profiling options from GPU-side compilations. (authored by tra).
[CUDA, FDO] Filter out profiling options from GPU-side compilations.
Apr 16 2021, 11:36 AM
tra closed D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations..
Apr 16 2021, 11:36 AM · Restricted Project
tra added inline comments to D100666: [zorg] Support other relative scripts in AnnotatedBuilder.
Apr 16 2021, 11:32 AM
tra requested changes to D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload.
Apr 16 2021, 11:23 AM · Restricted Project
tra updated subscribers of D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions.

Overall the patch looks good. We may still need to tweak intrinsic properties later, but this is a good starting point.

Apr 16 2021, 10:55 AM · Restricted Project, Restricted Project
tra added inline comments to D100609: [Offload][OpenMP][CUDA] Allow fembed-bitcode for device offload.
Apr 16 2021, 9:58 AM · Restricted Project
tra accepted D100652: [HIP] Support hipRTC in header.
Apr 16 2021, 9:26 AM · Restricted Project