tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (367 w, 1 h)

Recent Activity

Today

tra added a comment to D117137: [Driver] Add CUDA support for --offload param.

LGTM in general, modulo a few nits.
Nit: looks like the changes need some clang-formatting.

Thu, Jan 20, 10:54 AM · Restricted Project

Yesterday

tra updated subscribers of D116959: [DWARF] Try not to crash for codes with missing debug information.

I would wonder why NVPTX has this unique codepath

Wed, Jan 19, 10:02 AM · Restricted Project

Thu, Jan 13

tra added a comment to D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions.

I can land the patch on your behalf. Are you OK to use the name/email in this patch or do you prefer to use a different email for the LLVM commit?

Thanks very much. Yes the name/email in the patch is fine.

Thu, Jan 13, 1:38 PM · Restricted Project, Restricted Project
tra committed rGbef3eb83442a: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80… (authored by JackAKirk).
Thu, Jan 13, 1:35 PM
tra closed D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions.
Thu, Jan 13, 1:34 PM · Restricted Project, Restricted Project
tra accepted D117198: [NVPTX] Add version test for sm_75, sm_80, sm_86..
Thu, Jan 13, 10:33 AM · Restricted Project
tra accepted D117204: [NVPTX] Add fmin/fmax.NaN lowering for sm_80+..
Thu, Jan 13, 10:32 AM · Restricted Project

Wed, Jan 12

tra added a comment to D117137: [Driver] Add CUDA support for --offload param.

I think instead of setting the triple directly from the command line, we should start with adding another --cuda-gpu-arch (AKA --offload-arch) variant and derive the triple and other parameters from it.

Wed, Jan 12, 12:14 PM · Restricted Project
tra added a reviewer for D117137: [Driver] Add CUDA support for --offload param: tra.
Wed, Jan 12, 12:09 PM · Restricted Project
tra added a comment to D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions.

I thought I should let you know that I do not have commit access to land the patch. I'm also happy to wait a little longer in case you think other interested parties might still chime in.

Wed, Jan 12, 10:01 AM · Restricted Project, Restricted Project
tra added inline comments to D112718: Add intrinsics and builtins for PTX atomics with semantic orders.
Wed, Jan 12, 9:56 AM · Restricted Project, Restricted Project
tra accepted D117122: [NVPTX] Lower fp16 fminnum, fmaxnum to native on sm_80..
Wed, Jan 12, 9:44 AM · Restricted Project

Tue, Jan 11

tra added inline comments to D112718: Add intrinsics and builtins for PTX atomics with semantic orders.
Tue, Jan 11, 10:59 AM · Restricted Project, Restricted Project
tra added inline comments to D116967: [HIP] Fix device malloc/free.
Tue, Jan 11, 10:21 AM · Restricted Project

Mon, Jan 10

tra accepted D116967: [HIP] Fix device malloc/free.
Mon, Jan 10, 1:42 PM · Restricted Project
tra added inline comments to D112718: Add intrinsics and builtins for PTX atomics with semantic orders.
Mon, Jan 10, 11:47 AM · Restricted Project, Restricted Project
tra accepted D116840: [HIP] Fix device only linking for -fgpu-rdc.
Mon, Jan 10, 10:31 AM · Restricted Project

Fri, Jan 7

tra added inline comments to D116840: [HIP] Fix device only linking for -fgpu-rdc.
Fri, Jan 7, 3:09 PM · Restricted Project
tra added inline comments to D112718: Add intrinsics and builtins for PTX atomics with semantic orders.
Fri, Jan 7, 10:36 AM · Restricted Project, Restricted Project

Thu, Jan 6

tra added inline comments to D112718: Add intrinsics and builtins for PTX atomics with semantic orders.
Thu, Jan 6, 2:16 PM · Restricted Project, Restricted Project
tra added a comment to D114601: Read path to CUDA from env. variable CUDA_PATH on Windows.

Ping. @mojca, do you need help landing the patch?

Yes, please. I don't have commit access yet.
You can attribute it to mojca at macports.org, for example.

Thu, Jan 6, 12:00 PM · Restricted Project
tra accepted D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions.

LGTM.

Thu, Jan 6, 11:29 AM · Restricted Project, Restricted Project

Wed, Jan 5

tra added a comment to D114601: Read path to CUDA from env. variable CUDA_PATH on Windows.

Ping. @mojca, do you need help landing the patch?

Wed, Jan 5, 11:54 AM · Restricted Project
tra added a comment to D116673: [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions.

LGTM overall.

Wed, Jan 5, 11:45 AM · Restricted Project, Restricted Project
tra added inline comments to D116583: Change the default optimisation level of PTXAS from -O0 to -O3. This makes the optimisation levels of PTXAS and the ptxjitcompiler equal (ptxjitcompiler defaults to -O3)..
Wed, Jan 5, 10:52 AM · Restricted Project

Tue, Jan 4

tra committed rGc99b2c63169d: CUDA/HIP: Allow __int128 on the host side (authored by linjamaki).
Tue, Jan 4, 4:10 PM
tra closed D111047: CUDA/HIP: Allow __int128 on the host side.
Tue, Jan 4, 4:10 PM · Restricted Project
tra added inline comments to D116583: Change the default optimisation level of PTXAS from -O0 to -O3. This makes the optimisation levels of PTXAS and the ptxjitcompiler equal (ptxjitcompiler defaults to -O3)..
Tue, Jan 4, 10:44 AM · Restricted Project

Dec 14 2021

tra added inline comments to D115302: GlobalsModRef should treat functions w/o nosync conservatively..
Dec 14 2021, 1:01 PM · Restricted Project
tra committed rG4e94cba5b4e4: [HIPSPV][2/4] Add HIPSPV tool chain (authored by linjamaki).
Dec 14 2021, 10:23 AM
tra closed D110618: [HIPSPV][2/4] Add HIPSPV tool chain.
Dec 14 2021, 10:23 AM · Restricted Project

Dec 13 2021

tra added a reviewer for D115302: GlobalsModRef should treat functions w/o nosync conservatively.: asbirlea.

Added @asbirlea as a reviewer for the GlobalsModRef tests.

Dec 13 2021, 3:30 PM · Restricted Project
tra updated the diff for D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Reverted the changes in test/Transforms/ObjCARC/basic.ll that are no longer needed.

Dec 13 2021, 3:09 PM · Restricted Project
tra updated the diff for D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Undo ObjC intrinsic properties change. Fixed the tests instead.
Moved nosync check into getModRefInfo, where it makes more
sense as an additional check for potential side effects of the call.

Dec 13 2021, 3:05 PM · Restricted Project

Dec 9 2021

tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Problem: not all intrinsics have been converted to use default attributes, so it will be a performance regression for them.

This could be a problem, yes. Not sure how much of one it is in practice.

Dec 9 2021, 2:52 PM · Restricted Project
tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

The bug being fixed here isn't exclusive to intrinsics. The following shows the same issue using a spinloop:
Given that, I don't think this whole tangent about the control flow graph and hardcoding which intrinsics are nocallback is relevant to this bug.

Dec 9 2021, 2:43 PM · Restricted Project
tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

The properties we associate (somewhere) with intrinsics are captured by nosync and nocallback. Neither is inherent to intrinsics, but we pretend they are, and that causes the bug. Only by making these explicit can we fix the bug without just hiding it, and treat functions that have the same properties the same way.

Dec 9 2021, 2:08 PM · Restricted Project
tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

In general, GlobalsAA follows the call graph, and therefore conservatively assumes a call to an external function can call back into the current module.

The callgraph code special-cases intrinsics, though. Except for a few specific intrinsics, we assume intrinsics can't call back into the current module. See Intrinsic::isLeaf.
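
The distinction discussed in this thread can be sketched with a toy model. This is not LLVM code: `Callee`, both predicate functions, and the callee names are hypothetical, and the model only captures the two rules described in the comments (intrinsics assumed to be leaves, versus treating anything without nosync conservatively).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Callee:
    """Hypothetical stand-in for a call target; not an LLVM type."""
    name: str
    is_intrinsic: bool = False
    nosync: bool = False


def may_touch_global_leaf_rule(callee: Callee) -> bool:
    """Rule described in the thread: intrinsics are assumed not to call
    back into the module, so they are not treated as clobbering a global.
    Any other external function is handled conservatively."""
    return not callee.is_intrinsic


def may_touch_global_nosync_rule(callee: Callee) -> bool:
    """Rule named in the revision title: a callee without nosync is
    handled conservatively, whether or not it is an intrinsic."""
    if not callee.nosync:
        return True
    return may_touch_global_leaf_rule(callee)


# Hypothetical callees for illustration only.
barrier = Callee("hypothetical.barrier", is_intrinsic=True, nosync=False)
pure_math = Callee("hypothetical.sqrt", is_intrinsic=True, nosync=True)

print(may_touch_global_leaf_rule(barrier))    # False
print(may_touch_global_nosync_rule(barrier))  # True
print(may_touch_global_nosync_rule(pure_math))  # False
```

Under the leaf rule the synchronizing intrinsic is treated as harmless, which is the behavior the thread identifies as the bug; under the nosync rule it is handled conservatively while the nosync intrinsic is still optimized.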

Dec 9 2021, 10:25 AM · Restricted Project

Dec 8 2021

tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

It seems the function w/o nosync is already treated as a potential clobber of the global while the intrinsic is not. This patch now "unifies" it by looking for nosync at the call site, but maybe we should not pretend intrinsics are special.
Here is your original example with a function instead of an intrinsic, all seems to work just fine: https://godbolt.org/z/7cn8jKz17

Dec 8 2021, 4:30 PM · Restricted Project
tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.

Dec 8 2021, 3:45 PM · Restricted Project
tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

EDIT:
On second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html

Dec 8 2021, 3:07 PM · Restricted Project
tra updated the diff for D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Use CallBase::hasFnAttr

Dec 8 2021, 2:58 PM · Restricted Project
tra updated subscribers of D115302: GlobalsModRef should treat functions w/o nosync conservatively..
Dec 8 2021, 2:22 PM · Restricted Project
tra updated the diff for D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Updated tests to use nosync attributes where the tests assumed it.
Added nosync to a couple of objc intrinsics.

Dec 8 2021, 2:16 PM · Restricted Project
tra updated the diff for D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Renamed the test.

Dec 8 2021, 12:15 PM · Restricted Project
tra retitled D115302: GlobalsModRef should treat functions w/o nosync conservatively. from [AA] Teach AA about convergent instrinsics that affect loads/stores. to GlobalsModRef should treat functions w/o nosync conservatively..
Dec 8 2021, 12:06 PM · Restricted Project
tra added a comment to D115302: GlobalsModRef should treat functions w/o nosync conservatively..

I don't think this should be part of the main AA implementation, but rather be sunk into the AA provider that is producing incorrect results (here GlobalModRef, unless BasicAA also produces an incorrect result?)

Dec 8 2021, 12:04 PM · Restricted Project
tra updated the diff for D115302: GlobalsModRef should treat functions w/o nosync conservatively..

Moved the check to GlobalModRef and switched to checking for nosync

Dec 8 2021, 11:57 AM · Restricted Project

Dec 7 2021

tra requested review of D115302: GlobalsModRef should treat functions w/o nosync conservatively..
Dec 7 2021, 4:03 PM · Restricted Project
tra added a comment to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

I don't think --offload= is restricted to be specified only once. The test checks --offload-arch= and --offload= are mutually exclusive.

Dec 7 2021, 1:31 PM · Restricted Project
tra added a comment to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.
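
The resolution rules described above can be sketched as follows. This is an illustrative model, not the clang driver: `resolve_offload`, `cc1_args`, and both lookup tables are hypothetical names, and the table entries other than the spirv default (which comes from the comment) are assumptions.

```python
from typing import List, NamedTuple, Optional

# Only the spirv -> generic entry is taken from the comment above.
DEFAULT_CPU_FOR_TRIPLE = {"spirv": "generic"}

# Assumed processor-to-triple mapping, for illustration only.
TRIPLE_FOR_CPU = {
    "gfx906": "amdgcn-amd-amdhsa",
    "sm_80": "nvptx64-nvidia-cuda",
}


class OffloadTarget(NamedTuple):
    triple: str
    cpu: str


def resolve_offload(triple: Optional[str] = None,
                    cpu: Optional[str] = None) -> OffloadTarget:
    """Fill in whichever half of (triple, processor) was not given."""
    if triple is None and cpu is None:
        raise ValueError("need a triple, a processor, or both")
    if triple is None:
        # Only a processor was given: deduce the triple from it.
        triple = TRIPLE_FOR_CPU[cpu]
    if cpu is None:
        # Only a triple was given: assume its default processor.
        cpu = DEFAULT_CPU_FOR_TRIPLE[triple]
    return OffloadTarget(triple, cpu)


def cc1_args(target: OffloadTarget) -> List[str]:
    """Spell the result the way the comment writes the -cc1 options."""
    return ["-triple", target.triple, f"-mcpu={target.cpu}"]


print(" ".join(cc1_args(resolve_offload(triple="spirv"))))
# prints "-triple spirv -mcpu=generic"
```

Passing both `triple` and `cpu` returns them unchanged, which corresponds to the "both triple and processor" case in the comment.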

Dec 7 2021, 10:53 AM · Restricted Project

Dec 6 2021

tra accepted D114812: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`.
Dec 6 2021, 3:42 PM · Restricted Project
tra committed rGabbdc13e6803: [CUDA][SPIRV] Use OpenCLKernel CC for CUDA -> SPIRV (authored by dcastagna).
Dec 6 2021, 3:12 PM
tra closed D114407: [CUDA][SPIRV] Use OpenCLKernel CC for CUDA -> SPIRV.
Dec 6 2021, 3:12 PM · Restricted Project
tra accepted D115039: [HIP] Fix -fgpu-rdc for Windows.
Dec 6 2021, 12:44 PM · Restricted Project
tra added a comment to D114601: Read path to CUDA from env. variable CUDA_PATH on Windows.

What can/should I do next in order to proceed with this?

Dec 6 2021, 11:20 AM · Restricted Project
tra accepted D114407: [CUDA][SPIRV] Use OpenCLKernel CC for CUDA -> SPIRV.
Dec 6 2021, 10:41 AM · Restricted Project
tra added a comment to D115039: [HIP] Fix -fgpu-rdc for Windows.

Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage.

Dec 6 2021, 10:40 AM · Restricted Project
tra accepted D114367: [NVPTX] Auto-generate tests for sufrace and texture instructions.
Dec 6 2021, 10:30 AM · Restricted Project
tra accepted D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

Note to self: don't forget to hit "submit". The comments below have been left unsubmitted for two weeks. Sorry about that.

Dec 6 2021, 10:17 AM · Restricted Project

Nov 30 2021

tra added a comment to D114812: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`.

I am not sure whether we want to define a similar macro for cuda-clang.

Maybe __CLANG_RDC__ is better?

Nov 30 2021, 12:06 PM · Restricted Project

Nov 29 2021

tra requested changes to D114326: Update the list of CUDA versions up to 11.5.

With D114601, this patch would no longer be needed.

Nov 29 2021, 1:21 PM · Restricted Project
tra accepted D114601: Read path to CUDA from env. variable CUDA_PATH on Windows.
Nov 29 2021, 1:20 PM · Restricted Project
tra added inline comments to D114601: Read path to CUDA from env. variable CUDA_PATH on Windows.
Nov 29 2021, 1:20 PM · Restricted Project
tra added a comment to D114326: Update the list of CUDA versions up to 11.5.

@tra: this is not yet 100% ready since the unit tests are now failing (expecting to find CUDA 8.0).
I can fix the unit test, but I suppose that someone needs to install additional SDK somewhere into the infrastructure as well?

Nov 29 2021, 1:14 PM · Restricted Project
tra added a comment to D114367: [NVPTX] Auto-generate tests for sufrace and texture instructions.

Looks good overall. My main concern is with the test reaching out to LLVM sources and parsing tablegen files.

Nov 29 2021, 11:58 AM · Restricted Project
tra added inline comments to D114601: Read path to CUDA from env. variable CUDA_PATH on Windows.
Nov 29 2021, 11:29 AM · Restricted Project
tra accepted D114326: Update the list of CUDA versions up to 11.5.
Nov 29 2021, 10:40 AM · Restricted Project
tra added a comment to D114326: Update the list of CUDA versions up to 11.5.

Somewhat off-topic from a discussion earlier in the thread.
What's the purpose of the following code then if users are supposed to explicitly specify the -L flag anyway?

Nov 29 2021, 10:40 AM · Restricted Project

Nov 24 2021

tra accepted D113653: [NVPTX][AsmPrinter] Avoid removing globals before calling AsmPrinter::doFinalization().
Nov 24 2021, 5:30 PM · Restricted Project
tra added inline comments to D114326: Update the list of CUDA versions up to 11.5.
Nov 24 2021, 5:29 PM · Restricted Project
tra added inline comments to D114326: Update the list of CUDA versions up to 11.5.
Nov 24 2021, 10:36 AM · Restricted Project

Nov 22 2021

tra added a comment to D111023: [ConstantFold] Refactor load folding.

Good news: this pass does not appear to be the culprit. The miscompilation happens with earlier builds.
a9bceb2b059dc24870882a71baece895fe430107 from before this patch landed already has the issue.

Nov 22 2021, 5:03 PM · Restricted Project
tra added a comment to D111023: [ConstantFold] Refactor load folding.

FYI, there's a miscompilation apparently triggered by this change. Not sure yet whether it's the source of the problem or just exposed it.
https://godbolt.org/z/PvsE4Ybqr

Nov 22 2021, 3:50 PM · Restricted Project
tra added a comment to D114367: [NVPTX] Auto-generate tests for sufrace and texture instructions.

Currently the test takes ~2 seconds on my machine.
There are 508 unique variants, but some are duplicated for cuda and nvcl triples, so there are 847 test cases in total.

Nov 22 2021, 1:46 PM · Restricted Project
tra updated subscribers of D110549: [HIPSPV][1/4] Refactor HIP tool chain.
Nov 22 2021, 1:11 PM · Restricted Project
tra accepted D110618: [HIPSPV][2/4] Add HIPSPV tool chain.

LGTM in general, modulo push_back/append nits.

Nov 22 2021, 1:01 PM · Restricted Project
tra added a comment to D114367: [NVPTX] Auto-generate tests for sufrace and texture instructions.

Awesome! I'll go over the script in a couple of days.
Meanwhile, could you post a representative example of the code this file generates?

Nov 22 2021, 12:30 PM · Restricted Project
tra added inline comments to D114326: Update the list of CUDA versions up to 11.5.
Nov 22 2021, 12:20 PM · Restricted Project

Nov 18 2021

tra accepted D107054: [Clang][CUDA] Add descriptors, mappings, and features for missing CUDA and PTX versions.

I think this patch has been obsoleted by https://reviews.llvm.org/D113249 which has already landed.

Nov 18 2021, 2:10 PM · Restricted Project, Restricted Project

Nov 16 2021

tra added inline comments to D114015: [Loads] Handle addrspacecast constant expressions when determining dereferenceability.
Nov 16 2021, 11:59 AM · Restricted Project

Nov 11 2021

tra added a comment to D113653: [NVPTX][AsmPrinter] Avoid removing globals before calling AsmPrinter::doFinalization().

LGTM in general.

Nov 11 2021, 10:19 AM · Restricted Project

Nov 10 2021

tra added a comment to D112466: [NVPTX] Drop memory references of LDG/LDU.

Does it make sense to add similar logic for NVPTX?

Nov 10 2021, 12:16 PM · Restricted Project
tra accepted D112492: [CUDA][HIP] Allow comdat for kernels.
Nov 10 2021, 10:48 AM · Restricted Project

Nov 9 2021

tra added a comment to D112492: [CUDA][HIP] Allow comdat for kernels.

Yes, we do need to merge identical functions with identical names for templates.

Nov 9 2021, 2:45 PM · Restricted Project
tra accepted D112232: [NVPTX] Add imm variants for surface and texture instructions.

The changes look good in general. Thank you for cleaning this up.

Nov 9 2021, 11:28 AM · Restricted Project
tra updated subscribers of D112492: [CUDA][HIP] Allow comdat for kernels.
Nov 9 2021, 11:04 AM · Restricted Project
tra added a reviewer for D113490: [NFC] Let Microsoft mangler accept GlobalDecl: rnk.

+ @rnk as it's a windows-specific change.

Nov 9 2021, 10:48 AM · Restricted Project
tra accepted D113491: [HIP] Fix device stub name for Windows.

LGTM in general.

Nov 9 2021, 10:42 AM · Restricted Project

Nov 8 2021

tra accepted D113249: [CUDA] Bump CUDA version to 11.5.
  • The driver needs to enable ptx75 when it constructs cc1 command line in clang/lib/Driver/ToolChains/Cuda.cpp

@tra Haven't I already done it in line 712? Or where should I enable it?

Nov 8 2021, 10:55 AM · Restricted Project, Restricted Project
tra added a comment to D111443: [Driver] Fix ToolChain::getSanitizerArgs.

I'll defer to @eugenis. Overall it looks OK to me.

Nov 8 2021, 10:46 AM · Restricted Project

Nov 5 2021

tra added a comment to D113249: [CUDA] Bump CUDA version to 11.5.

Experimental support for __int128 is new in CUDA 11.5, not sure if Clang enables this for CUDA.

Nov 5 2021, 11:14 AM · Restricted Project, Restricted Project
tra requested changes to D113249: [CUDA] Bump CUDA version to 11.5.

I think we're missing a few more changes here:

Nov 5 2021, 10:33 AM · Restricted Project, Restricted Project

Nov 4 2021

tra added a comment to D112492: [CUDA][HIP] Allow comdat for kernels.

With these changes, we should have consistent name mangling for kernel stubs and kernel launching mechanism on Linux and Windows.

Nov 4 2021, 10:12 AM · Restricted Project

Nov 3 2021

tra added a comment to D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates..

LGTM in general, modulo remaining nits.

Nov 3 2021, 10:37 AM · Restricted Project, Restricted Project
tra added a comment to D112466: [NVPTX] Drop memory references of LDG/LDU.

While ldu does indeed specify that it loads from read-only memory, I do not think we can treat ld.global.nc the same way.
PTX spec says: "Load register variable d from the location specified by the source address operand a in the global state space, and optionally cache in non-coherent texture cache. Since the cache is non-coherent, the data should be read-only within the kernel's process."

The way I read it -- it's a regular load that bypasses cache.

I'm not sure, but I think the spec says that the load is cached:
Load [...] from the location [...] in the global state space, and optionally cache in non-coherent texture cache.
The problem is that the cache is non-coherent and optional.

Nov 3 2021, 10:23 AM · Restricted Project

Nov 2 2021

tra accepted D113001: [NVPTX] Copy machine operand flags in TII::insertBranch.
Nov 2 2021, 4:45 PM · Restricted Project
tra updated subscribers of D112760: Require 'contract' fast-math flag for FMA generation.
Nov 2 2021, 4:20 PM · Restricted Project
tra added inline comments to D112760: Require 'contract' fast-math flag for FMA generation.
Nov 2 2021, 12:38 PM · Restricted Project
tra added a comment to D112466: [NVPTX] Drop memory references of LDG/LDU.

While ldu does indeed specify that it loads from read-only memory, I do not think we can treat ld.global.nc the same way.
PTX spec says: "Load register variable d from the location specified by the source address operand a in the global state space, and optionally cache in non-coherent texture cache. Since the cache is non-coherent, the data should be read-only within the kernel's process."

Nov 2 2021, 11:52 AM · Restricted Project