Page MenuHomePhabricator

tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (218 w, 6 d)

Recent Activity

Today

tra added a comment to D59647: [CUDA][HIP] Warn shared var initialization.

This looks like one of the things we should *not* do as it affects correctness -- non-trivial constructor may be arbitrarily complex and the per-TU flag to enable this behavior is way too coarse, IMO.
On the other hand, I can believe that someone somewhere did write the code and relies to NVCC accepting it.

Thu, Mar 21, 11:14 AM · Restricted Project
tra added reviewers for D59647: [CUDA][HIP] Warn shared var initialization: jlebar, rsmith.
Thu, Mar 21, 11:08 AM · Restricted Project

Yesterday

tra added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

This is, or is very similar to, the problem that the host/device overloading addresses in CUDA.

Wed, Mar 20, 10:10 AM · Restricted Project

Tue, Mar 19

tra added inline comments to D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..
Tue, Mar 19, 11:27 AM · Restricted Project

Mon, Mar 18

tra added inline comments to D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..
Mon, Mar 18, 5:16 PM · Restricted Project
tra updated the diff for D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..

Rebased on updated D59389

Mon, Mar 18, 5:09 PM · Restricted Project
tra updated the diff for D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
  • Addressed Tim's comments.
Mon, Mar 18, 4:14 PM · Restricted Project
tra updated the diff for D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
  • Addressed Tim's comments.
Mon, Mar 18, 4:10 PM · Restricted Project
tra added inline comments to D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
Mon, Mar 18, 4:10 PM · Restricted Project

Fri, Mar 15

tra added a comment to D59423: [CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3).

The intent is to avoid unintentional clashes with the preprocessor macros the user may have defined.
https://reviews.llvm.org/rL260647

Fri, Mar 15, 11:59 AM · Restricted Project
tra accepted D59423: [CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3).
Fri, Mar 15, 11:41 AM · Restricted Project
tra added a comment to D59423: [CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3).

Perhaps for consistency sake it would be better to replace __sptr -> __s and __cptr -> __c.

Well, it came from NVIDIA code, you know, I mean all those double underscores.

Fri, Mar 15, 11:40 AM · Restricted Project
tra added a comment to D59423: [CUDA][Windows] Partial fix for bug 38811 (Step 2 of 3).

___ stands out as a sore thumb and raises unnecessary questions -- "why does it have three underscores, while __cptr is fine with two?".
Perhaps for consistency sake it would be better to replace __sptr -> __s and __cptr -> __c.
Given that we're just passing the args through, we don't really need to have ptr here.

Fri, Mar 15, 11:06 AM · Restricted Project

Thu, Mar 14

tra created D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..
Thu, Mar 14, 3:19 PM · Restricted Project
tra added a parent revision for D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+.: D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
Thu, Mar 14, 3:19 PM · Restricted Project
tra added a child revision for D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC.: D59393: [NVPTX] generate correct MMA instruction mnemonics with PTX63+..
Thu, Mar 14, 3:19 PM · Restricted Project
tra created D59389: [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC..
Thu, Mar 14, 2:51 PM · Restricted Project
tra accepted D59361: [CUDA][Windows] Partial fix for bug 38811 (Step 1 of 3).
Thu, Mar 14, 11:40 AM · Restricted Project

Fri, Mar 8

tra added a comment to rL351974: [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target..

Thank you for landing the workaround!

Fri, Mar 8, 1:58 PM
tra accepted D59148: [NVPTX][DEBUGINFO]Temp fix for crash of ptxas: disable packed bytes in debug sections..

This patch fixes the bug in the ptxas tool with the processing of bytes

Fri, Mar 8, 1:21 PM · Restricted Project
tra added a comment to rL351974: [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target..

OK. I'll file my own and will add a comment that it may be a known issue.

Fri, Mar 8, 11:27 AM
tra added a comment to rL351974: [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target..

Do you have NVIDIA's bug number for the crasher?

Fri, Mar 8, 11:16 AM
tra accepted D57250: [DEBUG_INFO][NVPTX]Emit empty .debug_loc section in presence of the debug option..
Fri, Mar 8, 11:11 AM · Restricted Project
tra added a comment to rL351974: [DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target..

Something about DWARF debug info is making ptxas upset to the point of SIGSEV'ing. :-/

Fri, Mar 8, 10:46 AM

Mon, Mar 4

tra updated subscribers of D58917: [HIP] Do not unbundle object files for -fno-gpu-rdc.

The change looks OK as far as regular CUDA is concerned.
That said, I'm not quite familiar with the use of bundling/unbundling actions and you should probably get someone who uses/depends on them to take a look. I think OpenMP uses them. Perhaps @ABataev would be the right person.

Mon, Mar 4, 1:15 PM · Restricted Project, Restricted Project

Tue, Feb 26

tra accepted D58518: [HIP] change kernel stub name.
Tue, Feb 26, 2:06 PM · Restricted Project, Restricted Project
tra accepted D57716: [CUDA][HIP] Check calling convention based on function target.

LGTM.

Tue, Feb 26, 1:43 PM · Restricted Project
tra added a comment to D58463: [CUDA]Delayed diagnostics for the asm instructions..

E.g.:

namespace {
__host__ __device__ a() {
  int prev;
  __asm__ __volatile__("mov %0, 0" : "=a" (prev)::);
  return prev;
}

__host__ __device__ b() {
  int prev;
  return prev;
}

} //namespace

Ideally we should always emit uninitialized diagnostics for b, but never for a in both host and device compilation modes.
I think we may want to propagate assignment from the inline asm statement -- we may not know the meaning of the constraint, but we do know which argument gets used/modified by the asm statement. Perhaps we can construct a fake GCCAsmStmt but bail out before we attempt to validate the asm string.

But it is going to be emitted for b() if b() is really used on the host or on the device.

Tue, Feb 26, 12:03 PM · Restricted Project, Restricted Project
tra added a comment to D58463: [CUDA]Delayed diagnostics for the asm instructions..

Hi Artem, I think we can just delay emission of this warning to solve this problem.

Tue, Feb 26, 11:37 AM · Restricted Project, Restricted Project
tra added a comment to D58463: [CUDA]Delayed diagnostics for the asm instructions..

There's a new quirk we've ran into after this patch landed. Consider this code:

int foo() {
  int prev;
  __asm__ __volatile__("whatever" : "=a" (prev)::);
  return prev;
}
Tue, Feb 26, 10:39 AM · Restricted Project, Restricted Project

Fri, Feb 22

tra requested changes to D58518: [HIP] change kernel stub name.
Fri, Feb 22, 2:20 PM · Restricted Project, Restricted Project
tra added a comment to D58539: [NVPTX] Fixed param symbol name lowering in unnamed funcs. Fixes PR40817.

Could you add a test to demonstrate the issue and verify that it is fixed?

Fri, Feb 22, 9:35 AM · Restricted Project

Thu, Feb 21

tra added inline comments to D57716: [CUDA][HIP] Check calling convention based on function target.
Thu, Feb 21, 2:25 PM · Restricted Project
tra added a comment to D58518: [HIP] change kernel stub name.

To clarify, I am saying that the stub does have a different name since it is conceptually part of the implementation of doing the call to the device function implementation, and is not in fact the the device function being called itself. However, when we generate code for a function that is present on both the host and device, both copies of the code are for the same source level function and so can have the same symbol name (which was a question that was asked)

Thu, Feb 21, 1:01 PM · Restricted Project, Restricted Project
tra added a comment to D58518: [HIP] change kernel stub name.

Yes this relates to supporting the debugger.

For the same function being present on both host and device, having the same name is correct as the debugger must set a breakpoint at both places. This is similar to needing to set a breakpoint at every place a function is inlined.

Thu, Feb 21, 11:56 AM · Restricted Project, Restricted Project
tra accepted D58518: [HIP] change kernel stub name.

My guess is that this is needed because HIP debugger can see symbols from both host and device executables at the same time. Is that so?

Thu, Feb 21, 11:08 AM · Restricted Project, Restricted Project

Wed, Feb 20

tra accepted D58463: [CUDA]Delayed diagnostics for the asm instructions..

Thank you.

Wed, Feb 20, 2:07 PM · Restricted Project, Restricted Project

Feb 19 2019

tra added inline comments to D58243: [OPENMP] Delay emission of the asm target-specific error messages..
Feb 19 2019, 12:57 PM · Restricted Project, Restricted Project
tra updated subscribers of D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Okay, but it's not great design to have a kind of overloading that can't be resolved to an exact intended declaration even by an explicit cast. That's why I think making *optional* host/device typing is a good idea. And I strongly want to caution you against doing language design by just incrementally hacking at the compiler to progressively make more test-cases work, which is what it feels like you're doing.

Feb 19 2019, 11:03 AM · Restricted Project
tra added inline comments to D58243: [OPENMP] Delay emission of the asm target-specific error messages..
Feb 19 2019, 10:45 AM · Restricted Project, Restricted Project

Feb 14 2019

tra added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

That said, does CUDA have a general rule resolving __host__ vs. __device__ overloads based on context? And does it allow overloading based solely on __host__ vs. __device__?

Feb 14 2019, 10:57 AM · Restricted Project

Feb 13 2019

tra accepted D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.

Thank you. LGTM.

Feb 13 2019, 11:45 AM · Restricted Project
tra accepted D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.
Feb 13 2019, 11:26 AM · Restricted Project
tra added inline comments to D58163: [CUDA][HIP] Use device side kernel and variable names when registering them.
Feb 13 2019, 10:27 AM · Restricted Project

Feb 7 2019

tra accepted D57908: [SEMA]Generalize deferred diagnostic interface, NFC..

LGTM. I've added jlebar@ as he's originally written the code.

Feb 7 2019, 10:59 AM · Restricted Project, Restricted Project
tra added a reviewer for D57908: [SEMA]Generalize deferred diagnostic interface, NFC.: jlebar.
Feb 7 2019, 10:37 AM · Restricted Project, Restricted Project

Feb 6 2019

tra added a comment to D57829: [HIP] Disable emitting llvm.linker.options in device compilation.
In D57829#1387412, @tra wrote:

Could you elaborate on why you want to disable this metadata? I think the original idea of llvm.linker.options was that it should be ignored if the back-end does not support it.

If backend does not support it, it goes to TargetLoweringObjectFileELF::emitModuleMetadata and causes codegen to fail.

Feb 6 2019, 11:29 AM
tra added a comment to D57829: [HIP] Disable emitting llvm.linker.options in device compilation.

Could you elaborate on why you want to disable this metadata? I think the original idea of llvm.linker.options was that it should be ignored if the back-end does not support it.

Feb 6 2019, 11:12 AM

Feb 5 2019

tra committed rG4071763bb881: Basic CUDA-10 support. (authored by tra).
Basic CUDA-10 support.
Feb 5 2019, 2:39 PM
tra committed rC353232: Basic CUDA-10 support..
Basic CUDA-10 support.
Feb 5 2019, 2:39 PM
tra committed rL353232: Basic CUDA-10 support..
Basic CUDA-10 support.
Feb 5 2019, 2:39 PM
tra closed D57771: [CUDA] Add basic support for CUDA-10.1.
Feb 5 2019, 2:39 PM · Restricted Project
tra updated the diff for D57771: [CUDA] Add basic support for CUDA-10.1.

Made a comment more readable.

Feb 5 2019, 1:53 PM · Restricted Project
tra added inline comments to D57716: [CUDA][HIP] Check calling convention based on function target.
Feb 5 2019, 1:44 PM · Restricted Project
tra updated the diff for D57771: [CUDA] Add basic support for CUDA-10.1.

Make the function object local.

Feb 5 2019, 11:02 AM · Restricted Project
tra created D57771: [CUDA] Add basic support for CUDA-10.1.
Feb 5 2019, 10:58 AM · Restricted Project

Jan 31 2019

tra committed rC352799: [CUDA] add support for the new kernel launch API in CUDA-9.2+..
[CUDA] add support for the new kernel launch API in CUDA-9.2+.
Jan 31 2019, 1:34 PM
tra committed rL352799: [CUDA] add support for the new kernel launch API in CUDA-9.2+..
[CUDA] add support for the new kernel launch API in CUDA-9.2+.
Jan 31 2019, 1:34 PM
tra closed D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+..
Jan 31 2019, 1:34 PM
tra committed rC352798: [CUDA] Propagate detected version of CUDA to cc1.
[CUDA] Propagate detected version of CUDA to cc1
Jan 31 2019, 1:32 PM
tra committed rL352798: [CUDA] Propagate detected version of CUDA to cc1.
[CUDA] Propagate detected version of CUDA to cc1
Jan 31 2019, 1:32 PM
tra closed D57487: [CUDA] Propagate detected version of CUDA to cc1.
Jan 31 2019, 1:32 PM
tra updated the diff for D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+..

Updated ASTMatchers unit test.

Jan 31 2019, 1:24 PM
tra added inline comments to D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+..
Jan 31 2019, 10:37 AM
tra updated the diff for D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+..

Addressed Justin's comments.

Jan 31 2019, 10:29 AM

Jan 30 2019

tra updated the diff for D57487: [CUDA] Propagate detected version of CUDA to cc1.

Addressed Justin's comments.

Jan 30 2019, 5:12 PM
tra added a comment to D55673: [darwin] parse the SDK settings from SDKSettings.json if it exists and pass in the -target-sdk-version to the compiler and backend.

I would be ok with reusing that option, as long as it's documented that there is a difference in terms of how it can be used.

Jan 30 2019, 4:52 PM
tra updated the diff for D57487: [CUDA] Propagate detected version of CUDA to cc1.

Updated the comment about SDKVersion use.

Jan 30 2019, 4:51 PM
tra updated the summary of D57487: [CUDA] Propagate detected version of CUDA to cc1.
Jan 30 2019, 4:37 PM
tra added a parent revision for D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.: D57487: [CUDA] Propagate detected version of CUDA to cc1.
Jan 30 2019, 4:37 PM
tra added a child revision for D57487: [CUDA] Propagate detected version of CUDA to cc1: D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+..
Jan 30 2019, 4:37 PM
tra created D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+..
Jan 30 2019, 4:36 PM
tra created D57487: [CUDA] Propagate detected version of CUDA to cc1.
Jan 30 2019, 4:32 PM
tra added a comment to D55673: [darwin] parse the SDK settings from SDKSettings.json if it exists and pass in the -target-sdk-version to the compiler and backend.

Would that be OK to use target_sdk_version to pass *CUDA* SDK version to the CC1 compilations?
I have upcoming changes that need to know the version to generate correct glue IR for CUDA. The driver currently figures out detected CUDA version in lib/Driver/ToolChains/Cuda.cpp and I could use -target-sdk-version to pass it on to CC1 instances.

Jan 30 2019, 10:24 AM
tra added inline comments to D57162: [DEBUG_INFO][NVPTX] Generate correct data about variable address class..
Jan 30 2019, 9:58 AM · Restricted Project

Jan 28 2019

tra added a comment to D57259: [NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute..
Jan 28 2019, 11:11 AM

Jan 25 2019

tra committed rL352256: [NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly….
[NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly…
Jan 25 2019, 4:30 PM
tra closed D57259: [NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute..
Jan 25 2019, 4:30 PM
tra created D57259: [NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute..
Jan 25 2019, 2:25 PM

Jan 14 2019

tra added inline comments to D56654: Update GettingStarted guide to recommend that people use the new official Git repository..
Jan 14 2019, 3:05 PM

Dec 21 2018

tra committed rC349981: [CUDA] Treat extern global variable shadows same as regular extern vars..
[CUDA] Treat extern global variable shadows same as regular extern vars.
Dec 21 2018, 5:15 PM
tra committed rL349981: [CUDA] Treat extern global variable shadows same as regular extern vars..
[CUDA] Treat extern global variable shadows same as regular extern vars.
Dec 21 2018, 5:15 PM
tra closed D56033: [CUDA] Treat extern global variable shadows same as regular extern vars..
Dec 21 2018, 5:14 PM
tra updated the diff for D56033: [CUDA] Treat extern global variable shadows same as regular extern vars..

Fixed a typo.

Dec 21 2018, 4:25 PM
tra accepted D34708: [NVPTX] Allow to make libcalls that are defined in the current module..

Nice. Thank you for adding the tests.
LGTM.

Dec 21 2018, 4:01 PM
tra created D56033: [CUDA] Treat extern global variable shadows same as regular extern vars..
Dec 21 2018, 3:54 PM

Dec 19 2018

tra added a comment to D34708: [NVPTX] Allow to make libcalls that are defined in the current module..

The tests you have only check the end result and do not directly verify the new functionality the patch adds.

Dec 19 2018, 3:50 PM

Dec 18 2018

tra added a comment to D34708: [NVPTX] Allow to make libcalls that are defined in the current module..

You probably want to remove libcalls.patch from the patch diff.

Dec 18 2018, 2:20 PM

Dec 14 2018

tra committed rL349213: [NVPTX] Lower instructions that expand into libcalls..
[NVPTX] Lower instructions that expand into libcalls.
Dec 14 2018, 3:56 PM
tra closed D55145: [NVPTX] Lower instructions that expands into libcalls.
Dec 14 2018, 3:56 PM
tra accepted D55145: [NVPTX] Lower instructions that expands into libcalls.
In D55145#1324230, @tra wrote:

Here the behaviour of unsupported instructions is changed to match the behaviour of explicit intrinsics calls.

If I remember correctly, the crash was only happening for instructions lowered to libcall. Do we crash in any other case?
If libcall is the only source of failure, would it make more sense to bail out of this function early? We're going to fail in this case anyways.

You are right, normally it would make sense. But in this case, the end goal is to support these libcalls.
It would be easier to achieve when both instructions and intrinsics are behaving consistently. And I'm afraid, bailing them out, would be an extra work that will have to be dropped when the final libcall solution will come (means, very-very soon).

Dec 14 2018, 9:29 AM

Dec 13 2018

tra committed rC349087: [CUDA] Make all host-side shadows of device-side variables undef..
[CUDA] Make all host-side shadows of device-side variables undef.
Dec 13 2018, 1:46 PM
tra committed rL349087: [CUDA] Make all host-side shadows of device-side variables undef..
[CUDA] Make all host-side shadows of device-side variables undef.
Dec 13 2018, 1:46 PM
tra closed D55663: [CUDA] Make all host-side shadows of device-side variables undef..
Dec 13 2018, 1:46 PM
tra retitled D55663: [CUDA] Make all host-side shadows of device-side variables undef. from [CUDA] Make all host-side shadows of device-side variables undefined. to [CUDA] Make all host-side shadows of device-side variables undef..
Dec 13 2018, 11:44 AM
tra created D55663: [CUDA] Make all host-side shadows of device-side variables undef..
Dec 13 2018, 11:43 AM

Dec 12 2018

tra committed rL348952: [NVPTX] do not rely on cached subtarget info..
[NVPTX] do not rely on cached subtarget info.
Dec 12 2018, 10:36 AM
tra closed D55580: [NVPTX] do not rely on cached subtarget info..
Dec 12 2018, 10:36 AM

Dec 11 2018

tra created D55580: [NVPTX] do not rely on cached subtarget info..
Dec 11 2018, 2:19 PM

Dec 7 2018

tra accepted D55456: [CUDA] added missing 'inline' for the functions defined in the header..

jlebar@ LGTM'ed via email.
Landed in rL348662

Dec 7 2018, 5:16 PM