tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (193 w, 2 d)

Recent Activity

Fri, Sep 21

tra added a comment to D52377: [HIP] Support early finalization of device code.

Overall the patch look OK. I'll take a closer look on Monday.

Fri, Sep 21, 2:02 PM
tra committed rL342752: [CUDA] Fixed parsing of optional template-argument-list..
[CUDA] Fixed parsing of optional template-argument-list.
Fri, Sep 21, 10:50 AM
tra committed rC342752: [CUDA] Fixed parsing of optional template-argument-list..
[CUDA] Fixed parsing of optional template-argument-list.
Fri, Sep 21, 10:50 AM
tra closed D52321: [CUDA] Fixed parsing of optional template-argument-list..
Fri, Sep 21, 10:50 AM
tra updated the diff for D52321: [CUDA] Fixed parsing of optional template-argument-list..

Added '>=' and '>>=' to the list of tokens that may indicate the end of the
empty template argument list.

Fri, Sep 21, 10:40 AM
tra committed rC342749: [CUDA] Ignore uncallable functions when we check for usual deallocators..
[CUDA] Ignore uncallable functions when we check for usual deallocators.
Fri, Sep 21, 10:31 AM
tra committed rL342749: [CUDA] Ignore uncallable functions when we check for usual deallocators..
[CUDA] Ignore uncallable functions when we check for usual deallocators.
Fri, Sep 21, 10:31 AM
tra closed D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..
Fri, Sep 21, 10:31 AM

Thu, Sep 20

tra created D52321: [CUDA] Fixed parsing of optional template-argument-list..
Thu, Sep 20, 1:03 PM

Wed, Sep 19

tra added a comment to D52259: [CUDA] Rearrange search path ordering to fix two test case failures.

The patch does not seem to match the description and appears to have nothing to do with rearranging include paths. Could you check if these are the changes you intended to send for review.

Wed, Sep 19, 1:10 PM · Restricted Project

Tue, Sep 18

tra updated the diff for D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..

Renamed last instance of 'Matches' -> 'PreventedBy'.

Tue, Sep 18, 5:19 PM
tra updated the diff for D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..

Updated assertion message.

Tue, Sep 18, 5:05 PM
tra committed rL342514: [clang-tidy] Replace redundant checks with an assert()..
[clang-tidy] Replace redundant checks with an assert().
Tue, Sep 18, 2:52 PM
tra committed rCTE342514: [clang-tidy] Replace redundant checks with an assert()..
[clang-tidy] Replace redundant checks with an assert().
Tue, Sep 18, 2:52 PM
tra closed D52179: [clang-tidy] Replace redundant checks with an assert()..
Tue, Sep 18, 2:52 PM · Restricted Project

Mon, Sep 17

tra added inline comments to D52179: [clang-tidy] Replace redundant checks with an assert()..
Mon, Sep 17, 4:32 PM · Restricted Project
tra added inline comments to D52179: [clang-tidy] Replace redundant checks with an assert()..
Mon, Sep 17, 4:00 PM · Restricted Project
tra updated the diff for D52179: [clang-tidy] Replace redundant checks with an assert()..
  • Check that D is non-null
Mon, Sep 17, 3:31 PM · Restricted Project
tra added a comment to D52179: [clang-tidy] Replace redundant checks with an assert()..

Is the condition for this assertion checked beforehand or could this create runtime failures?

Mon, Sep 17, 12:57 PM · Restricted Project
tra added a dependency for D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators.: D52179: [clang-tidy] Replace redundant checks with an assert()..
Mon, Sep 17, 11:25 AM
tra added a dependent revision for D52179: [clang-tidy] Replace redundant checks with an assert().: D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..
Mon, Sep 17, 11:24 AM · Restricted Project
tra updated the diff for D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..

Addressed Richard's comments.
Moved clang-tidy changes into separate review D52179.

Mon, Sep 17, 11:24 AM
tra created D52179: [clang-tidy] Replace redundant checks with an assert()..
Mon, Sep 17, 11:20 AM · Restricted Project

Thu, Sep 13

tra added a comment to D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..

@rsmith ping.

Thu, Sep 13, 10:07 AM

Fri, Sep 7

tra added a comment to D51809: [CUDA][HIP] Fix assertion in LookupSpecialMember.

@jlebar Justin, can you take a look?

Fri, Sep 7, 1:33 PM
tra added a reviewer for D51809: [CUDA][HIP] Fix assertion in LookupSpecialMember: jlebar.
Fri, Sep 7, 1:32 PM
tra created D51808: [CUDA] Ignore uncallable functions when we check for usual deallocators..
Fri, Sep 7, 11:46 AM

Tue, Sep 4

tra committed rT341430: [test-suite, CUDA] Update CUDA cmake files..
[test-suite, CUDA] Update CUDA cmake files.
Tue, Sep 4, 4:50 PM
tra committed rL341430: [test-suite, CUDA] Update CUDA cmake files..
[test-suite, CUDA] Update CUDA cmake files.
Tue, Sep 4, 4:44 PM
tra closed D51663: [test-suite, CUDA] Update CUDA test suite cmake files..
Tue, Sep 4, 4:44 PM
tra added a comment to D51663: [test-suite, CUDA] Update CUDA test suite cmake files..

LGTM, sorry for the breakage.

Tue, Sep 4, 4:44 PM
tra updated the diff for D51663: [test-suite, CUDA] Update CUDA test suite cmake files..

Updated the changes according to Matthias' comments.

Tue, Sep 4, 4:41 PM
tra added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().

How about D51663?

Tue, Sep 4, 4:22 PM
tra created D51663: [test-suite, CUDA] Update CUDA test suite cmake files..
Tue, Sep 4, 4:21 PM
tra added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().

Your patch above *almost* works, except that each test variant wants to create the same symlink $BUILD/ExternalCUDA/$TEST.teferece_output -> $SRC/External/CUDA/$TEST.reference_output. If symlinks are created at different points in time ninja manages to avoid conflicts, but typically I get one or two attempts to create the symlink launches simultaneously and one of them fails with "symlink already exists". Perhaps the symlink target name should be uniquified in some way.

Tue, Sep 4, 12:04 PM
tra added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().

Though thinking about it, I don't really understand your problem:

How can there be multiple outputs with the same name? They have different names in the repository (algorithm.reference_output, assert.reference_output, ...)

Tue, Sep 4, 11:41 AM
tra abandoned D51501: [CUDA] Fix CUDA compilation broken by D50845.

Not needed anymore after the reverts in rC341115 and rC341118, right?

Tue, Sep 4, 9:52 AM

Fri, Aug 31

tra added a comment to D51048: cmake: Specify reference outputs in llvm_test_data().

I've ran into an unexpected problem after this patches have landed.

Fri, Aug 31, 5:33 PM
tra accepted D51554: [CUDA][OPENMP][NVPTX]Improve logic of the debug info support..

Nice. So, in effect, for optimized builds we'll generate pre-DWARF line info only, unless --cuda-noopt-device-debug is specified.
Will this deal with the warnings about back-end being unable to handle particular debug options?

Fri, Aug 31, 2:08 PM

Thu, Aug 30

tra added inline comments to D51507: Allow all supportable attributes to be used with #pragma clang attribute..
Thu, Aug 30, 3:06 PM
tra added a comment to D51441: Add predefined macro __gnu_linux__ for proper aux-triple.

Tests reverted in rL341118.

Thu, Aug 30, 1:55 PM
tra added a comment to D51312: [OpenMP][NVPTX] Use appropriate _CALL_ELF macro when offloading.

Reverted in rL341115 & rL341118.

Thu, Aug 30, 1:55 PM
tra committed rC341118: Revert the tests that should've been reverted in rL341115.
Revert the tests that should've been reverted in rL341115
Thu, Aug 30, 1:55 PM
tra committed rL341118: Revert the tests that should've been reverted in rL341115.
Revert the tests that should've been reverted in rL341115
Thu, Aug 30, 1:55 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Tests reverted in rL341118

Thu, Aug 30, 1:54 PM
tra added a comment to D51441: Add predefined macro __gnu_linux__ for proper aux-triple.

Reverted in rL341115.

Thu, Aug 30, 1:47 PM
tra committed rL341115: Reverted the "[CUDA/OpenMP] Define only some host macros during device….
Reverted the "[CUDA/OpenMP] Define only some host macros during device…
Thu, Aug 30, 1:44 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Reverted in rL341115

Thu, Aug 30, 1:44 PM
tra committed rC341115: Reverted the "[CUDA/OpenMP] Define only some host macros during device….
Reverted the "[CUDA/OpenMP] Define only some host macros during device…
Thu, Aug 30, 1:44 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

That, and r340967 D51441. I'm running check-clang now and will land reverted changes shortly.

Thu, Aug 30, 1:26 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Ok, the top preprocessor condition for that function is #ifndef __SSE2_MATH__ - the exact same macro that was part of the motivation. Can you please test compiling a simple C file (including math.h) with -mno-sse? My guess would be that this is broken as well.
If yes I'm fine with reverting because I need to teach Clang to allow anonymous unions in type specifiers to make that weird system header work with this patch.

Thu, Aug 30, 1:00 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

I've sent out D51501. It unbreaks CUDA compilation and keeps OpenMP unchanged.

Thu, Aug 30, 12:07 PM
tra created D51501: [CUDA] Fix CUDA compilation broken by D50845.
Thu, Aug 30, 12:04 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

In general, it looks like this patch leads to some host macros having to be defined again for the auxiliary triple case. It is not clear to me how to exhaustively identify the missing macros, so far it's been just trial and error.

Well, that's the point of this patch, isn't it? Again, the current approach is to just define all macros which is definitely broken.

Thu, Aug 30, 11:45 AM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

In our case the headers from a relatively old glibc and compiler errors out on this:

/* This function is used in the `isfinite' macro.  */
__MATH_INLINE int
__NTH (__finite (double __x))
{
  return (__extension__
	  (((((union { double __d; int __i[2]; }) {__d: __x}).__i[1]
	     | 0x800fffffu) + 1) >> 31));
}
Thu, Aug 30, 11:39 AM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

FYI. This breaks our CUDA compilation. I haven't figured out what exactly is wrong yet. I may need to unroll the patch if the fix is not obvious.

Thu, Aug 30, 11:20 AM
tra accepted D51465: Revamp test-suite documentation.

Looks great. Thank you for updating the docs.

Thu, Aug 30, 9:47 AM

Wed, Aug 29

tra added inline comments to D51434: [HIP] Add -fvisibility hidden option to clang.
Wed, Aug 29, 2:30 PM
tra accepted D51441: Add predefined macro __gnu_linux__ for proper aux-triple.

While we're here, perhaps Builder.defineMacro("__linux__") should be changed to DefineStd("linux") which defines linux/__linux/__linux__?

Wed, Aug 29, 11:40 AM
tra added inline comments to D51441: Add predefined macro __gnu_linux__ for proper aux-triple.
Wed, Aug 29, 10:52 AM
tra added a comment to D51434: [HIP] Add -fvisibility hidden option to clang.

I could not find anything about PLTs in AMDGPU-ABI, nor could I find anything relevant on google.
I still have no idea why PLTs are required in this case. Without that info, the problem may as well be due to unintended requirement for PLT that this patch would hide.

Wed, Aug 29, 10:26 AM
tra added a comment to D51434: [HIP] Add -fvisibility hidden option to clang.

Could you elaborate on what exactly is the problem this patch fixes?
I don't see how internalizing the symbols connects to PLTs. My understanding is that PLTs are used to provide stubs for symbols to be resolved by dynamic linker at runtime. AFAICT AMD does not use shared libs on device side. What do I miss?

Wed, Aug 29, 9:37 AM

Tue, Aug 28

tra accepted D51336: [HIP] Fix output file extension.
Tue, Aug 28, 10:41 AM

Mon, Aug 27

tra accepted D51256: [test-suite, CUDA] Fix some CMake problems.
Mon, Aug 27, 11:35 AM
tra accepted D51306: [NVPTX] Implement isLegalToVectorizeLoadChain.

Nice. LGTM.

Mon, Aug 27, 9:57 AM

Fri, Aug 24

tra accepted D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Please keep an eye on CUDA buildbot http://lab.llvm.org:8011/builders/clang-cuda-build.
It runs fair amount of tests with libc++ and handful of libstdc++ versions and may a canary if these changes break something.

Fri, Aug 24, 10:32 AM

Aug 23 2018

tra added a comment to D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called.

I've confirmed that the patch does not break anything in our CUDA code, so it's good to go as far as CUDA is concerned.

Aug 23 2018, 11:06 AM

Aug 22 2018

tra added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

__clang_cuda_device_functions.h is not intended to be a device-side math.h, despite having a lot of overlap/similarities. It may change at any time we get new CUDA version.
I would suggest writing an OpenMP-specific replacement for math.h which would map to whatever device-specific function OpenMP needs. For NVPTX that may be libdevice, for which you have declarations in __clang_cuda_libdevice_declares.h. Using part of __clang_cuda_device_functions.h may be a decent starting point for NVPTX, but OpenMP will likely need to provide an equivalent for other back-ends, too.

Aug 22 2018, 3:16 PM

Aug 21 2018

tra accepted D51042: [NVPTX] Remove ftz variants of cvt with rounding mode.
In D51042#1207769, @tra wrote:

This is a surprise. PTX ISA does not mention that .ftz is not applicable to cvt.*.f16.* instructions.
Is it only cvt that does not support .ftz or does it impact other instructions? PTX spec has add/sub/mul/fma/set/setp instructions that support f16 and have .ftz variant.

It's only cvt with an explicit rounding mode. I actually ran the output of f16-instructions.ll with FTZ through ptxas and removed instructions until it compiled it. This might even be a bug in ptxas.

Aug 21 2018, 11:00 AM
tra added a comment to D51042: [NVPTX] Remove ftz variants of cvt with rounding mode.

This is a surprise. PTX ISA does not mention that .ftz is not applicable to cvt.*.f16.* instructions.
Is it only cvt that does not support .ftz or does it impact other instructions? PTX spec has add/sub/mul/fma/set/setp instructions that support f16 and have .ftz variant.

Aug 21 2018, 9:44 AM

Aug 20 2018

tra accepted D50957: Rename -mlink-cuda-bitcode to -mlink-builtin-bitcode.
Aug 20 2018, 10:31 AM

Aug 17 2018

tra added a comment to D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called.

Talked to @ahatanak over IRC. It appears that this patch may have exposed a preexisting bug.
Apparently delete t; in test/SemaCUDA/call-host-fn-from-device.cu does actually end up calling __host__ operator delete. It should've picked __device__ operator delete, but it does not, so reporting an error here appears to be correct.

Aug 17 2018, 3:27 PM
tra added a comment to D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called.
In D47757#1204561, @tra wrote:

It's a regression. There's a decent chance it breaks someone and this patch, if committed by itself, will end up being rolled back.

Is the regression you are referring to about the static function case? I don't see a difference between ToT clang and my patch in the diagnostics they produce when I compile the following code:

__host__ void f();
static __host__ __device__ void g() { f(); }
__host__ __device__ void g2() { g(); }

Both error out when -fcuda-is-device is provided. If I comment out the definition of g2, it compiles fine.

Aug 17 2018, 1:08 PM
tra added a comment to D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called.

@tra and @rsmith: Can we move forward and fix the incorrect cuda diagnostics in a separate patch?

Aug 17 2018, 12:03 PM

Aug 16 2018

tra added a comment to D50815: Establish the <bit> header.

It appears that libcxx/include/CMakeLists.txt needs to be updated to include bit file into the file set.

Aug 16 2018, 5:22 PM
tra added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

As a result, we should really have a separate header that has those actually-available functions. When targeting NVPTX, why don't we have the included math.h be CUDA's math.h? In the end, those are the functions we need to call when we generate code. Right?

That's what D47849 deals with.

Yes, but it doesn't get CUDA's math.h. Maybe I misunderstand how this works (and I very well might, because it's not clear that CUDA has a math.h by that name), but that patch tries to avoid problems with the host's math.h and then also injects __clang_cuda_device_functions.h into the device compilation. How does this compare to when you include math.h in Clang's CUDA mode? It seems to be that we want to somehow map standard includes, where applicable, to include files in CUDA's include/crt directory (e.g., crt/math_functions.h and crt/common_functions.h for stdio.h for printf), and nothing else ends up being available (because it is, in fact, not available).

There's no CUDA specific math.h unless you want to regard clang_cuda_device_functions.h as a math header.

Aug 16 2018, 1:57 PM
tra updated subscribers of D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Maybe for device compilation we also should define __NO_MATH_INLINES and __NO_STRING_INLINES macros to disable inline assembly in glibc?

The problem is that __NO_MATH_INLINES doesn't even avoid all inline assembly from bits/mathinline.h :-( incidentally Clang already defines __NO_MATH_INLINES for x86 (due to an old bug which has been fixed long ago) - and on CentOS we still have problems as described in PR38464.

As a second thought: This might be valid for NVPTX, but I don't think it's a good idea for x86-like offloading targets - they might well profit from inline assembly code.

I'm not saying that we should define those macros for all targets, only for NVPTX. But still, it may disable some inline assembly for other architectures.

Aug 16 2018, 11:12 AM

Aug 15 2018

tra added a comment to D46021: [DEBUGINFO] Disable emission of the dwarf sections, but allow directives..

Is there a way to control this from clang command line?

Aug 15 2018, 9:59 AM

Aug 13 2018

tra added inline comments to D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called.
Aug 13 2018, 11:21 AM
tra accepted D50596: [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different shared libraries.
Aug 13 2018, 10:44 AM

Aug 9 2018

tra added inline comments to D47757: [Sema] Produce diagnostics when unavailable aligned allocation/deallocation functions are called.
Aug 9 2018, 2:09 PM

Aug 8 2018

tra accepted D50391: [NVPTX] Select atomic loads and stores.

In general .relaxed.sys semantics does appear to match guarantees provided by llvm's monotonic ordering, so the patch overall looks like the right thing to do.

Aug 8 2018, 10:41 AM

Aug 3 2018

tra added a comment to rL337903: Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR.

Thank you for the analysis. The issue is not a showstopper for us, so I'm OK with waiting for your patches to land.

Aug 3 2018, 11:14 AM
tra committed rL338908: [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH")..
[NVPTX] Handle __nvvm_reflect("__CUDA_ARCH").
Aug 3 2018, 11:06 AM
tra closed D50207: [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH")..
Aug 3 2018, 11:05 AM

Aug 2 2018

tra updated the diff for D50207: [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH")..

Added 'explicit' keyword to contructor.

Aug 2 2018, 5:15 PM
tra added a comment to D50207: [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH")..

Just to check, the notion is that it's OK if I report a sm version less than what I end up running on?

Aug 2 2018, 5:13 PM
tra created D50207: [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH")..
Aug 2 2018, 5:00 PM
tra added a comment to rL337903: Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR.

It appears that this patch crashes llvm under some circumstances.

Aug 2 2018, 3:48 PM

Aug 1 2018

tra added a comment to D49148: [DEBUGINFO] Disable unsupported debug info options for NVPTX target..

We normally do not need to deviate from the host options all that often. I would argue that keeping options identical is a reasonable default for most options.
For some options the driver may be able to derive a sensible value based on the host options. E.g. some options can be ignored. Some can be downgraded. Some can be replaced with a target-specific equivalent.
For others we must require the user to provide the value.

Aug 1 2018, 3:27 PM
tra added a comment to D49148: [DEBUGINFO] Disable unsupported debug info options for NVPTX target..

I wonder, what's the right thing to do to silence the warnings. For instance, we compile everything with -Werror and the warnings result in build breaks.

Aug 1 2018, 2:37 PM

Jul 30 2018

tra abandoned D49763: [CUDA] Call atexit() for CUDA destructor early on..

It appears that the issue that originally prompted this change is due to suspected bug in glibc triggered by specific details of our internal build.

Jul 30 2018, 3:59 PM

Jul 27 2018

tra accepted D49931: [CUDA][HIP] Allow function-scope static const variable.
Jul 27 2018, 4:08 PM
tra added a comment to D49931: [CUDA][HIP] Allow function-scope static const variable.

Looks OK overall except for the huge if below.

Jul 27 2018, 2:31 PM
tra added a comment to D49931: [CUDA][HIP] Allow function-scope static const variable.

This patch also allows function-scope static const variable without device memory qualifier and emits it as a global variable in constant address space.

Jul 27 2018, 1:08 PM
tra committed rT338142: [test-suite, CUDA] Filter out long-running redundant SIMD tests..
[test-suite, CUDA] Filter out long-running redundant SIMD tests.
Jul 27 2018, 11:24 AM
tra committed rL338142: [test-suite, CUDA] Filter out long-running redundant SIMD tests..
[test-suite, CUDA] Filter out long-running redundant SIMD tests.
Jul 27 2018, 11:13 AM
tra closed D49889: [test-suite, CUDA] Filter out long-running redundant SIMD tests..
Jul 27 2018, 11:13 AM

Jul 26 2018

tra updated the diff for D49889: [test-suite, CUDA] Filter out long-running redundant SIMD tests..

Use [] to escape special characters in the regex.
Otherwise I have to use "\\\\" which is way too many characters for my taste.

Jul 26 2018, 4:54 PM
tra created D49889: [test-suite, CUDA] Filter out long-running redundant SIMD tests..
Jul 26 2018, 4:41 PM