
tra (Artem Belevich)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 8 2015, 1:53 PM (298 w, 4 d)

Recent Activity

Yesterday

tra added a comment to D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..

OK, now I'm starting to understand this change.

Before, in function scope, we allow static const/non-const __shared__, and allow static const so long as it's not __device__ or __constant__.

  • static -> error? (I understood us saying above that it is, but now that I read the code, isn't it saying it's an error?)
  • static const -> allowed
  • static __device__ -> error
  • static const __device__ -> error
  • static __constant__ -> error
  • static const __constant__ -> error

After, in function scope, the rule is, allow static const/non-const __shared__ or anything that's static const.

  • static -> error, must be const
  • static const -> allowed
  • static __device__ -> error, must be const
  • static const __device__ -> allowed
  • static __constant__ -> error, must be const
  • static const __constant__ -> allowed

I guess my question when I write out this table is, why shouldn't it be like this?

  • static -> allowed
  • static const -> allowed
  • static __device__ -> allowed
  • const static __device__ -> allowed
  • static __constant__ -> error, must be const
  • const static __constant__ -> allowed
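The three tables above can be restated as small predicates. This is only a sketch for comparing the rules side by side: `MemSpace`, `allowedBefore`, `allowedAfter`, and `allowedProposed` are hypothetical names, not clang's Sema code. The plain `static` row of the "before" table is marked uncertain in the comment and is modeled here as an error, and the proposed table does not mention `__shared__`, so it is assumed to stay allowed.

```cpp
// Hypothetical model of the function-scope rules for `static` variables
// discussed above. It is NOT clang's implementation.
enum class MemSpace { None, Device, Constant, Shared };

// "Before": __shared__ is accepted with or without const; otherwise only a
// plain `static const` (no __device__/__constant__) is accepted.
// Assumption: plain non-const `static` is treated as an error.
inline bool allowedBefore(MemSpace Space, bool IsConst) {
  if (Space == MemSpace::Shared)
    return true;
  return IsConst && Space == MemSpace::None;
}

// "After": accept __shared__, or anything that is `static const`.
inline bool allowedAfter(MemSpace Space, bool IsConst) {
  return Space == MemSpace::Shared || IsConst;
}

// "Proposed": reject only a non-const `static __constant__`.
// Assumption: __shared__ remains allowed, as in the other two rules.
inline bool allowedProposed(MemSpace Space, bool IsConst) {
  return !(Space == MemSpace::Constant && !IsConst);
}
```

Written this way, the only rows where the "after" rule and the proposed rule disagree are plain `static` and `static __device__`.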
Mon, Sep 28, 4:34 PM · Restricted Project
tra added a comment to D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..

OK, backing up, what are the semantics of static on __constant__, __device__, and __shared__?

  • My understanding is that __shared__ behaves the same whether or not it's static. It's not equivalent to namespace a { __shared__ int c = 4; }, because that's illegal.
Mon, Sep 28, 11:31 AM · Restricted Project
tra accepted D88425: Skip -fPIE for AMDGPU and HIP toolchain.

LGTM

Mon, Sep 28, 9:55 AM · Restricted Project
tra added a comment to D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..

wha... As you know, const doesn't mean anything, that can be const-casted away. And then you'll be able to observe that this nominally-static variable is just a normal variable.

Mon, Sep 28, 9:48 AM · Restricted Project

Fri, Sep 25

tra retitled D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables. from [CUDA] Allow `static const {__constant__, __device__}` variables. to [CUDA] Allow local `static const {__constant__, __device__}` variables..
Fri, Sep 25, 4:32 PM · Restricted Project
tra requested review of D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..
Fri, Sep 25, 4:32 PM · Restricted Project
tra closed D88255: [test-suite, CUDA] Compile-time test for builtin variables..

Landed in 93b9e85695e583d1c577b40

Fri, Sep 25, 3:48 PM

Thu, Sep 24

tra committed rG30514f0afa3e: [CUDA] Added conversion functions to builtin vars. (authored by tra).
[CUDA] Added conversion functions to builtin vars.
Thu, Sep 24, 2:34 PM
tra closed D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars..
Thu, Sep 24, 2:33 PM · Restricted Project
tra added a comment to D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars..
In D88250#2293346, @tra wrote:

I know it comes in a separate change, but can we add a check to the test-suite?

Will do.

Thu, Sep 24, 12:20 PM · Restricted Project
tra requested review of D88255: [test-suite, CUDA] Compile-time test for builtin variables..
Thu, Sep 24, 12:19 PM
tra added a comment to D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars..

I know it comes in a separate change, but can we add a check to the test-suite?

Thu, Sep 24, 11:55 AM · Restricted Project
tra updated the diff for D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars..

Fixed compatibility with pre-c++11

Thu, Sep 24, 11:52 AM · Restricted Project
tra requested changes to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
Thu, Sep 24, 11:32 AM · Restricted Project
tra requested review of D88250: [CUDA] Added dim3/uint3 conversion functions to builtin vars..
Thu, Sep 24, 11:08 AM · Restricted Project

Tue, Sep 22

tra accepted D88115: [CUDA][HIP] Fix static device var used by host code only.
Tue, Sep 22, 2:11 PM · Restricted Project

Mon, Sep 21

tra accepted D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
In D84362#2283078, @tra wrote:

It's possible. Unfortunately it's only triggered by our internal tool and it's hard to create a public reproducer for it. I'll debug and try to fix it on Monday.

Mon, Sep 21, 12:23 PM · Restricted Project

Fri, Sep 18

tra added a comment to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

I think this is probably a different issue. The issue reported in D84364 was introduced by the change in D84364.

Fri, Sep 18, 2:59 PM · Restricted Project
tra added a comment to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

The fix is for the change in D84364. It has no effect on the change in this review. Are you sure the issue you saw is due to the change in this review rather than the change in D84364?

Fri, Sep 18, 2:52 PM · Restricted Project
tra reopened D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

I have a fix for the issue reported in D84364. Would you like to try? Thanks.

Fri, Sep 18, 2:27 PM · Restricted Project

Thu, Sep 17

tra added a comment to D87325: [HIP] Add -emit-pch option to clang driver.

@tra, I tried to invoke clang with -Xclang -emit-pch, but the -x hip path doesn't know about the precompile phase. It does pass -emit-pch to the cc1 command, but that is overridden by the default -emit-obj flag passed to cc1 in the compile phase. Also, from there it will go through the backend, assembler, and linker, which we will need to disable if we want this method to work. Here is the experiment:

root@e6915ef660c7:~/llvm-project/build_rel# ./bin/clang++ -x hip --cuda-device-only --cuda-gpu-arch=gfx803 -Xclang -emit-pch a.hip -o a.hip.pch -ccc-print-bindings

If you add -S and remove -ccc-print-bindings, this command does produce a PCH.

Thu, Sep 17, 2:45 PM
tra added a comment to D87791: [CUDA][HIP] Fix -gsplit-dwarf option.

It is requested by our debugger team, so it should work with amdgpu.

Thu, Sep 17, 11:26 AM · Restricted Project
tra added a comment to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
In D84362#2279688, @tra wrote:

Apparently this patch triggers compiler crashes on some of our code. I'll try to create a reproducer, but it would be great to revert the patch for now.

Thu, Sep 17, 11:05 AM · Restricted Project
tra added a comment to D87791: [CUDA][HIP] Fix -gsplit-dwarf option.

Therefore in either case there is no need to rename the intermediate .o files since they are temporary files which have unique names.

The .dwo files are not temporary files. They are supposed to be shipped with .o files for debugging info.

Thu, Sep 17, 10:54 AM · Restricted Project
tra added a comment to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

Apparently this patch triggers compiler crashes on some of our code. I'll try to create a reproducer, but it would be great to revert the patch for now.

Thu, Sep 17, 10:13 AM · Restricted Project

Wed, Sep 16

tra accepted D87791: [CUDA][HIP] Fix -gsplit-dwarf option.

Is this naming scheme the same as the one used for .o files? We may want to keep them in sync.

Wed, Sep 16, 2:23 PM · Restricted Project
tra added a comment to D87325: [HIP] Add -emit-pch option to clang driver.

Having -emit-pch in the clang driver is useful because it doesn't require users to specify standard C++ include paths, clang include paths, and CUDA/HIP wrapper headers needed by CC1. That is error prone for the user.

Wed, Sep 16, 2:17 PM
tra accepted D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

This is a very nice cleanup. Thank you, Sam.

Wed, Sep 16, 1:25 PM · Restricted Project
tra added a comment to D87325: [HIP] Add -emit-pch option to clang driver.
In D87325#2271676, @tra wrote:

Can you elaborate on the use case of PCH files for CUDA/HIP?

I believe one use-case for PCH is for common include headers such as hip_runtime.h which is being re-used in many application source files. To improve the performance, we can pre-compile the header and re-use it during online compilation.

Wed, Sep 16, 12:03 PM
tra accepted D87761: [clang][codegen] Skip adding default function attributes on intrinsics..
Wed, Sep 16, 10:22 AM · Restricted Project

Tue, Sep 15

tra added a comment to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
Tue, Sep 15, 1:46 PM · Restricted Project
tra accepted D87709: InferAddressSpaces: Fix assert with unreachable code.
Tue, Sep 15, 11:33 AM · Restricted Project
tra added a comment to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

There are use patterns expecting PartialDiagnosticInst << X << Y to continue to be a PartialDiagnostic&, e.g.

PartialDiagnosticAt PDAt(SourceLoc, PartialDiagnosticInst << X << Y);

However if we derive PartialDiagnostic and DiagnosticBuilder from a base class DiagnosticBuilderBase which implements the << operators, PartialDiagnosticInst << X << Y will become a DiagnosticBuilderBase&, then we can no longer write the above code.

That's one reason I use templates to implement << operators.

Do we want to sacrifice this convenience?
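The trade-off can be sketched with simplified stand-ins. Everything below (`Base`, `Partial`, `PartialAt`, and both namespaces) is hypothetical and much smaller than the real clang classes; it only illustrates why a base-class `operator<<` loses the derived type while a templated one keeps it.

```cpp
#include <string>
#include <type_traits>

// Hypothetical, simplified stand-ins; these are NOT the real clang classes.
namespace base_ops {
struct Base { std::string Text; };
struct Partial : Base {};

// operator<< implemented once on the base: every chain yields Base&.
inline Base &operator<<(Base &B, const std::string &S) {
  B.Text += S;
  return B;
}

// Consumer that needs the derived type, like the PDAt(...) pattern above.
struct PartialAt {
  int Loc;
  Partial PD;
  PartialAt(int L, const Partial &P) : Loc(L), PD(P) {}
};
} // namespace base_ops

namespace template_ops {
struct Base { std::string Text; };
struct Partial : Base {};

// Templated operator<<: the chain keeps the static type of its left operand.
template <typename T,
          typename = typename std::enable_if<
              std::is_base_of<Base, T>::value>::type>
T &operator<<(T &B, const std::string &S) {
  B.Text += S;
  return B;
}

struct PartialAt {
  int Loc;
  Partial PD;
  PartialAt(int L, const Partial &P) : Loc(L), PD(P) {}
};
} // namespace template_ops
```

With `template_ops`, `PartialAt At(Loc, P << "x" << "y")` compiles because the chain is still a `Partial&`. With `base_ops`, the same expression is a `Base&` and no longer converts to `const Partial&`.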

Tue, Sep 15, 11:32 AM · Restricted Project

Mon, Sep 14

tra added a comment to D87325: [HIP] Add -emit-pch option to clang driver.

Can you elaborate on the use case of PCH files for CUDA/HIP?

Mon, Sep 14, 11:26 AM
tra updated subscribers of D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

So, the idea here is to do some sort of duck-typing and allow DiagBuilder to work with both DiagnosticBuilder and PartialDiagnostic.

Mon, Sep 14, 10:54 AM · Restricted Project

Thu, Sep 3

tra accepted D84364: [CUDA][HIP] Defer overloading resolution diagnostics for host device functions.
Thu, Sep 3, 4:41 PM · Restricted Project
tra added a comment to D84364: [CUDA][HIP] Defer overloading resolution diagnostics for host device functions.

To sum it up -- the patch introduces the -fgpu-defer-diag flag, which allows deferring overload resolution diagnostics if the overload set includes candidates from both sides.

Thu, Sep 3, 4:41 PM · Restricted Project

Aug 26 2020

tra added a comment to D86376: [HIP] Simplify kernel launching.

For example, suppose a HIP program contains a kernel void foo(int*). If a C++ program wants to launch it, the desirable way is

void foo(int*);
hipLaunchKernel(foo, grids, blocks, args, shmem, stream);

Because the kernel stub name is prefixed, users currently have to use

void __device_stub_foo(int*);
hipLaunchKernel(__device_stub_foo, grids, blocks, args, shmem, stream);
Aug 26 2020, 10:34 AM

Aug 25 2020

tra added a comment to D86376: [HIP] Simplify kernel launching.

My previous measurements did not include a warm-up, which caused some one-time overhead from device initialization and loading of the device binary. With warm-up, the call of __hipPushCallConfigure/__hipPopCallConfigure takes about 19 us. Based on the trace from rocprofile, the time spent inside these functions can be ignored; most of the time is spent making the calls. These functions live in a shared library, which may be why they take such a long time. Making them always_inline may get rid of the overhead; however, that would require exposing internal data structures.
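As a rough illustration of measuring with a warm-up phase, a timing helper could look like the sketch below. `averageMicrosAfterWarmup` is a hypothetical name, the callable stands in for whatever launch path is being measured, and wall-clock timing with `std::chrono` is an assumption rather than the method used for the numbers above.

```cpp
#include <chrono>

// Hypothetical helper: average wall-clock microseconds per call of Call,
// measured only after WarmupIters untimed calls.
// Assumes TimedIters > 0.
template <typename Fn>
double averageMicrosAfterWarmup(Fn &&Call, int WarmupIters, int TimedIters) {
  // Untimed calls absorb one-time costs such as initialization and loading.
  for (int I = 0; I < WarmupIters; ++I)
    Call();

  auto Start = std::chrono::steady_clock::now();
  for (int I = 0; I < TimedIters; ++I)
    Call();
  auto End = std::chrono::steady_clock::now();

  std::chrono::duration<double, std::micro> Elapsed = End - Start;
  return Elapsed.count() / TimedIters;
}
```

Passing `WarmupIters = 0` reproduces the earlier kind of measurement, in which one-time costs are folded into the average.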

Aug 25 2020, 10:35 AM

Aug 24 2020

tra added a comment to D86376: [HIP] Simplify kernel launching.

This patch appears to be somewhere in the gray area to me. My prior experience with CUDA suggests that it will make little to no difference. On the other hand, AMD GPUs may be different enough to prove me wrong. Without specific evidence, I still can't tell what's the case here.

Sorry, the overhead due to __hipPushConfigure/__hipPopConfigure is about 60 us. The typical kernel launching latency is about 500us, therefore the improvement is around 10%.

Aug 24 2020, 3:49 PM
tra added a comment to D86376: [HIP] Simplify kernel launching.

I'm OK with how the patch is implemented.
I'm still on the fence regarding whether it should be implemented.

Aug 24 2020, 1:40 PM
tra added a comment to D86376: [HIP] Simplify kernel launching.

How much does this inlining buy you in practice? I.e. what's a typical launch latency before/after the patch? For CUDA, config push/pop is negligible compared to the cost of actually launching the kernel on the GPU. It is measurable if the launch is asynchronous, but queueing kernels fast does not help all that much in the long run -- you eventually have to run those kernels on the GPU, so in most cases you just spend a bit more time idling while waiting for the queued kernels to finish. To be beneficial, you'll need a finely balanced CPU/GPU workload, and that's rather hard to achieve, certainly not to the point where the minor savings here would be meaningful. I would assume the situation on AMD GPUs is not that different.

Aug 24 2020, 11:08 AM

Aug 14 2020

tra committed rG1689c36b1aeb: Split Preprocessor/init.c test (authored by tra).
Split Preprocessor/init.c test
Aug 14 2020, 1:15 PM
tra closed D85798: Split Preprocessor/init.c test. NFC..
Aug 14 2020, 1:14 PM · Restricted Project
tra updated the diff for D85798: Split Preprocessor/init.c test. NFC..

Separated blocks of RUN commands with an empty line to make navigation easier.

Aug 14 2020, 1:06 PM · Restricted Project
tra accepted D85879: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU.
Aug 14 2020, 10:38 AM · Restricted Project

Aug 13 2020

tra added inline comments to D85879: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU.
Aug 13 2020, 4:33 PM · Restricted Project
tra accepted D60620: [HIP] Support target id by --offload-arch.

Couple of minor nits. LGTM otherwise.

Aug 13 2020, 2:05 PM · Restricted Project, Restricted Project
tra added inline comments to D85879: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU.
Aug 13 2020, 1:31 PM · Restricted Project
tra added inline comments to D85798: Split Preprocessor/init.c test. NFC..
Aug 13 2020, 10:35 AM · Restricted Project
tra added inline comments to D85879: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU.
Aug 13 2020, 9:31 AM · Restricted Project

Aug 11 2020

tra requested review of D85798: Split Preprocessor/init.c test. NFC..
Aug 11 2020, 5:29 PM · Restricted Project
tra added a comment to D60620: [HIP] Support target id by --offload-arch.

Looks good in general. Mostly C++ style comments below.

Aug 11 2020, 4:41 PM · Restricted Project, Restricted Project
tra committed rGec5f793996f4: [OpenMP] split execution of a long test into smaller parts. (authored by tra).
[OpenMP] split execution of a long test into smaller parts.
Aug 11 2020, 11:53 AM
tra closed D85695: [OpenMP] split execution of a long test into smaller parts..
Aug 11 2020, 11:53 AM · Restricted Project

Aug 10 2020

tra added a comment to D85695: [OpenMP] split execution of a long test into smaller parts..

The test is no longer sticking out: https://gist.github.com/Artem-B/d0b05c2e98a49158c02de23f7f4f0279

Aug 10 2020, 4:11 PM · Restricted Project
tra added inline comments to D85551: [OpenMP] Split OpenMP/target_map_codegen test [NFC].
Aug 10 2020, 4:00 PM · Restricted Project
tra requested review of D85695: [OpenMP] split execution of a long test into smaller parts..
Aug 10 2020, 4:00 PM · Restricted Project
tra committed rG9c8ae4086031: [ARM] Speed up arm-cortex-cpus.c test (authored by tra).
[ARM] Speed up arm-cortex-cpus.c test
Aug 10 2020, 2:28 PM
tra closed D85575: [ARM] Speed up arm-cortex-cpus.c test.
Aug 10 2020, 2:27 PM · Restricted Project
tra updated subscribers of D84068: AMDGPU/clang: Search resource directory for device libraries.

If we ship them with clang, who/where/how builds them?
If they come from ROCm packages, how would those packages add stuff into *clang* install directory? Resource dir is a rather awkward location if contents may be expected to change routinely.

Symlinks. I've been building the device libraries as part of LLVM_EXTERNAL_PROJECTS, and think this should be the preferred way to build and package the libraries. This is how compiler-rt is packaged on linux distributions. The compiler-rt binaries are a separate package symlinked into the resource directory locations. I'm not sure what you mean exactly by change routinely, the libraries should be an implementation detail invisible to users, not something they should be directly relying on. Only clang actually knows how to use them correctly and every other user is buggy

What if I have multiple ROCm versions installed? Which one should provide the bitcode in the resource dir?

These should be treated as an integral part of clang, and not something to mix and match. Each rocm version should have its own copy of the device libraries. It only happens to work most of the time if you mismatch these, and this isn't a guaranteed property.

Aug 10 2020, 2:15 PM
tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

The fix is here https://reviews.llvm.org/D85686

Aug 10 2020, 1:51 PM · Restricted Project
tra accepted D85686: [CUDA][HIP] Do not externalize implicit constant static variable.
Aug 10 2020, 1:51 PM · Restricted Project
tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

We can restrict externalization to constant variables with explicit 'constant' attributes only, which should fix this issue.

Aug 10 2020, 12:40 PM · Restricted Project
tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

Sam, just a FYI that the patch has a couple of unintended consequences.

Aug 10 2020, 11:45 AM · Restricted Project

Aug 7 2020

tra updated the summary of D85575: [ARM] Speed up arm-cortex-cpus.c test.
Aug 7 2020, 5:32 PM · Restricted Project
tra requested review of D85575: [ARM] Speed up arm-cortex-cpus.c test.
Aug 7 2020, 5:30 PM · Restricted Project
tra committed rGcd01980f308a: [OpenMP] Split OpenMP/target_map_codegen test [NFC] (authored by tra).
[OpenMP] Split OpenMP/target_map_codegen test [NFC]
Aug 7 2020, 1:48 PM
tra closed D85551: [OpenMP] Split OpenMP/target_map_codegen test [NFC].
Aug 7 2020, 1:48 PM · Restricted Project
tra added inline comments to D85551: [OpenMP] Split OpenMP/target_map_codegen test [NFC].
Aug 7 2020, 1:34 PM · Restricted Project
tra added a comment to D85551: [OpenMP] Split OpenMP/target_map_codegen test [NFC].

Wow, cool. I imagine it was hard to split this given the manual check lines. We really need to start using the upgrade scripts here.

I'm fine with this, @ABataev WDYT?

Aug 7 2020, 1:29 PM · Restricted Project
tra requested review of D85551: [OpenMP] Split OpenMP/target_map_codegen test [NFC].
Aug 7 2020, 1:14 PM · Restricted Project

Aug 6 2020

tra added a comment to D84364: [CUDA][HIP] Defer overloading resolution diagnostics for host device functions.

I added a Deferrable bit to the diagnostics which can be specified in td files. This can be added to individual diagnostic defs or added to a bunch of diagnostic defs all together.

This field is used to control whether a diagnostic message can be deferred.

Aug 6 2020, 3:38 PM · Restricted Project
tra added inline comments to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 6 2020, 10:53 AM · Restricted Project
tra added a comment to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..

Do we need to disable pgo and coverage mapping for device compilation? Or it is already disabled?

Aug 6 2020, 9:28 AM · Restricted Project

Aug 5 2020

tra committed rG7d057efddc00: [CUDA] Work around a bug in rint/nearbyint caused by a broken implementation… (authored by tra).
[CUDA] Work around a bug in rint/nearbyint caused by a broken implementation…
Aug 5 2020, 1:14 PM
tra closed D85236: [CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA..
Aug 5 2020, 1:14 PM · Restricted Project
tra updated the diff for D85236: [CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA..

Also fixed the same bug in nearbyint().

Aug 5 2020, 12:49 PM · Restricted Project
tra requested review of D85352: [CUDA, test-suite] More test cases for rint() and nearint().
Aug 5 2020, 12:47 PM
tra requested review of D85349: [CUDA, test-suite] Prevent constant folding of the test inputs..
Aug 5 2020, 12:44 PM
tra added inline comments to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 5 2020, 10:50 AM · Restricted Project
tra added a comment to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..

LGTM for CUDA.

Aug 5 2020, 9:42 AM · Restricted Project

Aug 4 2020

tra requested review of D85236: [CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA..
Aug 4 2020, 12:05 PM · Restricted Project
tra accepted D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

What is expected to happen to device statics in anonymous namespaces? It may be worth adding them to the tests.

Aug 4 2020, 11:02 AM · Restricted Project

Jul 29 2020

tra accepted D84824: [HIP] Emit target-id module flag.
Jul 29 2020, 11:27 AM
tra added a comment to D71726: Let clang atomic builtins fetch add/sub support floating point types.

LGTM, modulo couple of nits.

Jul 29 2020, 10:45 AM
tra added a comment to D84743: [Clang][AMDGCN] Universal device offloading macros header.

Also, do we need the header at all?
It would be much easier to just get clang itself to add normalized macros without trying to reconstruct them from the existing macros.

Jul 29 2020, 10:12 AM · Restricted Project

Jul 28 2020

tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

I think Sam's approach is reasonable.

Jul 28 2020, 11:44 AM · Restricted Project
tra added a comment to D84743: [Clang][AMDGCN] Universal device offloading macros header.

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

Jul 28 2020, 11:11 AM · Restricted Project

Jul 27 2020

tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

It's a good point. Perhaps this is one of the cases where we should *not* follow nvcc.
We can't have our cake (preserve static behavior) and eat it (treat it as non-static in case something on the host side may decide to use an API which uses symbol names). Something's got to give. While we could make it work in some cases, I don't think we can make it work consistently.
I think it would be reasonable to restrict APIs that access symbols by name to be applicable to visible symbols only.

Jul 27 2020, 3:15 PM · Restricted Project

Jul 24 2020

tra updated the diff for D84258: [buildbot] Added config files for CUDA build bots.

Updated directory structure.

Jul 24 2020, 12:11 PM
tra added a comment to D84256: [buildbot] Moved MLIR buildbot config under buildbot/google/mlir.

Phabricator seems to be confused by the renames. The change just moved terraform -> terraform/buildbot-mlir-nvidia
The directory structure looks like this now:

Jul 24 2020, 12:02 PM
tra updated the diff for D84256: [buildbot] Moved MLIR buildbot config under buildbot/google/mlir.

Really updated directory structure.

Jul 24 2020, 11:54 AM
tra updated the diff for D84256: [buildbot] Moved MLIR buildbot config under buildbot/google/mlir.

Updated directory structure.

Jul 24 2020, 11:53 AM
tra added a comment to D84258: [buildbot] Added config files for CUDA build bots.

I would not run multiple containers on one VM. As you said, k8s cannot share one GPU across containers. I would rather create one "build slave" per VM (or a group of "build slaves" each in a separate VM) in buildbot and then have that VM(s) execute a set of "builders". We could have an m:n mapping of "build slaves" and "builders".

My mlir-nvidia builder is not very picky. It would probably run on any of your machines as long as it has an Nvidia card. Sorry about the non-inclusive wording here, but that's what buildbot calls them in the UI.

Jul 24 2020, 11:18 AM

Jul 23 2020

tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

The problem is not whether we have a solution to tell them but when we need to add that. Not all static device variables need to be visible to the host side. Externalizing them adds overhead for the linker and may pose additional restrictions on aggressive optimizations. Do we have to support every ambiguous usage at the cost of burdening the compiler with changes?

Jul 23 2020, 2:59 PM · Restricted Project
tra added a comment to D84364: [CUDA][HIP] Defer overloading resolution diagnostics for host device functions.
In D84364#2170244, @tra wrote:

I'm going to try the patch on our CUDA code and see how it fares. Stay tuned.

Jul 23 2020, 2:31 PM · Restricted Project
tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.
In D80858#2170547, @tra wrote:

Would it work if we generate globally unique visible aliases for the static vars and use the alias' name to register device-side entities without changing their visibility?

We still need to define how a static device variable should be visible on the host side.

Jul 23 2020, 1:50 PM · Restricted Project
tra added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

Would it work if we generate globally unique visible aliases for the static vars and use the alias' name to register device-side entities without changing their visibility?

Jul 23 2020, 1:02 PM · Restricted Project
tra added a comment to D84364: [CUDA][HIP] Defer overloading resolution diagnostics for host device functions.

I'm going to try the patch on our CUDA code and see how it fares. Stay tuned.

Jul 23 2020, 11:12 AM · Restricted Project