
yaxunl (Yaxun Liu)
User

Projects

User does not belong to any projects.

User Details

User Since
May 13 2015, 10:16 AM (191 w, 6 d)

Recent Activity

Fri, Jan 11

yaxunl updated the diff for D56318: [HIP] Fix size_t for MSVC environment.

Copy type information from AuxTarget.

Fri, Jan 11, 11:10 AM

Thu, Jan 10

yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

How about creating a ForwardingTargetInfo which has a pointer to AuxTarget and forwards to that target if it is not null? Then let AMDGPUTargetInfo inherit from it.

Why forward? You have, like, two supported host environments, right? Can you just make one subclass of each, MicrosoftX86_64TargetInfo and X86_64TargetInfo?

If that's unreasonable and you do need to forward, having a ForwardingTargetInfo sounds like a good idea, although I think you should require it to have an underlying target, and I think you need it to copy all the fields of that target.
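A minimal C++ sketch of the forwarding idea, assuming a heavily simplified, hypothetical TargetModel struct in place of clang::TargetInfo (the field names and values are illustrative, not Clang's). The device target takes its host target by reference, so an underlying target must always exist, and it copies every layout field before overriding anything GPU-specific.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified stand-in for clang::TargetInfo: only a few
// layout-related fields are modelled here.
struct TargetModel {
  std::string Name;
  unsigned PointerWidth = 64;
  unsigned LongWidth = 64;   // bits in 'long'
  unsigned SizeTWidth = 64;  // bits in 'size_t'
  bool UseMicrosoftBitfieldLayout = false;
};

// Illustrative host targets: x86-64 Linux (LP64) and x86-64 MSVC (LLP64).
TargetModel makeX86_64() {
  TargetModel T;
  T.Name = "x86_64-linux";
  return T;
}

TargetModel makeMicrosoftX86_64() {
  TargetModel T;
  T.Name = "x86_64-windows-msvc";
  T.LongWidth = 32;  // LLP64: long stays 32-bit on Windows
  T.UseMicrosoftBitfieldLayout = true;
  return T;
}

// Device target that requires a host target (a reference cannot be null) and
// copies every field from it via the base copy constructor, then overrides
// only what is GPU-specific (here, just the name).
struct ForwardingDeviceTarget : TargetModel {
  explicit ForwardingDeviceTarget(const TargetModel &Host) : TargetModel(Host) {
    Name = "amdgcn (host: " + Host.Name + ")";
  }
};
```

With an MSVC host the device side then sees a 32-bit long and a 64-bit size_t, matching the host, while a Linux host yields a 64-bit long.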

Thu, Jan 10, 1:03 PM
yaxunl committed rC350885: [HIP] Use nul instead of /dev/null when running on windows.
[HIP] Use nul instead of /dev/null when running on windows
Thu, Jan 10, 12:15 PM
yaxunl committed rL350885: [HIP] Use nul instead of /dev/null when running on windows.
[HIP] Use nul instead of /dev/null when running on windows
Thu, Jan 10, 12:15 PM
yaxunl closed D56225: [HIP] Use nul instead of /dev/null when running on windows.
Thu, Jan 10, 12:15 PM
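The nul vs /dev/null choice can be sketched as a small helper; nullDevicePath and defaultNullDevicePath are hypothetical names, not the functions used in the HIP toolchain code. Windows reserves the device name nul, and /dev/null does not exist there.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the path a tool invocation can write to when its
// output should be discarded.
std::string nullDevicePath(bool HostIsWindows) {
  return HostIsWindows ? "nul" : "/dev/null";
}

// Picks the answer for the platform this code is compiled on.
std::string defaultNullDevicePath() {
#ifdef _WIN32
  return nullDevicePath(true);
#else
  return nullDevicePath(false);
#endif
}
```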
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

Thu, Jan 10, 10:49 AM
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

If I was only concerned about size_t, your current solution would be fine. My concern is that you really need to match *all* of the associated CPU target's ABI choices, so your target really ought to be forwarding everything to that target by default and only selectively overriding it in order to support GPU-specific features. Probably the easiest way to do that is via inheritance.

Thu, Jan 10, 10:30 AM
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

Okay. Is there a reasonable way to make your targets delegate to a different TargetInfo implementation for most things so that you can generally match the host target for things like type sizes and alignments?

Thu, Jan 10, 7:18 AM
yaxunl added inline comments to D56225: [HIP] Use nul instead of /dev/null when running on windows.
Thu, Jan 10, 7:07 AM
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

This patch still doesn't make any sense. You don't need to do any special validation when passing a function as a template argument. When Sema instantiates the template definition, it'll rebuild the expressions that refer to the template parameter, which will trigger the normal checking for whether those expressions are illegally referencing a host function from the device, etc. All you need to do is suppress that checking (whether it happens in a template definition or not) for references from non-potentially-evaluated contexts.

Thu, Jan 10, 4:49 AM

Wed, Jan 9

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

But why? Why do you want to limit this to just template arguments instead of all sorts of similar contexts?

Wed, Jan 9, 4:56 PM
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Disable the check for more general cases.

Wed, Jan 9, 4:28 PM
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Sema won't necessarily have resolved a template decl when parsing a template argument list, so trying to propagate that decl down to indicate that we're resolving a template argument is not a good approach.

I was going to suggest recording that we're within a template argument in the current ExpressionEvaluationContextRecord, but in fact there's an even simpler and more general solution: there's no reason to enforce this restriction in *any* unevaluated context. If someone wants to refer to a device function within a decltype or sizeof operand, that should be fine. So you should just conditionalize the diagnostic on whether this is within an unevaluated context.
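Plain C++ already shows why unevaluated contexts are harmless: an operand of sizeof or decltype is never evaluated, so a function named there is neither called nor odr-used and does not even need a definition. The sketch uses a hypothetical declared-only deviceOnly() as a stand-in for a __device__ function seen from host code; it does not use the CUDA attributes themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Declared but never defined: a stand-in for a function that must not be
// called from the current side (e.g. a __device__ function in host code).
int deviceOnly();

// sizeof and decltype operands are unevaluated, so deviceOnly() is not called
// or odr-used here, and the program links without a definition.
constexpr std::size_t kResultSize = sizeof(deviceOnly());
using ResultType = decltype(deviceOnly());

static_assert(std::is_same<ResultType, int>::value, "decltype sees int");
static_assert(kResultSize == sizeof(int), "sizeof sees int");
```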

Wed, Jan 9, 11:28 AM
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Passing template decl by ExpressionEvaluationContextRecord.

Wed, Jan 9, 11:23 AM
yaxunl updated the diff for D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Add test for __host__ __device__.
Remove the flag IsParsingTemplateArgument from Sema. Instead, check ExprEvalContexts
to decide whether to disable the device/host consistency check.
I did not use ExprEvalContext Unevaluated to condition the check because
the issue happens with ExprEvalContext ConstantEvaluated. Also, we do not want to
remove the check entirely; we just want to defer it until the argument evaluation is done.
When the deferred check is performed, ExprEvalContext is still in ConstantEvaluated but
its kind is no longer EK_TemplateArgument, therefore we can use the expr kind to condition
the check.
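A simplified model of this deferral; EvalContextKind, ExprKind, EvalContext and shouldCheckHostDeviceCall are hypothetical names that only echo Clang's and are not its actual Sema declarations. The decision is keyed on the expression kind rather than on ConstantEvaluated, because the context is still ConstantEvaluated when the deferred check runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the expression-evaluation context stack.
enum class EvalContextKind { Unevaluated, ConstantEvaluated, PotentiallyEvaluated };
enum class ExprKind { Other, TemplateArgument };

struct EvalContext {
  EvalContextKind Context;
  ExprKind Kind;
};

// While the innermost context is still processing a template argument, the
// host/device consistency check is deferred. Once the argument is done
// (Kind is no longer TemplateArgument) the check runs, even if the context
// is still ConstantEvaluated.
bool shouldCheckHostDeviceCall(const std::vector<EvalContext> &Stack) {
  assert(!Stack.empty() && "Sema always has at least one context");
  return Stack.back().Kind != ExprKind::TemplateArgument;
}
```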

Wed, Jan 9, 8:41 AM
yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.
__host__ void bar() {}
__device__ int bar() { return 0; }
__host__ __device__ void foo() { int x = bar(); }
template <void (*devF)()> __global__ void kernel() { devF();}

kernel<foo><<<1, 1>>>();

We DTRT (do the right thing) for this case. Here host bar needs to return int since foo() expects that. Will add a test for that.

__host__ bar() should not need to return int if foo is inline (or templated), because then we should never codegen foo for host. I guess my question is, we should be sure that kernel<foo>() does not force an inline/templated foo to be codegen'ed for host. (Sorry that wasn't more clear before.)

Wed, Jan 9, 7:59 AM

Tue, Jan 8

yaxunl added a comment to D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.

Without reading the patch in detail (sorry) but looking mainly at the testcase: It looks like we're not checking how overloading and __host__ __device__ functions play into this. Maybe there are some additional edge-cases to explore/check.

Will add a test for __host__ __device__.

Tue, Jan 8, 1:52 PM

Mon, Jan 7

yaxunl created D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter.
Mon, Jan 7, 2:05 PM

Fri, Jan 4

yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

No, no, I understand that you're not changing pointer sizes, but this is one example of trying to match the ABI of the target environment, and I'm trying to understand how far that goes. What does it mean to be in the "MSVC" environment when you're actually just compiling for the GPU? Why are you using OS headers in the first place? Do you need struct layout to match MSVC (presumably on x86-64)? What should happen with the C++ ABI?

Fri, Jan 4, 11:30 AM
yaxunl accepted D56321: [HIP][DRIVER][OFFLOAD] Do not unbundle unsupported file types.

LGTM. Can you rename the lit test as hip-link-shared-library.hip? That is more meaningful. Thanks.

Fri, Jan 4, 11:04 AM
yaxunl added a comment to D56318: [HIP] Fix size_t for MSVC environment.

What's the general idea here, that you're going to pretend to be the environment's "standard" CPU target of the right pointer width and try to match the ABI exactly? This seems like a pretty treacherous road to go down.

Fri, Jan 4, 10:50 AM
yaxunl added a reviewer for D56225: [HIP] Use nul instead of /dev/null when running on windows: rjmccall.
Fri, Jan 4, 6:56 AM
yaxunl created D56318: [HIP] Fix size_t for MSVC environment.
Fri, Jan 4, 6:50 AM

Wed, Jan 2

yaxunl created D56225: [HIP] Use nul instead of /dev/null when running on windows.
Wed, Jan 2, 2:12 PM

Dec 13 2018

yaxunl accepted D55663: [CUDA] Make all host-side shadows of device-side variables undef..

LGTM. Thanks!

Dec 13 2018, 1:39 PM

Dec 5 2018

yaxunl added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

I think if we can just declare something simple to follow that doesn't depend on the IR type alignment, we could pack any basic type and align any aggregate to 4 bytes.
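One possible reading of the proposed rule, assuming "pack" means no padding before scalar arguments and "align to 4" means a 4-byte boundary for aggregates; KernelArg and kernelArgOffsets are hypothetical, and this is not what the patch implements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one kernel argument.
struct KernelArg {
  std::size_t Size;  // size in bytes
  bool IsAggregate;  // struct/union/array vs. basic type
};

std::size_t alignTo(std::size_t Offset, std::size_t Align) {
  return (Offset + Align - 1) / Align * Align;
}

// Basic types are packed (no padding); aggregates start at a 4-byte boundary.
std::vector<std::size_t> kernelArgOffsets(const std::vector<KernelArg> &Args) {
  std::vector<std::size_t> Offsets;
  std::size_t Offset = 0;
  for (const KernelArg &A : Args) {
    if (A.IsAggregate)
      Offset = alignTo(Offset, 4);
    Offsets.push_back(Offset);
    Offset += A.Size;
  }
  return Offsets;
}
```

For the kernel(char, S, int*) signature in the Nov 29 IR example, with the packed %struct.S taking 16 bytes and 8-byte pointers, this rule places the arguments at offsets 0, 4 and 20.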

Dec 5 2018, 7:25 PM

Dec 4 2018

yaxunl added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.
Dec 4 2018, 12:04 PM

Nov 30 2018

yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

You still have the same linkage model for those other languages, right? Ultimately there's something like a kernel function that has to be declared specially and which becomes the only public interface of the code?

I don't think this is true for all languages targeting AMDGPU. For example, HIP has APIs like hipMemcpyToSymbol which accept a string identifying a symbol for a variable in global/constant address space. @yaxunl is my understanding about the HIP runtime requiring dynamic symbols for global variables correct here?

Nov 30 2018, 12:17 PM

Nov 29 2018

yaxunl added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

This seems backwards. Clang knows what the actual ABI alignment of the C type is, and it doesn't have to match the alignment of the IR type that IRGen produces. It's the actual C ABI alignment that's supposed to affect the calling convention, so there needs to be some way to specify the C ABI alignment on the parameter in IR. That may mean using byval, which can be given an explicit alignment.

The AMDGPU backend does not support passing a struct-type kernel argument by pointer with the byval attribute.

@arsenm Do you think it is feasible to use pointer with byval attribute to pass the struct type kernel argument? Thanks.

byval really doesn't make any sense for kernels. The address space wouldn't even match correctly, since byval is for stack-passed arguments.

Nov 29 2018, 1:16 PM
yaxunl added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

This seems backwards. Clang knows what the actual ABI alignment of the C type is, and it doesn't have to match the alignment of the IR type that IRGen produces. It's the actual C ABI alignment that's supposed to affect the calling convention, so there needs to be some way to specify the C ABI alignment on the parameter in IR. That may mean using byval, which can be given an explicit alignment.

Also, the alignment of the kernel argument is only used to place the argument in a buffer accessible to the device. It need not match the C ABI alignment. If the user takes the address of this argument, they get the address of an alloca, which matches the C ABI alignment.

For example, in code below, %s.coerce is OK to align to 1. If user takes its address, they get %s, which is aligned to 8.

%struct.S = type <{ i32*, i8, %struct.U, [5 x i8] }>
%struct.U = type { i16 }

; Function Attrs: convergent noinline nounwind optnone
define amdgpu_kernel void @_Z6kernelc1SPi(i8 %a, %struct.S %s.coerce, i32* %b) #0 {
entry:
  %s = alloca %struct.S, align 8, addrspace(5)
  %s1 = addrspacecast %struct.S addrspace(5)* %s to %struct.S*
  %a.addr = alloca i8, align 1, addrspace(5)
  %0 = addrspacecast i8 addrspace(5)* %a.addr to i8*
  %b.addr = alloca i32*, align 8, addrspace(5)
  %1 = addrspacecast i32* addrspace(5)* %b.addr to i32**
  store %struct.S %s.coerce, %struct.S* %s1, align 8
  store i8 %a, i8* %0, align 1
  store i32* %b, i32** %1, align 8
  ret void
}
Nov 29 2018, 12:44 PM
yaxunl added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

This seems backwards. Clang knows what the actual ABI alignment of the C type is, and it doesn't have to match the alignment of the IR type that IRGen produces. It's the actual C ABI alignment that's supposed to affect the calling convention, so there needs to be some way to specify the C ABI alignment on the parameter in IR. That may mean using byval, which can be given an explicit alignment.

Nov 29 2018, 12:36 PM
yaxunl added a comment to D55067: [HIP] Fix offset of kernel argument for AMDGPU target.

This seems backwards. Clang knows what the actual ABI alignment of the C type is, and it doesn't have to match the alignment of the IR type that IRGen produces. It's the actual C ABI alignment that's supposed to affect the calling convention, so there needs to be some way to specify the C ABI alignment on the parameter in IR. That may mean using byval, which can be given an explicit alignment.

Nov 29 2018, 12:17 PM
yaxunl created D55067: [HIP] Fix offset of kernel argument for AMDGPU target.
Nov 29 2018, 11:20 AM

Nov 16 2018

yaxunl added a comment to D51341: [HEADER] Overloadable function candidates for half/double types.

@Anastasia @yaxunl
Hi, I am working on generalizing this patch and several questions have raised during this, so I want to discuss them with you:

  1. Should #pragma OPENCL EXTENSION ext_name : begin enable the extension as well? Currently it does not, for example:

    #pragma OPENCL EXTENSION cl_khr_fp16 : enable
    half __attribute__((overloadable)) goo(half in1, half in2); // all ok
    #pragma OPENCL EXTENSION cl_khr_fp16 : disable

    #pragma OPENCL EXTENSION cl_khr_fp16 : begin
    half __attribute__((overloadable)) goo(half in1, half in2); // declaring function parameter of type 'half' is not allowed; did you forget * ?
    #pragma OPENCL EXTENSION cl_khr_fp16 : end
Nov 16 2018, 6:11 AM

Nov 13 2018

yaxunl committed rL346828: [HIP] Fix device only compilation.
[HIP] Fix device only compilation
Nov 13 2018, 8:50 PM
yaxunl committed rC346828: [HIP] Fix device only compilation.
[HIP] Fix device only compilation
Nov 13 2018, 8:50 PM
yaxunl closed D54496: [HIP] Fix device only compilation.
Nov 13 2018, 8:50 PM
yaxunl added a comment to D54496: [HIP] Fix device only compilation.
In D54496#1297710, @tra wrote:

Do I understand it correctly that the bug appears to affect HIP compilation only?

Nov 13 2018, 4:04 PM
yaxunl created D54496: [HIP] Fix device only compilation.
Nov 13 2018, 2:03 PM

Nov 9 2018

yaxunl committed rL346536: [HIP] Remove useless sections in linked files.
[HIP] Remove useless sections in linked files
Nov 9 2018, 10:54 AM
yaxunl committed rC346536: [HIP] Remove useless sections in linked files.
[HIP] Remove useless sections in linked files
Nov 9 2018, 10:54 AM
yaxunl closed D54275: [HIP] Remove useless sections in linked files.
Nov 9 2018, 10:54 AM

Nov 8 2018

yaxunl created D54275: [HIP] Remove useless sections in linked files.
Nov 8 2018, 2:04 PM
yaxunl committed rL346413: Fix bitcast to address space cast for coerced load/stores.
Fix bitcast to address space cast for coerced load/stores
Nov 8 2018, 8:58 AM
yaxunl committed rC346413: Fix bitcast to address space cast for coerced load/stores.
Fix bitcast to address space cast for coerced load/stores
Nov 8 2018, 8:58 AM
yaxunl closed D53780: Fix bitcast to address space cast for coerced load/stores.
Nov 8 2018, 8:58 AM

Nov 6 2018

yaxunl created D54183: [HIP] Change default optimization level to -O3.
Nov 6 2018, 1:59 PM
yaxunl committed rL346267: AMDGPU: Add an option -disable-promote-alloca-to-lds.
AMDGPU: Add an option -disable-promote-alloca-to-lds
Nov 6 2018, 1:30 PM
yaxunl closed D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.
Nov 6 2018, 1:30 PM
yaxunl updated the diff for D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.

Add a test.

Nov 6 2018, 12:57 PM
yaxunl updated the diff for D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.

Revised per Matt's comments.

Nov 6 2018, 11:26 AM
yaxunl created D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.
Nov 6 2018, 9:35 AM

Oct 27 2018

yaxunl added a reviewer for D53780: Fix bitcast to address space cast for coerced load/stores: rjmccall.
Oct 27 2018, 6:12 AM

Oct 23 2018

yaxunl accepted D53558: Add gfx909 to GPU Arch.

LGTM

Oct 23 2018, 4:30 AM

Oct 22 2018

yaxunl committed rC344996: Add gfx904 and gfx906 to GPU Arch.
Add gfx904 and gfx906 to GPU Arch
Oct 22 2018, 7:07 PM
yaxunl committed rL344996: Add gfx904 and gfx906 to GPU Arch.
Add gfx904 and gfx906 to GPU Arch
Oct 22 2018, 7:07 PM
yaxunl closed D53472: Add gfx904 and gfx906 to GPU Arch.
Oct 22 2018, 7:07 PM

Oct 20 2018

yaxunl created D53472: Add gfx904 and gfx906 to GPU Arch.
Oct 20 2018, 5:04 AM

Oct 16 2018

yaxunl committed rL344665: AMDGPU: add __builtin_amdgcn_update_dpp.
AMDGPU: add __builtin_amdgcn_update_dpp
Oct 16 2018, 7:34 PM
yaxunl committed rC344665: AMDGPU: add __builtin_amdgcn_update_dpp.
AMDGPU: add __builtin_amdgcn_update_dpp
Oct 16 2018, 7:34 PM
yaxunl closed D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Oct 16 2018, 7:34 PM
yaxunl added a comment to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

Brian confirmed that the extra argument for DPP mov should be the first one, so mov_dpp(x, ...) --> update_dpp(undef, x, ...). I will fix that when committing.
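The argument mapping can be sketched with operands modelled as strings; movDppToUpdateDppArgs is a hypothetical helper, not the actual builtin lowering code. A placeholder old value goes first, so x becomes the second operand and the DPP control operands follow unchanged.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: rewrites the operand list of mov_dpp(x, ctrl...) into
// the operand list of update_dpp(undef, x, ctrl...).
std::vector<std::string> movDppToUpdateDppArgs(const std::vector<std::string> &MovArgs) {
  assert(!MovArgs.empty() && "mov_dpp needs at least its source operand");
  std::vector<std::string> UpdateArgs;
  UpdateArgs.reserve(MovArgs.size() + 1);
  UpdateArgs.push_back("undef");  // the 'old' value comes first
  UpdateArgs.insert(UpdateArgs.end(), MovArgs.begin(), MovArgs.end());
  return UpdateArgs;
}
```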

Oct 16 2018, 2:32 PM
yaxunl updated the summary of D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Oct 16 2018, 1:24 PM
yaxunl updated the diff for D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

Emit llvm.amdgcn.update.dpp for __builtin_amdgcn_mov_dpp.

Oct 16 2018, 1:23 PM
yaxunl updated the diff for D53295: Mark store and load of block invoke function as invariant.group.

Mark store and load of block invoke function as invariant.group.

Oct 16 2018, 11:15 AM
yaxunl committed rC344630: Disable code object version 3 for HIP toolchain.
Disable code object version 3 for HIP toolchain
Oct 16 2018, 10:38 AM
yaxunl committed rL344630: Disable code object version 3 for HIP toolchain.
Disable code object version 3 for HIP toolchain
Oct 16 2018, 10:38 AM
yaxunl closed D53325: Disable code object version 3 for HIP toolchain.
Oct 16 2018, 10:38 AM
yaxunl created D53325: Disable code object version 3 for HIP toolchain.
Oct 16 2018, 8:54 AM
yaxunl added inline comments to D53295: Mark store and load of block invoke function as invariant.group.
Oct 16 2018, 4:22 AM

Oct 15 2018

yaxunl created D53295: Mark store and load of block invoke function as invariant.group.
Oct 15 2018, 11:12 AM

Oct 12 2018

yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default.

Visibility is meaningless for an internal-linkage declaration.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

-fvisibility is not governed by the OpenCL specification; it's a Clang / GCC extension, and we get to determine its semantics, which generally override the rules from the language model we're implementing.

I'm a little surprised that symbol visibility is even meaningful when compiling GPU code. Can you give some more background on that?

To the extent that visibility is meaningful, it does seem to make sense for kernel to imply default visibility in the same way an attribute would, since it's an unambiguous marker that the function is intended to be called from outside the DSO. It's less obvious to me that the same is true for global variables; is there no distinction in OpenCL between global variables that can be accessed by the host and global variables that are internal to the device-side computation?

This makes more sense, thank you for the explanation. Ignore my last post, it makes sense that visibility is defaulted and ignored when a symbol is assigned internal linkage.

I am still not confident about globals; maybe @b-sumner has more insight? We have APIs for accessing global variable symbols from the host, but this may be AMD-specific, not general to OpenCL.
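A simplified model of the rule under discussion, assuming only kernels get the exception (global variables are left out since they are still an open question here); Symbol and effectiveVisibility are hypothetical and do not mirror LinkageComputer. Internal linkage makes visibility moot, an explicit attribute still wins, kernels get default visibility, and everything else follows -fvisibility.

```cpp
#include <cassert>

enum class Visibility { Default, Hidden };
enum class Linkage { Internal, External };

// Hypothetical summary of a declaration for visibility purposes.
struct Symbol {
  Linkage Link;
  bool IsKernel;
  bool HasVisibilityAttr;
  Visibility AttrVisibility;  // meaningful only if HasVisibilityAttr
};

Visibility effectiveVisibility(const Symbol &S, Visibility FVisibility) {
  if (S.Link == Linkage::Internal)
    return Visibility::Default;  // moot for internal linkage; reported as default
  if (S.HasVisibilityAttr)
    return S.AttrVisibility;     // explicit __attribute__((visibility)) wins
  if (S.IsKernel)
    return Visibility::Default;  // entry point called from outside the DSO
  return FVisibility;            // everything else follows -fvisibility
}
```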

Oct 12 2018, 12:45 PM
yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default. In cases where individual symbols need unique visibility an __attribute__ can be used.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

Oct 12 2018, 12:02 PM
yaxunl added a reviewer for D53153: [OpenCL] Mark kernel functions with default visibility: rjmccall.
Oct 12 2018, 11:57 AM
yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

This approach tries to make OpenCL kernels and variables exceptions to the -fvisibility option. However, it does not give users a choice: what if a user really wants to change the visibility of kernels and variables with -fvisibility? I think it is more of a hack compared to https://reviews.llvm.org/D52891.

Oct 12 2018, 11:03 AM

Oct 10 2018

yaxunl accepted D52673: [HIP] Remove disabled irif library.

LGTM. Thanks!

Oct 10 2018, 7:28 PM
yaxunl added a comment to D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions.

I will update the patch to modify the HIP toolchain and to add tests for global variables.

As far as the semantics are concerned, are we OK with this being AMDGPU-only? I do not see a way to determine what a "kernel" is in a language-agnostic way other than checking for our AMDGPU-specific calling convention. If there is a more general mechanism, this could be implemented in LinkageComputer::getLVForNamespaceScopeDecl instead. As it stands, it sounds like the option should stay AMDGPU-specific while omitting "amdgpu" from its name; is that the preference?

Oct 10 2018, 9:17 AM

Oct 9 2018

yaxunl added inline comments to D52673: [HIP] Remove disabled irif library.
Oct 9 2018, 11:19 AM
yaxunl committed rC344057: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
[CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors
Oct 9 2018, 8:55 AM
yaxunl committed rL344057: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
[CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors
Oct 9 2018, 8:55 AM
yaxunl closed D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Oct 9 2018, 8:55 AM
yaxunl added inline comments to D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Oct 9 2018, 8:25 AM

Oct 5 2018

yaxunl updated the diff for D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.

Fix a typo.

Oct 5 2018, 12:01 PM
yaxunl added a comment to D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions.

Can you also fix HIP toolchain? It is in HIPToolChain::addClangTargetOptions. Thanks.

Oct 5 2018, 10:51 AM
yaxunl added a comment to D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions.

I think the name needs work, but I'm not sure what it should be. I think it should avoid using "non" and "amdgpu".

Oct 5 2018, 10:45 AM
yaxunl updated the summary of D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Oct 5 2018, 9:37 AM
yaxunl updated the diff for D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.

Revised per Justin's comments.

Oct 5 2018, 9:35 AM

Oct 3 2018

yaxunl added a comment to D52658: [OpenCL] Remove PIPE_RESERVE_ID_VALID_BIT from opencl-c.h.

I am OK with the change.

Oct 3 2018, 10:44 AM

Oct 2 2018

yaxunl committed rL343611: [HIP] Support early finalization of device code for -fno-gpu-rdc.
[HIP] Support early finalization of device code for -fno-gpu-rdc
Oct 2 2018, 10:53 AM
yaxunl committed rC343611: [HIP] Support early finalization of device code for -fno-gpu-rdc.
[HIP] Support early finalization of device code for -fno-gpu-rdc
Oct 2 2018, 10:53 AM
yaxunl closed D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.
Oct 2 2018, 10:53 AM
yaxunl updated the diff for D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.

Added -f{no}-cuda-rdc as an alias for -f{no}-gpu-rdc.

Oct 2 2018, 7:59 AM

Oct 1 2018

yaxunl updated the diff for D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.

Uses -fno-gpu-rdc for early finalization.

Oct 1 2018, 10:52 AM
yaxunl added a comment to D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.
In D52377#1242547, @tra wrote:

Overall the patch looks OK. I'll take a closer look on Monday.

Which mode do you expect will be most commonly used for HIP by default? With this patch we'll have two different ways to do similar things in HIP vs. CUDA.
E.g. by default CUDA compiles the GPU code in each TU into a complete executable and requires -fcuda-rdc to compile to a GPU object file.
HIP defaults to object-file compilation and requires --hip-early-finalize to match CUDA's default behavior.

I wonder if it would make sense to provide a single way to control this behavior. E.g. -fgpu-rdc (an alias for -fcuda-rdc, perhaps?) would be enabled by default in HIP but disabled in CUDA, and -fno-gpu-rdc would force 'whole GPU executable per TU' mode.
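A sketch of the single-switch proposal, assuming -fcuda-rdc/-fno-cuda-rdc act as aliases, the last flag on the command line wins, and the default is on for HIP and off for CUDA as described here; OffloadLang and resolveGpuRdc are hypothetical and are not the Clang driver's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class OffloadLang { CUDA, HIP };

// Returns true for relocatable device code (GPU object file per TU), false
// for 'whole GPU executable per TU' mode.
bool resolveGpuRdc(OffloadLang Lang, const std::vector<std::string> &Args) {
  bool Rdc = (Lang == OffloadLang::HIP);  // language-dependent default
  for (const std::string &A : Args) {
    if (A == "-fgpu-rdc" || A == "-fcuda-rdc")
      Rdc = true;
    else if (A == "-fno-gpu-rdc" || A == "-fno-cuda-rdc")
      Rdc = false;  // the last matching flag wins
  }
  return Rdc;
}
```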

Oct 1 2018, 8:26 AM
yaxunl added a comment to D52673: [HIP] Remove disabled irif library.

This seems to be a duplicate of https://reviews.llvm.org/D51857

Oct 1 2018, 7:33 AM

Sep 21 2018

yaxunl updated the diff for D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.

Fix comments.

Sep 21 2018, 1:50 PM
yaxunl created D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.
Sep 21 2018, 1:27 PM

Sep 20 2018

yaxunl created D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Sep 20 2018, 12:50 PM
yaxunl added a comment to D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.

ping

Sep 20 2018, 9:05 AM

Sep 14 2018

yaxunl added a comment to D43783: [OpenCL] Remove block invoke function from emitted block literal struct.

Ping! Do you still plan to do this? :)

Sep 14 2018, 10:31 AM