yaxunl (Yaxun Liu)
User

Projects

User does not belong to any projects.

User Details

User Since
May 13 2015, 10:16 AM (183 w, 4 d)

Recent Activity

Fri, Nov 16

yaxunl added a comment to D51341: [HEADER] Overloadable function candidates for half/double types.

@Anastasia @yaxunl
Hi, I am working on generalizing this patch and several questions have raised during this, so I want to discuss them with you:

  1. Should #pragma OPENCL EXTENSION ext_name : begin enables the extension as well? For now I see it's not, as an example:

    ` #pragma OPENCL EXTENSION cl_khr_fp16 : enable half attribute((overloadable)) goo(half in1, half in2); // all ok #pragma OPENCL EXTENSION cl_khr_fp16 : disable

    #pragma OPENCL EXTENSION cl_khr_fp16 : begin half attribute((overloadable)) goo(half in1, half in2); // declaring function parameter of type 'half' is not allowed; did you forget * ? #pragma OPENCL EXTENSION cl_khr_fp16 : end `
Fri, Nov 16, 6:11 AM

Tue, Nov 13

yaxunl committed rL346828: [HIP] Fix device only compilation.
[HIP] Fix device only compilation
Tue, Nov 13, 8:50 PM
yaxunl committed rC346828: [HIP] Fix device only compilation.
[HIP] Fix device only compilation
Tue, Nov 13, 8:50 PM
yaxunl closed D54496: [HIP] Fix device only compilation.
Tue, Nov 13, 8:50 PM
yaxunl added a comment to D54496: [HIP] Fix device only compilation.
In D54496#1297710, @tra wrote:

Do I understand it correctly that the bug appears to affect HIP compilation only?

Tue, Nov 13, 4:04 PM
yaxunl created D54496: [HIP] Fix device only compilation.
Tue, Nov 13, 2:03 PM

Fri, Nov 9

yaxunl committed rL346536: [HIP] Remove useless sections in linked files.
[HIP] Remove useless sections in linked files
Fri, Nov 9, 10:54 AM
yaxunl committed rC346536: [HIP] Remove useless sections in linked files.
[HIP] Remove useless sections in linked files
Fri, Nov 9, 10:54 AM
yaxunl closed D54275: [HIP] Remove useless sections in linked files.
Fri, Nov 9, 10:54 AM

Thu, Nov 8

yaxunl created D54275: [HIP] Remove useless sections in linked files.
Thu, Nov 8, 2:04 PM
yaxunl committed rL346413: Fix bitcast to address space cast for coerced load/stores .
Fix bitcast to address space cast for coerced load/stores
Thu, Nov 8, 8:58 AM
yaxunl committed rC346413: Fix bitcast to address space cast for coerced load/stores .
Fix bitcast to address space cast for coerced load/stores
Thu, Nov 8, 8:58 AM
yaxunl closed D53780: Fix bitcast to address space cast for coerced load/stores.
Thu, Nov 8, 8:58 AM

Tue, Nov 6

yaxunl created D54183: [HIP] Change default optimization level to -O3.
Tue, Nov 6, 1:59 PM
yaxunl committed rL346267: AMDGPU: Add an option -disable-promote-alloca-to-lds.
AMDGPU: Add an option -disable-promote-alloca-to-lds
Tue, Nov 6, 1:30 PM
yaxunl closed D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.
Tue, Nov 6, 1:30 PM
yaxunl updated the diff for D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.

add a test

Tue, Nov 6, 12:57 PM
yaxunl updated the diff for D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.

Revised by Matt's comments.

Tue, Nov 6, 11:26 AM
yaxunl created D54158: AMDGPU: Add an option -disable-promote-alloca-to-lds.
Tue, Nov 6, 9:35 AM

Sat, Oct 27

yaxunl added a reviewer for D53780: Fix bitcast to address space cast for coerced load/stores: rjmccall.
Sat, Oct 27, 6:12 AM

Tue, Oct 23

yaxunl accepted D53558: Add gfx909 to GPU Arch.

LGTM

Tue, Oct 23, 4:30 AM

Mon, Oct 22

yaxunl committed rC344996: Add gfx904 and gfx906 to GPU Arch.
Add gfx904 and gfx906 to GPU Arch
Mon, Oct 22, 7:07 PM
yaxunl committed rL344996: Add gfx904 and gfx906 to GPU Arch.
Add gfx904 and gfx906 to GPU Arch
Mon, Oct 22, 7:07 PM
yaxunl closed D53472: Add gfx904 and gfx906 to GPU Arch.
Mon, Oct 22, 7:07 PM

Oct 20 2018

yaxunl created D53472: Add gfx904 and gfx906 to GPU Arch.
Oct 20 2018, 5:04 AM

Oct 16 2018

yaxunl committed rL344665: AMDGPU: add __builtin_amdgcn_update_dpp.
AMDGPU: add __builtin_amdgcn_update_dpp
Oct 16 2018, 7:34 PM
yaxunl committed rC344665: AMDGPU: add __builtin_amdgcn_update_dpp.
AMDGPU: add __builtin_amdgcn_update_dpp
Oct 16 2018, 7:34 PM
yaxunl closed D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Oct 16 2018, 7:34 PM
yaxunl added a comment to D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

Brian checked the extra argument for dpp mov should be the first one. so mov_dpp(x,...) --> update_dpp(undef, x, ...). I will fix that when committing.

Oct 16 2018, 2:32 PM
yaxunl updated the summary of D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Oct 16 2018, 1:24 PM
yaxunl updated the diff for D52320: AMDGPU: add __builtin_amdgcn_update_dpp.

emit llvm.amdgcn.update.dpp for __builtin_amdgcn_mov_dpp.

Oct 16 2018, 1:23 PM
yaxunl updated the diff for D53295: Mark store and load of block invoke function as invariant.group.

mark store and load of block invoke function as invariant.group.

Oct 16 2018, 11:15 AM
yaxunl committed rC344630: Disable code object version 3 for HIP toolchain.
Disable code object version 3 for HIP toolchain
Oct 16 2018, 10:38 AM
yaxunl committed rL344630: Disable code object version 3 for HIP toolchain.
Disable code object version 3 for HIP toolchain
Oct 16 2018, 10:38 AM
yaxunl closed D53325: Disable code object version 3 for HIP toolchain.
Oct 16 2018, 10:38 AM
yaxunl created D53325: Disable code object version 3 for HIP toolchain.
Oct 16 2018, 8:54 AM
yaxunl added inline comments to D53295: Mark store and load of block invoke function as invariant.group.
Oct 16 2018, 4:22 AM

Oct 15 2018

yaxunl created D53295: Mark store and load of block invoke function as invariant.group.
Oct 15 2018, 11:12 AM

Oct 12 2018

yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default.

Visibility is meaningless for an internal-linkage declaration.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

-fvisibility is not governed by the OpenCL specification; it's a Clang / GCC extension, and we get to determine its semantics, which generally override the rules from the language model we're implementing.

I'm a little surprised that symbol visibility is even meaningful when compiling GPU code. Can you give some more background on that?

To the extent that visibility is meaningful, it does seem to make sense for kernel to imply default visibility in the same way an attribute would, since it's an unambiguous marker that the function is intended to be called from outside the DSO. It's less obvious to me that the same is true for global variables; is there no distinction in OpenCL between global variables that can be accessed by the host and global variables that are internal to the device-side computation?

This makes more sense, thank you for the explanation. Ignore my last post, it makes sense that visibility is defaulted and ignored when a symbol is assigned internal linkage.

I am still not confident about globals; maybe @b-sumner has more insight? We have APIs for accessing global variable symbols form the host, but this may be AMD-specific, not general to OpenCL.

Oct 12 2018, 12:45 PM
yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default. In cases where individual symbols need unique visibility an __attribute__ can be used.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

Oct 12 2018, 12:02 PM
yaxunl added a reviewer for D53153: [OpenCL] Mark kernel functions with default visibility: rjmccall.
Oct 12 2018, 11:57 AM
yaxunl added a comment to D53153: [OpenCL] Mark kernel functions with default visibility.

This approach is trying to make OpenCL kernel and variable exceptions to -fvisibility option. However it does not provide users with choices. What if a user really wants to change the visibility of kernels and variables by -fvisibility? I think is more like a hack compared to https://reviews.llvm.org/D52891

Oct 12 2018, 11:03 AM

Oct 10 2018

yaxunl accepted D52673: [HIP] Remove disabled irif library.

LGTM. Thanks!

Oct 10 2018, 7:28 PM
yaxunl added a comment to D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions.

I will update the patch to modify the HIP toolchain and to add tests for global variables.

As far as the semantics are concerned, are we OK with this being AMDGPU only? I do not see a means of determining what is a "kernel" in a language-agnostic way other than checking for our AMDGPU-specific calling convention. If there is a more general mechanism, this could be implemented in LinkageComputer::getLVForNamespaceScopeDecl instead. As it stands, it sounds like being AMDGPU specific, but omitting amdgpu from the option name is preferred?

Oct 10 2018, 9:17 AM

Oct 9 2018

yaxunl added inline comments to D52673: [HIP] Remove disabled irif library.
Oct 9 2018, 11:19 AM
yaxunl committed rC344057: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
[CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors
Oct 9 2018, 8:55 AM
yaxunl committed rL344057: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
[CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors
Oct 9 2018, 8:55 AM
yaxunl closed D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Oct 9 2018, 8:55 AM
yaxunl added inline comments to D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Oct 9 2018, 8:25 AM

Oct 5 2018

yaxunl updated the diff for D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.

fix a typo.

Oct 5 2018, 12:01 PM
yaxunl added a comment to D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions.

Can you also fix HIP toolchain? It is in HIPToolChain::addClangTargetOptions. Thanks.

Oct 5 2018, 10:51 AM
yaxunl added a comment to D52891: [AMDGPU] Add -fvisibility-amdgpu-non-kernel-functions.

I think the name needs work, but I'm not sure what it should be. I think it should avoid using "non" and "amdgpu"

Oct 5 2018, 10:45 AM
yaxunl updated the summary of D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Oct 5 2018, 9:37 AM
yaxunl updated the diff for D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.

Revised by Justin's comments.

Oct 5 2018, 9:35 AM

Oct 3 2018

yaxunl added a comment to D52658: [OpenCL] Remove PIPE_RESERVE_ID_VALID_BIT from opencl-c.h.

I am OK with the change.

Oct 3 2018, 10:44 AM

Oct 2 2018

yaxunl committed rL343611: [HIP] Support early finalization of device code for -fno-gpu-rdc.
[HIP] Support early finalization of device code for -fno-gpu-rdc
Oct 2 2018, 10:53 AM
yaxunl committed rC343611: [HIP] Support early finalization of device code for -fno-gpu-rdc.
[HIP] Support early finalization of device code for -fno-gpu-rdc
Oct 2 2018, 10:53 AM
yaxunl closed D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.
Oct 2 2018, 10:53 AM
yaxunl updated the diff for D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.

Added -f{no}-cuda-rdc as alias to -f{no}-gpu-rdc.

Oct 2 2018, 7:59 AM

Oct 1 2018

yaxunl updated the diff for D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.

Uses -fno-gpu-rdc for early finalization.

Oct 1 2018, 10:52 AM
yaxunl added a comment to D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.
In D52377#1242547, @tra wrote:

Overall the patch look OK. I'll take a closer look on Monday.

Which mode do you expect will be most commonly used for HIP by default? With this patch we'll have two different ways to do similar things in HIP vs. CUDA.
E.g. by default CUDA compiles GPU code in each TU in a complete executable and requires -fcuda-rdc to compile to GPU object file.
HIP defaults to object-file compilation and requires --hip-early-finalize to match CUDA's default behavior.

I wonder if it would make sense to provide a single way to control this behavior. E.g. --fgpu-rdc (an alias for -cuda-rdc, perhaps?) would default to true in HIP, but disabled in CUDA. -fno-gpu-rdc would force 'whole GPU executable per TU' mode.

Oct 1 2018, 8:26 AM
yaxunl added a comment to D52673: [HIP] Remove disabled irif library.

this seems to be duplicate of https://reviews.llvm.org/D51857

Oct 1 2018, 7:33 AM

Sep 21 2018

yaxunl updated the diff for D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.

Fix comments.

Sep 21 2018, 1:50 PM
yaxunl created D52377: [HIP] Support early finalization of device code for -fno-gpu-rdc.
Sep 21 2018, 1:27 PM

Sep 20 2018

yaxunl created D52320: AMDGPU: add __builtin_amdgcn_update_dpp.
Sep 20 2018, 12:50 PM
yaxunl added a comment to D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.

ping

Sep 20 2018, 9:05 AM

Sep 14 2018

yaxunl added a comment to D43783: [OpenCL] Remove block invoke function from emitted block literal struct.

Ping! Do you still plan to do this? :)

Sep 14 2018, 10:31 AM

Sep 7 2018

yaxunl created D51809: [CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors.
Sep 7 2018, 12:36 PM

Sep 5 2018

yaxunl added a comment to D51411: [OpenCL] Improve diagnostic of argument in address space conversion builtins .

Is this a feature requested by users?

I can understand that this may be useful to pinpoint some conversions which are potentially harmful to performance. However such conversions can often be eliminated by optimization, which makes this warning less useful.

On the other hand, too many warnings is a nuance to users, therefore I am wondering whether this warning should be off by default.

Do you have any chance to have any feedback from users about how useful or intrusive this warning is.

Hi Sam, no feedback from the user. If you think this is not that useful we won't commit or make is off by default as you suggested. Let us know what you think makes more sense.

Sep 5 2018, 7:08 AM

Aug 30 2018

yaxunl committed rC341077: [HIP] Add -fvisibility hidden option to clang.
[HIP] Add -fvisibility hidden option to clang
Aug 30 2018, 8:11 AM
yaxunl committed rL341077: [HIP] Add -fvisibility hidden option to clang.
[HIP] Add -fvisibility hidden option to clang
Aug 30 2018, 8:11 AM
yaxunl closed D51434: [HIP] Add -fvisibility hidden option to clang.
Aug 30 2018, 8:11 AM
yaxunl accepted D51480: AMDGPU: Remove remnants of old address space mapping.

LGTM. except the typos due to script. Thanks a lot for this effort!

Aug 30 2018, 8:01 AM
yaxunl added inline comments to D51434: [HIP] Add -fvisibility hidden option to clang.
Aug 30 2018, 7:42 AM
yaxunl added inline comments to D51434: [HIP] Add -fvisibility hidden option to clang.
Aug 30 2018, 7:37 AM

Aug 29 2018

yaxunl updated the diff for D51434: [HIP] Add -fvisibility hidden option to clang.

Revised by Artem's comments.

Aug 29 2018, 2:42 PM
yaxunl updated the diff for D51434: [HIP] Add -fvisibility hidden option to clang.

Use -fvisibility hidden.

Aug 29 2018, 1:59 PM
yaxunl accepted D51209: AMDGPU: Default to hidden visibility.

LGTM. Thanks!

Aug 29 2018, 1:55 PM
yaxunl committed rL340967: Add predefined macro __gnu_linux__ for proper aux-triple.
Add predefined macro __gnu_linux__ for proper aux-triple
Aug 29 2018, 1:40 PM
yaxunl committed rC340967: Add predefined macro __gnu_linux__ for proper aux-triple.
Add predefined macro __gnu_linux__ for proper aux-triple
Aug 29 2018, 1:40 PM
yaxunl closed D51441: Add predefined macro __gnu_linux__ for proper aux-triple.
Aug 29 2018, 1:40 PM
yaxunl added a comment to D51441: Add predefined macro __gnu_linux__ for proper aux-triple.
In D51441#1218010, @tra wrote:

While we're here, perhaps Builder.defineMacro("__linux__") should be changed to DefineStd("linux") which defines linux/__linux/__linux__?

Will do when committing. Thanks.

Aug 29 2018, 1:17 PM
yaxunl added a comment to D51441: Add predefined macro __gnu_linux__ for proper aux-triple.
In D51441#1218010, @tra wrote:

While we're here, perhaps Builder.defineMacro("__linux__") should be changed to DefineStd("linux") which defines linux/__linux/__linux__?

Aug 29 2018, 11:59 AM
yaxunl added a comment to D51434: [HIP] Add -fvisibility hidden option to clang.

D51209 is the patch. I think HIP will need an additional patch, since I think it isn’t subclassing the amdgpu toolchain

Aug 29 2018, 11:42 AM
yaxunl updated the diff for D51441: Add predefined macro __gnu_linux__ for proper aux-triple.

Revised by Artem's comments.

Aug 29 2018, 11:11 AM
yaxunl added a reviewer for D51434: [HIP] Add -fvisibility hidden option to clang: scott.linder.
Aug 29 2018, 10:31 AM
yaxunl added a comment to D51434: [HIP] Add -fvisibility hidden option to clang.
In D51434#1217772, @tra wrote:

Could you elaborate on what exactly is the problem this patch fixes?
I don't see how internalizing the symbols connects to PLTs. My understanding is that PLTs are used to provide stubs for symbols to be resolved by dynamic linker at runtime. AFAICT AMD does not use shared libs on device side. What do I miss?

Aug 29 2018, 10:12 AM
yaxunl created D51441: Add predefined macro __gnu_linux__ for proper aux-triple.
Aug 29 2018, 10:06 AM
yaxunl created D51434: [HIP] Add -fvisibility hidden option to clang.
Aug 29 2018, 8:52 AM
yaxunl added a comment to D43783: [OpenCL] Remove block invoke function from emitted block literal struct.

Sorry for digging up an old commit...

Apparently this broke block arguments, e.g. the following test case:

int foo(int (^ bl)(void)) {
  return bl();
}

int get21() {
  return foo(^{return 21;});
}

int get42() {
  return foo(^{return 42;});
}

In particular, the VarDecl that getBlockExpr() sees doesn't have an initializer when the called block comes from an argument (causing clang to crash).

Sorry for the delay. I think block should not be allowed as function argument since this generally leads indirect function calls therefore requires support of function pointer. It will rely on optimizations to get rid of indirect function calls.

The idea was to allow blocks as function parameters because they are statically known at each function call. This can be resolved later at IR level instead of frontend. I am also not sure there can be other corner cases where it wouldn't work in Clang since we can't leverage analysis passes here. I feel this might be a risky thing to do in Clang. Currently it doesn't work for the examples provided and therefore breaking the compliance with the spec.

Aug 29 2018, 7:09 AM
yaxunl added inline comments to D51411: [OpenCL] Improve diagnostic of argument in address space conversion builtins .
Aug 29 2018, 6:58 AM
yaxunl added a comment to D51411: [OpenCL] Improve diagnostic of argument in address space conversion builtins .

Is this a feature requested by users?

Aug 29 2018, 6:52 AM

Aug 28 2018

yaxunl committed rL340873: [HIP] Fix output file extension.
[HIP] Fix output file extension
Aug 28 2018, 2:10 PM
yaxunl committed rC340873: [HIP] Fix output file extension.
[HIP] Fix output file extension
Aug 28 2018, 2:10 PM
yaxunl closed D51336: [HIP] Fix output file extension.
Aug 28 2018, 2:10 PM

Aug 27 2018

yaxunl created D51336: [HIP] Fix output file extension.
Aug 27 2018, 9:16 PM

Aug 24 2018

yaxunl added a comment to D43783: [OpenCL] Remove block invoke function from emitted block literal struct.

Sorry for digging up an old commit...

Apparently this broke block arguments, e.g. the following test case:

int foo(int (^ bl)(void)) {
  return bl();
}

int get21() {
  return foo(^{return 21;});
}

int get42() {
  return foo(^{return 42;});
}

In particular, the VarDecl that getBlockExpr() sees doesn't have an initializer when the called block comes from an argument (causing clang to crash).

Aug 24 2018, 9:17 AM
yaxunl accepted D51212: [OpenCL][Docs] Release notes for OpenCL in Clang .

LGTM. Thanks.

Aug 24 2018, 9:01 AM

Aug 17 2018

yaxunl accepted D50259: [OpenCL] Disallow negative attribute arguments.

LGTM. Thanks.

Aug 17 2018, 2:58 PM
yaxunl committed rL340056: [HIP] Make __hip_gpubin_handle hidden to avoid being merged across different….
[HIP] Make __hip_gpubin_handle hidden to avoid being merged across different…
Aug 17 2018, 10:48 AM