
Today

yaxunl added a comment to D101095: [clang][amdgpu] Use implicit code object version.

LGTM. However, I would like to hear Tony's opinion.

Fri, Apr 23, 9:10 AM · Restricted Project

Yesterday

yaxunl accepted D101077: [clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion.

LGTM. Thanks.

Thu, Apr 22, 4:11 PM · Restricted Project
yaxunl committed rG8baba6890de7: [HIP] Support overloaded math functions for hipRTC (authored by yaxunl).
[HIP] Support overloaded math functions for hipRTC
Thu, Apr 22, 4:07 PM
yaxunl closed D100794: [HIP] Support overloaded math functions for hipRTC.
Thu, Apr 22, 4:07 PM · Restricted Project
yaxunl added a comment to D101117: [clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion.

I'm not very pleased with this. There's too much state to move around when representing the failure and the control flow is probably harder to read than it was in the original. May take another pass at it.

Thu, Apr 22, 3:50 PM · Restricted Project
yaxunl accepted D101117: [clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion.

nice. Thanks.

Thu, Apr 22, 3:49 PM · Restricted Project
yaxunl requested review of D101106: [HIP] Fix overloaded function for _Float16.
Thu, Apr 22, 1:50 PM
yaxunl updated the diff for D100794: [HIP] Support overloaded math functions for hipRTC.

make it NFC for non-hipRTC

Thu, Apr 22, 1:47 PM · Restricted Project
yaxunl added inline comments to D101077: [clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion.
Thu, Apr 22, 1:23 PM · Restricted Project
yaxunl added inline comments to D100794: [HIP] Support overloaded math functions for hipRTC.
Thu, Apr 22, 12:33 PM · Restricted Project
yaxunl accepted D101043: [OpenCL] Drop extension pragma handling for extension types/declarations.

LGTM. Thanks.

Thu, Apr 22, 6:30 AM
yaxunl added a comment to D101043: [OpenCL] Drop extension pragma handling for extension types/declarations.

Did you check whether this patch causes regressions in the OpenCL conformance tests? If it does not, I am OK with it.

Thu, Apr 22, 5:49 AM

Wed, Apr 21

yaxunl committed rG5a2d78b16397: [HIP] Add option -fgpu-inline-threshold (authored by yaxunl).
[HIP] Add option -fgpu-inline-threshold
Wed, Apr 21, 2:19 PM
yaxunl closed D99233: [HIP] Add option -fgpu-inline-threshold.
Wed, Apr 21, 2:19 PM · Restricted Project
yaxunl retitled D99233: [HIP] Add option -fgpu-inline-threshold from [HIP] Add option --gpu-inline-threshold to [HIP] Add option -fgpu-inline-threshold.
Wed, Apr 21, 12:45 PM · Restricted Project
yaxunl updated the diff for D99233: [HIP] Add option -fgpu-inline-threshold.

Revised per Artem's comments.

Wed, Apr 21, 12:45 PM · Restricted Project
yaxunl added inline comments to D99233: [HIP] Add option -fgpu-inline-threshold.
Wed, Apr 21, 11:34 AM · Restricted Project
yaxunl added a comment to D99233: [HIP] Add option -fgpu-inline-threshold.
In D99233#2656446, @tra wrote:

I'm concerned that if we make it a top-level option, it's likely to be cargo-culted and (mis)used as a sledgehammer in cases where it's not needed. I think the option should remain hidden.

While thresholds do need to be tweaked, in my experience it happens very rarely.
When it does, most of the time it's sufficient to apply __attribute__((always_inline)) to a few functions where it matters.
If AMDGPU bumps into the limit too often, perhaps it's the default threshold value that needs to be changed.

Currently, ROCm builds all math libraries and frameworks with an LLVM option that inlines all functions for the AMDGPU target. We cannot simply remove that option and use the default inline threshold, since that would cause performance degradation. We cannot use -mllvm -inline-threshold=x directly either, since it would affect both host and device compilation. We need an option that sets the inline threshold for the GPU only, so that we can fine-tune it. I agree that this option should be hidden, since it is intended for compiler development.

Wed, Apr 21, 8:46 AM · Restricted Project

Tue, Apr 20

yaxunl added inline comments to D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.
Tue, Apr 20, 9:19 PM
yaxunl updated the diff for D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.

Revised per Matt's comments.

Tue, Apr 20, 9:19 PM
yaxunl added inline comments to D100794: [HIP] Support overloaded math functions for hipRTC.
Tue, Apr 20, 8:05 PM · Restricted Project
yaxunl added inline comments to D100794: [HIP] Support overloaded math functions for hipRTC.
Tue, Apr 20, 8:01 PM · Restricted Project
yaxunl updated the diff for D100794: [HIP] Support overloaded math functions for hipRTC.

fix hip::numeric_type

Tue, Apr 20, 3:18 PM · Restricted Project
yaxunl updated the diff for D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.

Revised per Matt's comments.

Tue, Apr 20, 1:21 PM
yaxunl added inline comments to D100794: [HIP] Support overloaded math functions for hipRTC.
Tue, Apr 20, 12:03 PM · Restricted Project
yaxunl added inline comments to D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.
Tue, Apr 20, 11:49 AM
yaxunl added inline comments to D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.
Tue, Apr 20, 10:16 AM
yaxunl added a comment to D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.

The recent change https://reviews.llvm.org/D96280 caused some difficulty for this patch. I would appreciate some suggestions.

Tue, Apr 20, 10:06 AM
yaxunl added reviewers for D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee: jansvoboda11, tra.
Tue, Apr 20, 9:57 AM
yaxunl added a comment to D100404: Add Global support for #pragma clang attributes.

There are still pre-merge failures. You may need to update your test.

Tue, Apr 20, 8:36 AM
yaxunl updated the diff for D100794: [HIP] Support overloaded math functions for hipRTC.

Revised per Artem's comments.

Tue, Apr 20, 7:31 AM · Restricted Project
yaxunl added a comment to D100404: Add Global support for #pragma clang attributes.

You are getting pre-merge failures due to formatting issues. Please clean them up.

Tue, Apr 20, 7:21 AM
yaxunl added inline comments to D100794: [HIP] Support overloaded math functions for hipRTC.
Tue, Apr 20, 7:07 AM · Restricted Project

Mon, Apr 19

yaxunl requested review of D100794: [HIP] Support overloaded math functions for hipRTC.
Mon, Apr 19, 2:20 PM · Restricted Project
yaxunl committed rGd8805574c183: [CUDA][HIP] Allow non-ODR use of host var in device (authored by yaxunl).
[CUDA][HIP] Allow non-ODR use of host var in device
Mon, Apr 19, 11:46 AM
yaxunl closed D98193: [CUDA][HIP] Allow non-ODR use of host var in device.
Mon, Apr 19, 11:45 AM · Restricted Project
yaxunl updated the diff for D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.

rebase

Mon, Apr 19, 11:24 AM
yaxunl updated the diff for D98193: [CUDA][HIP] Allow non-ODR use of host var in device.

Rebase and fix HIP header bug exposed by this patch.

Mon, Apr 19, 10:40 AM · Restricted Project
yaxunl added inline comments to D98193: [CUDA][HIP] Allow non-ODR use of host var in device.
Mon, Apr 19, 9:29 AM · Restricted Project
yaxunl added a comment to D98193: [CUDA][HIP] Allow non-ODR use of host var in device.

ping

Mon, Apr 19, 7:05 AM · Restricted Project

Sun, Apr 18

yaxunl updated the diff for D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.

Revised per Matt's comments.

Sun, Apr 18, 7:22 AM
yaxunl added inline comments to D77013: [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee.
Sun, Apr 18, 7:15 AM

Sat, Apr 17

yaxunl committed rG6823af0ca858: [HIP] Support hipRTC in header (authored by yaxunl).
[HIP] Support hipRTC in header
Sat, Apr 17, 8:35 AM
yaxunl closed D100652: [HIP] Support hipRTC in header.
Sat, Apr 17, 8:35 AM · Restricted Project
yaxunl committed rGd5c0f00e216a: [CUDA][HIP] Mark device var used by host only (authored by yaxunl).
[CUDA][HIP] Mark device var used by host only
Sat, Apr 17, 8:26 AM
yaxunl committed rG3597f02fd5c6: [AMDGPU] Add GlobalDCE before internalization pass (authored by yaxunl).
[AMDGPU] Add GlobalDCE before internalization pass
Sat, Apr 17, 8:26 AM
yaxunl closed D98814: [CUDA][HIP] Mark device var used by host only.
Sat, Apr 17, 8:26 AM · Restricted Project
yaxunl closed D98783: [AMDGPU] Add GlobalDCE before internalization pass.
Sat, Apr 17, 8:25 AM · Restricted Project, Restricted Project
yaxunl added inline comments to D98814: [CUDA][HIP] Mark device var used by host only.
Sat, Apr 17, 7:47 AM · Restricted Project

Fri, Apr 16

yaxunl updated the summary of D100652: [HIP] Support hipRTC in header.
Fri, Apr 16, 7:59 AM · Restricted Project
yaxunl requested review of D100652: [HIP] Support hipRTC in header.
Fri, Apr 16, 7:59 AM · Restricted Project

Thu, Apr 15

yaxunl accepted D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations..

LGTM. Thanks!

Thu, Apr 15, 2:32 PM · Restricted Project
yaxunl updated the diff for D98783: [AMDGPU] Add GlobalDCE before internalization pass.

Revised tests per Artem's comments.

Thu, Apr 15, 2:25 PM · Restricted Project, Restricted Project
yaxunl added inline comments to D100598: [CUDA, FDO] Filter out profiling options from GPU-side compilations..
Thu, Apr 15, 2:07 PM · Restricted Project
yaxunl added inline comments to D98783: [AMDGPU] Add GlobalDCE before internalization pass.
Thu, Apr 15, 2:01 PM · Restricted Project, Restricted Project
yaxunl updated the diff for D100552: [HIP] Diagnose compiling kernel without offload arch.

Revised error message per Aaron's comments.

Thu, Apr 15, 1:22 PM
yaxunl updated the diff for D98783: [AMDGPU] Add GlobalDCE before internalization pass.

Add GlobalDCE before internalization pass.

Thu, Apr 15, 10:54 AM · Restricted Project, Restricted Project
yaxunl added a comment to D98783: [AMDGPU] Add GlobalDCE before internalization pass.
In D98783#2642269, @tra wrote:

We may want to add someone with more expertise with the IR as a reviewer. I'd like an educated opinion on whether the invisible dangling IR is something that needs fixing in general or if it's OK to just clean it up in this particular case. Or both.

@rjmccall, @rsmith -- do you have any suggestions -- either on the subject of the invisible dangling IR or on who may be the right person to talk to?

If Clang is creating constants unnecessarily, we should try to avoid that on general compile time / memory usage grounds unless doing so is a serious problem. But I don't think it should be our responsibility to GC unused constant expressions in order to make DCE work.

It's extremely common to create bitcast global constants. So the fact that this happens with addrspacecasts but hasn't been a persistent problem with bitcasts makes me suspect that DCE actually does try to handle this, but something about what it's doing isn't aware of addrspacecast. And indeed, LLVM's GlobalDCE seems to call a method called removeDeadConstantUsers() before concluding that a constant can't be thrown away. So either you're using a different transform that needs to do the same thing, or something is stopping removeDeadConstantUsers() from eliminating this addrspacecast.
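To illustrate the dangling-constant situation being discussed (hypothetical IR, not taken from the patch):

```llvm
; @dev_var is reachable only through an addrspacecast constant expression.
@dev_var = internal addrspace(1) global i32 0
@host_ptr = internal global i32* addrspacecast (i32 addrspace(1)* @dev_var to i32*)

; If @host_ptr is deleted, the addrspacecast ConstantExpr can linger in
; memory as a user of @dev_var. GlobalDCE calls removeDeadConstantUsers()
; on @dev_var to strip such dead constant users before deciding whether
; the global itself is discardable.
```

Any transform that checks use_empty() on a global without first calling removeDeadConstantUsers() would wrongly conclude that @dev_var is still referenced.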

Thu, Apr 15, 10:48 AM · Restricted Project, Restricted Project
yaxunl requested review of D100552: [HIP] Diagnose compiling kernel without offload arch.
Thu, Apr 15, 5:11 AM

Fri, Apr 9

yaxunl committed rGf9264ac0fdb7: [HIP] Workaround ICE compiling SemaChecking.cpp with gcc 5 (authored by yaxunl).
[HIP] Workaround ICE compiling SemaChecking.cpp with gcc 5
Fri, Apr 9, 7:40 AM
yaxunl committed rG25942d7c49ed: [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec (authored by yaxunl).
[AMDGPU] Allow relaxed/consume memory order for atomic inc/dec
Fri, Apr 9, 6:24 AM
yaxunl closed D100144: [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec.
Fri, Apr 9, 6:24 AM · Restricted Project

Thu, Apr 8

yaxunl requested review of D100144: [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec.
Thu, Apr 8, 2:48 PM · Restricted Project
yaxunl added a comment to D99683: [HIP] Support ThinLTO.

Any other concerns? Thanks.

Thu, Apr 8, 10:52 AM · Restricted Project

Wed, Apr 7

yaxunl updated the diff for D98193: [CUDA][HIP] Allow non-ODR use of host var in device.

Fix test failure on Windows: need to specify the triple, since it affects name mangling.

Wed, Apr 7, 12:31 PM · Restricted Project
yaxunl accepted D100045: [HIP] Fix rocm-detect.hip test path.

LGTM. Thanks.

Wed, Apr 7, 9:58 AM · Restricted Project
yaxunl added a comment to D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.

LGTM regarding the AMDGPU toolchain change. Thanks.

Wed, Apr 7, 6:39 AM · Restricted Project

Tue, Apr 6

yaxunl added a comment to D99683: [HIP] Support ThinLTO.

This raises some higher level questions for me:

First, how will you deal with other corner cases that won't or cannot be imported right now? While enabling importing of noinline functions and cranking up the threshold will get the majority of functions imported, there are cases that we still won't import (functions/vars that are interposable, certain funcs/vars that cannot be renamed, most non-const variables with non-trivial initializers).

We will document the limitations of ThinLTO support in the HIP toolchain and recommend that users not use ThinLTO in those corner cases.

Second, force importing of everything transitively referenced defeats the purpose of ThinLTO and would probably make it worse than regular LTO. The main entry module will need to import everything transitively referenced from there, so everything not dead in the binary, which should make that module post importing equivalent to a regular LTO module. In addition, every other module needs to transitively import everything referenced from those modules, making them very large depending on how many leaf vs non-leaf functions and variables they contain. What is the goal of doing ThinLTO in this case?

The objective is to improve optimization/codegen time by using ThinLTO's multiple threads. For example, suppose I have 10 modules, each containing a kernel. With full LTO linking, I get one big module containing 10 kernels with all functions inlined, and one thread doing optimization/codegen. With ThinLTO, I get one kernel in each module, with all functions inlined; AMDGPU internalization and global DCE then remove the functions not used by that kernel in each module. I get 10 threads, each doing optimization/codegen for one kernel, so theoretically there could be a 10x speedup.

That will work as long as there are no dependence edges anywhere between the kernels. Is this a library that has a bunch of totally independent kernels only called externally?

Tue, Apr 6, 8:01 PM · Restricted Project
yaxunl updated the diff for D98193: [CUDA][HIP] Allow non-ODR use of host var in device.

Revised per Richard's comments. Check function-scope static variables.

Tue, Apr 6, 7:51 PM · Restricted Project
yaxunl committed rG86175d5fedba: Minor fix for test hip-code-object-version.hip (authored by yaxunl).
Minor fix for test hip-code-object-version.hip
Tue, Apr 6, 5:33 PM
yaxunl added inline comments to D99235: [HIP] Change to code object v4.
Tue, Apr 6, 5:28 PM · Restricted Project
yaxunl committed rG4fd05e0ad7fb: [HIP] Change to code object v4 (authored by yaxunl).
[HIP] Change to code object v4
Tue, Apr 6, 5:23 PM
yaxunl closed D99235: [HIP] Change to code object v4.
Tue, Apr 6, 5:23 PM · Restricted Project
yaxunl added inline comments to D99235: [HIP] Change to code object v4.
Tue, Apr 6, 4:22 PM · Restricted Project
yaxunl added a comment to D99683: [HIP] Support ThinLTO.

This raises some higher level questions for me:

First, how will you deal with other corner cases that won't or cannot be imported right now? While enabling importing of noinline functions and cranking up the threshold will get the majority of functions imported, there are cases that we still won't import (functions/vars that are interposable, certain funcs/vars that cannot be renamed, most non-const variables with non-trivial initializers).

We will document the limitations of ThinLTO support in the HIP toolchain and recommend that users not use ThinLTO in those corner cases.

Tue, Apr 6, 2:48 PM · Restricted Project
yaxunl added a comment to D99235: [HIP] Change to code object v4.

ping.

Tue, Apr 6, 2:24 PM · Restricted Project
yaxunl committed rG61d065e21ff3: Let clang atomic builtins fetch add/sub support floating point types (authored by yaxunl).
Let clang atomic builtins fetch add/sub support floating point types
Tue, Apr 6, 12:45 PM
yaxunl closed D71726: Let clang atomic builtins fetch add/sub support floating point types.
Tue, Apr 6, 12:45 PM · Restricted Project
yaxunl added inline comments to D98193: [CUDA][HIP] Allow non-ODR use of host var in device.
Tue, Apr 6, 12:04 PM · Restricted Project
yaxunl updated the diff for D99683: [HIP] Support ThinLTO.

rebase

Tue, Apr 6, 10:09 AM · Restricted Project
yaxunl updated the diff for D99683: [HIP] Support ThinLTO.

Revert the change to the option.

Tue, Apr 6, 10:07 AM · Restricted Project
yaxunl added a comment to D99683: [HIP] Support ThinLTO.

I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.

The AMDGPU backend does not support linking of object files containing external symbols, i.e. one object file calling a function defined in another object file. Therefore the LLVM module passed to the AMDGPU backend needs to contain definitions of all callees, even if a callee has the noinline attribute. To support backends like this, the function importer needs to be able to import functions with the noinline attribute. Therefore we added an LLVM option to allow that, which is off by default. There are comments about this at line 70 of HIP.cpp.

How does a non-LTO build work, or is (full) LTO currently required? Because with ThinLTO we only import functions that are externally defined but referenced in the current module. Also, when ThinLTO imports functions it makes them available_externally, which means they are dropped and made external symbols again after inlining. So anything imported but not inlined will go back to being an external symbol.

The AMDGPU backend uses full LTO for linking by default; it does not support non-LTO linking. Currently it inlines all functions except kernels, but we want the option of not inlining all functions. Is it OK to add an LLVM option to mark imported functions as linkonce_odr so that the AMDGPU backend can keep the definitions of the imported functions?
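A hypothetical IR sketch of the two linkages under discussion (function names are made up):

```llvm
; What ThinLTO importing produces today: the body is available for
; inlining, but is dropped after optimization, leaving an external
; declaration that the AMDGPU backend cannot resolve at link time.
define available_externally void @imported_callee() {
  ret void
}

; What the question above proposes: a retained definition; duplicate
; copies across modules are merged under the ODR assumption.
define linkonce_odr void @kept_callee() {
  ret void
}
```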

Tue, Apr 6, 9:58 AM · Restricted Project
yaxunl added inline comments to D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.
Tue, Apr 6, 6:52 AM · Restricted Project
yaxunl updated the diff for D71726: Let clang atomic builtins fetch add/sub support floating point types.

Revised per John's comments.

Tue, Apr 6, 5:19 AM · Restricted Project

Mon, Apr 5

yaxunl updated the diff for D71726: Let clang atomic builtins fetch add/sub support floating point types.

Revised per John's comments. Do not allow atomic fetch add with x86_fp80.

Mon, Apr 5, 6:31 PM · Restricted Project
yaxunl added inline comments to D71726: Let clang atomic builtins fetch add/sub support floating point types.
Mon, Apr 5, 6:28 PM · Restricted Project
yaxunl accepted D99857: [OpenCL, test] Fix use of undef FileCheck var.

LGTM. Thanks.

Mon, Apr 5, 10:46 AM · Restricted Project
yaxunl added a comment to D99683: [HIP] Support ThinLTO.

I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.

The AMDGPU backend does not support linking of object files containing external symbols, i.e. one object file calling a function defined in another object file. Therefore the LLVM module passed to the AMDGPU backend needs to contain definitions of all callees, even if a callee has the noinline attribute. To support backends like this, the function importer needs to be able to import functions with the noinline attribute. Therefore we added an LLVM option to allow that, which is off by default. There are comments about this at line 70 of HIP.cpp.

How does a non-LTO build work, or is (full) LTO currently required? Because with ThinLTO we only import functions that are externally defined but referenced in the current module. Also, when ThinLTO imports functions it makes them available_externally, which means they are dropped and made external symbols again after inlining. So anything imported but not inlined will go back to being an external symbol.

Mon, Apr 5, 9:20 AM · Restricted Project
yaxunl updated the diff for D99683: [HIP] Support ThinLTO.

Revised per Artem's comments. Added patch description.

Mon, Apr 5, 9:01 AM · Restricted Project
yaxunl added a comment to D99683: [HIP] Support ThinLTO.

I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.

Mon, Apr 5, 8:52 AM · Restricted Project

Sun, Apr 4

yaxunl committed rG907af8439672: [CUDA][HIP] rename -fcuda-flush-denormals-to-zero (authored by yaxunl).
[CUDA][HIP] rename -fcuda-flush-denormals-to-zero
Sun, Apr 4, 9:14 PM
yaxunl closed D99688: [CUDA][HIP] rename -fcuda-flush-denormals-to-zero.
Sun, Apr 4, 9:14 PM · Restricted Project
yaxunl updated the diff for D99688: [CUDA][HIP] rename -fcuda-flush-denormals-to-zero.

rebase and add tests for old options

Sun, Apr 4, 8:08 PM · Restricted Project
yaxunl added inline comments to D99688: [CUDA][HIP] rename -fcuda-flush-denormals-to-zero.
Sun, Apr 4, 5:57 PM · Restricted Project
yaxunl added inline comments to D71726: Let clang atomic builtins fetch add/sub support floating point types.
Sun, Apr 4, 8:25 AM · Restricted Project
yaxunl accepted D99831: [HIP-Clang, test] Fix use of undef FileCheck var.

LGTM. Thanks.

Sun, Apr 4, 6:14 AM · Restricted Project
yaxunl accepted D99832: [HIP, test] Fix use of undef FileCheck var.

LGTM. Thanks.

Sun, Apr 4, 6:11 AM · Restricted Project

Thu, Apr 1

yaxunl committed rG85ff35a9529a: [HIP] remove overloaded abs in header (authored by yaxunl).
[HIP] remove overloaded abs in header
Thu, Apr 1, 9:27 AM
yaxunl closed D99738: [HIP] remove overloaded abs in header.
Thu, Apr 1, 9:26 AM · Restricted Project
yaxunl added a comment to D99738: [HIP] remove overloaded abs in header.

I have tested that removing this does not cause regressions in our CI.

Thu, Apr 1, 8:58 AM · Restricted Project