
yaxunl (Yaxun Liu)
User

Projects

User does not belong to any projects.

User Details

User Since
May 13 2015, 10:16 AM (349 w, 1 d)

Recent Activity

Today

yaxunl added a comment to D117137: [Driver] Add CUDA support for --offload param.

The title says --offline option, which should be --offload.

Thu, Jan 20, 11:48 AM · Restricted Project

Yesterday

yaxunl committed rG15f54dd5e496: AMDGPU: Account for usage HIP-style dynamic LDS (authored by yaxunl).
AMDGPU: Account for usage HIP-style dynamic LDS
Wed, Jan 19, 10:06 AM
yaxunl closed D117494: AMDGPU: Account for usage HIP-style dynamic LDS.
Wed, Jan 19, 10:06 AM · Restricted Project
yaxunl committed rG85c2bd2a0e0e: Prevent adding module flag amdgpu_hostcall multiple times (authored by yaxunl).
Prevent adding module flag amdgpu_hostcall multiple times
Wed, Jan 19, 9:53 AM
yaxunl closed D116216: Prevent adding module flag - amdgpu_hostcall multiple times..
Wed, Jan 19, 9:53 AM · Restricted Project
yaxunl accepted D117494: AMDGPU: Account for usage HIP-style dynamic LDS.

I am OK with this for now. We may need to keep an eye out for perf regressions and be prepared to find a way to alleviate them.

Wed, Jan 19, 7:31 AM · Restricted Project

Tue, Jan 18

yaxunl accepted D116216: Prevent adding module flag - amdgpu_hostcall multiple times..

LGTM. Thanks.

Tue, Jan 18, 6:21 AM · Restricted Project

Mon, Jan 17

yaxunl added a comment to D117494: AMDGPU: Account for usage HIP-style dynamic LDS.

This may cause some perf degradation.

Mon, Jan 17, 9:38 AM · Restricted Project

Thu, Jan 13

yaxunl added a comment to D115523: [OpenCL] Set external linkage for block enqueue kernels.

It is possible that block kernels are defined and invoked in static functions, so two block kernels in different TUs may have the same name. Making such kernels external may cause duplicate symbols.

Potentially we should append the name of the translation unit to all kernel wrapper names for the enqueued blocks to resolve this? For example, global constructor stubs use a similar naming scheme derived from the translation unit.

But the kernel function in OpenCL has to be globally visible and many tools have been built with this assumption. Additionally, some toolchains might require the enqueued kernels to be globally visible as well in order to access them as an execution entry point.
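The naming idea raised above (appending a translation-unit-derived suffix to each block kernel wrapper name) can be sketched roughly as follows. This is only an illustration: `sanitizeTUName` and `makeUniqueKernelName` are hypothetical helpers, not functions in clang, and the sanitization rule is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: turn a TU path into a string that is safe to embed
// in a symbol name by replacing every non-alphanumeric character with '_'.
std::string sanitizeTUName(const std::string &tuPath) {
  std::string out;
  out.reserve(tuPath.size());
  for (unsigned char c : tuPath)
    out.push_back(std::isalnum(c) ? static_cast<char>(c) : '_');
  return out;
}

// Hypothetical helper: build an externally visible wrapper name that differs
// between TUs even when the underlying block kernel names are identical.
std::string makeUniqueKernelName(const std::string &baseName,
                                 const std::string &tuPath) {
  assert(!baseName.empty() && "kernel wrapper needs a base name");
  return baseName + "_" + sanitizeTUName(tuPath);
}
```

Note that this simple sanitization can itself collide (for example `a/b.cl` and `a_b.cl` map to the same suffix), so a real scheme would likely need a hash or another disambiguator.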

Thu, Jan 13, 8:39 AM · Restricted Project

Wed, Jan 12

yaxunl added a comment to D115523: [OpenCL] Set external linkage for block enqueue kernels.

It is possible that block kernels are defined and invoked in static functions, so two block kernels in different TUs may have the same name. Making such kernels external may cause duplicate symbols.

Wed, Jan 12, 6:38 AM · Restricted Project

Tue, Jan 11

yaxunl committed rG694fd10659eb: [HIP] Fix device malloc/free (authored by yaxunl).
[HIP] Fix device malloc/free
Tue, Jan 11, 11:50 AM
yaxunl closed D116967: [HIP] Fix device malloc/free.
Tue, Jan 11, 11:49 AM · Restricted Project
yaxunl added inline comments to D116967: [HIP] Fix device malloc/free.
Tue, Jan 11, 6:41 AM · Restricted Project

Mon, Jan 10

yaxunl committed rG98ab43a1d209: [HIP] Fix device only linking for -fgpu-rdc (authored by yaxunl).
[HIP] Fix device only linking for -fgpu-rdc
Mon, Jan 10, 2:38 PM
yaxunl closed D116840: [HIP] Fix device only linking for -fgpu-rdc.
Mon, Jan 10, 2:38 PM · Restricted Project
yaxunl updated the summary of D116967: [HIP] Fix device malloc/free.
Mon, Jan 10, 12:35 PM · Restricted Project
yaxunl requested review of D116967: [HIP] Fix device malloc/free.
Mon, Jan 10, 12:34 PM · Restricted Project
yaxunl updated the diff for D116840: [HIP] Fix device only linking for -fgpu-rdc.

avoid clearing AL

Mon, Jan 10, 8:53 AM · Restricted Project
yaxunl added inline comments to D116840: [HIP] Fix device only linking for -fgpu-rdc.
Mon, Jan 10, 7:44 AM · Restricted Project
yaxunl added a comment to D116216: Prevent adding module flag - amdgpu_hostcall multiple times..

I think it would be cleaner to keep the original amdgpu-asan.cu unchanged and add amdgpu-asan-printf.cu, which tests asan with printf.

Mon, Jan 10, 6:58 AM · Restricted Project

Fri, Jan 7

yaxunl requested review of D116840: [HIP] Fix device only linking for -fgpu-rdc.
Fri, Jan 7, 2:28 PM · Restricted Project

Tue, Jan 4

yaxunl added a comment to D116216: Prevent adding module flag - amdgpu_hostcall multiple times..

@yaxunl It would be very helpful to know how to write test coverage for this particular patch. Thanks.

Tue, Jan 4, 7:35 AM · Restricted Project

Thu, Dec 23

yaxunl accepted D116216: Prevent adding module flag - amdgpu_hostcall multiple times..

LGTM. Thanks.

Thu, Dec 23, 6:40 AM · Restricted Project

Dec 20 2021

yaxunl committed rGa6786cdd5757: [HIPSPV][3/4] Enable SPIR-V emission for HIP (authored by yaxunl).
[HIPSPV][3/4] Enable SPIR-V emission for HIP
Dec 20 2021, 8:01 AM
yaxunl closed D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.
Dec 20 2021, 8:01 AM · Restricted Project

Dec 17 2021

yaxunl added a comment to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

Assuming that this patch is ready to land, @tra or @yaxunl, could you please commit it to LLVM for us? Thanks.

Dec 17 2021, 7:57 AM · Restricted Project

Dec 14 2021

yaxunl added a comment to D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

If the backend can rely on the constant flag alone, then we do not need to put global variables in the constant addr space.

Let's leave this patch as it is now. And revisit it if there are any regressions found.

Dec 14 2021, 2:39 PM · Restricted Project
yaxunl added a comment to D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

The two lit tests which changed from addr space 4 to 1 demonstrated that. In alias analysis, if a variable is in addr space 4, the backend knows that it is constant and can do optimizations on it. After changing to addr space 1, those optimizations are gone.

The backend also knows because the constant flag is set on the global variable. Addrspace(4) is a kludge which is largely redundant with other mechanisms for indicating constants

Dec 14 2021, 2:26 PM · Restricted Project
yaxunl added a comment to D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to address space (4) in a constructor seems the wrong thing to do.

Dec 14 2021, 12:36 PM · Restricted Project

Dec 13 2021

yaxunl requested changes to D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4..

This may cause perf regressions for HIP.

Dec 13 2021, 2:15 PM · Restricted Project
yaxunl committed rG006fb62434f5: Fix build failure of HIPUtility.cpp on Windows (authored by yaxunl).
Fix build failure of HIPUtility.cpp on Windows
Dec 13 2021, 8:54 AM
yaxunl committed rG240be6541d49: Fix warning about unused variable in HIPAMD.cpp (authored by yaxunl).
Fix warning about unused variable in HIPAMD.cpp
Dec 13 2021, 8:26 AM
yaxunl committed rG78b0f3701d44: [HIPSPV][1/4] Refactor HIP tool chain (authored by yaxunl).
[HIPSPV][1/4] Refactor HIP tool chain
Dec 13 2021, 7:51 AM
yaxunl closed D110549: [HIPSPV][1/4] Refactor HIP tool chain.
Dec 13 2021, 7:50 AM · Restricted Project

Dec 10 2021

yaxunl added a comment to D110549: [HIPSPV][1/4] Refactor HIP tool chain.

Assuming this patch is ready to land, @yaxunl, could you please commit it to LLVM for us? Thanks.

Dec 10 2021, 11:35 AM · Restricted Project

Dec 9 2021

yaxunl added a comment to D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls..

Not exactly that. The weak symbol isn't the function name, as that gets renamed or inlined.

We discussed this before. Since the code object ABI uses runtime metadata to represent hostcall_buffer, we need to determine from the IR whether hostcall is needed.

This approach would require checking asm instructions inside a function to determine whether the function requires hostcall. That is a hacky IR representation.

There are two approaches here:
1/ Tag the function using inline asm and totally ignore it in the compiler. HSA/etc tests per-code-object if the symbol is present
2/ Tag the function (in source or in compiler), propagate information to llc, embed it in msgpack data, HSA/etc tests per-function if the field is present

2/ is somewhat useful if we elide the 8 byte slot of kernarg memory for functions that don't use it, otherwise it just increases work done by the runtime. Instead of checking for presence of one symbol (a hashtable lookup), it's a linear scan through msgpack data. We don't currently elide those 8 bytes, so right now this is making the compiler more complicated in exchange for making the runtime slower.

1/ has the benefit of being dead simple and totally compiler agnostic, and the cost of passing the 8 byte hostcall thing to every function in a code object that asked for it.

Dec 9 2021, 12:58 PM · Restricted Project, Restricted Project
yaxunl added a comment to D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls..

If we only need to check whether __ockl_hostcall_internal exists in the final module in LLVM codegen to determine whether we need the hostcall metadata, we probably don't even need a function attribute or a module flag.

Right, we used to do exactly that (just check at the CodeGen phase if 'ockl_hostcall_internal()' is present in the module), but then it turned out that it does not work with -fgpu-rdc since IPO may rename the 'ockl_hostcall_internal()'.

Dec 9 2021, 7:34 AM · Restricted Project, Restricted Project
yaxunl added a comment to D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls..

Not exactly that. The weak symbol isn't the function name, as that gets renamed or inlined.

Dec 9 2021, 7:25 AM · Restricted Project, Restricted Project

Dec 8 2021

yaxunl added a comment to D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls..

If we only need to check whether __ockl_hostcall_internal exists in the final module in LLVM codegen to determine whether we need the hostcall metadata, we probably don't even need a function attribute or a module flag.

Dec 8 2021, 1:21 PM · Restricted Project, Restricted Project
yaxunl accepted D115154: AMDGPU: Remove fixed function ABI option.

LGTM. Thanks.

Dec 8 2021, 10:13 AM · Restricted Project
yaxunl added a comment to D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls..

One drawback of this approach is that it does not work for LLVM modules generated from assembly or programmatically, e.g. by TensorFlow XLA.

Dec 8 2021, 7:52 AM · Restricted Project, Restricted Project

Dec 7 2021

yaxunl committed rGd55f05d9f7dc: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__` (authored by yaxunl).
[CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`
Dec 7 2021, 3:10 PM
yaxunl closed D114812: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`.
Dec 7 2021, 3:10 PM · Restricted Project
yaxunl added inline comments to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.
Dec 7 2021, 2:08 PM · Restricted Project
yaxunl added a comment to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.
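As a rough illustration of the resolution rules described above, the following C++ sketch fills in whichever of the triple or processor is missing. The `OffloadTarget` struct, the function names, and the processor-prefix table are hypothetical and are not clang's driver code; only the default of `generic` for `spirv` is taken from the comment above, and an empty string stands for "not specified".

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: the pair that would be passed to clang -cc1
// as -triple and -mcpu.
struct OffloadTarget {
  std::string triple;
  std::string cpu;
};

// Assumption: every triple falls back to a "generic" processor. The comment
// above only states this for spirv.
std::string defaultCpuFor(const std::string & /*triple*/) { return "generic"; }

// Assumption: an illustrative prefix table for deducing a triple from a
// processor name. Unknown processors yield an empty triple.
std::string deduceTripleFor(const std::string &cpu) {
  if (cpu.rfind("gfx", 0) == 0) return "amdgcn-amd-amdhsa";
  if (cpu.rfind("sm_", 0) == 0) return "nvptx64-nvidia-cuda";
  return "";
}

// Fill in the missing half; when both are given, both are kept as-is.
OffloadTarget resolveOffload(const std::string &triple,
                             const std::string &cpu) {
  assert(!(triple.empty() && cpu.empty()) && "need a triple or a processor");
  OffloadTarget target{triple, cpu};
  if (target.cpu.empty()) target.cpu = defaultCpuFor(triple);
  if (target.triple.empty()) target.triple = deduceTripleFor(cpu);
  return target;
}
```

Under these assumptions, `resolveOffload("spirv", "")` yields the triple `spirv` with processor `generic`, which matches the `-triple spirv -mcpu=generic` translation mentioned above.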

So, one would expect that we should be able to specify it more than once to target multiple GPU variants, if we were to use it as a more flexible --offload-arch.
If I read the tests correctly, using --offload= limits us to exactly one variant now. Perhaps it should eventually be relaxed to only enforce single --offload= variant if we're offloading to SPIR-V. It's not a showstopper for this patch. We can relax it later.

Dec 7 2021, 1:19 PM · Restricted Project
yaxunl added a comment to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

Dec 7 2021, 9:31 AM · Restricted Project

Dec 6 2021

yaxunl updated the diff for D114812: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`.
Dec 6 2021, 3:23 PM · Restricted Project
yaxunl committed rG3b172f60c692: [HIP] Fix -fgpu-rdc for Windows (authored by yaxunl).
[HIP] Fix -fgpu-rdc for Windows
Dec 6 2021, 1:43 PM
yaxunl closed D115039: [HIP] Fix -fgpu-rdc for Windows.
Dec 6 2021, 1:42 PM · Restricted Project
yaxunl added a comment to D115039: [HIP] Fix -fgpu-rdc for Windows.

Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage.

I wonder when would this happen? I'm not sure we ever want gpubin handles from different TUs merged. I think it may result in different TUs attempting to load/init the same GPU binary multiple times.

Dec 6 2021, 12:11 PM · Restricted Project

Dec 3 2021

yaxunl accepted D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

Ideally, we could let the builtins accept both vec3 and vec4. But I am OK with this for now. I think the overhead may be minimal.

Dec 3 2021, 10:03 AM · Restricted Project, Restricted Project
yaxunl requested review of D115039: [HIP] Fix -fgpu-rdc for Windows.
Dec 3 2021, 6:45 AM · Restricted Project

Dec 2 2021

yaxunl added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

I think this macro is purely terrible and should not be added (and at least should be all caps?). If we can't just hard break users, I would rather just leave the builtin signatures broken

Rather than adding an ad-hoc named macro, could they just directly check the clang version?

Dec 2 2021, 11:59 AM · Restricted Project, Restricted Project
yaxunl added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

LGTM from clang side. Thanks.

Dec 2 2021, 11:49 AM · Restricted Project, Restricted Project
yaxunl added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

This is a flag-day change to the signatures of the LLVM intrinsics and the OpenCL builtins. Is that OK?

This breaks users' code. If we have to do this, at least let clang emit a pre-defined macro e.g. __amdgcn_bvh_use_vec3__=1 so that users can make their code work before and after the change.

I don't know anything about OpenCL macros. Is it good enough to put this in AMDGPUTargetInfo::getTargetDefines:

if (Opts.OpenCL)
  Builder.defineMacro("__amdgcn_bvh_use_vec3__");

Does it need tests, documentation, etc?
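To make the proposal above concrete, here is a minimal sketch of how user code might key off such a pre-defined macro so that it builds both before and after the signature change. It uses the macro name suggested in the discussion, `__amdgcn_bvh_use_vec3__`, which is only a proposal in this review; `kRayVecWidth`, `RayArgs`, `makeRay`, and `rayArgFloats` are hypothetical names, and the sketch is plain C++ rather than OpenCL so that it compiles on any host.

```cpp
#include <cassert>
#include <cstddef>

// If the compiler predefines the proposed macro, ray origin and direction
// are 3-component vectors; otherwise fall back to the older 4-component form.
#ifdef __amdgcn_bvh_use_vec3__
constexpr std::size_t kRayVecWidth = 3;
#else
constexpr std::size_t kRayVecWidth = 4;
#endif

// Hypothetical argument bundle whose layout follows the selected width.
struct RayArgs {
  float origin[kRayVecWidth];
  float dir[kRayVecWidth];
};

// Copy the three meaningful components; any fourth lane stays zero.
RayArgs makeRay(const float *origin, const float *dir, std::size_t n) {
  assert(n >= 3 && "a ray needs at least x, y, z");
  RayArgs ray{};
  for (std::size_t i = 0; i < 3; ++i) {
    ray.origin[i] = origin[i];
    ray.dir[i] = dir[i];
  }
  return ray;
}

// Number of floats passed for origin plus direction.
std::size_t rayArgFloats() { return sizeof(RayArgs) / sizeof(float); }
```

The rest of the calling code can then stay identical across compiler versions, with only the vector width selected by the preprocessor.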

Dec 2 2021, 10:01 AM · Restricted Project, Restricted Project
yaxunl added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

This is a flag-day change to the signatures of the LLVM intrinsics and the OpenCL builtins. Is that OK?

This breaks users' code. If we have to do this, at least let clang emit a pre-defined macro e.g. __amdgcn_bvh_use_vec3__=1 so that users can make their code work before and after the change.

I don't know anything about OpenCL macros. Is it good enough to put this in AMDGPUTargetInfo::getTargetDefines:

if (Opts.OpenCL)
  Builder.defineMacro("__amdgcn_bvh_use_vec3__");

Does it need tests, documentation, etc?

But how long would that be carried? And then deprecated?

Then do you think the patch is OK as-is?

Dec 2 2021, 7:48 AM · Restricted Project, Restricted Project
yaxunl added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

This is a flag-day change to the signatures of the LLVM intrinsics and the OpenCL builtins. Is that OK?

This breaks users' code. If we have to do this, at least let clang emit a pre-defined macro e.g. __amdgcn_bvh_use_vec3__=1 so that users can make their code work before and after the change.

I do not think it's worth introducing a macro for this. Are there actually C users of these builtins?

Dec 2 2021, 6:59 AM · Restricted Project, Restricted Project
yaxunl added a comment to D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args.

This is a flag-day change to the signatures of the LLVM intrinsics and the OpenCL builtins. Is that OK?

Dec 2 2021, 6:53 AM · Restricted Project, Restricted Project

Dec 1 2021

yaxunl accepted D114849: [AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU.

LGTM. Thanks.

Dec 1 2021, 1:18 PM · Restricted Project
yaxunl added inline comments to D114849: [AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU.
Dec 1 2021, 7:15 AM · Restricted Project

Nov 30 2021

yaxunl added a comment to D114812: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`.

I am not sure whether we want to define a similar macro for cuda-clang.

Nov 30 2021, 12:03 PM · Restricted Project
yaxunl requested review of D114812: [CUDA][HIP] Add pre-defined macro `__CLANG_RDC__`.
Nov 30 2021, 12:01 PM · Restricted Project

Nov 29 2021

yaxunl accepted D114553: [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang.

LGTM. Thanks.

Nov 29 2021, 10:19 AM · Restricted Project
yaxunl added inline comments to D114553: [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang.
Nov 29 2021, 7:28 AM · Restricted Project

Nov 26 2021

yaxunl added inline comments to D114553: [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang.
Nov 26 2021, 7:52 AM · Restricted Project

Nov 24 2021

yaxunl added a comment to D114553: [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang.

We also need a sema test like clang/test/SemaOpenCL/atomic-ops.cl.

Nov 24 2021, 12:51 PM · Restricted Project
yaxunl added a comment to D114502: File Reorganization changes.

Could you please include the complete diff context in the patch? You can do that by using git diff -U9999.

Nov 24 2021, 5:56 AM · Restricted Project, Restricted Project, Restricted Project

Nov 23 2021

yaxunl committed rG38211bbab1d9: [HIP] Fix device stub name for Windows (authored by yaxunl).
[HIP] Fix device stub name for Windows
Nov 23 2021, 9:04 AM
yaxunl closed D113491: [HIP] Fix device stub name for Windows.
Nov 23 2021, 9:04 AM · Restricted Project
yaxunl committed rGb472bd855ed8: [NFC] Let Microsoft mangler accept GlobalDecl (authored by yaxunl).
[NFC] Let Microsoft mangler accept GlobalDecl
Nov 23 2021, 8:14 AM
yaxunl closed D113490: [NFC] Let Microsoft mangler accept GlobalDecl.
Nov 23 2021, 8:14 AM · Restricted Project
yaxunl committed rGaa9b90ca441d: Fix warning due to default switch label (authored by yaxunl).
Fix warning due to default switch label
Nov 23 2021, 7:53 AM
yaxunl committed rGe13246a2ec3d: [HIP] Add HIP scope atomic operations (authored by yaxunl).
[HIP] Add HIP scope atomic operations
Nov 23 2021, 7:14 AM
yaxunl closed D113925: [HIP] Add HIP scope atomic operations.
Nov 23 2021, 7:14 AM · Restricted Project

Nov 22 2021

yaxunl added inline comments to D110549: [HIPSPV][1/4] Refactor HIP tool chain.
Nov 22 2021, 3:27 PM · Restricted Project
yaxunl added inline comments to D113490: [NFC] Let Microsoft mangler accept GlobalDecl.
Nov 22 2021, 6:23 AM · Restricted Project

Nov 18 2021

yaxunl added a comment to D113925: [HIP] Add HIP scope atomic operations.

@yaxunl thanks for the review! My Github account is locked unfortunately so I will have to ask you to push this commit to the main branch. Thank you!

Nov 18 2021, 12:40 PM · Restricted Project
yaxunl accepted D113925: [HIP] Add HIP scope atomic operations.

LGTM. Thanks.

Nov 18 2021, 12:31 PM · Restricted Project
yaxunl added a comment to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

LGTM. I will defer to @tra.

Nov 18 2021, 7:41 AM · Restricted Project
yaxunl added a comment to D110618: [HIPSPV][2/4] Add HIPSPV tool chain.

LGTM. I will leave the decision about -nohipwrapperinc to @tra.

Nov 18 2021, 7:38 AM · Restricted Project
yaxunl accepted D110549: [HIPSPV][1/4] Refactor HIP tool chain.

LGTM. Thanks.

Nov 18 2021, 7:26 AM · Restricted Project

Nov 17 2021

yaxunl added inline comments to D110622: [HIPSPV][3/4] Enable SPIR-V emission for HIP.
Nov 17 2021, 1:18 PM · Restricted Project

Nov 16 2021

yaxunl added inline comments to D113925: [HIP] Add HIP scope atomic operations.
Nov 16 2021, 6:29 AM · Restricted Project

Nov 12 2021

yaxunl added inline comments to D113800: [amdgpu] Don't crash on empty global ctor/dtor.
Nov 12 2021, 1:23 PM · Restricted Project

Nov 11 2021

yaxunl committed rG0309e50f33f6: [Driver] Fix ToolChain::getSanitizerArgs (authored by yaxunl).
[Driver] Fix ToolChain::getSanitizerArgs
Nov 11 2021, 2:18 PM
yaxunl closed D111443: [Driver] Fix ToolChain::getSanitizerArgs.
Nov 11 2021, 2:17 PM · Restricted Project
yaxunl added a comment to D87858: [hip] Add HIP scope atomic ops..

Hi Michael, would you like to continue working on this, or let someone from AMD take over? Thanks.

Nov 11 2021, 1:16 PM · Restricted Project
yaxunl added inline comments to D111443: [Driver] Fix ToolChain::getSanitizerArgs.
Nov 11 2021, 7:59 AM · Restricted Project
yaxunl updated the diff for D111443: [Driver] Fix ToolChain::getSanitizerArgs.

Revised per Evgenii's comments

Nov 11 2021, 7:55 AM · Restricted Project

Nov 10 2021

yaxunl committed rG4b3881e9f319: Emit hidden hostcall argument for sanitized kernels (authored by yaxunl).
Emit hidden hostcall argument for sanitized kernels
Nov 10 2021, 2:06 PM
yaxunl closed D112820: Emit hidden hostcall argument for sanitized kernels..
Nov 10 2021, 2:06 PM · Restricted Project, Restricted Project
yaxunl committed rG80072fde61d4: [CUDA][HIP] Allow comdat for kernels (authored by yaxunl).
[CUDA][HIP] Allow comdat for kernels
Nov 10 2021, 1:43 PM
yaxunl closed D112492: [CUDA][HIP] Allow comdat for kernels.
Nov 10 2021, 1:43 PM · Restricted Project
yaxunl added a comment to D112492: [CUDA][HIP] Allow comdat for kernels.

I did an experiment regarding the ICF issue, and it does not seem to affect the kernel stub.

Nov 10 2021, 8:36 AM · Restricted Project
yaxunl added a comment to D111443: [Driver] Fix ToolChain::getSanitizerArgs.

@eugenis Any further changes needed? Thanks.

Nov 10 2021, 8:09 AM · Restricted Project
yaxunl added inline comments to D112492: [CUDA][HIP] Allow comdat for kernels.
Nov 10 2021, 7:05 AM · Restricted Project
yaxunl accepted D112820: Emit hidden hostcall argument for sanitized kernels..

LGTM. Thanks.

Nov 10 2021, 5:58 AM · Restricted Project, Restricted Project

Nov 9 2021

yaxunl added a comment to D112492: [CUDA][HIP] Allow comdat for kernels.

I think it is probably necessary to merge linkonce_odr symbols for them to work properly.

Nov 9 2021, 2:19 PM · Restricted Project
yaxunl added inline comments to D113491: [HIP] Fix device stub name for Windows.
Nov 9 2021, 1:35 PM · Restricted Project
yaxunl added a comment to D110257: [CFE][Codegen] Make sure to maintain the contiguity of all the static allocas.

LGTM. It seems all concerns have been addressed. Shall we move ahead and land this patch? Thanks.

Nov 9 2021, 10:20 AM · Restricted Project