Today

yaxunl accepted D89966: [HIP] Fix HIP rounding math intrinsics.

LGTM. Thanks.

Thu, Oct 22, 8:56 AM
yaxunl added a comment to D71726: Let clang atomic builtins fetch add/sub support floating point types.

ping

Thu, Oct 22, 8:31 AM

Yesterday

yaxunl added a comment to D89799: [clang][driver] Rename DriverOption as NoXarchOption (NFC).

I am not sure whether it is proper to rename it.

Wed, Oct 21, 7:55 AM · Restricted Project

Tue, Oct 20

yaxunl added a comment to D89577: [VectorCombine] Avoid crossing address space boundaries..

Thanks a lot for fixing this.

Tue, Oct 20, 7:11 AM · Restricted Project
yaxunl accepted D89752: [CUDA] Improve clang's ability to detect recent CUDA versions..

LGTM. Thanks.

Tue, Oct 20, 4:46 AM · Restricted Project

Mon, Oct 19

yaxunl committed rG7e561b62d2f2: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic (authored by yaxunl).
[NFC] Refactor DiagnosticBuilder and PartialDiagnostic
Mon, Oct 19, 2:49 PM
yaxunl committed rG52bcd691cb19: Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device… (authored by yaxunl).
Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device…
Mon, Oct 19, 2:49 PM
yaxunl closed D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
Mon, Oct 19, 2:48 PM · Restricted Project
yaxunl updated the diff for D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

Add constructors to StreamingDiagnostic.

Mon, Oct 19, 7:26 AM · Restricted Project

Fri, Oct 16

yaxunl added a comment to D89520: Don't permit array bound constant folding in OpenCL..

I am OK with the changes regarding null pointers. I guess people seldom set a pointer to the zero address in OpenCL.

Fri, Oct 16, 2:31 PM · Restricted Project
yaxunl added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL a shared variable can only be declared in a kernel function or passed as a kernel argument.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not: https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment, since it allows more flexibility in how shared variable support is implemented in the backend. @tra for advice.

But you are not checking for a constant pointer here!

In HIP, __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means that the pointer itself is in the constant address space and points to the generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;
Fri, Oct 16, 2:07 PM · Restricted Project
yaxunl added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL a shared variable can only be declared in a kernel function or passed as a kernel argument.

However, I am not sure whether a constant pointer can point to shared memory, i.e., whether the address of a shared variable is a compile-time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not: https://godbolt.org/z/9W8vee

I tend to agree with nvcc's treatment, since it allows more flexibility in how shared variable support is implemented in the backend. @tra for advice.

But you are not checking for a constant pointer here!

Fri, Oct 16, 1:58 PM · Restricted Project
yaxunl updated subscribers of D89525: [amdgpu] Enhance AMDGPU AA..

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

Fri, Oct 16, 1:49 PM · Restricted Project
yaxunl added a comment to D89582: clang/AMDGPU: Apply workgroup related attributes to all functions.

What if a device function is called by kernels with different work group sizes? Will the caller's work group size override the callee's work group size?

Fri, Oct 16, 12:08 PM

Thu, Oct 15

yaxunl added a reverting change for rG187658b8a611: Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024": rGe384e94fbe7c: Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024".
Thu, Oct 15, 2:26 PM
yaxunl committed rGe384e94fbe7c: Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024" (authored by yaxunl).
Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Thu, Oct 15, 2:26 PM
yaxunl added a comment to D89478: AMDGPU: Make sure both cc1 and cc1as process -m[no-]code-object-v3.

LGTM

Thu, Oct 15, 10:47 AM · Restricted Project
yaxunl added inline comments to D89478: AMDGPU: Make sure both cc1 and cc1as process -m[no-]code-object-v3.
Thu, Oct 15, 9:19 AM · Restricted Project
yaxunl added inline comments to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
Thu, Oct 15, 7:13 AM · Restricted Project
yaxunl updated the diff for D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

Rename StreamableDiagnosticBase to StreamingDiagnostic.

Thu, Oct 15, 7:08 AM · Restricted Project

Wed, Oct 14

yaxunl added inline comments to D89372: [OpenCL] Remove unused extensions.
Wed, Oct 14, 1:10 PM · Restricted Project
yaxunl added a comment to D89372: [OpenCL] Remove unused extensions.

Does the spec require the cl_* macro to be defined if an extension is enabled?

The extension spec currently has:

Every extension which affects the OpenCL language semantics, syntax or adds built-in functions to the language must create a preprocessor #define that matches the extension name string. This #define would be available in the language if and only if the extension is supported on a given implementation.

Those extensions don't affect the language or add any BIFs, so my reading of this is that the macro shouldn't be available.

Wed, Oct 14, 1:05 PM · Restricted Project
yaxunl added a comment to D89372: [OpenCL] Remove unused extensions.

With this change, clang will basically have no knowledge of the removed extensions, i.e., it will not know which extension is supported in which version of OpenCL and will have no way to enable/disable those extensions. There will be no way to define the corresponding macros in clang.

Basically the responsibility of defining those macros will be shifted to OpenCL runtime for JIT and shifted to users for offline compilation. They need to have knowledge about which extensions are supported in which version of OpenCL and which cpu/platform supports them. I am not sure whether this is the direction we want to move to.

But why do you think anyone would need to use those macros in OpenCL C?

Let's take cl_khr_icd as an example: https://www.khronos.org/registry/OpenCL//sdk/2.2/docs/man/html/cl_khr_icd.html

cl_khr_icd - Extension through which the Khronos OpenCL installable client driver loader (ICD Loader) may expose multiple separate vendor installable client drivers (Vendor ICDs) for OpenCL.

Why would anyone need any macro while compiling OpenCL C code for this functionality? It is dealing with the driver loader that runs on the host before compiling anything.

I believe that the addition of such extensions into the kernel language is accidental and we should try to improve this. There is no need to have something that isn't needed. We have enough code and complexity to maintain that is useful. Let's try to simplify by at least removing what is not needed.

On a separate note, the extensions that need a macro definition but don't require any functionality in clang's parsing also don't have to be in the clang source code. I have mentioned it in my RFC as well: http://lists.llvm.org/pipermail/cfe-dev/2020-September/066911.html

Wed, Oct 14, 12:11 PM · Restricted Project
yaxunl added inline comments to D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.
Wed, Oct 14, 11:25 AM · Restricted Project
yaxunl added a comment to D89372: [OpenCL] Remove unused extensions.

With this change, clang will basically have no knowledge of the removed extensions, i.e., it will not know which extension is supported in which version of OpenCL and will have no way to enable/disable those extensions. There will be no way to define the corresponding macros in clang.

Wed, Oct 14, 8:12 AM · Restricted Project
yaxunl added reviewers for D89372: [OpenCL] Remove unused extensions: b-sumner, arsenm.
Wed, Oct 14, 8:05 AM · Restricted Project
yaxunl added a comment to D89372: [OpenCL] Remove unused extensions.

What if users rely on the predefined macros associated with an extension, e.g. cl_khr_srgb_image_writes, to enable/disable certain code?
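
As a rough illustration of the pattern being asked about, here is a host-side C++ analogue rather than OpenCL C. The `#define` only simulates a compiler that predefines the extension macro, and `selected_write_path` is a hypothetical helper name used for this sketch:

```cpp
#include <string>

// Assumption: simulate a compiler that predefines the extension macro.
// Removing this line models a compiler with no knowledge of the extension.
#define cl_khr_srgb_image_writes 1

// Hypothetical helper: reports which code path the preprocessor selected.
std::string selected_write_path() {
#ifdef cl_khr_srgb_image_writes
  return "srgb";      // extension-specific path
#else
  return "fallback";  // taken when the macro is not defined
#endif
}
```

If the macro stops being defined, code written this way still compiles but takes the fallback branch without any diagnostic.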

Wed, Oct 14, 5:35 AM · Restricted Project
yaxunl updated the diff for D84362: [NFC] Refactor DiagnosticBuilder and PartialDiagnostic.

Revised per John's comments.

Wed, Oct 14, 5:17 AM · Restricted Project

Tue, Oct 13

yaxunl added a comment to D76620: [SYCL] Implement __builtin_unique_stable_name..

CUDA/HIP are facing similar issues, i.e. consistency of name mangling of kernels between host/device compilation of the same TU. I hope this feature can be implemented in a generic way so that it can be reused for other offloading languages.

Tue, Oct 13, 7:21 AM · Restricted Project

Thu, Oct 8

yaxunl committed rGb9225543e844: DeferredDiagnosticsEmitter crashes (authored by glevner).
DeferredDiagnosticsEmitter crashes
Thu, Oct 8, 8:45 AM
yaxunl closed D88949: DeferredDiagnosticsEmitter crashes.
Thu, Oct 8, 8:45 AM · Restricted Project
yaxunl updated subscribers of D88949: DeferredDiagnosticsEmitter crashes.

I can help you commit it to trunk. For cherry-picking to the release branch, we may need the help of @hans.

Thu, Oct 8, 7:22 AM · Restricted Project
yaxunl accepted D88949: DeferredDiagnosticsEmitter crashes.

LGTM. Thanks.

Thu, Oct 8, 5:43 AM · Restricted Project

Wed, Oct 7

yaxunl accepted D78902: [Driver] Add output file to properties of Command.

LGTM. Thanks!

Wed, Oct 7, 4:46 AM · Restricted Project
yaxunl added a comment to D88949: DeferredDiagnosticsEmitter crashes.

Can we have a lit test? Thanks.

Wed, Oct 7, 4:38 AM · Restricted Project

Tue, Oct 6

yaxunl added inline comments to D78902: [Driver] Add output file to properties of Command.
Tue, Oct 6, 8:47 AM · Restricted Project

Mon, Oct 5

yaxunl added a comment to D88786: [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables.

This patch may break some existing HIP applications.

Mon, Oct 5, 2:43 PM · Restricted Project
yaxunl added inline comments to D78902: [Driver] Add output file to properties of Command.
Mon, Oct 5, 5:18 AM · Restricted Project

Sun, Oct 4

yaxunl committed rGe372c1d7624e: [HIP] Fix -fgpu-allow-device-init option (authored by yaxunl).
[HIP] Fix -fgpu-allow-device-init option
Sun, Oct 4, 7:17 PM
yaxunl closed D88550: [HIP] Fix -fgpu-allow-device-init option.
Sun, Oct 4, 7:17 PM · Restricted Project
yaxunl committed rG5b551b79d3bb: [HIP] Fix default output file for -E (authored by yaxunl).
[HIP] Fix default output file for -E
Sun, Oct 4, 7:10 PM
yaxunl closed D88730: [HIP] Fix default output file for -E.
Sun, Oct 4, 7:09 PM · Restricted Project
yaxunl committed rG9756a402f297: Recommit "[HIP] Add option --gpu-instrument-lib=" (authored by yaxunl).
Recommit "[HIP] Add option --gpu-instrument-lib="
Sun, Oct 4, 6:43 PM
yaxunl added a reverting change for rG64f7790e7d23: [HIP] Add option --gpu-instrument-lib=: rGfef0ebbc0b39: Revert "[HIP] Add option --gpu-instrument-lib=".
Sun, Oct 4, 6:28 PM
yaxunl committed rGfef0ebbc0b39: Revert "[HIP] Add option --gpu-instrument-lib=" (authored by yaxunl).
Revert "[HIP] Add option --gpu-instrument-lib="
Sun, Oct 4, 6:28 PM
yaxunl added a reverting change for D88557: [HIP] Add option --gpu-instrument-lib=: rGfef0ebbc0b39: Revert "[HIP] Add option --gpu-instrument-lib=".
Sun, Oct 4, 6:28 PM · Restricted Project
yaxunl committed rG64f7790e7d23: [HIP] Add option --gpu-instrument-lib= (authored by yaxunl).
[HIP] Add option --gpu-instrument-lib=
Sun, Oct 4, 6:18 PM
yaxunl closed D88557: [HIP] Add option --gpu-instrument-lib=.
Sun, Oct 4, 6:18 PM · Restricted Project

Fri, Oct 2

yaxunl committed rG2cd75f738ec6: Diagnose invalid target ID for AMDGPU toolchain for assembler (authored by yaxunl).
Diagnose invalid target ID for AMDGPU toolchain for assembler
Fri, Oct 2, 4:38 PM
yaxunl closed D88377: Diagnose invalid target ID for AMDGPU toolchain for assembler.
Fri, Oct 2, 4:38 PM · Restricted Project
yaxunl committed rGcbd420c5ed85: [CUDA][HIP] Fix bound arch for offload action for fat binary (authored by yaxunl).
[CUDA][HIP] Fix bound arch for offload action for fat binary
Fri, Oct 2, 4:14 PM
yaxunl closed D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary.
Fri, Oct 2, 4:14 PM · Restricted Project
yaxunl committed rGdc6a0b0ec7e3: [HIP] Align device binary (authored by yaxunl).
[HIP] Align device binary
Fri, Oct 2, 3:12 PM
yaxunl closed D88734: [HIP] Align device binary.
Fri, Oct 2, 3:12 PM · Restricted Project
yaxunl added inline comments to D88734: [HIP] Align device binary.
Fri, Oct 2, 1:59 PM · Restricted Project
yaxunl added inline comments to D88730: [HIP] Fix default output file for -E.
Fri, Oct 2, 1:18 PM · Restricted Project
yaxunl committed rGc87c017a4c47: Fix failure in test hip-macros.hip (authored by yaxunl).
Fix failure in test hip-macros.hip
Fri, Oct 2, 7:34 AM
yaxunl committed rG36501b180a4f: Emit predefined macro for wavefront size for amdgcn (authored by yaxunl).
Emit predefined macro for wavefront size for amdgcn
Fri, Oct 2, 7:19 AM
yaxunl closed D88370: Emit predefined macro for wavefront size for amdgcn.
Fri, Oct 2, 7:18 AM · Restricted Project
yaxunl added inline comments to D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..
Fri, Oct 2, 7:11 AM · Restricted Project
yaxunl requested review of D88734: [HIP] Align device binary.
Fri, Oct 2, 5:54 AM · Restricted Project
yaxunl requested review of D88730: [HIP] Fix default output file for -E.
Fri, Oct 2, 5:28 AM · Restricted Project

Thu, Oct 1

yaxunl added inline comments to D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary.
Thu, Oct 1, 1:22 PM · Restricted Project
yaxunl added inline comments to D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..
Thu, Oct 1, 12:38 PM · Restricted Project
yaxunl updated the diff for D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary.

add CudaArch::UNUSED as suggested by Artem.

Thu, Oct 1, 7:45 AM · Restricted Project

Wed, Sep 30

yaxunl added a comment to D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary.
In D88524#2304173, @tra wrote:

Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.

It would appear that the problem is actually where we check TargetID -- we should've ignored CudaArch::UNKNOWN there.
Not setting the arch here avoids triggering the bug but it does not fix it.
Considering that CudaArch::UNKNOWN is used here to indicate that the arch is unused, perhaps we need an enum for that to distinguish it from unknown/unset.
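
The distinction suggested above could look roughly like the following C++ sketch. It is not clang's actual code: `BoundArch`, `mcpuFlagFor`, and `needsDiagnostic` are hypothetical names, and `GFX906` stands in for any recognized arch.

```cpp
#include <string>

// Hypothetical stand-in for the arch enum; only three states matter here.
enum class BoundArch {
  Unused,   // the action has no bound arch at all (e.g. the fat binary step)
  Unknown,  // an arch was requested but not recognized
  GFX906    // one example of a recognized arch
};

// Hypothetical helper: the -mcpu flag to add, or an empty string for none.
std::string mcpuFlagFor(BoundArch A) {
  switch (A) {
  case BoundArch::GFX906:
    return "-mcpu=gfx906";
  case BoundArch::Unused:
  case BoundArch::Unknown:
    return "";  // never emit -mcpu=unknown
  }
  return "";
}

// Hypothetical helper: only an unrecognized arch is an error; unused is fine.
bool needsDiagnostic(BoundArch A) { return A == BoundArch::Unknown; }
```

With separate values, code that checks the target ID can skip the unused case silently and still report a genuinely unknown arch.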

Wed, Sep 30, 11:16 AM · Restricted Project
yaxunl added a comment to D88557: [HIP] Add option --gpu-instrument-lib=.
In D88557#2303891, @tra wrote:

Perhaps we should start thinking of shipping some of that bitcode along with clang.
Then the instrumentation library could be linked with automatically by the driver when -finstrument is specified.
We already need bitcode for the math library for both NVPTX and AMDGPU and will likely need more for other things that depend on things that are standard on the host.

Wed, Sep 30, 9:51 AM · Restricted Project
yaxunl updated the diff for D88370: Emit predefined macro for wavefront size for amdgcn.

simplifies wavefrontsize64 target feature

Wed, Sep 30, 9:34 AM · Restricted Project
yaxunl updated the diff for D88370: Emit predefined macro for wavefront size for amdgcn.

simpler code for handling multiple wave64 options

Wed, Sep 30, 8:58 AM · Restricted Project
yaxunl added inline comments to D88370: Emit predefined macro for wavefront size for amdgcn.
Wed, Sep 30, 8:56 AM · Restricted Project
yaxunl added inline comments to D88370: Emit predefined macro for wavefront size for amdgcn.
Wed, Sep 30, 8:47 AM · Restricted Project
yaxunl updated the diff for D88370: Emit predefined macro for wavefront size for amdgcn.

Add test and fix multiple -m[no-]wavefrontsize64 issue.

Wed, Sep 30, 8:28 AM · Restricted Project
yaxunl requested review of D88557: [HIP] Add option --gpu-instrument-lib=.
Wed, Sep 30, 5:05 AM · Restricted Project
yaxunl added a comment to D88370: Emit predefined macro for wavefront size for amdgcn.

ping

Wed, Sep 30, 4:00 AM · Restricted Project
yaxunl requested review of D88550: [HIP] Fix -fgpu-allow-device-init option.
Wed, Sep 30, 3:25 AM · Restricted Project

Tue, Sep 29

yaxunl updated the diff for D88377: Diagnose invalid target ID for AMDGPU toolchain for assembler.

fix bug

Tue, Sep 29, 7:00 PM · Restricted Project
yaxunl requested review of D88524: [CUDA][HIP] Fix bound arch for offload action for fat binary.
Tue, Sep 29, 6:56 PM · Restricted Project
yaxunl committed rGd04775e16bba: Add remquo, frexp and modf overload functions to HIP header (authored by yaxunl).
Add remquo, frexp and modf overload functions to HIP header
Tue, Sep 29, 5:58 PM

Mon, Sep 28

yaxunl committed rG5a3023a91c0e: [HIP] Return non-zero value for invalid target ID (authored by yaxunl).
[HIP] Return non-zero value for invalid target ID
Mon, Sep 28, 8:17 PM
yaxunl committed rG187658b8a611: Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024" (authored by yaxunl).
Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Mon, Sep 28, 8:01 PM
yaxunl committed rG10eb3bf2d430: Skip -fPIE for AMDGPU and HIP toolchain (authored by yaxunl).
Skip -fPIE for AMDGPU and HIP toolchain
Mon, Sep 28, 7:25 PM
yaxunl closed D88425: Skip -fPIE for AMDGPU and HIP toolchain.
Mon, Sep 28, 7:25 PM · Restricted Project
yaxunl updated the diff for D88370: Emit predefined macro for wavefront size for amdgcn.

Revised per Matt's comments.

Mon, Sep 28, 1:05 PM · Restricted Project
yaxunl added inline comments to D88425: Skip -fPIE for AMDGPU and HIP toolchain.
Mon, Sep 28, 10:36 AM · Restricted Project
yaxunl updated the diff for D88370: Emit predefined macro for wavefront size for amdgcn.

capitalize macro

Mon, Sep 28, 9:47 AM · Restricted Project
yaxunl added inline comments to D88370: Emit predefined macro for wavefront size for amdgcn.
Mon, Sep 28, 9:42 AM · Restricted Project
yaxunl added inline comments to D88377: Diagnose invalid target ID for AMDGPU toolchain for assembler.
Mon, Sep 28, 9:23 AM · Restricted Project
yaxunl updated the diff for D88377: Diagnose invalid target ID for AMDGPU toolchain for assembler.

update patch with full context

Mon, Sep 28, 9:21 AM · Restricted Project
yaxunl requested review of D88425: Skip -fPIE for AMDGPU and HIP toolchain.
Mon, Sep 28, 9:11 AM · Restricted Project

Sun, Sep 27

yaxunl requested review of D88377: Diagnose invalid target ID for AMDGPU toolchain for assembler.
Sun, Sep 27, 6:53 AM · Restricted Project

Sat, Sep 26

yaxunl updated the diff for D88370: Emit predefined macro for wavefront size for amdgcn.

fix typo

Sat, Sep 26, 8:34 PM · Restricted Project
yaxunl requested review of D88370: Emit predefined macro for wavefront size for amdgcn.
Sat, Sep 26, 7:41 PM · Restricted Project
yaxunl added inline comments to D88345: [CUDA] Allow local `static const {__constant__, __device__}` variables..
Sat, Sep 26, 4:38 AM · Restricted Project

Fri, Sep 25

yaxunl added reviewers for D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr.: Anastasia, bader.
Fri, Sep 25, 7:40 AM · Restricted Project

Thu, Sep 24

yaxunl committed rGe39da8ab6a28: Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device… (authored by yaxunl).
Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device…
Thu, Sep 24, 5:45 AM

Wed, Sep 23

yaxunl committed rG8e780a1653e6: Recommit [NFC] Refactor DiagnosticBuilder and PartialDiagnostic (authored by yaxunl).
Recommit [NFC] Refactor DiagnosticBuilder and PartialDiagnostic
Wed, Sep 23, 1:56 PM
yaxunl accepted D87947: [AMDGPU] Make ds fp atomics overloadable.
Wed, Sep 23, 11:37 AM · Restricted Project, Restricted Project
yaxunl committed rGe90343ada3bd: Fix regressioin in test dwp-separate-debug-file.cpp (authored by yaxunl).
Fix regressioin in test dwp-separate-debug-file.cpp
Wed, Sep 23, 8:50 AM
yaxunl committed rGe6d50b4f22dc: recommit [HIP] Fix -gsplit-dwarf option (authored by yaxunl).
recommit [HIP] Fix -gsplit-dwarf option
Wed, Sep 23, 8:22 AM
yaxunl committed rG301e23305d03: [CUDA][HIP] Fix static device var used by host code only (authored by yaxunl).
[CUDA][HIP] Fix static device var used by host code only
Wed, Sep 23, 5:20 AM