
Thu, Jan 14

tra requested review of D94732: [CUDA] Normalize handling of defaulted dtor..
Thu, Jan 14, 4:37 PM · Restricted Project
tra added a comment to D94643: [llvm-link] Improve link time for bitcode archives [NFC].

Do you have any measurements showing actual impact on performance or memory consumption?

Thu, Jan 14, 10:51 AM · Restricted Project
tra added a comment to D93062: [HIP] Add signbit(long double) decl.

Go ahead.

Thu, Jan 14, 10:00 AM · Restricted Project

Wed, Jan 13

tra added a comment to D94337: Add cuda header type for cuh files.

... The goal of __clang_cuda_standalone_defs.h is to make it possible to parse CUDA sources at all w/o having to rely on CUDA SDK. ...

Should __clang_cuda_standalone_defs.h depend on the arch? For example, __match_all_sync doesn't exist in sm_35 but does exist in sm_75.
For tooling this might not matter because the host cuda includes don't depend on the arch.

Wed, Jan 13, 1:25 PM · Restricted Project
tra added a comment to D93062: [HIP] Add signbit(long double) decl.

For CUDA we have tests in the test-suite (https://github.com/llvm/llvm-test-suite/tree/main/External/CUDA) and a handful of buildbots running them (e.g. http://lab.llvm.org:8011/#/builders/55).
AMD should probably set up some public build/test bots for HIP, too.
In this case, manual testing and rollback if something breaks is about all we can do.

Wed, Jan 13, 12:05 PM · Restricted Project
tra added a comment to D94337: Add cuda header type for cuh files.
In D94337#2491825, @tra wrote:

'Works' is not exactly the same as 'works correctly'. This example makes a() look like a regular host function instead of a kernel, and that affects how the rest of the TU gets parsed.
I.e. you'll have further errors if somewhere down below the file has a<<<1,1>>>(). Similar story with ignoring __host__ and __device__ attributes -- that may lead to various overload resolution errors, or reporting conflicting redeclarations/redefinitions for the perfectly valid host/device function overloads. The list goes on.

No, this part is definitely working, the full set of SDK headers is included. I have confirmed this by compiling a .cu and a .cuh file with -E and checking that the output is identical. Further, I have confirmed that global<<<_, _>>>() builds, that __host__ only functions can't be used on the device, and that __device__ functions can't be used on the host.

Wed, Jan 13, 10:57 AM · Restricted Project

Mon, Jan 11

tra added a comment to D94337: Add cuda header type for cuh files.

For example consider the following header:

#pragma once

__global__ void a() {
  unsigned block_idx = blockIdx.x;
  unsigned thread_idx = threadIdx.x;

  __shfl_down_sync(1, 2, 1);
}

When saved as a .cuh and compiled as clang++ file.cuh -fsyntax-only --cuda-gpu-arch=sm_75 (using clang++ built from this commit), this works fine other than an invalid diagnostic for the #pragma once.

Mon, Jan 11, 5:37 PM · Restricted Project
tra added a comment to D94337: Add cuda header type for cuh files.

My primary goal for this change was to allow for language servers and other tooling to properly handle cuda header files. From my understanding the way that language servers handle c++ header files is by compiling them with -xc++-header and -fsyntax-only. This is certainly true for ccls and it seems to be true for clangd.
So this can be accomplished without actually being able to produce preprocessed output for CUDA headers; it only requires handling the "-fsyntax-only" use case.

Mon, Jan 11, 2:52 PM · Restricted Project
tra added a comment to D94337: Add cuda header type for cuh files.

This adds a cuda header type with file extension "cuh". The output type file
extension is "cuhi" - not sure if this is a good choice. This allows
language servers to properly handle cuh files without additional arguments.
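A minimal sketch of the extension-to-type lookup that the summary describes. This is a hypothetical C++ model, not Clang's driver code: the `InputKind` enumerators and both function names are invented for illustration, `cuhi` is the tentative extension from the summary, and `cui` for preprocessed CUDA sources is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical input kinds; these are not Clang's actual types::ID values.
enum class InputKind { CUDA, CUDAHeader, PreprocessedCUDA, PreprocessedCUDAHeader };

// Maps a file extension (without the leading dot) to an input kind.
std::optional<InputKind> lookupKindForExtension(const std::string &ext) {
  if (ext == "cu") return InputKind::CUDA;
  if (ext == "cuh") return InputKind::CUDAHeader;                // the new header kind
  if (ext == "cui") return InputKind::PreprocessedCUDA;          // assumed extension
  if (ext == "cuhi") return InputKind::PreprocessedCUDAHeader;   // tentative, per the summary
  return std::nullopt; // unknown extension: the caller falls back to other rules
}

// Kind produced by preprocessing an input of kind `k`: a header stays a header.
InputKind preprocessedKind(InputKind k) {
  switch (k) {
  case InputKind::CUDA:
    return InputKind::PreprocessedCUDA;
  case InputKind::CUDAHeader:
    return InputKind::PreprocessedCUDAHeader;
  default:
    return k; // already preprocessed
  }
}
```

Keeping header and source kinds distinct is what lets a caller choose header-specific handling from the extension alone.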

Mon, Jan 11, 11:02 AM · Restricted Project

Thu, Jan 7

tra added a comment to D93930: [NewPM][NVPTX] Port NVPTX opt passes.

LGTM

Thu, Jan 7, 3:15 PM · Restricted Project
tra accepted D93638: [hip] Enable HIP compilation with `<complex>` on MSVC..

Forgot that C functions can be overloaded in Clang with the overloadable
extension. With that, we don't need to mark functions from <ymath.h> as HD.
Instead, we can provide their device-side implementation directly.

Thu, Jan 7, 10:20 AM · Restricted Project

Wed, Jan 6

tra added inline comments to D93638: [hip] Enable HIP compilation with `<complex>` on MSVC..
Wed, Jan 6, 4:03 PM · Restricted Project
tra added inline comments to D93638: [hip] Enable HIP compilation with `<complex>` on MSVC..
Wed, Jan 6, 2:34 PM · Restricted Project
tra accepted D92954: [clang-offload-bundler] Add option -list.
Wed, Jan 6, 12:37 PM · Restricted Project
tra accepted D93587: [hip] Fix HIP version parsing..

LGTM overall.

Wed, Jan 6, 11:23 AM · Restricted Project
tra added inline comments to D93638: [hip] Enable HIP compilation with `<complex>` on MSVC..
Wed, Jan 6, 11:09 AM · Restricted Project
tra accepted D94123: [NVPTX] Fix debugging information being added to NVPTX target if remarks are enabled.

LGTM.

Wed, Jan 6, 10:37 AM · Restricted Project, Restricted Project

Tue, Jan 5

tra added a comment to D94123: [NVPTX] Fix debugging information being added to NVPTX target if remarks are enabled.

Okay, so without that flag Clang will not create debug symbols in the PTX assembly output.

Tue, Jan 5, 4:49 PM · Restricted Project, Restricted Project
tra added a comment to D94123: [NVPTX] Fix debugging information being added to NVPTX target if remarks are enabled.

There's --cuda-noopt-device-debug option specifically to allow compiling GPU code with full debug info. Clang will generate optimized PTX, but ptxas optimizations will be disabled.

Tue, Jan 5, 2:50 PM · Restricted Project, Restricted Project
tra added inline comments to D92434: [NFC][AMDGPU] AMDGPU code object V4 ABI documentation.
Tue, Jan 5, 2:20 PM · Restricted Project, Restricted Project
tra added inline comments to D93638: [hip] Enable HIP compilation with `<complex>` on MSVC..
Tue, Jan 5, 1:01 PM · Restricted Project
tra added inline comments to D93587: [hip] Fix HIP version parsing..
Tue, Jan 5, 12:37 PM · Restricted Project
tra added inline comments to D92954: [clang-offload-bundler] Add option -list.
Tue, Jan 5, 12:22 PM · Restricted Project
tra accepted D92535: [llvm-link] fix linker behavior when linking archives with --only-needed option.

LGTM.

Tue, Jan 5, 9:35 AM · Restricted Project

Dec 15 2020

tra added a comment to D92535: [llvm-link] fix linker behavior when linking archives with --only-needed option.

@jdoerfert -- do you happen to know how archives with bitcode are used by OpenMP? Does OpenMP ever link just the archive alone w/o any other bitcode files?

Dec 15 2020, 1:11 PM · Restricted Project
tra added a comment to D92535: [llvm-link] fix linker behavior when linking archives with --only-needed option.

The change looks OK. But we should still wait for @jsjodin to confirm that the first-file-is-different behavior for archive files is unintentional.

Dec 15 2020, 12:43 PM · Restricted Project

Dec 14 2020

tra accepted D92720: [HIP] unbundle bundled preprocessor output.

Output of -E for HIP combined host/device compilation is plain text. It has C++ comments inserted between the preprocessor outputs for the host and for each GPU arch. The C++ comments follow the format of clang-offload-bundler bundled text files, so clang-offload-bundler is able to unbundle it.
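A minimal sketch of splitting such output by target, assuming the bundler's text markers take the form `// __CLANG_OFFLOAD_BUNDLE____START__ <target>` and `// __CLANG_OFFLOAD_BUNDLE____END__ <target>`. The function name `unbundleText` is hypothetical, and this is not clang-offload-bundler's implementation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Splits bundled preprocessor output into per-target text.
// Assumed marker format (see lead-in); lines outside any START/END pair are ignored.
std::map<std::string, std::string> unbundleText(const std::string &bundled) {
  const std::string start = "// __CLANG_OFFLOAD_BUNDLE____START__ ";
  const std::string end = "// __CLANG_OFFLOAD_BUNDLE____END__ ";
  std::map<std::string, std::string> result;
  std::istringstream in(bundled);
  std::string line;
  std::string current; // empty means "not inside a bundle"
  while (std::getline(in, line)) {
    if (line.rfind(start, 0) == 0) {        // line begins with the START marker
      current = line.substr(start.size());  // remainder is the target name
      result[current];                      // record the target even if its bundle is empty
    } else if (line.rfind(end, 0) == 0) {   // line begins with the END marker
      current.clear();
    } else if (!current.empty()) {
      result[current] += line + "\n";
    }
  }
  return result;
}
```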

Dec 14 2020, 3:03 PM · Restricted Project
tra accepted D93068: [clang-offload-bundler] Add option -allow-missing-bundles.

The patch could use an OK from the OMP folks, considering that we've changed the way the offload bundler is invoked for OMP.

Dec 14 2020, 2:37 PM · Restricted Project
tra committed rG0936655bac78: [CUDA] Do not diagnose host/device variable access in dependent types. (authored by tra).
[CUDA] Do not diagnose host/device variable access in dependent types.
Dec 14 2020, 11:56 AM
tra closed D92893: [CUDA] Do not diagnose host/device variable access in dependent types..
Dec 14 2020, 11:56 AM · Restricted Project
tra updated the diff for D92893: [CUDA] Do not diagnose host/device variable access in dependent types..

Use device in the test case.

Dec 14 2020, 11:09 AM · Restricted Project
tra added inline comments to D93181: [NFC][AMDGPU] Reformat AMD GPU targets in cuda.cpp.
Dec 14 2020, 10:38 AM · Restricted Project

Dec 10 2020

tra updated subscribers of D92893: [CUDA] Do not diagnose host/device variable access in dependent types..

My first variant of the patch only helped in some cases where the surface/texture attribute type was used.
Reducing a real-world failure produced the example I've added as the test case, which was still failing with that variant applied.

Dec 10 2020, 4:49 PM · Restricted Project
tra updated the diff for D92893: [CUDA] Do not diagnose host/device variable access in dependent types..

Found another corner case (a reference within a template with the surface/texture
attribute) and figured out a better fix.

Dec 10 2020, 4:41 PM · Restricted Project
tra accepted D93062: [HIP] Add signbit(long double) decl.
Dec 10 2020, 2:58 PM · Restricted Project
tra added a comment to D93068: [clang-offload-bundler] Add option -allow-missing-bundles.
Dec 10 2020, 2:56 PM · Restricted Project
tra added inline comments to D92431: [SROA] Remove Dead Instructions while creating speculative instructions.
Dec 10 2020, 11:08 AM · Restricted Project
tra added inline comments to D92954: [clang-offload-bundler] Add option -list.
Dec 10 2020, 10:50 AM · Restricted Project

Dec 9 2020

tra accepted D92918: [llvm-link][NFC] Minor cleanup.

And, by the way, this function is called as a static member (i.e. Linker::linkModules()) everywhere in LLVM sources except llvm-link.cpp:

Dec 9 2020, 5:57 PM · Restricted Project
tra committed rG016e4ebfde28: [DWARF] Allow toolchain to adjust specified DWARF version. (authored by tra).
[DWARF] Allow toolchain to adjust specified DWARF version.
Dec 9 2020, 4:35 PM
tra closed D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 9 2020, 4:35 PM · Restricted Project
tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 9 2020, 12:57 PM · Restricted Project
tra updated the diff for D92617: [DWARF] Allow toolchain to adjust specified DWARF version..

Reorganized tests for unsupported debug options & dwarf version clamping.

Dec 9 2020, 12:53 PM · Restricted Project
tra added inline comments to D92954: [clang-offload-bundler] Add option -list.
Dec 9 2020, 11:45 AM · Restricted Project
tra added a comment to D92535: [llvm-link] fix linker behavior when linking archives with --only-needed option.

Sure. I believe llvm-link works incorrectly when linking --only-needed symbols from archives with bitcode files. As currently implemented, when dealing with archives llvm-link first links the archive modules together into an intermediate module and then tries to link the required symbols from that intermediate module into the result. The problem is that the archive modules are linked together with the --only-needed flag as well, so we always end up with an empty intermediate module: archive linking starts from scratch, and nothing gets imported into the archive module because there are no dependencies yet.
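To make the failure mode concrete, here is a toy model (hypothetical names, not llvm-link's code): a module maps each defined symbol to the symbols it references, and linking with only-needed imports just the definitions the destination transitively needs. Combining archive members into an empty module under only-needed imports nothing. Linking the members fully first and applying only-needed only against the real destination is one way to avoid that; it is an assumption here, not necessarily what the patch does.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy module: defined symbol -> symbols it references.
using Module = std::map<std::string, std::vector<std::string>>;

// Symbols a module references but does not define.
std::set<std::string> undefinedRefs(const Module &m) {
  std::set<std::string> refs;
  for (const auto &entry : m)
    for (const auto &dep : entry.second)
      if (!m.count(dep)) refs.insert(dep);
  return refs;
}

// Links src into dest. With onlyNeeded, copies only definitions that dest
// transitively needs; otherwise copies everything.
void linkInto(Module &dest, const Module &src, bool onlyNeeded) {
  if (!onlyNeeded) {
    dest.insert(src.begin(), src.end());
    return;
  }
  std::set<std::string> worklist = undefinedRefs(dest);
  while (!worklist.empty()) {
    std::string sym = *worklist.begin();
    worklist.erase(worklist.begin());
    auto it = src.find(sym);
    if (it == src.end() || dest.count(sym)) continue; // not provided, or already present
    dest.insert(*it);
    for (const auto &dep : it->second)
      if (!dest.count(dep)) worklist.insert(dep);     // pull in its dependencies too
  }
}

// Buggy flow: members are combined into an empty module *with* only-needed,
// so the intermediate module never needs anything and stays empty.
Module linkArchiveBuggy(const std::vector<Module> &members) {
  Module intermediate;
  for (const auto &m : members) linkInto(intermediate, m, /*onlyNeeded=*/true);
  return intermediate;
}

// Assumed fix: combine members without only-needed; the caller then applies
// only-needed when linking the result into the real destination.
Module linkArchiveFixed(const std::vector<Module> &members) {
  Module intermediate;
  for (const auto &m : members) linkInto(intermediate, m, /*onlyNeeded=*/false);
  return intermediate;
}
```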

Dec 9 2020, 9:58 AM · Restricted Project
tra added a comment to D92918: [llvm-link][NFC] Minor cleanup.

llvm::Linker::linkModules() is a static member, so there is no need
to pass a reference to an llvm::Linker instance to the loadArFile() function.
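The point of the cleanup, sketched with hypothetical stand-ins for llvm::Module and llvm::Linker. Only the shape matters here: the real linkModules() takes its source module and flags differently. A static member is called through the class name, so a helper does not need a Linker object threaded through its parameters.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Module.
struct Module {
  std::vector<std::string> symbols;
};

// Hypothetical stand-in for llvm::Linker.
struct Linker {
  // Static, like llvm::Linker::linkModules(): no Linker object is needed.
  // Follows the LLVM convention that `true` means an error occurred.
  static bool linkModules(Module &dest, Module src) {
    dest.symbols.insert(dest.symbols.end(), src.symbols.begin(),
                        src.symbols.end());
    return false;
  }
};

// After the cleanup: the helper takes no `Linker &` parameter and calls the
// static member through the class name.
bool linkMembers(Module &dest, std::vector<Module> members) {
  for (auto &m : members)
    if (Linker::linkModules(dest, std::move(m)))
      return true; // stop at the first error
  return false;
}
```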

Dec 9 2020, 9:29 AM · Restricted Project

Dec 8 2020

tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 8 2020, 4:38 PM · Restricted Project
tra added a comment to D91281: [CUDA][HIP] Diagnose reference of host variable.
In D91281#2441147, @tra wrote:

I think isCUDADeviceBuiltinTextureType has problem handling texture refs within templates.

Dec 8 2020, 4:11 PM · Restricted Project
tra requested review of D92893: [CUDA] Do not diagnose host/device variable access in dependent types..
Dec 8 2020, 3:17 PM · Restricted Project
tra added a comment to D91281: [CUDA][HIP] Diagnose reference of host variable.

It appears that we need to add special handling for texture/surface references. Nominally they are host-side objects, but they are accessed/used from device functions as far as Sema is concerned.

Dec 8 2020, 2:26 PM · Restricted Project
tra added a comment to D92535: [llvm-link] fix linker behavior when linking archives with --only-needed option.

The patch description describes what the patch does, but does not tell us much about the problem it is supposed to fix.
Could you give us more details on why the patch is needed?

Dec 8 2020, 9:27 AM · Restricted Project

Dec 7 2020

tra updated the diff for D92617: [DWARF] Allow toolchain to adjust specified DWARF version..

Adjusted openmp test for the changed -gembed-source warning.

Dec 7 2020, 5:12 PM · Restricted Project
tra updated the diff for D92617: [DWARF] Allow toolchain to adjust specified DWARF version..

Addressed comments.

Dec 7 2020, 5:01 PM · Restricted Project
tra updated the diff for D92617: [DWARF] Allow toolchain to adjust specified DWARF version..

Updated to address the comments. PTAL.

Dec 7 2020, 2:09 PM · Restricted Project
tra added a comment to D92720: [HIP] unbundle bundled preprocessor output.

-E by default prints preprocessed output to stdout. CUDA will print preprocessed output from all subcompilations. What does HIP do in this case? Printing out the bundle is probably not what the user will expect.
IMO preprocessed output is frequently used as a debugging tool, so it's important for users to be able to read it. Bundled output is rather cumbersome to deal with. It's possible to manually unbundle it, but the tool is not documented well and it's not particularly suitable for human use.

Dec 7 2020, 10:55 AM · Restricted Project

Dec 4 2020

tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 4 2020, 1:50 PM · Restricted Project
tra requested review of D92684: [CUDA, test-suite] enable testing with C++17 and C++20.
Dec 4 2020, 1:12 PM
tra committed rG43267929423b: [CUDA] Another attempt to fix early inclusion of <new> from libstdc++ (authored by tra).
[CUDA] Another attempt to fix early inclusion of <new> from libstdc++
Dec 4 2020, 12:04 PM
tra closed D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.
Dec 4 2020, 12:04 PM · Restricted Project
tra updated the summary of D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.
Dec 4 2020, 12:03 PM · Restricted Project
tra updated the diff for D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.

Removed forgotten pop_macro(DEVICE)

Dec 4 2020, 11:46 AM · Restricted Project
tra requested review of D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.

@jlebar PTAL. The previous fix did not quite fix the problem. __device__ operator new/delete was still not defined, but the issue was not reported due to deferred diags if the functions were not used.
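Why a missing `__device__` operator new/delete could go unnoticed can be illustrated with a toy model of deferred diagnostics: errors found while checking device-side code are recorded per function and are only reported once that function is actually emitted for the device. `DeferredDiags` and `reported` are hypothetical names; Clang's real mechanism is more involved.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model: diagnostics are held back per function until it is emitted.
class DeferredDiags {
  std::map<std::string, std::vector<std::string>> pending_;

public:
  void defer(const std::string &fn, const std::string &msg) {
    pending_[fn].push_back(msg);
  }
  // Called when device codegen emits `fn`; returns the diagnostics that
  // now become visible to the user.
  std::vector<std::string> emit(const std::string &fn) {
    auto it = pending_.find(fn);
    if (it == pending_.end()) return {};
    std::vector<std::string> out = std::move(it->second);
    pending_.erase(it);
    return out;
  }
};

// All diagnostics a user would see, given the set of emitted functions.
std::vector<std::string> reported(DeferredDiags diags,
                                  const std::set<std::string> &emitted) {
  std::vector<std::string> all;
  for (const auto &fn : emitted)
    for (auto &d : diags.emit(fn)) all.push_back(d);
  return all;
}
```

If nothing ever uses the affected function on the device, it is never emitted and the recorded error stays silent.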

Dec 4 2020, 11:37 AM · Restricted Project
tra updated the diff for D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.

Better fix for the early inclusion of <new>

Dec 4 2020, 11:35 AM · Restricted Project
tra reopened D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.

Reopening the tracker for a better fix.

Dec 4 2020, 11:33 AM · Restricted Project
tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 4 2020, 11:08 AM · Restricted Project
tra updated the diff for D92617: [DWARF] Allow toolchain to adjust specified DWARF version..

Simplified dwarf version clamping.

Dec 4 2020, 11:07 AM · Restricted Project

Dec 3 2020

tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 3 2020, 8:12 PM · Restricted Project
tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 3 2020, 8:01 PM · Restricted Project
tra added inline comments to D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 3 2020, 4:48 PM · Restricted Project
tra updated the diff for D92617: [DWARF] Allow toolchain to adjust specified DWARF version..

Updated according to David's feedback.

Dec 3 2020, 4:46 PM · Restricted Project
tra requested review of D92617: [DWARF] Allow toolchain to adjust specified DWARF version..
Dec 3 2020, 3:57 PM · Restricted Project
tra added a comment to D92363: [HIP] Warn no --offload-arch option.
Dec 3 2020, 12:17 PM

Dec 2 2020

tra added a comment to D80450: [CUDA][HIP] Fix HD function resolution.

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

How about -fgpu-exclude-wrong-side-overloads? What this patch does is always exclude wrong-side overloads, whereas previously they were only excluded if there were same-side overloads.

Dec 2 2020, 10:12 AM · Restricted Project

Dec 1 2020

tra accepted D80450: [CUDA][HIP] Fix HD function resolution.

I'd suggest adding more details on the background of this change to the commit log (point to the comment in the isBetterOverloadCandidate ?) and outline the intention to enable the new way to do overloading after some soak time.

Dec 1 2020, 1:02 PM · Restricted Project
tra accepted D91088: [CUDA][HIP] Fix capturing reference to host variable.
Dec 1 2020, 12:40 PM · Restricted Project
tra added a comment to D92363: [HIP] Warn no --offload-arch option.

While I agree that the default GPU choice is not likely to be correct, or usable, for everyone, the warning seems to be a half-measure.
If the default is not usable, then it should not be the default. If it's usable, then we don't need a warning.

Dec 1 2020, 12:23 PM

Nov 30 2020

tra added a comment to D91088: [CUDA][HIP] Fix capturing reference to host variable.

LGTM in general.

Nov 30 2020, 1:23 PM · Restricted Project
tra added inline comments to D80450: [CUDA][HIP] Fix HD function resolution.
Nov 30 2020, 1:04 PM · Restricted Project
tra accepted D92342: [HIP] Fix HIP test on windows due to lld suffix.
Nov 30 2020, 12:58 PM · Restricted Project
tra added a comment to D92130: [HIP] cmath demote long double args to double.

LGTM in general. Will defer to Sam as it's HIP.

Nov 30 2020, 12:34 PM · Restricted Project
tra accepted D92339: Add help text for -nogpuinc.
Nov 30 2020, 12:06 PM · Restricted Project
tra added inline comments to D80450: [CUDA][HIP] Fix HD function resolution.
Nov 30 2020, 12:05 PM · Restricted Project
tra added a comment to D80450: [CUDA][HIP] Fix HD function resolution.

For the call ag(e). There are two candidates:

  1. ag in namespace b. The function arguments can match. However, it is a host function, so it is a wrong-sided candidate and not viable.
  2. ag in the default namespace. It is a host device function. However, the function arguments require a<ae>, so it cannot match.

Before my patch, wrong-sided candidates were allowed. Clang resolved the call to candidate 1, which resulted in a diagnostic about a host function referenced in a host device function; that diagnostic can be deferred. Since f() is not emitted on the device side, the deferred diagnostic was never emitted.
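The two resolution rules can be compared on the example above with a toy model. `Candidate`, `resolve`, and the `excludeWrongSideAlways` parameter are hypothetical; the parameter merely stands in for the behavior discussed under the proposed -fgpu-exclude-wrong-side-overloads name, and real overload ranking by conversion quality is ignored.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class Side { Host, Device, HostDevice };

struct Candidate {
  std::string name;
  Side side;      // which side the callee is defined for
  bool argsMatch; // whether the arguments convert to the parameters
};

// Can code compiled for `target` (Host or Device) call a `callee`-side function?
bool callableFrom(Side callee, Side target) {
  return callee == Side::HostDevice || callee == target;
}

struct Resolution {
  std::optional<std::string> chosen;  // empty: no viable candidate (hard error)
  bool deferredWrongSideDiag = false; // wrong-side call, diagnosed lazily
};

// excludeWrongSideAlways == false models the old rule, true the new one.
Resolution resolve(const std::vector<Candidate> &cands, Side target,
                   bool excludeWrongSideAlways) {
  std::vector<const Candidate *> viable, sameSide;
  for (const auto &c : cands) {
    if (!c.argsMatch) continue;
    viable.push_back(&c);
    if (callableFrom(c.side, target)) sameSide.push_back(&c);
  }
  // Old rule: drop wrong-side candidates only if a same-side one is viable.
  const auto &pool =
      (excludeWrongSideAlways || !sameSide.empty()) ? sameSide : viable;
  Resolution r;
  if (pool.empty()) return r;
  const Candidate *best = pool.front(); // toy: no ranking by conversions
  r.chosen = best->name;
  r.deferredWrongSideDiag = !callableFrom(best->side, target);
  return r;
}
```

Under the old rule the host-only b::ag is chosen with a deferrable diagnostic; under the new rule no candidate is viable, so the error can no longer hide behind deferral.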

Nov 30 2020, 12:01 PM · Restricted Project
tra added a comment to D92210: Don't sink ptrtoint/inttoptr sequences into non-noop addrspacecasts..

This looks reasonable to me (and I appreciate all the debugging in the Julia bug!), but I have never touched this code, so I don't 100% feel comfortable approving the change.

Nov 30 2020, 11:36 AM · Restricted Project, Restricted Project
tra added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

Yeah, it seems the other approach is more appropriate: place the alloca in the frontend and explicitly copy from the parameter space to the private space.

Nov 30 2020, 10:43 AM · Restricted Project

Nov 19 2020

tra committed rG9a465057a64d: [CUDA] Unbreak CUDA compilation with -std=c++20 (authored by tra).
[CUDA] Unbreak CUDA compilation with -std=c++20
Nov 19 2020, 10:36 AM
tra closed D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.
Nov 19 2020, 10:36 AM · Restricted Project
tra requested review of D91807: [CUDA] Unbreak CUDA compilation with -std=c++20.
Nov 19 2020, 10:15 AM · Restricted Project

Nov 17 2020

tra added a reviewer for D91590: [NVPTX] Efficiently support dynamic index on CUDA kernel aggregate parameters.: jlebar.

This is an experimental, demo-only patch, done in my spare time, on eliminating private memory usage in https://godbolt.org/z/EPPn6h. The attachment includes both the reference and new IR, PTX, and SASS (sm_60) output. For the new code, that aggregate argument is loaded through an LDC instruction in SASS instead of MOV due to the non-static address. I don't have sm_60 to verify that. Could you try it on real hardware?

Nov 17 2020, 10:27 AM · Restricted Project, Restricted Project

Nov 16 2020

tra accepted D91546: [AMDGPU] Add option -munsafe-fp-atomics.

Patch description could use a pointer to more details about the unsafe atomics.

Nov 16 2020, 9:35 AM · Restricted Project
tra added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 16 2020, 9:28 AM · Restricted Project, Restricted Project

Nov 13 2020

tra accepted D91421: Fix temporary file name on Windows.

LGTM overall, nit about the character choice.

Nov 13 2020, 10:33 AM · Restricted Project

Nov 12 2020

tra added a comment to D91034: [Frontend] Treat CUDA header files as CUDA source files.

@tra I have checked that clang-format does not call this function. Do you think this is still needed?

Nov 12 2020, 11:37 AM · Restricted Project, Restricted Project
tra accepted D91034: [Frontend] Treat CUDA header files as CUDA source files.

OK. For clang-format it would still be somewhat useful and it's not making things worse than they already are otherwise.

Nov 12 2020, 9:37 AM · Restricted Project, Restricted Project

Nov 11 2020

tra added a comment to D91034: [Frontend] Treat CUDA header files as CUDA source files.

In general, I'm still not quite convinced that adding CUDA detection to header extensions is all that useful.

Nov 11 2020, 1:00 PM · Restricted Project, Restricted Project
tra added inline comments to D91034: [Frontend] Treat CUDA header files as CUDA source files.
Nov 11 2020, 12:45 PM · Restricted Project, Restricted Project
tra added a comment to D91281: [CUDA][HIP] Diagnose reference of host variable.

LGTM.

Nov 11 2020, 10:25 AM · Restricted Project
tra added a comment to D90174: [HIP] Fix regressions due to fp contract change.

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no expression being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

Nov 11 2020, 10:24 AM · Restricted Project
tra accepted D91281: [CUDA][HIP] Diagnose reference of host variable.

LGTM.

Nov 11 2020, 10:12 AM · Restricted Project
tra added a comment to D91088: [CUDA][HIP] Fix capturing reference to host variable.

Added diagnostics for referencing host variables in device functions.

Nov 11 2020, 9:50 AM · Restricted Project