This is an archive of the discontinued LLVM Phabricator instance.

Let CUDA toolchain support amdgpu target
Abandoned · Public

Authored by gregrodgers on Feb 1 2018, 8:20 AM.

Details

Summary

Currently the CUDA toolchain only supports nvptx.

This patch lets the CUDA toolchain support the amdgpu target. It can also serve as an example for supporting CUDA on other targets.

Patch by Greg Rodgers.
Lit test added by Yaxun Liu.

Diff Detail

Event Timeline

yaxunl created this revision. Feb 1 2018, 8:20 AM

Only commenting on parts that I'm a bit familiar with. In general, does it make sense to split this patch? Are there different "stages" of support, like 1) being able to compile an empty file, 2) generating optimized code, 3) allowing the use of math functions?

lib/Driver/ToolChains/Cuda.cpp
403–415

This never gets cleaned up!

531–534

That is already done in TC.getInputFilename(Output) (since rC318763), the same function call that you are removing here...

639–640

clang-fixup-fatbin is not upstreamed and won't work. Sounds like a horrible name btw...

788–793

You should use GpuArch, which comes from DriverArgs.getLastArgValue: the last -march overrides previous arguments.
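
A minimal sketch of the suggested pattern, assuming the usual clang driver context (DriverArgs is the llvm::opt::ArgList handed to the toolchain; the default value here is only illustrative):

    // Take the value of the last -march on the command line so that later
    // flags override earlier ones; fall back to an (illustrative) default.
    StringRef GpuArch =
        DriverArgs.getLastArgValue(options::OPT_march_EQ, /*Default=*/"gfx803");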

yaxunl added a comment. Feb 1 2018, 8:56 AM

Only commenting on parts that I'm a bit familiar with. In general, does it make sense to split this patch? Are there different "stages" of support, like 1) being able to compile an empty file, 2) generating optimized code, 3) allowing the use of math functions?

Good suggestion. This patch is mainly about letting the toolchain recognise the amdgpu implementation of CUDA and create the proper compilation stages. I can try to create a test for compiling an empty file.

tra added a reviewer: tra. Feb 1 2018, 9:44 AM
arsenm added inline comments. Feb 1 2018, 9:53 AM
lib/Basic/Targets/AMDGPU.cpp
437

Use llvm_unreachable here.
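
That is, mark the impossible case explicitly with LLVM's standard facility (a one-line sketch; the message text is illustrative):

    #include "llvm/Support/ErrorHandling.h"

    // Aborts with the message when reached, and documents that this
    // switch case is impossible by construction.
    llvm_unreachable("Unhandled GPU kind in AMDGPU target code");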

lib/Driver/ToolChains/Cuda.cpp
359–361

Why is this done under an NVPTX:: class?

390

Why is this hardcoded?

tra added a comment. Feb 1 2018, 10:15 AM

I don't have enough knowledge about compute on AMD's GPUs and would appreciate it if you could share your thoughts on how you think CUDA on AMD should work. Is there a good document describing how compute currently works on AMD GPUs (e.g. how do I launch a kernel using a rough equivalent of NVidia's driver API)?

  • Headers. clang pre-includes *a lot* of headers from NVidia's CUDA SDK. Some of them may work for AMD, but some certainly will not -- there are a lot of headers with nvidia-specific inline assembly or things that rely on nvidia-specific functionality. In the end, I think, we'll need some sort of CUDA SDK for AMD which would implement (possibly with asserts for unsupported functions) existing CUDA APIs. Or, perhaps the plan is to just use CUDA syntax only without providing complete API compatibility with nvidia.
  • How will GPU-side object file be incorporated into the final executable? I believe OpenMP has a fairly generic way to deal with it in clang. I'm not sure if that would be suitable for use with AMD's runtime (whatever we need to use to launch the kernels).
  • Launching kernels. Will it be similar to the way kernel launches are configured on NVidia? I.e. a grid of blocks of threads with per-block shared memory (see the sketch after this list).
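
For concreteness, the launch model the last bullet refers to is the CUDA driver-API equivalent of kernel<<<grid, block, sharedMem>>>(...). A minimal sketch against the real driver API (error handling omitted; the grid/block sizes are arbitrary):

    #include <cuda.h>

    // Launch CUfunction `f` over a 1-D grid of blocks of threads with
    // per-block shared memory, i.e. the model the question asks AMD to map onto.
    void launch(CUfunction f, CUstream stream, void **kernelArgs) {
      cuLaunchKernel(f,
                     /*gridDimX=*/128, /*gridDimY=*/1, /*gridDimZ=*/1,
                     /*blockDimX=*/256, /*blockDimY=*/1, /*blockDimZ=*/1,
                     /*sharedMemBytes=*/4096,
                     stream, kernelArgs, /*extra=*/nullptr);
    }
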
gregrodgers requested changes to this revision. Feb 1 2018, 6:40 PM

Thanks to everyone for the reviews. I hope I replied to all inline comments. Since I sent this to Sam to post, we discovered a major shortcoming. As tra points out, there are a lot of CUDA headers in the CUDA SDK that get processed. We are able to override asm() expansions with #undef and redefine them as an equivalent amdgpu component, so the compiler never sees the asm(). I am sure we will need to add more redefines as we broaden our testing. But that is not the big problem.

We would like to be able to run cudaclang for AMD GPUs without an install of CUDA. Of course you must always install CUDA if any of your targeted GPUs are NVidia GPUs. To run cudaclang without CUDA when only non-NVidia GPUs are specified, we need an open set of headers and we must replace the fatbin tools used in the toolchain. The latter can be addressed by using the libomptarget methods for embedding multiple target GPU objects. The former is going to take a lot of work. I am going to send an updated patch that has the stubs for the open headers noted in __clang_cuda_runtime_wrapper.h. They will be included with the CC1 flag -D__USE_OPEN_HEADERS__. This will be generated by the cuda driver when it finds no CUDA installation and all target GPUs are non-NVidia.
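
To illustrate the #undef-and-redefine technique described above (a hypothetical example: __some_asm_helper is a stand-in name, not an actual CUDA SDK macro):

    // Hypothetical: __some_asm_helper is a CUDA SDK macro whose expansion
    // contains NVPTX inline asm().
    #undef __some_asm_helper
    // Redefine it in terms of a portable builtin that the AMDGPU backend
    // can lower, so the compiler never sees the NVPTX asm().
    #define __some_asm_helper(x) __builtin_ffs(x)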

This revision now requires changes to proceed. Feb 1 2018, 6:40 PM

Sorry, all my great inline comments got lost somehow. I am a newbie to Phabricator. I will try to reconstruct my comments.

t-tye added inline comments. Feb 5 2018, 10:35 AM
include/clang/Basic/Cuda.h
49–57

Should the complete list of processors for the amdgcn architecture be included? See https://llvm.org/docs/AMDGPUUsage.html#processors.
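
For context, a sketch of the shape such an extension of clang's CudaArch enumeration might take (the gfx entries shown are an illustrative subset; see the AMDGPUUsage processor list for the full set):

    // include/clang/Basic/Cuda.h (sketch, not the literal patch)
    enum class CudaArch {
      UNKNOWN,
      SM_20,   // ... existing NVPTX entries elided ...
      SM_70,
      GFX700,  // amdgcn processors, per
      GFX803,  // https://llvm.org/docs/AMDGPUUsage.html#processors
      GFX900,
      LAST,
    };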

79

Suggest using amdgcn, which matches the architecture name in https://llvm.org/docs/AMDGPUUsage.html#processors.

lib/Basic/Targets/AMDGPU.h
85

We still want to use the amdhsa OS for amdgpu, which currently supports the different environments. So can cuda simply support the same environments? Is the plan to eliminate the environments and simply always use the default address space for generic, so that this code is no longer needed?

yaxunl added inline comments. Feb 5 2018, 10:40 AM
lib/Basic/Targets/AMDGPU.h
85

Currently we already use amdgiz by default. This is no longer needed.

bader added a subscriber: bader. Feb 5 2018, 10:40 AM

Here are my replies to the inline comments. Everything should be fixed in the next revision.

include/clang/Basic/Cuda.h
49–57

Yes, I will add them in the update.

79

Done in next update

lib/Basic/Targets/AMDGPU.cpp
437

Fixed in next update

lib/Basic/Targets/AMDGPU.h
85

Removed in next update

lib/Driver/ToolChains/Cuda.cpp
359–361

Because we are not creating a new toolchain for AMDGCN. We modify the logic in the tool constructor as needed for AMDGCN, keeping the ability to provide a set of mixed targets.
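
A sketch of what that branching looks like in practice (assuming the standard ToolChain triple query; this is not the literal patch):

    // Inside the shared CUDA tool (sketch): branch on the effective triple
    // instead of introducing a separate AMDGCN toolchain class.
    bool IsAMDGCN = TC.getTriple().getArch() == llvm::Triple::amdgcn;
    if (IsAMDGCN) {
      // Select AMDGCN device-side tools (e.g. lld) here.
    } else {
      // Existing NVPTX path: ptxas and fatbinary.
    }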

403–415

OK, deleted in the next revision.

531–534

Fixed in next update

639–640

Major cleanup here in the next update. The name will make more sense once you see the update and the comments in it.

788–793

Nice catch. I will fix this in the next update.

jprice added a subscriber: jprice. Feb 12 2018, 10:25 AM
yaxunl updated this revision to Diff 134107. Feb 13 2018, 1:30 PM

Update with Greg's change.

Let's start with the obvious points:

  1. The latest patch clearly wasn't run through clang-format.
  2. It only has some 90 lines of context, which makes it look like you deleted quite a lot of code when browsing through the history.
  3. This patch is still large; did you at least consider splitting it as I proposed some weeks ago?

Additionally, I don't see responses to all the points that @tra raised. The answers will probably explain the approach you chose, so I think they should also be added to the summary.

lib/Basic/Cuda.cpp
254–266

This means you can't compile when the Clang driver detects an installation of CUDA 9?

lib/Basic/Targets/AMDGPU.h
319

How can LangAS::opencl_local ever be used in CUDA? I think this check is redundant.

lib/Driver/Driver.cpp
3262–3263

I'm not sure I understand this change to the generic driver code: How can LLVM IR / BC ever need a preprocessor?

3954–3957

If this is really needed, it deserves some justification.

lib/Driver/ToolChains/Cuda.cpp
293

I don't see this file in the upstream repository?

359–361

That sounds more like a hack; the CUDA classes should be separated from NVPTX and AMDGCN.

413

I don't think you should invent new environment variables; Clang normally uses the -X family of options (e.g. -Xcuda-ptxas, -Xcuda-fatbinary) to pass arguments to specific parts of the toolchain.

639–640

I wasn't only criticizing the name but also the fact that this code won't work with only upstream components!

mkuron added a subscriber: mkuron. Mar 31 2018, 2:29 AM
arsenm resigned from this revision. Feb 21 2019, 6:58 PM
gregrodgers commandeered this revision. May 14 2020, 4:10 PM
gregrodgers edited reviewers, added: yaxunl; removed: gregrodgers.
gregrodgers abandoned this revision. May 14 2020, 4:10 PM