This is an archive of the discontinued LLVM Phabricator instance.

Add HIP toolchain
ClosedPublic

Authored by yaxunl on Apr 3 2018, 9:27 AM.

Details

Summary

This patch adds HIP toolchain to support HIP language mode. It includes:

Create specific compiler jobs for HIP.

Choose specific libraries for HIP.

With contribution from Greg Rodgers.

Diff Detail

Event Timeline

yaxunl created this revision.Apr 3 2018, 9:27 AM
yaxunl edited the summary of this revision. (Show Details)Apr 3 2018, 9:31 AM
yaxunl updated this revision to Diff 140827.Apr 3 2018, 10:51 AM
yaxunl retitled this revision from [WIP][HIP] Let CUDA toolchain support HIP language mode to [HIP] Let CUDA toolchain support HIP language mode.
yaxunl edited the summary of this revision. (Show Details)

Fixed typo which causes lit test failures. Now all lit tests pass.

The other changes I see here seem reasonable, but please do split the patch.

include/clang/Basic/Cuda.h
61

Does this actually have anything to do with HIP? You have a lot of changes in this patch which seem to just be about supporting more GPU revisions.

include/clang/Driver/Options.td
556

Do we absolutely need the non-CUDA-related aliases here? We generally try to be good about namespacing extension-specific language options.

I understand that you're probably trying to maintain command-line compatibility with some existing toolchain, but if it's possible to avoid this, I would be much happier.

include/clang/Driver/ToolChain.h
124

"Backend" is a really generic name for this thing that is probably hyper-specific to the CUDA translation model.

yaxunl retitled this revision from [HIP] Let CUDA toolchain support HIP language mode to [HIP] Let CUDA toolchain support HIP language mode and amdgpu.Apr 4 2018, 7:29 AM
yaxunl marked an inline comment as done.Apr 4 2018, 7:50 AM
yaxunl added inline comments.
include/clang/Basic/Cuda.h
61

This patch not only adds support of HIP language mode but also adds support of amdgpu to CUDA toolchain.

Currently HIP extension is only supported by amdgpu although in the future it may be supported by other targets.

include/clang/Driver/Options.td
556

There were discussions about a uniform clang option for offloading sub-arcs

http://lists.llvm.org/pipermail/llvm-dev/2017-February/109896.html

the consensus seem to be -offload-arch or -offload-archs.

This patch attempts to make that transition to use this new option.

We can separate this change to a different patch though.

include/clang/Driver/ToolChain.h
124

Agree. This tool actually links bunch of bitcode libraries (so called device libraries). For non-GPU targets, this is usually unnecessary since they support ISA-level linking. However most GPU targets do not support that, therefore they need this stage.

How about renaming it as BitcodeLink?

rjmccall added inline comments.Apr 4 2018, 9:10 AM
include/clang/Basic/Cuda.h
61

I understand that, but I think you can separate those two patches without too much difficulty.

include/clang/Driver/Options.td
556

I don't mind the -offload-* options; I'm more concerned about --host-only and so on.

include/clang/Driver/ToolChain.h
124

DeviceLibraryLink, maybe? I wouldn't want someone to think this was related to ordinary LTO.

Hahnfeld added inline comments.
include/clang/Basic/Cuda.h
61

There already is D42800 with lots of unanswered comments, so those need to be addressed first.

yaxunl marked 4 inline comments as done.Apr 4 2018, 10:26 AM
yaxunl added inline comments.
include/clang/Basic/Cuda.h
61

Will split the changes for supporting amdgcn to another review.

61

Will address those comments in the split review.

include/clang/Driver/Options.td
556

Will drop those options for now. Since HIP is just a language mode of CUDA, it is OK to use CUDA options e.g. 'cuda-device-only' for HIP.

include/clang/Driver/ToolChain.h
124

That sounds good. Will change to that.

yaxunl updated this revision to Diff 141221.Apr 5 2018, 2:18 PM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

Revised by reviewers' comments, including comments from previous review.

This looks okay to me, but I think it would better if someone with more expertise in the design of the driver and frontend code could review this.

lib/Driver/Driver.cpp
2267

The extra parens are weird here.

Can this revision be split further? The summary mentions many things that might make up multiple independent changes...

lib/Driver/ToolChains/Cuda.cpp
264

Will this override the user's value, e.g. -std=c++14?

yaxunl marked 2 inline comments as done.Apr 9 2018, 9:42 AM
yaxunl added inline comments.
lib/Driver/Driver.cpp
2267

will remove it.

lib/Driver/ToolChains/Cuda.cpp
264

No. We will add diagnotics in a separate patch.

yaxunl updated this revision to Diff 141863.Apr 10 2018, 8:48 AM
yaxunl marked 2 inline comments as done.
yaxunl edited the summary of this revision. (Show Details)

Separate file type changes to another patch.

Can this revision be split further? The summary mentions many things that might make up multiple independent changes...

I separated file type changes to https://reviews.llvm.org/D45489 and deferred some other changes for future.

It is kind of difficult to split this patch further since they depend on each other.

Can this revision be split further? The summary mentions many things that might make up multiple independent changes...

I separated file type changes to https://reviews.llvm.org/D45489 and deferred some other changes for future.

It is kind of difficult to split this patch further since they depend on each other.

You can add Related Revisions to make this easy to see.

rjmccall accepted this revision.Apr 13 2018, 12:16 AM

I'd still prefer if someone with more driver-design expertise weighed in, but we might not have any specialists there.

LGTM, although you might consider changing your tests a bit: FileCheck recently added support for a -D argument where you can predefine variables in the command line. But that's just a suggestion, not something I'm asking you to do as part of review.

This revision is now accepted and ready to land.Apr 13 2018, 12:16 AM

I'd still prefer if someone with more driver-design expertise weighed in, but we might not have any specialists there.

I think you should at least give @tra the possibility to take a look. Last time we essentially ended up reverting a patch.

tra requested changes to this revision.Apr 17 2018, 4:30 PM
tra added a subscriber: jlebar.

Sorry about the delay. I've been out for few days.

include/clang/Driver/Options.td
555–556

The discussion you've mentioned appears to be unfinished. @jlebar has raised a valid point regarding -no-offload-arch[s] and his email went unanswered AFAICT.

If you propose to generalize a way to specify offload targets, it should probably be done in a separate patch.

I'd just stick with --cuda-gpu-arch for the time being until we figure out how we're going to handle target specification in general. It works well enough for the moment and the new options are not required for the functionality implemented by this patch.

lib/Driver/Driver.cpp
2333–2339

You really want to either have your own error message or change existing one to something more generic. unsupported use of NVPTX for host compilation. will sound very confusing tof someone who tries to compile for AMD GPU.

3319

Would it make sense for HIP to have its own offloading kind? You seem to be adding similar checks in few other places.

3472–3474

llvm::sys::path::extension ?

3905–3912

I'm not sure why we do this here. Nor does it seem relevant to HIP.

3982–3985

Same here -- I'm not sure why we need this nor how it's related to HIP.

lib/Driver/ToolChains/Cuda.cpp
362

L*L*C_OUTPUT ?

366

Please use routines in`llvm::sys::path` here and in other places where you manipulate paths.

370–373

You could use .append({...})

CmdArgs2.append({"foo","bar","buz"});
446–449

This is rather awkward -- you're looking for /libdevice under all paths specified by -L, but there's no way to explicitly point to the directory with the bitcode library. If device library may be in a non-canonical location, I'd rather there was an explicit option to specify it. Furthermore, my understanding is that you will need to find these bitcode libraries during -c compilation. Using -L to derive bitcode search path during compilation looks wrong to me.

455

This (and other places where you're calculating libdevice path relative to driver dir) should probably be derived from the resource dir. Driver's path may not be the 'root' the compiler has been told to use.

457

LIBRARY_PATH sounds rather generic. Perhaps it should have HIP somewhere in its name.

472

Why do we need to silence the warnings?

508–511

This does not look like the right place to disable particular passes. Shouldn't it be done somewhere in LLVM?

This revision now requires changes to proceed.Apr 17 2018, 4:30 PM
yaxunl marked 21 inline comments as done.Apr 19 2018, 7:25 PM
yaxunl added inline comments.
include/clang/Driver/Options.td
555–556

Will use --cuda-gpu-arch for this patch.

lib/Driver/Driver.cpp
2333–2339

Will generalise this error message.

3319

Yes it makes sense to let HIP have its own offloading kind. Will do.

3472–3474

will do

3905–3912

will remove it

3982–3985

will remove

lib/Driver/ToolChains/Cuda.cpp
362

will change to LLC_OUTPUT

366

will do

370–373

will do

446–449

Will use --hip-device-lib-path= and drop /libdevice.

455

will remove this

457

will use HIP_DEVICE_LIB_PATH

472

will remove this

yaxunl added inline comments.Apr 19 2018, 7:25 PM
lib/Driver/ToolChains/Cuda.cpp
508–511

These are not disabling the passes but adding these passes. They are some optimizations which are usually improving performance for amdgcn.

yaxunl updated this revision to Diff 143218.Apr 19 2018, 7:54 PM
yaxunl marked 14 inline comments as done.

Revised by Artem's comments.

yaxunl updated this revision to Diff 143298.Apr 20 2018, 5:54 AM
yaxunl edited the summary of this revision. (Show Details)

sync to ToT.

yaxunl updated this revision to Diff 143307.Apr 20 2018, 6:27 AM

minor bug fix: do not add CUDA specific link options for HIP.

yaxunl marked an inline comment as done.Apr 20 2018, 8:26 AM
yaxunl added inline comments.
lib/Driver/ToolChains/Cuda.cpp
508–511

Since opt is now able to adjust passes based on -mtriple and -mcpu, will remove these manually added passes and add -mtriple and -mcpu instead.

yaxunl updated this revision to Diff 143320.Apr 20 2018, 8:29 AM
yaxunl marked an inline comment as done.

Remove manually added passes from opt and add -mtriple.

yaxunl marked 14 inline comments as done.Apr 26 2018, 4:08 AM

ping

yaxunl updated this revision to Diff 145297.May 4 2018, 2:19 PM

clean up code and separate action builder to another review.

Hi Artem,

I've addressed your comments. Any further changes are needed? Thanks.

tra added a comment.May 18 2018, 3:17 PM

Hi,

Sorry about the long silence. I'm back to continue the reviews. I'll handle what I can today and will continue with the rest on Tuesday.

It looks like patch description needs to be updated:

Use clang-offload-bindler to create binary for device ISA.

I don't see anything related to offload-bundler in this patch any more.

include/clang/Driver/Options.td
581

I'm not sure about i_Group? This will cause this option to be passed to all preprocessor jobs. It will also be passed to host and device side compilations, while you probably only want/need it on device side only.

lib/Driver/ToolChains/Cuda.cpp
340

FullName is already result of Args.MakeArgString. You only need to do it once.

346

This function is too large to easily see that we're actually constructing sequence of commands.
I'd probably split construction of individual tool's command line into its own function.

353

No need for the leading space in the message.

361–362

for (const InputInfo &it : Inputs) ?

367

All-caps name looks like a macro. Rename to GfxName ?

371–376
for (path : Args.getAllArgValues(...)) {
   LibraryPaths.push_back(Args.MakeArgString(path));
}
392–395

This is somewhat unreadable. Perhaps you could construct the name in a temp variable.

401

You don't need to use c_str() for MakeArgString. It will happily accept std::string.

411

BitcodeOutputFile?

434

I think you can get rid of the temp var here without hurting readability.

437

I wonder if we could derive temp file name from the input's name. This may make it easier to find relevant temp files if/when we need to debug the compilation process.

439

No need for c_str() here.

456

c_str(), again.

825

I'd just add something like this and leave existing if unchanged:

// There's no need for CUDA-specific bitcode linking with HIP.
if( DeviceOffloadingKind == Action::OFK_HIP)
  return;
992–1000

All these should be in the derived toolchain.

lib/Driver/ToolChains/Cuda.h
139–140

Where does amdgcn-link come from? Does it accept --options-file ?

159–160

You may want to derive your own HIPToolChain as you're unlikely to reuse NVPTX-specific tools.

180

In HIPToolchain this one would become an inline function returning true.

tra added a comment.May 18 2018, 3:51 PM

One more thing -- it would be really good to add some tests to make sure your commands are constructed the way you want.

yaxunl marked 19 inline comments as done.May 23 2018, 7:59 AM
In D45212#1105177, @tra wrote:

Hi,

Sorry about the long silence. I'm back to continue the reviews. I'll handle what I can today and will continue with the rest on Tuesday.

It looks like patch description needs to be updated:

Use clang-offload-bindler to create binary for device ISA.

I don't see anything related to offload-bundler in this patch any more.

You are right. Using clang-offload-bundler to create binary for device ISA has been moved to another patch. Will update the description of this patch.

In D45212#1105282, @tra wrote:

One more thing -- it would be really good to add some tests to make sure your commands are constructed the way you want.

will do

include/clang/Driver/Options.td
581

will change to Link_Group

lib/Driver/ToolChains/Cuda.cpp
340

will fix

346

will do

353

will fix.

361–362

will fix

367

will fix

371–376

will fix

392–395

will do

401

will fix

411

will change

434

will do

437

will do

439

will do

456

will fix

825

will change

992–1000

will do

lib/Driver/ToolChains/Cuda.h
139–140

amdgcn-link is the short name of the amdgcn bitcode linker. It is not a real program and does not support response file. Will remove the arguments about response file.

159–160

Will do

180

will do

yaxunl updated this revision to Diff 148216.May 23 2018, 8:05 AM
yaxunl marked 19 inline comments as done.
yaxunl retitled this revision from [HIP] Let CUDA toolchain support HIP language mode and amdgpu to Add HIP toolchain.
yaxunl edited the summary of this revision. (Show Details)

Revised by Artem's comments.

tra added inline comments.May 23 2018, 12:13 PM
lib/Driver/ToolChains/HIP.cpp
29–47 ↗(On Diff #148216)

FullName may remain uninitialized if LibraryPaths are empty which will probably crash compiler when you attempt to pass it to MakeArgString.
If empty LibraryPaths is not expected there should be an assert.

If the library is not found, we issue an error, but we still proceed to append the FullName to the CmdArgs. I don't think we should do that. FullName will be either NULL or pointing to the last directory in the LibraryPaths.

You seem to be relying on diagnostics to deal with errors and are not using return value of the function. You may as well make it void.

I'd move CmdArgs.push_back(...) under if(::exists(FullName)) and change break to return;
Then you can get rid of FoundLibDevice and just issue the error if we ever reach the end of the function.

79–81 ↗(On Diff #148216)

No need for intermediate values here -- just '+' all parts together.

133 ↗(On Diff #148216)

Nit: I think explicit llvm::Twine is unnecessary here.

155–160 ↗(On Diff #148216)

Nit: THis could be collapsed into ArgStringList LlcArgs({...});

179–181 ↗(On Diff #148216)

Same here: ArgStringList LldArgs({"-flavor", "gnu", "--no-undefined", "-shared", "-o"});

212–215 ↗(On Diff #148216)

Right now the code is structured as if you're appending to the same TempFile string which is not the case here. I'd give intermediate variables their own names -- OptCommand,LlcCommand.
This would make it easier to see that you are chaining separate commands, each producing its own temp output file.

yaxunl marked 6 inline comments as done.May 23 2018, 1:31 PM
yaxunl added inline comments.
lib/Driver/ToolChains/HIP.cpp
29–47 ↗(On Diff #148216)

Will CmdArgs.push_back(...) under if(::exists(FullName)) and change break to return; and change return type to void.

79–81 ↗(On Diff #148216)

will do

133 ↗(On Diff #148216)

will remove

155–160 ↗(On Diff #148216)

will do

179–181 ↗(On Diff #148216)

will do

212–215 ↗(On Diff #148216)

will do

yaxunl updated this revision to Diff 148277.May 23 2018, 1:34 PM
yaxunl marked 6 inline comments as done.

Revised by Artem's comments.

tra accepted this revision.May 23 2018, 1:42 PM

One small nit. LGTM otherwise.

lib/Driver/ToolChains/HIP.cpp
44 ↗(On Diff #148277)

You don't need FoundLibDevice any more as you will always return from inside the loop if it is ever true.

This revision is now accepted and ready to land.May 23 2018, 1:42 PM
yaxunl marked an inline comment as done.May 23 2018, 1:47 PM
yaxunl added inline comments.
lib/Driver/ToolChains/HIP.cpp
44 ↗(On Diff #148277)

will remove when committing. Thanks!

This revision was automatically updated to reflect the committed changes.
yaxunl marked an inline comment as done.