This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Fix size_t for MSVC environment
ClosedPublic

Authored by yaxunl on Jan 4 2019, 6:50 AM.

Details

Summary

In 64 bit MSVC environment size_t is defined as unsigned long long. Fix AMDGPU target info to match it in MSVC environment.

Diff Detail

Repository
rC Clang

Event Timeline

yaxunl created this revision.Jan 4 2019, 6:50 AM

What's the general idea here, that you're going to pretend to be the environment's "standard" CPU target of the right pointer width and try to match the ABI exactly? This seems like a pretty treacherous road to go down.

What's the general idea here, that you're going to pretend to be the environment's "standard" CPU target of the right pointer width and try to match the ABI exactly? This seems like a pretty treacherous road to go down.

The pointer width does not change. In both case it is 64 bit. The only difference is that MSVC uses unsigned long long as size_t whereas by default AMDGPU uses unsigned long as size_t. They have the same size but in AST they are different type. When HIP is compiled in MSVC environment, it has to use header files of MSVC. This nominal difference in size_t definition causes compilation error since MSVC header files contains typedef of size_t as unsigned long long. Since we cannot change header files of MSVC, we have to change our own size_t definition.

We do not want to change our device ABI.

No, no, I understand that you're not changing pointer sizes, but this is one example of trying to match the ABI of the target environment, and I'm trying to understand how far that goes. What does it mean to be in the "MSVC" environment when you're actually just compiling for the GPU? Why are you using OS headers in the first place? Do you need struct layout to match MSVC (presumably on x86-64)? What should happen with the C++ ABI?

yaxunl added a comment.EditedJan 4 2019, 11:29 AM

No, no, I understand that you're not changing pointer sizes, but this is one example of trying to match the ABI of the target environment, and I'm trying to understand how far that goes. What does it mean to be in the "MSVC" environment when you're actually just compiling for the GPU? Why are you using OS headers in the first place? Do you need struct layout to match MSVC (presumably on x86-64)? What should happen with the C++ ABI?

HIP is single source program. The same source code is compiled for both host and device. Since HIP is an extension to C++, it uses the C++ header files of the system. This is true for both host code and device code. On linux, both uses gcc header files. On windows, when MSVC is installed and default target environment is MSVC, the host compilation will use MSVC header files, so does the device compilation. For device compilation, most of the stuff in MSVC headers do not matter, e.g. function declarations, since they are for host. What matters are mostly type definitions. They should be consistent for both device and host. Since MSVC supports C++11, it should work. As an example, CUDA SDK supports MSVC.

In this patch, the driver checks the host target triple environment, if it is MSVC, it will attach -msvc to the device target triple which will be passed to device compilation. Then device compilation knows that it is using the MSVC header files and will make adjustment to be consistent with it.

Okay. Is there a reasonable way to make your targets delegate to a different TargetInfo implementation for most things so that you can generally match the host target for things like type sizes and alignments?

Okay. Is there a reasonable way to make your targets delegate to a different TargetInfo implementation for most things so that you can generally match the host target for things like type sizes and alignments?

There is TargetInfo for AuxTarget. In this case, the main target is amdgpu and the AuxTarget is x86_64. I am thinking maybe I can add a SizeTTarget pointer to ASTContext, and add a hook shouldDelegateSizeTTypeToAuxTarget to TargetInfo. If it is true, then ASTContext use size_t type in AuxTarget.

If I was only concerned about size_t, your current solution would be fine. My concern is that you really need to match *all* of the associated CPU target's ABI choices, so your target really ought to be forwarding everything to that target by default and only selectively overriding it in order to support GPU-specific features. Probably the easiest way to do that is via inheritance.

yaxunl added a comment.EditedJan 10 2019, 10:29 AM

If I was only concerned about size_t, your current solution would be fine. My concern is that you really need to match *all* of the associated CPU target's ABI choices, so your target really ought to be forwarding everything to that target by default and only selectively overriding it in order to support GPU-specific features. Probably the easiest way to do that is via inheritance.

We only need to match the type size and alignment in device and host compilation, but do not need to match function call ABI. In fact our backend has its own function ABI which is different from host on linux, but it does not preventing us from supporting HIP on linux. This is because the device kernel is launched through HIP runtime, which gets kernel argument size and offset from kernel image, and lays out the arguments for the kernel.

The latest CUDA kernel launching API cuLaunchKernel does similar thing (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15) . Basically the host code only needs to pass an array of pointer to the arguments, whereas "the number of kernel parameters and their offsets and sizes do not need to be specified as that information is retrieved directly from the kernel's image".

If the device backend has to switch to different ABI according to host environment, that will be very painful for the backend.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

How about create a ForwardingTargegInfo which will has a pointer to AuxTarget and forward to that target if it is not null. Then let AMDGPUTargetInfo inherit from that.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

How about create a ForwardingTargegInfo which will has a pointer to AuxTarget and forward to that target if it is not null. Then let AMDGPUTargetInfo inherit from that.

Why forward? You have, like, two supported host environments, right? Can you just a subclass apiece of either MicrosoftX86_64TargetInfo or X86_64TargetInfo?

If that's unreasonable and you do need to forward, having a ForwardingTargetInfo sounds like a good idea, although I think you should require it to have an underlying target, and I think you need it to copy all the fields of that target.

No, I understand that things like the function-call ABI should be different from the associated host ABI, but things like the size of long and the bit-field layout algorithm presumably shouldn't be, and that's the sort of thing that's configured by TargetInfo.

How about create a ForwardingTargegInfo which will has a pointer to AuxTarget and forward to that target if it is not null. Then let AMDGPUTargetInfo inherit from that.

Why forward? You have, like, two supported host environments, right? Can you just a subclass apiece of either MicrosoftX86_64TargetInfo or X86_64TargetInfo?

If that's unreasonable and you do need to forward, having a ForwardingTargetInfo sounds like a good idea, although I think you should require it to have an underlying target, and I think you need it to copy all the fields of that target.

There are lots of child class of X86_64TargetInfo, e.g., CygwinX86_64TargetInfo, MicrosoftX86_64TargetInfo, MinGWX86_64TargetInfo, etc. to inherit each one of them will result in duplicated code. Also, many stuff in these TargetInfo do not apply to AMDGPU target. I think I should only selectively copy the relevant fields.

yaxunl updated this revision to Diff 181326.Jan 11 2019, 11:09 AM

Copy type information from AuxTarget.

It's pretty unfortunate that all these fields have to be individually called out like this. Can you move all these basic layout fields into a separate struct (which can be a secondary base class of TargetInfo) which can then just be normally copied? Anything that needs special copy semantics, like the LLVM DataLayout (do you need to copy this?) doesn't need to go into that struct, just the basic POD things that determine fundamental type layouts and semantics.

yaxunl updated this revision to Diff 182815.Jan 21 2019, 11:49 AM

separate layout controlling flags to a base class for TargetInfo.

It's pretty unfortunate that all these fields have to be individually called out like this. Can you move all these basic layout fields into a separate struct (which can be a secondary base class of TargetInfo) which can then just be normally copied? Anything that needs special copy semantics, like the LLVM DataLayout (do you need to copy this?) doesn't need to go into that struct, just the basic POD things that determine fundamental type layouts and semantics.

LLVM DataLayout contains target specific stuff and cannot be simply copied. So far we did not see necessity to adjust device data layout for host.

rjmccall added inline comments.Jan 23 2019, 10:27 PM
include/clang/Basic/TargetInfo.h
50–54

"Fields controlling how types are laid out in memory; these may need to be copied for targets like AMDGPU that base their ABIs on an auxiliary CPU target."

196

Why is this flag necessary? Can't setAuxTarget just decide whether or not to copy?

Specifically, I would suggest:

  • Make copyAuxTarget be a non-virtual protected method that unconditionally copies the target.
  • Make setAuxTarget a virtual method that does nothing by default.
  • Override setAuxTarget for AMDGPU and call copyAuxTarget.
yaxunl updated this revision to Diff 184238.Jan 29 2019, 7:02 PM

Revised by John's comments.

One minor change and then LGTM.

include/clang/Basic/TargetInfo.h
1352

This can take a const TargetInfo *, which also very clearly documents expectations.

yaxunl updated this revision to Diff 184245.Jan 29 2019, 8:20 PM

Use const argument.

rjmccall accepted this revision.Jan 29 2019, 8:30 PM

Thanks, LGTM.

This revision is now accepted and ready to land.Jan 29 2019, 8:30 PM
This revision was automatically updated to reflect the committed changes.