This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Allow parsing of host and device code simultaneously.
Closed (Public)

Authored by tra on Sep 16 2015, 5:22 PM.

Details

Summary

The patch:

  • adds an -aux-triple option to specify the aux target triple.
  • propagates aux target info to the AST context and Preprocessor.
  • pulls in target-specific preprocessor macros from the aux target.
  • pulls in target-specific builtins from the aux target.
  • sets the appropriate host or device attribute on builtins.
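
The last two items can be pictured with a small model of how a target-specific builtin might be assigned a side. This is a sketch under an assumed rule: builtins of the target being compiled for belong to that side, aux-target builtins belong to the opposite side, and target-independent builtins are usable on both. The enums and the builtin_side function are hypothetical and are not clang's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of CUDA execution sides; not clang's API. */
enum cuda_side { SIDE_HOST, SIDE_DEVICE, SIDE_HOST_DEVICE };

/* Hypothetical classification of where a builtin comes from. */
enum builtin_origin {
  ORIGIN_GENERIC,     /* target-independent builtin */
  ORIGIN_MAIN_TARGET, /* builtin of the -triple target */
  ORIGIN_AUX_TARGET   /* builtin pulled in from the -aux-triple target */
};

/* Assumed rule: generic builtins are host-device, main-target builtins
   belong to the side being compiled, aux-target builtins to the other. */
enum cuda_side builtin_side(bool compiling_for_device,
                            enum builtin_origin origin) {
  if (origin == ORIGIN_GENERIC)
    return SIDE_HOST_DEVICE;
  assert(origin == ORIGIN_MAIN_TARGET || origin == ORIGIN_AUX_TARGET);
  bool is_aux = (origin == ORIGIN_AUX_TARGET);
  /* Device side exactly when "compiling for device" and "aux builtin"
     disagree: host compilation + aux builtin, or device compilation +
     main-target builtin. */
  return (compiling_for_device != is_aux) ? SIDE_DEVICE : SIDE_HOST;
}
```

Under this rule, during host compilation an aux (NVPTX) builtin maps to SIDE_DEVICE, and during device compilation an aux (x86) builtin maps to SIDE_HOST.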

Rationale:

In order to compile CUDA source with mixed host/device code without physically separating host and device code, we need to be able to parse code that may contain target-specific constructs from both host and device targets.

During device-side compilation we need to be able to include any host includes we may encounter. Similarly, during host compilation we need to include the device-side includes that the device-specific code in the file may require in order to parse. In both cases, we need to fake the target environment well enough for the headers to work (i.e., x86 host headers want to see __amd64__ or __i386__ defined, and CUDA includes look for NVPTX-specific macros).
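
A toy model of layering the aux target's predefined macros on top of the main target's, so that headers from either side find the macros they expect. It assumes a simple merge policy (identical redefinitions are dropped, new names are added, and a same-name/different-value pair is counted as a conflict while the main target's value is kept). The struct and function names are hypothetical; this is not how clang's preprocessor stores or merges predefines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predefined-macro entry: name and replacement text. */
struct macro {
  const char *name;
  const char *value;
};

/* Looks up NAME in TABLE; returns NULL when absent. */
static const struct macro *find_macro(const struct macro *table, size_t n,
                                      const char *name) {
  for (size_t i = 0; i < n; ++i)
    if (strcmp(table[i].name, name) == 0)
      return &table[i];
  return NULL;
}

/* Toy merge: copies the main target's macros, then adds the aux target's.
   Returns the number of conflicts and stores the merged size in *out_n.
   OUT must have room for n_main + n_aux entries. */
size_t merge_macros(const struct macro *main_t, size_t n_main,
                    const struct macro *aux_t, size_t n_aux,
                    struct macro *out, size_t *out_n) {
  size_t n = 0, conflicts = 0;
  for (size_t i = 0; i < n_main; ++i)
    out[n++] = main_t[i];
  for (size_t i = 0; i < n_aux; ++i) {
    const struct macro *existing = find_macro(out, n, aux_t[i].name);
    if (existing == NULL)
      out[n++] = aux_t[i];   /* new name, e.g. an NVPTX-only macro */
    else if (strcmp(existing->value, aux_t[i].value) != 0)
      ++conflicts;           /* same name, different value: keep main's */
    /* otherwise: identical value, nothing to add */
  }
  assert(n <= n_main + n_aux);
  *out_n = n;
  return conflicts;
}
```

If both targets define a type-related macro such as __SIZEOF_POINTER__ with the same value, the merge reports no conflict; only genuinely different values would need attention.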

We also need to be able to parse target-specific builtins from both host and device targets in the same TU.

Generally speaking, it's not possible to achieve this in all cases. Fortunately, CUDA's case is simpler, and the proposed patch works pretty well in practice:

  • clang already implements attribute-based function overloading, which avoids name clashes between host and device functions.
  • basic host and device types are expected to match, so many type-related predefined macros from the host and device targets have the same value.
  • host includes (x86 on Linux) do not use predefined NVPTX-specific macros, so including them is not a problem.
  • builtins from the aux target are only used for parsing and AST construction; CUDA never generates IR for them. If a builtin is used from the wrong context, the call violates the host/device calling restrictions and produces an error.
  • this change includes *all* builtins from the aux target, which gives us a superset of the builtins that would be available on the opposite side of the compilation. That, in turn, allows host and device sides to build different ASTs. IMO it's similar to the already-existing issue of diverging host/device code caused by common (ab)use of the __CUDA_ARCH__ macro.
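
To make the comparison in the last bullet concrete, here is the kind of __CUDA_ARCH__-dependent code that already makes host and device ASTs diverge. It is plain C for illustration; compiled_side, struct record and side_name are hypothetical names.

```c
#include <assert.h>

/* Returns 1 for a device-side compilation and 0 for a host-side one.
   In CUDA, __CUDA_ARCH__ is defined only during device-side compilation,
   so the two sides see different function bodies (different ASTs). */
int compiled_side(void) {
#ifdef __CUDA_ARCH__
  return 1; /* present only in the device-side AST */
#else
  return 0; /* present only in the host-side AST */
#endif
}

/* A record whose layout also differs between the sides: the extra field
   exists only in the device-side AST, a classic host/device mismatch. */
struct record {
  int id;
#ifdef __CUDA_ARCH__
  int device_only_field;
#endif
};

/* Maps a value returned by compiled_side() to a readable name. */
const char *side_name(int side) {
  assert(side == 0 || side == 1);
  return side ? "device" : "host";
}
```

Built with an ordinary host C compiler, compiled_side() returns 0 and struct record holds only id; a device-side CUDA compilation of the same source would take the other branch and see the extra field.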


Event Timeline

tra updated this revision to Diff 34940 (Sep 16 2015, 5:22 PM).
tra retitled this revision to [CUDA] Allow parsing of host and device code simultaneously.
tra updated this object.
tra added reviewers: echristo, eliben, jpienaar.
tra added a subscriber: cfe-commits.
tra updated this object (Sep 16 2015, 5:23 PM).
jpienaar edited edge metadata (Sep 17 2015, 7:31 AM).

Nice, so this will allow parsing/AST construction with builtins from 2 architectures but will fail to compile if a builtin for the host/device is called from device/host.

You mention this is not generally possible. Can you give some examples?

include/clang/Driver/CC1Options.td, line 329

You use "aux target" in all the errors shown to the user, so perhaps for consistency use "Triple for aux target". It could be more self-documenting too ("Triple for aux target used during CUDA compilation."?), as I don't know if many people would be able to guess what the auxiliary triple is or where it is used.

include/clang/Frontend/CompilerInstance.h, line 354

Nit: period at the end for uniformity.

tra updated this revision to Diff 35015 (Sep 17 2015, 10:29 AM).
tra edited edge metadata.
tra marked an inline comment as done.

Cosmetic fixes.

tra marked an inline comment as done (Sep 17 2015, 10:44 AM).

> Nice, so this will allow parsing/AST construction with builtins from 2 architectures but will fail to compile if a builtin for the host/device is called from device/host.

If CUDA target call checks are in effect, we'll get the usual error that an H->D or D->H call is illegal. If we're compiling with -fcuda-disable-target-call-checks, then we'll fail during the codegen phase, complaining that the builtin has not been implemented.
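
A minimal model of the two outcomes described above, assuming a caller that is purely host or purely device (host-device callers follow more involved rules and are deliberately left out). All names are hypothetical; only the behaviour with and without -fcuda-disable-target-call-checks comes from the comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the side a function or builtin belongs to. */
enum side { SIDE_HOST, SIDE_DEVICE };

/* Hypothetical outcomes of a call to a target-specific builtin. */
enum call_outcome {
  CALL_OK,           /* same side: accepted and code-generated */
  CALL_SEMA_ERROR,   /* rejected by the CUDA target call checks */
  CALL_CODEGEN_ERROR /* accepted by Sema, fails in codegen: no IR for it */
};

/* Decides what happens when a __host__ or __device__ caller uses a
   builtin that belongs to BUILTIN. */
enum call_outcome check_builtin_call(enum side caller, enum side builtin,
                                     bool target_call_checks) {
  assert(caller == SIDE_HOST || caller == SIDE_DEVICE);
  assert(builtin == SIDE_HOST || builtin == SIDE_DEVICE);
  if (caller == builtin)
    return CALL_OK;
  /* H->D or D->H: with checks on, Sema reports an illegal call; with the
     checks disabled the call survives until codegen, which has no
     implementation for the other side's builtin. */
  return target_call_checks ? CALL_SEMA_ERROR : CALL_CODEGEN_ERROR;
}
```

For example, a host function calling an NVPTX builtin yields CALL_SEMA_ERROR with the checks on and CALL_CODEGEN_ERROR with them disabled.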

> You mention this is not generally possible. Can you give some examples?

It's mostly related to mixing include files and preprocessor macros from multiple targets.
If an include is written to conditionally provide mutually exclusive constructs for targets A and B, then compiling for A with -aux-triple B will produce something uncompilable. For instance, the following include will fail due to a redefinition of foo() if compiled with -triple x86_64 -aux-triple nvptx64:

#ifdef __amd64__
int foo() {return 1;}
#endif
#ifdef __PTX__
int foo() {return 2;}
#endif

include/clang/Driver/CC1Options.td, line 329

Changed to "Auxiliary target triple." While CUDA uses the aux target triple for preprocessor macros and builtins, nothing stops anyone else from using this information for other purposes, so I kept the name and the comment generic.

echristo edited edge metadata (Sep 17 2015, 12:59 PM).

One inline request and one inline comment, otherwise looks pretty good!

Thanks :)

-eric

include/clang/Frontend/CompilerInstance.h, line 354

Go ahead and just commit this separately please. :)

lib/Frontend/CompilerInstance.cpp, lines 82–87

Seems like an icky place to do this, perhaps where we create the Target?

tra updated this revision to Diff 35031 (Sep 17 2015, 1:37 PM).
tra edited edge metadata.
tra marked an inline comment as done.

Updated to address Eric's comments.

tra marked 2 inline comments as done (Sep 17 2015, 1:40 PM).
tra added inline comments.
lib/Frontend/CompilerInstance.cpp, lines 82–87

Added a separate setter method for AuxTarget and moved the TargetInfo creation to CompilerInstance::ExecuteAction, as you suggested.

echristo accepted this revision (Sep 17 2015, 1:41 PM).
echristo edited edge metadata.

Works for me. Thanks!

-eric

This revision is now accepted and ready to land (Sep 17 2015, 1:41 PM).
jpienaar accepted this revision (Sep 17 2015, 2:08 PM).
jpienaar edited edge metadata.
This revision was automatically updated to reflect the committed changes.
tra marked an inline comment as done.