- adds -aux-triple option to specify the triple of the auxiliary (aux) target.
- propagates aux target info to AST context and Preprocessor.
- pulls in target-specific preprocessor macros.
- pulls in target-specific builtins from aux target.
- sets appropriate host or device attribute on builtins.
In order to compile CUDA source with mixed host/device code without physically separating host and device code, we need to be able to parse code that may contain target-specific constructs from both host and device targets.
During device-side compilation we need to be able to include any host headers we may encounter. Similarly, during host compilation we need to include the device-side headers that device-specific code in the file requires in order to parse. In both cases we need to fake the target environment well enough for the headers to work (e.g. x86 host headers want to see __amd64__ or __i386__ defined, while CUDA includes look for NVPTX-specific macros).
We also need to be able to parse target-specific builtins from both host and device targets in the same TU.
Generally speaking, it's not possible to achieve this in all cases. Fortunately, CUDA's case is simpler, and the proposed patch works pretty well in practice:
- clang already implements attribute-based function overloading which allows avoiding name clashes between host and device functions.
- basic host and device types are expected to match, so a lot of type-related predefined macros from host and device targets have the same value.
- host includes (x86 on Linux) do not use predefined NVPTX-specific macros, so including them is not a problem.
- builtins from the aux target are only used for parsing and AST construction. CUDA never generates IR for them. If a builtin is used from the wrong context, it will violate the calling restrictions and produce an error.
- this change includes *all* builtins from the aux target, which gives us a superset of the builtins that would actually be available on the opposite side of the compilation. As a result, the ASTs built on the host and device sides may differ. IMO this is similar to the already-existing issue of diverging host/device code caused by the common (ab)use of the __CUDA_ARCH__ macro.
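
For illustration only, here is a minimal sketch of the existing kind of host/device divergence mentioned in the last point. It is not part of the patch. The function name compiled_for_device is hypothetical, and the fallback definitions of __host__ and __device__ are an assumption added so the snippet also builds as plain C++ when no CUDA headers are present.

```cpp
// Fallback so the sketch also compiles as plain C++ (assumption for
// illustration; in a real CUDA compilation the CUDA headers define these).
#if !defined(__CUDACC__) && !defined(__CUDA__)
#define __host__
#define __device__
#endif

// One source-level function whose body differs between the two
// compilation passes: __CUDA_ARCH__ is only defined while compiling
// for the device, so host and device sides see different code.
__host__ __device__ inline bool compiled_for_device() {
#if defined(__CUDA_ARCH__)
  return true;   // device-side pass
#else
  return false;  // host-side pass (or plain C++)
#endif
}
```

Built as plain C++ or in the host pass, compiled_for_device() takes the second branch. In the device pass it takes the first, so the two sides parse different function bodies from the same source text.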