This is an archive of the discontinued LLVM Phabricator instance.

[hip] Properly populate macros based on host processor.
ClosedPublic

Authored by hliao on Feb 3 2020, 8:22 PM.

Details

Summary
  • The device compilation needs to have a consistent source code compared to the corresponding host compilation. If macros based on the host-specific target processor is not properly populated, the device compilation may fail due to the inconsistent source after the preprocessor. So far, only the host triple is used to build the macros. If a detailed host CPU target or certain features are specified, macros derived from them won't be populated properly, e.g. __SSE3__ won't be added unless +sse3 feature is present. On Windows compilation compatible with MSVC, that missing macros result in that intrinsics are not included and cause device compilation failure on the host-side source.
  • This patch addresses this issue by introducing two cc1 options, i.e., -aux-target-cpu and -aux-target-feature. If a specific host CPU target or certain features are specified, the compiler driver will append them during the construction of the offline compilation actions. Then, the toolchain in cc1 phase will populate macros accordingly.

Event Timeline

hliao created this revision.Feb 3 2020, 8:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2020, 8:22 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tra added a comment.Feb 4 2020, 10:21 AM

On one hand the change makes sense to me and fits well with what we've done so far.
On the other hand, I worry that this is likely to break things.
We sort of have been implicitly relying on not having the macros related to advanced CPU features enabled on device side which typically results in device-side compilation seeing a more portable/simpler version of the code.
Defining the macros that enable more host-side CPU-specific code may trigger interesting compatibility features. Postponed diagnostics combined with ignore (some) errors in the wrong-side-only code should probably deal with most of them, but I suspect we'll see new interesting failure cases. We would not know until we try.

I would be more comfortable if we could have an option to disable this functionality with a command line option, at least temporarily. If things break, the option will avoid turning it into an emergency. If my concern turns out to be false, we can remove the option later.

hliao added a comment.Feb 4 2020, 10:53 AM
In D73942#1857384, @tra wrote:

On one hand the change makes sense to me and fits well with what we've done so far.
On the other hand, I worry that this is likely to break things.
We sort of have been implicitly relying on not having the macros related to advanced CPU features enabled on device side which typically results in device-side compilation seeing a more portable/simpler version of the code.
Defining the macros that enable more host-side CPU-specific code may trigger interesting compatibility features. Postponed diagnostics combined with ignore (some) errors in the wrong-side-only code should probably deal with most of them, but I suspect we'll see new interesting failure cases. We would not know until we try.

I would be more comfortable if we could have an option to disable this functionality with a command line option, at least temporarily. If things break, the option will avoid turning it into an emergency. If my concern turns out to be false, we can remove the option later.

sure, I would try to add option to control that. This patch is developed to fix a real issue found on HIP with MSVC without additional CPU settings, like -msse3. Look into the header lib/Headers/x86intrin.h in clang. The including of intrinsic headers is controlled by the pre-defined macros for MSVC compilation. If that macros are not defined correctly, the host code in device-compilation may trigger issues on missing prototypes or something else.

hliao updated this revision to Diff 242395.Feb 4 2020, 11:55 AM

Follow the reviewer's comment and an internal option
--hip-use-aux-triple-only to fall back the original behavior.

tra accepted this revision.Feb 4 2020, 12:18 PM

Thank you for adding the escape hatch option.

clang/include/clang/Driver/Options.td
552 ↗(On Diff #242395)

I think gpu- prefix would work better as it's common for HIP and CUDA.

This revision is now accepted and ready to land.Feb 4 2020, 12:18 PM
hliao updated this revision to Diff 242400.Feb 4 2020, 12:34 PM

s/--hip-use-aux-triple-only/--gpu-use-aux-triple-only/g

hliao marked an inline comment as done.Feb 4 2020, 12:36 PM
This revision was automatically updated to reflect the committed changes.