This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Add setAuxTarget override rather than make a new TargetInfo
AbandonedPublic

Authored by rprichard on Jun 7 2022, 7:06 PM.

Details

Summary

Previously, NVPTXTargetInfo constructed a TargetInfo using the
TargetOptions::HostTriple field, but did not initialize the target CPU
or CPU features, which resulted in an inaccurate value for
MaxAtomicInlineWidth on x86_32. MaxAtomicInlineWidth is 64 if the host
TargetInfo is initialized with the +cx8 feature (e.g. 586 and up) but
is 32 otherwise.

Instead, add a setAuxTarget override and defer copying from the host
TargetInfo until CompilerInstance::createTarget calls it.

Change setAuxTarget to pass IntrusiveRefCntPtr<TargetInfo> instead of
a const TargetInfo*. NVPTXTargetInfo needs to retain the host
TargetInfo so that it can check calling conventions.

NVPTXTargetInfo was the only user of the TargetOptions::HostTriple
field, so remove the field.

Also see D29542 and D56318.

Diff Detail

Event Timeline

rprichard created this revision.Jun 7 2022, 7:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 7:06 PM
rprichard requested review of this revision.Jun 7 2022, 7:06 PM

This change is needed for D28213, so that the __GCC_ATOMIC_LLONG_LOCK_FREE macro matches for -target i386-unknown-linux-gnu between --cuda-host-only and --cuda-device-only. This is tested in clang/test/Preprocessor/cuda-types.cu.

yaxunl added a comment.Jun 8 2022, 7:58 AM

need a test

I don't expect this change to affect the compiler behavior by itself -- is there a particular test that should be written?

The code that I'm moving into NVPTXTargetInfo::setAuxTarget is already tested via clang/test/Preprocessor/cuda-types.cu.

Aside: There are only two calls to AllocateTarget: one in TargetInfo::CreateTargetInfo and one in NVPTXTargetInfo::NVPTXTargetInfo. This change removes NVPTXTargetInfo's call.

Aside: There are only two calls to AllocateTarget: one in TargetInfo::CreateTargetInfo and one in NVPTXTargetInfo::NVPTXTargetInfo. This change removes NVPTXTargetInfo's call.

There are various places that call clang::TargetInfo::CreateTargetInfo, though, so if a call site had been providing a clang::TargetOptions with a set HostTriple field, but not calling clang::TargetInfo::setAuxTarget, then maybe the NVPTXTargetInfo behavior could change. I'll audit the calls of clang::TargetInfo::CreateTargetInfo.

yaxunl added a comment.Jun 9 2022, 8:06 AM

This patch is to fix an issue, right? At least we need a test to prevent that issue from happening again.

This patch is to fix an issue, right? At least we need a test to prevent that issue from happening again.

Yes, this patch is necessary to keep the clang/test/Preprocessor/cuda-types.cu test passing after applying D28213. That test is verifying that __GCC_ATOMIC_LLONG_LOCK_FREE is the same for --cuda-device-only and --cuda-host-only.

D28213 fixes the value of __GCC_ATOMIC_LLONG_LOCK_FREE to be 2 when targeting 586 and up, which has the cx8 feature (cmpxchg8b). Without this NVPTX patch, the value of __GCC_ATOMIC_LLONG_LOCK_FREE is 2 for --cuda-host-only, but 1 for --cuda-device-only, because when NVPTXTargetInfo creates the host TargetInfo, it only uses the host triple, and doesn't set the target CPU nor initialize CPU features. (Specifically, NVPTXTargetInfo calls AllocateTarget but skips all the other work that TargetInfo::CreateTargetInfo does, like calling initFeatureMap, handleTargetFeatures, and setMaxAtomicWidth.)

tra added a subscriber: tra.Jun 9 2022, 2:55 PM

I think the root cause of the problem here is that CUDA compilation assumes that the GPU-side looks identical to the host side of the compilation. The test was intended to verify that and, AFAICT, it did its job flagging the issue here.

AFAICT, __GCC_ATOMIC_LLONG_LOCK_FREE=1 is the correct setting for the GPU.

In this particular case host and GPU do have different capabilities and the compiler will see a different subset of preprocessed sources, that's unfortunate, but unavoidable, unless we force the host to have value of 1.

Setting _GCC_ATOMIC_LLONG_LOCK_FREE=2 on the GPU side would result in an invalid code and we definitely do not want to do that.

I think the right thing to do here is to add __GCC_ATOMIC_LLONG_LOCK_FREE= to the list of ignored differences in the clang/test/Preprocessor/cuda-types.cu, similar to how we handle LONG_DOUBLE-related macros there.

Ok, I can upload a patch omitting *_ATOMIC_LLONG_LOCK_FREE from the macro testing.

FWIW, did you see this comment in clang/lib/Basic/Targets/NVPTX.cpp?:

// This is a bit of a lie, but it controls __GCC_ATOMIC_XXX_LOCK_FREE, and
// we need those macros to be identical on host and device, because (among
// other things) they affect which standard library classes are defined, and
// we need all classes to be defined on both the host and device.
MaxAtomicInlineWidth = HostTarget->getMaxAtomicInlineWidth();

It was added in D24407, apparently so that __GCC_ATOMIC_INT_LOCK_FREE would be 2 instead of 1. I imagine that's OK, because NVPTX does have atomic int?

I uploaded D127465, weakening the test.

rprichard abandoned this revision.Jul 20 2022, 1:46 PM

Abandoning this patch because D127465 is a better fix.