This is an archive of the discontinued LLVM Phabricator instance.

Properly handle -fsplit-machine-functions for fatbinary compilation
ClosedPublic

Authored by shenhan on Aug 11 2023, 12:40 PM.

Details

Summary

When building a fatbinary, the driver invokes the compiler multiple times with different "--target". (For example, with "-x cuda --cuda-gpu-arch=sm_70" flags, clang will be invoded twice, once with --target=x86_64_...., once with --target=sm_70) If we use -fsplit-machine-functions or -fno-split-machine-functions for such invocation, the driver reports an error.

This CL changes the behavior so:

  1. "-fsplit-machine-functions" is now passed to all targets, for non-X86 targets, the flag is a NOOP and causes a warning.
  2. "-fno-split-machine-functions" now negates -fsplit-machine-functions (if -fno-split-machine-functions appears after any -fsplit-machine-functions) for any target triple, previously, it causes an error.
  3. "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions" enables MFS on host but disables MFS for GPUS without warnings/errors.
  4. "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables MFS for GPUS without warnings/errors.

Diff Detail

Event Timeline

shenhan created this revision.Aug 11 2023, 12:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2023, 12:40 PM
shenhan requested review of this revision.Aug 11 2023, 12:40 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 11 2023, 12:40 PM
snehasish added inline comments.
llvm/lib/CodeGen/TargetPassConfig.cpp
1278 ↗(On Diff #549493)

Can you coordinate with @dhoekwater ? He has some patches in flight for AArch64.

I think D157157 is the one which modifies the same logic.

llvm/test/CodeGen/X86/mfs-triple.ll
8 ↗(On Diff #549493)

Any reason why we can't use the bitcode already in test/CodeGen/machine-function-splitter.ll? (Going to be moved to test/Generic/machine-function-splitter.ll in D157563)

IMO we can just reuse the basic test and add these run and check lines.

tra added a subscriber: tra.Aug 11 2023, 2:05 PM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

We will still see a warning, right? So, for someone compiling with -Werror that's going to be a problem.

Also, if the warning is issued from the top-level driver, we may not even be able to suppress it when we disable splitting on GPU side with -Xarch_device -fno-split-machine-functions.

shenhan added inline comments.Aug 11 2023, 2:53 PM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

We will still see a warning, right?

Yes, there still will be a warning. We've discussed it and we think that pass -fsplit-machine-functions in this case is not a proper usage and a warning is warranted, and it is not good that skip doing split silently while uses explicitly ask for it.

Also, if the warning is issued from the top-level driver

The warning will not be issued from the top-level driver, it will be issued when configuring optimization passes.
So:

  • -fsplit-machine-functions -Xarch_device -fno-split-machine-functions

Will enable MFS for host, disable MFS for gpus and without any warnings.

  • -Xarch_host -fsplit-machine-functions

The same as the above

  • -Xarch_host -fsplit-machine-functions -Xarch_device -fno-split-machine-functions

The same as the above

shenhan edited the summary of this revision. (Show Details)Aug 11 2023, 2:55 PM
shenhan edited the summary of this revision. (Show Details)
shenhan added inline comments.Aug 11 2023, 2:58 PM
llvm/lib/CodeGen/TargetPassConfig.cpp
1278 ↗(On Diff #549493)

Thanks. Yes, I'll coordinate with @dhoekwater before resolving this.

arsenm added a subscriber: arsenm.Aug 11 2023, 3:05 PM
arsenm added inline comments.
llvm/lib/CodeGen/TargetPassConfig.cpp
1281–1282 ↗(On Diff #549493)

You cannot spam warnings here. The other instance of printing here looks like a new addition and should be removed

tra added inline comments.Aug 11 2023, 3:14 PM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

We've discussed it and we think that pass -fsplit-machine-functions in this case is not a proper usage and a warning is warranted, and it is not good that skip doing split silently while uses explicitly ask for it.

I would agree with that assertion if we were talking exclusively about CUDA compilation.
However, a common real world use pattern is that the flags are set globally for all C++ compilations, and then CUDA compilations within the project need to do whatever they need to to keep things working. The original user intent was for the option to affect the host compilation. There's no inherent assumption that it will do anything useful for the GPU.

In number of similar cases in the past we did settle on silently ignoring some top-level flags that we do expect to encounter in real projects, but which made no sense for the GPU. E.g. sanitizers. If the project is built w/ sanitizer enabled, the idea is to sanitize the host code, The GPU code continues to be built w/o sanitizer enabled.

Anyways, as long as we have a way to deal with it it's not a big deal one way or another.

-fsplit-machine-functions -Xarch_device -fno-split-machine-functions
Will enable MFS for host, disable MFS for gpus and without any warnings.

OK. This will work.

shenhan added inline comments.Aug 14 2023, 12:02 PM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

In number of similar cases in the past we did settle on silently ignoring some top-level flags that we do expect to encounter in real projects, but which made no sense for the GPU. E.g. sanitizers. If the project is built w/ sanitizer enabled, the idea is to sanitize the host code, The GPU code continues to be built w/o sanitizer enabled.

Can I understand it this way - if the compiler is only building for CPUs, then silently ignore any optimization flags is not a good behavior. If the compiler is building CPUs and GPUs, it is still not a good behavior to silently ignore optimization flags for CPUs, but it is probably ok to silently ignore optimization flags for GPUs.

OK. This will work.

Thanks for confirming.

llvm/lib/CodeGen/TargetPassConfig.cpp
1281–1282 ↗(On Diff #549493)

Thanks. Do you suggest moving the warnings to the underlying pass? (Although that means we create passes that only issue warnings.)

arsenm added inline comments.Aug 14 2023, 12:05 PM
llvm/lib/CodeGen/TargetPassConfig.cpp
1281–1282 ↗(On Diff #549493)

Move it to the pass, and use a backend remark, not directly print to the console (e.g. DiagnosticInfoUnsupported)

tra added inline comments.Aug 14 2023, 12:23 PM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

it is probably ok to silently ignore optimization flags for GPUs.

In this case, yes.

I think the most consistent way to handle the situation is to keep the warning in place at cc1 compiler level, but change the driver behavior (and document it) so that it does not pass the splitting options to offloading sub-compilations. This way we'll do the sensible thing for the most common use case, yet would still warn if the user tries to enable the splitting where they should not (e.g. by using -Xclang -fsplit-machine-functions during CUDA compilation)

shenhan updated this revision to Diff 550137.Aug 14 2023, 4:44 PM
shenhan marked an inline comment as done.
shenhan added inline comments.
llvm/lib/CodeGen/TargetPassConfig.cpp
1281–1282 ↗(On Diff #549493)

Thanks, created DiagnosticInfoMachineFunctionSplit and moved the warning to MFS pass.

shenhan updated this revision to Diff 550423.Aug 15 2023, 12:11 PM
shenhan marked an inline comment as done.
shenhan marked 3 inline comments as done.Aug 15 2023, 12:15 PM
shenhan added inline comments.
llvm/lib/CodeGen/TargetPassConfig.cpp
1278 ↗(On Diff #549493)

@dhoekwater will rebase D157157 on top of this.

llvm/test/CodeGen/X86/mfs-triple.ll
8 ↗(On Diff #549493)

Moved the tests into machine-function-splitter.ll. Either this CL or D157563 can be submitted first, and the other will rebase on top of that.

shenhan updated this revision to Diff 550834.Aug 16 2023, 11:56 AM
shenhan marked an inline comment as done.
shenhan added inline comments.
llvm/test/CodeGen/X86/mfs-triple.ll
8 ↗(On Diff #549493)

Rebased on D157563.

shenhan updated this revision to Diff 550885.Aug 16 2023, 2:10 PM
This revision is now accepted and ready to land.Aug 16 2023, 2:28 PM
xur accepted this revision.Aug 16 2023, 3:38 PM

lgtm

dhoekwater added a comment.EditedAug 16 2023, 3:41 PM

This patch will make it difficult to write tests for MFS on AArch64 before it is officially enabled. Currently, because clang performs the Triple check, we can use -enable-split-machine-functions to run tests with MFS on Arm, but after this patch the flag won't do anything. Is there a way that we can land this patch while still making MFS testable on AArch64?

shenhan updated this revision to Diff 550913.Aug 16 2023, 3:41 PM

Refined the test case a little bit.

This patch will make it difficult to write tests for MFS on AArch64 before it is officially enabled. Currently, because clang performs the Triple check, we can use -enable-split-machine-functions to run tests with MFS on Arm, but after this patch the flag won't do anything. Is there a way that we can land this patch while still making MFS testable on AArch64?

Discussed with @dhoekwater offline, we decide to use a temporary hidden flag, something like "enable-mfs-for-debugging/testing", to force enable MFS for any triple during the period when MFS for arm is progressively rolled out, and when all is done, we will remove that temporary flag.

This revision was landed with ongoing or failed builds.Aug 16 2023, 11:46 PM
This revision was automatically updated to reflect the committed changes.
Hahnfeld added a subscriber: Hahnfeld.EditedAug 17 2023, 1:07 AM

I disabled two tests without {arm,aarch64}-registered-target in rGeeac4321c517ee8afc30ebe62c5b1778efc1173d; two post-commit comments inline

llvm/test/CodeGen/Generic/machine-function-splitter.ll
18

shouldn't this be MFS_ON-NOT?

21
steelannelida added inline comments.
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
16

Unfortunately these commands fail in our sandbox due to writing files to readonly directories:

unable to open output file 'fsplit-machine-functions-with-cuda-nvptx.s': 'Permission denied'

Could you please specify the output files via %t substitutions? I'm not sure how to do this for cuda compilation.

Hahnfeld added inline comments.Aug 17 2023, 5:08 AM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
16

IIRC the file names are generated based on what you specify with -o. Did you try this already?

tra added inline comments.Aug 17 2023, 10:29 AM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
16

The problem is that in this case we didn't pass any -o at all here, so the compiler tries to write into the current directory.

We need -o %t.s or -o /dev/null here.

MaskRay added inline comments.Aug 17 2023, 12:59 PM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

There are excessive spaces before %clang. We should keep just one space: RUN: %clang

16

Driver tests should not run the backend. Most driver commands should use -###.

REQUIRES: shell disables the internal shell, which essentially disables the test on Windows. This should generally be avoided.

An idiom is // RUN: rm -rf %t && mkdir %t && cd %t (see ftime-trace.cpp) if the driver places output files in the current working directory.

Thanks all. Created D158231 to address the post-submit comments and added all as reviewers. @MaskRay @Hahnfeld @steelannelida

Sorry, but I think this change should be reverted.

(a) -fsplit-machine-functions on an unsupported target now emits a warning instead of an error. This diverges from the regular expectation for target-specific features.

% fclang --target=riscv64 -fsplit-machine-functions -c a.c
warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin]

warn_drv_for_elf_only is not necessary. We typically just use err_drv_unsupported_opt_for_target.

(b) the test needs substantial change (D158231)

(c) The error reporting should be on the driver side, not in backend. void DiagnosticInfoMachineFunctionSplit::print(DiagnosticPrinter &DP) const is not necessary.
At the very least, it should not be in the generic IR/DiagnosticInfo.cpp

MaskRay added inline comments.Aug 19 2023, 11:23 AM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

I agree with @tra's analysis. Either do nothing on Clang side and requiring -fsplit-machine-functions -Xarch_device -fno-split-machine-functions or ignoring the option when creating a device job works for me.

This patch changed the behavior in an unintended direction.

shenhan added inline comments.Aug 21 2023, 10:16 AM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

Either do nothing on Clang side and requiring -fsplit-machine-functions -Xarch_device -fno-split-machine-functions or ignoring the option when creating a device job works for me.
This patch changed the behavior in an unintended direction.

Thanks Ray. Just a little bit confused, what this patch does is indeed "requiring -fsplit-machine-functions -Xarch_device -fno-split-machine-functions", before this patch, this usage will cause an error.

What do you suggest?

MaskRay added inline comments.Aug 21 2023, 10:22 AM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

I think we should go back to the state before this patch. Do either

(a) require -Xarch_device -fno-split-machine-functions for --cuda-gpu-arch users
(b) ignore -fsplit-machine-functions when creating a -triple" "nvptx64-nvidia-cuda job in the driver.

Personally I'd prefer (a), as I don't want to maintain a long list of ignored options in the driver for certain very specific optimization options, but I am fine with (b). In either case, I think we need to revert this patch to go to a clean state.

tra added inline comments.Aug 21 2023, 10:45 AM
clang/test/Driver/fsplit-machine-functions-with-cuda-nvptx.c
10

a) require -Xarch_device -fno-split-machine-functions for --cuda-gpu-arch users

It will break the currently valid compilation options and CUDA users would have no other viable option other than keep spamming their build flags with -Xarch_device -fno-split-machine-functions. E.g. if the flag is set build-wide on all compilations, it will break the previously working CUDA compilations due to no fault on the user side. Yes, -Xarch_device -fno-split-machine-functions will work around it, but it will have to be done by everyone who ever faces -fsplit-machine-functions in a CUDA compilation.

If we have to apply the workaround in 100% of the use cases, we should not require every user to do it.

(b) ignore -fsplit-machine-functions when creating a -triple" "nvptx64-nvidia-cuda job in the driver.

I believe that's the way to go. We already do that in a number of other cases like sanitizers, or other things not supported on the GPU side in principle.