This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Make clang/cc conform to the UNIX standard
ClosedPublic

Authored by steven_wu on Nov 1 2017, 12:57 PM.

Details

Summary

This is basically reverting r261774 with a tweak for clang-cl. The UNIX
standard states:
When c99 encounters a compilation error that causes an object file not
to be created, it shall write a diagnostic to standard error and
continue to compile other source code operands, but it shall not perform
the link phase and it shall return a non-zero exit status

The same goes for c89 and cc; they are all aliases or shims pointing to
clang on Darwin.

The original commit was intended for CUDA, so that the error message doesn't
get emitted twice for both host and device. It seems that the clang driver
has since been changed to model the CUDA dependency differently, and the
driver now behaves the same without this commit.

rdar://problem/32223263

Diff Detail

Repository
rL LLVM

Event Timeline

steven_wu created this revision.Nov 1 2017, 12:57 PM
steven_wu added a comment.Edited Nov 1 2017, 1:07 PM

More background here that doesn't fit in the commit message.

The tweak for cl mode is intended for r262420. I don't see any test breakage on my machine without that hack, but I left it in just in case.

A little reasoning behind the clang driver's handling of CUDA. This is what I see now with clang (trunk):
$ clang -x cuda test.cu -ccc-print-bindings

  1. "nvptx64-nvidia-cuda" - "clang", inputs: ["test.cu"], output: "/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-fbc2e7.s"
  2. "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-fbc2e7.s"], output: "/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-beca72.o"
  3. "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-beca72.o", "/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-fbc2e7.s"], output: "/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-c3378c.fatbin"
  4. "x86_64-apple-darwin17.2.0" - "clang", inputs: ["test.cu", "/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-c3378c.fatbin"], output: "/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-193005.o"
  5. "x86_64-apple-darwin17.2.0" - "darwin::Linker", inputs: ["/var/folders/cg/pyj86ks15y74w3hkwhm_zltm0000gp/T/test-193005.o"], output: "a.out"

Note that the host-side clang job has the inputs "test.cu" and "test-c3378c.fatbin", which means that if the device-side compilation fails, the host side will not compile, because InputsOk will return false in this case. The same applies even if there are multiple CUDA inputs on the command line: clang will compile all the inputs even if the first compilation fails, but it will not compile for the host when the device compilation for that input file has failed. This is better behavior than the current implementation.
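
To make the dependency concrete, here is a toy, self-contained sketch of the kind of check this relies on. It is not the actual clang driver code; the Action, Command, and FailingCommandList definitions are simplified stand-ins for the real driver types. The idea: before running a command, walk the inputs of its source action and skip the command if any of them corresponds to a command that already failed.

#include <utility>
#include <vector>

// Simplified stand-ins for the driver's Action/Command/FailingCommandList.
struct Action {
  std::vector<const Action *> Inputs; // e.g. {test.cu input, device fatbin}
};

struct Command {
  const Action *Source = nullptr; // the action this command was built for
  const Action &getSource() const { return *Source; }
};

using FailingCommandList = std::vector<std::pair<int, const Command *>>;

// An action has "failed" if it is the source of a failing command, or if any
// of its (transitive) inputs has failed.
static bool ActionFailed(const Action *A,
                         const FailingCommandList &FailingCommands) {
  for (const auto &CI : FailingCommands)
    if (A == &CI.second->getSource())
      return true;
  for (const Action *AI : A->Inputs)
    if (ActionFailed(AI, FailingCommands))
      return true;
  return false;
}

// A command is only worth running if none of the actions feeding it failed;
// this is what makes the host-side CUDA job skip when the device-side fatbin
// it depends on could not be produced.
static bool InputsOk(const Command &C,
                     const FailingCommandList &FailingCommands) {
  return !ActionFailed(&C.getSource(), FailingCommands);
}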

I don't know the best way to write test cases for the CUDA driver. If anyone can suggest one, I will happily add tests to the commit.
This is the behavior after the patch:

$ echo "invalid" > test.cu
$ clang -x cuda test.cu -nocudalib -nocudainc
test.cu:1:1: error: unknown type name 'invalid'
invalid
^
test.cu:1:8: error: expected unqualified-id
invalid
       ^
$ clang -x cuda test.cu test.cu -nocudalib -nocudainc
# Same error as above, twice, once for each input file.
dexonsmith edited edge metadata.Nov 1 2017, 2:03 PM

I have a few nitpicks, but otherwise this LGTM. I'd like to wait for someone on the CUDA side to confirm though.

lib/Driver/Compilation.cpp
185 ↗(On Diff #121167)

Have you run clang-format-diff.py on your patch? There's some funny whitespace here.

Also, you might consider splitting this out in a separate NFC commit.

193–196 ↗(On Diff #121167)

Range-based for seems better here:

for (const auto &CI : FailingCommands)
  if (A == &CI.second->getSource())
    return true;
218 ↗(On Diff #121167)

I think the longer comment was useful; just tweak it to be cl-mode-specific.

test/Driver/output-file-cleanup.c
53–58 ↗(On Diff #121167)

IIRC, the tests were like this before, and then they were changed as convenient when the CUDA code was introduced. I suggest adding a comment around here explaining *why* we expect 3.s and 5.s to exist (i.e., for UNIX conformance of the cc/c89/c99 aliases) so that the behaviour isn't changed again in the future without careful consideration.

steven_wu updated this revision to Diff 121403.Nov 2 2017, 4:01 PM

Address review feedback.

Also split out the testcase for UNIX conformance. That test is not related
to output file cleanup. Split it out to make it clear that is for UNIX
conformance.

Great idea. Confirming this LGTM once someone on the CUDA side has a look.

steven_wu updated this revision to Diff 121410.Nov 2 2017, 4:41 PM

Fix testcase. My test was passing because files were left over from a previous run.
Now it should work with a clean build.

jlebar requested changes to this revision.Edited Nov 5 2017, 8:16 PM

Note that the host-side clang job has the inputs "test.cu" and "test-c3378c.fatbin", which means that if the device-side compilation fails, the host side will not compile because InputsOk will return false in this case

The patch handles this case correctly, thank you.

Unfortunately it's a bit more complicated than this. You can pass --cuda-gpu-arch=XXX multiple times when invoking clang, and that will result in multiple compilations for different GPU architectures. In this case we want to fail when the first one fails, on the theory that (most of the time) all compilations will have the same errors.

You can test this with e.g.

$ clang -x cuda test.cu -nocudalib -nocudainc --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_60

With this patch, clang prints the errors twice, once for sm_35 and once for sm_60. Without the patch, it only prints one set of errors, which is the expected behavior.

I don't know the best way to write test cases for the CUDA driver.

Thanks for offering to write tests for this. See ls clang/test/Driver | grep '\.cu$', and let me know if you need help crafting a testcase.

This revision now requires changes to proceed.Nov 5 2017, 8:16 PM

Ok, I didn't think about that. To fix it, I guess the driver needs to invalidate all the offload inputs once one compilation fails. Let me see how to do that. Thanks for looking.

steven_wu updated this revision to Diff 121779.Edited Nov 6 2017, 1:40 PM
steven_wu edited edge metadata.

This seems to be the cleanest solution I can find for CUDA without
touching the CUDA job configuration. My proposed fix is to bail out of
the jobs that are part of a CUDA offload if some command failed before
them. Now if you hit failures when compiling for multiple GPU
architectures, you will get the error only once. This is pretty much the
same behavior as the current driver.
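
A minimal sketch of that bail-out idea, under the assumption that each command can report whether it belongs to a CUDA offload compilation (isPartOfCUDAOffload() below is a hypothetical helper, not a real driver API):

#include <vector>

// Simplified stand-in for the driver's Command type.
struct Command {
  bool PartOfCUDAOffload = false;
  bool isPartOfCUDAOffload() const { return PartOfCUDAOffload; }
  int execute() const { return 0; } // 0 on success, non-zero on failure
};

// Run the jobs in order. Once any command has failed, skip the remaining
// commands that are part of a CUDA offload compilation, so the same
// diagnostics are not repeated once per GPU architecture; jobs that are not
// part of an offload (e.g. other plain source operands) still run.
static void executeJobs(const std::vector<Command> &Jobs,
                        std::vector<const Command *> &FailingCommands) {
  bool SawFailure = false;
  for (const Command &C : Jobs) {
    if (SawFailure && C.isPartOfCUDAOffload())
      continue;
    if (C.execute() != 0) {
      FailingCommands.push_back(&C);
      SawFailure = true;
    }
  }
}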

This is not the ideal solution, but there is simply no information in the job
setup of the current CUDA driver to detect whether the input is the same
across different jobs. Each CUDA input is duplicated into different
InputActions and inserted into different OffloadActions. You also cannot
detect them by looking at their names, because the following should compile
the same file twice even if there are errors in it:

cc a.c a.c

Also, the reason I don't know how to craft a testcase is not that I have
trouble with the CUDA driver, but that I don't know how to write a test that
checks when the driver bailed out. Let me know if you have any suggestions.

tra edited edge metadata.Nov 6 2017, 2:08 PM

Here's one way to do it: run 'cc -v', positively match the cc1 invocations that worked, and CHECK-NOT for cc1 invocations that would happen after the bail-out point.
Use the preprocessor to induce a failure during a particular compilation phase:

#if __CUDA_ARCH__ == 350
#error Error during compilation for sm_35
#endif

#ifndef __CUDA_ARCH__
#error Error during host compilation
#endif
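
For illustration only, here is a hypothetical sketch of how that suggestion could be spelled as a lit test. The RUN-line flags, CHECK patterns, and file contents below are assumptions, not the test that actually landed:

// RUN: not %clang -v -x cuda %s -nocudalib -nocudainc \
// RUN:   --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_60 2>&1 | FileCheck %s
//
// With -v, each cc1 invocation is echoed before it runs. The sm_35
// device-side compilation is started and fails on the #error below ...
// CHECK: -cc1{{.*}}sm_35
// ... and after that failure no compilation for sm_60 should be started.
// CHECK-NOT: -cc1{{.*}}sm_60

#if __CUDA_ARCH__ == 350
#error Error during compilation for sm_35
#endif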
steven_wu updated this revision to Diff 121795.Nov 6 2017, 2:49 PM

Add testcase for CUDA driver

Thanks for the advice. Testcase added.

tra added inline comments.Nov 6 2017, 3:02 PM
test/Driver/cuda-bail-out.cu
31 ↗(On Diff #121795)

Missing ':' after CHECK-TWO-ARCHES-NOT.

You may want to include a compilation for a GPU arch that does *not* fail. Otherwise this test would be happy if we always failed during the first device-side compilation. I believe we compile GPU arches in sorted order, so using --cuda-gpu-arch=sm_30 should work for this purpose.

E.g. something like this:

#if __CUDA_ARCH__ == 300
#warning OK. No error during compilation for sm_30.
#endif
steven_wu updated this revision to Diff 121807.Nov 6 2017, 3:45 PM

Improve testcase according to review feedback.

In order to let compilation succeed for one input, I need to use
-fsyntax-only, because I don't have an nvptx assembler. Enabling -fsyntax-only
changes the order of the compilations somehow, so I need to reorder the
expected errors and warnings a little bit.

tra added a comment.Nov 6 2017, 4:31 PM

Right. With -fsyntax-only, host compilation no longer depends on the device-side binary. You may want to add a comment in the test explaining the peculiar order of the checks.

test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

To make it more robust, I'd add another copy of the last line before CHECK-HOST-ERROR: so it would catch device-side errors if they were to happen ahead of the host.

Generally speaking, the expected behavior is "I want to see an error from one compilation only". We don't really care which CUDA compilation phase produces it. Perhaps all we need in all test cases is:

CHECK: Error during compilation
CHECK-NOT:  Error during compilation

This way we don't need to depend on specific phase order.

steven_wu added inline comments.Nov 6 2017, 4:52 PM
test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

That will be a design choice for the CUDA driver. I have no preference either way. Just let me know and I will update the test case.

Speaking of "-fsyntax-only", this is another interesting behavior of the clang CUDA driver:

$ clang -x cuda /dev/null -x c /dev/null -ccc-print-phases
14: input, "/dev/null", c, (host-cuda)
$ clang -fsyntax-only -x cuda /dev/null -x c /dev/null -ccc-print-phases
9: input, "/dev/null", c

So depending on whether -fsyntax-only is used, the C language input can be either offloading or not offloading.
This is a corner case where the driver behavior will change after this patch.

tra added inline comments.Nov 6 2017, 5:12 PM
test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

OK. Let's just check for one error only.

As for the second, it is, IMO, a problem. The file after `-x c` should have been treated as plain C input, regardless of -fsyntax-only. It's reproducible in a clean clang tree, so it's not due to this patch.

steven_wu added inline comments.Nov 6 2017, 5:16 PM
test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

The corner case I am talking about is that, after this patch, the "-x c" input will be syntax checked even if the "-x cuda" input fails the syntax check (which sounds good), but only when using "-fsyntax-only".

I am updating the test case.

steven_wu updated this revision to Diff 121822.Nov 6 2017, 5:25 PM

Update testcase for CUDA driver

jlebar removed a reviewer: jlebar.Nov 9 2017, 3:40 PM
jlebar added a subscriber: jlebar.

I abdicate as a reviewer in favor of tra -- if he's happy, I'm happy.

tra added inline comments.Nov 9 2017, 4:21 PM
test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

Sorry, I should've phrased my suggestion better. I didn't mean to remove multiple sources of errors. Quite the opposite: there should be multiple test runs with errors produced by combinations of phases, e.g. host, host+sm_35, host+sm_60, sm_35+sm_60, host+sm_35+sm_60.

The check itself is fine; you just need multiple runs that trigger errors during particular phases.

#if defined(ERROR_HOST) && !defined(__CUDA_ARCH__)
#error Host error
#endif

#if defined(ERROR_SM35) && (__CUDA_ARCH__ == 350)
#error sm_35 error
#endif
...

and then do multiple runs with various combinations of -DERROR_xxx
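
As a purely hypothetical illustration (the flags and macro names are assumptions, not the actual cuda-bail-out.cu), runs that break two phases at once could look like the following, reusing the single-error CHECK/CHECK-NOT pattern from above and assuming #error guards like those sketched above plus an analogous ERROR_SM60 one:

// RUN: not %clang -fsyntax-only -x cuda %s -nocudalib -nocudainc \
// RUN:   --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_60 \
// RUN:   -DERROR_HOST -DERROR_SM35 2>&1 | FileCheck %s
// RUN: not %clang -fsyntax-only -x cuda %s -nocudalib -nocudainc \
// RUN:   --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_60 \
// RUN:   -DERROR_SM35 -DERROR_SM60 2>&1 | FileCheck %s
//
// Even with two phases broken, only one error should reach the user.
// CHECK: error:
// CHECK-NOT: error: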

steven_wu added inline comments.Nov 9 2017, 4:35 PM
test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

Hmm, I might still be understanding you incorrectly. If the error message is different for the host and for the different devices, checking the error message requires the compilations to be executed in a certain order. Otherwise, you won't know which error is going to be emitted.
On the other hand, if you don't care about the order and don't want to check for any ordering, the best we can do is run all the combinations and check that the error only happens once. In my current test, I have host + default_sm and host + sm_35 + sm_60. I can add --cuda-device-only, but is there anything else I need to do?

steven_wu updated this revision to Diff 122372.Nov 9 2017, 4:49 PM

I think I understand what you mean now. You want some cases where the
compilation doesn't fail on the first source it processes.

I added a testcase which compiles for host + sm_35 + sm_60. I run the test
three times, and each time I trigger an error in a different source. This
should cover the case.

tra added inline comments.Nov 9 2017, 4:58 PM
test/Driver/cuda-bail-out.cu
47 ↗(On Diff #121807)

Those would be implementation details. You could use a regex in the error check to match any of them.
All I want to know is that I only see errors from one phase, even when more than one phase is broken.

We're almost there. Now we just need a few tests with errors from more than one phase, e.g. a few test runs with more than one -DERROR_xxx.

steven_wu updated this revision to Diff 122376.Nov 9 2017, 5:05 PM

Add more tests! The new tests can trigger two errors during the compilation,
but only one should be emitted.

tra accepted this revision.Nov 9 2017, 5:15 PM

LGTM for CUDA-related functionality & tests. Thank you for the patch!

This revision is now accepted and ready to land.Nov 9 2017, 5:15 PM
This revision was automatically updated to reflect the committed changes.