This is an archive of the discontinued LLVM Phabricator instance.

Generate extra .ll files before/after optimization when using -save-temps.
Abandoned · Public

Authored by jgorbe on May 11 2017, 3:49 PM.

Details

Reviewers
chandlerc
Summary

When -save-temps is specified, add two actions to the compilation to generate two new .ll outputs: one containing unoptimized IR and another one containing optimized IR.

Note that these additional outputs will not be generated in compilations that target multiple architectures, because in that case the driver emits an error if any of the outputs produced by the actions cannot be processed by lipo.
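
As an illustration, this is the -save-temps behavior being extended for a plain single-arch host compile (the names of the two new .ll files are not spelled out here):

  $ clang -O2 -save-temps -c foo.c
  # Without this patch, -save-temps keeps foo.i (preprocessed source),
  # foo.bc, foo.s and foo.o. With this patch, two additional .ll files are
  # written as well: one with the IR before optimization and one with the
  # IR after optimization.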

Diff Detail

Event Timeline

jgorbe created this revision. May 11 2017, 3:49 PM
jgorbe edited the summary of this revision. May 11 2017, 4:00 PM
jgorbe updated this revision to Diff 100943. May 31 2017, 5:09 PM
jgorbe edited the summary of this revision.

Only generate extra .ll files when not targeting multiple archs. Updated related tests.

jgorbe added a subscriber: cfe-commits.
tra added a subscriber: tra. May 31 2017, 5:28 PM
tra added inline comments.
lib/Driver/Driver.cpp
2588–2591

I'm not sure I understand why unoptimized IR can't be written out for multi-arch builds like CUDA. Could you elaborate?

jgorbe updated this revision to Diff 100954. May 31 2017, 6:15 PM

Clarify that we only skip saving IR for multi-arch Mach-O universal builds, not other multi-arch builds like CUDA.

lib/Driver/Driver.cpp
2588–2591

The error I was trying to work around here is this one.

It's not really specific to all multi-arch builds, but to Mach-O multi-arch builds. For those, BuildUniversalActions is called, which checks that all actions produce outputs with lipo-able types (see the types::canLipoType() function) and errors out otherwise. I'm not sure the check is still necessary, but I got no answers when I asked about it on cfe-dev, so I decided to just not generate the extra files for that kind of build.
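
To make it concrete, the kind of build that hits that check is a Mach-O universal build, e.g. (a sketch, run on a Darwin host; the exact diagnostic text isn't shown here):

  $ clang -arch i386 -arch x86_64 -S foo.c
  # BuildUniversalActions checks each top-level action's output type with
  # types::canLipoType() and errors out here, because an assembler output
  # (and likewise an .ll output) is not something lipo can merge.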

I have updated the patch to make it clear that we only skip generating these IR files in that particular case, not every multi-arch build. Thanks for pointing this out!

tra added inline comments. Jun 1 2017, 9:42 AM
lib/Driver/Driver.cpp
2592–2603

Can this be moved below addHostDependenceToDeviceActions() on line 2626?
See my comment in cuda-options.cu below for the reasons why it may be necessary.

test/Driver/cuda-options.cu
197–202 (On Diff #100954)

It appears that the new actions you've pushed cause at least part of host-side compilation to happen before device-side compilation is done. That, at the very least, will not work well for CUDA: compilation will probably succeed, but the result will be missing device-side code and will fail at runtime.

If it's only the host-side preprocessor that runs ahead of the device-side actions, that may be OK, but in general host actions must be done after the device ones.

jgorbe added inline comments. Jun 1 2017, 5:00 PM
lib/Driver/Driver.cpp
2592–2603

I tried that, but it didn't make the test go back to the previous ordering, where all device-side actions are executed before all host-side actions.

I have noticed that, before applying my patch, the action graph produced by clang with -ccc-print-phases doesn't seem to introduce any explicit dependency to guarantee that device-side actions are executed before host-side actions:

0: input, "cuda-options.cu", cuda, (host-cuda)
1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
2: compiler, {1}, ir, (host-cuda)
3: input, "cuda-options.cu", cuda, (device-cuda, sm_20)
4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_20)
5: compiler, {4}, ir, (device-cuda, sm_20)
6: backend, {5}, assembler, (device-cuda, sm_20)
7: assembler, {6}, object, (device-cuda, sm_20)
8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {7}, object
9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {6}, assembler
10: linker, {8, 9}, cuda-fatbin, (device-cuda)
11: offload, "host-cuda (x86_64--linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {10}, ir
12: backend, {11}, assembler, (host-cuda)
13: assembler, {12}, object, (host-cuda)
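
For reference, a graph like the one above can be reproduced with an invocation along these lines (the flags are illustrative; the actual test adds more options):

  $ clang -ccc-print-phases --cuda-gpu-arch=sm_20 cuda-options.cu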

I'm now trying to figure out whether there's something else that enforces that restriction, or whether the current compilation order being the right one is a happy coincidence that my patch happened to disturb. I'm new to the project, so I'm working under the assumption that I'm missing something; any hints would be appreciated :)

jgorbe abandoned this revision. Feb 14 2019, 11:29 AM