This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGPU] Skip backend and assemble phases for amdgcn
ClosedPublic

Authored by pdhaliwal on Feb 16 2021, 4:24 AM.

Details

Summary

Remove emit-llvm-bc from addClangTargetOptions as it conflicts with -E for save-temps.

AMDGCN does not yet support linking object files so backend and assemble actions are
skipped, leaving LLVM IR as the output format.

Diff Detail

Event Timeline

pdhaliwal created this revision.Feb 16 2021, 4:24 AM
pdhaliwal requested review of this revision.Feb 16 2021, 4:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2021, 4:24 AM
pdhaliwal added inline comments.Feb 16 2021, 4:26 AM
clang/lib/Driver/Driver.cpp
3057

This logic is based on the assumption that the ith item in OpenMPDeviceActions corresponds to ith item in ToolChains array. Size of both lists is guaranteed to be same from assert on #3035.

Change looks reasonable. Does this fix save-temps as intended?

pdhaliwal added a comment.EditedFeb 16 2021, 6:15 AM

This does fixes the save-temps but only when -o is not specified. If -o is specified (along with -save-temps), the name of host object file and host-wrapper object file (second last phase) is same, which fails the linker. This does not seem to be related to this patch.

Could you somewhere explain why emit-llvm-bc is not enough? This simply reverts its introduction without saying why.

emit-llvm-bc does not correctly solve the problem. It works because [input, compile, assemble, backend] actions collapse to a single action by driver. This single command handles emit-llvm-bc properly. But when save-temps is specified, this collapsing does not happen which messes up command line flags of the jobs and hence the output, for e.g., preprocessor command also has -emit-llvm-bc.

Perhaps paraphrase that and add it to the commit message (which is derived from the text at the top of this page)

emit-llvm-bc does not correctly solve the problem. It works because [input, compile, assemble, backend] actions collapse to a single action by driver. This single command handles emit-llvm-bc properly. But when save-temps is specified, this collapsing does not happen which messes up command line flags of the jobs and hence the output, for e.g., preprocessor command also has -emit-llvm-bc.

Is this the intended behavior of -save-temps + -emit-llvm-bc or an accident? What exactly happens when both are specified and what is the problem. Maybe we should fix that instead of working around it?

It is because of how addClangTargetOptions is invoked. In case of save-temps, it is being invoked for all the actions resulting in target cc1 call. That's why all these invocations have -emit-llvm-bc. I guess we need Action as an argument to addClangTargetOptions.

Also, it does not make sense for having assemble and backend action for amdgcn as linker is dependent directly on llvm IR. They will also come up redundantly in the -ccc-print-phases.

Debugged this because I need save-temps (or hack up a binary by hand) to debug something else. The problem is addClangTargetOptions gets called to add target options to clang. With save temps, an early clang invocation runs the preprocessor alone. That gets passed the target specific flags, something like:

"clang-13" "-cc1" "-mllvm" "--amdhsa-code-object-version=3" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-pc-linux-gnu" "-E" "-save-temps=cwd" "-disable-free" "-main-file-name" "firstprivate.c" "-target-cpu" "gfx906" "-fcuda-is-device" "-emit-llvm-bc" "-mlink-builtin-bitcode" "/home/amd/llvm-install/lib/libomptarget-amdgcn-gfx906.bc" "-fno-split-dwarf-inlining" "-debugger-tuning=gdb" "-O2" "-fdebug-compilation-dir" "/home/amd/aomp/aomp/test/smoke/firstprivate" "-ferror-limit" "19" "-fmessage-length=102" "-fopenmp" "-fopenmp-cuda-parallel-target-regions" "-fgnuc-version=4.2.1" "-fopenmp-is-device" "-munsafe-fp-atomics" "-faddrsig" "-o" "firstprivate-openmp-amdgcn-amd-amdhsa.i" "-x" "c" "firstprivate.c"

Deleted some parts of that. The "-E" gives preprocessor behaviour. Things like -O2 or munsafe-fp-atomics don't matter to the preprocessor but are passed anyway, so there's prior art for it ignoring arguments when called as save temps. Whichever of emit-llvm-bc and E comes last wins, and the target options come after the E, so currently clang emits (binary) bytecode which is later interpreted as cpp-output, which can't handle bytecode.

With this patch applied, the problem of overriding the E flag is sidestepped. There are alternatives, but this approach looks cleaner than the others I can think of.

Note that it isn't a complete fix for save-temps, but it gets me as far as multiple definition of __dummy.omp_offloading.entry'`, which I vaguely remember marking as a weak symbol for the aomp toolchain. It lets clang get as far as emitting the device binary I wanted access to, so I'm now very sure this patch works.

JonChesterfield edited the summary of this revision. (Show Details)

What you are telling me here is that -save-temps and -emit-llvm(-bc) can't work together right now. If that is so, why not fix it there instead of introducing a AMDGPU solution? There are other reasons you might want to combine those to options after all.

JonChesterfield added a comment.EditedFeb 17 2021, 3:59 PM

I don't think there's a general bug here. Maybe save-temps could strip out some arguments that collide with E, or pass E at the end of the list instead of the start, but that seems pretty invasive for a problem that doesn't seem to affect anyone.

In particular, I don't have reason to believe that save-temps and emit-llvm don't work together. Unconditionally passing emit-llvm-bc from addClangTargetOptions doesn't work, but that's not the same thing.

Passing emit-llvm-bc was an amdgcn specific hack because we can't do much with linking object files compiled from assembly, so passing bitcode into lld works much better. This patch changes that to a mostly equivalent amdgcn specific hack, just one that collides with fewer assumptions elsewhere.

edit: Did some testing. -save-temps and -emit-llvm work together fine (emit-llvm is stripped from the preprocessor invocation). -emit-llvm-bc isn't recognised by clang, but if passed as -Xclang -emit-llvm-bc to force matters, it falls over exactly like it does here. 'error source file is not valid UTF-8'.

So -Xclang -emit-llvm-bc -save-temps doesn't work. I don't think that's a serious problem, passing -Xclang already implies sidestepping various convenience features, of which save-temps is one.

An upshot of looking at this is that -Xclang -emit-llvm-bc -save-temps does not work, and nor does -Xclang -emit-llvm -save-temps. However -emit-llvm -save-temps is fine. Opened https://bugs.llvm.org/show_bug.cgi?id=49234 to track.

Skipping the phases still looks right to me, instead of passing -emit-llvm-bc, but I suppose we could stall fixing this until we reach consensus on how to fix 49234. I can carry the patch locally for debugging until then.

Replacing CC1Args.push_back("-emit-llvm-bc"); with CC1Args.push_back("-emit-llvm-bc"); as suggested on the call does not work. This hook is downstream of the clang driver, so all it does under save temps is lead to clang -E -emit-llvm, which generated llvm as requested, that cannot be fed into clang -x cpp-output.

The handling of clang -save-temps that strips out emit-llvm from the preprocessor pass runs before this.

I do not want to change the semantics of emit-llvm, emit-llvm-bc, or save-temps to help out the amdgcn openmp target. I can see that breaking lots of other users of clang.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
192

vaguely interesting that -emit-llvm here appears to generate the same code as -emit-llvm-bc, though neither compose correctly for -save-temps

Something we should probably check is the interaction between -save-temps and whether we are trying to compile a single file or an executable, e.g. the difference between clang and clang -c.

If trying to compile foo.c directly to an executable, -save-temps should probably print (along with lots of others) an assembly file containing the contents of the code object (the shared elf containing amdgcn machine code). If trying to compile only the host part, we probably want to emit asm. If trying to compile only the target part, we probably don't - the llc part is likely to error if the rest of the application is missing.

So, neither emit-llvm-bc or emit-llvm work well with save-temps. Therefore, I feel the current approach is still valid. This does not impact nvptx or any other target in any way. And I don't see how.

I see valid concern regarding assembly output. This patch will surely halt the device assembly output. I am working on that which require adding an extra llc step in AMDGPUOpenMPToolChain.

pdhaliwal updated this revision to Diff 326392.Feb 25 2021, 7:58 AM

Add extra llc step to produce assembly in the linker.

Strictly speaking I suppose that's not a temporary file, since it isn't used for anything, but whenever I pass -save-temps I am likely to want to read the assembly. This looks like a great workaround to me. @jdoerfert?

jdoerfert resigned from this revision.Mar 15 2021, 9:42 AM

I see the need for this to work, unsure if this is the right way, we need to make progress though.

JonChesterfield accepted this revision.Mar 15 2021, 10:18 AM

Agreed. Lack of save temps is causing grief when debugging, so I keep on applying this patch locally. Let's go with this for now. and change to something better when we think of it.

This revision is now accepted and ready to land.Mar 15 2021, 10:18 AM
This revision was landed with ongoing or failed builds.Mar 15 2021, 9:58 PM
This revision was automatically updated to reflect the committed changes.