This is an archive of the discontinued LLVM Phabricator instance.

Differential D96769

[OpenMP][AMDGPU] Skip backend and assemble phases for amdgcn
ClosedPublic

Authored by pdhaliwal on Feb 16 2021, 4:24 AM.

Details

Reviewers

JonChesterfield
ronlieb
tianshilei1992
ye-luo
jdoerfert

Commits

rGfc12a64ecc71: [OpenMP][AMDGPU] Skip backend and assemble phases for amdgcn

Summary

Remove emit-llvm-bc from addClangTargetOptions as it conflicts with -E for save-temps.

AMDGCN does not yet support linking object files so backend and assemble actions are
skipped, leaving LLVM IR as the output format.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pdhaliwal created this revision. · Feb 16 2021, 4:24 AM

Herald added subscribers: kerbowa, guansong, t-tye and 6 others. · Feb 16 2021, 4:24 AM

pdhaliwal requested review of this revision. · Feb 16 2021, 4:24 AM

Herald added a project: Restricted Project. · Feb 16 2021, 4:24 AM

Herald added subscribers: cfe-commits, sstefan1, wdng.

pdhaliwal added inline comments. · Feb 16 2021, 4:26 AM

clang/lib/Driver/Driver.cpp
3090	This logic is based on the assumption that the ith item in OpenMPDeviceActions corresponds to ith item in ToolChains array. Size of both lists is guaranteed to be same from assert on #3035.
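
The index pairing and phase skip described in this inline comment can be modelled outside the driver. The following is a minimal sketch, not the patch itself: `Phase`, `ToyToolChain`, `ToyAction`, `addPhase`, and `buildActions` are hypothetical stand-ins invented for illustration and are not Clang's real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the driver's types (assumptions, not Clang's API).
enum class Phase { Preprocess, Compile, Backend, Assemble };

struct ToyToolChain {
  std::string Arch; // e.g. "amdgcn" or "nvptx64"
  bool isAMDGCN() const { return Arch == "amdgcn"; }
};

// A device action is modelled as the list of phases applied to it so far.
using ToyAction = std::vector<Phase>;

// Apply CurPhase to every device action. Actions paired with an amdgcn
// toolchain skip Backend and Assemble, so they stay at the LLVM IR stage.
void addPhase(const std::vector<ToyToolChain> &ToolChains,
              std::vector<ToyAction> &DeviceActions, Phase CurPhase) {
  // Pairing by index is only valid if both lists have the same length.
  assert(ToolChains.size() == DeviceActions.size());
  for (std::size_t I = 0; I < ToolChains.size(); ++I) {
    if (ToolChains[I].isAMDGCN() &&
        (CurPhase == Phase::Assemble || CurPhase == Phase::Backend))
      continue;
    DeviceActions[I].push_back(CurPhase);
  }
}

// Run the usual phase sequence (up to, but not including, linking).
std::vector<ToyAction>
buildActions(const std::vector<ToyToolChain> &ToolChains) {
  std::vector<ToyAction> Actions(ToolChains.size());
  for (Phase P : {Phase::Preprocess, Phase::Compile, Phase::Backend,
                  Phase::Assemble})
    addPhase(ToolChains, Actions, P);
  return Actions;
}
```

Calling `buildActions` with one amdgcn and one non-amdgcn toolchain leaves the first action stopped after `Compile`, while the second runs through `Assemble`.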

Change looks reasonable. Does this fix save-temps as intended?

Harbormaster completed remote builds in B89359: Diff 323954. · Feb 16 2021, 5:03 AM

This does fix save-temps, but only when -o is not specified. If -o is specified (along with -save-temps), the host object file and the host-wrapper object file (second-to-last phase) get the same name, which makes the linker fail. This does not seem to be related to this patch.

Could you somewhere explain why emit-llvm-bc is not enough? This simply reverts its introduction without saying why.

emit-llvm-bc does not correctly solve the problem. It works because the driver collapses the [input, compile, assemble, backend] actions into a single action, and that single command handles emit-llvm-bc properly. When save-temps is specified, this collapsing does not happen, which messes up the command-line flags of the jobs and hence the output; e.g., the preprocessor command also gets -emit-llvm-bc.

Perhaps paraphrase that and add it to the commit message (which is derived from the text at the top of this page)

In D96769#2565751, @pdhaliwal wrote:

emit-llvm-bc does not correctly solve the problem. It works because the driver collapses the [input, compile, assemble, backend] actions into a single action, and that single command handles emit-llvm-bc properly. When save-temps is specified, this collapsing does not happen, which messes up the command-line flags of the jobs and hence the output; e.g., the preprocessor command also gets -emit-llvm-bc.

Is this the intended behavior of -save-temps + -emit-llvm-bc or an accident? What exactly happens when both are specified, and what is the problem? Maybe we should fix that instead of working around it?

It is because of how addClangTargetOptions is invoked. With save-temps, it is invoked for every action that results in a target cc1 call, which is why all of these invocations have -emit-llvm-bc. I guess we need Action as an argument to addClangTargetOptions.

Also, it does not make sense to have assemble and backend actions for amdgcn, as the linker depends directly on LLVM IR. They would also show up redundantly in the -ccc-print-phases output.
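
The per-job hook invocation described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions: `ArgList`, `addTargetOptions`, `hasArg`, and `buildDeviceJobs` are hypothetical names, and the job lists are heavily simplified compared with what the real driver builds.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using ArgList = std::vector<std::string>;

// Hypothetical stand-in for a toolchain hook that appends target flags
// unconditionally, with no knowledge of which job it is decorating.
void addTargetOptions(ArgList &CC1Args) {
  CC1Args.push_back("-fcuda-is-device");
  CC1Args.push_back("-emit-llvm-bc");
}

bool hasArg(const ArgList &Args, const std::string &Flag) {
  return std::find(Args.begin(), Args.end(), Flag) != Args.end();
}

// Build the (toy) cc1 jobs for one device compile. Without save-temps the
// phases collapse into a single job; with save-temps each phase is a job.
std::vector<ArgList> buildDeviceJobs(bool SaveTemps) {
  std::vector<ArgList> Jobs;
  if (SaveTemps) {
    Jobs.push_back(ArgList{"-cc1", "-E"}); // preprocessor-only job
    Jobs.push_back(ArgList{"-cc1"});       // compile job
  } else {
    Jobs.push_back(ArgList{"-cc1"});       // single collapsed job
  }
  // The hook runs for every job, so every job receives -emit-llvm-bc.
  for (ArgList &Job : Jobs)
    addTargetOptions(Job);
  return Jobs;
}
```

In this model, `buildDeviceJobs(true)` yields a first job that carries both `-E` and `-emit-llvm-bc`, which is the conflicting combination discussed in the thread.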

Debugged this because I need save-temps (or hack up a binary by hand) to debug something else. The problem is addClangTargetOptions gets called to add target options to clang. With save temps, an early clang invocation runs the preprocessor alone. That gets passed the target specific flags, something like:

"clang-13" "-cc1" "-mllvm" "--amdhsa-code-object-version=3" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-pc-linux-gnu" "-E" "-save-temps=cwd" "-disable-free" "-main-file-name" "firstprivate.c" "-target-cpu" "gfx906" "-fcuda-is-device" "-emit-llvm-bc" "-mlink-builtin-bitcode" "/home/amd/llvm-install/lib/libomptarget-amdgcn-gfx906.bc" "-fno-split-dwarf-inlining" "-debugger-tuning=gdb" "-O2" "-fdebug-compilation-dir" "/home/amd/aomp/aomp/test/smoke/firstprivate" "-ferror-limit" "19" "-fmessage-length=102" "-fopenmp" "-fopenmp-cuda-parallel-target-regions" "-fgnuc-version=4.2.1" "-fopenmp-is-device" "-munsafe-fp-atomics" "-faddrsig" "-o" "firstprivate-openmp-amdgcn-amd-amdhsa.i" "-x" "c" "firstprivate.c"

Deleted some parts of that. The "-E" gives preprocessor behaviour. Things like -O2 or -munsafe-fp-atomics don't matter to the preprocessor but are passed anyway, so there's prior art for it ignoring arguments when called under save-temps. Whichever of -emit-llvm-bc and -E comes last wins, and the target options come after the -E, so currently clang emits (binary) bitcode, which is later interpreted as cpp-output, which can't handle bitcode.
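
The "last one wins" behaviour described above can be sketched as follows. This is a toy model only: `FrontendAction` and `pickAction` are hypothetical names, and the real cc1 option handling covers many more action flags than the two shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The toy model distinguishes only the two flags discussed in the thread.
enum class FrontendAction { Default, PreprocessOnly, EmitBitcode };

// Walk the arguments in order; each action flag overrides the previous
// one, so the flag that appears last decides the outcome.
FrontendAction pickAction(const std::vector<std::string> &Args) {
  FrontendAction Result = FrontendAction::Default;
  for (const std::string &Arg : Args) {
    if (Arg == "-E")
      Result = FrontendAction::PreprocessOnly;
    else if (Arg == "-emit-llvm-bc")
      Result = FrontendAction::EmitBitcode;
  }
  return Result;
}
```

With the ordering from the command line quoted above (`-E` first, target options later), `pickAction` returns `EmitBitcode`, even though the output file is named like preprocessor output.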

With this patch applied, the problem of overriding the E flag is sidestepped. There are alternatives, but this approach looks cleaner than the others I can think of.

Note that it isn't a complete fix for save-temps, but it gets me as far as "multiple definition of `__dummy.omp_offloading.entry'", which I vaguely remember marking as a weak symbol for the aomp toolchain. It lets clang get as far as emitting the device binary I wanted access to, so I'm now very sure this patch works.

JonChesterfield added reviewers: tianshilei1992, ye-luo. · Feb 17 2021, 2:34 PM

JonChesterfield edited the summary of this revision.

What you are telling me here is that -save-temps and -emit-llvm(-bc) can't work together right now. If that is so, why not fix it there instead of introducing an AMDGPU solution? There are other reasons you might want to combine those two options after all.

I don't think there's a general bug here. Maybe save-temps could strip out some arguments that collide with E, or pass E at the end of the list instead of the start, but that seems pretty invasive for a problem that doesn't seem to affect anyone.

In particular, I don't have reason to believe that save-temps and emit-llvm don't work together. Unconditionally passing emit-llvm-bc from addClangTargetOptions doesn't work, but that's not the same thing.

Passing emit-llvm-bc was an amdgcn specific hack because we can't do much with linking object files compiled from assembly, so passing bitcode into lld works much better. This patch changes that to a mostly equivalent amdgcn specific hack, just one that collides with fewer assumptions elsewhere.

edit: Did some testing. -save-temps and -emit-llvm work together fine (emit-llvm is stripped from the preprocessor invocation). -emit-llvm-bc isn't recognised by clang, but if passed as -Xclang -emit-llvm-bc to force matters, it falls over exactly like it does here. 'error source file is not valid UTF-8'.

So -Xclang -emit-llvm-bc -save-temps doesn't work. I don't think that's a serious problem, passing -Xclang already implies sidestepping various convenience features, of which save-temps is one.

An upshot of looking at this is that -Xclang -emit-llvm-bc -save-temps does not work, and nor does -Xclang -emit-llvm -save-temps. However -emit-llvm -save-temps is fine. Opened https://bugs.llvm.org/show_bug.cgi?id=49234 to track.

Skipping the phases still looks right to me, instead of passing -emit-llvm-bc, but I suppose we could stall fixing this until we reach consensus on how to fix 49234. I can carry the patch locally for debugging until then.

pdhaliwal mentioned this in D97273: OpenMP: Fix object clobbering issue when using save-temps. · Feb 23 2021, 5:05 AM

Replacing CC1Args.push_back("-emit-llvm-bc"); with CC1Args.push_back("-emit-llvm"); as suggested on the call does not work. This hook is downstream of the clang driver, so all it does under save-temps is lead to clang -E -emit-llvm, which generates LLVM IR as requested; that output cannot be fed into clang -x cpp-output.

The handling of clang -save-temps that strips out emit-llvm from the preprocessor pass runs before this.

I do not want to change the semantics of emit-llvm, emit-llvm-bc, or save-temps to help out the amdgcn openmp target. I can see that breaking lots of other users of clang.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
192	vaguely interesting that `-emit-llvm` here appears to generate the same code as `-emit-llvm-bc`, though neither compose correctly for `-save-temps`

Something we should probably check is the interaction between -save-temps and whether we are trying to compile a single file or an executable, e.g. the difference between clang and clang -c.

If trying to compile foo.c directly to an executable, -save-temps should probably print (along with lots of others) an assembly file containing the contents of the code object (the shared elf containing amdgcn machine code). If trying to compile only the host part, we probably want to emit asm. If trying to compile only the target part, we probably don't - the llc part is likely to error if the rest of the application is missing.

pdhaliwal mentioned this in rG99951aa68da3: OpenMP: Fix object clobbering issue when using save-temps. · Feb 24 2021, 9:51 PM

So, neither emit-llvm-bc nor emit-llvm works well with save-temps. Therefore, I feel the current approach is still valid. It does not impact nvptx or any other target in any way, and I don't see how it could.

I see the valid concern regarding assembly output. This patch will indeed stop the device assembly from being emitted. I am working on that, which requires adding an extra llc step in AMDGPUOpenMPToolChain.

Add extra llc step to produce assembly in the linker.
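
The resulting device link sequence can be sketched as a list of command lines. This is an illustrative model, not the toolchain code: `Command` and `buildDeviceLinkCommands` are hypothetical names, the llvm-link inputs are omitted, and the file naming is simplified. The tool flags are taken from the CHECK lines in the test on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Command = std::vector<std::string>;

// Model of the device link step: llvm-link, an optional llc run that emits
// textual assembly (only under save-temps), llc to an object file, then lld.
std::vector<Command> buildDeviceLinkCommands(const std::string &Prefix,
                                             const std::string &GPUArch,
                                             bool SaveTemps) {
  const std::string LinkedBC = Prefix + "-linked.bc";
  std::vector<Command> Cmds;
  Cmds.push_back(Command{"llvm-link", "-o", LinkedBC}); // inputs omitted

  // Same llc invocation for both outputs; only file type and suffix differ.
  auto Llc = [&](bool OutputIsAsm) {
    return Command{"llc",
                   LinkedBC,
                   "-mtriple=amdgcn-amd-amdhsa",
                   "-mcpu=" + GPUArch,
                   OutputIsAsm ? "-filetype=asm" : "-filetype=obj",
                   "-o",
                   Prefix + (OutputIsAsm ? ".s" : ".o")};
  };

  if (SaveTemps)
    Cmds.push_back(Llc(/*OutputIsAsm=*/true)); // extra, human-readable output
  Cmds.push_back(Llc(/*OutputIsAsm=*/false));

  Cmds.push_back(Command{"lld", "-flavor", "gnu", "--no-undefined", "-shared",
                         "-o", Prefix + ".out", Prefix + ".o"});
  return Cmds;
}
```

The assembly-producing llc run is a side output: nothing downstream consumes the `.s` file, and lld still links the object produced by the second llc run.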

Strictly speaking I suppose that's not a temporary file, since it isn't used for anything, but whenever I pass -save-temps I am likely to want to read the assembly. This looks like a great workaround to me. @jdoerfert?

Harbormaster completed remote builds in B90821: Diff 326392. · Feb 25 2021, 8:41 AM

I see the need for this to work. I'm unsure whether this is the right way, but we need to make progress.

Agreed. Lack of save-temps is causing grief when debugging, so I keep on applying this patch locally. Let's go with this for now and change to something better when we think of it.

This revision is now accepted and ready to land. · Mar 15 2021, 10:18 AM

LGTM

This revision was landed with ongoing or failed builds. · Mar 15 2021, 9:58 PM

Closed by commit rGfc12a64ecc71: [OpenMP][AMDGPU] Skip backend and assemble phases for amdgcn (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rGfc12a64ecc71: [OpenMP][AMDGPU] Skip backend and assemble phases for amdgcn.

pdhaliwal mentioned this in D101901: [AMDGPU][OpenMP] Fix clang driver crash when provided -c. · May 5 2021, 5:29 AM

pdhaliwal mentioned this in rG1f5cacfcb845: [AMDGPU][OpenMP] Fix clang driver crash when provided -c. · May 5 2021, 7:27 AM

Revision Contents

Path: Size

clang/lib/Driver/Driver.cpp: 10 lines
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp: 6 lines
clang/test/Driver/amdgpu-openmp-toolchain.c: 40 lines

Diff 330879

clang/lib/Driver/Driver.cpp

@@ getDeviceDependences(OffloadAction::DeviceDependences &DA,
           // We passed the device action as a host dependence, so we don't need to
           // do anything else with them.
           OpenMPDeviceActions.clear();
           return ABRT_Success;
         }
 
         // By default, we produce an action for each device arch.
-        for (Action *&A : OpenMPDeviceActions)
+        for (unsigned I = 0; I < ToolChains.size(); ++I) {
+          Action *&A = OpenMPDeviceActions[I];
+          // AMDGPU does not support linking of object files, so we skip
+          // assemble and backend actions to produce LLVM IR.
+          if (ToolChains[I]->getTriple().isAMDGCN() &&
+              (CurPhase == phases::Assemble || CurPhase == phases::Backend))
+            continue;
+
           A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);
+        }
 
         return ABRT_Success;
       }
 
       ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {
 
         // If this is an input action replicate it for each OpenMP toolchain.
         if (auto *IA = dyn_cast<InputAction>(HostAction)) {

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

@@ for (const auto &II : Inputs)
     if (II.isFilename())
       Prefix =
           llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch.str();
   assert(Prefix.length() && "no linker inputs are files ");
 
   // Each command outputs different files.
   const char *LLVMLinkCommand =
       constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);
+
+  // Produce readable assembly if save-temps is enabled.
+  if (C.getDriver().isSaveTempsEnabled())
+    constructLlcCommand(C, JA, Inputs, Args, GPUArch, Prefix, LLVMLinkCommand,
+                        /*OutputIsAsm=*/true);
   const char *LlcCommand = constructLlcCommand(C, JA, Inputs, Args, GPUArch,
                                                Prefix, LLVMLinkCommand);
   constructLldCommand(C, JA, Inputs, Output, Args, LlcCommand);
 }
 
 AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
                                              const llvm::Triple &Triple,
                                              const ToolChain &HostTC,
@@ void AMDGPUOpenMPToolChain::addClangTargetOptions(
   StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
   assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
   assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
          "Only OpenMP offloading kinds are supported.");
 
   CC1Args.push_back("-target-cpu");
   CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));
   CC1Args.push_back("-fcuda-is-device");
-  CC1Args.push_back("-emit-llvm-bc");
 
   if (DriverArgs.hasArg(options::OPT_nogpulib))
     return;
   std::string BitcodeSuffix = "amdgcn-" + GpuArch.str();
   addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, BitcodeSuffix,
                      getTriple());
 }

clang/test/Driver/amdgpu-openmp-toolchain.c

+// REQUIRES: x86-registered-target
 // REQUIRES: amdgpu-registered-target
 // RUN: env LIBRARY_PATH=%S/Inputs/hip_dev_lib %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
 // RUN: | FileCheck %s
 
 // verify the tools invocations
 // CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "c"{{.*}}
 // CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "ir"{{.*}}
-// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device" "-emit-llvm-bc" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx906.bc"{{.*}}
+// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx906.bc"{{.*}}
 // CHECK: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc"
 // CHECK: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
 // CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}.out" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
 // CHECK: clang-offload-wrapper{{.*}}"-target" "x86_64-unknown-linux-gnu" "-o" "{{.*}}a-{{.*}}.bc" {{.*}}amdgpu-openmp-toolchain-{{.*}}.out"
 // CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-o" "{{.*}}a-{{.*}}.o" "-x" "ir" "{{.*}}a-{{.*}}.bc"
 // CHECK: ld{{.*}}"-o" "a.out"{{.*}}"{{.*}}amdgpu-openmp-toolchain-{{.*}}.o" "{{.*}}a-{{.*}}.o" "-lomp" "-lomptarget"
 
 // RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
 // RUN: | FileCheck --check-prefix=CHECK-PHASES %s
 // phases
 // CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)
 // CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
 // CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
 // CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)
 // CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)
 // CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)
 // CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)
 // CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)
 // CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
-// CHECK-PHASES: 9: backend, {8}, assembler, (device-openmp)
-// CHECK-PHASES: 10: assembler, {9}, object, (device-openmp)
-// CHECK-PHASES: 11: linker, {10}, image, (device-openmp)
-// CHECK-PHASES: 12: offload, "device-openmp (amdgcn-amd-amdhsa)" {11}, image
-// CHECK-PHASES: 13: clang-offload-wrapper, {12}, ir, (host-openmp)
-// CHECK-PHASES: 14: backend, {13}, assembler, (host-openmp)
-// CHECK-PHASES: 15: assembler, {14}, object, (host-openmp)
-// CHECK-PHASES: 16: linker, {4, 15}, image, (host-openmp)
+// CHECK-PHASES: 9: linker, {8}, image, (device-openmp)
+// CHECK-PHASES: 10: offload, "device-openmp (amdgcn-amd-amdhsa)" {9}, image
+// CHECK-PHASES: 11: clang-offload-wrapper, {10}, ir, (host-openmp)
+// CHECK-PHASES: 12: backend, {11}, assembler, (host-openmp)
+// CHECK-PHASES: 13: assembler, {12}, object, (host-openmp)
+// CHECK-PHASES: 14: linker, {4, 13}, image, (host-openmp)
 
 // handling of --libomptarget-amdgcn-bc-path
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 --libomptarget-amdgcn-bc-path=%S/Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc %s 2>&1 | FileCheck %s --check-prefix=CHECK-LIBOMPTARGET
-// CHECK-LIBOMPTARGET: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-emit-llvm-bc" "-mlink-builtin-bitcode"{{.*}}Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc"{{.*}}
+// CHECK-LIBOMPTARGET: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}Inputs/hip_dev_lib/libomptarget-amdgcn-gfx803.bc"{{.*}}
 
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-NOGPULIB
-// CHECK-NOGPULIB-NOT: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-emit-llvm-bc" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx803.bc"{{.*}}
+// CHECK-NOGPULIB-NOT: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx803" "-fcuda-is-device" "-mlink-builtin-bitcode"{{.*}}libomptarget-amdgcn-gfx803.bc"{{.*}}
+
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -save-temps -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx803 -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-PRINT-BINDINGS
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.*]]"],
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "clang",{{.*}} output: "[[HOST_BC:.*]]"
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]"], output: "[[HOST_S:.*]]"
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "clang::as", inputs: ["[[HOST_S]]"], output: "[[HOST_O:.*]]"
+// CHECK-PRINT-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]"], output: "[[DEVICE_I:.*]]"
+// CHECK-PRINT-BINDINGS: "amdgcn-amd-amdhsa" - "clang", inputs: ["[[DEVICE_I]]", "[[HOST_BC]]"], output: "[[DEVICE_BC:.*]]"
+// CHECK-PRINT-BINDINGS: "amdgcn-amd-amdhsa" - "AMDGCN::OpenMPLinker", inputs: ["[[DEVICE_BC]]"], output: "[[DEVICE_OUT:.*]]"
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "offload wrapper", inputs: ["[[DEVICE_OUT]]"], output: "[[OFFLOAD_WRAPPER:.*]]"
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[OFFLOAD_WRAPPER]]"], output: "[[OFFLOAD_S:.*]]"
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "clang::as", inputs: ["[[OFFLOAD_S]]"], output: "[[OFFLOAD_O:.*]]"
+// CHECK-PRINT-BINDINGS: "x86_64-unknown-linux-gnu" - "GNU::Linker", inputs: ["[[HOST_O]]", "[[OFFLOAD_O]]"], output:
+
+// verify the llc is invoked for textual assembly output
+// RUN: env LIBRARY_PATH=%S/Inputs/hip_dev_lib %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 -save-temps %s 2>&1 \
+// RUN: | FileCheck %s --check-prefix=CHECK-SAVE-ASM
+// CHECK-SAVE-ASM: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=asm" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906.s"
+// CHECK-SAVE-ASM: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906.o"