This is an archive of the discontinued LLVM Phabricator instance.


Differential D150998

[OpenMP] Fix using the target ID when using the new driver
Closed · Public

Authored by jhuber6 on May 19 2023, 1:54 PM.


Details

Reviewers

JonChesterfield
carlo.bertolli
yaxunl
saiislam
jdoerfert

Commits

rGdc81d2a4d5b3: [OpenMP] Fix using the target ID when using the new driver

Summary

AMDGPU sometimes uses its own format for the offloading architecture,
called the target ID. It merges the architecture name and its feature
attributes into a single string (for example, gfx90a:xnack+). Previously,
we passed this string as the canonical architecture name. This caused the
linker wrapper to fail to find relevant libraries and then pass an invalid
CPU name. This patch changes the offload packager to pass the canonical
architecture name and to extract the features from the target ID separately.
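
As a rough illustration of the split described in the summary, the sketch below separates a target ID into the processor name and a feature list in the sign-first form used by `-mattr`. It uses only the standard library; `TargetIDParts` and `splitTargetID` are hypothetical names for this example and are not the helpers clang uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for illustration only; not a clang type.
struct TargetIDParts {
  std::string Processor;             // e.g. "gfx90a"
  std::vector<std::string> Features; // e.g. {"-sramecc", "+xnack"}
};

// Splits "gfx90a:sramecc-:xnack+" into the processor and its features.
// Each feature is rewritten from "name+" / "name-" to "+name" / "-name".
TargetIDParts splitTargetID(const std::string &TargetID) {
  TargetIDParts Parts;
  std::string::size_type Pos = TargetID.find(':');
  Parts.Processor = TargetID.substr(0, Pos);
  while (Pos != std::string::npos) {
    std::string::size_type Next = TargetID.find(':', Pos + 1);
    std::string::size_type Len =
        Next == std::string::npos ? std::string::npos : Next - Pos - 1;
    std::string Feature = TargetID.substr(Pos + 1, Len);
    // A usable feature needs at least a one-character name plus a sign.
    if (Feature.size() >= 2)
      Parts.Features.push_back(Feature.back() +
                               Feature.substr(0, Feature.size() - 1));
    Pos = Next;
  }
  return Parts;
}
```

Calling `splitTargetID("gfx90a:sramecc-:xnack+")` yields the processor `gfx90a` and the features `-sramecc` and `+xnack`; a plain `gfx803` yields an empty feature list.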

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. May 19 2023, 1:54 PM

Herald added a project: Restricted Project. May 19 2023, 1:54 PM

Herald added subscribers: sunshaoce, kerbowa, guansong and 2 others.

jhuber6 requested review of this revision. May 19 2023, 1:54 PM

Herald added a reviewer: jdoerfert. May 19 2023, 1:54 PM

Herald added a project: Restricted Project.

Herald added subscribers: cfe-commits, jplehr, sstefan1, MaskRay.

Harbormaster completed remote builds in B233293: Diff 523924. May 19 2023, 6:01 PM

saiislam added inline comments. May 22 2023, 7:29 AM

clang/lib/Driver/ToolChains/Clang.cpp
8419–8424	Maybe use `parseTargetIDWithFormatCheckingOnly()`?
8431	Shouldn't Arch (the target ID here) be passed along instead of just the processor? For example, `gfx90a:xnack+` and `gfx90a:xnack-` should be treated differently.

jhuber6 added inline comments. May 22 2023, 7:33 AM

clang/lib/Driver/ToolChains/Clang.cpp
8419–8424	I tried that but it didn't return the strings in the format required by `llc` for the `-mattr` list.
8431	So the problem there is that this will cause us to no longer link in something like the OpenMP runtime library, since `gfx90a` != `gfx90a:xnack+`. Right now the behavior is that we will link them both together since the architecture matches, but then the attributes will get resolved the same way we handle `-mattr=+x,-x`. I'm not sure what the expected behaviour is here.

yaxunl added inline comments. May 23 2023, 7:35 AM

clang/lib/Driver/ToolChains/Clang.cpp
8431	targetID is part of the ROCm ABI, as it is returned as part of Isa::GetIsaName (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/rocm-5.5.x/src/core/runtime/isa.cpp#L98). The compatibility rule for targetID is specified by https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id. For example, a bundle entry with gfx90a can be consumed by a device with GetIsaName gfx90a:xnack+ or gfx90a:xnack-, but a bundle entry with gfx90a:xnack+ can only be consumed by a device with GetIsaName gfx90a:xnack+. The language runtime is supposed to do a compatibility check of the bundle entry against the device GetIsaName. Isa::IsCompatible (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/3b939c398bdac0c2b9a860ff9a0ed0be0c80f911/src/core/runtime/isa.cpp#L73) can be used to do that. For convenience, the language runtime is expected to use targetID for identifying bundle entries instead of reconstructing targetID from features when needed. targetID is also used for compatibility checks when linking bitcode.
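
To make the rule in this comment concrete, here is a minimal sketch of a compatibility check covering only the two cases stated above: an entry that leaves a feature unspecified matches either device setting, and an entry that sets a feature requires the device to report the same setting. It is not the ROCR-Runtime `Isa::IsCompatible` implementation; `ParsedID`, `parseTargetID`, and `entryRunsOnDevice` are hypothetical names, and the full rules are in the ClangOffloadBundler documentation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical parsed form of a target ID, for illustration only.
struct ParsedID {
  std::string Processor;
  std::map<std::string, char> Settings; // feature name -> '+' or '-'
};

ParsedID parseTargetID(const std::string &ID) {
  ParsedID Out;
  std::string::size_type Pos = ID.find(':');
  Out.Processor = ID.substr(0, Pos);
  while (Pos != std::string::npos) {
    std::string::size_type Next = ID.find(':', Pos + 1);
    std::string::size_type Len =
        Next == std::string::npos ? std::string::npos : Next - Pos - 1;
    std::string Feature = ID.substr(Pos + 1, Len);
    if (Feature.size() >= 2)
      Out.Settings[Feature.substr(0, Feature.size() - 1)] = Feature.back();
    Pos = Next;
  }
  return Out;
}

// Assumed rule: processors must match, and every feature the entry sets
// explicitly must be reported with the same sign by the device.
bool entryRunsOnDevice(const std::string &EntryID,
                       const std::string &DeviceID) {
  ParsedID Entry = parseTargetID(EntryID);
  ParsedID Device = parseTargetID(DeviceID);
  if (Entry.Processor != Device.Processor)
    return false;
  for (const auto &KV : Entry.Settings) {
    auto It = Device.Settings.find(KV.first);
    if (It == Device.Settings.end() || It->second != KV.second)
      return false;
  }
  return true;
}
```

Under these assumptions, `entryRunsOnDevice("gfx90a", "gfx90a:xnack-")` is true, while `entryRunsOnDevice("gfx90a:xnack+", "gfx90a:xnack-")` is false.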

jhuber6 added inline comments. May 23 2023, 7:42 AM

clang/lib/Driver/ToolChains/Clang.cpp
8431	So what we need is some more sophisticated logic in the linker wrapper to merge the binaries according to these rules. However the handling will definitely require pulling this apart when we send it to LTO.

saiislam added inline comments. May 23 2023, 7:54 AM

clang/lib/Driver/ToolChains/Clang.cpp
8431	Some logic is given in ClangOffloadBundler and in the AMDGPU plugin.

yaxunl added inline comments. May 23 2023, 7:56 AM

clang/lib/Driver/ToolChains/Clang.cpp
8431	targetID is only used to identify a bundle entry. It is not directly represented in LLVM IR. In LLVM IR, the target cpu does not contain the target ID feature string. Therefore lld does not know about it. The linker wrapper is responsible for doing the compatibility check based on the bundle ID.

In D150998#4403359, @jhuber6 wrote:

Can we use this approach for now and land this? It makes the "new driver" less broken than it currently is, as it supports target ID compilation in the general case. Fixing the merging rules will be a rather large overhaul, so I'd like this to work in the meantime.

This patch allows --offload-arch=gfx90a:xnack+ to work. It does not fix the case where the user also links in a library that was built with --offload-arch=gfx90a:xnack-.

In D150998#4403401, @yaxunl wrote:

Can we add a test to make sure --offload-arch=gfx90a:xnack+ and --offload-arch=gfx90a:xnack- work together? It is a very common use case for HIP.

In D150998#4403416, @jhuber6 wrote:

With the current patch, they would both be linked together and it would probably set the xnack value to the last one that showed up in the link list. E.g.

clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-new-driver -fgpu-rdc
clang -xhip b.c --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang --offload-link a.o b.o

would result in a.o and b.o getting linked together with xnack- set as the backend attribute.

In D150998#4403444, @yaxunl wrote:

What happens to

clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang -xhip b.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang --offload-link a.o b.o

Basically, gfx90a:xnack+ and gfx90a:xnack- need to be treated as distinct GPU architectures, and the fat binary should contain different code objects for them.

In this logic they would both map to gfx90a but be given distinct images and then be linked together into a single one with gfx90a metadata. This is the part that I'm saying should probably be a follow-up fix to handle this correctly in the linker wrapper.
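
The "last one wins" outcome discussed in this thread can be sketched as follows. This is only a model of the behaviour the thread describes as probable, under the assumption that the feature lists of all images sharing the canonical architecture are concatenated in link order and that a later setting overrides an earlier one; `resolveLastWins` is a hypothetical name and this is not the linker wrapper's code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Resolves a list of "+name" / "-name" features so that a later entry for
// the same name overrides an earlier one. Names keep first-seen order.
std::vector<std::string>
resolveLastWins(const std::vector<std::string> &Features) {
  std::vector<std::string> Order;   // feature names in first-seen order
  std::map<std::string, char> Sign; // feature name -> latest sign
  for (const std::string &F : Features) {
    if (F.size() < 2 || (F[0] != '+' && F[0] != '-'))
      continue; // skip anything that is not a signed feature
    std::string Name = F.substr(1);
    if (Sign.find(Name) == Sign.end())
      Order.push_back(Name);
    Sign[Name] = F[0];
  }
  std::vector<std::string> Resolved;
  for (const std::string &Name : Order)
    Resolved.push_back(Sign[Name] + Name);
  return Resolved;
}
```

With a.o contributing `+xnack` and b.o contributing `-xnack`, the input `{"+xnack", "-xnack"}` resolves to `-xnack` alone, which is why the two target IDs are not kept distinct under the current patch.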

LGTM. Thanks.

This revision is now accepted and ready to land. Jun 7 2023, 10:20 AM

Closed by commit rGdc81d2a4d5b3: [OpenMP] Fix using the target ID when using the new driver (authored by jhuber6). Jun 7 2023, 5:17 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGdc81d2a4d5b3: [OpenMP] Fix using the target ID when using the new driver.

Revision Contents

Path	Size
clang/lib/Driver/ToolChains/Clang.cpp	16 lines
clang/test/Driver/amdgpu-openmp-toolchain.c	4 lines

Diff 529476

clang/lib/Driver/ToolChains/Clang.cpp


@@ void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA,
   // Create the inputs to bundle the needed metadata.
   for (const InputInfo &Input : Inputs) {
     const Action *OffloadAction = Input.getAction();
     const ToolChain *TC = OffloadAction->getOffloadingToolChain();
     const ArgList &TCArgs =
         C.getArgsForToolChain(TC, OffloadAction->getOffloadingArch(),
                               OffloadAction->getOffloadingDeviceKind());
     StringRef File = C.getArgs().MakeArgString(TC->getInputFilename(Input));
-    StringRef Arch = (OffloadAction->getOffloadingArch())
+    StringRef Arch = OffloadAction->getOffloadingArch()
                          ? OffloadAction->getOffloadingArch()
                          : TCArgs.getLastArgValue(options::OPT_march_EQ);
     StringRef Kind =
         Action::GetOffloadKindName(OffloadAction->getOffloadingDeviceKind());

     ArgStringList Features;
     SmallVector<StringRef> FeatureArgs;
     getTargetFeatures(TC->getDriver(), TC->getTriple(), TCArgs, Features,
                       false);
     llvm::copy_if(Features, std::back_inserter(FeatureArgs),
                   [](StringRef Arg) { return !Arg.startswith("-target"); });

+    if (TC->getTriple().isAMDGPU()) {
+      for (StringRef Feature : llvm::split(Arch.split(':').second, ':')) {
+        FeatureArgs.emplace_back(
+            Args.MakeArgString(Feature.take_back() + Feature.drop_back()));
+      }
+    }
+
+    // TODO: We need to pass in the full target-id and handle it properly in the
+    // linker wrapper.
     SmallVector<std::string> Parts{
         "file=" + File.str(),
         "triple=" + TC->getTripleString(),
-        "arch=" + Arch.str(),
+        "arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(),
         "kind=" + Kind.str(),
     };

-    if (TC->getDriver().isUsingLTO(/* IsOffload */ true))
+    if (TC->getDriver().isUsingLTO(/* IsOffload */ true) ||
+        TC->getTriple().isAMDGPU())
       for (StringRef Feature : FeatureArgs)
         Parts.emplace_back("feature=" + Feature.str());

     CmdArgs.push_back(Args.MakeArgString("--image=" + llvm::join(Parts, ",")));
   }

   C.addCommand(std::make_unique<Command>(
       JA, *this, ResponseFileSupport::None(),

clang/test/Driver/amdgpu-openmp-toolchain.c

 // RUN: --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | \
 // RUN: FileCheck %s --check-prefix=CHECK-LIB-DEVICE
 // CHECK-LIB-DEVICE: "-cc1" {{.*}}ocml.bc"{{.*}}ockl.bc"{{.*}}oclc_daz_opt_on.bc"{{.*}}oclc_unsafe_math_off.bc"{{.*}}oclc_finite_only_off.bc"{{.*}}oclc_correctly_rounded_sqrt_on.bc"{{.*}}oclc_wavefrontsize64_on.bc"{{.*}}oclc_isa_version_803.bc"

 // RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx803 -nogpulib \
 // RUN: --rocm-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode -fopenmp-new-driver %s 2>&1 | \
 // RUN: FileCheck %s --check-prefix=CHECK-LIB-DEVICE-NOGPULIB
 // CHECK-LIB-DEVICE-NOGPULIB-NOT: "-cc1" {{.*}}ocml.bc"{{.*}}ockl.bc"{{.*}}oclc_daz_opt_on.bc"{{.*}}oclc_unsafe_math_off.bc"{{.*}}oclc_finite_only_off.bc"{{.*}}oclc_correctly_rounded_sqrt_on.bc"{{.*}}oclc_wavefrontsize64_on.bc"{{.*}}oclc_isa_version_803.bc"
+
+// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:sramecc-:xnack+ \
+// RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
+// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack
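
For readers who want to see how the checked string comes together, the sketch below assembles an `--image=` argument from the same key=value parts the patch builds (`file`, `triple`, `arch`, `kind`, then one `feature` entry per feature). It uses only the standard library rather than LLVM's `llvm::join`; `makeImageArg` is a hypothetical name, and the file name and triple in the usage note are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Joins the image metadata into one comma-separated --image= argument,
// mirroring the order of the Parts vector in the patch.
std::string makeImageArg(const std::string &File, const std::string &Triple,
                         const std::string &Arch, const std::string &Kind,
                         const std::vector<std::string> &Features) {
  std::string Arg = "--image=file=" + File + ",triple=" + Triple +
                    ",arch=" + Arch + ",kind=" + Kind;
  for (const std::string &Feature : Features)
    Arg += ",feature=" + Feature;
  return Arg;
}
```

For example, `makeImageArg("a.o", "amdgcn-amd-amdhsa", "gfx90a", "openmp", {"-sramecc", "+xnack"})` produces a string ending in `arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack`, the tail that CHECK-TARGET-ID matches.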