This is an archive of the discontinued LLVM Phabricator instance.


Differential D126398

[Clang] Introduce `--offload-link` option to perform offload device linking
ClosedPublic

Authored by jhuber6 on May 25 2022, 10:30 AM.


Details

Reviewers

MaskRay
jdoerfert
yaxunl
tra

Commits

rGb7c8c4d8cf07: [Clang] Introduce `--offload-link` option to perform offload device linking

Summary

The new driver uses an augmented linker wrapper to perform the device
linking phase, but to the user looks like a regular linker invocation.
Contrary to the old driver, the new driver contains all the information
necessary to produce a linked device image in the host object itself.
Currently, we infer the usage of the device linker by the user
specifying an offloading toolchain, e.g. (--offload-arch=...) or
(-fopenmp-targets=...), but this shouldn't be strictly necessary.
This patch introduces a new option --offload-link to tell
the driver to use the offloading linker instead. So a compilation flow
can now look like this,

clang foo.cu --offload-new-driver -fgpu-rdc --offload-arch=sm_70 -c
clang foo.o --offload-link -lcudart

I was considering whether this could be merged into the -fuse-ld option,
but because the device linker wraps the user's linker it would
conflict with that. In the future it's possible to merge this into lld
completely, or into gold via a plugin, and we would use this option to
enable the device linking feature. Let me know what you think of this.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. May 25 2022, 10:30 AM

Herald added a project: Restricted Project. May 25 2022, 10:30 AM

Herald added a subscriber: StephenFan.

jhuber6 requested review of this revision. May 25 2022, 10:30 AM

Herald added subscribers: cfe-commits, sstefan1.

Harbormaster completed remote builds in B166309: Diff 432042. May 25 2022, 11:07 AM

Currently, we infer the usage of the device linker by the user
specifying an offloading toolchain, e.g. (--offload-arch=...) or
(-fopenmp-targets=...), but this shouldn't be strictly necessary.

Yup. Whether we want to perform device link or not is orthogonal to those options.

This patch introduces a new option -dl to tell the driver to use the
offloading linker instead. So a compilation flow can now look like this,

clang foo.cu --offload-new-driver -fgpu-rdc --offload-arch=sm_70 -c
clang foo.o -dl -lcudart

It's an essential feature, as we do want to be able to produce libraries with host object files, but with fully linked GPU executables.
Naming, as usual, is hard. I would prefer a more explicit --offload-link which would be in line with other --offload* options we have by now.
-dl is cryptic for the uninitiated and is uncomfortably close to the commonly used -ldl. If it gets mistyped as -ld, it would lead to a legitimate but unrelated error about a missing libd. Or it might silently succeed in linking with libd without actually doing any device linking.

I was considering if this could be merged into the -fuse-ld option,
but because the device linker wraps over the user's linker it would
conflict with that. In the future it's possible to merge this into lld
completely or gold via a plugin. Let me know what you think for this.

In D126398#3537942, @tra wrote:

Naming, as usual, is hard. I would prefer a more explicit --offload-link which would be in line with other --offload* options we have by now.
-dl is cryptic for the uninitiated and is uncomfortably close to the commonly used -ldl. If it gets mistyped as -ld, it would lead to a legitimate but unrelated error about a missing libd. Or it might silently succeed in linking with libd without actually doing any device linking.

Yeah, I can see your point. --offload-link definitely works, but it would be nice to have something less verbose. Maybe we could just use -dlink or something.

I was considering if this could be merged into the -fuse-ld option,
but because the device linker wraps over the user's linker it would
conflict with that. In the future it's possible to merge this into lld
completely or gold via a plugin. Let me know what you think for this.

I should also add, even if we built support for this into LLD or Gold, we'll probably still need a flag like this to tell Gold to use the plugin, or LLD to do the extra processing.

Unrelated, but in the future I'm also considering making the linker wrapper add the necessary libraries for whatever offloading kinds it found. E.g. if the link job finds embedded OpenMP code it will add -lomp -lomptarget if not already present.

jhuber6 retitled this revision from [Clang] Introduce `-dl` option to perform offload device linking to [Clang] Introduce `-dlink` option to perform offload device linking. May 25 2022, 11:39 AM

jhuber6 edited the summary of this revision.

Changing to use --offload-link and use -dlink as an alias.

tra added inline comments. May 25 2022, 11:53 AM

clang/include/clang/Driver/Options.td
825–826	We typically use option aliases to provide compatibility with legacy options. AFAICT there are no current uses of `Alias` for the sake of saving a few characters in an option name. Is `-dlink` really needed? It's not going to be typed manually all that often, and a slightly longer option does not make any difference for cmake or make files. I assume that partial motivation for `-dlink` is that it's a shortened alias used by NVCC for its functionally similar --device-link option. I do not think it buys us anything. We never intended to be option-compatible with nvcc, and clang's CUDA compilation and relevant options have only partial overlap with NVCC's functionality-wise and almost none syntax-wise. Adding one rarely used option for the sake of matching NVCC's is not worth it, IMO.

jhuber6 added inline comments. May 25 2022, 11:59 AM

clang/include/clang/Driver/Options.td
825–826	Yeah, it's somewhat mental compatibility with Nvidia parlance, see `-fgpu-rdc` and `-rdc=true`, if you think that's not necessary I'll just make it `--offload-link`.

jhuber6 retitled this revision from [Clang] Introduce `-dlink` option to perform offload device linking to [Clang] Introduce `--offload-link` option to perform offload device linking. May 25 2022, 12:00 PM

jhuber6 edited the summary of this revision.

Removing -dlink

tra accepted this revision. May 25 2022, 12:04 PM

This revision is now accepted and ready to land. May 25 2022, 12:04 PM

Harbormaster completed remote builds in B166329: Diff 432071. May 25 2022, 1:14 PM

Closed by commit rGb7c8c4d8cf07: [Clang] Introduce `--offload-link` option to perform offload device linking (authored by jhuber6). May 25 2022, 1:31 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGb7c8c4d8cf07: [Clang] Introduce `--offload-link` option to perform offload device linking.

Revision Contents

Path                                        Size
clang/docs/ClangCommandLineReference.rst    4 lines
clang/include/clang/Driver/Options.td       2 lines
clang/lib/Driver/Driver.cpp                 3 lines
clang/test/Driver/cuda-openmp-driver.cu     5 lines

Diff 432071

clang/docs/ClangCommandLineReference.rst

 .. option:: -Ttext<addr>

 Set starting address of TEXT to <addr>

 .. option:: -Wl,<arg>,<arg2>...

 Pass the comma separated arguments in <arg> to the linker

+.. option:: --offload-link
+
+Use the linker supporting offloading device linking.
+
 .. option:: -X

 .. option:: -Xlinker <arg>, --for-linker <arg>, --for-linker=<arg>

 Pass <arg> to the linker

 .. option:: -Xoffload-linker <arg>, -Xoffload-linker-<triple> <arg>

clang/include/clang/Driver/Options.td


 def Xopenmp_target : Separate<["-"], "Xopenmp-target">, Group<CompileOnly_Group>,
   HelpText<"Pass <arg> to the target offloading toolchain.">, MetaVarName<"<arg>">;
 def Xopenmp_target_EQ : JoinedAndSeparate<["-"], "Xopenmp-target=">, Group<CompileOnly_Group>,
   HelpText<"Pass <arg> to the target offloading toolchain identified by <triple>.">,
   MetaVarName<"<triple> <arg>">;
 def z : Separate<["-"], "z">, Flags<[LinkerInput, RenderAsInput]>,
   HelpText<"Pass -z <arg> to the linker">, MetaVarName<"<arg>">,
   Group<Link_Group>;
+def offload_link : Flag<["--"], "offload-link">, Group<Link_Group>,
+  HelpText<"Use the new offloading linker to perform the link job.">;
 def Xlinker : Separate<["-"], "Xlinker">, Flags<[LinkerInput, RenderAsInput]>,
   HelpText<"Pass <arg> to the linker">, MetaVarName<"<arg>">,
   Group<Link_Group>;
 def Xoffload_linker : JoinedAndSeparate<["-"], "Xoffload-linker">,
   HelpText<"Pass <arg> to the offload linkers or the ones idenfied by -<triple>">,
   MetaVarName<"<triple> <arg>">, Group<Link_Group>;
 def Xpreprocessor : Separate<["-"], "Xpreprocessor">, Group<Preprocessor_Group>,
   HelpText<"Pass <arg> to the preprocessor">, MetaVarName<"<arg>">;
 def X_Flag : Flag<["-"], "X">, Group<Link_Group>;
 def X_Joined : Joined<["-"], "X">, IgnoredGCCCompat;

clang/lib/Driver/Driver.cpp

 void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
   if (!LinkerInputs.empty()) {
     if (!UseNewOffloadingDriver)
       if (Action *Wrapper = OffloadBuilder.makeHostLinkAction())
         LinkerInputs.push_back(Wrapper);
     Action *LA;
     // Check if this Linker Job should emit a static library.
     if (ShouldEmitStaticLibrary(Args)) {
       LA = C.MakeAction<StaticLibJobAction>(LinkerInputs, types::TY_Image);
-    } else if (UseNewOffloadingDriver) {
+    } else if (UseNewOffloadingDriver ||
+               Args.hasArg(options::OPT_offload_link)) {
       LA = C.MakeAction<LinkerWrapperJobAction>(LinkerInputs, types::TY_Image);
       LA->propagateHostOffloadInfo(C.getActiveOffloadKinds(),
                                    /*BoundArch=*/nullptr);
     } else {
       LA = C.MakeAction<LinkJobAction>(LinkerInputs, types::TY_Image);
     }
     if (!UseNewOffloadingDriver)
       LA = OffloadBuilder.processHostLinkAction(LA);

clang/test/Driver/cuda-openmp-driver.cu

 // RUN:   | FileCheck -check-prefix BINDINGS-DEVICE %s

 // BINDINGS-DEVICE: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX:.+]]"
 // BINDINGS-DEVICE: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX]]"], output: "[[CUBIN:.+]]"
 // BINDINGS-DEVICE: # "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN]]", "[[PTX]]"], output: "{{.*}}.fatbin"

 // RUN: %clang -### -target x86_64-linux-gnu -nocudalib --cuda-feature=+ptx61 --offload-arch=sm_70 %s 2>&1 | FileCheck -check-prefix MANUAL-FEATURE %s
 // MANUAL-FEATURE: -cc1{{.*}}-target-feature{{.*}}+ptx61
+
+// RUN: %clang -### -target x86_64-linux-gnu -nocudalib -ccc-print-bindings --offload-link %s 2>&1 \
+// RUN: | FileCheck -check-prefix DEVICE-LINK %s
+
+// DEVICE-LINK: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[INPUT:.+]]"], output: "a.out"
