Details

Reviewers

yaxunl
jhuber6

Commits

rGbe8a65b598b3: [HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc

Summary

Provide control over clang job / action creation. This feature provides the phase pipeline for an upcoming COMGR action: AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_RELOCATABLE

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jrbyrnes created this revision. · Jun 23 2023, 3:00 PM

Herald added a project: Restricted Project. · Jun 23 2023, 3:00 PM

jrbyrnes requested review of this revision. · Jun 23 2023, 3:00 PM

Herald added subscribers: cfe-commits, MaskRay.

jrbyrnes edited the summary of this revision. · Jun 23 2023, 3:01 PM

Formatting

Harbormaster completed remote builds in B240860: Diff 534086. · Jun 23 2023, 4:01 PM

yaxunl added inline comments. · Jun 23 2023, 4:02 PM

clang/test/Driver/hip-phases.hip
259	Probably just use // RELOC-NOT: linker. Same for below. Also, we need a test for the -fgpu-rdc case.

Fix tests + add tests. Add phase test for -fgpu-rdc --no-gpu-link-output (these are not intended to be used together)

arsenm added a reviewer: jhuber6. · Jun 26 2023, 2:20 PM

What's the difference here between this and the existing --hip-link?

In D153667#4450517, @jhuber6 wrote:

What's the difference here between this and the existing --hip-link?

Hi @jhuber6

The commit is poorly named; the main purpose is to introduce -no-gpu-link-output.

We want a way to produce a relocatable from source. In terms of the Driver, this means building actions and jobs for phases up to phases::Assemble. -no-gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.

AFAICT, -hip-link allows for linking of offload-bundles / linking through the HIPAMD toolchain, so it is conceptually different. We can get (somewhat) close to what we want with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs, which is what we are trying to avoid here.
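
To make the "NFCI" remark concrete, here is a minimal sketch of how a paired on/off flag is usually resolved: the last spelling on the command line wins, and a default applies when neither is present. This is a standalone model loosely patterned on the driver's Args.hasFlag(Pos, Neg, Default) call; the free function hasFlag below and its string-based argument list are assumptions made for illustration, not clang code.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical model of a paired driver flag. The last
// occurrence of either spelling wins; defaultValue is returned when
// neither spelling appears in args.
inline bool hasFlag(const std::vector<std::string> &args,
                    const std::string &pos, const std::string &neg,
                    bool defaultValue) {
  bool value = defaultValue;
  for (const std::string &a : args) {
    if (a == pos)
      value = true;
    else if (a == neg)
      value = false;
  }
  return value;
}

// With a default of true, passing only the positive spelling changes
// nothing, which is what "NFCI" describes for -gpu-link-output here.
inline bool shouldLink(const std::vector<std::string> &args) {
  return hasFlag(args, "-gpu-link-output", "-no-gpu-link-output", true);
}
```

Under this model, only the negative spelling alters the pipeline; everything else keeps the existing link behavior.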

In D153667#4450705, @jrbyrnes wrote:

Hi @jhuber6

The commit is poorly named, the main purpose is to introduce -no-gpu-link-output.

We want a way to produce a relocatable from source. In terms of the Driver, this means building actions and jobs for phases up to phases::Assemble. -no-gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.

AFAICT, -hip-link allows for linking of offload-bundles, so it is conceptually different. We can get (somewhat) close to what we want with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs, which is what we are trying to avoid here.

So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.

Harbormaster completed remote builds in B241292: Diff 534725. · Jun 26 2023, 5:16 PM

In D153667#4450724, @jhuber6 wrote:

So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.

For -fno-gpu-rdc case we do not use lto. Since -fno-gpu-rdc has one TU only, we use the non-lto backend to get relocatable object, and use lld for relocatable to shared object. This patch allows us to stop at the relocatable object since comgr needs that.
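
The pipeline described above can be sketched as a small standalone model: the device phases for a single-TU (-fno-gpu-rdc) compile, with the link step dropped when a relocatable is requested. The Phase enum and the devicePhases and phaseName helpers are hypothetical names invented for this illustration; the real driver uses clang::driver::phases and builds actions rather than a plain list.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the driver's phase list.
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

// Device-side phases for a single-TU (-fno-gpu-rdc) compile. When
// emitRelocatable is set, the list stops after Assemble, so the result
// is a relocatable object and no link step is scheduled.
inline std::vector<Phase> devicePhases(bool emitRelocatable) {
  std::vector<Phase> phases = {Phase::Preprocess, Phase::Compile,
                               Phase::Backend, Phase::Assemble};
  if (!emitRelocatable)
    phases.push_back(Phase::Link); // lld: relocatable -> shared object
  return phases;
}

// Names follow the spelling printed by -ccc-print-phases in the tests.
inline std::string phaseName(Phase p) {
  switch (p) {
  case Phase::Preprocess:
    return "preprocessor";
  case Phase::Compile:
    return "compiler";
  case Phase::Backend:
    return "backend";
  case Phase::Assemble:
    return "assembler";
  case Phase::Link:
    return "linker";
  }
  return "unknown";
}
```

In this model, devicePhases(true) ends at the assembler, which is the point where comgr would take over the relocatable.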

In D153667#4455943, @yaxunl wrote:

For -fno-gpu-rdc case we do not use lto. Since -fno-gpu-rdc has one TU only, we use the non-lto backend to get relocatable object, and use lld for relocatable to shared object. This patch allows us to stop at the relocatable object since comgr needs that.

I see, so conceptually this is like -Xarch-device -c (if such a thing worked)?

I am thinking we probably want to rename the option as -fhip-emit-relocatable and limit it to be used with -cuda-device-only and -fno-gpu-rdc only.

In D153667#4455998, @jhuber6 wrote:

I see, so conceptually this is like -Xarch-device -c (if such a thing worked)?

It is like that

In D153667#4450724, @jhuber6 wrote:

So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.

The whole point of this work is to give hiprtc a way to compile-to-bitcode and optimize sources in a single step, to make (user-passed) flag handling less weird. Since the intent of LTO is to defer this optimization step, I would assume any way we try to use it here would not be correct.

Naming + -cuda-device-only and -fno-gpu-rdc only

jrbyrnes retitled this revision from [HIP]: Add gpu-link-output to control link job creation to [HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc. · Jun 28 2023, 10:09 AM

yaxunl added inline comments. · Jun 28 2023, 10:57 AM

clang/lib/Driver/Driver.cpp
3337–3339	There are data members Relocatable and CompileDeviceOnly in the base class. You can use them instead of using local variables.
3344	Need to emit diag::err_opt_not_valid_with_opt for -fgpu-rdc, and diag::err_opt_not_valid_without_opt if it is not device-only.
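
As a rough illustration of the two checks requested here, the following standalone sketch returns the error text for each rejected combination. The HipOptions struct and the makeOptions and validate helpers are hypothetical and exist only for this example; the message wording is taken from the driver tests in this revision's diff, and the real code emits the diagnostics through C.getDriver().Diag(...).

```cpp
#include <string>
#include <vector>

// Hypothetical flattened view of the driver state the checks need.
struct HipOptions {
  bool emitRelocatable; // -fhip-emit-relocatable
  bool gpuRdc;          // -fgpu-rdc
  bool deviceOnly;      // --cuda-device-only
};

inline HipOptions makeOptions(bool emitRelocatable, bool gpuRdc,
                              bool deviceOnly) {
  HipOptions o;
  o.emitRelocatable = emitRelocatable;
  o.gpuRdc = gpuRdc;
  o.deviceOnly = deviceOnly;
  return o;
}

// Returns the diagnostics the option combination would trigger. An empty
// vector means the combination is accepted.
inline std::vector<std::string> validate(const HipOptions &o) {
  std::vector<std::string> errors;
  if (!o.emitRelocatable)
    return errors; // nothing to check when the option is off
  if (o.gpuRdc)
    errors.push_back("option '-fhip-emit-relocatable' cannot be specified "
                     "with '-fgpu-rdc'");
  if (!o.deviceOnly)
    errors.push_back("option '-fhip-emit-relocatable' cannot be specified "
                     "without '--cuda-device-only'");
  return errors;
}
```

Both checks run independently, so a command line that violates both constraints reports two errors rather than stopping at the first.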

Harbormaster completed remote builds in B241836: Diff 535456. · Jun 28 2023, 11:18 AM

Use member variables + add diagnostic + tests

Harbormaster completed remote builds in B241860: Diff 535484. · Jun 28 2023, 1:18 PM

yaxunl added inline comments. · Jun 28 2023, 1:19 PM

clang/lib/Driver/Driver.cpp
3334–3336	probably needs to be moved to ctor of CudaActionBuilderBase since they are needed by both Cuda and HIP action builders.

Address Comment

clang/lib/Driver/Driver.cpp
3334–3336	Thanks

Harbormaster completed remote builds in B241886: Diff 535519. · Jun 28 2023, 4:19 PM

LGTM. Thanks

This revision is now accepted and ready to land. · Jun 28 2023, 6:58 PM

Closed by commit rGbe8a65b598b3: [HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc (authored by jrbyrnes). · Jun 29 2023, 8:19 AM

This revision was automatically updated to reflect the committed changes.

jrbyrnes added a commit: rGbe8a65b598b3: [HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc.

Diff 535519

clang/include/clang/Driver/Options.td

@@ def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
            "__cyg_profile_func_enter and __cyg_profile_func_exit">;
 def fgpu_sanitize : Flag<["-"], "fgpu-sanitize">, Group<f_Group>,
   HelpText<"Enable sanitizer for AMDGPU target">;
 def fno_gpu_sanitize : Flag<["-"], "fno-gpu-sanitize">, Group<f_Group>;
 def gpu_bundle_output : Flag<["--"], "gpu-bundle-output">,
   Group<f_Group>, HelpText<"Bundle output files of HIP device compilation">;
 def no_gpu_bundle_output : Flag<["--"], "no-gpu-bundle-output">,
   Group<f_Group>, HelpText<"Do not bundle output files of HIP device compilation">;
+def fhip_emit_relocatable : Flag<["-"], "fhip-emit-relocatable">, Group<f_Group>,
+  HelpText<"Compile HIP source to relocatable">;
+def fno_hip_emit_relocatable : Flag<["-"], "fno-hip-emit-relocatable">, Group<f_Group>,
+  HelpText<"Do not override toolchain to compile HIP source to relocatable">;
 def cuid_EQ : Joined<["-"], "cuid=">, Flags<[CC1Option]>,
   HelpText<"An ID for compilation unit, which should be the same for the same "
            "compilation unit but different for different compilation units. "
            "It is used to externalize device-side static variables for single "
            "source offloading languages CUDA and HIP so that they can be "
            "accessed by the host code of the same compilation unit.">,
   MarshallingInfoString<LangOpts<"CUID">>;
 def fuse_cuid_EQ : Joined<["-"], "fuse-cuid=">,

clang/lib/Driver/Driver.cpp

@@ protected:

     /// Compilation unit ID specified by option '-cuid='.
     StringRef FixedCUID;

   public:
     CudaActionBuilderBase(Compilation &C, DerivedArgList &Args,
                           const Driver::InputList &Inputs,
                           Action::OffloadKind OFKind)
-        : DeviceActionBuilder(C, Args, Inputs, OFKind) {}
+        : DeviceActionBuilder(C, Args, Inputs, OFKind) {
+
+      CompileDeviceOnly = C.getDriver().offloadDeviceOnly();
+      Relocatable = Args.hasFlag(options::OPT_fgpu_rdc,
+                                 options::OPT_fno_gpu_rdc, /*Default=*/false);
+    }

     ActionBuilderReturnCode addDeviceDependences(Action *HostAction) override {
       // While generating code for CUDA, we only depend on the host input action
       // to trigger the creation of all the CUDA device actions.

       // If we are dealing with an input action, replicate it for each GPU
       // architecture. If we are in host-only mode we return 'success' so that
       // the host uses the CUDA offload kind.
@@ bool initialize() override {
           !C.hasOffloadToolChain<Action::OFK_Cuda>())
         return false;

       // We don't need to support HIP.
       if (AssociatedOffloadKind == Action::OFK_HIP &&
           !C.hasOffloadToolChain<Action::OFK_HIP>())
         return false;

-      Relocatable = Args.hasFlag(options::OPT_fgpu_rdc,
-                                 options::OPT_fno_gpu_rdc, /*Default=*/false);
-
       const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
       assert(HostTC && "No toolchain for host compilation.");
       if (HostTC->getTriple().isNVPTX() ||
           HostTC->getTriple().getArch() == llvm::Triple::amdgcn) {
         // We do not support targeting NVPTX/AMDGCN for host compilation. Throw
         // an error and abort pipeline construction early so we don't trip
         // asserts that assume device-side compilation.
         C.getDriver().Diag(diag::err_drv_cuda_host_arch)
             << HostTC->getTriple().getArchName();
         return true;
       }

       ToolChains.push_back(
           AssociatedOffloadKind == Action::OFK_Cuda
               ? C.getSingleOffloadToolChain<Action::OFK_Cuda>()
               : C.getSingleOffloadToolChain<Action::OFK_HIP>());

       CompileHostOnly = C.getDriver().offloadHostOnly();
-      CompileDeviceOnly = C.getDriver().offloadDeviceOnly();
       EmitLLVM = Args.getLastArg(options::OPT_emit_llvm);
       EmitAsm = Args.getLastArg(options::OPT_S);
       FixedCUID = Args.getLastArgValue(options::OPT_cuid_EQ);
       if (Arg *A = Args.getLastArg(options::OPT_fuse_cuid_EQ)) {
         StringRef UseCUIDStr = A->getValue();
         UseCUID = llvm::StringSwitch<UseCUIDKind>(UseCUIDStr)
                       .Case("hash", CUID_Hash)
                       .Case("random", CUID_Random)
@@ class HIPActionBuilder final : public CudaActionBuilderBase {
     /// The linker inputs obtained for each device arch.
     SmallVector<ActionList, 8> DeviceLinkerInputs;
     // The default bundling behavior depends on the type of output, therefore
     // BundleOutput needs to be tri-value: None, true, or false.
     // Bundle code objects except --no-gpu-output is specified for device
     // only compilation. Bundle other type of output files only if
     // --gpu-bundle-output is specified for device only compilation.
     std::optional<bool> BundleOutput;
+    std::optional<bool> EmitReloc;

   public:
     HIPActionBuilder(Compilation &C, DerivedArgList &Args,
                      const Driver::InputList &Inputs)
         : CudaActionBuilderBase(C, Args, Inputs, Action::OFK_HIP) {

       DefaultCudaArch = CudaArch::GFX906;

+      if (Args.hasArg(options::OPT_fhip_emit_relocatable,
+                      options::OPT_fno_hip_emit_relocatable)) {
+        EmitReloc = Args.hasFlag(options::OPT_fhip_emit_relocatable,
+                                 options::OPT_fno_hip_emit_relocatable, false);
+
+        if (*EmitReloc) {
+          if (Relocatable) {
+            C.getDriver().Diag(diag::err_opt_not_valid_with_opt)
+                << "-fhip-emit-relocatable"
+                << "-fgpu-rdc";
+          }
+
+          if (!CompileDeviceOnly) {
+            C.getDriver().Diag(diag::err_opt_not_valid_without_opt)
+                << "-fhip-emit-relocatable"
+                << "--cuda-device-only";
+          }
+        }
+      }
+
       if (Args.hasArg(options::OPT_gpu_bundle_output,
                       options::OPT_no_gpu_bundle_output))
         BundleOutput = Args.hasFlag(options::OPT_gpu_bundle_output,
-                                    options::OPT_no_gpu_bundle_output, true);
+                                    options::OPT_no_gpu_bundle_output, true) &&
+                       (!EmitReloc || !*EmitReloc);
     }

     bool canUseBundlerUnbundler() const override { return true; }

     StringRef getCanonicalOffloadArch(StringRef IdStr) override {
       llvm::StringMap<bool> Features;
       // getHIPOffloadTargetTriple() is known to return valid value as it has
       // been called successfully in the CreateOffloadingDeviceToolChains().
@@ getDeviceDependences(OffloadAction::DeviceDependences &DA,
         return ABRT_Success;

       assert(((CurPhase == phases::Link && Relocatable) ||
               CudaDeviceActions.size() == GpuArchList.size()) &&
              "Expecting one action per GPU architecture.");
       assert(!CompileHostOnly &&
              "Not expecting HIP actions in host-only compilation.");

+      bool ShouldLink = !EmitReloc || !*EmitReloc;
+
       if (!Relocatable && CurPhase == phases::Backend && !EmitLLVM &&
-          !EmitAsm) {
+          !EmitAsm && ShouldLink) {
         // If we are in backend phase, we attempt to generate the fat binary.
         // We compile each arch to IR and use a link action to generate code
         // object containing ISA. Then we use a special "link" action to create
         // a fat binary containing all the code objects for different GPU's.
         // The fat binary is then an input to the host action.
         for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
           if (C.getDriver().isUsingLTO(/*IsOffload=*/true)) {
             // When LTO is enabled, skip the backend and assemble phases and
@@ getDeviceDependences(OffloadAction::DeviceDependences &DA,

           // Remove the CUDA actions as they are already connected to an host
           // action or fat binary.
           CudaDeviceActions.clear();
         }

         return CompileDeviceOnly ? ABRT_Ignore_Host : ABRT_Success;
       } else if (CurPhase == phases::Link) {
+        if (!ShouldLink)
+          return ABRT_Success;
         // Save CudaDeviceActions to DeviceLinkerInputs for each GPU subarch.
         // This happens to each device action originated from each input file.
         // Later on, device actions in DeviceLinkerInputs are used to create
         // device link actions in appendLinkDependences and the created device
         // link actions are passed to the offload action as device dependence.
         DeviceLinkerInputs.resize(CudaDeviceActions.size());
         auto LI = DeviceLinkerInputs.begin();
         for (auto *A : CudaDeviceActions) {
@@ getDeviceDependences(OffloadAction::DeviceDependences &DA,
           CudaDeviceActions[I] = C.MakeAction<OffloadAction>(
               DDep, CudaDeviceActions[I]->getType());
         }
         CudaFatBinary =
             C.MakeAction<OffloadBundlingJobAction>(CudaDeviceActions);
         CudaDeviceActions.clear();
       }

-      return (CompileDeviceOnly && CurPhase == FinalPhase) ? ABRT_Ignore_Host
+      return (CompileDeviceOnly &&
+              (CurPhase == FinalPhase ||
+               (!ShouldLink && CurPhase == phases::Assemble)))
+                 ? ABRT_Ignore_Host
                  : ABRT_Success;
     }

     void appendLinkDeviceActions(ActionList &AL) override {
       if (DeviceLinkerInputs.size() == 0)
         return;

       assert(DeviceLinkerInputs.size() == GpuArchList.size() &&
              "Linker inputs and GPU arch list sizes do not match.");
@@ addDeviceDependencesToHostAction(Action *HostAction, const Arg *InputArg,
     auto &OffloadKind = InputArgToOffloadKindMap[InputArg];
     unsigned InactiveBuilders = 0u;
     unsigned IgnoringBuilders = 0u;
     for (auto *SB : SpecializedBuilders) {
       if (!SB->isValid()) {
         ++InactiveBuilders;
         continue;
       }

       auto RetCode =
           SB->getDeviceDependences(DDeps, CurPhase, FinalPhase, Phases);

       // If the builder explicitly says the host action should be ignored,
       // we need to increment the variable that tracks the builders that request
       // the host object to be ignored.
       if (RetCode == DeviceActionBuilder::ABRT_Ignore_Host)
         ++IgnoringBuilders;

clang/test/Driver/hip-dependent-options.hip

This file was added.

+// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
+// RUN: -c -fhip-emit-relocatable -nogpuinc -nogpulib --cuda-device-only -fgpu-rdc \
+// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
+// RUN: %S/Inputs/hip_multiple_inputs/b.hip --gpu-bundle-output \
+// RUN: 2>&1 | FileCheck -check-prefixes=RELOCRDC %s
+
+// RELOCRDC: error: option '-fhip-emit-relocatable' cannot be specified with '-fgpu-rdc'
+
+// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
+// RUN: -c -fhip-emit-relocatable -nogpuinc -nogpulib \
+// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
+// RUN: %S/Inputs/hip_multiple_inputs/b.hip --gpu-bundle-output \
+// RUN: 2>&1 | FileCheck -check-prefixes=RELOCHOST %s
+
+// RELOCHOST: error: option '-fhip-emit-relocatable' cannot be specified without '--cuda-device-only'

clang/test/Driver/hip-device-compile.hip

 // Output unbundled assembly.
 // RUN: %clang -c -S --cuda-device-only -### --target=x86_64-linux-gnu \
 // RUN: -o a.s -x hip --cuda-gpu-arch=gfx900 --no-gpu-bundle-output \
 // RUN: --hip-device-lib=lib1.bc \
 // RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \
 // RUN: %S/Inputs/hip_multiple_inputs/a.cu \
 // RUN: 2>&1 | FileCheck -check-prefixes=CHECK,ASM,NBUN %s

+// Output relocatable.
+// RUN: %clang -c --cuda-device-only -### --target=x86_64-linux-gnu \
+// RUN: -o a.o -x hip --cuda-gpu-arch=gfx900 -fhip-emit-relocatable \
+// RUN: --hip-device-lib=lib1.bc \
+// RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \
+// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
+// RUN: 2>&1 | FileCheck -check-prefixes=CHECK,NBUN,RELOC %s
+
 // Output bundled assembly.
 // RUN: %clang -c -S --cuda-device-only -### --target=x86_64-linux-gnu \
 // RUN: -o a.s -x hip --cuda-gpu-arch=gfx900 --no-gpu-bundle-output \
 // RUN: --hip-device-lib=lib1.bc \
 // RUN: --hip-device-lib-path=%S/Inputs/hip_multiple_inputs/lib1 \
 // RUN: %S/Inputs/hip_multiple_inputs/a.cu --gpu-bundle-output \
 // RUN: 2>&1 | FileCheck -check-prefixes=CHECK,ASMBUN %s

 // CHECK: {{".*clang.*"}} "-cc1" "-triple" "amdgcn-amd-amdhsa"
 // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // BC-SAME: "-emit-llvm-bc"
 // LL-SAME: "-emit-llvm"
 // ASM-NOT: "-emit-llvm"
 // CHECK-SAME: "-main-file-name" "a.cu"
 // CHECK-SAME: "-fcuda-is-device"
 // CHECK-SAME: {{".*lib1.bc"}}
 // CHECK-SAME: "-target-cpu" "gfx900"
 // BC-SAME: "-o" "a.bc"
 // BCBUN-SAME: "-o" "{{.*}}.bc"
 // LL-SAME: "-o" "a.ll"
 // LLBUN-SAME: "-o" "{{.*}}.ll"
 // ASM-SAME: "-o" "a.s"
 // ASMBUN-SAME: "-o" "{{.*}}.s"
+// RELOC-SAME: "-o" "a.o"
 // CHECK-SAME: {{".*a.cu"}}

 // CHECK-NOT: {{"*.llvm-link"}}
 // CHECK-NOT: {{".*opt"}}
 // CHECK-NOT: {{".*llc"}}
 // CHECK-NOT: {{".*lld.*"}}
 // NBUN-NOT: {{".*clang-offload-bundler"}}
 // BCBUN: {{".*clang-offload-bundler"}}{{.*}}"-output=a.bc"

clang/test/Driver/hip-phases.hip

	Show First 20 Lines • Show All 239 Lines • ▼ Show 20 Lines
 // DASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
 // DASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
 // DASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
 // DASM-DAG: [[P4:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P3]]}, assembler
 // DASM-NOT: clang-offload-bundler
 // DASM-NOT: host

 //
+// Test single gpu architecture with compile to relocatable in device-only
+// compilation mode.
+//
+// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
+// RUN: --cuda-gpu-arch=gfx803 %s --cuda-device-only -fhip-emit-relocatable 2>&1 \
+// RUN: | FileCheck -check-prefixes=RELOC %s
+// RELOC-DAG: [[P0:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
+// RELOC-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
+// RELOC-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
+// RELOC-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
+// RELOC-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (device-[[T]], [[ARCH]])
+// RELOC-NOT: linker
Inline comment (yaxunl, marked Done): probably just use // RELOC-NOT: linker. Same for below. Also, we need a test for the -fgpu-rdc case.
+// RELOC-DAG: [[P5:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P4]]}, object
+
+//
+// Test two gpu architectures with compile to relocatable in device-only
+// compilation mode.
+//
+// RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
+// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s --cuda-device-only -fhip-emit-relocatable 2>&1 \
+// RUN: | FileCheck -check-prefixes=RELOC2 %s
+// RELOC2-DAG: [[P0:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
+// RELOC2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
+// RELOC2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
+// RELOC2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
+// RELOC2-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (device-[[T]], [[ARCH]])
+// RELOC2-NOT: [[P5:[0-9]+]]: linker, {[[P4]]}, image, (device-[[T]], [[ARCH]])
+// RELOC2-DAG: [[P5:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH]])" {[[P4]]}, object
+// RELOC2-DAG: [[P6:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T:hip]], (device-[[T]], [[ARCH2:gfx900]])
+// RELOC2-DAG: [[P7:[0-9]+]]: preprocessor, {[[P6]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH2]])
+// RELOC2-DAG: [[P8:[0-9]+]]: compiler, {[[P7]]}, ir, (device-[[T]], [[ARCH2]])
+// RELOC2-DAG: [[P9:[0-9]+]]: backend, {[[P8]]}, assembler, (device-[[T]], [[ARCH2]])
+// RELOC2-DAG: [[P10:[0-9]+]]: assembler, {[[P9]]}, object, (device-[[T]], [[ARCH2]])
+// RELOC2-NOT: linker
+// RELOC2-DAG: [[P11:[0-9]+]]: offload, "device-[[T]] (amdgcn-amd-amdhsa:[[ARCH2]])" {[[P10]]}, object
+
+//
 // Test two gpu architectures with complete compilation in device-only
 // compilation mode.
 //
 // RUN: %clang -x hip --target=x86_64-unknown-linux-gnu -ccc-print-phases \
 // RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s --cuda-device-only \
 // RUN: 2>&1 | FileCheck -check-prefixes=DBIN2 %s
 // DBIN2-DAG: [[P0:[0-9]+]]: input, "{{.*}}hip-phases.hip", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
 // DBIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])

clang/test/Driver/hip-rdc-device-only.hip


 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
 // RUN: -c -emit-llvm -nogpuinc -nogpulib --cuda-device-only -fgpu-rdc \
 // RUN: %S/Inputs/hip_multiple_inputs/a.cu \
 // RUN: %S/Inputs/hip_multiple_inputs/b.hip --gpu-bundle-output \
 // RUN: 2>&1 | FileCheck -check-prefixes=COMMON,EMITBC %s

+// With `-fno-hip-emit-relocatable`, the output should be the same as the aforementioned line
+// as `-fgpu-rdc` in HIP implies `-fno-hip-emit-relocatable`.
+
+// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
+// RUN: -c -fno-hip-emit-relocatable -nogpuinc -nogpulib --cuda-device-only -fgpu-rdc \
+// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
+// RUN: %S/Inputs/hip_multiple_inputs/b.hip --gpu-bundle-output \
+// RUN: 2>&1 | FileCheck -check-prefixes=COMMON,EMITBC %s
+
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
 // RUN: -S -nogpuinc -nogpulib --cuda-device-only -fgpu-rdc \
 // RUN: %S/Inputs/hip_multiple_inputs/a.cu \
 // RUN: %S/Inputs/hip_multiple_inputs/b.hip --gpu-bundle-output \
 // RUN: 2>&1 | FileCheck -check-prefixes=COMMON,EMITLL %s

 // With `-emit-llvm`, the output should be the same as the aforementioned line

This is an archive of the discontinued LLVM Phabricator instance.

[HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc
Closed, Public

Revision Contents

Diff 535519

clang/include/clang/Driver/Options.td

clang/lib/Driver/Driver.cpp

clang/test/Driver/hip-dependent-options.hip

clang/test/Driver/hip-device-compile.hip

clang/test/Driver/hip-phases.hip

clang/test/Driver/hip-rdc-device-only.hip
