This is an archive of the discontinued LLVM Phabricator instance.

Differential D156928

[Clang][AMDGPU] Fix handling of -mcode-object-version=none arg
AbandonedPublic

Authored by saiislam on Aug 2 2023, 11:30 AM.

Details

Reviewers

jhuber6
yaxunl
JonChesterfield

Summary

-mcode-object-version=none is a special argument which allows
abi-agnostic code to be generated for device runtime libraries.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision. Aug 2 2023, 11:30 AM

Herald added a project: Restricted Project. Aug 2 2023, 11:30 AM

Herald added subscribers: tpr, dstuttard, kzhuravl.

saiislam requested review of this revision. Aug 2 2023, 11:30 AM

Herald added a project: Restricted Project. Aug 2 2023, 11:30 AM

Herald added subscribers: cfe-commits, MaskRay, wdng.

saiislam added a reviewer: JonChesterfield. Aug 2 2023, 11:31 AM

missing tests

clang/lib/Driver/ToolChains/Clang.cpp
1066	don't need to go through std::string? stick with Twine everywhere?
clang/lib/Driver/ToolChains/CommonArgs.cpp
2309	missing space after if, also return on separate line. Also why starts with, and not ==?
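
To illustrate the starts-with versus == question raised above, here is a minimal standalone sketch. It uses plain std::string rather than llvm::StringRef, and both helper names are hypothetical and not part of the patch. It only shows that a prefix test accepts any value that merely begins with "none", while an exact comparison does not.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefix match, analogous to the starts_with("none")
// check in the diff.
bool isNoneByPrefix(const std::string &Value) {
  const std::string Prefix = "none";
  return Value.compare(0, Prefix.size(), Prefix) == 0;
}

// Hypothetical helper: exact match, as the review comment suggests.
bool isNoneExact(const std::string &Value) { return Value == "none"; }

// Usage: the two checks agree on "none" and "5" but differ on "nonexistent".
void compareNoneChecks() {
  assert(isNoneByPrefix("none") && isNoneExact("none"));
  assert(isNoneByPrefix("nonexistent") && !isNoneExact("nonexistent"));
  assert(!isNoneByPrefix("5") && !isNoneExact("5"));
}
```

With the prefix form, a mistyped value such as "nonexistent" would presumably be treated like "none" instead of being diagnosed.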

jhuber6 added inline comments. Aug 2 2023, 11:46 AM

clang/include/clang/Basic/TargetOptions.h
90	Typically we just put a `COV_LAST` to indicate that it's over the accepted enumerations.
clang/lib/Driver/ToolChain.cpp
1364	Is this flag not in the `m` group? It should be caught here right?
clang/lib/Driver/ToolChains/Clang.cpp
1058	Use clang-format.
1066	You shouldn't assign to a Twine, but in general I think we should probably put this ternary in-line with the other stuff to avoid the temporary. The handling here is a little confusing, we do Args.getLastArg(options::OPT_mcode_object_version_EQ); Which expects a number, if it's not present we get an empty string which default converts to zero which we then convert into "none"?
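
The two suggestions above (a trailing sentinel enumerator, and not relying on a failed parse silently becoming zero and then "none") can be sketched together in standalone C++. This is an illustration under assumptions, not the patch: the enum is a hypothetical mirror of CodeObjectVersionKind with COV_LAST in place of COV_MAX, parseCodeObjectVersion and toCC1Value are invented names, and std::string stands in for the LLVM string types.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the enum in the diff, with a trailing sentinel
// (COV_LAST) instead of a named maximum.
enum CodeObjectVersionKind {
  COV_None = 0,
  COV_2 = 200,
  COV_3 = 300,
  COV_4 = 400,
  COV_5 = 500,
  COV_LAST // one past the last accepted value
};

// Parses the option value. "none" is matched explicitly; anything else must be
// a single decimal digit in the accepted range. Returns false on invalid input
// and leaves Out untouched.
bool parseCodeObjectVersion(const std::string &Value,
                            CodeObjectVersionKind &Out) {
  if (Value == "none") {
    Out = COV_None;
    return true;
  }
  if (Value.size() != 1 || Value[0] < '0' || Value[0] > '9')
    return false;
  unsigned Scaled = static_cast<unsigned>(Value[0] - '0') * 100;
  if (Scaled < static_cast<unsigned>(COV_2) ||
      Scaled >= static_cast<unsigned>(COV_LAST))
    return false;
  Out = static_cast<CodeObjectVersionKind>(Scaled);
  return true;
}

// Renders the value forwarded to cc1: "none" or the version number.
std::string toCC1Value(CodeObjectVersionKind Kind) {
  assert(Kind < COV_LAST && "the sentinel is not a real version");
  return Kind == COV_None ? std::string("none") : std::to_string(Kind / 100);
}
```

Because "none" is handled before any integer parsing, an unparsable value is rejected rather than defaulting to zero, and the string is built in one expression without an intermediate Twine variable.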

-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with clang driver if users link with device library. Device library can still use it by using it with -Xclang.

In D156928#4555121, @yaxunl wrote:

-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with clang driver if users link with device library. Device library can still use it by using it with -Xclang.

If the intended use is the deviceRTL then that should be sufficient.

Harbormaster completed remote builds in B249848: Diff 546560. Aug 2 2023, 3:21 PM

In D156928#4555121, @yaxunl wrote:

-mcode-object-version=none was intentionally designed to work with clang -cc1 only, since it does not work with clang driver if users link with device library. Device library can still use it by using it with -Xclang.

Thanks for the tip, @yaxunl. I will abandon this revision and use -Xclang to pass cov_none to the deviceRTL.

saiislam abandoned this revision. Aug 4 2023, 11:45 AM

What does code objects version= none mean?

In D156928#4561811, @JonChesterfield wrote:

What does code objects version= none mean?

Handle any version

In D156928#4561849, @arsenm wrote:

Handle any version

So... That should be the default, right? Emit IR that the back end specialises. Or, ideally, the only behaviour as far as the front end is concerned.

In D156928#4561890, @JonChesterfield wrote:

So... That should be the default, right? Emit IR that the back end specialises. Or, ideally, the only behaviour as far as the front end is concerned.

Code in the device library depends on a control variable about the code object version. Specifying the code object version in Clang allows internalizing that variable and optimizing code depending on it as early as possible. Not specifying it with Clang will require an LLVM pass in amdgpu backend to define that control variable after linking and it has to have an external linkage. This may lose optimization. Also, you need a way to not specify it in FE but specify it in BE. This just complicates things without much benefits.
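
As a rough host-side analogy for the linkage argument above (not device-library code), the sketch below contrasts a control value with internal linkage and a known constant value against one with external linkage that can change. All names are hypothetical. In the first case the compiler can fold the branch within the translation unit; in the second it has to load the value, because another module could redefine or modify it.

```cpp
#include <cassert>

namespace {
// Internal linkage and a compile-time constant: the optimizer can resolve
// any branch on this value early.
constexpr int InternalAbiVersion = 500;
} // namespace

// Hypothetical library routine: picks the version 5 path if requested.
bool usesV5PathInternal() { return InternalAbiVersion >= 500; }

// External linkage and mutable: in a real build the definition would live in
// another module (or be supplied later), so the value is only known after
// linking. It is defined here only to keep the sketch self-contained.
int ExternalAbiVersion = 500;

bool usesV5PathExternal() { return ExternalAbiVersion >= 500; }

// Usage: the external variant tracks whatever value is set later.
void demoControlVariable() {
  assert(usesV5PathInternal());
  assert(usesV5PathExternal());
  ExternalAbiVersion = 400;
  assert(!usesV5PathExternal());
  ExternalAbiVersion = 500; // restore the original value
}
```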

In D156928#4562023, @yaxunl wrote:

Code in the device library depends on a control variable about the code object version. Specifying the code object version in Clang allows internalizing that variable and optimizing code depending on it as early as possible. Not specifying it with Clang will require an LLVM pass in amdgpu backend to define that control variable after linking and it has to have an external linkage. This may lose optimization. Also, you need a way to not specify it in FE but specify it in BE. This just complicates things without much benefits.

On second thoughts, this may inspire us about eliminating not just the code object control variable but all device library control variables.

Basically in Clang we can emit a module flag about required control variables and do not link the device libs that implement these control variables.

Then we add an LLVM pass at the very beginning of the optimization pipeline which checks that module flag and defines those control variables with internal linkage. This way, we should be able to get rid of those control variable device libs without losing performance.

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

In D156928#4562239, @JonChesterfield wrote:

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

Some control variables are per-module. Clang cannot emit control variables that have different values for different modules. Intrinsics should work since it can take an argument as its value.

Tangent below. Main thrust is this =none feature should be called "default", not none, and be the default, and there should be no feature called ABI=none.

In D156928#4562412, @yaxunl wrote:

Some control variables are per-module. Clang cannot emit control variables that have different values for different modules. Intrinsics should work since it can take an argument as its value.

Sure it can. In the most limited case, we could exactly replace the variables imported from source IR in the device libs based on command line variables with the same globals with the same value and attributes. That would be NFC except it would no longer matter if the devicertl source was present.

There's just also other better things to be done as well, like not burning ABI decisions in the front end, or not using magic variables at all. But it's definitely not necessary to write constants in IR files that encode the same information as the command line flag. That's a whole indirect no-op that doesn't need to be there.

In D156928#4562239, @JonChesterfield wrote:

Or, the front end could define those objects directly, without importing IR files that define the objects with the content clang used to choose the object file. E.g. instead of the argument daz=off (spelled differently) finding a file called daz.off.ll that defines variable called daz with a value 0, that argument could define that variable. I think @jhuber6 has a partial patch trying to do that.

If we were more ambitious, we could use intrinsics that are folded reliably at O0 instead of magic variables that hopefully get constant folded. That would kill a bunch of O0 bugs.

In general though, splicing magic variables in the front end seems unlikely to be performance critical relative to splicing them in at the start of the backend.

I think @saiislam is working on a patch that will handle that. We'll have clang emit some global that OpenMP uses.

In D156928#4565506, @jhuber6 wrote:

I think @saiislam is working on a patch that will handle that. We'll have clang emit some global that OpenMP uses.

Thanks Joseph.
Yes, I have abandoned this patch and am using the -Xclang -mcode-object-version=none option in the patch to enable cov5 support for OpenMP.

Revision Contents

Path                                          Size
clang/include/clang/Basic/TargetOptions.h     2 lines
clang/lib/Driver/ToolChain.cpp                3 lines
clang/lib/Driver/ToolChains/Clang.cpp         8 lines
clang/lib/Driver/ToolChains/CommonArgs.cpp    23 lines

Diff 546560

clang/include/clang/Basic/TargetOptions.h

@@ public:
   /// \brief Enumeration value for AMDGPU code object version, which is the
   /// code object version times 100.
   enum CodeObjectVersionKind {
     COV_None,
     COV_2 = 200,
     COV_3 = 300,
     COV_4 = 400,
     COV_5 = 500,
+    COV_Default = 400,
+    COV_MAX = 500
     Inline comment (jhuber6, not done): Typically we just put a `COV_LAST` to indicate that it's over the accepted enumerations.
   };
   /// \brief Code object version for AMDGPU.
   CodeObjectVersionKind CodeObjectVersion = CodeObjectVersionKind::COV_None;

   /// \brief Enumeration values for AMDGPU printf lowering scheme
   enum class AMDGPUPrintfKind {
     /// printf lowering scheme involving hostcalls, currently used by HIP
     /// programs by default

clang/lib/Driver/ToolChain.cpp

 llvm::opt::DerivedArgList *ToolChain::TranslateOpenMPTargetArgs(
     const llvm::opt::DerivedArgList &Args, bool SameTripleAsHost,
     SmallVectorImpl<llvm::opt::Arg *> &AllocatedArgs) const {
   DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
   const OptTable &Opts = getDriver().getOpts();
   bool Modified = false;

   // Handle -Xopenmp-target flags
   for (auto *A : Args) {
+    if (A->getOption().matches(options::OPT_mcode_object_version_EQ))
+    DAL->append(A);
+
     // Exclude flags which may only apply to the host toolchain.
     // Do not exclude flags when the host triple (AuxTriple)
     // matches the current toolchain triple. If it is not present
     // at all, target and host share a toolchain.
     if (A->getOption().matches(options::OPT_m_Group)) {
     Inline comment (jhuber6, not done): Is this flag not in the `m` group? It should be caught here right?
       if (SameTripleAsHost)
         DAL->append(A);
       else
         Modified = true;
       continue;
     }

     unsigned Index;

clang/lib/Driver/ToolChains/Clang.cpp

@@ static void handleAMDGPUCodeObjectVersionOptions(const Driver &D,
                                                  bool IsCC1As = false) {
   // If no version was requested by the user, use the default value from the
   // back end. This is consistent with the value returned from
   // getAMDGPUCodeObjectVersion. This lets clang emit IR for amdgpu without
   // requiring the corresponding llvm to have the AMDGPU target enabled,
   // provided the user (e.g. front end tests) can use the default.
   if (haveAMDGPUCodeObjectVersionArgument(D, Args)) {
     unsigned CodeObjVer = getAMDGPUCodeObjectVersion(D, Args);
+    if(CodeObjVer != 0) {
     Inline comment (jhuber6, not done): Use clang-format.
     CmdArgs.insert(CmdArgs.begin() + 1,
                    Args.MakeArgString(Twine("--amdhsa-code-object-version=") +
                                       Twine(CodeObjVer)));
     CmdArgs.insert(CmdArgs.begin() + 1, "-mllvm");
+    }
     // -cc1as does not accept -mcode-object-version option.
-    if (!IsCC1As)
+    if (!IsCC1As) {
+    std::string CodeObjVerStr = (CodeObjVer ? Twine(CodeObjVer) : "none").str();
     Inline comment (arsenm, done): don't need to go through std::string? stick with Twine everywhere?
     Inline comment (jhuber6, not done): You shouldn't assign to a Twine, but in general I think we should probably put this ternary in-line with the other stuff to avoid the temporary. The handling here is a little confusing, we do Args.getLastArg(options::OPT_mcode_object_version_EQ); Which expects a number, if it's not present we get an empty string which default converts to zero which we then convert into "none"?
       CmdArgs.insert(CmdArgs.begin() + 1,
                      Args.MakeArgString(Twine("-mcode-object-version=") +
-                                        Twine(CodeObjVer)));
+                                        CodeObjVerStr));
+    }
   }
 }

 void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
                                     const Driver &D, const ArgList &Args,
                                     ArgStringList &CmdArgs,
                                     const InputInfo &Output,
                                     const InputInfoList &Inputs) const {

clang/lib/Driver/ToolChains/CommonArgs.cpp

 #include "Arch/VE.h"
 #include "Arch/X86.h"
 #include "HIPAMD.h"
 #include "Hexagon.h"
 #include "MSP430.h"
 #include "clang/Basic/CharInfo.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/ObjCRuntime.h"
+#include "clang/Basic/TargetOptions.h"
 #include "clang/Basic/Version.h"
 #include "clang/Config/config.h"
 #include "clang/Driver/Action.h"
 #include "clang/Driver/Compilation.h"
 #include "clang/Driver/Driver.h"
 #include "clang/Driver/DriverDiagnostic.h"
 #include "clang/Driver/InputInfo.h"
 #include "clang/Driver/Job.h"
 ...

 static llvm::opt::Arg *
 getAMDGPUCodeObjectArgument(const Driver &D, const llvm::opt::ArgList &Args) {
   return Args.getLastArg(options::OPT_mcode_object_version_EQ);
 }

 void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
                                          const llvm::opt::ArgList &Args) {
-  const unsigned MinCodeObjVer = 2;
-  const unsigned MaxCodeObjVer = 5;
-
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args)) {
     if (CodeObjArg->getOption().getID() ==
         options::OPT_mcode_object_version_EQ) {
-      unsigned CodeObjVer = MaxCodeObjVer;
-      auto Remnant =
-          StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
-      if (Remnant || CodeObjVer < MinCodeObjVer || CodeObjVer > MaxCodeObjVer)
+      unsigned CodeObjVer = TargetOptions::COV_Default / 100;
+      auto CovStr = StringRef(CodeObjArg->getValue());
+      if(CovStr.starts_with("none")) return;
       Inline comment (arsenm, done): missing space after if, also return on separate line. Also why starts with, and not ==?
+      CovStr.getAsInteger(0, CodeObjVer);
+      if (CodeObjVer < TargetOptions::COV_None || CodeObjVer > TargetOptions::COV_MAX)
         D.Diag(diag::err_drv_invalid_int_value)
             << CodeObjArg->getAsString(Args) << CodeObjArg->getValue();
     }
   }
 }

 unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                            const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
-  if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
-    StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
+  unsigned CodeObjVer = TargetOptions::COV_Default / 100; // default
+  if (haveAMDGPUCodeObjectVersionArgument(D, Args)) {
+  auto CodeObjArg = StringRef(getAMDGPUCodeObjectArgument(D, Args)->getValue());
+  if(CodeObjArg.starts_with("none")) return TargetOptions::COV_None;
+  CodeObjArg.getAsInteger(0, CodeObjVer);
+  }
   return CodeObjVer;
 }

 bool tools::haveAMDGPUCodeObjectVersionArgument(
     const Driver &D, const llvm::opt::ArgList &Args) {
   return getAMDGPUCodeObjectArgument(D, Args) != nullptr;
 }