This is an archive of the discontinued LLVM Phabricator instance.


Differential D158247

[CUDA][HIP] Fix overloading resolution in global variable initializer
Closed · Public

Authored by yaxunl on Aug 17 2023, 9:22 PM.


Details

Reviewers

tra
rjmccall
rsmith
aaron.ballman

Commits

rGde0df639724b: [CUDA][HIP] Fix overloading resolution in global variable initializer

Summary

Currently, clang does not correctly resolve certain overloaded functions called in the initializer
of a global variable, e.g.

template<typename T1, typename U>
T1 mypow(T1, U);

__attribute__((device)) double mypow(double, int);

double t_extent = mypow(1.0, 2);

In the example above, mypow should resolve to the host (template) version,
but clang resolves it to the device version instead and emits an error
(https://godbolt.org/z/17xxzaa67).

However, if the variable is assigned in a host function, there is no error.
The discrepancy in overload resolution inside and outside of a function
arises because clang does not account for the host/device target when
resolving functions called in the initializer of a global variable: with no
enclosing function there is no caller, so the host/device viability check is
skipped and the non-template device overload wins.
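To make the discrepancy concrete, here is a toy C++ model. All names are hypothetical and none of this is clang's data structures. It assumes the two `mypow` candidates tie on conversions, so the only tie-breaker modeled is that a non-template beats a template, and it uses a simplified host/device compatibility rule in place of clang's `IsAllowedCUDACall`. The pre-patch behavior is modeled as skipping the target check when there is no enclosing function, as in the old `SemaOverload.cpp` code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model; these are not clang's types.
enum class Target { Host, Device, HostDevice };

struct Candidate {
  std::string name;
  Target target;    // host/device side the candidate is compiled for
  bool isTemplate;  // on an otherwise tied match, a non-template wins
};

// Simplified stand-in for IsAllowedCUDACall: host and device code may not
// call each other directly; host_device on either side is accepted here.
bool isAllowedCall(Target caller, Target callee) {
  if (caller == Target::HostDevice || callee == Target::HostDevice)
    return true;
  return caller == callee;
}

// `caller` is std::nullopt when the call is not inside any function, as in
// a global variable initializer. With `skipCheckWithoutCaller` set, the
// model reproduces the pre-patch behavior of skipping the target check in
// that case; otherwise `contextTarget` stands in for the caller's target.
std::optional<std::string> resolve(const std::vector<Candidate>& cands,
                                   std::optional<Target> caller,
                                   Target contextTarget,
                                   bool skipCheckWithoutCaller) {
  const Candidate* best = nullptr;
  for (const Candidate& c : cands) {
    bool checkTarget = caller.has_value() || !skipCheckWithoutCaller;
    Target effective = caller.value_or(contextTarget);
    if (checkTarget && !isAllowedCall(effective, c.target))
      continue;  // not viable: wrong host/device side
    // All candidates are assumed to be equally good matches, so the only
    // tie-breaker modeled is "prefer a non-template".
    if (best == nullptr || (best->isTemplate && !c.isTemplate))
      best = &c;
  }
  if (best == nullptr)
    return std::nullopt;
  return best->name;
}

// The mypow example from the summary.
void checkMypowExample() {
  const std::vector<Candidate> mypow = {
      {"host mypow<T1, U>", Target::Host, true},
      {"device mypow", Target::Device, false}};
  // Pre-patch, global initializer: the device overload wins.
  assert(resolve(mypow, std::nullopt, Target::Host, true) == "device mypow");
  // With a host context target, the device overload is not viable.
  assert(resolve(mypow, std::nullopt, Target::Host, false) ==
         "host mypow<T1, U>");
  // Inside a host function, both behaviors agree.
  assert(resolve(mypow, Target::Host, Target::HostDevice, true) ==
         "host mypow<T1, U>");
}
```

The model deliberately ignores everything except viability and the template tie-breaker; it only illustrates why a missing caller let the device overload through.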

This patch introduces a global host/device target context for CUDA/HIP, used
for functions called outside of any function. For a global variable initializer,
the context target is determined by the variable's host/device attributes
(device for __device__ without __host__, __shared__, or __constant__; host
otherwise). In other situations, the default value, host_device, is sufficient.
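A minimal sketch of the save/restore idea, again with hypothetical names rather than clang's `Sema` interface: a context object records the target implied by the variable being initialized, and an RAII guard restores the previous context afterwards. The device test follows the condition in the patch's `CUDATargetContextRAII` (device without host, or shared, or constant); everything else is omitted.

```cpp
#include <cassert>

// Hypothetical model of the idea; not clang's real Sema interface.
enum class Target { Host, Device, HostDevice };

// The variable properties the patch inspects.
struct VarInfo {
  bool device = false, host = false, shared = false, constant = false;
  bool globalStorage = true, staticLocal = false;
};

struct SemaModel {
  // Default context when no global variable initializer is being parsed.
  Target curCtxTarget = Target::HostDevice;
  // After the patch, "no enclosing function" maps to the context target.
  Target targetOutsideFunction() const { return curCtxTarget; }
};

// Save the current context, install the target implied by the variable,
// and restore the saved context when the guard goes out of scope.
class TargetContextRAII {
 public:
  TargetContextRAII(SemaModel& s, const VarInfo* v)
      : s_(s), saved_(s.curCtxTarget) {
    if (v != nullptr && v->globalStorage && !v->staticLocal) {
      bool onDevice = (v->device && !v->host) || v->shared || v->constant;
      s_.curCtxTarget = onDevice ? Target::Device : Target::Host;
    }
  }
  ~TargetContextRAII() { s_.curCtxTarget = saved_; }
  TargetContextRAII(const TargetContextRAII&) = delete;
  TargetContextRAII& operator=(const TargetContextRAII&) = delete;

 private:
  SemaModel& s_;
  Target saved_;
};

void checkContextModel() {
  SemaModel s;
  {
    VarInfo plain;  // e.g. `double X = ...;`
    TargetContextRAII ctx(s, &plain);
    assert(s.targetOutsideFunction() == Target::Host);
  }
  assert(s.targetOutsideFunction() == Target::HostDevice);  // restored
  {
    VarInfo dev;  // e.g. `__device__ double Y = ...;`
    dev.device = true;
    TargetContextRAII ctx(s, &dev);
    assert(s.targetOutsideFunction() == Target::Device);
  }
  {
    VarInfo local;  // a static local does not change the context
    local.staticLocal = true;
    TargetContextRAII ctx(s, &local);
    assert(s.targetOutsideFunction() == Target::HostDevice);
  }
}
```

Using a scoped guard keeps the context correct even for nested declarations, since each guard restores exactly what it saved.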

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. Aug 17 2023, 9:22 PM

Herald added a reviewer: aaron.ballman. Aug 17 2023, 9:22 PM

Herald added a project: Restricted Project.

Herald added subscribers: mattd, carlosgalvezp, kerbowa, jvesely.

yaxunl requested review of this revision. Aug 17 2023, 9:22 PM

Harbormaster completed remote builds in B253386: Diff 551365. Aug 17 2023, 10:10 PM

Same reproducer but for CUDA: https://godbolt.org/z/WhjTMffnx

tra added inline comments.

clang/include/clang/Sema/Sema.h
Line 4766: It appears that `Declarator D` here is only used as an attribute carrier used to identify CUDA calling target. Should we pass `CudaTarget ContextTarget` instead and let the caller figure out how to find it? I'm just thinking that we're hardcoding just one specific way to find the target, while there may potentially be more. The current way is OK, as we have just one use case at the moment.
clang/lib/Sema/SemaCUDA.cpp
Line 139: Style nit: no braces around single-statement body.
clang/test/CodeGenCUDA/global-initializers.cu
Lines 12–13: We don't really need templates to reproduce the issue. We just need a host function with lower overloading priority. A function requiring type conversion or with an additional default argument should do. E.g. `float pow(float, int);` or `double X = pow(double, int, bool lower_priority_host_overload=1);` Removing template should unclutter the tests a bit.

yaxunl marked 3 inline comments as done. Aug 18 2023, 9:35 PM

yaxunl added inline comments.

clang/include/clang/Sema/Sema.h
Line 4766: will do
clang/lib/Sema/SemaCUDA.cpp
Line 139: will fix
clang/test/CodeGenCUDA/global-initializers.cu
Lines 12–13: will do

Revised per review comments.

Harbormaster completed remote builds in B253623: Diff 551705. Aug 18 2023, 10:35 PM

ping

tra accepted this revision. Aug 28 2023, 10:22 AM

This revision is now accepted and ready to land. Aug 28 2023, 10:22 AM

This revision was landed with ongoing or failed builds. Aug 29 2023, 7:17 AM

Closed by commit rGde0df639724b: [CUDA][HIP] Fix overloading resolution in global variable initializer (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rGde0df639724b: [CUDA][HIP] Fix overloading resolution in global variable initializer.

Herald added a project: Restricted Project. Aug 29 2023, 7:17 AM

yaxunl added a reverting change: rG27313b68ef0e: Revert "[CUDA][HIP] Fix overloading resolution in global variable initializer". Aug 31 2023, 6:26 AM

The patch was reverted because it caused regressions for HIP on Windows. A reduced test case is:

typedef void (__stdcall* funcTy)();
void invoke(funcTy f);

static void __stdcall callee() noexcept {
}

void foo() {
   invoke(callee);
}

This is because clang did not account for host/device attributes when checking calling conventions in a few places.

Reopen to fix it.
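Reading the relanded diff, one plausible account of the regression (an interpretation, not something stated in the review): `funcTy` is a typedef, so there is no `FunctionDecl`. The first version of the patch made `IdentifyCUDATarget(nullptr)` return the context target, whose default is host_device, so `__stdcall` was presumably also checked against the device target. The reland instead passes `CheckCallingConvAttr` a target derived from the declarator's attributes, which is host for an unattributed declarator. The sketch below models only that selection logic; every name in it is hypothetical.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model; names are illustrative, not clang's API.
enum class Target { Host, Device, HostDevice, Invalid };

// Which side's target must accept the calling convention.
struct SidesToCheck {
  bool host = false;
  bool device = false;
};

// Follows the visible cases of the switch in CheckCallingConvAttr:
// host_device checks both sides, host checks only the host side.
SidesToCheck sidesFor(Target t) {
  SidesToCheck s;
  switch (t) {
  case Target::HostDevice:
    s.host = true;
    s.device = true;
    break;
  case Target::Host:
    s.host = true;
    break;
  case Target::Device:
    s.device = true;
    break;
  case Target::Invalid:
    break;  // this model simply checks nothing
  }
  return s;
}

// Mirrors `FD ? IdentifyCUDATarget(FD) : CFT`: a known function declaration
// wins; otherwise the caller-supplied target (e.g. derived from the
// declarator's attributes) is used and must be valid.
Target chooseTarget(std::optional<Target> fromFunctionDecl, Target fromCaller) {
  assert(fromFunctionDecl.has_value() || fromCaller != Target::Invalid);
  return fromFunctionDecl ? *fromFunctionDecl : fromCaller;
}
```

For the reduced test case, the unattributed typedef would map to host only, while a host_device default would also demand device-side support for `__stdcall`.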

This revision is now accepted and ready to land. Sep 7 2023, 4:48 AM

Phabricator no longer allows me to update the patch, so I created a PR on GitHub: https://github.com/llvm/llvm-project/pull/65606

GitHub <noreply@github.com> mentioned this in rG9b7763821aed: Reland "[CUDA][HIP] Fix overloading resolution in global var init" (#65606). Sep 7 2023, 8:19 PM

Revision Contents

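Path                                                  Size
clang/include/clang/Sema/Sema.h                       46 lines
clang/lib/Parse/ParseDecl.cpp                         1 line
clang/lib/Sema/SemaCUDA.cpp                           24 lines
clang/lib/Sema/SemaDeclAttr.cpp                       6 lines
clang/lib/Sema/SemaOverload.cpp                       45 lines
clang/lib/Sema/SemaType.cpp                           3 lines
clang/test/CodeGenCUDA/global-initializers.cu         51 lines (added)
clang/test/SemaCUDA/amdgpu-windows-vectorcall.cu      1 line
clang/test/SemaCUDA/function-overload.cu              6 lines
clang/test/SemaCUDA/global-initializers-host.cu       (deleted)
clang/test/SemaCUDA/global-initializers.cu            72 lines (added)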

Diff 554321

clang/include/clang/Sema/Sema.h


@@ 1,006 unchanged lines hidden @@ public:
     /// Undo a previous pushUndelayed().
     void popUndelayed(DelayedDiagnosticsState state) {
       assert(CurPool == nullptr);
       CurPool = state.SavedPool;
     }
   } DelayedDiagnostics;
 
+  enum CUDAFunctionTarget {
+    CFT_Device,
+    CFT_Global,
+    CFT_Host,
+    CFT_HostDevice,
+    CFT_InvalidTarget
+  };
+
   /// A RAII object to temporarily push a declaration context.
   class ContextRAII {
   private:
     Sema &S;
     DeclContext *SavedContext;
     ProcessingContextState SavedContextState;
     QualType SavedCXXThisTypeOverride;
     unsigned SavedFunctionScopesStart;
@@ 3,723 unchanged lines hidden @@ public:
   /// Determine if type T is a valid subject for a nonnull and similar
   /// attributes. By default, we look through references (the behavior used by
   /// nonnull), but if the second parameter is true, then we treat a reference
   /// type as valid.
   bool isValidPointerAttrType(QualType T, bool RefOkay = false);
 
   bool CheckRegparmAttr(const ParsedAttr &attr, unsigned &value);
 
+  /// Check validaty of calling convention attribute \p attr. If \p FD
+  /// is not null pointer, use \p FD to determine the CUDA/HIP host/device
+  /// target. Otherwise, it is specified by \p CFT.
   bool CheckCallingConvAttr(const ParsedAttr &attr, CallingConv &CC,
-                            const FunctionDecl *FD = nullptr);
+                            const FunctionDecl *FD = nullptr,
+                            CUDAFunctionTarget CFT = CFT_InvalidTarget);
   bool CheckAttrTarget(const ParsedAttr &CurrAttr);
   bool CheckAttrNoArgs(const ParsedAttr &CurrAttr);
   bool checkStringLiteralArgumentAttr(const AttributeCommonInfo &CI,
                                       const Expr *E, StringRef &Str,
                                       SourceLocation *ArgLocation = nullptr);
   bool checkStringLiteralArgumentAttr(const ParsedAttr &Attr, unsigned ArgNum,
                                       StringRef &Str,
                                       SourceLocation *ArgLocation = nullptr);
@@ 8,490 unchanged lines hidden @@ SemaDiagnosticBuilder targetDiag(SourceLocation Loc,
                                    const FunctionDecl *FD = nullptr) {
     return targetDiag(Loc, PD.getDiagID(), FD) << PD;
   }
 
   /// Check if the type is allowed to be used for the current target.
   void checkTypeSupport(QualType Ty, SourceLocation Loc,
                         ValueDecl *D = nullptr);
 
-  enum CUDAFunctionTarget {
-    CFT_Device,
-    CFT_Global,
-    CFT_Host,
-    CFT_HostDevice,
-    CFT_InvalidTarget
-  };
-
   /// Determines whether the given function is a CUDA device/host/kernel/etc.
   /// function.
   ///
   /// Use this rather than examining the function's attributes yourself -- you
   /// will get it wrong.  Returns CFT_Host if D is null.
   CUDAFunctionTarget IdentifyCUDATarget(const FunctionDecl *D,
                                         bool IgnoreImplicitHDAttr = false);
   CUDAFunctionTarget IdentifyCUDATarget(const ParsedAttributesView &Attrs);
 
   enum CUDAVariableTarget {
     CVT_Device,  /// Emitted on device side with a shadow variable on host side
     CVT_Host,    /// Emitted on host side only
     CVT_Both,    /// Emitted on both sides with different addresses
     CVT_Unified, /// Emitted as a unified address, e.g. managed variables
   };
   /// Determines whether the given variable is emitted on host or device side.
   CUDAVariableTarget IdentifyCUDATarget(const VarDecl *D);
 
+  /// Defines kinds of CUDA global host/device context where a function may be
+  /// called.
+  enum CUDATargetContextKind {
+    CTCK_Unknown,       /// Unknown context
+    CTCK_InitGlobalVar, /// Function called during global variable
+                        /// initialization
+  };
+
+  /// Define the current global CUDA host/device context where a function may be
+  /// called. Only used when a function is called outside of any functions.
+  struct CUDATargetContext {
+    CUDAFunctionTarget Target = CFT_HostDevice;
+    CUDATargetContextKind Kind = CTCK_Unknown;
+    Decl *D = nullptr;
+  } CurCUDATargetCtx;
+
+  struct CUDATargetContextRAII {
+    Sema &S;
+    CUDATargetContext SavedCtx;
+    CUDATargetContextRAII(Sema &S_, CUDATargetContextKind K, Decl *D);
+    ~CUDATargetContextRAII() { S.CurCUDATargetCtx = SavedCtx; }
+  };
+
   /// Gets the CUDA target for the current context.
   CUDAFunctionTarget CurrentCUDATarget() {
     return IdentifyCUDATarget(dyn_cast<FunctionDecl>(CurContext));
   }
 
   static bool isCUDAImplicitHostDeviceFunction(const FunctionDecl *D);
 
   // CUDA function call preference. Must be ordered numerically from
@@ 890 unchanged lines hidden to end of file @@

clang/lib/Parse/ParseDecl.cpp


@@ 2,577 unchanged lines hidden @@ if (Tok.is(tok::semi)) {
           ThisDecl =
               Actions.ActOnTemplateDeclarator(getCurScope(), FakedParamLists, D);
         }
       }
       break;
     }
   }
 
+  Sema::CUDATargetContextRAII X(Actions, Sema::CTCK_InitGlobalVar, ThisDecl);
   switch (TheInitKind) {
   // Parse declarator '=' initializer.
   case InitKind::Equal: {
     SourceLocation EqualLoc = ConsumeToken();
 
     if (Tok.is(tok::kw_delete)) {
       if (D.isFunctionDeclarator())
         Diag(ConsumeToken(), diag::err_default_delete_in_multiple_declaration)
@@ 5,454 unchanged lines hidden to end of file @@

clang/lib/Sema/SemaCUDA.cpp

@@ 99 unchanged lines hidden @@ Sema::IdentifyCUDATarget(const ParsedAttributesView &Attrs) {
   if (HasDeviceAttr)
     return CFT_Device;
 
   return CFT_Host;
 }
 
 template <typename A>
-static bool hasAttr(const FunctionDecl *D, bool IgnoreImplicitAttr) {
+static bool hasAttr(const Decl *D, bool IgnoreImplicitAttr) {
   return D->hasAttrs() && llvm::any_of(D->getAttrs(), [&](Attr *Attribute) {
            return isa<A>(Attribute) &&
                   !(IgnoreImplicitAttr && Attribute->isImplicit());
          });
 }
 
+Sema::CUDATargetContextRAII::CUDATargetContextRAII(Sema &S_,
+                                                   CUDATargetContextKind K,
+                                                   Decl *D)
+    : S(S_) {
+  SavedCtx = S.CurCUDATargetCtx;
+  assert(K == CTCK_InitGlobalVar);
+  auto *VD = dyn_cast_or_null<VarDecl>(D);
+  if (VD && VD->hasGlobalStorage() && !VD->isStaticLocal()) {
+    auto Target = CFT_Host;
+    if ((hasAttr<CUDADeviceAttr>(VD, /*IgnoreImplicit=*/true) &&
+         !hasAttr<CUDAHostAttr>(VD, /*IgnoreImplicit=*/true)) ||
+        hasAttr<CUDASharedAttr>(VD, /*IgnoreImplicit=*/true) ||
+        hasAttr<CUDAConstantAttr>(VD, /*IgnoreImplicit=*/true))
+      Target = CFT_Device;
+    S.CurCUDATargetCtx = {Target, K, VD};
+  }
+}
+
 /// IdentifyCUDATarget - Determine the CUDA compilation target for this function
 Sema::CUDAFunctionTarget Sema::IdentifyCUDATarget(const FunctionDecl *D,
                                                   bool IgnoreImplicitHDAttr) {
-  // Code that lives outside a function is run on the host.
+  // Code that lives outside a function gets the target from CurCUDATargetCtx.
   if (D == nullptr)
-    return CFT_Host;
+    return CurCUDATargetCtx.Target;
 
   if (D->hasAttr<CUDAInvalidTargetAttr>())
     return CFT_InvalidTarget;
 
   if (D->hasAttr<CUDAGlobalAttr>())
     return CFT_Global;
 
   if (hasAttr<CUDADeviceAttr>(D, IgnoreImplicitHDAttr)) {
     if (hasAttr<CUDAHostAttr>(D, IgnoreImplicitHDAttr))
@@ 833 unchanged lines hidden to end of file @@

clang/lib/Sema/SemaDeclAttr.cpp


@@ 5,311 unchanged lines hidden @@ static void handleNoRandomizeLayoutAttr(Sema &S, Decl *D,
                                         const ParsedAttr &AL) {
   if (checkAttrMutualExclusion<RandomizeLayoutAttr>(S, D, AL))
     return;
   if (!D->hasAttr<NoRandomizeLayoutAttr>())
     D->addAttr(::new (S.Context) NoRandomizeLayoutAttr(S.Context, AL));
 }
 
 bool Sema::CheckCallingConvAttr(const ParsedAttr &Attrs, CallingConv &CC,
-                                const FunctionDecl *FD) {
+                                const FunctionDecl *FD,
+                                CUDAFunctionTarget CFT) {
   if (Attrs.isInvalid())
     return true;
 
   if (Attrs.hasProcessingCache()) {
     CC = (CallingConv) Attrs.getProcessingCache();
     return false;
   }
 
@@ 82 unchanged lines hidden @@ bool Sema::CheckCallingConvAttr(const ParsedAttr &Attrs, CallingConv &CC,
   TargetInfo::CallingConvCheckResult A = TargetInfo::CCCR_OK;
   const TargetInfo &TI = Context.getTargetInfo();
   // CUDA functions may have host and/or device attributes which indicate
   // their targeted execution environment, therefore the calling convention
   // of functions in CUDA should be checked against the target deduced based
   // on their host/device attributes.
   if (LangOpts.CUDA) {
     auto *Aux = Context.getAuxTargetInfo();
-    auto CudaTarget = IdentifyCUDATarget(FD);
+    assert(FD || CFT != CFT_InvalidTarget);
+    auto CudaTarget = FD ? IdentifyCUDATarget(FD) : CFT;
     bool CheckHost = false, CheckDevice = false;
     switch (CudaTarget) {
     case CFT_HostDevice:
       CheckHost = true;
       CheckDevice = true;
       break;
     case CFT_Host:
       CheckHost = true;
@@ 4,541 unchanged lines hidden to end of file @@

clang/lib/Sema/SemaOverload.cpp


@@ 6,693 unchanged lines hidden @@ if (!AggregateCandidateDeduction && Args.size() < MinRequiredArgs &&
       !PartialOverloading) {
     // Not enough arguments.
     Candidate.Viable = false;
     Candidate.FailureKind = ovl_fail_too_few_arguments;
     return;
   }
 
   // (CUDA B.1): Check for invalid calls between targets.
-  if (getLangOpts().CUDA)
-    if (const FunctionDecl *Caller = getCurFunctionDecl(/*AllowLambda=*/true))
+  if (getLangOpts().CUDA) {
+    const FunctionDecl *Caller = getCurFunctionDecl(/*AllowLambda=*/true);
     // Skip the check for callers that are implicit members, because in this
     // case we may not yet know what the member's target is; the target is
     // inferred for the member automatically, based on the bases and fields of
     // the class.
-      if (!Caller->isImplicit() && !IsAllowedCUDACall(Caller, Function)) {
+    if (!(Caller && Caller->isImplicit()) &&
+        !IsAllowedCUDACall(Caller, Function)) {
       Candidate.Viable = false;
       Candidate.FailureKind = ovl_fail_bad_target;
       return;
     }
+  }
 
   if (Function->getTrailingRequiresClause()) {
     ConstraintSatisfaction Satisfaction;
     if (CheckFunctionConstraints(Function, Satisfaction, /*Loc*/ {},
                                  /*ForOverloadResolution*/ true) ||
         !Satisfaction.IsSatisfied) {
       Candidate.Viable = false;
       Candidate.FailureKind = ovl_fail_constraints_not_satisfied;
@@ 495 unchanged lines hidden @@ if (Candidate.Conversions[FirstConvIdx].isBad()) {
       Candidate.Viable = false;
       Candidate.FailureKind = ovl_fail_bad_conversion;
       return;
     }
   }
 
   // (CUDA B.1): Check for invalid calls between targets.
   if (getLangOpts().CUDA)
-    if (const FunctionDecl *Caller = getCurFunctionDecl(/*AllowLambda=*/true))
-      if (!IsAllowedCUDACall(Caller, Method)) {
+    if (!IsAllowedCUDACall(getCurFunctionDecl(/*AllowLambda=*/true), Method)) {
       Candidate.Viable = false;
       Candidate.FailureKind = ovl_fail_bad_target;
       return;
     }
 
   if (Method->getTrailingRequiresClause()) {
     ConstraintSatisfaction Satisfaction;
     if (CheckFunctionConstraints(Method, Satisfaction, /*Loc*/ {},
                                  /*ForOverloadResolution*/ true) ||
         !Satisfaction.IsSatisfied) {
       Candidate.Viable = false;
       Candidate.FailureKind = ovl_fail_constraints_not_satisfied;
@@ 5,254 unchanged lines hidden @@ if (CXXMethodDecl *Method = dyn_cast<CXXMethodDecl>(Fn)) {
       // when converting to member pointer.
       if (Method->isStatic() == TargetTypeIsNonStaticMemberFunction)
         return false;
     }
     else if (TargetTypeIsNonStaticMemberFunction)
       return false;
 
     if (FunctionDecl *FunDecl = dyn_cast<FunctionDecl>(Fn)) {
-      if (S.getLangOpts().CUDA)
-        if (FunctionDecl *Caller = S.getCurFunctionDecl(/*AllowLambda=*/true))
-          if (!Caller->isImplicit() && !S.IsAllowedCUDACall(Caller, FunDecl))
+      if (S.getLangOpts().CUDA) {
+        FunctionDecl *Caller = S.getCurFunctionDecl(/*AllowLambda=*/true);
+        if (!(Caller && Caller->isImplicit()) &&
+            !S.IsAllowedCUDACall(Caller, FunDecl))
           return false;
+      }
       if (FunDecl->isMultiVersion()) {
         const auto *TA = FunDecl->getAttr<TargetAttr>();
         if (TA && !TA->isDefaultVersion())
           return false;
         const auto *TVA = FunDecl->getAttr<TargetVersionAttr>();
         if (TVA && !TVA->isDefaultVersion())
           return false;
       }
@@ 3,248 unchanged lines hidden to end of file @@

clang/lib/Sema/SemaType.cpp


@@ 4,049 unchanged lines hidden @@ static CallingConv getCCForDeclaratorChunk(
   // Check for an explicit CC attribute.
   for (const ParsedAttr &AL : AttrList) {
     switch (AL.getKind()) {
     CALLING_CONV_ATTRS_CASELIST : {
       // Ignore attributes that don't validate or can't apply to the
       // function type. We'll diagnose the failure to apply them in
       // handleFunctionTypeAttr.
       CallingConv CC;
-      if (!S.CheckCallingConvAttr(AL, CC) &&
+      if (!S.CheckCallingConvAttr(AL, CC, /*FunctionDecl=*/nullptr,
+                                  S.IdentifyCUDATarget(D.getAttributes())) &&
           (!FTI.isVariadic || supportsVariadicCall(CC))) {
         return CC;
       }
       break;
     }
 
     default:
       break;
@@ 5,799 unchanged lines hidden to end of file @@

clang/test/CodeGenCUDA/global-initializers.cu

This file was added.

// RUN: %clang_cc1 %s -triple x86_64-linux-unknown -emit-llvm -o - \
// RUN:   | FileCheck -check-prefix=HOST %s
// RUN: %clang_cc1 %s -fcuda-is-device \
// RUN:   -emit-llvm -o - -triple nvptx64 \
// RUN:   -aux-triple x86_64-unknown-linux-gnu | FileCheck \
// RUN:   -check-prefix=DEV %s

#include "Inputs/cuda.h"

// Check host/device-based overloding resolution in global variable initializer.
double pow(double, double) { return 1.0; }

__device__ double pow(double, int) { return 2.0; }

// HOST-DAG: call {{.*}}double @_Z3powdd(double noundef 1.000000e+00, double noundef 1.000000e+00)
double X = pow(1.0, 1);

constexpr double cpow(double, double) { return 11.0; }

constexpr __device__ double cpow(double, int) { return 12.0; }

// HOST-DAG: @CX = global double 1.100000e+01
double CX = cpow(11.0, 1);

// DEV-DAG: @CY = addrspace(1) externally_initialized global double 1.200000e+01
__device__ double CY = cpow(12.0, 1);

struct A {
  double pow(double, double) { return 3.0; }

  __device__ double pow(double, int) { return 4.0; }
};

A a;

// HOST-DAG: call {{.*}}double @_ZN1A3powEdd(ptr {{.*}}@a, double noundef 3.000000e+00, double noundef 1.000000e+00)
double AX = a.pow(3.0, 1);

struct CA {
  constexpr double cpow(double, double) const { return 13.0; }

  constexpr __device__ double cpow(double, int) const { return 14.0; }
};

const CA ca;

// HOST-DAG: @CAX = global double 1.300000e+01
double CAX = ca.cpow(13.0, 1);

// DEV-DAG: @CAY = addrspace(1) externally_initialized global double 1.400000e+01
__device__ double CAY = ca.cpow(14.0, 1);

clang/test/SemaCUDA/amdgpu-windows-vectorcall.cu

 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -aux-triple x86_64-pc-windows-msvc -fms-compatibility -fcuda-is-device -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc -fms-compatibility -fsyntax-only -verify %s
 
 __cdecl void hostf1();
 __vectorcall void (*hostf2)() = hostf1; // expected-error {{cannot initialize a variable of type 'void (*)() __attribute__((vectorcall))' with an lvalue of type 'void () __attribute__((cdecl))'}}

clang/test/SemaCUDA/function-overload.cu

@@ 216 unchanged lines hidden @@
 #if defined(__CUDA_ARCH__)
 // expected-error@-2 {{reference to __global__ function 'g' in __host__ __device__ function}}
 #endif
 }
 
 // Test for address of overloaded function resolution in the global context.
 HostFnPtr fp_h = h;
 HostFnPtr fp_ch = ch;
+#if defined (__CUDA_ARCH__)
+__device__
+#endif
 CurrentFnPtr fp_dh = dh;
+#if defined (__CUDA_ARCH__)
+__device__
+#endif
 CurrentFnPtr fp_cdh = cdh;
 GlobalFnPtr fp_g = g;
 
 
 // Test overloading of destructors
 // Can't mix H and unattributed destructors
 struct d_h {
   ~d_h() {} // expected-note {{previous definition is here}}
@@ 480 unchanged lines hidden to end of file @@

clang/test/SemaCUDA/global-initializers-host.cu

This file was deleted.

	// RUN: %clang_cc1 %s --std=c++11 -triple x86_64-linux-unknown -fsyntax-only -o - -verify

	#include "Inputs/cuda.h"

	// Check that we get an error if we try to call a __device__ function from a
	// module initializer.

	struct S {
	__device__ S() {}
	// expected-note@-1 {{'S' declared here}}
	};

	S s;
	// expected-error@-1 {{reference to __device__ function 'S' in global initializer}}

	struct T {
	__host__ __device__ T() {}
	};
	T t; // No error, this is OK.

	struct U {
	__host__ U() {}
	__device__ U(int) {}
	// expected-note@-1 {{'U' declared here}}
	};
	U u(42);
	// expected-error@-1 {{reference to __device__ function 'U' in global initializer}}

	__device__ int device_fn() { return 42; }
	// expected-note@-1 {{'device_fn' declared here}}
	int n = device_fn();
	// expected-error@-1 {{reference to __device__ function 'device_fn' in global initializer}}

clang/test/SemaCUDA/global-initializers.cu

This file was added.

				// RUN: %clang_cc1 %s -triple x86_64-linux-unknown -fsyntax-only -o - -verify
				// RUN: %clang_cc1 %s -fcuda-is-device -triple nvptx -fsyntax-only -o - -verify

				#include "Inputs/cuda.h"

				// Check that we get an error if we try to call a __device__ function from a
				// module initializer.

				struct S {
				// expected-note@-1 {{candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided}}
				// expected-note@-2 {{candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided}}
				__device__ S() {}
				// expected-note@-1 {{candidate constructor not viable: call to __device__ function from __host__ function}}
				};

				S s;
				// expected-error@-1 {{no matching constructor for initialization of 'S'}}

				struct T {
				__host__ __device__ T() {}
				};
				T t; // No error, this is OK.

				struct U {
				// expected-note@-1 {{candidate constructor (the implicit copy constructor) not viable: no known conversion from 'int' to 'const U' for 1st argument}}
				// expected-note@-2 {{candidate constructor (the implicit move constructor) not viable: no known conversion from 'int' to 'U' for 1st argument}}
				__host__ U() {}
				// expected-note@-1 {{candidate constructor not viable: requires 0 arguments, but 1 was provided}}
				__device__ U(int) {}
				// expected-note@-1 {{candidate constructor not viable: call to __device__ function from __host__ function}}
				};
				U u(42);
				// expected-error@-1 {{no matching constructor for initialization of 'U'}}

				__device__ int device_fn() { return 42; }
				// expected-note@-1 {{candidate function not viable: call to __device__ function from __host__ function}}
				int n = device_fn();
				// expected-error@-1 {{no matching function for call to 'device_fn'}}

				// Check host/device-based overloding resolution in global variable initializer.
				double pow(double, double);

				__device__ double pow(double, int);

				double X = pow(1.0, 1);
				__device__ double Y = pow(2.0, 2); // expected-error{{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}

				constexpr double cpow(double, double) { return 1.0; }

				constexpr __device__ double cpow(double, int) { return 2.0; }

				const double CX = cpow(1.0, 1);
				const __device__ double CY = cpow(2.0, 2);

				struct A {
				double pow(double, double);

				__device__ double pow(double, int);

				constexpr double cpow(double, double) const { return 1.0; }

				constexpr __device__ double cpow(double, int) const { return 1.0; }

				};

				A a;
				double AX = a.pow(1.0, 1);
				__device__ double AY = a.pow(2.0, 2); // expected-error{{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}

				const A ca;
				const double CAX = ca.cpow(1.0, 1);
				const __device__ double CAY = ca.cpow(2.0, 2);