This is an archive of the discontinued LLVM Phabricator instance.

Differential D80450

[CUDA][HIP] Fix HD function resolution
ClosedPublic

Authored by yaxunl on May 22 2020, 10:33 AM.

Details

Reviewers

tra

Commits

rGacb6f80d96b7: [CUDA][HIP] Fix overloading resolution
rG263390d4f5f2: [CUDA][HIP] Fix implicit HD function resolution

Summary

Add option -ffix-overload-resolution, which is off by default.

When -ffix-overload-resolution is off, keep the original behavior.
Otherwise, enable the correct hostness-based overloading resolution.

Diff Detail

Event Timeline

yaxunl created this revision. May 22 2020, 10:33 AM

Is this patch supposed to be used with D79526 or instead of it?

clang/test/SemaCUDA/function-overload.cu
463	`__device__`, etc. are defined by the included "Inputs/cuda.h" and can be used here to make it more readable.

Fix test.

ping

In D80450#2055463, @tra wrote:

Is this patch supposed to be used with D79526 or instead of it?

^^^ I don't think this has been answered. I would like to test this change before it lands.

In D80450#2071696, @tra wrote:

In D80450#2055463, @tra wrote:

Is this patch supposed to be used with D79526 or instead of it?

^^^ I don't think this has been answered. I would like to test this change before it lands.

Sorry, I missed that. Yes, this patch is used on top of D79526.

LGTM. Combined with D79526 it appears to work for tensorflow build.

This revision is now accepted and ready to land. Jun 3 2020, 12:45 PM

Closed by commit rG263390d4f5f2: [CUDA][HIP] Fix implicit HD function resolution (authored by yaxunl). Jun 4 2020, 2:24 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jun 4 2020, 2:24 PM

MaskRay added a reverting change: rGdfc0d9475556: Revert D80450 "[CUDA][HIP] Fix implicit HD function resolution". Jun 10 2020, 5:48 PM

MaskRay mentioned this in rGb3d10920e134: Restore part of D80450 [CUDA][HIP] Fix implicit HD function resolution. Jun 10 2020, 10:39 PM

Reproducer for the regression. https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575
It's not particularly small, but that's as far as I could get it reduced.

With the patch, an attempt to instantiate ag on line 36 (in the reproducer sources I linked to above) results in ambiguity between two templates on lines 33 and 24 that are in different namespaces.
Previously it picked the template on line 28.

Managed to simplify the reproducer down to this which now reports that a host candidate has been ignored. This may explain why we ended up with the ambiguity when other overloads were present.

template <typename> struct a {};
namespace b {
struct c : a<int> {};
template <typename d> void ag(d);
} // namespace b
template <typename ae>
__attribute__((host)) __attribute__((device)) int ag(a<ae>) {
  ae e;
  ag(e);
}
void f() { ag<b::c>; }

In D80450#2088129, @tra wrote:

Managed to simplify the reproducer down to this which now reports that a host candidate has been ignored. This may explain why we ended up with the ambiguity when other overloads were present.
template <typename> struct a {};
namespace b {
struct c : a<int> {};
template <typename d> void ag(d);
} // namespace b
template <typename ae>
__attribute__((host)) __attribute__((device)) int ag(a<ae>) {
  ae e;
  ag(e);
}
void f() { ag<b::c>; }

The error only happens in device compilation.

For the call ag(e), there are two candidates:

1. ag in namespace b. The function arguments can match. However, it is a host function, and therefore a wrong-sided candidate and not viable.

2. ag in the global namespace. It is a host device function. However, its parameter requires a<ae>, and therefore cannot match.

Before my patch, a wrong-sided candidate is allowed. clang resolves to candidate 1, and this results in a diagnostic about a host function referenced in a host device function, which can be deferred. Since f() is not emitted on the device side, the deferred diag is not emitted.

After my patch, a wrong-sided candidate is not allowed. clang resolves to candidate 2, which results in a "no matching function" diagnostic, which is not deferred by default and is emitted even if f() is not emitted on the device side.

In a sense, the diagnostic is correct, since ag(a<ae>) cannot be emitted on the device side. This can be fixed by either making ag(a<ae>) a host function or making ag(d) a host device function.
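The before/after behavior described here can be approximated with a small stand-alone C++ model. This is only a sketch under stated assumptions: `Candidate`, `Resolution`, `resolveOnDevice` and `reproducerCandidates` are hypothetical names invented for illustration, not clang APIs, and the model picks the first acceptable candidate instead of ranking candidates the way clang does.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model of the description above; none of these names are clang APIs.
enum class Side { Host, Device, HostDevice };

struct Candidate {
  std::string name;
  Side side;
  bool argsMatch;  // whether the call arguments fit the parameters
};

struct Resolution {
  std::optional<std::string> chosen;  // empty means "no matching function"
  bool diagDeferred;                  // true if the resulting error is deferrable
};

// Resolve a call made from a host device function during device compilation.
Resolution resolveOnDevice(const std::vector<Candidate> &cands,
                           bool excludeWrongSide) {
  assert(!cands.empty());
  for (const Candidate &c : cands) {
    bool wrongSide = (c.side == Side::Host);  // host-only callee, device pass
    if (!c.argsMatch) continue;
    if (wrongSide && excludeWrongSide) continue;
    // A wrong-sided pick yields a deferred "host function referenced" error.
    return {c.name, wrongSide};
  }
  // Nothing matched: this error is reported immediately, not deferred.
  return {std::nullopt, false};
}

// The two candidates for ag(e) in the reduced reproducer.
std::vector<Candidate> reproducerCandidates() {
  return {{"b::ag(d)", Side::Host, true},
          {"::ag(a<ae>)", Side::HostDevice, false}};
}
```

With `excludeWrongSide` set to false, the model selects `b::ag(d)` and marks the diagnostic as deferrable. With it set to true, no candidate remains and the error is immediate, which is the regression discussed in this thread.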

In the original testcase (https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575)

Before my change, the call at line 36 resolves to the wrong-sided candidate at line 29, since that is the best match for the argument types. This results in a deferred diag, which allows device compilation to pass.

After my change, the call at line 36 resolves to two host device candidates. This results in a diagnostic about ambiguity, which is not deferred by default. Therefore the compilation fails.

Basically it all boils down to the issue that overloading resolution diagnostics are not deferred by default.

I think first of all we need to exclude wrong-sided candidates as this patch does; otherwise we cannot have correct hostness-based overloading resolution and fix bugs like https://bugs.llvm.org/show_bug.cgi?id=46922 .

However, by doing this we change the existing overloading resolution and incur some overloading resolution diags. Unless we defer these diags, we may break some existing CUDA/HIP code.

Fortunately https://reviews.llvm.org/D84364 has already landed, which allows deferring overloading resolution diags under the option -fgpu-defer-diag.

I think a conservative solution is that we keep the old overloading resolution behavior by default (i.e. do not exclude wrong-sided candidates), and enable the correct hostness-based overloading resolution when -fgpu-defer-diag is on. If developers would like correct hostness-based overloading resolution, they can use -fgpu-defer-diag. Then, as -fgpu-defer-diag becomes stable, we turn it on by default.
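The deferral decision this proposal relies on can be sketched in plain C++, modelled loosely on the shouldDeferDiags hunk in the diff for this revision. `CandInfo` and its fields are hypothetical stand-ins for clang's candidate data, so treat this as an illustration of the condition, not the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for an overload candidate; not a clang type.
struct CandInfo {
  bool viable;
  bool badTarget;   // rejected for host/device reasons (ovl_fail_bad_target)
  bool hostAttr;    // carries __host__
  bool deviceAttr;  // carries __device__
};

// Defer the overload diagnostic only in CUDA/HIP mode with deferral enabled,
// and only if some candidate is wrong-sided or host device.
bool shouldDeferDiags(bool isCUDA, bool gpuDeferDiag,
                      const std::vector<CandInfo> &cands) {
  if (!isCUDA || !gpuDeferDiag) return false;
  return std::any_of(cands.begin(), cands.end(), [](const CandInfo &c) {
    assert(!(c.badTarget && c.viable));  // a bad-target candidate is not viable
    bool wrongSided = !c.viable && c.badTarget;
    bool hostDevice = c.hostAttr && c.deviceAttr;
    return wrongSided || hostDevice;
  });
}
```

In this model, turning the flag off always yields an immediate diagnostic, which matches the observation that overloading resolution diagnostics are not deferred by default.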

reopen for fixing the regression

This revision is now accepted and ready to land. Nov 26 2020, 8:43 AM

If -fgpu-defer-diag is off, keep the original behavior.

In D80450#2418775, @yaxunl wrote:

For the call ag(e), there are two candidates:

1. ag in namespace b. The function arguments can match. However, it is a host function, and therefore a wrong-sided candidate and not viable.

2. ag in the global namespace. It is a host device function. However, its parameter requires a<ae>, and therefore cannot match.

Before my patch, a wrong-sided candidate is allowed. clang resolves to candidate 1, and this results in a diagnostic about a host function referenced in a host device function, which can be deferred. Since f() is not emitted on the device side, the deferred diag is not emitted.

This used to be a fairly common pattern in existing CUDA code. A lot of templated code had __host__ __device__ slapped on it because NVCC had no target overloading and it's hard to control who/where/how will instantiate a particular template. Some of those templates could only be instantiated on one side of the compilation. Clang's only choice was to allow the wrong-side candidates and/or defer the diagnostics. I vaguely recall that it was one of the trickier bits of the overload resolution rules to handle.

After my patch, a wrong-sided candidate is not allowed. clang resolves to candidate 2, which results in a "no matching function" diagnostic, which is not deferred by default and is emitted even if f() is not emitted on the device side.
...
Basically it all boils down to the issue that overloading resolution diagnostics are not deferred by default.

Looks that way.

I think a conservative solution is that we keep the old overloading resolution behavior by default (i.e. do not exclude wrong-sided candidates), and enable the correct hostness-based overloading resolution when -fgpu-defer-diag is on. If developers would like correct hostness-based overloading resolution, they can use -fgpu-defer-diag. Then, as -fgpu-defer-diag becomes stable, we turn it on by default.

SGTM. I'll check how the patch fares on our CUDA code.

clang/test/SemaCUDA/function-overload.cu
614	competes->compete

tra added inline comments. Nov 30 2020, 12:05 PM

clang/test/SemaCUDA/function-overload.cu
616	One thing that bothers me about this comment is that `-fgpu-defer-diag` apparently changes the result of the overload resolution, rather than just deferring diags.
617–618	It would be great to have a test where those diagnostics do fire.

In D80450#2423706, @tra wrote:

SGTM. I'll check how the patch fares on our CUDA code.

Please hold on. I just found a regression caused by the old behavior not being fully restored in a certain case. I will update the patch to fix the regression.

yaxunl marked an inline comment as done. Nov 30 2020, 12:35 PM

yaxunl added inline comments.

clang/test/SemaCUDA/function-overload.cu
616	Without -fgpu-defer-diag we have to keep the old, incorrect overloading resolution, since otherwise it breaks existing code. We can only have correct overloading resolution with -fgpu-defer-diag on. If we want correct overloading resolution regardless of whether -fgpu-defer-diag is on or off, we have to turn on -fgpu-defer-diag by default. In that case no existing code will be broken.
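How the flag changes the outcome of candidate comparison can be seen in a stand-alone model of the emit-threshold check from the isBetterOverloadCandidate hunk in this revision's diff. This is a sketch, not clang code: `Pref` mirrors the `Sema::CFP_*` names with an assumed worst-to-best ordering, and `Options`, `Cand` and `compareByEmittability` are hypothetical names.

```cpp
#include <cassert>

// Preference levels named after the Sema::CFP_* values in the diff; the
// ordering (worst to best) is an assumption of this sketch.
enum Pref { Never, WrongSide, HostDevice, SameSide, Native };

struct Options {
  bool cudaIsDevice;  // device-side compilation
  bool gpuDeferDiag;  // -fgpu-defer-diag
};

struct Cand {
  Pref pref;        // preference relative to the caller
  bool implicitHD;  // implicitly host device (e.g. forced by a pragma)
};

// Returns 1 if only c1 counts as emittable, -1 if only c2 does, and 0 if
// the emittability rule cannot separate them.
int compareByEmittability(const Options &o, bool callerImplicitHD,
                          const Cand &c1, const Cand &c2) {
  assert(c1.pref != Never && c2.pref != Never);
  bool implicitHDWorkaround =
      o.cudaIsDevice && callerImplicitHD && (c1.implicitHD || c2.implicitHD);
  bool legacyBehavior =
      !o.gpuDeferDiag && c1.pref < SameSide && c2.pref < SameSide;
  // With threshold Never, wrong-sided candidates still count as emittable.
  Pref threshold = (implicitHDWorkaround || legacyBehavior) ? Never : WrongSide;
  bool e1 = c1.pref > threshold;
  bool e2 = c2.pref > threshold;
  if (e1 && !e2) return 1;
  if (!e1 && e2) return -1;
  return 0;
}
```

For a host device candidate against a wrong-sided one, the model returns 0 without the flag (other rules decide) and 1 with it, so the flag affects which overload wins and not only when diagnostics are reported.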

tra added inline comments. Nov 30 2020, 1:04 PM

clang/test/SemaCUDA/function-overload.cu
616	"We can only have correct overloading resolution with -fgpu-defer-diag on." `-fgpu-defer-diag` is a prerequisite for fixing overload resolution. I'm fine with that. Making it serve the double duty of affecting the overload resolution is what I was pointing at. We should have a knob `fix-overload-resolution` which would then turn `-fgpu-defer-diag` on, not the other way around.

yaxunl marked 2 inline comments as done. Nov 30 2020, 1:17 PM

yaxunl added inline comments.

clang/test/SemaCUDA/function-overload.cu
616	That makes sense. Will add -ffix-overload-resolution.

Add -ffix-overload-resolution and fix a regression.

Herald added subscribers: dexonsmith, dang. Nov 30 2020, 7:11 PM

LGTM.

I'd suggest adding more details on the background of this change to the commit log (point to the comment in the isBetterOverloadCandidate ?) and outline the intention to enable the new way to do overloading after some soak time.

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

clang/lib/Sema/SemaOverload.cpp
9621	The comment uses device/host for both function attributes and when it refers to the compilation phase. It would help to make it more readable if function attributes would be distinct from compilation phase. E.g. by using `__host__ __device__` or `HD`.

In D80450#2426507, @tra wrote:

LGTM.

I'd suggest adding more details on the background of this change to the commit log (point to the comment in the isBetterOverloadCandidate ?) and outline the intention to enable the new way to do overloading after some soak time.

Will do.

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

How about -fgpu-exclude-wrong-side-overloads? What this patch does is always exclude wrong-side overloads, whereas previously wrong-side overloads were excluded only if there were same-side overloads.
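The difference between the two rules can be illustrated with a toy filter over preference levels. This is a sketch only: the old rule follows the BestViableFunction block removed in this revision's diff, while the new rule is modelled as a plain filter for clarity (the patch itself expresses it through candidate ranking). `Pref`, `filterOld` and `filterNew` are hypothetical names, and the enum ordering is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Named after the Sema::CFP_* values; ordering assumed worst to best.
enum Pref { Never, WrongSide, HostDevice, SameSide, Native };

// Candidates with 'never' preference are assumed to be rejected earlier.
static void checkNoNever(const std::vector<Pref> &cands) {
  assert(std::find(cands.begin(), cands.end(), Never) == cands.end());
}

// Old rule: drop wrong-side candidates only if a same-side candidate exists.
std::vector<Pref> filterOld(std::vector<Pref> cands) {
  checkNoNever(cands);
  bool hasSameSide = std::any_of(cands.begin(), cands.end(),
                                 [](Pref p) { return p == SameSide; });
  if (hasSameSide)
    cands.erase(std::remove(cands.begin(), cands.end(), WrongSide),
                cands.end());
  return cands;
}

// New rule: always drop wrong-side candidates.
std::vector<Pref> filterNew(std::vector<Pref> cands) {
  checkNoNever(cands);
  cands.erase(std::remove(cands.begin(), cands.end(), WrongSide), cands.end());
  return cands;
}
```

The two rules agree whenever a same-side candidate is present. They differ for a set such as a wrong-side candidate plus a host device candidate, where only the new rule removes the wrong-side one.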

clang/lib/Sema/SemaOverload.cpp
9621	will use H/D/HD for function attribute when committing.

In D80450#2428631, @yaxunl wrote:

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

How about -fgpu-exclude-wrong-side-overloads? What this patch does is always exclude wrong-side overloads, whereas previously wrong-side overloads were excluded only if there were same-side overloads.

SGTM. Maybe, also make it hidden. I don't think it's useful for the end users.

Closed by commit rGacb6f80d96b7: [CUDA][HIP] Fix overloading resolution (authored by yaxunl). Dec 2 2020, 1:34 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked an inline comment as done.

yaxunl added a commit: rGacb6f80d96b7: [CUDA][HIP] Fix overloading resolution.

Revision Contents

Path                                         Size
clang/include/clang/Sema/Overload.h          3 lines
clang/lib/Sema/SemaOverload.cpp              146 lines
clang/test/SemaCUDA/deferred-oeverload.cu    6 lines
clang/test/SemaCUDA/function-overload.cu     278 lines

Diff 308032

clang/include/clang/Sema/Overload.h

Show First 20 Lines • Show All 1,045 Lines • ▼ Show 20 Lines	T *slabAllocate(unsigned N) {
"Misaligned storage!");		"Misaligned storage!");

NumInlineBytesUsed += NBytes;		NumInlineBytesUsed += NBytes;
return reinterpret_cast<T *>(FreeSpaceStart);		return reinterpret_cast<T *>(FreeSpaceStart);
}		}

void destroyCandidates();		void destroyCandidates();

		/// Whether diagnostics should be deferred.
		bool shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args, SourceLocation OpLoc);

public:		public:
OverloadCandidateSet(SourceLocation Loc, CandidateSetKind CSK,		OverloadCandidateSet(SourceLocation Loc, CandidateSetKind CSK,
OperatorRewriteInfo RewriteInfo = {})		OperatorRewriteInfo RewriteInfo = {})
: Loc(Loc), Kind(CSK), RewriteInfo(RewriteInfo) {}		: Loc(Loc), Kind(CSK), RewriteInfo(RewriteInfo) {}
OverloadCandidateSet(const OverloadCandidateSet &) = delete;		OverloadCandidateSet(const OverloadCandidateSet &) = delete;
OverloadCandidateSet &operator=(const OverloadCandidateSet &) = delete;		OverloadCandidateSet &operator=(const OverloadCandidateSet &) = delete;
~OverloadCandidateSet() { destroyCandidates(); }		~OverloadCandidateSet() { destroyCandidates(); }

▲ Show 20 Lines • Show All 128 Lines • Show Last 20 Lines

clang/lib/Sema/SemaOverload.cpp

Show First 20 Lines • Show All 9,610 Lines • ▼ Show 20 Lines	bool clang::isBetterOverloadCandidate(
SourceLocation Loc, OverloadCandidateSet::CandidateSetKind Kind) {		SourceLocation Loc, OverloadCandidateSet::CandidateSetKind Kind) {
// Define viable functions to be better candidates than non-viable		// Define viable functions to be better candidates than non-viable
// functions.		// functions.
if (!Cand2.Viable)		if (!Cand2.Viable)
return Cand1.Viable;		return Cand1.Viable;
else if (!Cand1.Viable)		else if (!Cand1.Viable)
return false;		return false;

		// [CUDA] A function with 'never' preference is marked not viable, therefore
		// is never shown up here. The worst preference shown up here is 'wrong side',
		// e.g. a host function called by a device host function in device
		// compilation. This is valid AST as long as the host device function is not
		// emitted, e.g. it is an inline function which is called only by a host
		// function. A deferred diagnostic will be triggered if it is emitted.
		// However a wrong-sided function is still a viable candidate here.
		//
		// If Cand1 can be emitted and Cand2 cannot be emitted in the current
		// context, Cand1 is better than Cand2. If Cand1 can not be emitted and Cand2
		// can be emitted, Cand1 is not better than Cand2. This rule should have
		// precedence over other rules.
		//
		// If both Cand1 and Cand2 can be emitted, or neither can be emitted, then
		// other rules should be used to determine which is better. This is because
		// host/device based overloading resolution is mostly for determining
		// viability of a function. If two functions are both viable, other factors
		// should take precedence in preference, e.g. the standard-defined preferences
		// like argument conversion ranks or enable_if partial-ordering. The
		// preference for pass-object-size parameters is probably most similar to a
		// type-based-overloading decision and so should take priority.
		//
		// If other rules cannot determine which is better, CUDA preference will be
		// used again to determine which is better.
		//
		// TODO: Currently IdentifyCUDAPreference does not return correct values
		// for functions called in global variable initializers due to missing
		// correct context about device/host. Therefore we can only enforce this
		// rule when there is a caller. We should enforce this rule for functions
		// in global variable initializers once proper context is added.
		if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
		if (FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext)) {
		bool IsCallerImplicitHD = Sema::isCUDAImplicitHostDeviceFunction(Caller);
		bool IsCand1ImplicitHD =
		Sema::isCUDAImplicitHostDeviceFunction(Cand1.Function);
		bool IsCand2ImplicitHD =
		Sema::isCUDAImplicitHostDeviceFunction(Cand2.Function);
		auto P1 = S.IdentifyCUDAPreference(Caller, Cand1.Function);
		auto P2 = S.IdentifyCUDAPreference(Caller, Cand2.Function);
		assert(P1 != Sema::CFP_Never && P2 != Sema::CFP_Never);
		// The implicit HD function may be a function in a system header which
		// is forced by pragma. In device compilation, if we prefer HD candidates
		// over wrong-sided candidates, overloading resolution may change, which
		// may result in non-deferrable diagnostics. As a workaround, we let
		// implicit HD candidates take equal preference as wrong-sided candidates.
		// This will preserve the overloading resolution.
		auto EmitThreshold =
		(S.getLangOpts().CUDAIsDevice && IsCallerImplicitHD &&
		(IsCand1ImplicitHD || IsCand2ImplicitHD)) ||
		(!S.getLangOpts().GPUDeferDiag && P1 < Sema::CFP_SameSide &&
		P2 < Sema::CFP_SameSide)
		? Sema::CFP_Never
		: Sema::CFP_WrongSide;
		auto Cand1Emittable = P1 > EmitThreshold;
		auto Cand2Emittable = P2 > EmitThreshold;
		if (Cand1Emittable && !Cand2Emittable)
		return true;
		if (!Cand1Emittable && Cand2Emittable)
		return false;
		}
		}

// C++ [over.match.best]p1:		// C++ [over.match.best]p1:
//		//
// -- if F is a static member function, ICS1(F) is defined such		// -- if F is a static member function, ICS1(F) is defined such
// that ICS1(F) is neither better nor worse than ICS1(G) for		// that ICS1(F) is neither better nor worse than ICS1(G) for
// any function G, and, symmetrically, ICS1(G) is neither		// any function G, and, symmetrically, ICS1(G) is neither
// better nor worse than ICS1(F).		// better nor worse than ICS1(F).
unsigned StartArg = 0;		unsigned StartArg = 0;
if (Cand1.IgnoreObjectArgument || Cand2.IgnoreObjectArgument)		if (Cand1.IgnoreObjectArgument || Cand2.IgnoreObjectArgument)
▲ Show 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	bool clang::isBetterOverloadCandidate(

// Check for enable_if value-based overload resolution.		// Check for enable_if value-based overload resolution.
if (Cand1.Function && Cand2.Function) {		if (Cand1.Function && Cand2.Function) {
Comparison Cmp = compareEnableIfAttrs(S, Cand1.Function, Cand2.Function);		Comparison Cmp = compareEnableIfAttrs(S, Cand1.Function, Cand2.Function);
if (Cmp != Comparison::Equal)		if (Cmp != Comparison::Equal)
return Cmp == Comparison::Better;		return Cmp == Comparison::Better;
}		}

if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
return S.IdentifyCUDAPreference(Caller, Cand1.Function) >
S.IdentifyCUDAPreference(Caller, Cand2.Function);
}

bool HasPS1 = Cand1.Function != nullptr &&		bool HasPS1 = Cand1.Function != nullptr &&
functionHasPassObjectSizeParams(Cand1.Function);		functionHasPassObjectSizeParams(Cand1.Function);
bool HasPS2 = Cand2.Function != nullptr &&		bool HasPS2 = Cand2.Function != nullptr &&
functionHasPassObjectSizeParams(Cand2.Function);		functionHasPassObjectSizeParams(Cand2.Function);
if (HasPS1 != HasPS2 && HasPS1)		if (HasPS1 != HasPS2 && HasPS1)
return true;		return true;

Comparison MV = isBetterMultiversionCandidate(Cand1, Cand2);		auto MV = isBetterMultiversionCandidate(Cand1, Cand2);
return MV == Comparison::Better;		if (MV == Comparison::Better)
		return true;
		if (MV == Comparison::Worse)
		return false;

		// If other rules cannot determine which is better, CUDA preference is used
		// to determine which is better.
		if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
		FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
		return S.IdentifyCUDAPreference(Caller, Cand1.Function) >
		S.IdentifyCUDAPreference(Caller, Cand2.Function);
		}

		return false;
}		}

/// Determine whether two declarations are "equivalent" for the purposes of		/// Determine whether two declarations are "equivalent" for the purposes of
/// name lookup and overload resolution. This applies when the same internal/no		/// name lookup and overload resolution. This applies when the same internal/no
/// linkage entity is defined by two modules (probably by textually including		/// linkage entity is defined by two modules (probably by textually including
/// the same header). In such a case, we don't consider the declarations to		/// the same header). In such a case, we don't consider the declarations to
/// declare the same entity, but we also don't want lookups with both		/// declare the same entity, but we also don't want lookups with both
/// declarations visible to be ambiguous in some cases (this happens when using		/// declarations visible to be ambiguous in some cases (this happens when using
▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines
/// \returns The result of overload resolution.		/// \returns The result of overload resolution.
OverloadingResult		OverloadingResult
OverloadCandidateSet::BestViableFunction(Sema &S, SourceLocation Loc,		OverloadCandidateSet::BestViableFunction(Sema &S, SourceLocation Loc,
iterator &Best) {		iterator &Best) {
llvm::SmallVector<OverloadCandidate *, 16> Candidates;		llvm::SmallVector<OverloadCandidate *, 16> Candidates;
std::transform(begin(), end(), std::back_inserter(Candidates),		std::transform(begin(), end(), std::back_inserter(Candidates),
[](OverloadCandidate &Cand) { return &Cand; });		[](OverloadCandidate &Cand) { return &Cand; });

// [CUDA] HD->H or HD->D calls are technically not allowed by CUDA but
// are accepted by both clang and NVCC. However, during a particular
// compilation mode only one call variant is viable. We need to
// exclude non-viable overload candidates from consideration based
// only on their host/device attributes. Specifically, if one
// candidate call is WrongSide and the other is SameSide, we ignore
// the WrongSide candidate.
if (S.getLangOpts().CUDA) {
const FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
bool ContainsSameSideCandidate =
llvm::any_of(Candidates, [&](OverloadCandidate *Cand) {
// Check viable function only.
return Cand->Viable && Cand->Function &&
S.IdentifyCUDAPreference(Caller, Cand->Function) ==
Sema::CFP_SameSide;
});
if (ContainsSameSideCandidate) {
auto IsWrongSideCandidate = [&](OverloadCandidate *Cand) {
// Check viable function only to avoid unnecessary data copying/moving.
return Cand->Viable && Cand->Function &&
S.IdentifyCUDAPreference(Caller, Cand->Function) ==
Sema::CFP_WrongSide;
};
llvm::erase_if(Candidates, IsWrongSideCandidate);
}
}

// Find the best viable function.		// Find the best viable function.
Best = end();		Best = end();
for (auto *Cand : Candidates) {		for (auto *Cand : Candidates) {
Cand->Best = false;		Cand->Best = false;
if (Cand->Viable)		if (Cand->Viable)
if (Best == end() ||		if (Best == end() ||
isBetterOverloadCandidate(S, Cand, Best, Loc, Kind))		isBetterOverloadCandidate(S, Cand, Best, Loc, Kind))
Best = Cand;		Best = Cand;
▲ Show 20 Lines • Show All 1,627 Lines • ▼ Show 20 Lines	SmallVector<OverloadCandidate *, 32> OverloadCandidateSet::CompleteCandidates(
}		}

llvm::stable_sort(		llvm::stable_sort(
Cands, CompareOverloadCandidatesForDisplay(S, OpLoc, Args.size(), Kind));		Cands, CompareOverloadCandidatesForDisplay(S, OpLoc, Args.size(), Kind));

return Cands;		return Cands;
}		}

/// When overload resolution fails, prints diagnostic messages containing the		bool OverloadCandidateSet::shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args,
/// candidates in the candidate set.		SourceLocation OpLoc) {
void OverloadCandidateSet::NoteCandidates(PartialDiagnosticAt PD,
Sema &S, OverloadCandidateDisplayKind OCD, ArrayRef<Expr *> Args,
StringRef Opc, SourceLocation OpLoc,
llvm::function_ref<bool(OverloadCandidate &)> Filter) {

bool DeferHint = false;		bool DeferHint = false;
if (S.getLangOpts().CUDA && S.getLangOpts().GPUDeferDiag) {		if (S.getLangOpts().CUDA && S.getLangOpts().GPUDeferDiag) {
// Defer diagnostic for CUDA/HIP if there are wrong-sided candidates.		// Defer diagnostic for CUDA/HIP if there are wrong-sided candidates or
		// host device candidates.
auto WrongSidedCands =		auto WrongSidedCands =
CompleteCandidates(S, OCD_AllCandidates, Args, OpLoc, [](auto &Cand) {		CompleteCandidates(S, OCD_AllCandidates, Args, OpLoc, [](auto &Cand) {
return Cand.Viable == false &&		return (Cand.Viable == false &&
Cand.FailureKind == ovl_fail_bad_target;		Cand.FailureKind == ovl_fail_bad_target) ||
		(Cand.Function->template hasAttr<CUDAHostAttr>() &&
		Cand.Function->template hasAttr<CUDADeviceAttr>());
});		});
DeferHint = WrongSidedCands.size();		DeferHint = WrongSidedCands.size();
}		}
		return DeferHint;
		}

		/// When overload resolution fails, prints diagnostic messages containing the
		/// candidates in the candidate set.
		void OverloadCandidateSet::NoteCandidates(
		PartialDiagnosticAt PD, Sema &S, OverloadCandidateDisplayKind OCD,
		ArrayRef<Expr *> Args, StringRef Opc, SourceLocation OpLoc,
		llvm::function_ref<bool(OverloadCandidate &)> Filter) {

auto Cands = CompleteCandidates(S, OCD, Args, OpLoc, Filter);		auto Cands = CompleteCandidates(S, OCD, Args, OpLoc, Filter);

S.Diag(PD.first, PD.second, DeferHint);		S.Diag(PD.first, PD.second, shouldDeferDiags(S, Args, OpLoc));

NoteCandidates(S, Args, Cands, Opc, OpLoc);		NoteCandidates(S, Args, Cands, Opc, OpLoc);

if (OCD == OCD_AmbiguousCandidates)		if (OCD == OCD_AmbiguousCandidates)
MaybeDiagnoseAmbiguousConstraints(S, {begin(), end()});		MaybeDiagnoseAmbiguousConstraints(S, {begin(), end()});
}		}

void OverloadCandidateSet::NoteCandidates(Sema &S, ArrayRef<Expr *> Args,		void OverloadCandidateSet::NoteCandidates(Sema &S, ArrayRef<Expr *> Args,
Show All 35 Lines	else {
}		}

// If this is a viable builtin, print it.		// If this is a viable builtin, print it.
NoteBuiltinOperatorCandidate(S, Opc, OpLoc, Cand);		NoteBuiltinOperatorCandidate(S, Opc, OpLoc, Cand);
}		}
}		}

if (I != E)		if (I != E)
S.Diag(OpLoc, diag::note_ovl_too_many_candidates) << int(E - I);		S.Diag(OpLoc, diag::note_ovl_too_many_candidates,
		shouldDeferDiags(S, Args, OpLoc))
		<< int(E - I);
}		}

static SourceLocation		static SourceLocation
GetLocationForCandidate(const TemplateSpecCandidate *Cand) {		GetLocationForCandidate(const TemplateSpecCandidate *Cand) {
return Cand->Specialization ? Cand->Specialization->getLocation()		return Cand->Specialization ? Cand->Specialization->getLocation()
: SourceLocation();		: SourceLocation();
}		}

▲ Show 20 Lines • Show All 3,405 Lines • Show Last 20 Lines

clang/test/SemaCUDA/deferred-oeverload.cu

	Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines

	struct A { int x; typedef int isA; };			struct A { int x; typedef int isA; };
	struct B { int x; };			struct B { int x; };

	// This function is invalid for A and B by SFINAE.			// This function is invalid for A and B by SFINAE.
	// This fails to substitue for A but no diagnostic			// This fails to substitue for A but no diagnostic
	// should be emitted.			// should be emitted.
	template<typename T, typename T::foo* = nullptr>			template<typename T, typename T::foo* = nullptr>
	__host__ __device__ void sfinae(T t) { // com-note {{candidate template ignored: substitution failure [with T = B]}}			__host__ __device__ void sfinae(T t) { // host-note {{candidate template ignored: substitution failure [with T = B]}}
	t.x = 1;			t.x = 1;
	}			}

	// This function is defined for A only by SFINAE.			// This function is defined for A only by SFINAE.
	// Calling it with A should succeed, with B should fail.			// Calling it with A should succeed, with B should fail.
	// The error should not be deferred since it happens in			// The error should not be deferred since it happens in
	// file scope.			// file scope.

	template<typename T, typename T::isA* = nullptr>			template<typename T, typename T::isA* = nullptr>
	__host__ __device__ void sfinae(T t) { // com-note {{candidate template ignored: substitution failure [with T = B]}}			__host__ __device__ void sfinae(T t) { // host-note {{candidate template ignored: substitution failure [with T = B]}}
	t.x = 1;			t.x = 1;
	}			}

	void test_sfinae() {			void test_sfinae() {
	sfinae(A());			sfinae(A());
	sfinae(B()); // com-error{{no matching function for call to 'sfinae'}}			sfinae(B()); // host-error{{no matching function for call to 'sfinae'}}
	}			}

	// Make sure throw is diagnosed in OpenMP parallel region in host function.			// Make sure throw is diagnosed in OpenMP parallel region in host function.
	void test_openmp() {			void test_openmp() {
	#pragma omp parallel for			#pragma omp parallel for
	for (int i = 0; i < 10; i++) {			for (int i = 0; i < 10; i++) {
	throw 1;			throw 1;
	}			}
	}			}

	// If a syntax error causes a function not declared, it cannot			// If a syntax error causes a function not declared, it cannot
	// be deferred.			// be deferred.

	inline __host__ __device__ void bad_func() { // com-note {{to match this '{'}}			inline __host__ __device__ void bad_func() { // com-note {{to match this '{'}}
	// com-error {{expected '}'}}			// com-error {{expected '}'}}

clang/test/SemaCUDA/function-overload.cu

	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target
	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target

	// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify=host,expected %s			// RUN: %clang_cc1 -std=c++14 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify=host,hostdefer,devdefer,expected %s
	// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify=dev,expected %s			// RUN: %clang_cc1 -std=c++14 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify=dev,devnodeferonly,hostdefer,devdefer,expected %s
				// RUN: %clang_cc1 -fgpu-defer-diag -DDEFER=1 -std=c++14 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify=host,hostdefer,expected %s
				// RUN: %clang_cc1 -fgpu-defer-diag -DDEFER=1 -std=c++14 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify=dev,devdeferonly,devdefer,expected %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// Opaque return types used to check that we pick the right overloads.			// Opaque return types used to check that we pick the right overloads.
	struct HostReturnTy {};			struct HostReturnTy {};
	struct HostReturnTy2 {};			struct HostReturnTy2 {};
	struct DeviceReturnTy {};			struct DeviceReturnTy {};
	struct DeviceReturnTy2 {};			struct DeviceReturnTy2 {};
	[57 unchanged lines not shown]

	extern "C" __host__ int chhd2() { return 0; } // expected-note {{previous declaration is here}}			extern "C" __host__ int chhd2() { return 0; } // expected-note {{previous declaration is here}}
	extern "C" __host__ __device__ int chhd2() { return 0; }			extern "C" __host__ __device__ int chhd2() { return 0; }
	// expected-error@-1 {{__host__ __device__ function 'chhd2' cannot overload __host__ function 'chhd2'}}			// expected-error@-1 {{__host__ __device__ function 'chhd2' cannot overload __host__ function 'chhd2'}}

	// Helper functions to verify calling restrictions.			// Helper functions to verify calling restrictions.
	__device__ DeviceReturnTy d() { return DeviceReturnTy(); }			__device__ DeviceReturnTy d() { return DeviceReturnTy(); }
	// host-note@-1 1+ {{'d' declared here}}			// host-note@-1 1+ {{'d' declared here}}
	// expected-note@-2 1+ {{candidate function not viable: call to __device__ function from __host__ function}}			// hostdefer-note@-2 1+ {{candidate function not viable: call to __device__ function from __host__ function}}
	// expected-note@-3 0+ {{candidate function not viable: call to __device__ function from __host__ __device__ function}}			// expected-note@-3 0+ {{candidate function not viable: call to __device__ function from __host__ __device__ function}}

	__host__ HostReturnTy h() { return HostReturnTy(); }			__host__ HostReturnTy h() { return HostReturnTy(); }
	// dev-note@-1 1+ {{'h' declared here}}			// dev-note@-1 1+ {{'h' declared here}}
	// expected-note@-2 1+ {{candidate function not viable: call to __host__ function from __device__ function}}			// devdefer-note@-2 1+ {{candidate function not viable: call to __host__ function from __device__ function}}
	// expected-note@-3 0+ {{candidate function not viable: call to __host__ function from __host__ __device__ function}}			// expected-note@-3 0+ {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
	// expected-note@-4 1+ {{candidate function not viable: call to __host__ function from __global__ function}}			// devdefer-note@-4 1+ {{candidate function not viable: call to __host__ function from __global__ function}}

	__global__ void g() {}			__global__ void g() {}
	// dev-note@-1 1+ {{'g' declared here}}			// dev-note@-1 1+ {{'g' declared here}}
	// expected-note@-2 1+ {{candidate function not viable: call to __global__ function from __device__ function}}			// devdefer-note@-2 1+ {{candidate function not viable: call to __global__ function from __device__ function}}
	// expected-note@-3 0+ {{candidate function not viable: call to __global__ function from __host__ __device__ function}}			// expected-note@-3 0+ {{candidate function not viable: call to __global__ function from __host__ __device__ function}}
	// expected-note@-4 1+ {{candidate function not viable: call to __global__ function from __global__ function}}			// devdefer-note@-4 1+ {{candidate function not viable: call to __global__ function from __global__ function}}

	extern "C" __device__ DeviceReturnTy cd() { return DeviceReturnTy(); }			extern "C" __device__ DeviceReturnTy cd() { return DeviceReturnTy(); }
	// host-note@-1 1+ {{'cd' declared here}}			// host-note@-1 1+ {{'cd' declared here}}
	// expected-note@-2 1+ {{candidate function not viable: call to __device__ function from __host__ function}}			// hostdefer-note@-2 1+ {{candidate function not viable: call to __device__ function from __host__ function}}
	// expected-note@-3 0+ {{candidate function not viable: call to __device__ function from __host__ __device__ function}}			// expected-note@-3 0+ {{candidate function not viable: call to __device__ function from __host__ __device__ function}}

	extern "C" __host__ HostReturnTy ch() { return HostReturnTy(); }			extern "C" __host__ HostReturnTy ch() { return HostReturnTy(); }
	// dev-note@-1 1+ {{'ch' declared here}}			// dev-note@-1 1+ {{'ch' declared here}}
	// expected-note@-2 1+ {{candidate function not viable: call to __host__ function from __device__ function}}			// devdefer-note@-2 1+ {{candidate function not viable: call to __host__ function from __device__ function}}
	// expected-note@-3 0+ {{candidate function not viable: call to __host__ function from __host__ __device__ function}}			// expected-note@-3 0+ {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
	// expected-note@-4 1+ {{candidate function not viable: call to __host__ function from __global__ function}}			// devdefer-note@-4 1+ {{candidate function not viable: call to __host__ function from __global__ function}}

	__host__ void hostf() {			__host__ void hostf() {
	DeviceFnPtr fp_d = d; // host-error {{reference to __device__ function 'd' in __host__ function}}			DeviceFnPtr fp_d = d; // host-error {{reference to __device__ function 'd' in __host__ function}}
	DeviceReturnTy ret_d = d(); // expected-error {{no matching function for call to 'd'}}			DeviceReturnTy ret_d = d(); // hostdefer-error {{no matching function for call to 'd'}}
	DeviceFnPtr fp_cd = cd; // host-error {{reference to __device__ function 'cd' in __host__ function}}			DeviceFnPtr fp_cd = cd; // host-error {{reference to __device__ function 'cd' in __host__ function}}
	DeviceReturnTy ret_cd = cd(); // expected-error {{no matching function for call to 'cd'}}			DeviceReturnTy ret_cd = cd(); // hostdefer-error {{no matching function for call to 'cd'}}

	HostFnPtr fp_h = h;			HostFnPtr fp_h = h;
	HostReturnTy ret_h = h();			HostReturnTy ret_h = h();
	HostFnPtr fp_ch = ch;			HostFnPtr fp_ch = ch;
	HostReturnTy ret_ch = ch();			HostReturnTy ret_ch = ch();

	HostFnPtr fp_dh = dh;			HostFnPtr fp_dh = dh;
	HostReturnTy ret_dh = dh();			HostReturnTy ret_dh = dh();
	HostFnPtr fp_cdh = cdh;			HostFnPtr fp_cdh = cdh;
	HostReturnTy ret_cdh = cdh();			HostReturnTy ret_cdh = cdh();

	GlobalFnPtr fp_g = g;			GlobalFnPtr fp_g = g;
	g(); // expected-error {{call to global function 'g' not configured}}			g(); // expected-error {{call to global function 'g' not configured}}
	g<<<0, 0>>>();			g<<<0, 0>>>();
	}			}

	__device__ void devicef() {			__device__ void devicef() {
	DeviceFnPtr fp_d = d;			DeviceFnPtr fp_d = d;
	DeviceReturnTy ret_d = d();			DeviceReturnTy ret_d = d();
	DeviceFnPtr fp_cd = cd;			DeviceFnPtr fp_cd = cd;
	DeviceReturnTy ret_cd = cd();			DeviceReturnTy ret_cd = cd();

	HostFnPtr fp_h = h; // dev-error {{reference to __host__ function 'h' in __device__ function}}			HostFnPtr fp_h = h; // dev-error {{reference to __host__ function 'h' in __device__ function}}
	HostReturnTy ret_h = h(); // expected-error {{no matching function for call to 'h'}}			HostReturnTy ret_h = h(); // devdefer-error {{no matching function for call to 'h'}}
	HostFnPtr fp_ch = ch; // dev-error {{reference to __host__ function 'ch' in __device__ function}}			HostFnPtr fp_ch = ch; // dev-error {{reference to __host__ function 'ch' in __device__ function}}
	HostReturnTy ret_ch = ch(); // expected-error {{no matching function for call to 'ch'}}			HostReturnTy ret_ch = ch(); // devdefer-error {{no matching function for call to 'ch'}}

	DeviceFnPtr fp_dh = dh;			DeviceFnPtr fp_dh = dh;
	DeviceReturnTy ret_dh = dh();			DeviceReturnTy ret_dh = dh();
	DeviceFnPtr fp_cdh = cdh;			DeviceFnPtr fp_cdh = cdh;
	DeviceReturnTy ret_cdh = cdh();			DeviceReturnTy ret_cdh = cdh();

	GlobalFnPtr fp_g = g; // dev-error {{reference to __global__ function 'g' in __device__ function}}			GlobalFnPtr fp_g = g; // dev-error {{reference to __global__ function 'g' in __device__ function}}
	g(); // expected-error {{no matching function for call to 'g'}}			g(); // devdefer-error {{no matching function for call to 'g'}}
	g<<<0,0>>>(); // dev-error {{reference to __global__ function 'g' in __device__ function}}			g<<<0,0>>>(); // dev-error {{reference to __global__ function 'g' in __device__ function}}
	}			}

	__global__ void globalf() {			__global__ void globalf() {
	DeviceFnPtr fp_d = d;			DeviceFnPtr fp_d = d;
	DeviceReturnTy ret_d = d();			DeviceReturnTy ret_d = d();
	DeviceFnPtr fp_cd = cd;			DeviceFnPtr fp_cd = cd;
	DeviceReturnTy ret_cd = cd();			DeviceReturnTy ret_cd = cd();

	HostFnPtr fp_h = h; // dev-error {{reference to __host__ function 'h' in __global__ function}}			HostFnPtr fp_h = h; // dev-error {{reference to __host__ function 'h' in __global__ function}}
	HostReturnTy ret_h = h(); // expected-error {{no matching function for call to 'h'}}			HostReturnTy ret_h = h(); // devdefer-error {{no matching function for call to 'h'}}
	HostFnPtr fp_ch = ch; // dev-error {{reference to __host__ function 'ch' in __global__ function}}			HostFnPtr fp_ch = ch; // dev-error {{reference to __host__ function 'ch' in __global__ function}}
	HostReturnTy ret_ch = ch(); // expected-error {{no matching function for call to 'ch'}}			HostReturnTy ret_ch = ch(); // devdefer-error {{no matching function for call to 'ch'}}

	DeviceFnPtr fp_dh = dh;			DeviceFnPtr fp_dh = dh;
	DeviceReturnTy ret_dh = dh();			DeviceReturnTy ret_dh = dh();
	DeviceFnPtr fp_cdh = cdh;			DeviceFnPtr fp_cdh = cdh;
	DeviceReturnTy ret_cdh = cdh();			DeviceReturnTy ret_cdh = cdh();

	GlobalFnPtr fp_g = g; // dev-error {{reference to __global__ function 'g' in __global__ function}}			GlobalFnPtr fp_g = g; // dev-error {{reference to __global__ function 'g' in __global__ function}}
	g(); // expected-error {{no matching function for call to 'g'}}			g(); // devdefer-error {{no matching function for call to 'g'}}
	g<<<0,0>>>(); // dev-error {{reference to __global__ function 'g' in __global__ function}}			g<<<0,0>>>(); // dev-error {{reference to __global__ function 'g' in __global__ function}}
	}			}

	__host__ __device__ void hostdevicef() {			__host__ __device__ void hostdevicef() {
	DeviceFnPtr fp_d = d;			DeviceFnPtr fp_d = d;
	DeviceReturnTy ret_d = d();			DeviceReturnTy ret_d = d();
	DeviceFnPtr fp_cd = cd;			DeviceFnPtr fp_cd = cd;
	DeviceReturnTy ret_cd = cd();			DeviceReturnTy ret_cd = cd();
	#if !defined(__CUDA_ARCH__)			#if !defined(__CUDA_ARCH__)
	// expected-error@-5 {{reference to __device__ function 'd' in __host__ __device__ function}}			// expected-error@-5 {{reference to __device__ function 'd' in __host__ __device__ function}}
	// expected-error@-5 {{reference to __device__ function 'd' in __host__ __device__ function}}			// expected-error@-5 {{reference to __device__ function 'd' in __host__ __device__ function}}
	// expected-error@-5 {{reference to __device__ function 'cd' in __host__ __device__ function}}			// expected-error@-5 {{reference to __device__ function 'cd' in __host__ __device__ function}}
	// expected-error@-5 {{reference to __device__ function 'cd' in __host__ __device__ function}}			// expected-error@-5 {{reference to __device__ function 'cd' in __host__ __device__ function}}
	#endif			#endif

	HostFnPtr fp_h = h;			HostFnPtr fp_h = h;
	HostReturnTy ret_h = h();			HostReturnTy ret_h = h();
	HostFnPtr fp_ch = ch;			HostFnPtr fp_ch = ch;
	HostReturnTy ret_ch = ch();			HostReturnTy ret_ch = ch();
	#if defined(__CUDA_ARCH__)			#if defined(__CUDA_ARCH__)
	// expected-error@-5 {{reference to __host__ function 'h' in __host__ __device__ function}}			// expected-error@-5 {{reference to __host__ function 'h' in __host__ __device__ function}}
	// expected-error@-5 {{reference to __host__ function 'h' in __host__ __device__ function}}			// expected-error@-5 {{reference to __host__ function 'h' in __host__ __device__ function}}
	// expected-error@-5 {{reference to __host__ function 'ch' in __host__ __device__ function}}			// devdefer-error@-5 {{reference to __host__ function 'ch' in __host__ __device__ function}}
	// expected-error@-5 {{reference to __host__ function 'ch' in __host__ __device__ function}}			// expected-error@-5 {{reference to __host__ function 'ch' in __host__ __device__ function}}
	#endif			#endif

	CurrentFnPtr fp_dh = dh;			CurrentFnPtr fp_dh = dh;
	CurrentReturnTy ret_dh = dh();			CurrentReturnTy ret_dh = dh();
	CurrentFnPtr fp_cdh = cdh;			CurrentFnPtr fp_cdh = cdh;
	CurrentReturnTy ret_cdh = cdh();			CurrentReturnTy ret_cdh = cdh();

	[130 unchanged lines not shown]
	__device__ void test_device_calls_template_fn() {			__device__ void test_device_calls_template_fn() {
	DeviceReturnTy ret1 = template_vs_function(1.0f);			DeviceReturnTy ret1 = template_vs_function(1.0f);
	DeviceReturnTy ret2 = template_vs_function(2.0);			DeviceReturnTy ret2 = template_vs_function(2.0);
	}			}

	// If we have a mix of HD and H-only or D-only candidates in the overload set,			// If we have a mix of HD and H-only or D-only candidates in the overload set,
	// normal C++ overload resolution rules apply first.			// normal C++ overload resolution rules apply first.
	template <typename T> TemplateReturnTy template_vs_hd_function(T arg)			template <typename T> TemplateReturnTy template_vs_hd_function(T arg)
	#ifdef __CUDA_ARCH__			// devnodeferonly-note@-1{{'template_vs_hd_function<int>' declared here}}
	//expected-note@-2 {{declared here}}
	#endif
	{			{
	return TemplateReturnTy();			return TemplateReturnTy();
	}			}
	__host__ __device__ HostDeviceReturnTy template_vs_hd_function(float arg) {			__host__ __device__ HostDeviceReturnTy template_vs_hd_function(float arg) {
	return HostDeviceReturnTy();			return HostDeviceReturnTy();
	}			}

	__host__ __device__ void test_host_device_calls_hd_template() {			__host__ __device__ void test_host_device_calls_hd_template() {
	HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);			#if __CUDA_ARCH__ && DEFER
	TemplateReturnTy ret2 = template_vs_hd_function(1);			typedef HostDeviceReturnTy ExpectedReturnTy;
	#ifdef __CUDA_ARCH__			#else
	// expected-error@-2 {{reference to __host__ function 'template_vs_hd_function<int>' in __host__ __device__ function}}			typedef TemplateReturnTy ExpectedReturnTy;
	#endif			#endif
				HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
				ExpectedReturnTy ret2 = template_vs_hd_function(1);
				// devnodeferonly-error@-1{{reference to __host__ function 'template_vs_hd_function<int>' in __host__ __device__ function}}
	}			}

	__host__ void test_host_calls_hd_template() {			__host__ void test_host_calls_hd_template() {
	HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);			HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
	TemplateReturnTy ret2 = template_vs_hd_function(1);			TemplateReturnTy ret2 = template_vs_hd_function(1);
	}			}

	__device__ void test_device_calls_hd_template() {			__device__ void test_device_calls_hd_template() {
	HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);			HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
	// Host-only function template is not callable with strict call checks,			// Host-only function template is not callable with strict call checks,
	// so for device side HD function will be the only choice.			// so for device side HD function will be the only choice.
	HostDeviceReturnTy ret2 = template_vs_hd_function(1);			HostDeviceReturnTy ret2 = template_vs_hd_function(1);
	}			}
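The "normal C++ overload resolution rules apply first" part of these tests can be illustrated without any CUDA attributes. The sketch below is plain C++14. `template_vs_plain` is a hypothetical name, and the host/device side of the ranking is deliberately left out, so it shows only the ordinary template-versus-non-template rules.

```cpp
#include <cassert>
#include <type_traits>

// Opaque return types, in the style of the test file, identify the chosen overload.
struct TemplateReturnTy {};
struct HostDeviceReturnTy {};

// Hypothetical stand-ins for the template and the non-template candidate.
template <typename T> TemplateReturnTy template_vs_plain(T) { return {}; }
HostDeviceReturnTy template_vs_plain(float) { return {}; }

// float argument: both candidates match exactly, so the non-template wins.
constexpr bool float_arg_picks_non_template =
    std::is_same<decltype(template_vs_plain(1.0f)), HostDeviceReturnTy>::value;

// int argument: the template is an exact match and beats the int -> float conversion.
constexpr bool int_arg_picks_template =
    std::is_same<decltype(template_vs_plain(1)), TemplateReturnTy>::value;

static_assert(float_arg_picks_non_template, "non-template should win for float");
static_assert(int_arg_picks_template, "template should win for int");
```

This matches the host-side expectations in `test_host_calls_hd_template`. The device-side expectations differ because the host-only template is not callable there.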

	// Check that overloads still work the same way on both host and			// Check that overloads still work the same way on both host and
	// device side when the overload set contains only functions from one			// device side when the overload set contains only functions from one
	// side of compilation.			// side of compilation.
	__device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }			__device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
	__device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }			__device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
	#ifndef __CUDA_ARCH__			#ifndef __CUDA_ARCH__
	// expected-note@-3 {{'device_only_function' declared here}}			// expected-note@-3 2{{'device_only_function' declared here}}
	// expected-note@-3 {{'device_only_function' declared here}}			// expected-note@-3 2{{'device_only_function' declared here}}
	#endif			#endif
	__host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }			__host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
	__host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }			__host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
	#ifdef __CUDA_ARCH__			#ifdef __CUDA_ARCH__
	// expected-note@-3 {{'host_only_function' declared here}}			// expected-note@-3 2{{'host_only_function' declared here}}
	// expected-note@-3 {{'host_only_function' declared here}}			// expected-note@-3 2{{'host_only_function' declared here}}
	#endif			#endif

	__host__ __device__ void test_host_device_single_side_overloading() {			__host__ __device__ void test_host_device_single_side_overloading() {
	DeviceReturnTy ret1 = device_only_function(1);			DeviceReturnTy ret1 = device_only_function(1);
	DeviceReturnTy2 ret2 = device_only_function(1.0f);			DeviceReturnTy2 ret2 = device_only_function(1.0f);
	#ifndef __CUDA_ARCH__			#ifndef __CUDA_ARCH__
	// expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}			// expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
	// expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}			// expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
	#endif			#endif
	HostReturnTy ret3 = host_only_function(1);			HostReturnTy ret3 = host_only_function(1);
	HostReturnTy2 ret4 = host_only_function(1.0f);			HostReturnTy2 ret4 = host_only_function(1.0f);
	#ifdef __CUDA_ARCH__			#ifdef __CUDA_ARCH__
	// expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}			// expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
	// expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}			// expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
	#endif			#endif
	}			}

				// wrong-sided overloading should not cause diagnostic unless it is emitted.
				// This inline function is not emitted.
				inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
				DeviceReturnTy ret1 = device_only_function(1);
				DeviceReturnTy2 ret2 = device_only_function(1.0f);
				HostReturnTy ret3 = host_only_function(1);
				HostReturnTy2 ret4 = host_only_function(1.0f);
				}

				// wrong-sided overloading should cause diagnostic if it is emitted.
				// This inline function is emitted since it is called by an emitted function.
				inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
				DeviceReturnTy ret1 = device_only_function(1);
				DeviceReturnTy2 ret2 = device_only_function(1.0f);
				#ifndef __CUDA_ARCH__
				// expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
				// expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
				#endif
				HostReturnTy ret3 = host_only_function(1);
				HostReturnTy2 ret4 = host_only_function(1.0f);
				#ifdef __CUDA_ARCH__
				// expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
				// expected-error@-3 {{reference to __host__ function 'host_only_function' in __host__ __device__ function}}
				#endif
				}

				__host__ __device__ void test_host_device_wrong_side_overloading_inline_diag_caller() {
				test_host_device_wrong_side_overloading_inline_diag();
				// expected-note@-1 {{called by 'test_host_device_wrong_side_overloading_inline_diag_caller'}}
				}

	// Verify that we allow overloading function templates.			// Verify that we allow overloading function templates.
	template <typename T> __host__ T template_overload(const T &a) { return a; };			template <typename T> __host__ T template_overload(const T &a) { return a; };
	template <typename T> __device__ T template_overload(const T &a) { return a; };			template <typename T> __device__ T template_overload(const T &a) { return a; };

	__host__ void test_host_template_overload() {			__host__ void test_host_template_overload() {
	template_overload(1); // OK. Attribute-based overloading picks __host__ variant.			template_overload(1); // OK. Attribute-based overloading picks __host__ variant.
	}			}
	__device__ void test_device_template_overload() {			__device__ void test_device_template_overload() {
	[11 unchanged lines not shown]
	__host__ __device__ int constexpr_overload(const T &x, const T &y) {			__host__ __device__ int constexpr_overload(const T &x, const T &y) {
	return x - y;			return x - y;
	}			}

	// Verify that function overloading doesn't prune candidate wrongly.			// Verify that function overloading doesn't prune candidate wrongly.
	int test_constexpr_overload(C2 &x, C2 &y) {			int test_constexpr_overload(C2 &x, C2 &y) {
	return constexpr_overload(x, y);			return constexpr_overload(x, y);
	}			}

				// Verify no ambiguity for new operator.
				void *a = new int;
				__device__ void *b = new int;
				// expected-error@-1{{dynamic initialization is not supported for __device__, __constant__, and __shared__ variables.}}

				// Verify no ambiguity for new operator.
				template<typename _Tp> _Tp&& f();
				[Inline comment] tra: `__device__`, etc. are defined by the included "Inputs/cuda.h" and can be used here to make it more readable.
				template<typename _Tp, typename = decltype(new _Tp(f<_Tp>()))>
				void __test();

				void foo() {
				__test<int>();
				}

				// Test resolving implicit host device candidate vs wrong-sided candidate.
				// In device compilation, implicit host device caller choose implicit host
				// device candidate and wrong-sided candidate with equal preference.
				// Resolution result should not change with/without pragma.
				namespace ImplicitHostDeviceVsWrongSided {
				HostReturnTy callee(double x);
				#pragma clang force_cuda_host_device begin
				HostDeviceReturnTy callee(int x);
				inline HostReturnTy implicit_hd_caller() {
				return callee(1.0);
				}
				#pragma clang force_cuda_host_device end
				}

				// Test resolving implicit host device candidate vs same-sided candidate.
				// In host compilation, implicit host device caller choose implicit host
				// device candidate and same-sided candidate with equal preference.
				// Resolution result should not change with/without pragma.
				namespace ImplicitHostDeviceVsSameSide {
				HostReturnTy callee(int x);
				#pragma clang force_cuda_host_device begin
				HostDeviceReturnTy callee(double x);
				inline HostDeviceReturnTy implicit_hd_caller() {
				return callee(1.0);
				}
				#pragma clang force_cuda_host_device end
				}

				// Test resolving explicit host device candidate vs. wrong-sided candidate.
				// When -fgpu-defer-diag is off, wrong-sided candidate is not excluded, therefore
				// the first callee is chosen.
				// When -fgpu-defer-diag is on, wrong-sided candidate is excluded, therefore
				// the second callee is chosen.
				namespace ExplicitHostDeviceVsWrongSided {
				HostReturnTy callee(double x);
				__host__ __device__ HostDeviceReturnTy callee(int x);
				#if __CUDA_ARCH__ && DEFER
				typedef HostDeviceReturnTy ExpectedRetTy;
				#else
				typedef HostReturnTy ExpectedRetTy;
				#endif
				inline __host__ __device__ ExpectedRetTy explicit_hd_caller() {
				return callee(1.0);
				}
				}
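The effect described in the comment above, where excluding the wrong-sided candidate changes which `callee` is selected, can be modelled in plain C++. This is only a sketch. The two namespaces are hypothetical stand-ins for the candidate set with and without the host-only function, and they do not reproduce how clang actually ranks or prunes candidates.

```cpp
#include <cassert>
#include <type_traits>

struct HostReturnTy {};
struct HostDeviceReturnTy {};

// Model of the candidate set when the wrong-sided candidate stays visible.
namespace wrong_sided_kept {
HostReturnTy callee(double);    // stands in for the host-only candidate
HostDeviceReturnTy callee(int); // stands in for the __host__ __device__ candidate
using Result = decltype(callee(1.0));
} // namespace wrong_sided_kept

// Model of the candidate set after the wrong-sided candidate is excluded.
namespace wrong_sided_excluded {
HostDeviceReturnTy callee(int);
using Result = decltype(callee(1.0));
} // namespace wrong_sided_excluded

// With both candidates, the exact match callee(double) is chosen.
constexpr bool kept_picks_host =
    std::is_same<wrong_sided_kept::Result, HostReturnTy>::value;

// With only callee(int) left, the call resolves through a double -> int conversion.
constexpr bool excluded_picks_hd =
    std::is_same<wrong_sided_excluded::Result, HostDeviceReturnTy>::value;

static_assert(kept_picks_host, "exact match should win when both are visible");
static_assert(excluded_picks_hd, "remaining candidate should be chosen");
```

The two `Result` types correspond to the two `ExpectedRetTy` typedefs selected by `#if __CUDA_ARCH__ && DEFER` in the test.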

				// In the implicit host device function 'caller', the second 'callee' should be
				// chosen since it has better match, even though it is an implicit host device
				// function whereas the first 'callee' is a host function. A diagnostic will be
				// emitted if the first 'callee' is chosen since deduced return type cannot be
				// used before it is defined.
				namespace ImplicitHostDeviceByConstExpr {
				template <class a> a b;
				auto callee(...);
				template <class d> constexpr auto callee(d) -> decltype(0);
				struct e {
				template <class ad, class... f> static auto g(ad, f...) {
				return h<e, decltype(b<f>)...>;
				}
				struct i {
				template <class, class... f> static constexpr auto caller(f... k) {
				return callee(k...);
				}
				};
				template <class, class... f> static auto h() {
				return i::caller<int, f...>;
				}
				};
				class l {
				l() {
				e::g([] {}, this);
				}
				};
				}

				// Implicit HD candidate competes with device candidate.
				// a and b have implicit HD copy ctor. In copy ctor of b, ctor of a is resolved.
				// copy ctor of a should win over a(short), otherwise there will be ambiguity
				// due to conversion operator.
				namespace TestImplicitHDWithD {
				struct a {
				__device__ a(short);
				__device__ operator unsigned() const;
				__device__ operator int() const;
				};
				struct b {
				a d;
				};
				void f(b g) { b e = g; }
				}

				// Implicit HD candidate competes with host candidate.
				// a and b have implicit HD copy ctor. In copy ctor of b, ctor of a is resolved.
				// copy ctor of a should win over a(short), otherwise there will be ambiguity
				// due to conversion operator.
				namespace TestImplicitHDWithH {
				struct a {
				a(short);
				__device__ operator unsigned() const;
				__device__ operator int() const;
				};
				struct b {
				a d;
				};
				void f(b g) { b e = g; }
				}

				// Implicit HD candidate competes with HD candidate.
				// a and b have implicit HD copy ctor. In copy ctor of b, ctor of a is resolved.
				// copy ctor of a should win over a(short), otherwise there will be ambiguity
				// due to conversion operator.
				namespace TestImplicitHDWithHD {
				struct a {
				__host__ __device__ a(short);
				__device__ operator unsigned() const;
				__device__ operator int() const;
				};
				struct b {
				a d;
				};
				void f(b g) { b e = g; }
				}

				// HD candidate competes with H candidate.
				// HD has type mismatch whereas H has type match.
				// In device compilation, H wins when -fgpu-defer-diag is off and HD wins
				// when -fgpu-defer-diags is on. In both cases the diagnostic should be
				// deferred.
				namespace TestDeferNoMatchingFunc {
				template <typename> struct a {};
				namespace b {
				struct c : a<int> {};
				template <typename d> void ag(d);
				} // namespace b
				template <typename ae>
				__attribute__((host)) __attribute__((device))
				void ag(a<ae>) {
				ae e;
				ag(e);
				}
				void f() { (void)ag<b::c>; }
				}

				// Two HD candidates competes with H candidate.
				[Inline comment] tra: competes -> compete
				// HDs have type mismatch whereas H has type match.
				// In device compilation, H wins when -fgpu-defer-diag is off and two HD win
				[Inline comment] tra: One thing that bothers me about this comment is that `-fgpu-defer-diag` apparently changes the result of the overload resolution, not just deferring diags.
				[Inline comment] yaxunl (author): Without -fgpu-defer-diag we have to keep the old incorrect overloading resolution since otherwise it breaks existing code. We can only have correct overloading resolution with -fgpu-defer-diag on. If we want to have correct overloading resolution, not depending on whether -fgpu-defer-diag is on or off, we have to turn on -fgpu-defer-diag by default. In this case no existing code will be broken.
				[Inline comment] tra: > We can only have correct overloading resolution with -fgpu-defer-diag on.
				`-fgpu-defer-diags` is a prerequisite for fixing overload resolution. I'm fine with that. Making it serve the double duty of affecting the overload resolution is what I was pointing at. We should have a knob `fix-overload-resolution` which would then turn `-fgpu-defer-diag` on, not the other way around.
				[Inline comment] yaxunl (author): That makes sense. Will add -ffix-overload-resolution.
				// when -fgpu-defer-diags is on. In both cases the diagnostic should be
				// deferred.
				[Inline comment] tra: It would be great to have a test where those diagnostics do fire.
				namespace TestDeferAmbiguity {
				template <typename> struct a {};
				namespace b {
				struct c : a<int> {};
				template <typename d> void ag(d, int);
				} // namespace b
				template <typename ae>
				__attribute__((host)) __attribute__((device))
				void ag(a<ae>, float) {
				ae e;
				ag(e, 1);
				}
				template <typename ae>
				__attribute__((host)) __attribute__((device))
				void ag(a<ae>, double) {
				}
				void f() {
				b::c x;
				ag(x, 1);
				}
				}