This is an archive of the discontinued LLVM Phabricator instance.

Differential D154300

[CUDA][HIP] Fix template argument deduction
ClosedPublic

Authored by yaxunl on Jul 2 2023, 6:44 AM.

Details

Reviewers

tra
rsmith

Commits

rGea72a4e6547f: [CUDA][HIP] Fix template argument deduction

Summary

nvcc allows using std::malloc and std::free in device code.
When std::malloc or std::free is passed as a function argument
to a template, so that the template argument has to be deduced,
nvcc emits no diagnostics. For example:

#include <cstdlib>
#include <memory>

__global__ void kern() {
    void *p = std::malloc(1);
    std::free(p);
}

int main()
{
    std::shared_ptr<float> a;
    a = std::shared_ptr<float>(
      (float*)std::malloc(sizeof(float) * 100),
      std::free
    );
    return 0;
}

However, the same code fails to compile with clang
(https://godbolt.org/z/1roGvo6YY). The reason is that,
during template argument deduction, clang has no logic
for choosing a function argument from a set of overloaded
candidates based on their host/device attributes.

Clang already has logic for choosing a candidate based on
the candidates' constraints. This patch extends that logic
to take the CUDA host/device preference into account.
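
The selection rule described above can be sketched outside of clang as a small standalone model. This is only an illustration under stated assumptions: `Candidate`, `compareCudaPreference`, `moreConstrained`, and `pickCandidate` are hypothetical names, the integer `cudaPreference` stands in for whatever preference clang computes for the current caller, and the integer `constraintRank` is a toy replacement for the real constraint partial order.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for an overload candidate; not a clang type.
struct Candidate {
  int id;
  int cudaPreference; // assumed: higher means more preferred by the caller
  int constraintRank; // toy model: higher means more constrained
};

// Positive if a is preferred over b, negative if worse, 0 if equal.
int compareCudaPreference(const Candidate &a, const Candidate &b) {
  return a.cudaPreference - b.cudaPreference;
}

// Toy partial order: nullopt means neither candidate is more constrained.
std::optional<bool> moreConstrained(const Candidate &a, const Candidate &b) {
  if (a.constraintRank == b.constraintRank)
    return std::nullopt;
  return a.constraintRank > b.constraintRank;
}

// Returns the id of the single best candidate, or nullopt if ambiguous.
std::optional<int> pickCandidate(const std::vector<Candidate> &cands) {
  const Candidate *result = nullptr;
  bool ambiguous = false;
  std::vector<const Candidate *> skipped;

  for (const Candidate &c : cands) {
    if (result) {
      // Host/device preference is checked first.
      int pref = compareCudaPreference(c, *result);
      if (pref != 0) {
        if (pref > 0) {
          result = &c;
          ambiguous = false;
        }
        continue;
      }
      // Equal preference: fall back to comparing constraints.
      std::optional<bool> more = moreConstrained(c, *result);
      if (!more) {
        ambiguous = true;
        skipped.push_back(&c);
        continue;
      }
      if (!*more)
        continue;
    }
    result = &c;
    ambiguous = false;
  }

  if (ambiguous || !result)
    return std::nullopt;

  // A skipped candidate only matters if it has the same preference as the
  // final result and is still unordered with respect to it.
  for (const Candidate *s : skipped) {
    if (compareCudaPreference(*s, *result) != 0)
      continue;
    if (!moreConstrained(*s, *result))
      return std::nullopt;
  }
  return result->id;
}
```

In this model, two candidates with equal preference and unordered constraints yield no result, while a later candidate with a strictly higher preference replaces the current result and clears the earlier ambiguity.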

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. Jul 2 2023, 6:44 AM

Herald added a project: Restricted Project. Jul 2 2023, 6:44 AM

Herald added subscribers: mattd, carlosgalvezp.

yaxunl requested review of this revision. Jul 2 2023, 6:44 AM

Harbormaster completed remote builds in B242680: Diff 536586. Jul 2 2023, 7:40 AM

tra added inline comments. Jul 11 2023, 1:49 PM

clang/lib/Sema/SemaOverload.cpp
12758–12764	Maybe `CheckCUDAPreference` should return -1/0/1 or an enum. std::optional does not seem to be very readable here. E.g. `if(MorePreferableByCUDA)` sounds like it's going to be satisfied when FD is a better choice than Result, but it's not the case.

I think this would be easier to follow:

  if (CheckCUDAPreference(FD, Result) <= 0) // or `!= CP_BETTER`
    continue;

yaxunl marked an inline comment as done. Aug 7 2023, 12:06 PM

yaxunl added inline comments.

clang/lib/Sema/SemaOverload.cpp
12758–12764	will use an integer for that.
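
The readability point in this exchange can be illustrated with a standalone sketch of a three-way integer comparison. The enum `Pref` and the helpers `comparePreference` and `shouldReplace` are hypothetical names chosen for this illustration; they are not clang APIs, and the three levels are an assumption rather than clang's actual preference scale.

```cpp
#include <cassert>

// Hypothetical preference levels, ordered from least to most preferred.
enum class Pref { Low = 0, Medium = 1, High = 2 };

// Positive if a is better than b, negative if worse, 0 if equal.
int comparePreference(Pref a, Pref b) {
  return static_cast<int>(a) - static_cast<int>(b);
}

// At the call site, the sign of the result states the intent directly:
// replace the current choice only when the candidate is strictly better.
bool shouldReplace(Pref candidate, Pref current) {
  return comparePreference(candidate, current) > 0;
}
```

Compared with a `std::optional<bool>` result, the sign of the integer distinguishes better, worse, and equal without a separate "no value" state that the reader has to interpret.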

Revised per review comments.

tra accepted this revision. Aug 7 2023, 12:39 PM

This revision is now accepted and ready to land. Aug 7 2023, 12:39 PM

Harbormaster completed remote builds in B250882: Diff 547901. Aug 7 2023, 4:54 PM

This revision was landed with ongoing or failed builds. Aug 8 2023, 2:40 PM

Closed by commit rGea72a4e6547f: [CUDA][HIP] Fix template argument deduction (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rGea72a4e6547f: [CUDA][HIP] Fix template argument deduction.

Herald added a project: Restricted Project. Aug 8 2023, 2:40 PM

Revision Contents

Path                                             Size
clang/lib/Sema/SemaOverload.cpp                  41 lines
clang/test/SemaCUDA/template-arg-deduction.cu    27 lines

Diff 548359

clang/lib/Sema/SemaOverload.cpp

[12,749 lines not shown] Sema::ResolveAddressOfOverloadedFunction(Expr *AddressOfExpr,
   }

   if (pHadMultipleCandidates)
     *pHadMultipleCandidates = Resolver.hadMultipleCandidates();
   return Fn;
 }

 /// Given an expression that refers to an overloaded function, try to
 /// resolve that function to a single function that can have its address taken.
 /// This will modify `Pair` iff it returns non-null.
 ///
 /// This routine can only succeed if from all of the candidates in the overload
 /// set for SrcExpr that can have their addresses taken, there is one candidate
 /// that is more constrained than the rest.
 FunctionDecl *
 Sema::resolveAddressOfSingleOverloadCandidate(Expr *E, DeclAccessPair &Pair) {
   OverloadExpr::FindResult R = OverloadExpr::find(E);
   OverloadExpr *Ovl = R.Expression;
   bool IsResultAmbiguous = false;
   FunctionDecl *Result = nullptr;
   DeclAccessPair DAP;
   SmallVector<FunctionDecl *, 2> AmbiguousDecls;

+  // Return positive for better, negative for worse, 0 for equal preference.
+  auto CheckCUDAPreference = [&](FunctionDecl *FD1, FunctionDecl *FD2) {
+    FunctionDecl *Caller = getCurFunctionDecl(/*AllowLambda=*/true);
+    return static_cast<int>(IdentifyCUDAPreference(Caller, FD1)) -
+           static_cast<int>(IdentifyCUDAPreference(Caller, FD2));
+  };
+
   auto CheckMoreConstrained = [&](FunctionDecl *FD1,
                                   FunctionDecl *FD2) -> std::optional<bool> {
     if (FunctionDecl *MF = FD1->getInstantiatedFromMemberFunction())
       FD1 = MF;
     if (FunctionDecl *MF = FD2->getInstantiatedFromMemberFunction())
       FD2 = MF;
     SmallVector<const Expr *, 1> AC1, AC2;
     FD1->getAssociatedConstraints(AC1);
[14 lines not shown] Sema::resolveAddressOfSingleOverloadCandidate(Expr *E, DeclAccessPair &Pair) {
   for (auto I = Ovl->decls_begin(), E = Ovl->decls_end(); I != E; ++I) {
     auto *FD = dyn_cast<FunctionDecl>(I->getUnderlyingDecl());
     if (!FD)
       return nullptr;

     if (!checkAddressOfFunctionIsAvailable(FD))
       continue;

+    // If we found a better result, update Result.
+    auto FoundBetter = [&]() {
+      IsResultAmbiguous = false;
+      DAP = I.getPair();
+      Result = FD;
+    };
+
     // We have more than one result - see if it is more constrained than the
     // previous one.
     if (Result) {
+      // Check CUDA preference first. If the candidates have different CUDA
+      // preference, choose the one with higher CUDA preference. Otherwise,
+      // choose the one with more constraints.
+      if (getLangOpts().CUDA) {
+        int PreferenceByCUDA = CheckCUDAPreference(FD, Result);
+        // FD has different preference than Result.
+        if (PreferenceByCUDA != 0) {
+          // FD is more preferable than Result.
+          if (PreferenceByCUDA > 0)
+            FoundBetter();
+          continue;
+        }
+      }
+      // FD has the same CUDA preference as Result. Continue to check
+      // constraints.
       std::optional<bool> MoreConstrainedThanPrevious =
           CheckMoreConstrained(FD, Result);
       if (!MoreConstrainedThanPrevious) {
         IsResultAmbiguous = true;
         AmbiguousDecls.push_back(FD);
         continue;
       }
       if (!*MoreConstrainedThanPrevious)
         continue;
       // FD is more constrained - replace Result with it.
     }
-    IsResultAmbiguous = false;
-    DAP = I.getPair();
-    Result = FD;
+    FoundBetter();
   }

   if (IsResultAmbiguous)
     return nullptr;

   if (Result) {
     SmallVector<const Expr *, 1> ResultAC;
     // We skipped over some ambiguous declarations which might be ambiguous with
     // the selected result.
-    for (FunctionDecl *Skipped : AmbiguousDecls)
+    for (FunctionDecl *Skipped : AmbiguousDecls) {
+      // If skipped candidate has different CUDA preference than the result,
+      // there is no ambiguity. Otherwise check whether they have different
+      // constraints.
+      if (getLangOpts().CUDA && CheckCUDAPreference(Skipped, Result) != 0)
+        continue;
       if (!CheckMoreConstrained(Skipped, Result))
         return nullptr;
+    }
     Pair = DAP;
   }
   return Result;
 }

 /// Given an overloaded function, tries to turn it into a non-overloaded
 /// function reference using resolveAddressOfSingleOverloadCandidate. This
 /// will perform access checks, diagnose the use of the resultant decl, and, if
[2,887 lines not shown]

clang/test/SemaCUDA/template-arg-deduction.cu

This file was added.

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s

// expected-no-diagnostics

#include "Inputs/cuda.h"

void foo();
__device__ void foo();

template<class F>
void host_temp(F f);

template<class F>
__device__ void device_temp(F f);

void host_caller() {
  host_temp(foo);
}

__global__ void kernel_caller() {
  device_temp(foo);
}

__device__ void device_caller() {
  device_temp(foo);
}