Download Raw Diff

Details

Reviewers

Commits

rGacb6f80d96b7: [CUDA][HIP] Fix overloading resolution
rG263390d4f5f2: [CUDA][HIP] Fix implicit HD function resolution

Summary

Add option -ffix-overload-resolution, which is off by default.

When -ffix-overload-resolution is off, keep the original behavior.
Otherwise enable the correct hostness based overloading resolution.

Diff Detail

Event Timeline

yaxunl created this revision.May 22 2020, 10:33 AM

Is this patch supposed to be used with D79526 or instead of it?

clang/test/SemaCUDA/function-overload.cu
551	`__device__`, etc. are defined by the included "Inputs/cuda.h" and can be used here to make it more readable.

Fix test.

ping

In D80450#2055463, @tra wrote:

Is this patch supposed to be used with D79526 or instead of it?

^^^ I don't think this has been answered. I would like to test this change before it lands.

In D80450#2071696, @tra wrote:

In D80450#2055463, @tra wrote:

Is this patch supposed to be used with D79526 or instead of it?

^^^ I don't think this has been answered. I would like to test this change before it lands.

sorry I missed that. Yes this patch is used on top of D79526.

LGTM. Combined with D79526 it appears to work for tensorflow build.

This revision is now accepted and ready to land.Jun 3 2020, 12:45 PM

Closed by commit rG263390d4f5f2: [CUDA][HIP] Fix implicit HD function resolution (authored by yaxunl). · Explain WhyJun 4 2020, 2:24 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2020, 2:24 PM

MaskRay added a reverting change: rGdfc0d9475556: Revert D80450 "[CUDA][HIP] Fix implicit HD function resolution".Jun 10 2020, 5:48 PM

MaskRay mentioned this in rGb3d10920e134: Restore part of D80450 [CUDA][HIP] Fix implicit HD function resolution.Jun 10 2020, 10:39 PM

Reproducer for the regression. https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575
It's not particularly small, but that's as far as I could get it reduced.

With the patch, an attempt to instantiate ag on line 36 (in the reproducer sources I linked to above) results in ambiguity between two templates on lines 33 and 24 that are in different namespaces.
Previously it picked the template on line 28.

In D80450#2087938, @tra wrote:

Reproducer for the regression. https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575
It's not particularly small, but that's as far as I could get it reduced.

With the patch, an attempt to instantiate ag on line 36 (in the reproducer sources I linked to above) results in ambiguity between two templates on lines 33 and 24 that are in different namespaces.
Previously it picked the template on line 28.

Managed to simplify the reproducer down to this which now reports that a host candidate has been ignored. This may explain why we ended up with the ambiguity when other overloads were present.

template <typename> struct a {};
namespace b {
struct c : a<int> {};
template <typename d> void ag(d);
} // namespace b
template <typename ae>
__attribute__((host)) __attribute__((device)) int ag(a<ae>) {
  ae e;
  ag(e);
}
void f() { ag<b::c>; }

In D80450#2088129, @tra wrote:
In D80450#2087938, @tra wrote:

Reproducer for the regression. https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575
It's not particularly small, but that's as far as I could get it reduced.

With the patch, an attempt to instantiate ag on line 36 (in the reproducer sources I linked to above) results in ambiguity between two templates on lines 33 and 24 that are in different namespaces.
Previously it picked the template on line 28.

Managed to simplify the reproducer down to this which now reports that a host candidate has been ignored. This may explain why we ended up with the ambiguity when other overloads were present.
template <typename> struct a {};
namespace b {
struct c : a<int> {};
template <typename d> void ag(d);
} // namespace b
template <typename ae>
__attribute__((host)) __attribute__((device)) int ag(a<ae>) {
  ae e;
  ag(e);
}
void f() { ag<b::c>; }

The error only happens in device compilation.

For the call ag(e). There are two candidates:

ag in namespace b. The function arguments can match. However it is a host function, therefore is a wrong-sided candidate and not viable.

ag in default name space. It is a host device function. However the function arguments requires a<ae>, therefore cannot match.

Before my patch, wrong-sided candidate is allowed. clang resolves to candidate 1 and this results in a diagnostic about host function referenced in device host function, which can be deferred. Since f() is not emitted on device side, the deferred diags is not emitted.

After my patch, wrong-sided candidate is not allowed. clang resolves to candidate 2, which results in a diagnostic that no matching function, which is not a deferred diagnostics by default and is emitted even if f() is not emitted on device side.

In a sense, the diagnostic is correct, since ag(a<ae>) cannot be emitted on device side. This can be fixed by either make ag(a<ae>) a host function or make ag(d) a host device function.

In the original testcase (https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575)

Before my change, call at line 36 resolves to wrong-sided candidate at line 29 since that is the best match for argument types. This results in a deferred diag which allows device compilation to pass.

After my change, call at line 36 resolves to two host device candidates. This results in diagnostics about ambiguity which is not deferred by default. Therefore the compilation fails.

Basically it all boils down to the issue that overloading resolution diagnostics are not deferred by default.

I think first of all we need to exclude wrong-sided candidates as this patch does, otherwise we cannot have correct hostness based overloading resolution and fix bugs like https://bugs.llvm.org/show_bug.cgi?id=46922 .

However by doing this we changes the existing overloading resolution incur some overloading resolution diags. Unless we defer these diags, we may break some existing CUDA/HIP code.

Fortunately https://reviews.llvm.org/D84364 is already landed, which allows deferring overloading resolution diags under option -fgpu-defer-diags.

I think a conservative solution is that we keep the old overloading resolution behavior by default (i.e. do not exclude wrong-sided candidates), whereas enable the correct hostness based overloading resolution when -fgpu-defer-diags is on. If developers would like correct hostness based overloading resolution, they can use -fgpu-defer-diags. Then as -fgpu-defer-diags become stable, we turn it on by default.

reopen for fixing the regression

This revision is now accepted and ready to land.Nov 26 2020, 8:43 AM

If -fgpu-defer-diags is off, keep original behavior.

In D80450#2418775, @yaxunl wrote:

For the call ag(e). There are two candidates:

ag in namespace b. The function arguments can match. However it is a host function, therefore is a wrong-sided candidate and not viable.

ag in default name space. It is a host device function. However the function arguments requires a<ae>, therefore cannot match.

Before my patch, wrong-sided candidate is allowed. clang resolves to candidate 1 and this results in a diagnostic about host function referenced in device host function, which can be deferred. Since f() is not emitted on device side, the deferred diags is not emitted.

This used to be a fairly common pattern in existing CUDA code. A lot of templated code had __host__ __device__ slapped on it because NVCC had no target overloading and it's hard to control who/where/how will instantiate particular template. Some of those templates could only be instantiated on one side of the compilation. Clang's only choice was to allow the wrong-side candidates and/or defer the diagnostics. I vaguely recall that it was one of the trickier bits of the overload resolution rules to handle.

After my patch, wrong-sided candidate is not allowed. clang resolves to candidate 2, which results in a diagnostic that no matching function, which is not a deferred diagnostics by default and is emitted even if f() is not emitted on device side.
...
Basically it all boils down to the issue that overloading resolution diagnostics are not deferred by default.

Looks that way.

I think a conservative solution is that we keep the old overloading resolution behavior by default (i.e. do not exclude wrong-sided candidates), whereas enable the correct hostness based overloading resolution when -fgpu-defer-diags is on. If developers would like correct hostness based overloading resolution, they can use -fgpu-defer-diags. Then as -fgpu-defer-diags become stable, we turn it on by default.

SGTM. I'll check how the patch fares on our CUDA code.

clang/test/SemaCUDA/function-overload.cu
617	competes->compete

tra added inline comments.Nov 30 2020, 12:05 PM

clang/test/SemaCUDA/function-overload.cu
619	One thing that bothers me about this comment is that `-fgpu-defer-diag` apparently changes the result of the overload resolution, not just deferring diags.
620–621	It would be great to have a test where those diagnostics do fire.

In D80450#2423706, @tra wrote:

SGTM. I'll check how the patch fares on our CUDA code.

Please hold on. I just found a regression due to old behavior not fully recovered in certain case. I will update the patch for fixing the regression.

yaxunl marked an inline comment as done.Nov 30 2020, 12:35 PM

yaxunl added inline comments.

clang/test/SemaCUDA/function-overload.cu
619	without -fgpu-defer-diag we have to keep the old incorrect overloading resolution since otherwise it breaks existing code. We can only have correct overloading resolution with -fgpu-defer-diag on. If we want to have correct overloading resolution, not depending on whether -fgpu-defer-diag is on or off, we have to turn on -fgpu-defer-diag by default. In this case no existing code will be broken.

tra added inline comments.Nov 30 2020, 1:04 PM

clang/test/SemaCUDA/function-overload.cu
619	We can only have correct overloading resolution with -fgpu-defer-diag on. `-fgpu-defer-diags` is a prerequisite for fixing overload resolution. I'm fine with that. Making it serve the double duty of affecting the overload resolution is what I was pointing at. We should have a knob `fix-overload-resolution` which would then turn `-fgpu-defer-diag` on, not the other way around.

yaxunl marked 2 inline comments as done.Nov 30 2020, 1:17 PM

yaxunl added inline comments.

clang/test/SemaCUDA/function-overload.cu
619	That makes sense. Will add -ffix-overload-resolution.

Add -ffix-overload-resolution and fix a regression.

Herald added subscribers: dexonsmith, dang. · View Herald TranscriptNov 30 2020, 7:11 PM

LGTM.

I'd suggest adding more details on the background of this change to the commit log (point to the comment in the isBetterOverloadCandidate ?) and outline the intention to enable the new way to do overloading after some soak time.

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

clang/lib/Sema/SemaOverload.cpp
9490	The comment uses device/host for both function attributes and when it refers to the compilation phase. It would help to make it more readable if function attributes would be distinct from compilation phase. E.g. by using `__host__ __device__` or `HD`.

In D80450#2426507, @tra wrote:

LGTM.

I'd suggest adding more details on the background of this change to the commit log (point to the comment in the isBetterOverloadCandidate ?) and outline the intention to enable the new way to do overloading after some soak time.

Will do.

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

How about -fgpu-exclude-wrong-side-overloads? Since what this patch does is always excluding wrong side overloads whereas previously only excluding wrong side overloads if there are same side overloads.

clang/lib/Sema/SemaOverload.cpp
9490	will use H/D/HD for function attribute when committing.

In D80450#2428631, @yaxunl wrote:

Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?

How about -fgpu-exclude-wrong-side-overloads? Since what this patch does is always excluding wrong side overloads whereas previously only excluding wrong side overloads if there are same side overloads.

SGTM. Maybe, also make it hidden. I don't think it's useful for the end users.

Closed by commit rGacb6f80d96b7: [CUDA][HIP] Fix overloading resolution (authored by yaxunl). · Explain WhyDec 2 2020, 1:34 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked an inline comment as done.

yaxunl added a commit: rGacb6f80d96b7: [CUDA][HIP] Fix overloading resolution.

Diff 265767

clang/lib/Sema/SemaOverload.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 9,481 Lines • ▼ Show 20 Lines	bool clang::isBetterOverloadCandidate(
// functions.		// functions.
if (!Cand2.Viable)		if (!Cand2.Viable)
return Cand1.Viable;		return Cand1.Viable;
else if (!Cand1.Viable)		else if (!Cand1.Viable)
return false;		return false;

// [CUDA] A function with 'never' preference is marked not viable, therefore		// [CUDA] A function with 'never' preference is marked not viable, therefore
// is never shown up here. The worst preference shown up here is 'wrong side',		// is never shown up here. The worst preference shown up here is 'wrong side',
// e.g. a host function called by a device host function in device		// e.g. a host function called by a device host function in device
		traUnsubmitted Done Reply Inline Actions The comment uses device/host for both function attributes and when it refers to the compilation phase. It would help to make it more readable if function attributes would be distinct from compilation phase. E.g. by using `__host__ __device__` or `HD`. tra: The comment uses device/host for both function attributes and when it refers to the compilation…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions will use H/D/HD for function attribute when committing. yaxunl: will use H/D/HD for function attribute when committing.
// compilation. This is valid AST as long as the host device function is not		// compilation. This is valid AST as long as the host device function is not
// emitted, e.g. it is an inline function which is called only by a host		// emitted, e.g. it is an inline function which is called only by a host
// function. A deferred diagnostic will be triggered if it is emitted.		// function. A deferred diagnostic will be triggered if it is emitted.
// However a wrong-sided function is still a viable candidate here.		// However a wrong-sided function is still a viable candidate here.
//		//
// If Cand1 can be emitted and Cand2 cannot be emitted in the current		// If Cand1 can be emitted and Cand2 cannot be emitted in the current
// context, Cand1 is better than Cand2. If Cand1 can not be emitted and Cand2		// context, Cand1 is better than Cand2. If Cand1 can not be emitted and Cand2
// can be emitted, Cand1 is not better than Cand2. This rule should have		// can be emitted, Cand1 is not better than Cand2. This rule should have
Show All 30 Lines	if (FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext)) {
// is forced by pragma. In device compilation, if we prefer HD candidates		// is forced by pragma. In device compilation, if we prefer HD candidates
// over wrong-sided candidates, overloading resolution may change, which		// over wrong-sided candidates, overloading resolution may change, which
// may result in non-deferrable diagnostics. As a workaround, we let		// may result in non-deferrable diagnostics. As a workaround, we let
// implicit HD candidates take equal preference as wrong-sided candidates.		// implicit HD candidates take equal preference as wrong-sided candidates.
// This will preserve the overloading resolution.		// This will preserve the overloading resolution.
auto EmitThreshold =		auto EmitThreshold =
(S.getLangOpts().CUDAIsDevice && IsCallerImplicitHD &&		(S.getLangOpts().CUDAIsDevice && IsCallerImplicitHD &&
(IsCand1ImplicitHD \|\| IsCand2ImplicitHD))		(IsCand1ImplicitHD \|\| IsCand2ImplicitHD))
? Sema::CFP_HostDevice		? Sema::CFP_Never
: Sema::CFP_WrongSide;		: Sema::CFP_WrongSide;
auto Cand1Emittable = P1 > EmitThreshold;		auto Cand1Emittable = P1 > EmitThreshold;
auto Cand2Emittable = P2 > EmitThreshold;		auto Cand2Emittable = P2 > EmitThreshold;
if (Cand1Emittable && !Cand2Emittable)		if (Cand1Emittable && !Cand2Emittable)
return true;		return true;
if (!Cand1Emittable && Cand2Emittable)		if (!Cand1Emittable && Cand2Emittable)
return false;		return false;
}		}
▲ Show 20 Lines • Show All 5,401 Lines • Show Last 20 Lines

clang/test/SemaCUDA/function-overload.cu

Show First 20 Lines • Show All 535 Lines • ▼ Show 20 Lines	struct e {
}		}
};		};
class l {		class l {
l() {		l() {
e::g([] {}, this);		e::g([] {}, this);
}		}
};		};
}		}

		// Implicit HD candidate competes with device candidate.
		// a and b have implicit HD copy ctor. In copy ctor of b, ctor of a is resolved.
		// copy ctor of a should win over a(short), otherwise there will be ambiguity
		// due to conversion operator.
		namespace TestImplicitHDWithD {
		struct a {
		__attribute__((device)) a(short);
		traUnsubmitted Done Reply Inline Actions `__device__`, etc. are defined by the included "Inputs/cuda.h" and can be used here to make it more readable. tra: `__device__`, etc. are defined by the included "Inputs/cuda.h" and can be used here to make it…
		__attribute__((device)) operator unsigned() const;
		__attribute__((device)) operator int() const;
		};
		struct b {
		a d;
		};
		void f(b g) { b e = g; }
		}

		// Implicit HD candidate competes with host candidate.
		// a and b have implicit HD copy ctor. In copy ctor of b, ctor of a is resolved.
		// copy ctor of a should win over a(short), otherwise there will be ambiguity
		// due to conversion operator.
		namespace TestImplicitHDWithH {
		struct a {
		a(short);
		__attribute__((device)) operator unsigned() const;
		__attribute__((device)) operator int() const;
		};
		struct b {
		a d;
		};
		void f(b g) { b e = g; }
		}

		// Implicit HD candidate comptes with HD candidate.
		// a and b have implicit HD copy ctor. In copy ctor of b, ctor of a is resolved.
		// copy ctor of a should win over a(short), otherwise there will be ambiguity
		// due to conversion operator.
		namespace TestImplicitHDWithHD {
		struct a {
		__attribute__((host,device)) a(short);
		__attribute__((device)) operator unsigned() const;
		__attribute__((device)) operator int() const;
		};
		struct b {
		a d;
		};
		void f(b g) { b e = g; }
		}
		traUnsubmitted Done Reply Inline Actions competes->compete tra: competes->compete
		traUnsubmitted Done Reply Inline Actions One thing that bothers me about this comment is that `-fgpu-defer-diag` apparently changes the result of the overload resolution, not just deferring diags. tra: One thing that bothers me about this comment is that `-fgpu-defer-diag` apparently changes the…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions without -fgpu-defer-diag we have to keep the old incorrect overloading resolution since otherwise it breaks existing code. We can only have correct overloading resolution with -fgpu-defer-diag on. If we want to have correct overloading resolution, not depending on whether -fgpu-defer-diag is on or off, we have to turn on -fgpu-defer-diag by default. In this case no existing code will be broken. yaxunl: without -fgpu-defer-diag we have to keep the old incorrect overloading resolution since…
		traUnsubmitted Not Done Reply Inline Actions We can only have correct overloading resolution with -fgpu-defer-diag on. `-fgpu-defer-diags` is a prerequisite for fixing overload resolution. I'm fine with that. Making it serve the double duty of affecting the overload resolution is what I was pointing at. We should have a knob `fix-overload-resolution` which would then turn `-fgpu-defer-diag` on, not the other way around. tra: > We can only have correct overloading resolution with -fgpu-defer-diag on. `-fgpu-defer…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions That makes sense. Will add -ffix-overload-resolution. yaxunl: That makes sense. Will add -ffix-overload-resolution.
		traUnsubmitted Done Reply Inline Actions It would be great to have a test where those diagnostics do fire. tra: It would be great to have a test where those diagnostics do fire.

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix HD function resolution
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 265767

clang/lib/Sema/SemaOverload.cpp

clang/test/SemaCUDA/function-overload.cu

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix HD function resolutionClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 265767

clang/lib/Sema/SemaOverload.cpp

clang/test/SemaCUDA/function-overload.cu

[CUDA][HIP] Fix HD function resolution
ClosedPublic