Add option -ffix-overload-resolution, which is off by default.
When -ffix-overload-resolution is off, keep the original behavior.
Otherwise enable the correct hostness based overloading resolution.
Differential D80450
[CUDA][HIP] Fix HD function resolution yaxunl on May 22 2020, 10:33 AM. Authored by
Details Add option -ffix-overload-resolution, which is off by default. When -ffix-overload-resolution is off, keep the original behavior.
Diff Detail
Event TimelineComment Actions ^^^ I don't think this has been answered. I would like to test this change before it lands. Comment Actions Reproducer for the regression. https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575 With the patch, an attempt to instantiate ag on line 36 (in the reproducer sources I linked to above) results in ambiguity between two templates on lines 33 and 24 that are in different namespaces. Comment Actions Managed to simplify the reproducer down to this which now reports that a host candidate has been ignored. This may explain why we ended up with the ambiguity when other overloads were present. template <typename> struct a {}; namespace b { struct c : a<int> {}; template <typename d> void ag(d); } // namespace b template <typename ae> __attribute__((host)) __attribute__((device)) int ag(a<ae>) { ae e; ag(e); } void f() { ag<b::c>; } Comment Actions The error only happens in device compilation. For the call ag(e). There are two candidates:
Before my patch, wrong-sided candidate is allowed. clang resolves to candidate 1 and this results in a diagnostic about host function referenced in device host function, which can be deferred. Since f() is not emitted on device side, the deferred diags is not emitted. After my patch, wrong-sided candidate is not allowed. clang resolves to candidate 2, which results in a diagnostic that no matching function, which is not a deferred diagnostics by default and is emitted even if f() is not emitted on device side. In a sense, the diagnostic is correct, since ag(a<ae>) cannot be emitted on device side. This can be fixed by either make ag(a<ae>) a host function or make ag(d) a host device function. In the original testcase (https://gist.github.com/Artem-B/183e9cfc28c6b04c1c862c853b5d9575) Before my change, call at line 36 resolves to wrong-sided candidate at line 29 since that is the best match for argument types. This results in a deferred diag which allows device compilation to pass. After my change, call at line 36 resolves to two host device candidates. This results in diagnostics about ambiguity which is not deferred by default. Therefore the compilation fails. Basically it all boils down to the issue that overloading resolution diagnostics are not deferred by default. I think first of all we need to exclude wrong-sided candidates as this patch does, otherwise we cannot have correct hostness based overloading resolution and fix bugs like https://bugs.llvm.org/show_bug.cgi?id=46922 . However by doing this we changes the existing overloading resolution incur some overloading resolution diags. Unless we defer these diags, we may break some existing CUDA/HIP code. Fortunately https://reviews.llvm.org/D84364 is already landed, which allows deferring overloading resolution diags under option -fgpu-defer-diags. I think a conservative solution is that we keep the old overloading resolution behavior by default (i.e. do not exclude wrong-sided candidates), whereas enable the correct hostness based overloading resolution when -fgpu-defer-diags is on. If developers would like correct hostness based overloading resolution, they can use -fgpu-defer-diags. Then as -fgpu-defer-diags become stable, we turn it on by default. Comment Actions This used to be a fairly common pattern in existing CUDA code. A lot of templated code had __host__ __device__ slapped on it because NVCC had no target overloading and it's hard to control who/where/how will instantiate particular template. Some of those templates could only be instantiated on one side of the compilation. Clang's only choice was to allow the wrong-side candidates and/or defer the diagnostics. I vaguely recall that it was one of the trickier bits of the overload resolution rules to handle.
Looks that way.
SGTM. I'll check how the patch fares on our CUDA code.
Comment Actions Please hold on. I just found a regression due to old behavior not fully recovered in certain case. I will update the patch for fixing the regression.
Comment Actions LGTM. I'd suggest adding more details on the background of this change to the commit log (point to the comment in the isBetterOverloadCandidate ?) and outline the intention to enable the new way to do overloading after some soak time. Also, naming. -ffix-overload-resolution is rather non-specific. I didn't mean to use it literally. The problem is that I can't think of a good descriptive name for what we do here. -fgpu-fix-wrong-side-overloads ? Something else?
Comment Actions Will do.
How about -fgpu-exclude-wrong-side-overloads? Since what this patch does is always excluding wrong side overloads whereas previously only excluding wrong side overloads if there are same side overloads.
|
The comment uses device/host for both function attributes and when it refers to the compilation phase. It would help to make it more readable if function attributes would be distinct from compilation phase. E.g. by using __host__ __device__ or HD.