- Clang follows its own scheme for lambdas which don't need to follow ODR rule. That scheme will assign an unqiue ID within the TU scope and won't be unique or consistent across TUs.
- In CUDA/HIP, a lambda with __device__ or __host__ __device__ (or an extended lambda) may be used in __global__ template function instantiation. If that lambda cannot be named following ODR rule, the device compilation may produce a mismatching device kernel name from the host compilation as the anonymous type ID assignment aforementioned.
- In this patch, a new language option, -fcuda-force-lambda-odr, is introduced to force ODR for lambda naming so that all lambda could be consistently named across TUs, including the device compilation. This solves the assertion checking device kernel names as well as ensures the named-based resolution could resolve the correct device binaries from the device name generated in the host compilation.
So, in short, what you're saying is that lambda type may leak into the mangled name of a __global__ function and ne need to ensure that the mangled name is identical for both host and device, hence the need for consistent naming of lambdas.
If that's the case, shouldn't it be enabled for CUDA/HIP by default? While it's not frequently used ATM, it is something we do want to work correctly all the time. The failure to do so results in weird runtime failures that would be hard to debug for end-users.
@rsmith -- are there any downsides having this enabled all the time?
yeah, we should ensure consistent naming by default. But, I want to hear more suggestion and comment before making that option by default. To more specific, as that option forces all naming of lambda to follow ODR rule. For non-__device__ lambda, even though there is no code quality change, we do add overhead for the compiler itself, as the additional records, though that should be negligible. A potential solution is to record the ODR context for parent lambdas and re-number them if the inner lambda is found as __device__ one.
However, I do like the straight-forward and extremely simple solution of this patch to force all lambda naming following ODR, there is no code quality change and, potentially slight, FE overhead. What's your thought?
BTW, I am also working on similar issues in unnamed class/struct/union. But, so far, we didn't found any workloads broken due to that and want to address that in another patch.
I think this is the wrong way to handle this issue. We need to give lambdas a mangling if they occur in functions for which there can be definitions in multiple translation units. In regular C++ code, that's inline functions and function template specializations, so that's what we're currently checking for. CUDA adds more cases (in particular, __host__ __device__ functions, plus anything else that can be emitted for multiple targets), so we should additionally check for those cases when determining whether to number lambdas. I don't see any need for a flag to control this behavior.
I agree that this's a temporary solution to fix the issue. But, the real tricky part is that, once we found a __device__ lambda, we need to ensure all the enclosing scopes should be named following ODR as well just as the case illustrated in the test case. In fact, it's not the outer lambda (not annotated with __device__ nor within an inline function.) not being named in ODR. The tricky issue is that, so far, we don't maintain a context to add mangling back if we found an inner one needs to follow ODR. We have to add that before we could do that on-demand. I was working on that but it would take more efforts of review.
That's also the motivation why this change adds a option to guard this behavior.