This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Pass ExecConfig through BuildCallToMemberFunction
ClosedPublic

Authored by tra on Aug 26 2021, 2:12 PM.

Details

Summary

Otherwise, we fail to compile calls to CUDA kernels that are static members.

Diff Detail

Unit TestsFailed

Event Timeline

tra created this revision.Aug 26 2021, 2:12 PM
tra requested review of this revision.Aug 26 2021, 2:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2021, 2:12 PM
jlebar accepted this revision.Aug 26 2021, 2:18 PM

Thanks, Art.

This revision is now accepted and ready to land.Aug 26 2021, 2:18 PM

need a test for non-template static member function as kernel. also need codegen tests.

tra updated this revision to Diff 369140.Aug 27 2021, 10:47 AM
tra edited the summary of this revision. (Show Details)

Added more tests

tra added a comment.Aug 27 2021, 11:01 AM

need a test for non-template static member function as kernel. also need codegen tests.

I've added more tests for different code paths leading to the kernel call. Interestingly enough, only the a0 actually calls BuildCallToMemberFunction. Other variants go through different code paths that handle the call without it.

As for the codegen, all kernel calls that make it to clang::Sema::BuildResolvedCallExpr with launch config are handled the same way. I don't think codegen tests will buy us much.

I am concerned that there may be more places which need handling, and passing exec config expr by function arguments may not scale. Is it possible to represent the kernel call expr by a derived class of call expr and add the exec config expr as member to it?

tra added a comment.Sep 14 2021, 10:15 AM

I am concerned that there may be more places which need handling, and passing exec config expr by function arguments may not scale.
Is it possible to represent the kernel call expr by a derived class of call expr and add the exec config expr as member to it?

I don't think it's worth it.

This config pass-through code has been around from the very early days of attempting to implement CUDA and we're already passing it around during call resolution.
AFAICT, this particular place was a relatively new addition which didn't implement the pass-through of the config.

While there may be other places where a similar issue may happen in the future (or exists as a corner case we didn't find yet), it/when we run into it, it will be diagnosed, as it was in this case.
It took us few years until we ran into this one. I'm pretty sure that this particular code path is pretty rare and the patch is not going to have a measurable impact on compiler performance.

yaxunl accepted this revision.Sep 15 2021, 10:17 AM

LGTM. Thanks.

This revision was landed with ongoing or failed builds.Sep 16 2021, 11:19 AM
This revision was automatically updated to reflect the committed changes.