This is an archive of the discontinued LLVM Phabricator instance.

[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
AbandonedPublic

Authored by tianshilei1992 on Jun 30 2021, 4:35 PM.

Details

Summary

In the device runtime there are many function calls to __kmpc_is_spmd_exec_mode
to query the execution mode of current kernels. In many cases, user programs
only contain target region executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compliation, it can potentially improve performance.

In this patch, we use AAKernelInfo to analyze kernel execution. Basically, for
each kernel (device) function F, we collect all kernel entries K that can
reach F. In each iteration, we go through all reaching kernel entries and check
their execution mode. If F can only be reached by kernel entries with same mode,
we update a map from CallBase * to Constant * to corresponding value. In
manifest stage, if any entry of the map is not nullptr, which means the function
call can be folded to the Constant *, we replace all uses of the function call
and remove it.

Later we will also add more foldable functions, such as isMainThread.

Diff Detail

Event Timeline

tianshilei1992 created this revision.Jun 30 2021, 4:35 PM
tianshilei1992 requested review of this revision.Jun 30 2021, 4:35 PM
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript

Test missing. Other than that mostly nits.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
460

Unsure, I think this way is more natural.

464
2738

use the changeValueAfterManifest interface.

3007–3008

Remove

3025
3105

Move this before the kernel handling, then you can exit if it is not a kernel. Alternatively, put the kernel handling in a helper.

3152–3153
3202

Maybe put this into a helper as well. And the "reaching kernels" update. Will make it easier to read in the future.

3209

Add a comment why we would want this (here).

tianshilei1992 marked 9 inline comments as done.

fix coments

JonChesterfield added a comment.EditedJul 1 2021, 1:25 AM

Interesting, thanks. For the functions that could be called by generic or spmd kernels, we could clone them (one for each) as (I think) we know at the entry point which it will be.

We probably want this after any transforms that try to convert generic kernels to spmd. Not sure how to order that.

jdoerfert added inline comments.Jul 1 2021, 10:54 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2735–2738
  • fixed comments
  • added a wrapper for kernel entry set up
tianshilei1992 marked an inline comment as done.Jul 1 2021, 11:22 AM
cchen added a subscriber: cchen.Jul 1 2021, 11:58 AM

fixed rebase issue

tianshilei1992 added inline comments.Jul 6 2021, 3:06 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3316

This condition should include SPMDization, but what should be used?

tianshilei1992 added inline comments.Jul 7 2021, 10:00 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
458

We probably don't need this variable at all.

Test missing.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
492–493

?

2730–2745

here we need to return changed or we need to set changed to CHANGED if changeToSPMDMode returned true.

3141

Why !assumed() ? Wouldn't that mean we only update if we have unknown reaching kernels?

3266

Early exit please.

3275

I'm not so sure about this assertion. Maybe handle the case conservatively instead.

3276–3277
tianshilei1992 marked 6 inline comments as done.

fixed comments

revmoed unrelated change

Test still missing ;)

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3141

This is better but I'm still not sure if it's correct. When we go from valid to invalid we stop updating the map, which is bad because the content is still in there.

JonChesterfield added a comment.EditedJul 8 2021, 10:57 AM

I wonder if this is separable.

Could we have an attribute, 'openmp_called_in_spmd_mode' or similar, which we apply to functions when that is always true, similar to the above logic.

We could then have a pass which, for functions with that attribute set, folds call instructions within that function appropriately.

edit: Partly thinking of splitting it to simplify testing - we can write various call trees and check the right ones pick up the attribute, and separate tests to check we fold the calls based on that attribute.

tianshilei1992 added inline comments.Jul 8 2021, 12:54 PM
llvm/test/Transforms/OpenMP/is_spmd_exec_mode_unfoldable.ll
343 ↗(On Diff #357330)

This function will not be called if we remove the way to tell if it is in SPMD mode from ident.

update foldable test

update the test case with update_test_checks

JonChesterfield added a comment.EditedJul 9 2021, 2:06 AM

Thought about this a little more. I think there's a dead-simple 90% solution available.

-For each internal global

  • If it is undef initialized
  • If it is only used by loads and stores (thus no aliasing)
  • If every store writes the same constant
  • Replace every load with that constant, then delete the stores, then delete the global

That's a very narrow load/store propagation that only applies to globals that are initially undef, which are unusual in most programs, but common where __shared__ or pteam memory is involved.

If I'm reading some disassembled IR right, it would work for the variables used to report is_spmd and is_uninitialized. For modules that only contain one type of kernel, the overall effect would be the same as the current patch, at much reduced complexity.

Should I type that up? I think we're only looking to reduce the cost of the current/legacy devicertl with this patch. When more serious aliasing analysis comes online it would obsolete the trivial IR transform above, at which point we can delete said transform and keep the verification tests. I particularly like the path to cleanly deleting the pass when it becomes obsolete.

edit: Tried to write a test case for this, but everything I've tried so far gets stripped out by the optimiser already. i.e. the is_spmd_exec_mode calls are already removed. Can you share the test case/application where this is not the case?

Thought about this a little more. I think there's a dead-simple 90% solution available.

-For each internal global

  • If it is undef initialized
  • If it is only used by loads and stores (thus no aliasing)
  • If every store writes the same constant
  • Replace every load with that constant, then delete the stores, then delete the global

And if you make it "field-sensitive" now you are almost at https://reviews.llvm.org/D104432 :)
Not sure I'd call that dead-simple but I'm with you.

That's a very narrow load/store propagation that only applies to globals that are initially undef, which are unusual in most programs, but common where __shared__ or pteam memory is involved.

If I'm reading some disassembled IR right, it would work for the variables used to report is_spmd and is_uninitialized. For modules that only contain one type of kernel, the overall effect would be the same as the current patch, at much reduced complexity.

Should I type that up? I think we're only looking to reduce the cost of the current/legacy devicertl with this patch. When more serious aliasing analysis comes online it would obsolete the trivial IR transform above, at which point we can delete said transform and keep the verification tests. I particularly like the path to cleanly deleting the pass when it becomes obsolete.

edit: Tried to write a test case for this, but everything I've tried so far gets stripped out by the optimiser already. i.e. the is_spmd_exec_mode calls are already removed. Can you share the test case/application where this is not the case?

tianshilei1992 planned changes to this revision.Jul 10 2021, 10:40 AM

Will move the change out of AAKernelInfo.

tianshilei1992 abandoned this revision.Jul 11 2021, 8:01 PM