This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Folding threadLimit and numThreads when single value in kernels
ClosedPublic

Authored by josemonsalve2 on Jul 14 2021, 8:05 PM.

Details

Summary

The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced AAFoldRuntimeCall and the NumTeams and
NumThreads kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Diff Detail

Event Timeline

josemonsalve2 created this revision.Jul 14 2021, 8:05 PM
josemonsalve2 requested review of this revision.Jul 14 2021, 8:05 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 14 2021, 8:05 PM

The test is breaking the compiler. I need to change it to see if it passes or not.

Fixing syntax, removing unnecessary code. Changing a break
to a indicatePessimisticFixpoint()

Just a comment before reviewing the patch: please don't rebase the patch as D105787 has been reverted.

Just a comment before reviewing the patch: please don't rebase the patch as D105787 has been reverted.

ACK

Renaming the functions and adding to OMPKinds.def et al looks separate to folding them, should land that first. Makes the functional diff easier to read and reduces the amount of churn if the functional change needs to be backed out.

openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
98 ↗(On Diff #358964)

git-clang-format HEAD^ may be useful for things like this

openmp/libomptarget/deviceRTLs/common/support.h
53 ↗(On Diff #358964)

Noinline seems bad. Why? Also looks like the functions tagged noinline here are different to the ones that are renamed.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
97 ↗(On Diff #358964)

rename the functions in amdgcn/src/target_impl.hip too please

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3651

I haven't read through this part, but if we can only fold it to a constant sometimes, we shouldn't mark the calls that survive noinline, as that'll be expensive for the cases that this pass misses.

jdoerfert retitled this revision from Folding threadLimit and numThreads when single value in kernels to [OpenMP] Folding threadLimit and numThreads when single value in kernels.Jul 20 2021, 8:26 AM
jdoerfert added inline comments.Jul 20 2021, 8:27 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3651

We are about to remove noinline from known runtime functions such that we can keep them around until we get to OpenMP-Opt as calls. This will have the effect we want without any drawbacks. Thus, adding noinline in the runitme will be totally fine.

tianshilei1992 added inline comments.Jul 20 2021, 2:35 PM
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
209

Probably worth a separate patch to add them.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3669

No need to have else here because the code above already returns.

3672

no need for else as well

3673

Directly indicate pessimistic state because we don't know clearly what the number is.

3683

This has to be updated as SimplifiedValue is not part of BooleanState. Refer to foldIsSPMDExecMode.

3689

Update accordingly

3705

This part needs to be changed. Refer to the trunk for more details. Basically it should be IRPosition::callsite_returned(*CI).

openmp/libomptarget/deviceRTLs/common/src/reduction.cu
200 ↗(On Diff #358964)

Since this part is moved to another patch, they should go in this patch.

Modifying to adapt to most of the comments already in here. Will provide more detail soon. Tests are still failing.

Ups, missed one comment from Shilei

josemonsalve2 marked 7 inline comments as done and an inline comment as not done.Jul 22 2021, 8:21 PM
josemonsalve2 added inline comments.
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
209

Do you mean just to add these 4 lines? Or should I add these lines to the front end patch D106298?

jdoerfert added inline comments.Jul 22 2021, 9:47 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1777–1778

copy and paste.

3661
if (!CallerKernelInfoAA.ReachingKernelEntries.isValid())
  return indicatePessimisticFixpoint();

also below

3672

use early exit instead,
if (!...)

return ...
3681

no reaching kernels is fine, keep it at none (which is the default) and just return UNCHANGED.

3682

make this function take the string attribute such that we can have a single one and not two.

3702

leftover. (and you can also just pipe an IRPosition into errs())

  • Created a single function receiving a string instead of one per attribute: foldKernelFnAttribute
  • Removed pesimisticFixpoint if no kernel is found.
  • Adding Check for the ReachingKernelEntries valid state
  • Removed unnecesary comments
josemonsalve2 marked 6 inline comments as done.Jul 23 2021, 7:24 AM

LGTM with one nit. @jdoerfert WDYT?

llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
209

It's better to add them to the patch where they are introduced.

My test is still failing, but it fails on an assertion on the changeToSPMD method:

opt: /home/josem/postdoc/llvm_stuff/llvm-project/llvm/lib/Transforms/IPO/OpenMPOpt.cpp:2898: bool (anonymous namespace)::AAKernelInfoFunction::changeToSPMDMode(llvm::Attributor &): Assertion `ExecMode && "Kernel without exec mode?"' failed.

I'm trying to debug it, but if it is obvious for any of you let me know if it is something I am doing or I am just triggering another bug

jdoerfert accepted this revision.Jul 25 2021, 9:08 AM

Some minor notes, assuming the tests passes properly after, LGTM

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3652
3674

You should set CurrentAttrValue = NextAttrValue at the end of the loop.

3679
llvm/test/Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
7

you need a variable KERNEL_NAME_exec_mode per kernel here. Look at other tests or what clang generates.

This revision is now accepted and ready to land.Jul 25 2021, 9:08 AM

Fixing final comments from @jdoerfert

josemonsalve2 marked 7 inline comments as done.Jul 26 2021, 11:10 AM

Can we add the pass that removes NOINLINE from these functions before landing this one?

Can we add the pass that removes NOINLINE from these functions before landing this one?

I landed D106482 which does this. One problem is that this was getting run on the .bc libraries and removing it prematurely. @tianshilei1992 was going to make a patch to stop running OpenMPOpt when we build the libraries for this.

@tianshilei1992 was going to make a patch to stop running OpenMPOpt when we build the libraries for this.

It's already done. ;-)

JonChesterfield accepted this revision.Jul 26 2021, 11:52 AM

Ah, good. I wasn't tagged on D106482 and missed the emails about it.

Rebasing to main

Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2021, 4:41 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Rebasing to main this time for real

This revision was landed with ongoing or failed builds.Jul 27 2021, 6:47 PM
This revision was automatically updated to reflect the committed changes.