This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Create custom state machines for generic target regions
ClosedPublic

Authored by jdoerfert on May 6 2021, 12:01 AM.

Details

Summary

In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Diff Detail

Event Timeline

jdoerfert created this revision.May 6 2021, 12:01 AM
jdoerfert requested review of this revision.May 6 2021, 12:01 AM
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
jdoerfert updated this revision to Diff 343302.May 6 2021, 12:08 AM

Fix porting mistakes

jdoerfert updated this revision to Diff 344652.May 11 2021, 9:39 PM

Add comments and tests

FWIW, here is what clang currently generates: https://godbolt.org/z/ncWq8f437
With these two patches the result is better in basically every case except the last where there is no actual improvement to be made.

jdoerfert updated this revision to Diff 345172.May 13 2021, 9:11 AM

Update tests with --function-signature to avoid check-label matching calls

LG, some remarks

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
510

Nit, concatenation?

613

Assertion?

687

skeleton

ggeorgakoudis added inline comments.May 17 2021, 4:04 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
497

Is equality comparison enough for ParallelRegions or should we change ParallelRegions refer to same function? I guess size equality should be enough since we only add ParallelRegions, but still checking.

jdoerfert added inline comments.May 17 2021, 4:38 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
497

I will check the content, it should not matter how we use it but you're right, better to plan for the future.

jdoerfert marked 3 inline comments as done.

Addressed comments
Also moved logic from the AAKernelInfoFunction::updateImpl to
AAKernelInfoCallSite::initialize as it looks cleaner to fix the
state of runtime calls right away.

jdoerfert added a comment.EditedMay 19 2021, 10:32 PM

[Wrong review -.- ]

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
510

It won't always be described properly by "concatenation". Already the |= part is on the border, once we do SPMD detection there will be a &= as well.

ggeorgakoudis accepted this revision.EditedJun 13 2021, 11:09 AM

LGTM, minor comment, fix tests?

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
795

Remove dump()?

This revision is now accepted and ready to land.Jun 13 2021, 11:09 AM
jdoerfert updated this revision to Diff 354770.Jun 27 2021, 1:53 PM
jdoerfert marked an inline comment as done.

Rebase

jdoerfert updated this revision to Diff 354774.Jun 27 2021, 2:58 PM

Update test and check for internalized functions too.

jdoerfert updated this revision to Diff 355413.Jun 29 2021, 5:42 PM

Track unknown parallel calls and add remarks for the state machine

jdoerfert updated this revision to Diff 355959.Jul 1 2021, 11:24 AM

rebase, remarks, and more tests

jhuber6 added inline comments.Jul 1 2021, 11:39 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2972

I think it should be clear that the optimization pass changed something. I feel just saying "Is executed" is a little ambiguous to what actually happened.

2996

Should this be an analysis remark? It pairs with the analysis remarks below.

jdoerfert added inline comments.Jul 1 2021, 12:04 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2972

sure.

2996

this is the matching remark to the ones above. Why would we make one transformation and the other one analysis, this is describing the transformation as the other two are.

jdoerfert updated this revision to Diff 356095.Jul 1 2021, 9:51 PM

Use a proper state to track unknown and known reached parallel regions.

tianshilei1992 added inline comments.Jul 2 2021, 8:18 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2935

Is it good to return ahead of time? In later patches we also create AAKernelInfoFunction for "regular" functions.

jdoerfert added inline comments.Jul 2 2021, 8:35 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2935

It is good to keep control flow simple. Later, as needed, we can have an initializeKernel and initializeNonKernel, for example.

tianshilei1992 added inline comments.Jul 7 2021, 9:41 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
461
tianshilei1992 added inline comments.Jul 7 2021, 10:32 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
467

If the state is invalid or in pessimistic state, do we want to clear the set?

jdoerfert added inline comments.Jul 7 2021, 12:29 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
461

size_t is probably the worst, but sure.

467

no, we want the content to emit remarks.

This revision was landed with ongoing or failed builds.Jul 10 2021, 10:33 AM
This revision was automatically updated to reflect the committed changes.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2971

I'm seeing this assert fire. The kernel contains both the init and deinit calls, but the InitRFI/DeinitRFI for each use doesn't see any uses, so StoreCallBase isn't called, and nothing is assumed to Storage. Kernel otherwise looks uninteresting, run into while trying to track down another bug.

For my reference, issue_001.c from aomp, debug clang build at trunk today, invoked as:

clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 issue_001.c -o issue_001
gets bitcode corruption, invalid record

If passed -openmp-opt-disable=true, no corruption seen. If that is used to extract amdgpu bitcode which is fed back into opt, this assert trips.

jdoerfert added inline comments.Aug 9 2021, 8:53 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
2971

Can you emit the IR (module) before openmp-opt?