This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Abstract parallel IR analyses & optimizations + OpenMP implementations
Needs Review, Public

Authored by jdoerfert on May 23 2018, 4:16 PM.
This revision needs review, but there are no reviewers specified.

Summary

This patch is part of an RFC to add an abstract parallel IR interface
that allows us to analyze and optimize parallel codes in different
representations.

The relationship of the parts contained in this initial commit is shown
below. The attribute annotator transformation pass will query the
abstract parallel region and communication info interfaces to determine
if communicated values can be tagged as no-alias, no-capture, readnone,
or readonly. If so, this is done through the abstract ParallelIR/Builder
interface. Both the analysis parts and the builder interface are
implemented for the OpenMP KMPC runtime library call representation that
is used by clang.

      Optimization         Analysis/Transformation           Implementation
---------------------------------------------------------------------------
                     /---> ParallelRegionInfo (A) ---------|-> KMPCImpl (A)
                     |                                     |
AttributeAnnotator --|---> ParallelCommunicationInfo (A) --/
                     |
                     \---> ParallelIR/Builder (T) -----------> KMPCImpl (T)

In addition to the attribute annotator, we have four more parallel IR
specific optimizations that achieve high speedups for Rodinia OpenMP
benchmarks (see [0]). However, to keep this first commit simple, only
a simplified form of our attribute annotator was included.

[0] http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf

Event Timeline

jdoerfert created this revision. May 23 2018, 4:16 PM
jdoerfert edited the summary of this revision. May 23 2018, 4:18 PM
jdoerfert removed a subscriber: llvm-commits.

I've added a few specific comments, but I think that you should move forward with the associated RFC.

As a general point, I wonder how much of the logic here that recognizes kmp_* functions could be replaced with attributes/metadata on the functions themselves. If everything needed could be done this way, then it could perhaps be done for other runtime functions used by other frontends without optimizer modifications, and moreover, maybe we could use it for C++ lambdas "for free." I realize that this might apply to what is used for this attribute-propagation logic but might not hold for other parallelism-aware optimizations (e.g., barrier removal, region fusion, etc.). Nevertheless, it might be worthwhile even if only useful for information propagation.

include/llvm/Analysis/ParallelIR/RegionInfo.h
38

I wonder if there's any value in using the term 'Captured' instead of 'Communicated' here. This seems very similar to the concept of captured values in C++, and so using that term might be helpful.

In general, I wonder how much of this can be generalized to handle C++ lambdas.

220

Given that you have subclass ids in the parent class definition, it seems like we have a closed class hierarchy, and so we might as well use LLVM's isa/dyn_cast classof-based system (as that's more efficient than making virtual function calls when doing typeof-like testing).
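To make the suggestion concrete, here is a minimal stand-alone sketch of the classof-based pattern. It deliberately avoids the LLVM headers: `isa_sketch` and `dyn_cast_sketch` are simplified stand-ins for `llvm::isa<>` and `llvm::dyn_cast<>`, and the class and enum names are hypothetical, not taken from the patch.

```cpp
#include <cassert>

// Hypothetical closed hierarchy with a kind tag stored in the base class.
// None of these names come from the patch under review.
class ParallelRegion {
public:
  enum RegionKind { RK_KMPC, RK_Embedded };

  explicit ParallelRegion(RegionKind K) : Kind(K) {}
  RegionKind getKind() const { return Kind; }

private:
  const RegionKind Kind;
};

class KMPCRegion : public ParallelRegion {
public:
  KMPCRegion() : ParallelRegion(RK_KMPC) {}
  // The kind tag answers "is this a KMPCRegion?" without a virtual call.
  static bool classof(const ParallelRegion *R) {
    return R->getKind() == RK_KMPC;
  }
};

class EmbeddedRegion : public ParallelRegion {
public:
  EmbeddedRegion() : ParallelRegion(RK_Embedded) {}
  static bool classof(const ParallelRegion *R) {
    return R->getKind() == RK_Embedded;
  }
};

// Simplified stand-in for llvm::isa<>: dispatches to To::classof.
template <typename To, typename From> bool isa_sketch(const From *V) {
  return To::classof(V);
}

// Simplified stand-in for llvm::dyn_cast<>: returns nullptr on a kind mismatch.
template <typename To, typename From> const To *dyn_cast_sketch(const From *V) {
  return isa_sketch<To>(V) ? static_cast<const To *>(V) : nullptr;
}
```

With the real LLVM utilities, only the kind enum and the `classof` members would be needed; the two templates above exist solely to keep the sketch self-contained.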

lib/Transforms/ParallelIR/AttributeAnnotator.cpp
19

When you add support for the new pass manager, can you make it a CGSCC pass there?

219

I think that I understand why you're doing this, but it really deserves a comment. Also, I don't think that it gives you all of what you want. Even an identified function local could have been captured into a global, and that global could be accessed in the parallel code. If you intend to rule out aliasing via that channel, you also need to explicitly ensure that the value is not captured before the dispatching call site. You can do that with PointerMayBeCapturedBefore.

The trick is that if you have multiple successive dispatch calls, you need the first one not to capture everything, since that would inhibit the transformation for later parallel-region dispatches. I'm guessing that propagating the nocapture attribute will do this.
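The aliasing channel described above can be illustrated with a small sequential sketch. All names are hypothetical, and `regionBody` only stands in for an outlined parallel region body; no runtime library is involved.

```cpp
#include <cassert>

// Hypothetical example; none of these names come from the patch.
static int *GlobalAlias = nullptr;

// Stands in for an outlined parallel region body: it receives the
// communicated pointer as a parameter but also touches a global.
static int regionBody(int *Communicated) {
  *Communicated = 1;
  // If Communicated were (wrongly) treated as no-alias, an optimizer could
  // assume this store cannot modify *Communicated and fold the return to 1.
  if (GlobalAlias)
    *GlobalAlias = 2;
  return *Communicated;
}

int dispatchWithEscapedLocal() {
  int Local = 0;        // an identified function-local object
  GlobalAlias = &Local; // captured into a global before the dispatch
  int Result = regionBody(&Local);
  GlobalAlias = nullptr; // do not leave a dangling pointer behind
  return Result;
}

int dispatchWithoutEscape() {
  int Local = 0;
  GlobalAlias = nullptr; // nothing escaped before the dispatch
  return regionBody(&Local);
}
```

In the first dispatch the parameter and the global refer to the same object, so tagging the communicated value as no-alias would be unsound even though `Local` is an identified function-local object. Only the second dispatch is safe to annotate, which is the distinction a capture-before-call check is meant to establish.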

jdoerfert marked an inline comment as done. Jun 6 2018, 4:42 AM

Thanks for these initial comments. Second revision and actual RFC mail is coming.

include/llvm/Analysis/ParallelIR/RegionInfo.h
38

I wonder if there's any value in using the term 'Captured' instead of 'Communicated' here. This seems very similar to the concept of captured values in C++, and so using that term might be helpful.

The problem I have with "captured" is that it is supposed to be either "by-value" or "by-reference". Neither conveys the direction in which information flows, which is important here. However, I do not have a strong opinion on the naming scheme I used, and if people have arguments for changing any of the names, I'm not opposed.

In general, I wonder how much of this can be generalized to handle C++ lambdas.

I think it should generalize quite nicely. There are three reasons for this interface. First, it provides a unified API for both "outlined regions", e.g., OpenMP runtime calls, and "embedded regions", e.g., Tapir/IntelPIR. While the former provides parameter/argument attributes (readonly, writeonly, ...), we can use an analysis to determine the communication patterns for the latter too. Second, it helps to overcome the indirection that separates "outlined regions" from their original source location. This basically means this interface maps runtime library call arguments to parameters of the outlined function. Third, this interface can hide indirection through a struct, as used, for example, by the GOMP or pthreads libraries. The last two points in particular should be interesting for lambdas too if they get passed as callbacks.
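The second and third points can be pictured with a small sketch of an outlined region whose communicated values travel through a struct. Everything here is hypothetical: `fakeForkCall` is a sequential stand-in and does not reproduce the signatures of the KMPC, GOMP, or pthreads entry points.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime "fork" entry point. A real runtime
// would invoke Fn on several threads; this sketch calls it exactly once.
using OutlinedFn = void (*)(void *);
void fakeForkCall(OutlinedFn Fn, void *Arg) { Fn(Arg); }

// Values communicated into the region, packed into a struct. The caller
// only sees a void* at the call site, which hides the access pattern.
struct RegionPayload {
  const int *Input; // only read inside the region
  int *Output;      // only written inside the region
  int N;            // passed by value
};

// The outlined region body: separated from its original source location
// and reachable only through the function pointer given to the runtime.
void outlinedBody(void *Raw) {
  auto *P = static_cast<RegionPayload *>(Raw);
  for (int I = 0; I < P->N; ++I)
    P->Output[I] = 2 * P->Input[I];
}

int runRegion() {
  int In[3] = {1, 2, 3};
  int Out[3] = {0, 0, 0};
  RegionPayload P{In, Out, 3};
  fakeForkCall(outlinedBody, &P); // the "runtime library call"
  return Out[0] + Out[1] + Out[2];
}
```

An interface of the kind described would have to relate the `&P` argument at the call site to the `Raw` parameter of `outlinedBody`, and then look through the struct fields to see that `Input` is only read and `Output` is only written.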

220

I want to minimize the casting to subclasses of the parallel region as much as possible but since it is not always avoidable I will look into adopting the isa/dyn_cast system here.

lib/Transforms/ParallelIR/AttributeAnnotator.cpp
19

Currently, both pass managers are supported. The scheme is the same as for most of LLVM currently. Once these parallel IR passes drop support for the old pass manager, changing to a CGSCC pass should not be a problem (if I understood Chandler correctly).

jdoerfert updated this revision to Diff 150110. Jun 6 2018, 4:45 AM

Fix capture problem and small improvements

jdoerfert updated this revision to Diff 150111. Jun 6 2018, 4:50 AM

Fix spelling and improve comments

jdoerfert edited the summary of this revision. Jun 6 2018, 6:04 AM
Herald added a project: Restricted Project. Nov 10 2020, 1:46 PM