Page MenuHomePhabricator

OpenMP for loop fusion
Needs RevisionPublic

Authored by abidmalikwaterloo on Feb 28 2020, 1:35 PM.



The patch combine two openmp for loop with static scheduling strategy having the same parameters.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 1:35 PM

This patch *does not* perform loop fusion. Nor do we want it to. We want to deduplicate OpenMP runtime calls that are involved in OpenMP worksharing loops. Please make this clear in the commit message and description. The latter should go into more detail what is happening here and why.

There are various problems wrt. the coding standards. I will only comment on a few but the patch needs to be updated according to the coding standard and the surrounding code patterns.

Why are there no tests?

The code is complicated and hard to read, especially given the lack of comments and documentation. Please read through my inline comments. Afterwards, I suggest you start by

  1. identifying call sites as the surrounding code does it, then
  2. filter call sites that are not legible on their own, then
  3. for each call site, find a matching one, if found, replace and rewrite and try again.

You should have test cases that are developed as part of the patch. You should also put early versions for review so you can get feedback.


As mentioned in the main message, this does not perform loop fusion. It is a more complex deduplication of runtime calls that may enable loop fusion but for the sake of this patch and this code it is just runtime call deduplication.


While I am pretty sure the entire map construction is overly complicated and should be replaced, you can achieve the above with something like:

for (const auto &It : m)
  if (llvm::any_of(It.second, [I](CallInst *CI) { return I == CI;}))
    return true;
return false;

Why does it always return false?


Please use modern C++. LLVM is currently based on C++14, for years we allow range based loops:
for (Instruction *I : instructions(F)) {


Please look take a look at deduplicateRuntimeCalls and deleteParallelRegions. Their we utilize a caching system to find calls to OpenMP runtime functions. It is important that we do not cause a constant overhead in this pass by scanning instructions multiple times.


Since you are interested in special stores only it would make sense to look for them explicitly, e.g., by following the uses of arguments passed into the __kmpc_for_static_init_4 call.


Most of these functions lack documentation and all of them lack comments explaining the code.
The names need to be adjusted wrt. the coding standard *and* to make them expressive. For example, "clean" is nothing I can associate with an action.


We cannot have stray debug output.

abidmalikwaterloo marked an inline comment as done.

You need to squash your commits or create the diff against the proper base commit. More comments inlined but there is a lot more to comment on.


The example above is generally nice but maybe we should not print values in parallel, just initialize two arrays.

Style: Clang format all code and comments (=the entire file). Also adjust the indention in the example.


Nit: two spaces, if you write sentences add the '.'.


Changed should never be set to false once it is true. However, only because you do that this works. You need a local changed or just if(...) somewhere.


Having a function like this is often a side-effect of improper data-structures. If you need to cache all calls that have been looked at, add a SmallPtrSet<CallInst*, 8> to do so instead of iterating over all vectors that are keys in a map.


Please use llvm data structures and give variables proper names.


This function is quadratic in the number of __kmpc_static_init4 calls.


This class provides a way to access/iterate over all call sites of a known OpenMP runtime call. Please use that.


The formatting of this function makes it really hard to read it. Please clang-format your patch before uploading it.


Please do not iterate over the entire function, basically ever. Start with the things you are interested in and go from there. Iterating over the entire function is at some point costly and always wasteful.


No commented out code.

jdoerfert requested changes to this revision.Thu, Mar 19, 10:29 PM

This doesn't seem to use dominance at all. How do you handle

if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
This revision now requires changes to proceed.Thu, Mar 19, 10:29 PM

Hmm. Random question: are these optimizations friendly to the openmp tooling?

Hmm. Random question: are these optimizations friendly to the openmp tooling?

Not yet but it's on my list. The plan was to start improving this once we have an "intrusive" transformation, e.g., parallel region merging I have downstream. As of now, I expect us to emit remarks that a tool can use to match the code and events seen at runimte with the input. Similarly, we need to deal with source location changes, update source locations when we merge calls, etc. (there is a TODO in OpenMPOpt already). Feedback on this is very welcome. I also wanted the input of @jmellorcrummey on that.