Page MenuHomePhabricator

OpenMP for loop fusion
Needs ReviewPublic

Authored by abidmalikwaterloo on Feb 28 2020, 1:35 PM.

Details

Reviewers
jdoerfert
Summary

The patch combine two openmp for loop with static scheduling strategy having the same parameters.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 1:35 PM

This patch *does not* perform loop fusion. Nor do we want it to. We want to deduplicate OpenMP runtime calls that are involved in OpenMP worksharing loops. Please make this clear in the commit message and description. The latter should go into more detail what is happening here and why.

There are various problems wrt. the coding standards. I will only comment on a few but the patch needs to be updated according to the coding standard and the surrounding code patterns.

Why are there no tests?


The code is complicated and hard to read, especially given the lack of comments and documentation. Please read through my inline comments. Afterwards, I suggest you start by

  1. identifying call sites as the surrounding code does it, then
  2. filter call sites that are not legible on their own, then
  3. for each call site, find a matching one, if found, replace and rewrite and try again.

You should have test cases that are developed as part of the patch. You should also put early versions for review so you can get feedback.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
130

As mentioned in the main message, this does not perform loop fusion. It is a more complex deduplication of runtime calls that may enable loop fusion but for the sake of this patch and this code it is just runtime call deduplication.

171

While I am pretty sure the entire map construction is overly complicated and should be replaced, you can achieve the above with something like:

for (const auto &It : m)
  if (llvm::any_of(It.second, [I](CallInst *CI) { return I == CI;}))
    return true;
return false;
224

Why does it always return false?

230

Please use modern C++. LLVM is currently based on C++14, for years we allow range based loops:
for (Instruction *I : instructions(F)) {

233

Please look take a look at deduplicateRuntimeCalls and deleteParallelRegions. Their we utilize a caching system to find calls to OpenMP runtime functions. It is important that we do not cause a constant overhead in this pass by scanning instructions multiple times.

242

Since you are interested in special stores only it would make sense to look for them explicitly, e.g., by following the uses of arguments passed into the __kmpc_for_static_init_4 call.

248

Most of these functions lack documentation and all of them lack comments explaining the code.
The names need to be adjusted wrt. the coding standard *and* to make them expressive. For example, "clean" is nothing I can associate with an action.

251

We cannot have stray debug output.

abidmalikwaterloo marked an inline comment as done.

You need to squash your commits or create the diff against the proper base commit. More comments inlined but there is a lot more to comment on.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
151

The example above is generally nice but maybe we should not print values in parallel, just initialize two arrays.

Style: Clang format all code and comments (=the entire file). Also adjust the indention in the example.

153

Nit: two spaces, if you write sentences add the '.'.

166

Changed should never be set to false once it is true. However, only because you do that this works. You need a local changed or just if(...) somewhere.

178

Having a function like this is often a side-effect of improper data-structures. If you need to cache all calls that have been looked at, add a SmallPtrSet<CallInst*, 8> to do so instead of iterating over all vectors that are keys in a map.

191–193

Please use llvm data structures and give variables proper names.

228

This function is quadratic in the number of __kmpc_static_init4 calls.

250–254

This class provides a way to access/iterate over all call sites of a known OpenMP runtime call. Please use that.

309

The formatting of this function makes it really hard to read it. Please clang-format your patch before uploading it.

309

Please do not iterate over the entire function, basically ever. Start with the things you are interested in and go from there. Iterating over the entire function is at some point costly and always wasteful.

313

No commented out code.

jdoerfert requested changes to this revision.Mar 19 2020, 10:29 PM

This doesn't seem to use dominance at all. How do you handle

if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
}
This revision now requires changes to proceed.Mar 19 2020, 10:29 PM

Hmm. Random question: are these optimizations friendly to the openmp tooling?

Hmm. Random question: are these optimizations friendly to the openmp tooling?

Not yet but it's on my list. The plan was to start improving this once we have an "intrusive" transformation, e.g., parallel region merging I have downstream. As of now, I expect us to emit remarks that a tool can use to match the code and events seen at runimte with the input. Similarly, we need to deal with source location changes, update source locations when we merge calls, etc. (there is a TODO in OpenMPOpt already). Feedback on this is very welcome. I also wanted the input of @jmellorcrummey on that.

abidmalikwaterloo marked 7 inline comments as done and an inline comment as not done.EditedApr 1 2020, 4:07 PM

This doesn't seem to use dominance at all. How do you handle

if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
}

if the value of a is known at the time of compilation, we will only have one "for loop". Therefore, for this specific case, the implemented technique will see only one "for loop" at the IR level. This case should be a problem.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
151

done

153

not sure what does this means. I should add ".. at the end of each comment line????

166

Changed variable is confused with the Changed variable within the run() function. Changed it to local "changed=false"

313

done

abidmalikwaterloo marked 3 inline comments as done.Apr 7 2020, 9:32 AM

This doesn't seem to use dominance at all. How do you handle

if (a) {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
} else {
#pragma omp for
for (int i = 0; i < 10; i++)
  ;
}

if the value of a is known at the time of compilation, we will only have one "for loop". Therefore, for this specific case, the implemented technique will see only one "for loop" at the IR level. This case should be a problem.

This can be done if we collapse the loops within basic blocks. Then this will make the two for loops independent of each other.

Any update on this?

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
153

You should a dot, ".", at the end of a full sentence.

Can you please diff this against the master branch (any commit). You updated it with a single commit against the old version of this patch. You should probably just merge all you have into a single commit for now.

openmp/runtime/src/kmp_sched.cpp
568

leftover

Also clang format the patch please.