This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP50]Codegen for scan directives in parallel for regions.
ClosedPublic

Authored by ABataev on Jun 9 2020, 9:35 AM.

Details

Summary

Added codegen for scan directives in parallel for regions.

Emits the code for the directive with inscan reductions.
Original code:

#pragma omp parallel for reduction(inscan, op : ...)
for() {
  <input phase>;
  #pragma omp scan (in)exclusive(...)
  <scan phase>
}

is transformed to something:

 #pragma omp parallel
{
size num_iters = <num_iters>;
<type> buffer[num_iters];
 #pragma omp for
for (i: 0..<num_iters>) {
  <input phase>;
  buffer[i] = red;
}
 #pragma omp barrier
for (int k = 0; k != ceil(log2(num_iters)); ++k)
for (size cnt = last_iter; cnt >= pow(2, k); --k)
  buffer[i] op= buffer[i-pow(2,k)];
 #pragma omp for
for (0..<num_iters>) {
  red = InclusiveScan ? buffer[i] : buffer[i-1];
  <scan phase>;
}
}

Diff Detail

Event Timeline

ABataev created this revision.Jun 9 2020, 9:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2020, 9:35 AM
jdoerfert added inline comments.Jun 16 2020, 11:34 AM
clang/lib/CodeGen/CGStmtOpenMP.cpp
3634

This looks pretty much like D81658, right? Could we avoid the duplication and maybe use a templated function (or something else)?

ABataev marked an inline comment as done.Jun 16 2020, 11:38 AM
ABataev added inline comments.
clang/lib/CodeGen/CGStmtOpenMP.cpp
3634

The duplication is quite small. Here we don't need to check for lastprivates update, we need to check for the cancellation region. I don't think that the outlined functions are going to be much simpler and easier to read.

jdoerfert added inline comments.Jun 16 2020, 5:40 PM
clang/lib/CodeGen/CGStmtOpenMP.cpp
3634

I mean, these ~25 lines will exist 3 times at least. 2 times tacking lastprivate, which we can do for the third time at no extra cost, and 2 times with the cancelation RAII. Any change will need to be duplicated 3 times as well, etc.

If we do

template<typename RAII>
void emitWorksharingHelper(..., bool &HasLastprivate)
    if (llvm::any_of(S.getClausesOfKind<OMPReductionClause>(),
                     [](const OMPReductionClause *C) {
                       return C->getModifier() == OMPC_REDUCTION_inscan;
                     })) {
      const auto &&NumIteratorsGen = [&S](CodeGenFunction &CGF) {
        OMPLocalDeclMapRAII Scope(CGF);
        OMPLoopScope LoopScope(CGF, S);
        return CGF.EmitScalarExpr(S.getNumIterations());
      };
      const auto &&FirstGen = [&S](CodeGenFunction &CGF) {
        RAII CancelRegion(CGF, OMPD_for, S.hasCancel());
        (void)CGF.EmitOMPWorksharingLoop(S, S.getEnsureUpperBound(),
                                         emitForLoopBounds,
                                         emitDispatchForLoopBounds);
        // Emit an implicit barrier at the end.
        CGF.CGM.getOpenMPRuntime().emitBarrierCall(CGF, S.getBeginLoc(),
                                                   OMPD_for);
      };
      const auto &&SecondGen = [&S, &HasLastprivate](CodeGenFunction &CGF) {
        RAII CancelRegion(CGF, OMPD_for, S.hasCancel());
        HasLastprivate = CGF.EmitOMPWorksharingLoop(S, S.getEnsureUpperBound(),
                                         emitForLoopBounds,
                                         emitDispatchForLoopBounds);
      };
      emitScanBasedDirective(CGF, S, NumIteratorsGen, FirstGen, SecondGen);
    } else {
      OMPCancelStackRAII CancelRegion(CGF, OMPD_parallel_for, S.hasCancel());
        CGF.EmitOMPWorksharingLoop(S, S.getEnsureUpperBound(), emitForLoopBounds,
                                   emitDispatchForLoopBounds);
    }

We can call it three times:

emitWorksharingHelper<OMPCancelStackRAII>(...);
emitWorksharingHelper<OMPCancelStackRAII>(...);
emitWorksharingHelper<OMPDummyRAII>(...);

Wouldn't that be cleaner?

ABataev marked an inline comment as done.Jun 17 2020, 4:36 AM
ABataev added inline comments.
clang/lib/CodeGen/CGStmtOpenMP.cpp
3634
  1. It requires the new OMPDummyRAII class.
  2. Thу functionality under control of the else branch also must be affected.
  3. We don't always need to check for HasLastprivate.
  4. The NumIteratorsGen might be different.
  5. We'll have a lot of trouble with adding new functionality to this function, especially if this functionality is unique for one construct and not required for others.
jdoerfert added inline comments.Jun 17 2020, 9:32 AM
clang/lib/CodeGen/CGStmtOpenMP.cpp
3634
  1. It requires the new OMPDummyRAII class.

Sure, is it more than this?

struct OMPDummyRAII {
  OMPDummyRAII(...) {}
};
  1. Thу functionality under control of the else branch also must be affected.

Right: sed -e/OMPCancelStackRAII/RAII/ :)

  1. We don't always need to check for HasLastprivate.

There is no cost in doing so though.

  1. The NumIteratorsGen might be different.

Might be or is? Eyeballing this it looks like 3 times exact the same code.

  1. We'll have a lot of trouble with adding new functionality to this function, especially if this functionality is unique for one construct and not required for others.

What new functionality do you expect to be unique between them? If there is stuff planned, we might not want to unify the implementation, if it is just a hunch, we want to go ahead and have a single implemenation so it is easier to read and do global modifications. Finding out that something needs special handling is at the end of the day easier than finding all other (unrelated) locations that need to be changed.

ABataev marked an inline comment as done.Jun 17 2020, 10:03 AM
ABataev added inline comments.
clang/lib/CodeGen/CGStmtOpenMP.cpp
3634

What new functionality do you expect to be unique between them? If there is stuff planned, we might not want to unify the implementation, if it is just a hunch, we want to go ahead and have a single implemenation so it is easier to read and do global modifications. Finding out that something needs special handling is at the end of the day easier than finding all other (unrelated) locations that need to be changed.

I have a different opinion here. When you will some specific implementation for one construct that is not required for other, you can easily miss it, and it might lead to unexpected results, like a compiler crash or something else. Adding step by step is much better, because you not only modify the code for the one particular construct, but also definitely will modify the tests for this construct. Having a common solution, you may easily miss something.

ABataev updated this revision to Diff 271437.Jun 17 2020, 12:03 PM

Rebase + avoid copy paste.

jdoerfert accepted this revision.Jun 18 2020, 7:09 AM

LGTM and cleaner than my sketch, thanks a lot :)

This revision is now accepted and ready to land.Jun 18 2020, 7:09 AM
This revision was automatically updated to reflect the committed changes.