This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add capture for threadprivate variables used in copyin clause
ClosedPublic

Authored by sfantao on Jul 21 2015, 1:46 PM.

Details

Summary

This patch aims at fixing the regression reported in https://llvm.org/bugs/show_bug.cgi?id=24120. The problem is exposed when TLS is used to implement OpenMP threadprivate directive. This patch is not final, but I'd like to gather some feedback.

The problem is described in this code snippet:

int a = 0;
#pragma omp threadprivate(a)

foo() {
  a = 1;

  #pragma omp parallel copyin(a)
  {
    // a lookup is made by each thread to obtain the value of `a` in the master thread.
    // in the TLS implementation this maps to taking the value of the global, so each thread will get its own private copy ,and not the master's, hence the error.

    // will print 0 for all threads other than master
    printf("%d\n",a);
  }

}

This patch aims at fixing the problem by doing:

int a = 0;
#pragma omp threadprivate(a)

foo() {
  a = 1;

  // capture a reference to `a` here and pass it to the outlined function
  #pragma omp parallel copyin(a)
  {
    // use the reference passed to the outlined function to load the master value and copy it to the private copy of `a`, local to the thread. 

    // will print 1 for all threads
    printf("%d\n",a);
  }

}

Notwithstanding this fixes https://llvm.org/bugs/show_bug.cgi?id=24120, this change seems profitable also if no TLS is used: only the master will do the (expensive) lookup at the cost of passing an extra reference to the outlined function, instead of having all the threads doing the exact same look-up.

This patch would fix the problem for the snippet above, but not if the use of the privatized global is not closely nested in the parallel region that has the copyin clause. So if we have:

int a = 0;
#pragma omp threadprivate(a)

foo() {
  a = 1;

  #pragma omp parallel copyin(a) // <-- Capture 'a' here
  {
    #pragma omp critical // <-- do NOT capture 'a' here
    {
      printf("%d\n",a);
    }
  }
}

We want to capture a only for the scope that is closely nested by the parallel directive. For the other innermost scopes, using the global is just fine, and we don't want to pass another reference to the outlined function for nothing.
The natural place to handle the capture seems to be IsOpenMPCapturedVar, but in my understanding (please correct me if I'm wrong) there is no logic to selectively capture something at a given level without capturing it for all the scopes until the use is reached.

Should this logic be added? Do you have other ideas on how to tackle this?

Let me know your thoughts.

Thanks!
Samuel

Diff Detail

Event Timeline

sfantao updated this revision to Diff 30282.Jul 21 2015, 1:46 PM
sfantao retitled this revision from to [OpenMP] Add capture for threadprivate variables used in copyin clause.
sfantao updated this object.
sfantao added reviewers: ABataev, hfinkel, rjmccall.
sfantao added a subscriber: cfe-commits.
ABataev added inline comments.Jul 21 2015, 9:24 PM
lib/CodeGen/CGStmtOpenMP.cpp
230–238

DeclRefExpr DRE(const_cast<VarDecl *>(VD),
CapturedStmtInfo->lookup(VD) != nullptr,
(*IRef)->getType(), VK_LValue, (*IRef)->getExprLoc());
auto *MasterAddr = EmitLValue(&DRE).getAddress();

lib/Sema/SemaOpenMP.cpp
160–667

I'd better write a new function isOpenMPPrivateOrCopyin() function and used it instead of isOpenMPPrivate() in Sema::IsOpenMPCapturedVar(). Besides, note, that copyin variables must be captured only if TLS mode is on, if we're using runtime calls we don't need to capture threadprivates.

ABataev added inline comments.Jul 21 2015, 9:29 PM
lib/Sema/SemaOpenMP.cpp
160–667

If you want to capture copyin variable only in closely nested OpenMP region, it is enough just to make next changes in Sema::IsOpenMPCapturedVar():
...
auto DVarPrivate = DSAStack->getTopDSA(VD, DSAStack->isClauseParsingMode());
if (DVarPrivate.CKind != OMPC_unknown && (isOpenMPPrivate(DVarPrivate.CKind) || (getLangOpts().OpenMPTLS && DVarPrivate.CKind == OMPC_copyin))
...

sfantao updated this revision to Diff 30526.EditedJul 23 2015, 2:54 PM
sfantao updated this object.

Add logic to capture variables in the copyin clause even if there is no uses in the captured region. We need to capture it anyway because there may be uses that are not visible in compilation unit.

The new logic also prevents the capture declaration from being propagated to innermost scopes.

This patch correctly fixes the issue, but wanted to make sure this approach is approved so that I can go ahead and update the regression tests.

Thanks!
Samuel

sfantao added inline comments.Jul 23 2015, 3:00 PM
lib/CodeGen/CGStmtOpenMP.cpp
230–246

Done with a small modification: the implementation always expect the CapturedStmtInfo to contain a valid lookup.

lib/Sema/SemaOpenMP.cpp
160–667

The TLS check is now implemented in IsOpenMPCapturedVar. However I am not combining private with copyin because the action takes is slightly different.

160–667

Capturing in the closely nested region is not sufficient: declarations have to be captured even if no uses are found. Please, take a look at the last patch and let me know your thoughts.

ABataev edited edge metadata.Jul 23 2015, 9:27 PM

Samuel, thanks for the review. But please, provide full diff as described in http://llvm.org/docs/Phabricator.html. Sometimes it is important to see full context.

include/clang/Sema/Sema.h
7711 ↗(On Diff #30526)

Copyin variables are private also, so don't need to modify the name of the function

7748–7751 ↗(On Diff #30526)

Clauses arg is not required, capture copyin variables in Sema::ActOnOpenMPRegionEnd() function.

lib/Sema/SemaOpenMP.cpp
663

I don't think that you need so many changes to capture copyin variables. I think the only thing that must be done is just to modify Sema::IsOpenMPCapturedVar() function and instead of isOpenMPPrivate() function use special function that combines isOpenMPPrivate() and check for OMPC_copyin. But the check for copyin must be done only if TLS mode is on

1373

This must be done in Sema::ActOnOpenMPRegionEnd() and only if TLS mode is on.

sfantao updated this revision to Diff 30561.Jul 23 2015, 10:37 PM
sfantao edited edge metadata.

Move forced capture of copyin variables to ActOnOpenMPRegionEnd and fix function names/signatures.

Regression test is not yet included. Will do it once the adopted approach is acceptable.

sfantao added inline comments.Jul 23 2015, 10:47 PM
include/clang/Sema/Sema.h
7711 ↗(On Diff #30561)

Done!

7748–7751 ↗(On Diff #30561)

Done!

lib/Sema/SemaOpenMP.cpp
657

That was my initial approach, however that will not do what is required. Let me explain with an example:

int A = 0, B = 0, C = 0;
#pragma omp threadprivate(A,B,C)

foo(){
  ++A; ++B; ++C;
  #pragma omp parallel copyin(A,B,C)
  {
    ++B;
    #pragma omp some_other_directive
    {
       ++C;
       bar(); // Uses A 
    }
  }
}

If we just change IsOpenMPCapturedVar as you say, this is what happens:

  • A will not be captured so bar() will not get the right value of A
  • C will be captured by some_other_directive, and that is not necessary, and will pose an avoidable overhead.
  • Only B would be captured properly.

I am not saying that there is no better way of doing, but after testing several cases this is the best solution I was able to get. Let me know if you have something different to propose or if you need me to clarify anything.

1360–1364

Done!

ABataev added inline comments.Jul 24 2015, 1:48 AM
lib/Sema/SemaOpenMP.cpp
657

Try this one:

patch
Index: SemaOpenMP.cpp
===================================================================
--- SemaOpenMP.cpp (revision 243097)
+++ SemaOpenMP.cpp (working copy)
@@ -120,6 +120,7 @@
/// from current directive.
OpenMPClauseKind ClauseKindMode;
Sema &SemaRef;
+ bool ForceCapturing;

typedef SmallVector<SharingMapTy, 8>::reverse_iterator reverse_iterator;

@@ -130,7 +131,8 @@

public:
explicit DSAStackTy(Sema &S)
- : Stack(1), ClauseKindMode(OMPC_unknown), SemaRef(S) {}
+ : Stack(1), ClauseKindMode(OMPC_unknown), SemaRef(S),
+ ForceCapturing(false) {}

bool isClauseParsingMode() const { return ClauseKindMode != OMPC_unknown; }
void setClauseParsingMode(OpenMPClauseKind K) { ClauseKindMode = K; }
@@ -146,6 +148,9 @@
Stack.pop_back();
}

+ void forceVarCapturing(bool V) { ForceCapturing = V; }
+ bool isForceVarCapturing() const { return ForceCapturing; }
+
/// \brief If 'aligned' declaration for given variable \a D was not seen yet,
/// add it and return NULL; otherwise return previous occurrence's expression
/// for diagnostics.
@@ -655,7 +660,8 @@
if (DSAStack->getCurrentDirective() != OMPD_unknown) {
if (DSAStack->isLoopControlVariable(VD) ||
(VD->hasLocalStorage() &&
- isParallelOrTaskRegion(DSAStack->getCurrentDirective())))
+ isParallelOrTaskRegion(DSAStack->getCurrentDirective())) ||
+ DSAStack->isForceVarCapturing())
return true;
auto DVarPrivate = DSAStack->getTopDSA(VD, DSAStack->isClauseParsingMode());
if (DVarPrivate.CKind != OMPC_unknown && isOpenMPPrivate(DVarPrivate.CKind))
@@ -1351,13 +1357,18 @@
// This is required for proper codegen.
for (auto *Clause : Clauses) {
if (isOpenMPPrivate(Clause->getClauseKind()) ||
- Clause->getClauseKind() == OMPC_copyprivate) {
+ Clause->getClauseKind() == OMPC_copyprivate ||
+ (getLangOpts().OpenMPUseTLS &&
+ getASTContext().getTargetInfo().isTLSSupported() &&
+ Clause->getClauseKind() == OMPC_copyin)) {
+ DSAStack->forceVarCapturing(Clause->getClauseKind() == OMPC_copyin);
// Mark all variables in private list clauses as used in inner region.
for (auto *VarRef : Clause->children()) {
if (auto *E = cast_or_null<Expr>(VarRef)) {
MarkDeclarationsReferencedInExpr(E);
}
}
+ DSAStack->forceVarCapturing(/*V=*/false);
} else if (isParallelOrTaskRegion(DSAStack->getCurrentDirective()) &&
Clause->getClauseKind() == OMPC_schedule) {
// Mark all variables in private list clauses as used in inner region.

sfantao updated this revision to Diff 30668.Jul 26 2015, 7:27 PM

Add flag to drive the capture of copyin variables on at the the clause level. Add regression tests.

sfantao added inline comments.Jul 26 2015, 7:29 PM
lib/Sema/SemaOpenMP.cpp
657

Thanks Alexey, that works. The new patch uses what you are suggesting and updates the regression tests accordingly.

ABataev accepted this revision.Jul 26 2015, 9:02 PM
ABataev edited edge metadata.
This revision is now accepted and ready to land.Jul 26 2015, 9:02 PM

Committed in r243277.

Alexey, can you recommend this patch to be merged with the release branch to the release coordinator? Given that this fixes a regression in copyin clause, I think it would make sense to do so.

Thanks!
Samuel

Had to do few minimal changes to the regression test to tackle some issues that came up in the buildbot:

r243280
r243285
r243289

Samuel, ask Hans Wennborg to promote your commits to 3.7 branch

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

28.07.2015 3:17, Samuel Antao пишет:

sfantao added a comment.

Had to do few minimal changes to the regression test to tackle some issues that came up in the buildbot:

r243280
r243285
r243289

http://reviews.llvm.org/D11395

hfinkel edited edge metadata.Jul 28 2015, 6:58 AM
hfinkel added a subscriber: hfinkel.

[+Hans]

  • Original Message -----

From: "Alexey Bataev" <a.bataev@hotmail.com>
To: sfantao@us.ibm.com, hfinkel@anl.gov, rjmccall@gmail.com, "a bataev" <a.bataev@hotmail.com>
Sent: Monday, July 27, 2015 10:23:41 PM
Subject: Re: [PATCH] D11395: [OpenMP] Add capture for threadprivate variables used in copyin clause

ABataev added a subscriber: ABataev.
ABataev added a comment.

Samuel, ask Hans Wennborg to promote your commits to 3.7 branch

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

28.07.2015 3:17, Samuel Antao пишет:

sfantao added a comment.

Had to do few minimal changes to the regression test to tackle some
issues that came up in the buildbot:

r243280

r243285

r243289

http://reviews.llvm.org/D11395

http://reviews.llvm.org/D11395

hans added a subscriber: hans.Jul 28 2015, 9:48 AM

Merged r243277, r243280, r243285, and r243289 to 3.7 in r243443.

sfantao closed this revision.Jul 28 2015, 11:22 AM
In D11395#213601, @hans wrote:

Merged r243277, r243280, r243285, and r243289 to 3.7 in r243443.

Thanks Hans!