Page MenuHomePhabricator

[OPENMP] private and firstprivate clauses of teams code generation for nvptx
AcceptedPublic

Authored by carlo.bertolli on Mar 18 2016, 3:37 PM.

Details

Summary

This patch adds initial support for codegen of private and firstprivate clauses of nvptx.
I simply use the same support in host codegeneration, even though optimizations are possible.
Optimizations may be done by having codegen first analyze data sharing attributes on target and teams (e.g. target firstprivate and teams private on the same variable can drop firstprivate on target). These optimizations will be subject of a successive patch.

Diff Detail

Repository
rL LLVM

Event Timeline

carlo.bertolli retitled this revision from to [OPENMP] private and firstprivate clauses of teams code generation for nvptx.
carlo.bertolli updated this object.
carlo.bertolli set the repository for this revision to rL LLVM.
mkuron added a subscriber: mkuron.Mar 20 2016, 9:15 AM
ABataev added inline comments.Mar 20 2016, 11:26 PM
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
53–57 ↗(On Diff #51080)

After some investigation I found out that this the same code, that a;ready exists in CodeGenFunction::EmitOMPTeamsDirective().
I think you must remove emitTeamsCall() from CGOpenMPRuntimeNVPTX class and modify CodeGenFunction::EmitOMPTeamsDirective() like this:

void CodeGenFunction::EmitOMPTeamsDirective(const OMPTeamsDirective &S) {
  OMPLexicalScope(*this, S);
  // Emit parallel region as a standalone region.
  auto &&CodeGen = [&S](CodeGenFunction &CGF) {
    OMPPrivateScope PrivateScope(CGF);
    (void)CGF.EmitOMPFirstprivateClause(S, PrivateScope);
    CGF.EmitOMPPrivateClause(S, PrivateScope);
    (void)PrivateScope.Privatize();
    CGF.EmitStmt(cast<CapturedStmt>(S.getAssociatedStmt())->getCapturedStmt());
  };
  if (getLangOpts().OpenMPIsDevice)
    emitInlinedDirective(CGF, OMPD_teams, CodeGen);
  else
    emitCommonOMPTeamsDirective(*this, S, OMPD_teams, CodeGen);
}

Hi Alexey

Thanks for your comment. The suggested change will not work as I intended in my patch when using the host as a device. This happens when you select the following options:
-fomptargets=powerpc64le-ibm-linux-gnu -fopenmp-is-device

In this case we generate device code and the target is ppc64. In ppc64 we need to generate a call to kmpc_fork_teams. In your proposed change, we treat all devices in an undistinguished way and we do not generate a call to fork_teams.
There are various reasons why we should not do that, the most clear ones to me being:

  • When using the host as host or as target device we generate different codes. This would mess up with performance profiling.
  • On a host it is still important to have teams as that may be the place where coarse grain parallelism comes from.

If you still want no specialization in CGOpenMPRuntimeNVPTX, we will need to check if we are targeting a device and if that device is an nvptx one.

I know that the problem is that we have two CodeGen objects being created in different places if we target nvptx or host. However, by the way the interface is currently structured, I do not see any way out of this duplication.

Thanks!

To make clear my comment I an updating the patch following Alexey's comment and modifying it to make it work.
You can see that now we have the following check:
CGM.getTarget().getTriple().getArch() == llvm::Triple::nvptx ||
CGM.getTarget().getTriple().getArch() == llvm::Triple::nvptx64

Hahnfeld added inline comments.Mar 24 2016, 5:25 AM
lib/CodeGen/CGStmtOpenMP.cpp
3074–3078 ↗(On Diff #51285)

This now duplicates logic from CodeGenModule::createOpenMPRuntime and has to be extended for every new triple in multiple places.

I would like to propose a new function like virtual bool CGOpenMPRuntime::shouldEmitInlinedTeamsDirective which defaults to false and can be overriden in CGOpenMPRuntimeNVPTX. This way we can later on easily reuse the logic for new device types and target triples.

Hi Jonas

Thanks for your comment. That scheme would not work for #parallel on the gpu in case we do not want to inline the parallel region.
There are patches coming in which that is going to happen so I think that we really need to distinguish between different pragmas.

Hahnfeld edited edge metadata.Mar 24 2016, 10:45 AM

Hi Jonas

Thanks for your comment. That scheme would not work for #parallel on the gpu in case we do not want to inline the parallel region.
There are patches coming in which that is going to happen so I think that we really need to distinguish between different pragmas.

Is a #pragma omp parallel changing the way code is generated for #pragma omp teams? How would that look like in CodeGenFunction::EmitOMPTeamsDirective?

My point is that specialization should happen in CGOpenMPRuntimeNVPTX (by means of overriding) without duplicating code, as Alexey said.

Hi

Thinking more carefully, I believe you are right - no, it is not. I guess we would not call at all shouldEmitInlinedTeamsDirective for the case of paralle.

Thanks!

ABataev edited edge metadata.EditedMar 29 2016, 8:07 PM

Thinking more about this patch I think it is a right solution to override void llvm::Value *emitParallelOrTeamsOutlinedFunction() and void emitTeamsCall() for CGOpenMPRuntimeNVPTX class:

llvm::Value *CGOpenMPRuntimeNVPTX::emitParallelOrTeamsOutlinedFunction() {
  llvm::Value *OutlinedFn = CGOpenMPRuntime::emitParallelOrTeamsOutlinedFunction();
  cast<llvm::Function>(OutlinedFn)->addFnAttr(llvm::Attribute::AlwaysInline);
  return OutlinedFn;
}
void CGOpenMPRuntimeNVPTX::emitTeamsCall() {
  if (!CGF.HaveInsertPoint())
    return;

  Address ZeroAddr =
        CGF.CreateTempAlloca(CGF.Int32Ty, CharUnits::fromQuantity(4),
                             /*Name*/ ".zero.addr");
  CGF.InitTempAlloca(ZeroAddr, CGF.Builder.getInt32(/*C*/ 0));
  llvm::SmallVector<llvm::Value *, 16> OutlinedFnArgs;
  OutlinedFnArgs.push_back(ZeroAddr.getPointer());
  OutlinedFnArgs.push_back(ZeroAddr.getPointer());
  OutlinedFnArgs.append(CapturedVars.begin(), CapturedVars.end());
  CGF.EmitCallOrInvoke(OutlinedFn, OutlinedFnArgs);
}

Also, please update patch to the latest revision

carlo.bertolli edited edge metadata.

[OPENMP] Rebase patch on new support for codegen for teams in nvptx target. private and firstprivate clauses are now dealt with in the teams patch and this one only becomes regression tests.

ABataev accepted this revision.Apr 4 2016, 10:03 PM
ABataev edited edge metadata.

LG

This revision is now accepted and ready to land.Apr 4 2016, 10:03 PM

Hi Carlo,

I think these tests are not yet committed, right?

Thanks,
Jonas

Hi

Yes, the purpose of these patches was to check correctness of code gen for the related clauses when targeting nvptx exclusively.
Since I wrote these tests, the base support changed to generate same code on host and any device type, and it seemed to me that the need for a differentiated regression test was not there any more (happy to be told otherwise).

Of course, there are things that we would like to do a bit different on nvptx, especially for firstprivate, but I have not yet had time to work on this and to reflect this into this regression test. If I do, I would like to update this patch, but it seemed too early to just abandon this.

Thanks and please do let me know what is your viewpoint on this

  • Carlo
Hahnfeld resigned from this revision.Feb 17 2018, 2:01 AM