This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Basic support for a parallel directive in a target region on an NVPTX device.
ClosedPublic

Authored by arpith-jacob on Dec 28 2016, 3:15 PM.

Details

Summary

This patch introduces support for the execution of parallel constructs in a target
region on the NVPTX device. Parallel regions must be in the lexical scope of the
target directive.

The master thread in the master warp signals parallel work for worker threads in worker
warps on encountering a parallel region.

Note: The patch does not yet support capture of arguments in a parallel region so
the test cases are simple.

Diff Detail

Repository
rL LLVM

Event Timeline

arpith-jacob retitled this revision from to [OpenMP] Basic support for a parallel directive in a target region on an NVPTX device..
arpith-jacob updated this object.
arpith-jacob added a subscriber: cfe-commits.
arpith-jacob added inline comments.Dec 28 2016, 3:18 PM
lib/CodeGen/CGOpenMPRuntime.cpp
114 ↗(On Diff #82624)

On the nvptx device, it is illegal for an identifier to contain a dot ('.') so I've modified it here. If there is a better way to do this, please let me know.

ABataev added inline comments.Dec 28 2016, 11:58 PM
lib/CodeGen/CGOpenMPRuntime.cpp
114 ↗(On Diff #82624)

Could you just override this function in CGOpenMPRuntimeNVPTX?

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
127 ↗(On Diff #82624)

I believe this FIXME is not needed as there is already no reminder operation.

130 ↗(On Diff #82624)

s/it/if/g?

325–326 ↗(On Diff #82624)

Why you need double casting, could you just use Bld.CreatePointerBitCastOrAddrSpaceCast()?

339 ↗(On Diff #82624)

auto *

343–345 ↗(On Diff #82624)

Could you use just an array here instead of SmallVector?

358 ↗(On Diff #82624)

Use llvm::None instead of default constructor.

561 ↗(On Diff #82624)

use llvm::ConstantPointerNull::get() instead.

563 ↗(On Diff #82624)

use llvm::ConstantPointerNull::get() instead.

573–575 ↗(On Diff #82624)

No need for braces in one-line substatement

arpith-jacob added inline comments.Dec 29 2016, 2:24 PM
lib/CodeGen/CGOpenMPRuntime.cpp
114 ↗(On Diff #82624)

Alexey, thank you for your review of this patch.

To override getHelperName() in CGOpenMPRuntime.cpp, I will have to move the two classes CGOpenMPRegionInfo and CGOpenMPOutlinedRegionInfo from the anonymous namespace to the header file in CGOpenMPRuntime.h (under protected mode). I would prefer to do this since I will need to use these classes for implementing codegen of other directives in the future.

Is that okay with you?

ABataev added inline comments.Dec 30 2016, 12:36 AM
lib/CodeGen/CGOpenMPRuntime.cpp
114 ↗(On Diff #82624)

Yes, go ahead

Alexey, thank you for your review. I've updated the patch addressing your comments.

  • I experimented with various ways of changing the name of the outlined function. In the end I decided against moving the two classes to the header file. Instead I pass the name to the class via the constructor. The name is returned by a virtual function.
  • I also removed the overloaded implementation of getThreadID(). I'm fine using the default implementation. We do not use the value returned by getThreadID() in our runtime and prefer to recompute it using PTX reserved registers.
arpith-jacob marked 9 inline comments as done.Dec 30 2016, 4:49 PM
ABataev added inline comments.Jan 2 2017, 1:03 AM
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
539–542 ↗(On Diff #82750)

It is better to emit this code as PrePostAction, so it is called upon exit of any cleanup scope

arpith-jacob added inline comments.Jan 2 2017, 4:38 AM
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
539–542 ↗(On Diff #82750)

Alexey, do you mean clean up during the execution of the serialized parallel region? Is something like this what you have in mind? Thanks.

auto &&SeqGen = [this, Fn, &CapturedVars, &RTLoc, &Loc](CodeGenFunction &CGF,
                                                        PrePostActionTy &) {
  auto &&CodeGen = [..](..) {
      llvm::Value *Args[] = {RTLoc, ThreadID};
      CGF.EmitRuntimeCall(
          createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_serialized_parallel),
          Args);

      llvm::SmallVector<llvm::Value *, 16> OutlinedFnArgs;
      OutlinedFnArgs.push_back(
          llvm::ConstantPointerNull::get(CGM.Int32Ty->getPointerTo()));
      OutlinedFnArgs.push_back(
          llvm::ConstantPointerNull::get(CGM.Int32Ty->getPointerTo()));
      OutlinedFnArgs.append(CapturedVars.begin(), CapturedVars.end());
      CGF.EmitCallOrInvoke(Fn, OutlinedFnArgs);
  };

  RegionCodeGenTy RCG(CodeGen);
  CommonActionTy Action(
      nullptr, llvm::None,
      createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_end_serialized_parallel),
      {emitUpdateLocation(CGF, Loc), ThreadID});
  RCG.setAction(Action);
  RCG(CGF);
};
tra added a subscriber: tra.Jan 3 2017, 10:27 AM
tra added inline comments.
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
511–516 ↗(On Diff #82750)

Are two back-to-back syncCTAThreads() intentional or did you mean to call something else to activate workers?

arpith-jacob added inline comments.Jan 3 2017, 10:37 AM
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
511–516 ↗(On Diff #82750)

Intentional. The first wakes up worker warps to execute the parallel region. The second is for the master warp to wait until the region has completed execution. This is to implement the semantics of an implicit barrier at the end of an OpenMP parallel region.

tra added inline comments.Jan 3 2017, 10:52 AM
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
511–516 ↗(On Diff #82750)

OK. It may be good to add few comments documenting these assumptions (or point to existing docs) here.

Updated patch based on reviews.

The serialized parallel region executed by the master was modified to call the 'end' runtime call with a PrePostActionTy so that it is called upon exit of any cleanup scope.

Added reference to OpenMP spec regarding implicit barrier at the end of a parallel region.

arpith-jacob marked 2 inline comments as done.Jan 3 2017, 11:48 AM
ABataev added inline comments.Jan 9 2017, 12:49 AM
lib/CodeGen/CGOpenMPRuntime.h
63 ↗(On Diff #82922)

I don't think it is good to name it CommonActionTy, I think it is something specific to NVPTX codegen. Could you give it some specific name + move it somewhere to NVPTX-related part of code (to .cpp file, if possible)

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
365 ↗(On Diff #82922)

Does it really return I1 type? Or I8?

539–542 ↗(On Diff #82750)

Yes, exactly.

Moved CommonActionTy to CGOpenMPRuntimeNVPTX.cpp and renamed it to NVPTXActionTy, allowing us to customize the class in the future, if necessary.

arpith-jacob marked 2 inline comments as done.Jan 9 2017, 4:09 AM
arpith-jacob added inline comments.
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
365 ↗(On Diff #82922)

Alexey, the runtime function is called by every worker thread. It returns a 'bool' that indicates if the thread has been activated in the parallel region, which is why I used an Int1Ty. Please let me know if you see problems with this. Thanks.

arpith-jacob marked an inline comment as done.

Using CGF.ConvertTypeForMem(Context.getBoolType()) to get the right type for 'bool' rather than using i1.

Use i1 type for bool after all. But this time use the api ConvertType().

ABataev accepted this revision.Jan 9 2017, 10:57 AM
ABataev edited edge metadata.

LG with a nit

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
409 ↗(On Diff #83641)

llvm::None instead of {}

This revision is now accepted and ready to land.Jan 9 2017, 10:57 AM
This revision was automatically updated to reflect the committed changes.