This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Parallel reduction on the NVPTX device.
ClosedPublic

Authored by arpith-jacob on Feb 9 2017, 6:48 AM.

Details

Summary

This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that use these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
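
The intra-warp step described above can be modeled on the host. The following is only a sketch, not code from the patch: warpReduceModel is a hypothetical name, the "lanes" are plain vector elements rather than GPU threads, and the read of lane + offset stands in for what a shuffle intrinsic would do on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a warp-level tree reduction (hypothetical helper,
// not part of the patch). Each vector element plays the role of one lane.
template <typename T, typename Op>
T warpReduceModel(std::vector<T> lanes, Op combine) {
  const std::size_t width = lanes.size();
  // The halving scheme below assumes a power-of-two lane count.
  assert(width > 0 && (width & (width - 1)) == 0);

  for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
    // Snapshot the values so every lane reads pre-step data, mimicking
    // lanes that exchange values in lockstep.
    const std::vector<T> prev = lanes;
    for (std::size_t lane = 0; lane < width; ++lane) {
      // A lane whose source would be out of range keeps its own value.
      if (lane + offset < width)
        lanes[lane] = combine(prev[lane], prev[lane + offset]);
    }
  }
  // After log2(width) steps, lane 0 holds the combined value.
  return lanes[0];
}
```

In this model no lane ever reads another lane's stack storage directly; values only move through the exchange step, which is the constraint the summary describes.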

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
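
One way to picture this decoupling is a runtime routine that sees only opaque pointers and a callback. The sketch below is a host-side illustration under stated assumptions: runtimeReduce, CombineFn, ReduceVars, and combineSumMax are hypothetical names, not the entry points or helpers this patch adds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Type-erased callback: fold the data behind rhs into lhs.
// The "runtime" side never inspects what the pointers refer to.
using CombineFn = void (*)(void *lhs, const void *rhs);

// Hypothetical stand-in for a runtime routine. It knows neither the
// reduction operators, nor the variable types, nor how many variables
// are being reduced.
inline void runtimeReduce(const std::vector<void *> &perThread,
                          CombineFn combine) {
  assert(!perThread.empty());
  for (std::size_t i = 1; i < perThread.size(); ++i)
    combine(perThread[0], perThread[i]);
}

// Hypothetical "compiler-generated" side for something like
// reduction(+:sum) reduction(max:hi): one struct holding the reduction
// variables, plus a helper that knows how to combine two such structs.
struct ReduceVars {
  int sum;
  double hi;
};

inline void combineSumMax(void *lhs, const void *rhs) {
  auto *l = static_cast<ReduceVars *>(lhs);
  const auto *r = static_cast<const ReduceVars *>(rhs);
  l->sum += r->sum;
  if (r->hi > l->hi)
    l->hi = r->hi;
}
```

Adding a third reduction variable in this sketch changes only ReduceVars and combineSumMax; runtimeReduce stays untouched, which is the property the summary claims for the real design.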

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Diff Detail

Repository
rL LLVM

Event Timeline

arpith-jacob created this revision. Feb 9 2017, 6:48 AM
ABataev added inline comments. Feb 10 2017, 7:41 AM
lib/CodeGen/CGOpenMPRuntime.h
956–962 ↗(On Diff #87799)

Number of parameters is getting too big, maybe it is better to aggregate them into a struct/class?

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
118–133 ↗(On Diff #87799)

It's better to use /// style of comments here

653–675 ↗(On Diff #87799)

Use // instead of ///

963–965 ↗(On Diff #87799)

Comments here?

969–974 ↗(On Diff #87799)

Use ///

1272 ↗(On Diff #87799)

/// style here

1488 ↗(On Diff #87799)

/// here

lib/CodeGen/CGOpenMPRuntimeNVPTX.h
245 ↗(On Diff #87799)

No \brief

263 ↗(On Diff #87799)

No \brief

arpith-jacob added inline comments. Feb 10 2017, 7:49 AM
lib/CodeGen/CGOpenMPRuntime.h
956–962 ↗(On Diff #87799)

Thanks Alexey for your comments. I can place 'WithNoWait, SimpleReduction, ReductionKind' in a struct.

Can you explain what 'SimpleReduction' stands for? It isn't clear to me when the reduction is simple...

Thanks.

Updated patch to address Alexey's comments. Condensed parameters in emitReduction() to a struct Options.
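
As a rough illustration of the refactoring discussed above (not the code that was committed), the three values named in the thread could be grouped as follows. ReductionOptionsModel, DirectiveKindModel, and describeReduction are hypothetical names invented for this sketch; only the field names WithNoWait, SimpleReduction, and ReductionKind come from the discussion.

```cpp
#include <cassert>
#include <string>

// Hypothetical placeholder for whatever directive-kind type the real
// interface uses.
enum class DirectiveKindModel { Parallel, For, Unknown };

// Hypothetical options struct: callers fill in one aggregate instead of
// passing three separate trailing arguments.
struct ReductionOptionsModel {
  bool WithNoWait = false;
  bool SimpleReduction = false;
  DirectiveKindModel ReductionKind = DirectiveKindModel::Unknown;
};

// Hypothetical consumer, showing that adding a field to the struct does
// not change this function's signature.
inline std::string describeReduction(const ReductionOptionsModel &Opts) {
  std::string Text;
  switch (Opts.ReductionKind) {
  case DirectiveKindModel::Parallel: Text = "parallel"; break;
  case DirectiveKindModel::For:      Text = "for"; break;
  case DirectiveKindModel::Unknown:  Text = "unknown"; break;
  }
  assert(!Text.empty() && "every kind should map to a name");
  if (Opts.WithNoWait)
    Text += " nowait";
  if (Opts.SimpleReduction)
    Text += " simple";
  return Text;
}
```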

arpith-jacob marked 9 inline comments as done. Feb 12 2017, 12:59 PM

Minor fixup of comment style on emitInterWarpCopyFunction().

This revision is now accepted and ready to land. Feb 13 2017, 12:55 AM
This revision was automatically updated to reflect the committed changes.