This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Specialize default schedule on a worksharing loop on the NVPTX device.
Needs ReviewPublic

Authored by arpith-jacob on Feb 13 2017, 1:47 PM.

Details

Summary

The default schedule type on a worksharing loop is implementation
defined according to the OpenMP specifications. Currently, the
compiler codegens a doubly nested loop that effectively implements
a schedule of type (static). This is ideal for threads on CPUs.

On the NVPTX and other SIMT GPUs, this schedule provides very poor
performance because consecutive threads in a warp access loop arrays
in a non-coalesced manner. That is, to achieve coalescing, and good
performance, the best schedule is static with a chunk size of 1.

This patch adds support for target devices to select the best default
schedule depending on their architecture. It modifies loop codegen
to generate optimized code for (static,1) on the NVPTX device, i.e.,
by using a single loop instead of a doubly nested loop as is
currently the case.

Diff Detail

Event Timeline

arpith-jacob created this revision.Feb 13 2017, 1:47 PM
ABataev edited edge metadata.Feb 14 2017, 1:06 AM

General comment: do you really need to define default schedule kind in Sema? Could you do it during codegen phase?

include/clang/Basic/OpenMPKinds.h
22–24

No way, these classes should not be used here

247–253

You can't use Sema and OMPClause in Basic, it is not allowed

lib/CodeGen/CGStmtOpenMP.cpp
2215–2218

What's the difference between this code and next else branch?

Hi Alexey,

Thank you for your review. The main difference in the specialized codegen (if vs. else part in CGStmtOpenMP.cpp).

If-part: emitForStaticInit uses the Chunk parameter (else has it set to null).
If-part: does not use EmitIgnoredExpr()

I can combine if- and else- part with appropriate guards over the above two statements if you like.

General comment: do you really need to define default schedule kind in Sema? Could you do it during codegen phase?

Yes, we need to define default schedule in Sema because it has to set getCond() and getInc() in CheckOpenMPLoop(), which is in Sema. For example, the Inc expression is calculated as follows based on the default schedule:

// Loop increment (IV = IV + 1) or (IV = IV + ST) if (static,1) scheduling.
ExprResult Inc =
  DefaultScheduleKind == OMPDSK_static_chunkone
      ? SemaRef.BuildBinOp(CurScope, IncLoc, BO_Add, IV.get(), ST.get())
      : SemaRef.BuildBinOp(CurScope, IncLoc, BO_Add, IV.get(),
                           SemaRef.ActOnIntegerConstant(IncLoc, 1).get());

You can't use Sema and OMPClause in Basic, it is not allowed

Ok, I can move getDefaultSchedule() from OpenMPKinds.cpp and make it a static function in SemaOpenMP.cpp. Then, CheckOpenMPLoop() can directly call this static function. Does this sound okay?

Thanks.

Hi Alexey,

Thank you for your review. The main difference in the specialized codegen (if vs. else part in CGStmtOpenMP.cpp).

If-part: emitForStaticInit uses the Chunk parameter (else has it set to null).
If-part: does not use EmitIgnoredExpr()

I can combine if- and else- part with appropriate guards over the above two statements if you like.

Ok, please try to do it.

General comment: do you really need to define default schedule kind in Sema? Could you do it during codegen phase?

Yes, we need to define default schedule in Sema because it has to set getCond() and getInc() in CheckOpenMPLoop(), which is in Sema. For example, the Inc expression is calculated as follows based on the default schedule:

// Loop increment (IV = IV + 1) or (IV = IV + ST) if (static,1) scheduling.
ExprResult Inc =
  DefaultScheduleKind == OMPDSK_static_chunkone
      ? SemaRef.BuildBinOp(CurScope, IncLoc, BO_Add, IV.get(), ST.get())
      : SemaRef.BuildBinOp(CurScope, IncLoc, BO_Add, IV.get(),
                           SemaRef.ActOnIntegerConstant(IncLoc, 1).get());

The maybe it is better to define a variable with loop increment and define this variable during codegen with 1 or stride (for GPU)? I don't like the idea of adding some kind of default scheduling, that is not defined in standard in Sema, preferer to define it during codegen and somewhere in NVPTX specific runtime support class.

You can't use Sema and OMPClause in Basic, it is not allowed

Ok, I can move getDefaultSchedule() from OpenMPKinds.cpp and make it a static function in SemaOpenMP.cpp. Then, CheckOpenMPLoop() can directly call this static function. Does this sound okay?

Thanks.

Hi Alexey,

Thank you for reviewing this patch.

I don't like the idea of adding some kind of default scheduling, that is not defined in standard in Sema

Actually, "default scheduling" is defined in the OpenMP spec. It is called "def-sched-var" and controls the scheduling of loops. It's value is implementation (compiler) defined. So why not allow the target device to choose this value in the compiler?

http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf

Section 2.3.1: ICV Descriptions, pg 46:
def-sched-var - controls the implementation defined default scheduling of loop regions. There is one copy of this ICV per device.

Section 2.3.2: ICV Initialization, pg 47:
Table 2.1:
def-sched-var   No environment variable      Initial value is implementation defined

Section 2.7.1.1: Determining the Schedule of a Worksharing Loop
When execution encounters a loop directive, the schedule clause (if any) on the directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop iterations are assigned to threads. See Section 2.3 on page 36 for details of how the values of the ICVs are determined. If the loop directive does not have a schedule clause then the current value of the def-sched-var ICV determines the schedule.

I've reworked the patch to handle the default scheduling in Sema and removed the function from OpenMPKind.cpp. Please let me know if this looks good.

I can rewrite the patch as you suggested, involving NVPTX specific RT, but I think the code will look quite ugly.

Hi Alexey,

Thank you for reviewing this patch.

I don't like the idea of adding some kind of default scheduling, that is not defined in standard in Sema

Actually, "default scheduling" is defined in the OpenMP spec. It is called "def-sched-var" and controls the scheduling of loops. It's value is implementation (compiler) defined. So why not allow the target device to choose this value in the compiler?

http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf

Section 2.3.1: ICV Descriptions, pg 46:
def-sched-var - controls the implementation defined default scheduling of loop regions. There is one copy of this ICV per device.

Section 2.3.2: ICV Initialization, pg 47:
Table 2.1:
def-sched-var   No environment variable      Initial value is implementation defined

Section 2.7.1.1: Determining the Schedule of a Worksharing Loop
When execution encounters a loop directive, the schedule clause (if any) on the directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop iterations are assigned to threads. See Section 2.3 on page 36 for details of how the values of the ICVs are determined. If the loop directive does not have a schedule clause then the current value of the def-sched-var ICV determines the schedule.

I've reworked the patch to handle the default scheduling in Sema and removed the function from OpenMPKind.cpp. Please let me know if this looks good.

I can rewrite the patch as you suggested, involving NVPTX specific RT, but I think the code will look quite ugly.

Arpith, yes, there is such variable. But also standard says, that it is device specific. My opinion, all device-specific things must be defined by runtime library or at least runtime-support class, not by Sema. Sema must be as much platform independent as possible