This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host
ClosedPublic

Authored by carlo.bertolli on Feb 3 2017, 12:52 PM.

Details

Summary

This patch makes the following additions:

  1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
  1. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.

It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

Looking forward to comments.

Diff Detail

Repository
rL LLVM

Event Timeline

carlo.bertolli created this revision.Feb 3 2017, 12:52 PM
ABataev edited edge metadata.Feb 7 2017, 11:08 AM

Please provide diff with full context.
Also, the patch is too big. Could you split it into several patches?

lib/CodeGen/CGStmtOpenMP.cpp
1251

Do not use SmallVector as parameter type, use SmallVectorImpl instead.

[OpenMP] Updating the patch to show full context

I will split this patch into smaller patches, but I will keep this open in case I need to refer back to it: some of the changes that can go into independent patches require a justification and this patch shows it.
Thanks for your comments!

I will split this patch into smaller patches, but I will keep this open in case I need to refer back to it: some of the changes that can go into independent patches require a justification and this patch shows it.
Thanks for your comments!

Sounds good, thanks!

General note: many new lambdas are the same, it is worth it to convert them to functions, if possible

lib/CodeGen/CGOpenMPRuntime.h
678

This function has too many parameters, maybe it's worth it to put (some of)them in a struct.

lib/CodeGen/CGStmtOpenMP.cpp
1257

Do we really need the bounds emission generator in a parallel-like stuff?

1284–1288

I believe it's better to do it in Sema. You need to create variables for UB/LB and capture them, so you don't need to add non-parallel stuff to parallel code.

2012–2014

Better to use CGF.EmitScalarConversion()

2016–2018

Same here

carlo.bertolli marked 4 inline comments as done.Feb 15 2017, 6:17 PM

About this:

General note: many new lambdas are the same, it is worth it to convert them to functions, if possible

I added two static functions that return, respectively, two lambdas that are being used in the code gen for various compositions of pragma 'for' (e.g. 'pragma omp for',
'pragma omp for simd', etc.). As these are all the same, I could merge those in two functions.

lib/CodeGen/CGStmtOpenMP.cpp
1257

See next comment below.

1284–1288

About this one: I need help to understand what you precisely mean. I searched how to add a new stmt in the capture list, but by the time I am doing ActOnOpenMPDistributeParallelForDirective, the AssociateStmt list has already been created by parsing the content of the 'distribute parallel for' region.

Let me tell you what the generated code looks currently like and why I chose to have the code gen for the bound parameters in this file.
This is, roughly, what we need to generate:

void main() {
  ; forgive my lazyness in the following two lines..
  %lb.param = (int64_t) %lb
  %ub.param = (int64_t) %ub
  call @kmpc_fork_call (outlined, %lb.param, %ub.param, ..)

}

void outlined(%prev.lb, %prev.ub) {

  %lb = alloca i64,
  %ub = alloca i64,

  store %prev.lb, %lb
  store %prev.ub, %ub
}

A previous patch already takes care of adding the prev.lb and prev.ub to the formal parameter list of the outlined function. This is done in SemaOpenMP.cpp function ActOnOpenMPRegionStart.

The storing of the prev.lb and prev.ub parameters into the #for loop upper and lower bound variables (%lb, %ub) is also done in SemaOpenMP.cpp, CheckOpenMPLoop function. This corresponds to the code in the outlined function shown above.

For the caller, I followed the same logic as for 'taskloop' and 'taskloop simd', where the lower and upper bound actual parameters are populated in CGOpenMPRuntime.cpp, emitTaskLoopCall function.
This corresponds to the code in the caller of fork_call shown above.

Of course, I understand that taskloop is different from 'distribute parallel for', as the former is not a composite one.

Address comments on initial patch after splitting. Added a dependency with sema patch.

ABataev added inline comments.Feb 16 2017, 2:32 AM
lib/CodeGen/CGOpenMPRuntime.cpp
2488

Seems to Chunk = part is erroneous

lib/CodeGen/CGOpenMPRuntime.h
675–680

Use /// style of comments here and add comments to all fields in this struct

682

Please, add comments for this function too

lib/CodeGen/CGStmtOpenMP.cpp
1249–1251

You don't need to use param names here, just types (inside function type)

1284–1288

This is what I meant. You're trying to capture variables %lb.param and %ub.param during codegen. But I think it is better to do it during sema analysis. You need to build some artificial variables for UB and LB and capture them in the captured region, if this is possible

1316

S is not required

1744–1748

Again, could you try to reduce number of parameters here?

1979–1980

Parameters are not needed here

1993

Why you're capturing S here if it is a parameter already?

1997

This can be defined outside of CGParallelFor lambda, too many enclosing levels

2035

Also, could you define it outside of CGParallelFor lambda?

2057

Do not capture S here (moreover, do not capture it by value), it is a parameter already. And move it out of CGParallelFor

2417–2426

No, you can convert lambda to a function and pass a pointer to function instead of lambda

3124–3128

Again, you can define it as a function and pass a pointer to this function in line 3130

4031

Name S is not needed

lib/CodeGen/CodeGenFunction.h
20

Do you really need this include?

2712–2713

Can it be converted into a static function local to a .cpp module? Also, this question is for other similar function, which are using new function_ref types introduced in lines 177-194. If you can do them static, these types also can be moved to .cpp file

carlo.bertolli marked 17 inline comments as done.

[OpenMP] Refactor after split of original patch in two patches (sema and cg). Transform lambdas to static functions, where possible, define default constructors, format, and update against trunk.

carlo.bertolli edited the summary of this revision. (Show Details)Apr 24 2017, 10:39 AM
This revision is now accepted and ready to land.Apr 24 2017, 10:49 AM

Committed revision 301223.

Re-Committed revision 301340.