This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Target directive host codegen
AbandonedPublic

Authored by sfantao on Jul 20 2015, 11:09 AM.

Details

Summary

This patch implements the outlining for offloading functions for code annotated with the OpenMP target directive. It uses a temporary naming of the outlined functions that will have to be updated later on once target side codegen and registration of offloading libraries is implemented - the naming needs to be made unique in the produced library.

Unlike other captured regions, target offloading cannot use directly the Capture declaration, as each captured field has to be passed explicitly to the runtime library and associated with potentially different mapping types (to/from/alloc...). Therefore, some tweaking in the function prologue codegen is required.

The current implementation still do not support capturing of global variables. That requires a modification in Sema that I will propose in a separate patch.

Thanks!
Samuel

Diff Detail

Event Timeline

sfantao updated this revision to Diff 30172.Jul 20 2015, 11:09 AM
sfantao retitled this revision from to [OpenMP] Target directive host codegen.
sfantao updated this object.
sfantao added reviewers: ABataev, hfinkel, rjmccall.
sfantao added a subscriber: cfe-commits.
ABataev added inline comments.Jul 20 2015, 10:50 PM
lib/CodeGen/CGOpenMPRuntime.h
188–212

Seems to me these enums are not used right now, must be removed

lib/CodeGen/CGStmt.cpp
2108–2109

Why do you need all this stuff here? Why you can't use variables captured in CapturedDecl?

sfantao updated this revision to Diff 30264.Jul 21 2015, 9:44 AM

Remove unused enums.

Alexey,

Thanks for the review! I am addressing your remarks as inline comments.

Thanks again,
Samuel

lib/CodeGen/CGOpenMPRuntime.h
188–212

Done!

lib/CodeGen/CGStmt.cpp
2108–2109

There is only one argument that comes with the CaptureDecl and that is the anonymous struct that captures all the references.

For the target outlined functions we have to pass each capture separately because not only are they arguments but also data mappings. Therefore we don't have to generate the anonymous struct at all and use the captures directly, signaling the proper map types. This follows what was discussed in the offloading infrastructure proposal - the compiler will generate arrays with each argument/mapping and forward that to the target specific plugin. The target function is therefore expected to have the same signature of the host function that is being generated by this patch, so that the order in the array is consistent with the signature.

The code in this patch that is generating the host version of the target region can therefore be completely reused to generate the device version.

ABataev added inline comments.Jul 21 2015, 8:56 PM
lib/CodeGen/CGOpenMPRuntime.h
188–203

Seems to me these enums are used in CGOpenMPRuntime.cpp, so we should not declare them in CGOpenMPRuntime interface.

lib/CodeGen/CGStmt.cpp
2108–2109

Could we add required functionality to CapturedStmt to make it pass mappings also, if required? Currently we have to expose some things that better to stay hidden like VLAMap, modify StartFunction(), when we could add required parameters in Sema analysis and get all required stuff for free, without any additional changes in codegen

rjmccall added inline comments.Jul 23 2015, 9:44 PM
lib/CodeGen/CGOpenMPRuntime.cpp
3070

You already know the element types for all of these. The code will be much more readable if you just use those types directly.

3100

Please move the creation of this global out of this initializer list.

lib/CodeGen/CGStmt.cpp
2108–2109

Okay, so from what I understand, __tgt_target is just a placeholder call that some later pass will rewrite to invoke a target-specific plugin that behaves like the host function. Fine, but:

  • Doesn't it still need to get passed the host function?
  • Why does it return an i32 instead of just an i1, if you're going to rewrite it anyway?
  • It would be much clearer to name it something actually intrinsic-like, like "omp.target.call".

As for the signature: as we've discussed before, because the captured statement invocation function isn't actually exposed anywhere, its physical signature is really completely an implementation detail of the captured statement emitter. So I have no problem with this kind of change, but you should really design it to be flexible for different captured-statement emitters (by, say, taking a lambda or something similar) rather than hardcoding the specific needs of this one.

lib/CodeGen/CodeGenFunction.cpp
770

Please find some way of doing this that does not rely on changing common infrastructure like StartFunction.

1512

What do you need this for that you can't use getVLASize for?

rjmccall added inline comments.Jul 23 2015, 9:44 PM
lib/CodeGen/CGOpenMPRuntime.cpp
894–895

Spurious "to" at the start.

2954

getTypeStoreSize()

2963

You should ask the ASTContext to compute this size instead of making assumptions about the layout size of the IR type.

Also, what are the semantics supposed to be for mapping to and from? Do referents need to be trivially copyable? What if there are pointers or references? What happens to virtual bases?

2973

Same thing, please ask the ASTContext.

Also, you might need to care about variably-sized types here.

3042

This is creating a bunch of dynamic allocas instead of just temporary values. Please call CreateMemTemp with an array type instead.

3047

The sizes aren't constant if you've captured a VLA. But this comment is actually just wrong, because this isn't building something for the sizes at all; it's building something for the flags.

That said, I think you ought to be able to do the same thing with the sizes when you don't have a VLA.

3050

Please comment boolean arguments like this:

/*constant*/ true

And please mark this variable unnamed_addr.

3077

It makes more sense to emit the store next to the GEP. This will also let you more easily skip all the code associated with an array when you can emit it as a constant.

ABataev added inline comments.Jul 23 2015, 10:01 PM
lib/CodeGen/CGStmt.cpp
2108–2109

John, __tgt_target is not a placeholder, this is a real name of offloading runtime function. Runtime library for offloading is not a part of LLVM yet. You can find it here https://github.com/clang-omp/libomptarget. Runtime functions are declared here https://github.com/clang-omp/libomptarget/blob/master/src/omptarget.h

rjmccall added inline comments.Jul 23 2015, 11:16 PM
lib/CodeGen/CGStmt.cpp
2108–2109

Oh, I see. It's the host-pointer argument that's meant to uniquely determine the operation to perform on the target, and there's going to be some separate registration step that will associate that with a target-specific implementation?

Are you really adding this otherwise-unnecessary use of libffi just so that you can pass the captures as (indirect) arguments for no particular reason? You're free to make that decision, but I think it's a poor one. If you're assuming that all the migrating captures are POD — and I don't see how that's avoidable — then you'd be better off putting all the captured variables into a single allocation and presenting the target with the base address of that.

sfantao updated this revision to Diff 32211.Aug 14 2015, 7:31 PM

This patch tries to avoid as much as possible changing the common infrastructure, by adapting the CapturedDecl creation in SEMA and by adding support to a second type of capture - ImplicitParamDecl (on top of the existent FieldDecl).

Also adds codegen for device clause as the Parsing and SEMA support was added in the meantime.

The regression test was not updated yet. I wanted to make sure the direction this is taking is approved before diving into that.

Thanks!
Samuel

Alexey, John,

Thanks for the review! I've tried to address your concerns in the last diff. Please, check the inlined comments to find answers for the remarks of the previous diff.

Thanks again!
Samuel

lib/CodeGen/CGOpenMPRuntime.cpp
894–895

Fixed!

2954

Now using getTypeSizeInChars from the ASTcontext as suggested bellow.

2963

Using using getTypeSizeInChars from the ASTcontext.

This patch only deals with trivially copiable types. By default, a variable that is captured in the target region is mapped "by value" using a to-from policy. In order to do something different than the default, the user has to use a map clause (I'll send out a patch for it once we have the Parsing/SEMA in place). As per the current spec, the map clause allows a user to map the pointee of a pointer as well as only mapping a section of an array or pointee. In the next version of the OpenMP spec we will have the ability to map aggregate fields and more support for "deep" copy. We expect to be able to handle all the cases with the proper flags in OpenMPOffloadMappingFlags.

About virtual bases: OpenMP 4.0 forbids virtual members in mappable variable. Nevertheless, it is possible this constraint be lifted in future versions.

2973

Now using ASTContext.

I was planing to deal with the VLA types once I add support for the map clause, but I agree it makes more sense to do it now. Thanks for the suggestion.

3042

Done!

3047

Sorry for the error in the comment. It is fixed now.

Also, I added code to deal with constant sizes for when we don't have VLAs.

3050

Added comments for the boolean.

Setting unnamed_addr now, thanks!

3070

Ok, now using the types explicitly.

3077

Moved the stores next to the GEP. Also skipping the sizes for when no VLAs are used.

3100

Done!

lib/CodeGen/CGStmt.cpp
2108–2109

Correct, there is going to be a separate registration step where all the host pointers are going to be captured in a descriptor and passed to the OpenMP runtime library (__tgt_register_lib()). I'll contribute that in a separate patch.

libffi is not required unless one wants to do offloading to some host thread. The host version of the target region is not launched from the runtime library. If the runtime decides to execute the target region on the host it should return an error code in tgt_target and it will just work. What I think is key here is:

  • To have the runtime be more efficient while scanning the mapped pointers/sizes/map types, and using arrays is a good solution for that.
  • For offloading APIs I have experience with, e.g. CUDA, having an array of pointers is exactly what is expected. By passing the parameters separately will let the implementation decide what is best for the target, usually use some dedicated memory or registers for parameters as defined in the calling/launching convention it has in place (in the limit it could implement a single allocation and load the parameters from there as you are suggesting).

Let me know if you need me to clarify any detail about the interface.

lib/CodeGen/CodeGenFunction.cpp
770

In the new patch CaptureDecl was changed and CodeGenFunction::PrepareOMPTargetDirectiveBodyEmission was created so that this code is not required anymore.

1512

What I need is the size of each dimension and what getVLASize returns is the size of the type. I can probably find ways of doing this without this new member function, but that will be definitely more messy. Should I do that? Any suggestion?

ABataev added inline comments.Aug 16 2015, 8:49 PM
lib/CodeGen/CGExpr.cpp
1969

Samuel, why you don't want to capture all used variables in CapturedDecl instead of creating ImplicitParamDecl for each captured variable? Also, you will solve possible trouble with VLAs automatically using CapturedDecl.

lib/CodeGen/CGOpenMPRuntime.cpp
2953

Use CGM.getContext().getTypeSizeInChars() instead of CGM.getContext().getTypeSize() / 8.

2954

Maybe llvm::ConstantInt::get(CGF.SizeTy, SizeVal)?

2963

Use CGM.getContext().getTypeSizeInChars() instead of CGM.getContext().getTypeSize() / 8.

2980

The same

lib/CodeGen/CGStmtOpenMP.cpp
2144–2145

MakeNaturalAlignAddrLValue(ThisRef, ri->getType())?

2147

Extra semicolon

sfantao updated this revision to Diff 32322.Aug 17 2015, 9:59 AM

Adress reviewers concerns.

Also fix issue with target regions with no arguments and in the VLA size computation I found in the meantime.

Alexey,

Thanks for the review! Find my comments inlined.

Thanks again!
Samuel

lib/CodeGen/CGExpr.cpp
1969

Alexey, I'm not sure I understand what you mean here. Unlike the other captured regions, the target region outlined function does not take a context that captures all the variables in fields of a record as argument. Instead, it takes all the captured references as arguments. This will enable the device runtime library to decide what is best to pass the arguments to the device (see my response to John's question in my previous diff).

It happens that all the machinery in the common infrastructure that creates the outlined functions (CodeGenFunction::StartFunction and GenerateCapturedStmtFunction) is prepared to get the context record from the CapuredDecl. Therefore, in order to not disrupt the common infrastructure, in Sema::ActOnOpenMPRegionEnd I am creating a new CapturedDecl that contains implicit parameters. I gather the information to build the new CapturedDecl from the CapturedDecl that is created with the context argument and the RecordDecl fields so that I don't need to touch the capturing code in Sema.

Having CapturedDecl with implicit parameters will drive StartFunction to create the outlined region with the right signature without having to change anything in there. I only had to guard the initialization of VLAs and 'this' in GenerateCapturedStmtFunction to not do anything that expects the context argument. However, during the emission of the VLAs that happens in StartFunction, the emission of these implicit parameters is attempted based on references that are marked as refersToEnclosingVariableOrCapture- this is the reason for the change in EmitDeclRefLValue.

Given that the references in the outlined function statements are still expecting the VLAs expression used in the caller of the outlined function, PrepareOMPTargetDirectiveBodyEmission will make sure that the mapped values to those expressions will be the same as the ones that use the new expression based on implicit parameters.

Let me know if you need me to clarify anything.
Thanks!

lib/CodeGen/CGOpenMPRuntime.cpp
2953

Done!

2954

I agree, it makes more sense to use size_t. Thanks for the suggestion!

2963

done!

2980

Done!

lib/CodeGen/CGStmtOpenMP.cpp
2144–2145

Now using MakeNaturalAlignAddrLValue.

2147

Fixed.

ABataev added inline comments.Aug 19 2015, 4:09 AM
lib/CodeGen/CGExpr.cpp
1969–1971

Instead I would do the same thing I did for tasks.
Generate captured function as is. But also create the second function with the profile required for target codegen. This function must gather all its parameters into a single record and then call auto generated captured function. This captured function must be marked as AlwaysInline.
In this case you don't need to make some additional changes in Sema for particular processing of target directives, you will just need to generate simple function in codegen.
This may result in a little bit slower performance, but we may improve it later, when we have time to improve codegen for outlined functions for CapturedDecls.
I don't like the idea of reinventing of features, that are invented already, like capturing of VLAs, exposing some private functions (like getVLASizeMap()) etc.
so the code would be like this:

void .omp_outlined.(%captures *ptr) always_inline {
 <captured code>;
}
void .target_omp_outlined.(int* a, float* b, ...) {
  %captures rec;
  rec.a_ref = a;
  rec.b_ref = b;
  ...
  .omp_outlined.(&rec);
}
lib/CodeGen/CGOpenMPRuntime.cpp
3065

Maybe CGF.getContext().getSizeType()?

3163

CGF.Builder.CreateIsNotNull()?

test/OpenMP/target_codegen.cpp
2–4

It would be good to see the tests for 32 bit target.

sfantao updated this revision to Diff 32796.EditedAug 20 2015, 10:07 PM

Implement proxy function for target directive.

Move the creation of the target region parameters from CGOpenMPRuntime::emitTargetCall to CodeGenFunction::EmitTargetDirective because we need to access the VLA Maps of the target enclosing function.

Update regression test and add run directive for 32-bit target.

Thanks for review. The new diff now uses a proxy function. See other comments inlined.

Thanks again!
Samuel

lib/CodeGen/CGExpr.cpp
1969–1971

Ok, I am now using the proxy function.

lib/CodeGen/CGOpenMPRuntime.cpp
3065

Done!

3163

Done!

test/OpenMP/target_codegen.cpp
2–4

Done!

ABataev added inline comments.Aug 21 2015, 1:29 AM
lib/CodeGen/CGOpenMPRuntime.cpp
2909

I don't think you need this argument. You're emitting a new outlined function here and don't need info about your current function.

2928–2933

You'd better to emit internal function separately in a new static function. Then you don't need to create TargetAuxCGF and TargetMainCGF. You should use just CGF everywhere. One CodeGenFunction instance per function.

2992–2994

Variable names should start with an upper case letter (e.g. Leader or Boats).

3092–3129

Try instead:

SizesArrayInit = llvm::ConstantArray::get(llvm::ArrayType::get(CGM.SizeTy, Sizes.size()), Sizes);
3183–3186

llvm::ConstantPointerNull::get(<type>);

lib/CodeGen/CGOpenMPRuntime.h
190–204

These enums must not be exposed by CGOpenMPRuntime until they are used in arguments of runtime member functions.

766

I don't like the idea of using 'unsigned' as a map type. You should create some particular OpenMPMapClauseKind (just like OpenMPDefaultClauseKind, OpenMPDependClauseKind, OpenMPProcBindClauseKind etc.) and use it where required.

lib/CodeGen/CGStmtOpenMP.cpp
2154–2218

I think this one can be moved to runtime codegen

ABataev edited edge metadata.Aug 21 2015, 1:55 AM

Another one thing I forget to mention. Current implementation of CGOpenMPRuntime is libomp-specific. You're trying to add functionality that is libtarget-specific. Maybe it is a good idea to separate support for libomp and libtarget runtime libraries?

sfantao updated this revision to Diff 32843.Aug 21 2015, 10:51 AM
sfantao edited edge metadata.

Address reviewer concerns.

Another one thing I forget to mention. Current implementation of CGOpenMPRuntime is libomp-specific. You're trying to add functionality that is libtarget-specific. Maybe it is a good idea to separate support for libomp and libtarget runtime libraries?

Not sure what do you mean by separation. Different files? Different codegen class? My perspective is that the two things should be together given that they both address the same specification, and I see that interaction is required between the two components. E.g. teams codegen will have to interact with the target codegen (communicate number of teams/threads ) and the teams codegen will require the libomp interface in its implementation. We could always separate the two things in the future if we see that is a better way to organize the code.

lib/CodeGen/CGOpenMPRuntime.cpp
2909

Done!

2928–2933

Done!

2992–2994

Ok, thought iterators were an exception to that rule. Fixed now!

3092–3129

Done!

3183–3186

Done!

lib/CodeGen/CGStmtOpenMP.cpp
2154–2218

I moved this to codegen, but I still had to forward some VLA information that I can only get from CGF, so that the VLAMaps are exposed. I need that information so that I can initialize the target region parameters.

Two more inlined comments that I forgot to integrate in my previous response.

Thanks!
Samuel

lib/CodeGen/CGOpenMPRuntime.h
190–204

Got it, not exposed anymore.

761

Unlike the other enums, more than one map types need to be combined. E.g., to/from are two different enums. Once the map clause and 4.1 get to be support, we will have more combinations. I see two options here: add enums for all combinations or use a typedef each time an ineger refer to map types, so the code is more readable. Let me know your thoughts.

Samuel, Yes, I thought about different files and different classes. Runtime for offloading codegen is not a part of libomp and it would be good to have separate runtime handler class for target codegen also. We need to think about it in future.

lib/CodeGen/CGOpenMPRuntime.h
188–203

Move them to .cpp file.

761

Yes, I think we need to add separate enums for different combination in Basic/OpenMPKinds.{def, h} for AST. In runtime support library we can represent these combinations as a bit-or of single mapping types.

sfantao updated this revision to Diff 33111.Aug 25 2015, 1:29 PM

Move map type and device id enums from CGOpenMPRuntime.h to CGOpenMPRuntime.cpp.

Thanks for the review!

Samuel, Yes, I thought about different files and different classes. Runtime for offloading codegen is not a part of libomp and it would be good to have separate runtime handler class for target codegen also. We need to think about it in future.

Sure, we can definitely do something like that in the future if we detect a clear place to separate. I guess that when we do that, the design of the offloading-unrelated lib has to change a little. There are things like the capture statement info that may make sense to share between the two implementation.

lib/CodeGen/CGOpenMPRuntime.h
188–203

Done!

761

Ok, I see, it makes sense to do that from the AST for the map clause SEMA. I'm keeping the bit-or in the runtime library call codegen as you say.

ABataev added inline comments.Aug 28 2015, 3:56 AM
lib/CodeGen/CGOpenMPRuntime.cpp
3020–3034

Move them to CGOpenMPRuntime::emitTargetCall(), they can be made local

test/OpenMP/target_codegen.cpp
9

Some of your tests has triple i386, they don't need PowerPC target

sfantao updated this revision to Diff 33640.Aug 31 2015, 4:03 PM

Address last review comments.

lib/CodeGen/CGOpenMPRuntime.cpp
3020–3034

Ok, done!

test/OpenMP/target_codegen.cpp
9

True, I'm not using any target specific property here. So I guess it is safe to remove the requirement. Not using // REQUIRES anymore. Thanks.

Seems good to me, but it would be good if John McCall could look at the patch.

Needs two small changes to work with current trunk

lib/CodeGen/CGStmtOpenMP.cpp
2139–2140

This now has to be S.getSingleClause<OMPIfClause>() and we can therefore omit the cast...

2145–2146

Likewise S.getSingleClause<OMPDeviceClause>() without the need for an extra cast

rjmccall edited edge metadata.Sep 10 2015, 9:46 PM

Sorry for putting off the final review on this; I was heads-down trying to get the alignment patch done. It's looking good; obviously you'll need to update it to work with Addresses properly, but hopefully that won't be too much of a problem.

When you do, maybe you should start a new review; I think there's some way to do that in Phabricator that ties it to the old one. Phabricator seems to not be very happy with the extent to which the code has changed, and the old comments now just make it harder to review the current patch.

sfantao abandoned this revision.Sep 15 2015, 2:34 PM

Closing revision. It has been replaced by http://reviews.llvm.org/D12871 has suggested by John.

Thanks!
Samuel