This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Target directive host codegen
AbandonedPublic

Authored by sfantao on Jul 20 2015, 11:09 AM.

Details

Summary

This patch implements the outlining for offloading functions for code annotated with the OpenMP target directive. It uses a temporary naming of the outlined functions that will have to be updated later on once target side codegen and registration of offloading libraries is implemented - the naming needs to be made unique in the produced library.

Unlike other captured regions, target offloading cannot use directly the Capture declaration, as each captured field has to be passed explicitly to the runtime library and associated with potentially different mapping types (to/from/alloc...). Therefore, some tweaking in the function prologue codegen is required.

The current implementation still do not support capturing of global variables. That requires a modification in Sema that I will propose in a separate patch.

Thanks!
Samuel

Diff Detail

Event Timeline

sfantao updated this revision to Diff 30172.Jul 20 2015, 11:09 AM
sfantao retitled this revision from to [OpenMP] Target directive host codegen.
sfantao updated this object.
sfantao added reviewers: ABataev, hfinkel, rjmccall.
sfantao added a subscriber: cfe-commits.
ABataev added inline comments.Jul 20 2015, 10:50 PM
lib/CodeGen/CGOpenMPRuntime.h
188–212

Seems to me these enums are not used right now, must be removed

lib/CodeGen/CGStmt.cpp
2212–2240 ↗(On Diff #30172)

Why do you need all this stuff here? Why you can't use variables captured in CapturedDecl?

sfantao updated this revision to Diff 30264.Jul 21 2015, 9:44 AM

Remove unused enums.

Alexey,

Thanks for the review! I am addressing your remarks as inline comments.

Thanks again,
Samuel

lib/CodeGen/CGOpenMPRuntime.h
188–212

Done!

lib/CodeGen/CGStmt.cpp
2215–2245 ↗(On Diff #30264)

There is only one argument that comes with the CaptureDecl and that is the anonymous struct that captures all the references.

For the target outlined functions we have to pass each capture separately because not only are they arguments but also data mappings. Therefore we don't have to generate the anonymous struct at all and use the captures directly, signaling the proper map types. This follows what was discussed in the offloading infrastructure proposal - the compiler will generate arrays with each argument/mapping and forward that to the target specific plugin. The target function is therefore expected to have the same signature of the host function that is being generated by this patch, so that the order in the array is consistent with the signature.

The code in this patch that is generating the host version of the target region can therefore be completely reused to generate the device version.

ABataev added inline comments.Jul 21 2015, 8:56 PM
lib/CodeGen/CGOpenMPRuntime.h
188–203

Seems to me these enums are used in CGOpenMPRuntime.cpp, so we should not declare them in CGOpenMPRuntime interface.

lib/CodeGen/CGStmt.cpp
2215–2245 ↗(On Diff #30264)

Could we add required functionality to CapturedStmt to make it pass mappings also, if required? Currently we have to expose some things that better to stay hidden like VLAMap, modify StartFunction(), when we could add required parameters in Sema analysis and get all required stuff for free, without any additional changes in codegen

rjmccall added inline comments.Jul 23 2015, 9:44 PM
lib/CodeGen/CGOpenMPRuntime.cpp
3041

You already know the element types for all of these. The code will be much more readable if you just use those types directly.

3071

Please move the creation of this global out of this initializer list.

lib/CodeGen/CGStmt.cpp
2215–2245 ↗(On Diff #30264)

Okay, so from what I understand, __tgt_target is just a placeholder call that some later pass will rewrite to invoke a target-specific plugin that behaves like the host function. Fine, but:

  • Doesn't it still need to get passed the host function?
  • Why does it return an i32 instead of just an i1, if you're going to rewrite it anyway?
  • It would be much clearer to name it something actually intrinsic-like, like "omp.target.call".

As for the signature: as we've discussed before, because the captured statement invocation function isn't actually exposed anywhere, its physical signature is really completely an implementation detail of the captured statement emitter. So I have no problem with this kind of change, but you should really design it to be flexible for different captured-statement emitters (by, say, taking a lambda or something similar) rather than hardcoding the specific needs of this one.

lib/CodeGen/CodeGenFunction.cpp
769 ↗(On Diff #30264)

Please find some way of doing this that does not rely on changing common infrastructure like StartFunction.

1550 ↗(On Diff #30264)

What do you need this for that you can't use getVLASize for?

rjmccall added inline comments.Jul 23 2015, 9:44 PM
lib/CodeGen/CGOpenMPRuntime.cpp
865–866

Spurious "to" at the start.

2925

getTypeStoreSize()

2934

You should ask the ASTContext to compute this size instead of making assumptions about the layout size of the IR type.

Also, what are the semantics supposed to be for mapping to and from? Do referents need to be trivially copyable? What if there are pointers or references? What happens to virtual bases?

2944

Same thing, please ask the ASTContext.

Also, you might need to care about variably-sized types here.

3013

This is creating a bunch of dynamic allocas instead of just temporary values. Please call CreateMemTemp with an array type instead.

3018

The sizes aren't constant if you've captured a VLA. But this comment is actually just wrong, because this isn't building something for the sizes at all; it's building something for the flags.

That said, I think you ought to be able to do the same thing with the sizes when you don't have a VLA.

3021

Please comment boolean arguments like this:

/*constant*/ true

And please mark this variable unnamed_addr.

3048

It makes more sense to emit the store next to the GEP. This will also let you more easily skip all the code associated with an array when you can emit it as a constant.

ABataev added inline comments.Jul 23 2015, 10:01 PM
lib/CodeGen/CGStmt.cpp
2215–2245 ↗(On Diff #30264)

John, __tgt_target is not a placeholder, this is a real name of offloading runtime function. Runtime library for offloading is not a part of LLVM yet. You can find it here https://github.com/clang-omp/libomptarget. Runtime functions are declared here https://github.com/clang-omp/libomptarget/blob/master/src/omptarget.h

rjmccall added inline comments.Jul 23 2015, 11:16 PM
lib/CodeGen/CGStmt.cpp
2215–2245 ↗(On Diff #30264)

Oh, I see. It's the host-pointer argument that's meant to uniquely determine the operation to perform on the target, and there's going to be some separate registration step that will associate that with a target-specific implementation?

Are you really adding this otherwise-unnecessary use of libffi just so that you can pass the captures as (indirect) arguments for no particular reason? You're free to make that decision, but I think it's a poor one. If you're assuming that all the migrating captures are POD — and I don't see how that's avoidable — then you'd be better off putting all the captured variables into a single allocation and presenting the target with the base address of that.

sfantao updated this revision to Diff 32211.Aug 14 2015, 7:31 PM

This patch tries to avoid as much as possible changing the common infrastructure, by adapting the CapturedDecl creation in SEMA and by adding support to a second type of capture - ImplicitParamDecl (on top of the existent FieldDecl).

Also adds codegen for device clause as the Parsing and SEMA support was added in the meantime.

The regression test was not updated yet. I wanted to make sure the direction this is taking is approved before diving into that.

Thanks!
Samuel

Alexey, John,

Thanks for the review! I've tried to address your concerns in the last diff. Please, check the inlined comments to find answers for the remarks of the previous diff.

Thanks again!
Samuel

lib/CodeGen/CGOpenMPRuntime.cpp
865–866

Fixed!

2925

Now using getTypeSizeInChars from the ASTcontext as suggested bellow.

2934

Using using getTypeSizeInChars from the ASTcontext.

This patch only deals with trivially copiable types. By default, a variable that is captured in the target region is mapped "by value" using a to-from policy. In order to do something different than the default, the user has to use a map clause (I'll send out a patch for it once we have the Parsing/SEMA in place). As per the current spec, the map clause allows a user to map the pointee of a pointer as well as only mapping a section of an array or pointee. In the next version of the OpenMP spec we will have the ability to map aggregate fields and more support for "deep" copy. We expect to be able to handle all the cases with the proper flags in OpenMPOffloadMappingFlags.

About virtual bases: OpenMP 4.0 forbids virtual members in mappable variable. Nevertheless, it is possible this constraint be lifted in future versions.

2944

Now using ASTContext.

I was planing to deal with the VLA types once I add support for the map clause, but I agree it makes more sense to do it now. Thanks for the suggestion.

3013

Done!

3018

Sorry for the error in the comment. It is fixed now.

Also, I added code to deal with constant sizes for when we don't have VLAs.

3021

Added comments for the boolean.

Setting unnamed_addr now, thanks!

3041

Ok, now using the types explicitly.

3048

Moved the stores next to the GEP. Also skipping the sizes for when no VLAs are used.

3071

Done!

lib/CodeGen/CGStmt.cpp
2215–2245 ↗(On Diff #30264)

Correct, there is going to be a separate registration step where all the host pointers are going to be captured in a descriptor and passed to the OpenMP runtime library (__tgt_register_lib()). I'll contribute that in a separate patch.

libffi is not required unless one wants to do offloading to some host thread. The host version of the target region is not launched from the runtime library. If the runtime decides to execute the target region on the host it should return an error code in tgt_target and it will just work. What I think is key here is:

  • To have the runtime be more efficient while scanning the mapped pointers/sizes/map types, and using arrays is a good solution for that.
  • For offloading APIs I have experience with, e.g. CUDA, having an array of pointers is exactly what is expected. By passing the parameters separately will let the implementation decide what is best for the target, usually use some dedicated memory or registers for parameters as defined in the calling/launching convention it has in place (in the limit it could implement a single allocation and load the parameters from there as you are suggesting).

Let me know if you need me to clarify any detail about the interface.

lib/CodeGen/CodeGenFunction.cpp
769 ↗(On Diff #30264)

In the new patch CaptureDecl was changed and CodeGenFunction::PrepareOMPTargetDirectiveBodyEmission was created so that this code is not required anymore.

1550 ↗(On Diff #30264)

What I need is the size of each dimension and what getVLASize returns is the size of the type. I can probably find ways of doing this without this new member function, but that will be definitely more messy. Should I do that? Any suggestion?

ABataev added inline comments.Aug 16 2015, 8:49 PM
lib/CodeGen/CGExpr.cpp
1969 ↗(On Diff #32211)

Samuel, why you don't want to capture all used variables in CapturedDecl instead of creating ImplicitParamDecl for each captured variable? Also, you will solve possible trouble with VLAs automatically using CapturedDecl.

lib/CodeGen/CGOpenMPRuntime.cpp
2924

Use CGM.getContext().getTypeSizeInChars() instead of CGM.getContext().getTypeSize() / 8.

2925

Maybe llvm::ConstantInt::get(CGF.SizeTy, SizeVal)?

2934

Use CGM.getContext().getTypeSizeInChars() instead of CGM.getContext().getTypeSize() / 8.

2951

The same

lib/CodeGen/CGStmtOpenMP.cpp
2140–2141

MakeNaturalAlignAddrLValue(ThisRef, ri->getType())?

2143

Extra semicolon

sfantao updated this revision to Diff 32322.Aug 17 2015, 9:59 AM

Adress reviewers concerns.

Also fix issue with target regions with no arguments and in the VLA size computation I found in the meantime.

Alexey,

Thanks for the review! Find my comments inlined.

Thanks again!
Samuel

lib/CodeGen/CGExpr.cpp
1969 ↗(On Diff #32211)

Alexey, I'm not sure I understand what you mean here. Unlike the other captured regions, the target region outlined function does not take a context that captures all the variables in fields of a record as argument. Instead, it takes all the captured references as arguments. This will enable the device runtime library to decide what is best to pass the arguments to the device (see my response to John's question in my previous diff).

It happens that all the machinery in the common infrastructure that creates the outlined functions (CodeGenFunction::StartFunction and GenerateCapturedStmtFunction) is prepared to get the context record from the CapuredDecl. Therefore, in order to not disrupt the common infrastructure, in Sema::ActOnOpenMPRegionEnd I am creating a new CapturedDecl that contains implicit parameters. I gather the information to build the new CapturedDecl from the CapturedDecl that is created with the context argument and the RecordDecl fields so that I don't need to touch the capturing code in Sema.

Having CapturedDecl with implicit parameters will drive StartFunction to create the outlined region with the right signature without having to change anything in there. I only had to guard the initialization of VLAs and 'this' in GenerateCapturedStmtFunction to not do anything that expects the context argument. However, during the emission of the VLAs that happens in StartFunction, the emission of these implicit parameters is attempted based on references that are marked as refersToEnclosingVariableOrCapture- this is the reason for the change in EmitDeclRefLValue.

Given that the references in the outlined function statements are still expecting the VLAs expression used in the caller of the outlined function, PrepareOMPTargetDirectiveBodyEmission will make sure that the mapped values to those expressions will be the same as the ones that use the new expression based on implicit parameters.

Let me know if you need me to clarify anything.
Thanks!

lib/CodeGen/CGOpenMPRuntime.cpp
2924

Done!

2925

I agree, it makes more sense to use size_t. Thanks for the suggestion!

2934

done!

2951

Done!

lib/CodeGen/CGStmtOpenMP.cpp
2140–2141

Now using MakeNaturalAlignAddrLValue.

2143

Fixed.

ABataev added inline comments.Aug 19 2015, 4:09 AM
lib/CodeGen/CGExpr.cpp
1969–1970 ↗(On Diff #32322)

Instead I would do the same thing I did for tasks.
Generate captured function as is. But also create the second function with the profile required for target codegen. This function must gather all its parameters into a single record and then call auto generated captured function. This captured function must be marked as AlwaysInline.
In this case you don't need to make some additional changes in Sema for particular processing of target directives, you will just need to generate simple function in codegen.
This may result in a little bit slower performance, but we may improve it later, when we have time to improve codegen for outlined functions for CapturedDecls.
I don't like the idea of reinventing of features, that are invented already, like capturing of VLAs, exposing some private functions (like getVLASizeMap()) etc.
so the code would be like this:

void .omp_outlined.(%captures *ptr) always_inline {
 <captured code>;
}
void .target_omp_outlined.(int* a, float* b, ...) {
  %captures rec;
  rec.a_ref = a;
  rec.b_ref = b;
  ...
  .omp_outlined.(&rec);
}
lib/CodeGen/CGOpenMPRuntime.cpp
3036

Maybe CGF.getContext().getSizeType()?

3134

CGF.Builder.CreateIsNotNull()?

test/OpenMP/target_codegen.cpp
2–4

It would be good to see the tests for 32 bit target.

sfantao updated this revision to Diff 32796.EditedAug 20 2015, 10:07 PM

Implement proxy function for target directive.

Move the creation of the target region parameters from CGOpenMPRuntime::emitTargetCall to CodeGenFunction::EmitTargetDirective because we need to access the VLA Maps of the target enclosing function.

Update regression test and add run directive for 32-bit target.

Thanks for review. The new diff now uses a proxy function. See other comments inlined.

Thanks again!
Samuel

lib/CodeGen/CGExpr.cpp
1969–1970 ↗(On Diff #32322)

Ok, I am now using the proxy function.

lib/CodeGen/CGOpenMPRuntime.cpp
3036

Done!

3134

Done!

test/OpenMP/target_codegen.cpp
2–4

Done!

ABataev added inline comments.Aug 21 2015, 1:29 AM
lib/CodeGen/CGOpenMPRuntime.cpp
2880

I don't think you need this argument. You're emitting a new outlined function here and don't need info about your current function.

2899–2904

You'd better to emit internal function separately in a new static function. Then you don't need to create TargetAuxCGF and TargetMainCGF. You should use just CGF everywhere. One CodeGenFunction instance per function.

2963–2965

Variable names should start with an upper case letter (e.g. Leader or Boats).

3063–3100

Try instead:

SizesArrayInit = llvm::ConstantArray::get(llvm::ArrayType::get(CGM.SizeTy, Sizes.size()), Sizes);
3154–3157

llvm::ConstantPointerNull::get(<type>);

lib/CodeGen/CGOpenMPRuntime.h
190–204

These enums must not be exposed by CGOpenMPRuntime until they are used in arguments of runtime member functions.

751

I don't like the idea of using 'unsigned' as a map type. You should create some particular OpenMPMapClauseKind (just like OpenMPDefaultClauseKind, OpenMPDependClauseKind, OpenMPProcBindClauseKind etc.) and use it where required.

lib/CodeGen/CGStmtOpenMP.cpp
2150–2214

I think this one can be moved to runtime codegen

ABataev edited edge metadata.Aug 21 2015, 1:55 AM

Another one thing I forget to mention. Current implementation of CGOpenMPRuntime is libomp-specific. You're trying to add functionality that is libtarget-specific. Maybe it is a good idea to separate support for libomp and libtarget runtime libraries?

sfantao updated this revision to Diff 32843.Aug 21 2015, 10:51 AM
sfantao edited edge metadata.

Address reviewer concerns.

Another one thing I forget to mention. Current implementation of CGOpenMPRuntime is libomp-specific. You're trying to add functionality that is libtarget-specific. Maybe it is a good idea to separate support for libomp and libtarget runtime libraries?

Not sure what do you mean by separation. Different files? Different codegen class? My perspective is that the two things should be together given that they both address the same specification, and I see that interaction is required between the two components. E.g. teams codegen will have to interact with the target codegen (communicate number of teams/threads ) and the teams codegen will require the libomp interface in its implementation. We could always separate the two things in the future if we see that is a better way to organize the code.

lib/CodeGen/CGOpenMPRuntime.cpp
2880

Done!

2899–2904

Done!

2963–2965

Ok, thought iterators were an exception to that rule. Fixed now!

3063–3100

Done!

3154–3157

Done!

lib/CodeGen/CGStmtOpenMP.cpp
2150–2214

I moved this to codegen, but I still had to forward some VLA information that I can only get from CGF, so that the VLAMaps are exposed. I need that information so that I can initialize the target region parameters.

Two more inlined comments that I forgot to integrate in my previous response.

Thanks!
Samuel

lib/CodeGen/CGOpenMPRuntime.h
190–204

Got it, not exposed anymore.

746

Unlike the other enums, more than one map types need to be combined. E.g., to/from are two different enums. Once the map clause and 4.1 get to be support, we will have more combinations. I see two options here: add enums for all combinations or use a typedef each time an ineger refer to map types, so the code is more readable. Let me know your thoughts.

Samuel, Yes, I thought about different files and different classes. Runtime for offloading codegen is not a part of libomp and it would be good to have separate runtime handler class for target codegen also. We need to think about it in future.

lib/CodeGen/CGOpenMPRuntime.h
188–203

Move them to .cpp file.

746

Yes, I think we need to add separate enums for different combination in Basic/OpenMPKinds.{def, h} for AST. In runtime support library we can represent these combinations as a bit-or of single mapping types.

sfantao updated this revision to Diff 33111.Aug 25 2015, 1:29 PM

Move map type and device id enums from CGOpenMPRuntime.h to CGOpenMPRuntime.cpp.

Thanks for the review!

Samuel, Yes, I thought about different files and different classes. Runtime for offloading codegen is not a part of libomp and it would be good to have separate runtime handler class for target codegen also. We need to think about it in future.

Sure, we can definitely do something like that in the future if we detect a clear place to separate. I guess that when we do that, the design of the offloading-unrelated lib has to change a little. There are things like the capture statement info that may make sense to share between the two implementation.

lib/CodeGen/CGOpenMPRuntime.h
188–203

Done!

746

Ok, I see, it makes sense to do that from the AST for the map clause SEMA. I'm keeping the bit-or in the runtime library call codegen as you say.

ABataev added inline comments.Aug 28 2015, 3:56 AM
lib/CodeGen/CGOpenMPRuntime.cpp
2991–3005

Move them to CGOpenMPRuntime::emitTargetCall(), they can be made local

test/OpenMP/target_codegen.cpp
8

Some of your tests has triple i386, they don't need PowerPC target

sfantao updated this revision to Diff 33640.Aug 31 2015, 4:03 PM

Address last review comments.

lib/CodeGen/CGOpenMPRuntime.cpp
2991–3005

Ok, done!

test/OpenMP/target_codegen.cpp
8

True, I'm not using any target specific property here. So I guess it is safe to remove the requirement. Not using // REQUIRES anymore. Thanks.

Seems good to me, but it would be good if John McCall could look at the patch.

Needs two small changes to work with current trunk

lib/CodeGen/CGStmtOpenMP.cpp
2135–2136

This now has to be S.getSingleClause<OMPIfClause>() and we can therefore omit the cast...

2141–2142

Likewise S.getSingleClause<OMPDeviceClause>() without the need for an extra cast

rjmccall edited edge metadata.Sep 10 2015, 9:46 PM

Sorry for putting off the final review on this; I was heads-down trying to get the alignment patch done. It's looking good; obviously you'll need to update it to work with Addresses properly, but hopefully that won't be too much of a problem.

When you do, maybe you should start a new review; I think there's some way to do that in Phabricator that ties it to the old one. Phabricator seems to not be very happy with the extent to which the code has changed, and the old comments now just make it harder to review the current patch.

sfantao abandoned this revision.Sep 15 2015, 2:34 PM

Closing revision. It has been replaced by http://reviews.llvm.org/D12871 has suggested by John.

Thanks!
Samuel