This is an archive of the discontinued LLVM Phabricator instance.

[Flang][OpenMP][MLIR] Filter emitted code depending on declare target and device
ClosedPublic

Authored by skatrak on Apr 5 2023, 10:05 AM.

Details

Summary

This patch adds support for selecting which functions are lowered to LLVM IR from MLIR depending on declare target information and whether host or device code is being generated.

The approach proposed by this patch is to perform the filtering in two stages:

  • An MLIR transformation pass, which is added to the Flang translation flow before the VerifierPass. The functions that are kept are those that match the OpenMP processor (host or device) the compiler invocation is targeting, according to the presence of the -fopenmp-is-device compiler option and declare target information. All functions contaning an omp.target are also kept, regardless of the declare target information of the function, due to the need for keeping target regions visible for both host and device compilation.
  • A filtering step during translation to LLVM IR, which is peformed for those functions that were kept because of the presence of a target region inside. If the targeted OpenMP processor does not match the declare target information of the function, then it is removed from the LLVM IR after its contents have been processed and translated. Since they should only contain an omp.target operation which, in turn, should have been outlined into another LLVM function, the wrapper can be deleted at that point.

Depends on D150328 and D150329.

Diff Detail

Event Timeline

skatrak created this revision.Apr 5 2023, 10:05 AM
Herald added a project: Restricted Project. · View Herald Transcript
skatrak requested review of this revision.Apr 5 2023, 10:05 AM
skatrak updated this revision to Diff 516735.Apr 25 2023, 3:54 AM

Update and add test.

skatrak updated this revision to Diff 518778.May 2 2023, 9:54 AM

Rebase and fix tests

generating functions and then deleting them is costly and will likely not work. We need to not emit them in the first place.

generating functions and then deleting them is costly and will likely not work. We need to not emit them in the first place.

I agree that we should strive to avoid doing unnecessary work rather than discarding it after the fact. However, the current implementation of target region outlining interacts with this in a way that prevents us from doing that in all cases. Let's say we implemented this function filtering patch as a semantic analysis pass:

  • We could forward the information from the frontend about whether we're compiling for the host or device (-fopenmp-is-device) to this new pass.
  • We could add the pass after the ImplicitDeclareTargetCapture pass implemented in D146063 or similar, so implicit declare target information is already propagated and available.
  • Using the knowledge about whether functions are only intended for the host, device or both, and what we're compiling for, we could remove from the parse tree those functions that won't need lowering.
  • For the host that would mean removing declare target device_type(nohost) functions, and for the device it would mean removing declare target device_type(host) and non-declare target functions.

However, the issues come when we consider the interaction of that approach with target regions:

  • Target regions (which should be lowered for the device) can appear inside of host functions.
  • Reverse-offload target regions (which should eventually be lowered for the host) can appear inside of device functions and target regions.
  • The first two points mean that, while generating host code, you must also look for target regions inside device functions and vice versa.
  • Target region outlining currently happens very late, at the MLIR to LLVM IR translation stage, during the lowering of their parent function, when the related omp.target MLIR operation is visited. D147172 was the main patch implementing this, and D147940 is a proposal to extend it for the device.
  • If we remove a function before that point due to being intended for another device, then there won't be any outlining of the target regions defined inside of it either.

I've got a couple of ideas about to handle this problem, but I'm not particularly convinced about any of them:

  • We could relax the requirement of the hypothetical semantics pass to avoid discarding too early functions that contain target regions, so that they can be outlined later. But then they will have to be pruned after translating them from MLIR to LLVM IR, like this patch is doing. We would be implementing the same kind of idea in two different places so that we could reduce the number of functions that we unnecessarily lower. Perhaps it would also be possible to reduce further the work done on these functions by only lowering the target regions inside (perhaps by hijacking ModuleTranslation::convertBlock), and not the rest of operations in them.
  • Another option could be to try to move part of the outlining work into a semantics pass as well, so that we wouldn't have to keep around the parent functions intended for the other device. We would have to do all of the used variable capturing there (which it's not currently implemented AFAIK), so that they can be part of the new function as well. The result of this would be that at the MLIR level there would be some compiler-generated functions with just an omp.target operation inside (not called from anywhere else), together with the rest of functions in the module that apply to the device we're compiling for (possibly containing other omp.target operations that should actually be implemented by the opposite device). We would have to mark these compiler-generated functions somehow to make sure that they are treated differently during MLIR to LLVM IR translation -- for compiler-generated functions, only the contents of the omp.target operation inside should be lowered, whereas for other functions everything should be lowered and any occurrences of that operation should become a kernel call or a reverse-offload call, ignoring the region attached to them.

The implementation in this patch, although not very efficient, is a very simple first approach that doesn't require significantly reworking different parts of offloading support. I'd like to know what you think about this, or if @jsjodin, @domada, @agozillon, @kiranchandramohan or others have any thoughts about it as well. It'd be good to reach some sort of consensus before committing to significant work.

I don't think moving the outlining early on is the best approach since we want this code to be shared with clang. Would it be sufficient to filter the functions that do not contain target regions after the analysis in https://reviews.llvm.org/D146063 somewhere in the front-end and let the MLIR->LLVM IR translation handle the functions that has target regions?

Ping @jdoerfert. Do you have any other thoughts different to what Jan's suggesting? Just looking to make sure we're in the same page to avoid redoing the work multiple times.

skatrak updated this revision to Diff 523385.May 18 2023, 7:56 AM

Rebase and update to depend on declare target work split into multiple patches.

jsjodin added inline comments.Jun 6 2023, 7:50 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
1051 ↗(On Diff #523385)

The maskedFunction information lives only inside this function. Can we make it in to a local variable and pass it in to convertOneFunction?

Ping again in case there any opinions against following Jan's suggestion before starting its implementation. In summary, the idea would be to:

  • Add a semantics pass to filter out functions that are not intended for the device we're compiling, according to implicit/explicit declare target information, except for those with a target region inside.
  • Keep also the approach in this first patch, so that the IR for the enclosing functions where target regions are located is deleted after the target regions themselves are outlined by the OpenMPIRBuilder.

I am thinking that there could be situations where this approach might break. Let's say there's a host-only subroutine f() called by a host-only subroutine g() that contains a target region as well (f() not called from inside the target region). Then, during compilation for the device, f() would be removed in semantics whereas g() would be kept due to it having a target region to be processed later during MLIR to LLVM IR translation. What would happen then with the call to the f() subroutine that would have become undefined?

Should we just replace the body of functions to be deleted in semantics by an empty body or a default 'return' statement instead? Or would it maybe be a preferable alternative to delete these functions altogether but also delete any calls to them? The first option could become difficult if these subroutines/functions return complex data types, and also we may end up generating many unused functions. The only issue I see with the second option is what would happen if function calls are used as part of an expression to initialize some data. I'm leaning towards just deleting all statements except variable declarations and target regions from functions intended for another device, which seems like solves all of the issues above.

Let me know if you have any more thoughts on this, and thank you for taking the time.

Ping again in case there any opinions against following Jan's suggestion before starting its implementation. In summary, the idea would be to:

  • Add a semantics pass to filter out functions that are not intended for the device we're compiling, according to implicit/explicit declare target information, except for those with a target region inside.
  • Keep also the approach in this first patch, so that the IR for the enclosing functions where target regions are located is deleted after the target regions themselves are outlined by the OpenMPIRBuilder.

I am thinking that there could be situations where this approach might break. Let's say there's a host-only subroutine f() called by a host-only subroutine g() that contains a target region as well (f() not called from inside the target region). Then, during compilation for the device, f() would be removed in semantics whereas g() would be kept due to it having a target region to be processed later during MLIR to LLVM IR translation. What would happen then with the call to the f() subroutine that would have become undefined?

Should we just replace the body of functions to be deleted in semantics by an empty body or a default 'return' statement instead? Or would it maybe be a preferable alternative to delete these functions altogether but also delete any calls to them? The first option could become difficult if these subroutines/functions return complex data types, and also we may end up generating many unused functions. The only issue I see with the second option is what would happen if function calls are used as part of an expression to initialize some data. I'm leaning towards just deleting all statements except variable declarations and target regions from functions intended for another device, which seems like solves all of the issues above.

Let me know if you have any more thoughts on this, and thank you for taking the time.

Can we add declarations for them instead of having empty bodies?

Can we add declarations for them instead of having empty bodies?

Since semantics work at the PFT level, I've looked into how to define the Fortran-equivalent to forward declarations. It seems like external procedures could be one solution to this. So it should be possible to delete all function and subroutine definitions intended for another device and add an external declaration to all their callers. These calls and external procedure declarations will remain in MLIR but they will be removed in the MLIR to LLVM IR translation step.

skatrak updated this revision to Diff 535717.Jun 29 2023, 4:30 AM

Implement early function filtering as an MLIR pass.

Herald added a project: Restricted Project. · View Herald Transcript
skatrak edited the summary of this revision. (Show Details)Jun 29 2023, 5:43 AM
skatrak updated this revision to Diff 536167.Jun 30 2023, 3:14 AM

Update after splitting off pass infrastructure into D154194.

skatrak updated this revision to Diff 538106.Jul 7 2023, 5:49 AM

Move MLIR transformation pass to Flang and implement 2-stage filtering to work together with early target region outlining.

jsjodin added inline comments.Jul 13 2023, 6:52 AM
flang/lib/Frontend/FrontendActions.cpp
712

I don't think this is needed. The FIR should already have been filtered by the front end.

flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
49

This could potentially be simplified if the early outlining is run before this pass. I think this is still okay though since the first instruction in the outlined functions is a TargetOp so it should be efficient, and it still works without the early outlining. So I think it is okay to keep as-is.

skatrak updated this revision to Diff 540367.Jul 14 2023, 5:16 AM
skatrak marked 2 inline comments as done.

Address review comments and integrate with recently-landed early target region outlining. Small refactor to improve readability and minimize TargetOp search overhead.

skatrak updated this revision to Diff 540368.Jul 14 2023, 5:19 AM

Move misplaced function.

Thank you for your feedback Jan, your comments should have been addressed.

flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
49

I thought about this, but I think it's still necessary to check here whether there are omp.target ops inside because the outlined functions are host functions in the device pass. So they would by default be filtered out here and the omp.target inside wouldn't reach the MLIR to LLVM IR translation stage.

But you're right. By running this after early outlining, the check should be very quick.

jsjodin accepted this revision.Jul 14 2023, 7:21 AM

LGTM, Please wait for another acceptance.

flang/lib/Optimizer/Transforms/OMPFunctionFiltering.cpp
49

It would be possible to check the early outlining interface to see if the parent function name is set, but that solution is less desirable imo.

This revision is now accepted and ready to land.Jul 14 2023, 7:21 AM
kiranchandramohan requested changes to this revision.Jul 14 2023, 7:38 AM

See comment inline.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

The change in convertOneFunction is intrusive. Code here must not be aware of the OpenMP attributes.

This revision now requires changes to proceed.Jul 14 2023, 7:38 AM
jsjodin added inline comments.Jul 14 2023, 7:45 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

The change in convertOneFunction is intrusive. Code here must not be aware of the OpenMP attributes.

Would it be ensure that the omp.declare_target dialect attribute is available so that convertDialectAttributes can handle everything?

jsjodin added inline comments.Jul 14 2023, 7:47 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

The change in convertOneFunction is intrusive. Code here must not be aware of the OpenMP attributes.

Would it be ensure that the omp.declare_target dialect attribute is available so that convertDialectAttributes can handle everything?

934–944 ↗(On Diff #540368)

The change in convertOneFunction is intrusive. Code here must not be aware of the OpenMP attributes.

Would it be ensure that the omp.declare_target dialect attribute is available so that convertDialectAttributes can handle everything?

Correction: Would it be possible to ensure that the omp.declare_target dialect attribute is available so that convertDialectAttributes can handle everything?

skatrak added inline comments.Jul 14 2023, 8:23 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

The change in convertOneFunction is intrusive. Code here must not be aware of the OpenMP attributes.

The problem here is that the amendOperation flow is called if there is an omp.declare_target attribute explicitly set for the function. However, functions without the attribute must also be treated as declare target with device type = host, so they must be filtered out during device compilation.

So, the options I see are either to access the attribute directly here or ensure that all functions have the attribute, either by setting it here or at an earlier stage. Though it might be enough to mark outlined functions with the attribute, which should be the only ones that will be removed at this stage.

jsjodin added inline comments.Jul 14 2023, 8:28 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

The change in convertOneFunction is intrusive. Code here must not be aware of the OpenMP attributes.

The problem here is that the amendOperation flow is called if there is an omp.declare_target attribute explicitly set for the function. However, functions without the attribute must also be treated as declare target with device type = host, so they must be filtered out during device compilation.

So, the options I see are either to access the attribute directly here or ensure that all functions have the attribute, either by setting it here or at an earlier stage. Though it might be enough to mark outlined functions with the attribute, which should be the only ones that will be removed at this stage.

I think the best option is to modify the early outlining to set the attribute. You can add that to this patch and a test if the current tests don't already cover this.

agozillon added inline comments.Jul 14 2023, 8:37 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

The type of the attribute is just a boolean though, so it has no dependencies on the OpenMPDialect, we already use something similar inside of the getOpenMPBuilder call: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L1311 Is it possible to do a similar segment of code inside of the destructor of ModuleTranslation, where we already have some OpenMP specific code (destruction of the OMPIRBuilder)? And is that a more acceptable approach?

skatrak updated this revision to Diff 540468.Jul 14 2023, 9:59 AM

Remove handling of OpenMP dialect attributes from generic translation. Not needed because the early outlining pass already adds the omp.declare_target attribute to the new function, so the second-stage filter can assume it will be always explicitly set.

This revision is now accepted and ready to land.Jul 14 2023, 10:04 AM
skatrak marked 6 inline comments as done.Jul 14 2023, 10:09 AM
skatrak added inline comments.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

After some testing it turns out that, in this specific case, we can assume that the attribute will be explicitly set by the early outlining. So there is no need to check for implicit host functions here.

The type of the attribute is just a boolean though, so it has no dependencies on the OpenMPDialect, we already use something similar inside of the getOpenMPBuilder call: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L1311 Is it possible to do a similar segment of code inside of the destructor of ModuleTranslation, where we already have some OpenMP specific code (destruction of the OMPIRBuilder)? And is that a more acceptable approach?

I think that is also something we should try to remove from this file if we can. In D147219 I implemented an approach that moves the OMPIRBuilder configuration to the amendOperation flow as well, which has the advantage of allowing the use of OpenMP custom attribute types. It's not been accepted, but I think it's a better way to go about that.

agozillon added inline comments.Jul 14 2023, 10:27 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

While I'm all for doing that, I don't think moving it into amendOperation would work at this time, as the attributes are tied to the ModuleOp and I believe the amendOperation for the ModuleOp is ran as one of the final things in the lowering process. The information in the Config for IRBuilder is required very early on, before lowering global ops (for declare target) which happens prior to even the function lowering which is when TargetOp will currently need it. I could be incorrect though!

skatrak marked 2 inline comments as done.Jul 17 2023, 12:36 AM

Thank you all for the comments, I'll land this patch shortly.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
934–944 ↗(On Diff #540368)

Yes, I noticed that issue as well. If you look at that patch, you can see that I moved the convertOperation() call to the module inside of translateModuleToLLVMIR() to be before translating the functions inside. Doing it in that order means that the OpenMPIRBuilder is initialized before translating the contents of the module. I believe that is a good approach to avoid having dialect-specific attributes handling in the generic ModuleTranslation, since (at least at this time) there is no particular reason why the module should be converted/amended last.