This is an archive of the discontinued LLVM Phabricator instance.

[Flang][MLIR][OpenMP] Update OMPEarlyOutlining to support Bounds, MapEntry and declare target globals
ClosedPublic

Authored by agozillon on Aug 24 2023, 7:15 AM.

Details

Summary

This patch is a required change for the device side IR to
maintain apporpiate links for declare target variables to
their global variables for later lowering.

It is also a requirement to clone over map bounds and
entry operations to maintain the correct information for
later lowering of the IR.

It simply tries to clone over the relevant information
maintaining the appropriate links they would have
maintained prior to the pass, rather than redirecting
them to new function arguments which causes a
loss of information in the case of Declare Target
and map information.

Depends on D158734

Diff Detail

Event Timeline

agozillon created this revision.Aug 24 2023, 7:15 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
agozillon requested review of this revision.Aug 24 2023, 7:15 AM

Patch 3/4 of the map changes from the patch series made by: https://reviews.llvm.org/D158732, https://reviews.llvm.org/D158735, https://reviews.llvm.org/D158737 and https://reviews.llvm.org/D158734 that aims to expand the current map support of the OpenMP dialect and lower from Fortran -> MLIR -> LLVM IR, future support on expanding the lowering from the OpenMP dialect to LLVM IR for map on TargetOp (omp target) will be forthcoming for declare target and explicit map variables hopefully in the near future (implicit likely a little while after that).

This patch should not pass CI on its own, patch 4: https://reviews.llvm.org/D158737, should pass provided the dependencies are setup correctly.

A small ping for some attention on this patch if a reviewer can spare some time, thank you very much!

This change seems OK to me - but I have limited experience with the omp device pipeline.

Just one comment - in another change (https://reviews.llvm.org/D158734) I noticed an explicit mention about " Add To+USM mode case"
I am guessing the reason declare variables are handled specially here is because in some cases there is a device version which can be resolved at link time and thus doesn't need passed into the kernel. I guess a safer alternate implementation is to always pass the pointer in - this would work regardless of the memory mode.

flang/lib/Optimizer/Transforms/OMPEarlyOutlining.cpp
228

Commented out code should be removed.

This change seems OK to me - but I have limited experience with the omp device pipeline.

Just one comment - in another change (https://reviews.llvm.org/D158734) I noticed an explicit mention about " Add To+USM mode case"

Yes, a lot of the flow in that segment (and the lowering I'll create patches for on Github eventually) try to mimic segments of Clang's current code generation process. And I think that specific check is something related to a specific case the OpenMP specification requests, but I'd have to look up the specifics as I don't have them on hand at the moment.

I am guessing the reason declare variables are handled specially here is because in some cases there is a device version which can be resolved at link time and thus doesn't need passed into the kernel. I guess a safer alternate implementation is to always pass the pointer in - this would work regardless of the memory mode.

I believe declare target "to/enter" clause variables are statically allocated on the devices data environment, whereas I think declare target "link" variables mapping/lifetime is handled by the runtime using OpenMP's data mapping control rules (emitted to IR as a nullptr and the runtime magics the value into it for the device to use), unsure on "indirect" as it's not something I've looked into handling at the moment. Someone with more OpenMP knowledge can likely fact check me on this, as it's been a little while since I've taken a look at declare targets semantics and I imagine I've gotten something wrong (and learnings always good!). Declare target variables are device side globals for the most part I believe that persist for the length of there data environment (longer perhaps).

However, It's quite possible that it would be a safer implementation, but the lowering, for the moment, is trying to stick somewhat close to the IR emitted from Clang's OpenMP implementation in part to try and achieve a shared implementation but also to make it easier to complete features/reach the standard Clang currently has, the old if it ain't broken don't fix it adage. But I imagine as we progress further we'll deviate, especially as we try to become more performant rather than just conformant.

Hopefully this makes some sense, it's a little late at night here!

flang/lib/Optimizer/Transforms/OMPEarlyOutlining.cpp
228

nice catch, thank you! that one snuck by me it seems.

This change seems OK to me - but I have limited experience with the omp device pipeline.

Just one comment - in another change (https://reviews.llvm.org/D158734) I noticed an explicit mention about " Add To+USM mode case"
I am guessing the reason declare variables are handled specially here is because in some cases there is a device version which can be resolved at link time and thus doesn't need passed into the kernel. I guess a safer alternate implementation is to always pass the pointer in - this would work regardless of the memory mode.

Sorry, I managed to conflate two different issues with the last message (that teaches me for trying to answer one last message when I'm tired I suppose)! The more appropriate answer is that this pass simply outlines the target region into a separate function, isolating it from other instructions (and optimisations to the region which can cause difficulties for the current lowering, which the pass is here to try and avoid). Most map clauses are mapped into arguments to this function then remapped to the target and the getUsedValuesDefinedAbove is a placeholder for implicit map captures that don't get added to map clauses upstream currently (but I have something that's wip downstream that will raise implicit captures into clauses). The issue with this is it loses information in certain cases like declare target as we'd be pointing at a block argument and the declare target interface only applies to globals and we'd be missing symbol information also required for the lowering (but perhaps worked around by renaming the block argument). In other cases we need to clone over specific operations (bounds, map entry and in the future likely fir box).

agozillon updated this revision to Diff 555853.Sep 5 2023, 6:15 AM

remove left over comment

razvanlupusoru accepted this revision.Sep 6 2023, 12:41 PM

Based on last explanation on why cloning is needed, this change seems reasonable to me.

This revision is now accepted and ready to land.Sep 6 2023, 12:41 PM
agozillon marked an inline comment as done.Sep 7 2023, 7:40 AM

Based on last explanation on why cloning is needed, this change seems reasonable to me.

Thank you very much for the review! If anyone else has anything they wish to add please feel free to do so.

razvanlupusoru accepted this revision.Sep 11 2023, 8:11 AM

@razvanlupusoru thank you very much for your review!

I will leave the patch series open until Thursday to give time for further comments/feedback from other reviewers (in particular @kiranchandramohan and @jsjodin) at that point if no further comments/change requests are received I will push it upstream!

agozillon updated this revision to Diff 556779.Sep 14 2023, 6:04 AM

Update based on name changes to operation, just altering the pass slightly with the new type name for the op

TIFitis accepted this revision.Sep 18 2023, 5:17 AM
agozillon updated this revision to Diff 556988.Sep 18 2023, 8:03 PM

Updating early outliner pass to clone box types as it does downstream (also some slight extensions beyond it to handle certain edge cases better), tidying it up and rebasing as well.