This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
flang/
-
include/flang/Optimizer/Transforms/
-
flang/
-
Optimizer/
-
Transforms/
-
Passes.h
2/2
Passes.td
-
lib/
-
Frontend/
-
FrontendActions.cpp
-
Optimizer/Transforms/
-
Transforms/
1
CMakeLists.txt
4/4
OMPMarkDeclareTarget.cpp
-
test/Lower/OpenMP/
-
Lower/
-
OpenMP/
-
declare-target-data.f90
-
declare-target-func-and-subr.f90
-
declare-target-implicit-func-and-subr-cap.f90
-
declare-target-implicit-tarop-cap.f90
-
host-ir-flag.f90
-
is-device.f90
-
omp-declare-target-data.f90
-
omp-declare-target-func-and-subr.f90
-
omp-host-ir-flag.f90
-
omp-is-device.f90

Differential D154247

[Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's implicitly called from TargetOp's and declare target marked FuncOp's as implicitly declare target
ClosedPublic

Authored by agozillon on Jun 30 2023, 11:50 AM.

Download Raw Diff

Details

Reviewers

sscalpone
kiranchandramohan
jdoerfert
dpalermo
jsjodin
skatrak
domada
raghavendhra
TIFitis
RogerV-AMD

Commits

rG062fce6f4dd1: [Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's…

Summary

This pass will mark functions called from TargetOp's
and declare target functions as implicitly declare
target by adding the MLIR declare target attribute
directly to the function.

This pass executes after the initial lowering of Fortran's PFT
to MLIR (FIR/OMP+Arith etc.) and is one of a series of passes
that aim to clean up the MLIR for offloading (seperate passes
in different patches, one for early outlining, another for declare
target function filtering).

The pass is an upgraded version of the prior MLIR Opt
pass that I created that did the same thing. In this case
better handling of recursive cases, updated to work with
the interface and the ability to work with TargetOp's.

The semantic pass equivalent of this can be found in:
https://reviews.llvm.org/D150323

So we can choose to go either direction, however, this
pass is simpler to maintain and understand and having
it as an transformation pass keeps it consistent with the
other transformation passes that deal with OpenMP
function and target op processing.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

agozillon created this revision.Jun 30 2023, 11:50 AM

Herald added a reviewer: sscalpone. · View Herald TranscriptJun 30 2023, 11:50 AM

Herald added a reviewer: kiranchandramohan. · View Herald Transcript

Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

Herald added subscribers: sunshaoce, bzcheeseman, mehdi_amini and 3 others. · View Herald Transcript

agozillon requested review of this revision.Jun 30 2023, 11:50 AM

Herald added a reviewer: jdoerfert. · View Herald TranscriptJun 30 2023, 11:50 AM

Herald added subscribers: jplehr, sstefan1, stephenneuendorffer, jdoerfert. · View Herald Transcript

agozillon added reviewers: dpalermo, jsjodin, skatrak, domada, raghavendhra, TIFitis, RogerV-AMD.Jun 30 2023, 11:51 AM

Harbormaster completed remote builds in B242492: Diff 536346.Jun 30 2023, 1:22 PM

I like this approach, and I think it is a more manageable approach than semantics, but it looks to me like this pass could be moved outside of the Flang project and into the OpenMP MLIR dialect as a transformation pass. I may be wrong, so please correct me if that's the case, but I understand that implicit declare target propagation should be common to Fortran and C/C++ OpenMP. So it makes sense to me to have the pass defined where it can be reused.

So I would propose, rather than defining the pass in Flang and propagating declare target information through fir.call ops, building on top of D154194 to create the pass as part of the OpenMP dialect (D147641 shows an example of what I mean) and to propagate declare target through func.call ops. The implementation of the pass itself I think would remain mostly unchanged except from swapping fir.call with func.call, and then it would just be a matter of moving some files around. I think this way the offloading-related passes (implicit declare target, target region outlining and host/device function filtering for now) would be implemented in a consistent and reusable way.

I may be missing some other considerations, so I'd be interested to hear any other opinions you or others may have.

flang/lib/Optimizer/Transforms/OMPCaptureImplicitDeclTar.cpp
79 ↗	(On Diff #536346)	In the case of a regular target region (`omp.target` inside of a host function) this is correct. However, a reverse-offloading target region (`omp.target` inside of a device function and with the 'ancestor' clause set) should instead mark functions called inside as host functions. Currently there is no support for reverse offloading, so maybe it's not important to already mark functions properly here, but at least it would be good to acknowledge this known limitation in a TODO comment.

In D154247#4470828, @skatrak wrote:

I like this approach, and I think it is a more manageable approach than semantics, but it looks to me like this pass could be moved outside of the Flang project and into the OpenMP MLIR dialect as a transformation pass. I may be wrong, so please correct me if that's the case, but I understand that implicit declare target propagation should be common to Fortran and C/C++ OpenMP. So it makes sense to me to have the pass defined where it can be reused.

So I would propose, rather than defining the pass in Flang and propagating declare target information through fir.call ops, building on top of D154194 to create the pass as part of the OpenMP dialect (D147641 shows an example of what I mean) and to propagate declare target through func.call ops. The implementation of the pass itself I think would remain mostly unchanged except from swapping fir.call with func.call, and then it would just be a matter of moving some files around. I think this way the offloading-related passes (implicit declare target, target region outlining and host/device function filtering for now) would be implemented in a consistent and reusable way.

I may be missing some other considerations, so I'd be interested to hear any other opinions you or others may have.

I have not gone through the details. But the function calls in FIR always exist as a fir.call till conversion to llvm.call.

I agree that this is simpler to maintain than the Semantics pass. My preference (not a strong opinion) is to keep this pass in Flang, and be called early on (one of the first passes after lowering) and be looked at as part of the lowering process. Practically, this also gives you a lot of leeway to experiment and make changes. Once we are all clear with the flow then we can think of moving these to the OpenMP Dialect.

I have not gone through the details. But the function calls in FIR always exist as a fir.call till conversion to llvm.call.

There is the CallOpInterface that can be used instead if you do not want to rely on a FIR operation.

In D154247#4470828, @skatrak wrote:

I like this approach, and I think it is a more manageable approach than semantics, but it looks to me like this pass could be moved outside of the Flang project and into the OpenMP MLIR dialect as a transformation pass. I may be wrong, so please correct me if that's the case, but I understand that implicit declare target propagation should be common to Fortran and C/C++ OpenMP. So it makes sense to me to have the pass defined where it can be reused.

So I would propose, rather than defining the pass in Flang and propagating declare target information through fir.call ops, building on top of D154194 to create the pass as part of the OpenMP dialect (D147641 shows an example of what I mean) and to propagate declare target through func.call ops. The implementation of the pass itself I think would remain mostly unchanged except from swapping fir.call with func.call, and then it would just be a matter of moving some files around. I think this way the offloading-related passes (implicit declare target, target region outlining and host/device function filtering for now) would be implemented in a consistent and reusable way.

I may be missing some other considerations, so I'd be interested to hear any other opinions you or others may have.

I would have also preferred to have kept them together as an OpenMP pass, but the pass unfortunately relies on fir.call ops as like @kiranchandramohan says the func.call's aren't a thing in FIR as far as I was aware and I don't believe there is inheritance to dynamically cast between them or infer one is the same as the other, which means it has to sit inside of Flang as FIR isn't part of the MLIR project, we'd need to utilize an interface as @kiranchandramohan suggested (also discussed with @jsjodin but decided it wasn't worth the effort to implement an interface at the moment for this reason alone), however, as @kiranchandramohan mentioned there is a CallOpInterface so perhaps this would work. Although, perhaps there is something I am missing (very likely) and func.call or some other replacement for fir.call other than the interface can be used!

In D154247#4470898, @kiranchandramohan wrote:

In D154247#4470828, @skatrak wrote:

I like this approach, and I think it is a more manageable approach than semantics, but it looks to me like this pass could be moved outside of the Flang project and into the OpenMP MLIR dialect as a transformation pass. I may be wrong, so please correct me if that's the case, but I understand that implicit declare target propagation should be common to Fortran and C/C++ OpenMP. So it makes sense to me to have the pass defined where it can be reused.

So I would propose, rather than defining the pass in Flang and propagating declare target information through fir.call ops, building on top of D154194 to create the pass as part of the OpenMP dialect (D147641 shows an example of what I mean) and to propagate declare target through func.call ops. The implementation of the pass itself I think would remain mostly unchanged except from swapping fir.call with func.call, and then it would just be a matter of moving some files around. I think this way the offloading-related passes (implicit declare target, target region outlining and host/device function filtering for now) would be implemented in a consistent and reusable way.

I may be missing some other considerations, so I'd be interested to hear any other opinions you or others may have.

I have not gone through the details. But the function calls in FIR always exist as a fir.call till conversion to llvm.call.

I agree that this is simpler to maintain than the Semantics pass. My preference (not a strong opinion) is to keep this pass in Flang, and be called early on (one of the first passes after lowering) and be looked at as part of the lowering process. Practically, this also gives you a lot of leeway to experiment and make changes. Once we are all clear with the flow then we can think of moving these to the OpenMP Dialect.

I'm not overly bothered where the opt pass rests at the moment personally (provided I can make it a little more operation agnostic), so I'm happy to change this opt pass to reside in the OpenMP dialect or the FIR dialects depending on the outcome of both (@kiranchandramohan @skatrak) your discussions and if the CallOpInterface yields any success.

I do not think it being part of the OpenMP dialect will prevent it being called early on (it currently happens just before verification, in the same opt pipeline, which I believe is directly after lowering, it is only optional added to the pipeline based on the OpenMP flag being set currently. As far as I am aware this is where the rest will be called as well), but being cautious with testing the flow before making it part of the OpenMP Dialect seems reasonable, although I don't think anything is going to use it in the meantime, at least until/if Clang swaps over to generating MLIR.

flang/lib/Optimizer/Transforms/OMPCaptureImplicitDeclTar.cpp
79 ↗	(On Diff #536346)	I'll see how difficult this is to add, but it may end up being a TODO!

In D154247#4470898, @kiranchandramohan wrote:

I have not gone through the details. But the function calls in FIR always exist as a fir.call till conversion to llvm.call.

You are right, I wasn't aware of this. Then, as you say, the alternative to having this pass work as part of the OpenMP dialect would be to use the CallOpInterface rather than searching for func.call.

In D154247#4470898, @kiranchandramohan wrote:

Practically, this also gives you a lot of leeway to experiment and make changes. Once we are all clear with the flow then we can think of moving these to the OpenMP Dialect.

That is a good point. By the same logic, then it would be probably good to also define the passes for target region outlining and host/device function filtering inside of Flang at first. Once all this stabilizes, then we can think of moving these (and potentially other) transformation passes to the dialect. Even then, it would be worthwhile to try implementing these passes without dependencies to the FIR dialect as much as possible, so that the eventual move won't be too involved.

In D154247#4471118, @agozillon wrote:

I'm not overly bothered where the opt pass rests at the moment personally (provided I can make it a little more operation agnostic), so I'm happy to change this opt pass to reside in the OpenMP dialect or the FIR dialects depending on the outcome of both (@kiranchandramohan @skatrak) your discussions and if the CallOpInterface yields any success.

I think it's fine to keep it as part of Flang for now, but trying to use the CallOpInterface if possible, to remove the dependency on the FIR dialect.

In D154247#4471118, @agozillon wrote:

I do not think it being part of the OpenMP dialect will prevent it being called early on (it currently happens just before verification, in the same opt pipeline, which I believe is directly after lowering, it is only optional added to the pipeline based on the OpenMP flag being set currently. As far as I am aware this is where the rest will be called as well), but being cautious with testing the flow before making it part of the OpenMP Dialect seems reasonable, although I don't think anything is going to use it in the meantime, at least until/if Clang swaps over to generating MLIR.

I agree, I think at the moment it's not really that important where the pass is defined because it will be created in the same spots inside of Flang regardless.

[Flang][MLIR][OpenMP] Make the pass more generic and remove the use of Fir.CallOps via the CallOpInterface
[Flang][MLIR][OpenMP] Add comment for future extension for reverse-offloading

Add missing newlines at eof

Added utilisation of the CallOpInterface in the most recent set of commits so it's now no longer dependent on FIR in it's current incarnation. Also added a TODO for the reverse-offloading support, as unfortunately the lowering from the PFT to the Op doesn't handle the device clause for the moment. Otherwise fixed the standard forgetting newlines at the end of new files that I always do (sorry)!

flang/lib/Optimizer/Transforms/OMPCaptureImplicitDeclTar.cpp
79 ↗	(On Diff #536346)	I've added a TODO comment for the time being, I don't think implementing it will be particularly difficult, the current issue is that there is no lowering of the device clause for the TargetOp so no way to test it unfortunately.

Harbormaster completed remote builds in B243048: Diff 537114.Jul 4 2023, 9:06 AM

skatrak mentioned this in D154194: [MLIR][OpenMP] Set up MLIR transform pass infrastructure for the OpenMP dialect.Jul 6 2023, 7:21 AM

If it would be possible to get some further review on this it would be greatly appreciated! Thank you very much ahead of time.

Nit: Better to refer to this as a transformation pass and not an optimization pass in the title.
Nit: Drop the omp prefix from the test files.
Nit: I think it is better to have tests with 2 spacing and lowercase.

In general (even for c/c++), how does this work for called functions/subroutines that are in a different file? e.g below where module a, module b are in different files.

module a
  use b
contains
  function double() result(x)
    integer :: x
    !$omp declare target
    x = add(10,20)
  end function
end module

module b
contains
  function add(x,y)
    integer :: x, y
    add = x + y
  end function
end module

What is the expected behaviour for operations that might lower to function calls later in the lowering pipeline? A simple example is the math sin function. But there could be others as well.

function mysin(x)
  real :: x
  !$omp declare target
  print *, sin(x)
end function

flang/include/flang/Optimizer/Transforms/Passes.td
301	Prefix the name with OpenMP or OMP. Would `OMPMarkDeclareTargetPass` be a better name?
303

agozillon retitled this revision from [Flang][OpenMP][MLIR] An mlir optimisation pass for marking FuncOp's implicitly called from TargetOp's and declare target marked FuncOp's as implicitly declare target to [Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's implicitly called from TargetOp's and declare target marked FuncOp's as implicitly declare target.Jul 12 2023, 6:58 AM

agozillon edited the summary of this revision. (Show Details)

In D154247#4489011, @kiranchandramohan wrote:
Nit: Better to refer to this as a transformation pass and not an optimization pass in the title.
Nit: Drop the omp prefix from the test files.
Nit: I think it is better to have tests with 2 spacing and lowercase.

In general (even for c/c++), how does this work for called functions/subroutines that are in a different file? e.g below where module a, module b are in different files.
module a
  use b
contains
  function double() result(x)
    integer :: x
    !$omp declare target
    x = add(10,20)
  end function
end module
module b
contains
  function add(x,y)
    integer :: x, y
    add = x + y
  end function
end module

In the above case the pass simply does nothing, it will not mark add as declare target as it's in another compilation unit, and the pass won't fail either (from running a test at least). This is specification defined though from my understanding/reading, so it isn't something uncommon:

"If a procedure appears in an enter clause in the same compilation unit in which the definition of the procedure occurs then a device-specific version of the procedure is created for all devices to which the directive of the clause applies.", Section 5.8.4 of specification 5.2

The to clause got renamed to enter after 5.1, so it's essentially an alias. But in this case to get functioning code, the user would I think have to decorate add with its own declare target directive I believe to make it "legal" OpenMP code. Someone with more experience in OpenMP may be able to shed more light or refute my understanding of it though.

The above sort of also hinges on if two seperate modules in two seperate files are considered seperate compilation units, e.g. it's not like it's some C++ header only library (from doing a test, it doesn't appear to be like that though, but I'm fairly new to Fortran).

What is the expected behaviour for operations that might lower to function calls later in the lowering pipeline? A simple example is the math sin function. But there could be others as well.
function mysin(x)
  real :: x
  !$omp declare target
  print *, sin(x)
end function

In this case, it's not going to apply declare target to it unfortunately, as it's not a function or a call, it's I suppose effectively a builtin at this point, until it's lowered. In theory, you could extend the pass to support marking of these I imagine, or look at it as falling under something defined in another compilation unit e.g. some kind of library.

However, in either case, from my current understanding, I believe it's a non-issue. The function filtering pass will execute immediately after this and provided the function it exists in, in this example. mysin survives the culling, it will just be lowered as normal with no other affects at the moment, so the math.sin function would just go through it's regular lowering, and if it happens to spawn a new function, then that's fine, it will not be harmed or go through any other alterations other than regular lowering.

Pass name change
remove omp from test
rename other tests I've added with similar omp prefix
tidy up test as suggested

Applied reviewer nits in the last series of commits and also removed omp- prefix from previous tests I wrote and added the prefix to while I was there, just to keep consistency.

Harbormaster completed remote builds in B244835: Diff 539616.Jul 12 2023, 1:24 PM

However, in either case, from my current understanding, I believe it's a non-issue. The function filtering pass will execute immediately after this and provided the function it exists in, in this example. mysin survives the culling, it will just be lowered as normal with no other affects at the moment, so the math.sin function would just go through it's regular lowering, and if it happens to spawn a new function, then that's fine, it will not be harmed or go through any other alterations other than regular lowering.

When it is written as different passes, there is no guarantee that these passes will be invoked one after the other. In Flang we can probably control this. Would combining the marking and erasing passes be better? Are there any other uses for having these declare target attributes? From your existing reply, i am assuming that is the only use.

Also, the best example for the situation I mentioned previously is the following.

subroutine sb1(x, y, i)
  real :: x, y
  integer(kind=8) :: i
  !$omp declare target
  y = x ** i
end subroutine

func.func @sb1_(%arg0: !fir.ref<f32> {fir.bindc_name = "x"}, %arg1: !fir.ref<f32> {fir.bindc_name = "y"}, %arg2: !fir.ref<i64> {fir.bindc_name = "i"}) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
  %0 = fir.load %arg0 : !fir.ref<f32>
  %1 = fir.load %arg2 : !fir.ref<i64>
  %2 = call @__mlir_math_fpowi_f32_i64(%0, %1) : (f32, i64) -> f32
  fir.store %2 to %arg1 : !fir.ref<f32>
  return
}

func.func private @__mlir_math_fpowi_f32_i64(%arg0: f32, %arg1: i64) -> f32 attributes {llvm.linkage = #llvm.linkage<linkonce_odr>} {
  %cst = arith.constant 1.000000e+00 : f32
  ...
}

Some comments inline. Looks Good. Please wait for one more acceptance.

flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
26	Cant the moduleOp be part of the class so that you do not have to pass it around?
27–33	Would a set or map be better here?

In D154247#4496810, @kiranchandramohan wrote:

However, in either case, from my current understanding, I believe it's a non-issue. The function filtering pass will execute immediately after this and provided the function it exists in, in this example. mysin survives the culling, it will just be lowered as normal with no other affects at the moment, so the math.sin function would just go through it's regular lowering, and if it happens to spawn a new function, then that's fine, it will not be harmed or go through any other alterations other than regular lowering.

When it is written as different passes, there is no guarantee that these passes will be invoked one after the other. In Flang we can probably control this. Would combining the marking and erasing passes be better? Are there any other uses for having these declare target attributes? From your existing reply, i am assuming that is the only use.

if they're both passes built to execute on ModuleOp, are they not guaranteed to execute in-order? I have a very minimal understanding of how the pipeline manager runs, but wouldn't one of the passes have to be either added as a nested pass explicitly or implicitly (only seems to happen if a pass executes on a different Op type than the PassManager is given) for them to not run in the provided sequence? I am not against combining them though if that is something we wish to do @skatrak @jsjodin @kiranchandramohan, I'm just not sure it's necessary in the context of them being run out of order, perhaps someone with more MLIR knowledge than me can feed into the discussion. In either case, it would require a bit of a merger of this patch and https://reviews.llvm.org/D147641 or waiting until one is landed and then having the other rebase and do the merging of components on top of it.

For the moment, for functions it's the only use. For global data the attribute has a different use-case, which another patch is up for: https://reviews.llvm.org/D149368

Also, the best example for the situation I mentioned previously is the following.

subroutine sb1(x, y, i)
  real :: x, y
  integer(kind=8) :: i
  !$omp declare target
  y = x ** i
end subroutine

func.func @sb1_(%arg0: !fir.ref<f32> {fir.bindc_name = "x"}, %arg1: !fir.ref<f32> {fir.bindc_name = "y"}, %arg2: !fir.ref<i64> {fir.bindc_name = "i"}) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
  %0 = fir.load %arg0 : !fir.ref<f32>
  %1 = fir.load %arg2 : !fir.ref<i64>
  %2 = call @__mlir_math_fpowi_f32_i64(%0, %1) : (f32, i64) -> f32
  fir.store %2 to %arg1 : !fir.ref<f32>
  return
}

func.func private @__mlir_math_fpowi_f32_i64(%arg0: f32, %arg1: i64) -> f32 attributes {llvm.linkage = #llvm.linkage<linkonce_odr>} {
  %cst = arith.constant 1.000000e+00 : f32
  ...
}

Thank you for the example! I believe if we did encounter it causing issues in this situation, we could try to move the passes after the math lowering, which would I think be in the FIRToLLVMLowering pass (perhaps I'm wrong on the location though). The ordering may be more difficult to restrict there though as there's a number of math related passes going on. But for the moment, filtering as close to the original lowering as possible is likely the best option I think.

Some comments inline. Looks Good. Please wait for one more acceptance.

Thank you, I'll try to address these and ask if someone else can have a look.

In D154247#4496810, @kiranchandramohan wrote:

However, in either case, from my current understanding, I believe it's a non-issue. The function filtering pass will execute immediately after this and provided the function it exists in, in this example. mysin survives the culling, it will just be lowered as normal with no other affects at the moment, so the math.sin function would just go through it's regular lowering, and if it happens to spawn a new function, then that's fine, it will not be harmed or go through any other alterations other than regular lowering.

When it is written as different passes, there is no guarantee that these passes will be invoked one after the other. In Flang we can probably control this. Would combining the marking and erasing passes be better? Are there any other uses for having these declare target attributes? From your existing reply, i am assuming that is the only use.

We can create another patch that implements a pass that invokes the three different passes in the order we want. It seems cleaner to keep the code separate so that it is easier to reason about each transform individually.

Pass name change
remove omp from test
rename other tests I've added with similar omp prefix
tidy up test as suggested
Remove ModuleOp Arg
Replace vector with set
Move addPass into same segment as other OpenMP addPass

Rebased and updated patch based on reviewer comments (use set and remove ModuleOp argument)

Harbormaster completed remote builds in B245139: Diff 540056.Jul 13 2023, 9:51 AM

jsjodin added inline comments.Jul 13 2023, 10:17 AM

flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
86	Can this set be a SmallPtrSet (for Operation *) instead of a StringRef? I am thinking doing mem compare might be expensive.

Pass name change
remove omp from test
rename other tests I've added with similar omp prefix
tidy up test as suggested
Remove ModuleOp Arg
Replace vector with set
Move addPass into same segment as other OpenMP addPass
Use a smallptrset rather than a regualr set

Update to utilise SmallPtrSet<Operation*> instead of std::set<StringRef>

flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp
86	Thank you for pointing it out, the last commit should address this and use SmallPtrSet!

LGTM! Please wait for another acceptance.

This revision is now accepted and ready to land.Jul 13 2023, 11:01 AM

Harbormaster completed remote builds in B245180: Diff 540112.Jul 13 2023, 1:23 PM

@kiranchandramohan would it be possible to get a 2nd acceptance from you if possible please, if you're happy with the patch in its current state of course! or @skatrak if you wish to give the patch a final look over too see if it's in a state for acceptance.

Thank you Andrew, LGTM.

In D154247#4497729, @jsjodin wrote:

In D154247#4496810, @kiranchandramohan wrote:

However, in either case, from my current understanding, I believe it's a non-issue. The function filtering pass will execute immediately after this and provided the function it exists in, in this example. mysin survives the culling, it will just be lowered as normal with no other affects at the moment, so the math.sin function would just go through it's regular lowering, and if it happens to spawn a new function, then that's fine, it will not be harmed or go through any other alterations other than regular lowering.

When it is written as different passes, there is no guarantee that these passes will be invoked one after the other. In Flang we can probably control this. Would combining the marking and erasing passes be better? Are there any other uses for having these declare target attributes? From your existing reply, i am assuming that is the only use.

We can create another patch that implements a pass that invokes the three different passes in the order we want. It seems cleaner to keep the code separate so that it is easier to reason about each transform individually.

I agree with this approach. We can define each of these as an independent patch, potentially relying on having one of the others run prior to it, and then make sure that either they all run in their expected order or none of them do by grouping them into a kind of "OpenMP-offloading pipeline" pass (which we can implement as a following patch, if we decide to follow that approach). This pass should run as soon as possible to avoid marking compiler-generated calls like @__mlir_math_fpowi_f32_i64 in the example above.

flang/lib/Optimizer/Transforms/CMakeLists.txt
23	Nit: Seems like there are some leftover spaces there

skatrak accepted this revision.Jul 14 2023, 6:22 AM

In D154247#4500767, @skatrak wrote:

Thank you Andrew, LGTM.

Thank you for the review!

In D154247#4497729, @jsjodin wrote:

In D154247#4496810, @kiranchandramohan wrote:

However, in either case, from my current understanding, I believe it's a non-issue. The function filtering pass will execute immediately after this and provided the function it exists in, in this example. mysin survives the culling, it will just be lowered as normal with no other affects at the moment, so the math.sin function would just go through it's regular lowering, and if it happens to spawn a new function, then that's fine, it will not be harmed or go through any other alterations other than regular lowering.

When it is written as different passes, there is no guarantee that these passes will be invoked one after the other. In Flang we can probably control this. Would combining the marking and erasing passes be better? Are there any other uses for having these declare target attributes? From your existing reply, i am assuming that is the only use.

We can create another patch that implements a pass that invokes the three different passes in the order we want. It seems cleaner to keep the code separate so that it is easier to reason about each transform individually.

I agree with this approach. We can define each of these as an independent patch, potentially relying on having one of the others run prior to it, and then make sure that either they all run in their expected order or none of them do by grouping them into a kind of "OpenMP-offloading pipeline" pass (which we can implement as a following patch, if we decide to follow that approach). This pass should run as soon as possible to avoid marking compiler-generated calls like @__mlir_math_fpowi_f32_i64 in the example above.

This approach sounds good to me.

I'll push this patch on Monday afternoon and fix the nit before I push, provided nobody has any further comments!

I am going to land this in the next couple of hours now, provided I run into no issues applying it! Thank you all for your reviews and help.

Closed by commit rG062fce6f4dd1: [Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's… (authored by agozillon). · Explain WhyJul 17 2023, 6:32 AM

This revision was automatically updated to reflect the committed changes.

agozillon added a commit: rG062fce6f4dd1: [Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's….

The gfortran testsuite is failing after this change. Could you fix immediately or revert until it is fixed?
https://lab.llvm.org/buildbot/#/builders/179/builds/6693
https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/defaultmap-7.f90

In D154247#4509624, @kiranchandramohan wrote:

The gfortran testsuite is failing after this change. Could you fix immediately or revert until it is fixed?
https://lab.llvm.org/buildbot/#/builders/179/builds/6693
https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/defaultmap-7.f90

Hi Kiran, looking into it just now and have an idea of what the problem is, discussing a fix and it should be committed by the end of today.

In D154247#4509624, @kiranchandramohan wrote:

The gfortran testsuite is failing after this change. Could you fix immediately or revert until it is fixed?
https://lab.llvm.org/buildbot/#/builders/179/builds/6693
https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/defaultmap-7.f90

It's an interaction between my pass and @skatrak 's pass added in: https://reviews.llvm.org/D147641

I've made a commit/fix for the time being that deactivates the pass for the host, but leaves it enabled for device: https://reviews.llvm.org/rG64f5a7642a05

@skatrak has an idea of how to do the filtering for the host in a way that won't adversely affect things hopefully, so I believe the current plan is for @skatrak to investigate his idea a little further and that should allow us to re-enable filtering for the host. The host filtering is primarily an optimisation/something to allow closer following of the OpenMP specification, it's not a necessity for offloading but it would be nice to have, especially if we aim to implement reverse offloading in the future.

In D154247#4512187, @agozillon wrote:

In D154247#4509624, @kiranchandramohan wrote:

The gfortran testsuite is failing after this change. Could you fix immediately or revert until it is fixed?
https://lab.llvm.org/buildbot/#/builders/179/builds/6693
https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/defaultmap-7.f90

It's an interaction between my pass and @skatrak 's pass added in: https://reviews.llvm.org/D147641

I've made a commit/fix for the time being that deactivates the pass for the host, but leaves it enabled for device: https://reviews.llvm.org/rG64f5a7642a05

@skatrak has an idea of how to do the filtering for the host in a way that won't adversely affect things hopefully, so I believe the current plan is for @skatrak to investigate his idea a little further and that should allow us to re-enable filtering for the host. The host filtering is primarily an optimisation/something to allow closer following of the OpenMP specification, it's not a necessity for offloading but it would be nice to have, especially if we aim to implement reverse offloading in the future.

Thanks @agozillon for the quick response and the plan for fixing this.

I have spent some time working on this problem and I just want to ask for your opinions on what the expected behavior is, because I'm not sure. Let's say we have the following application:

subroutine g()
  !$omp declare target to(g) device_type(nohost)
end subroutine device

subroutine f()
  !$omp target
  call g()
  !$omp end target
end subroutine host

I was thinking that g, since it's a device subroutine, should be filtered out when compiling for the host. But code generation for the contents of target regions is also triggered during host compilation, to provide a host fallback implementation. So it seems like it actually would be necessary to keep "nohost" functions also for the host, so that these fallbacks can actually work. If that's the case, then the current solution of only triggering function filtering during device compilation would be correct.

I think this wouldn't impact our ability of implementing reverse offloading later because that just requires function filtering to make sure device functions containing a target region are kept. The only problem could be, during device compilation, the potential calls to host-only functions inside of these reverse offload target regions (if they are removed, the function calls turn into calls to undefined symbols). The patch I am preparing changes the way in which functions are filtered so that undefined references are avoided in that situation.

So would the solution to this be to only filter host-only functions while compiling for the device and tweaking it slightly to avoid calls to undefined references or am I going the wrong way about this?

In D154247#4514230, @skatrak wrote:
I have spent some time working on this problem and I just want to ask for your opinions on what the expected behavior is, because I'm not sure. Let's say we have the following application:
subroutine g()
  !$omp declare target to(g) device_type(nohost)
end subroutine device

subroutine f()
  !$omp target
  call g()
  !$omp end target
end subroutine host
I was thinking that g, since it's a device subroutine, should be filtered out when compiling for the host. But code generation for the contents of target regions is also triggered during host compilation, to provide a host fallback implementation. So it seems like it actually would be necessary to keep "nohost" functions also for the host, so that these fallbacks can actually work. If that's the case, then the current solution of only triggering function filtering during device compilation would be correct.

I think this wouldn't impact our ability of implementing reverse offloading later because that just requires function filtering to make sure device functions containing a target region are kept. The only problem could be, during device compilation, the potential calls to host-only functions inside of these reverse offload target regions (if they are removed, the function calls turn into calls to undefined symbols). The patch I am preparing changes the way in which functions are filtered so that undefined references are avoided in that situation.

So would the solution to this be to only filter host-only functions while compiling for the device and tweaking it slightly to avoid calls to undefined references or am I going the wrong way about this?

I think that would be correct behavior, as we do need the functions for the fallback.

I suppose you could in theory mark functions needed for fallback as device_type::Any and that'd allow discarding those marked as nohost only, but that's a bit of an excessive and unnecessary optimisation I think, I imagine the cases where a nohost function is not needed in a fallback host implementation is few and far between and something else is likely to clean it up.

skatrak mentioned this in D155827: [Flang][MLIR][OpenMP] Improve device-only function filtering.Jul 20 2023, 5:46 AM

In D154247#4514372, @agozillon wrote:

I think that would be correct behavior, as we do need the functions for the fallback.

I suppose you could in theory mark functions needed for fallback as device_type::Any and that'd allow discarding those marked as nohost only, but that's a bit of an excessive and unnecessary optimisation I think, I imagine the cases where a nohost function is not needed in a fallback host implementation is few and far between and something else is likely to clean it up.

Thank you Andrew for your comments. I just created D155827 to follow this approach and clean up a bit some remaining TODO comments.

skatrak mentioned this in rGf20b67a81c66: [Flang][MLIR][OpenMP] Improve device-only function filtering.Aug 10 2023, 3:30 AM

Revision Contents

Path

Size

flang/

include/

flang/

Optimizer/

Transforms/

Passes.h

2 lines

Passes.td

6 lines

lib/

Frontend/

FrontendActions.cpp

1 line

Optimizer/

Transforms/

CMakeLists.txt

1 line

OMPMarkDeclareTarget.cpp

97 lines

test/

Lower/

OpenMP/

	declare-target-data.f90
	omp-declare-target-data.f90

	declare-target-func-and-subr.f90
	omp-declare-target-func-and-subr.f90

declare-target-implicit-func-and-subr-cap.f90

216 lines

declare-target-implicit-tarop-cap.f90

69 lines

	host-ir-flag.f90
	omp-host-ir-flag.f90

	is-device.f90
	omp-is-device.f90

omp-declare-target-data.f90

omp-declare-target-func-and-subr.f90

omp-host-ir-flag.f90

omp-is-device.f90

Diff 540992

flang/include/flang/Optimizer/Transforms/Passes.h

	Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	std::unique_ptr<mlir::Pass> createAlgebraicSimplificationPass();			std::unique_ptr<mlir::Pass> createAlgebraicSimplificationPass();
	std::unique_ptr<mlir::Pass>			std::unique_ptr<mlir::Pass>
	createAlgebraicSimplificationPass(const mlir::GreedyRewriteConfig &config);			createAlgebraicSimplificationPass(const mlir::GreedyRewriteConfig &config);
	std::unique_ptr<mlir::Pass> createPolymorphicOpConversionPass();			std::unique_ptr<mlir::Pass> createPolymorphicOpConversionPass();

	std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>			std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
	createOMPEarlyOutliningPass();			createOMPEarlyOutliningPass();
	std::unique_ptr<mlir::Pass> createOMPFunctionFilteringPass();			std::unique_ptr<mlir::Pass> createOMPFunctionFilteringPass();
				std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
				createOMPMarkDeclareTargetPass();

	// declarative passes			// declarative passes
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "flang/Optimizer/Transforms/Passes.h.inc"			#include "flang/Optimizer/Transforms/Passes.h.inc"

	} // namespace fir			} // namespace fir

	#endif // FORTRAN_OPTIMIZER_TRANSFORMS_PASSES_H			#endif // FORTRAN_OPTIMIZER_TRANSFORMS_PASSES_H

flang/include/flang/Optimizer/Transforms/Passes.td

Show First 20 Lines • Show All 292 Lines • ▼ Show 20 Lines let description = [{

array is an assumed shape array, to optimize for the (often common) case where array is an assumed shape array, to optimize for the (often common) case where

an array has element sized stride. The element sizes stride allows some an array has element sized stride. The element sizes stride allows some

loops to be vectorized as well as other loop optimizations. loops to be vectorized as well as other loop optimizations.

}]; }];

let constructor = "::fir::createLoopVersioningPass()"; let constructor = "::fir::createLoopVersioningPass()";

let dependentDialects = [ "fir::FIROpsDialect" ]; let dependentDialects = [ "fir::FIROpsDialect" ];

} }

def OMPEarlyOutliningPass def OMPEarlyOutliningPass

kiranchandramohanUnsubmitted

Done

Prefix the name with OpenMP or OMP.
Would OMPMarkDeclareTargetPass be a better name?

kiranchandramohan: Prefix the name with OpenMP or OMP. Would `OMPMarkDeclareTargetPass` be a better name?

: Pass<"omp-early-target-outlining", "mlir::ModuleOp"> { : Pass<"omp-early-target-outlining", "mlir::ModuleOp"> {

let summary = "Outlines all target ops into separate functions"; let summary = "Outlines all target ops into separate functions";

kiranchandramohanUnsubmitted

Done

: Pass<"capture-implicitly-declare-target", "mlir::ModuleOp"> {

- let summary = "Marks all functions referenced by a declare target function as declare target";

+ let summary = "Marks all functions called by a OpenMP declare target function as declare target";

let constructor = "::fir::createCaptureImplicitlyDeclareTargetPass()";

kiranchandramohan:

let description = [{ let description = [{

This pass outlines all omp.target operations into individual functions. This pass outlines all omp.target operations into individual functions.

It is invoked in the front end after the initial FIR has been constructed. It is invoked in the front end after the initial FIR has been constructed.

This pass is only needed when compiling for the target device to prevent This pass is only needed when compiling for the target device to prevent

the optimizer to perform transforms across target region boundaries. the optimizer to perform transforms across target region boundaries.

}]; }];

let constructor = "::fir::createOMPEarlyOutliningPass()"; let constructor = "::fir::createOMPEarlyOutliningPass()";

}

def OMPMarkDeclareTargetPass

: Pass<"omp-mark-declare-target", "mlir::ModuleOp"> {

let summary = "Marks all functions called by an OpenMP declare target function as declare target";

let constructor = "::fir::createOMPMarkDeclareTargetPass()";

let dependentDialects = ["mlir::omp::OpenMPDialect"]; let dependentDialects = ["mlir::omp::OpenMPDialect"];

} }

def OMPFunctionFiltering : Pass<"omp-function-filtering"> { def OMPFunctionFiltering : Pass<"omp-function-filtering"> {

let summary = "Filters out functions intended for the host when compiling " let summary = "Filters out functions intended for the host when compiling "

"for the device and vice versa."; "for the device and vice versa.";

let constructor = "::fir::createOMPFunctionFilteringPass()"; let constructor = "::fir::createOMPFunctionFilteringPass()";

let dependentDialects = [ let dependentDialects = [

"mlir::func::FuncDialect" "mlir::func::FuncDialect"

]; ];

} }

#endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES #endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES

flang/lib/Frontend/FrontendActions.cpp

Show First 20 Lines • Show All 304 Lines • ▼ Show 20 Lines	bool CodeGenAction::beginSourceFileAction() {
// that the FIR is correct with respect to OpenMP operations/attributes.		// that the FIR is correct with respect to OpenMP operations/attributes.
if (ci.getInvocation().getFrontendOpts().features.IsEnabled(		if (ci.getInvocation().getFrontendOpts().features.IsEnabled(
Fortran::common::LanguageFeature::OpenMP)) {		Fortran::common::LanguageFeature::OpenMP)) {
bool isDevice = false;		bool isDevice = false;
if (auto offloadMod = llvm::dyn_cast<mlir::omp::OffloadModuleInterface>(		if (auto offloadMod = llvm::dyn_cast<mlir::omp::OffloadModuleInterface>(
mlirModule->getOperation()))		mlirModule->getOperation()))
isDevice = offloadMod.getIsTargetDevice();		isDevice = offloadMod.getIsTargetDevice();

		pm.addPass(fir::createOMPMarkDeclareTargetPass());
if (isDevice)		if (isDevice)
pm.addPass(fir::createOMPEarlyOutliningPass());		pm.addPass(fir::createOMPEarlyOutliningPass());
pm.addPass(fir::createOMPFunctionFilteringPass());		pm.addPass(fir::createOMPFunctionFilteringPass());
}		}

pm.enableVerifier(/verifyPasses=/true);		pm.enableVerifier(/verifyPasses=/true);
pm.addPass(std::make_unique<Fortran::lower::VerifierPass>());		pm.addPass(std::make_unique<Fortran::lower::VerifierPass>());

▲ Show 20 Lines • Show All 745 Lines • Show Last 20 Lines

flang/lib/Optimizer/Transforms/CMakeLists.txt

Show All 12 Lines	add_flang_library(FIRTransforms
SimplifyRegionLite.cpp		SimplifyRegionLite.cpp
AlgebraicSimplification.cpp		AlgebraicSimplification.cpp
SimplifyIntrinsics.cpp		SimplifyIntrinsics.cpp
AddDebugFoundation.cpp		AddDebugFoundation.cpp
PolymorphicOpConversion.cpp		PolymorphicOpConversion.cpp
LoopVersioning.cpp		LoopVersioning.cpp
OMPEarlyOutlining.cpp		OMPEarlyOutlining.cpp
OMPFunctionFiltering.cpp		OMPFunctionFiltering.cpp
		OMPMarkDeclareTarget.cpp

DEPENDS		DEPENDS
		skatrakUnsubmitted Not Done Reply Inline Actions Nit: Seems like there are some leftover spaces there skatrak: Nit: Seems like there are some leftover spaces there
FIRDialect		FIRDialect
FIROptTransformsPassIncGen		FIROptTransformsPassIncGen

LINK_LIBS		LINK_LIBS
FIRBuilder		FIRBuilder
FIRCodeGen		FIRCodeGen
FIRDialect		FIRDialect
FIRDialectSupport		FIRDialectSupport
Show All 11 Lines

flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp

This file was added.

				#include "flang/Optimizer/Transforms/Passes.h"
				#include "mlir/Dialect/Func/IR/FuncOps.h"
				#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
				#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
				#include "mlir/IR/BuiltinDialect.h"
				#include "mlir/IR/BuiltinOps.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/IR/SymbolTable.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Support/LLVM.h"
				#include "llvm/ADT/SmallPtrSet.h"

				namespace fir {
				#define GEN_PASS_DEF_OMPMARKDECLARETARGETPASS
				#include "flang/Optimizer/Transforms/Passes.h.inc"
				} // namespace fir

				namespace {
				class OMPMarkDeclareTargetPass
				: public fir::impl::OMPMarkDeclareTargetPassBase<OMPMarkDeclareTargetPass> {

				void markNestedFuncs(mlir::omp::DeclareTargetDeviceType parentDevTy,
				mlir::omp::DeclareTargetCaptureClause parentCapClause,
				mlir::Operation *currOp,
				llvm::SmallPtrSet<mlir::Operation *, 16> visited) {
				if (visited.contains(currOp))
				kiranchandramohanUnsubmitted Done Reply Inline Actions Cant the moduleOp be part of the class so that you do not have to pass it around? kiranchandramohan: Cant the moduleOp be part of the class so that you do not have to pass it around?
				return;
				visited.insert(currOp);

				currOp->walk([&, this](mlir::Operation *op) {
				if (auto callOp = llvm::dyn_cast<mlir::CallOpInterface>(op)) {
				if (auto symRef = llvm::dyn_cast_if_present<mlir::SymbolRefAttr>(
				callOp.getCallableForCallee())) {
				kiranchandramohanUnsubmitted Done Reply Inline Actions Would a set or map be better here? kiranchandramohan: Would a set or map be better here?
				if (auto currFOp =
				getOperation().lookupSymbol<mlir::func::FuncOp>(symRef)) {
				auto current = llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(
				currFOp.getOperation());

				if (current.isDeclareTarget()) {
				auto currentDt = current.getDeclareTargetDeviceType();

				// Found the same function twice, with different device_types,
				// mark as Any as it belongs to both
				if (currentDt != parentDevTy &&
				currentDt != mlir::omp::DeclareTargetDeviceType::any) {
				current.setDeclareTarget(
				mlir::omp::DeclareTargetDeviceType::any,
				current.getDeclareTargetCaptureClause());
				}
				} else {
				current.setDeclareTarget(parentDevTy, parentCapClause);
				}

				markNestedFuncs(parentDevTy, parentCapClause, currFOp, visited);
				}
				}
				}
				});
				}

				// This pass executes on mlir::ModuleOp's marking functions contained within
				// as implicitly declare target if they are called from within an explicitly
				// marked declare target function or a target region (TargetOp)
				void runOnOperation() override {
				for (auto functionOp : getOperation().getOps<mlir::func::FuncOp>()) {
				auto declareTargetOp = llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(
				functionOp.getOperation());
				if (declareTargetOp.isDeclareTarget()) {
				llvm::SmallPtrSet<mlir::Operation *, 16> visited;
				markNestedFuncs(declareTargetOp.getDeclareTargetDeviceType(),
				declareTargetOp.getDeclareTargetCaptureClause(),
				functionOp, visited);
				}
				}

				// TODO: Extend to work with reverse-offloading, this shouldn't
				// require too much effort, just need to check the device clause
				// when it's lowering has been implemented and change the
				// DeclareTargetDeviceType argument from nohost to host depending on
				// the contents of the device clause
				getOperation()->walk([&](mlir::omp::TargetOp tarOp) {
				llvm::SmallPtrSet<mlir::Operation *, 16> visited;
				markNestedFuncs(mlir::omp::DeclareTargetDeviceType::nohost,
				mlir::omp::DeclareTargetCaptureClause::to, tarOp,
				visited);
				});
				jsjodinUnsubmitted Done Reply Inline Actions Can this set be a SmallPtrSet (for Operation ) instead of a StringRef? I am thinking doing mem compare might be expensive. jsjodin:* Can this set be a SmallPtrSet (for Operation *) instead of a StringRef? I am thinking doing mem…
				agozillonAuthorUnsubmitted Done Reply Inline Actions Thank you for pointing it out, the last commit should address this and use SmallPtrSet! agozillon: Thank you for pointing it out, the last commit should address this and use SmallPtrSet!
				}
				};

				} // namespace

				namespace fir {
				std::unique_ptr<mlir::OperationPass<mlir::ModuleOp>>
				createOMPMarkDeclareTargetPass() {
				return std::make_unique<OMPMarkDeclareTargetPass>();
				}
				} // namespace fir

flang/test/Lower/OpenMP/declare-target-data.f90

This file was moved from flang/test/Lower/OpenMP/omp-declare-target-data.f90.

The contents of this file were not changed.

flang/test/Lower/OpenMP/declare-target-func-and-subr.f90

This file was moved from flang/test/Lower/OpenMP/omp-declare-target-func-and-subr.f90.

The contents of this file were not changed.

flang/test/Lower/OpenMP/declare-target-implicit-func-and-subr-cap.f90

This file was added.

				!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - \| FileCheck %s
				!RUN: %flang_fc1 -emit-fir -fopenmp -fopenmp-is-device %s -o - \| FileCheck %s --check-prefix=DEVICE

				! CHECK-LABEL: func.func @_QPimplicitly_captured
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured(toggle) result(k)
				integer :: i, j, k
				logical :: toggle
				i = 10
				j = 5
				if (toggle) then
				k = i
				else
				k = j
				end if
				end function implicitly_captured


				! CHECK-LABEL: func.func @_QPtarget_function
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function target_function(toggle) result(i)
				!$omp declare target
				integer :: i
				logical :: toggle
				i = implicitly_captured(toggle)
				end function target_function

				!! -----

				! CHECK-LABEL: func.func @_QPimplicitly_captured_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_twice() result(k)
				integer :: i
				i = 10
				k = i
				end function implicitly_captured_twice

				! CHECK-LABEL: func.func @_QPtarget_function_twice_host
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>{{.}}}
				function target_function_twice_host() result(i)
				!$omp declare target to(target_function_twice_host) device_type(host)
				integer :: i
				i = implicitly_captured_twice()
				end function target_function_twice_host

				! DEVICE-LABEL: func.func @_QPtarget_function_twice_device
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function target_function_twice_device() result(i)
				!$omp declare target to(target_function_twice_device) device_type(nohost)
				integer :: i
				i = implicitly_captured_twice()
				end function target_function_twice_device

				!! -----

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_nest
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function implicitly_captured_nest() result(k)
				integer :: i
				i = 10
				k = i
				end function implicitly_captured_nest

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_one
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to){{.}}}
				function implicitly_captured_one() result(k)
				k = implicitly_captured_nest()
				end function implicitly_captured_one

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_two
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function implicitly_captured_two() result(k)
				integer :: i
				i = 10
				k = i
				end function implicitly_captured_two

				! DEVICE-LABEL: func.func @_QPtarget_function_test
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function target_function_test() result(j)
				!$omp declare target to(target_function_test) device_type(nohost)
				integer :: i, j
				i = implicitly_captured_one()
				j = implicitly_captured_two() + i
				end function target_function_test

				!! -----

				! CHECK-LABEL: func.func @_QPimplicitly_captured_nest_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_nest_twice() result(k)
				integer :: i
				i = 10
				k = i
				end function implicitly_captured_nest_twice

				! CHECK-LABEL: func.func @_QPimplicitly_captured_one_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_one_twice() result(k)
				k = implicitly_captured_nest_twice()
				end function implicitly_captured_one_twice

				! CHECK-LABEL: func.func @_QPimplicitly_captured_two_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_two_twice() result(k)
				integer :: i
				i = 10
				k = i
				end function implicitly_captured_two_twice

				! DEVICE-LABEL: func.func @_QPtarget_function_test_device
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function target_function_test_device() result(j)
				!$omp declare target to(target_function_test_device) device_type(nohost)
				integer :: i, j
				i = implicitly_captured_one_twice()
				j = implicitly_captured_two_twice() + i
				end function target_function_test_device

				! CHECK-LABEL: func.func @_QPtarget_function_test_host
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>{{.}}}
				function target_function_test_host() result(j)
				!$omp declare target to(target_function_test_host) device_type(host)
				integer :: i, j
				i = implicitly_captured_one_twice()
				j = implicitly_captured_two_twice() + i
				end function target_function_test_host

				!! -----

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_with_dev_type_recursive
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				recursive function implicitly_captured_with_dev_type_recursive(increment) result(k)
				!$omp declare target to(implicitly_captured_with_dev_type_recursive) device_type(host)
				integer :: increment, k
				if (increment == 10) then
				k = increment
				else
				k = implicitly_captured_with_dev_type_recursive(increment + 1)
				end if
				end function implicitly_captured_with_dev_type_recursive

				! DEVICE-LABEL: func.func @_QPtarget_function_with_dev_type_recurse
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function target_function_with_dev_type_recurse() result(i)
				!$omp declare target to(target_function_with_dev_type_recurse) device_type(nohost)
				integer :: i
				i = implicitly_captured_with_dev_type_recursive(0)
				end function target_function_with_dev_type_recurse

				!! -----

				module test_module
				contains
				! CHECK-LABEL: func.func @_QMtest_modulePimplicitly_captured_nest_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_nest_twice() result(i)
				integer :: i
				i = 10
				end function implicitly_captured_nest_twice

				! CHECK-LABEL: func.func @_QMtest_modulePimplicitly_captured_one_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_one_twice() result(k)
				!$omp declare target to(implicitly_captured_one_twice) device_type(host)
				k = implicitly_captured_nest_twice()
				end function implicitly_captured_one_twice

				! DEVICE-LABEL: func.func @_QMtest_modulePimplicitly_captured_two_twice
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function implicitly_captured_two_twice() result(y)
				integer :: y
				y = 5
				end function implicitly_captured_two_twice

				! DEVICE-LABEL: func.func @_QMtest_modulePtarget_function_test_device
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function target_function_test_device() result(j)
				!$omp declare target to(target_function_test_device) device_type(nohost)
				integer :: i, j
				i = implicitly_captured_one_twice()
				j = implicitly_captured_two_twice() + i
				end function target_function_test_device
				end module test_module

				!! -----

				program mb
				interface
				subroutine caller_recursive
				!$omp declare target to(caller_recursive) device_type(nohost)
				end subroutine

				recursive subroutine implicitly_captured_recursive(increment)
				integer :: increment
				end subroutine
				end interface
				end program

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_recursive
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				recursive subroutine implicitly_captured_recursive(increment)
				integer :: increment
				if (increment == 10) then
				return
				else
				call implicitly_captured_recursive(increment + 1)
				end if
				end subroutine

				! DEVICE-LABEL: func.func @_QPcaller_recursive
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				subroutine caller_recursive
				!$omp declare target to(caller_recursive) device_type(nohost)
				call implicitly_captured_recursive(0)
				end subroutine

flang/test/Lower/OpenMP/declare-target-implicit-tarop-cap.f90

This file was added.

				!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - \| FileCheck %s
				!RUN: %flang_fc1 -emit-fir -fopenmp -fopenmp-is-device %s -o - \| FileCheck %s --check-prefix=DEVICE

				! DEVICE-LABEL: func.func @_QPimplicit_capture
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function implicit_capture() result(i)
				implicit none
				integer :: i
				i = 1
				end function implicit_capture

				subroutine subr_target()
				integer :: n
				!$omp target map(tofrom:n)
				n = implicit_capture()
				!$omp end target
				end subroutine

				!! -----

				! CHECK-LABEL: func.func @_QPimplicitly_captured_nest_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_nest_twice() result(i)
				integer :: i
				i = 10
				end function implicitly_captured_nest_twice

				! CHECK-LABEL: func.func @_QPimplicitly_captured_one_twice
				! CHECK-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>{{.}}}
				function implicitly_captured_one_twice() result(k)
				!$omp declare target to(implicitly_captured_one_twice) device_type(host)
				k = implicitly_captured_nest_twice()
				end function implicitly_captured_one_twice

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_two_twice
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				function implicitly_captured_two_twice() result(y)
				integer :: y
				y = 5
				end function implicitly_captured_two_twice


				function target_function_test_device() result(j)
				integer :: i, j
				!$omp target map(tofrom: i, j)
				i = implicitly_captured_one_twice()
				j = implicitly_captured_two_twice() + i
				!$omp end target
				end function target_function_test_device

				!! -----

				! DEVICE-LABEL: func.func @_QPimplicitly_captured_recursive
				! DEVICE-SAME: {{.}}attributes {omp.declare_target = #omp.declaretarget<device_type = (nohost), capture_clause = (to)>{{.}}}
				recursive function implicitly_captured_recursive(increment) result(k)
				integer :: increment, k
				if (increment == 10) then
				k = increment
				else
				k = implicitly_captured_recursive(increment + 1)
				end if
				end function implicitly_captured_recursive

				function target_function_recurse() result(i)
				integer :: i
				!$omp target map(tofrom: i)
				i = implicitly_captured_recursive(0)
				!$omp end target
				end function target_function_recurse

flang/test/Lower/OpenMP/host-ir-flag.f90

This file was moved from flang/test/Lower/OpenMP/omp-host-ir-flag.f90.

The contents of this file were not changed.

flang/test/Lower/OpenMP/is-device.f90

This file was moved from flang/test/Lower/OpenMP/omp-is-device.f90.

The contents of this file were not changed.

flang/test/Lower/OpenMP/omp-declare-target-data.f90

This file was moved to flang/test/Lower/OpenMP/declare-target-data.f90.

flang/test/Lower/OpenMP/omp-declare-target-func-and-subr.f90

This file was moved to flang/test/Lower/OpenMP/declare-target-func-and-subr.f90.

flang/test/Lower/OpenMP/omp-host-ir-flag.f90

This file was moved to flang/test/Lower/OpenMP/host-ir-flag.f90.

flang/test/Lower/OpenMP/omp-is-device.f90

This file was moved to flang/test/Lower/OpenMP/is-device.f90.

This is an archive of the discontinued LLVM Phabricator instance.

[Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's implicitly called from TargetOp's and declare target marked FuncOp's as implicitly declare targetClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 540992

flang/include/flang/Optimizer/Transforms/Passes.h

flang/include/flang/Optimizer/Transforms/Passes.td

flang/lib/Frontend/FrontendActions.cpp

flang/lib/Optimizer/Transforms/CMakeLists.txt

flang/lib/Optimizer/Transforms/OMPMarkDeclareTarget.cpp

flang/test/Lower/OpenMP/declare-target-data.f90

flang/test/Lower/OpenMP/declare-target-func-and-subr.f90

flang/test/Lower/OpenMP/declare-target-implicit-func-and-subr-cap.f90

flang/test/Lower/OpenMP/declare-target-implicit-tarop-cap.f90

flang/test/Lower/OpenMP/host-ir-flag.f90

flang/test/Lower/OpenMP/is-device.f90

flang/test/Lower/OpenMP/omp-declare-target-data.f90

flang/test/Lower/OpenMP/omp-declare-target-func-and-subr.f90

flang/test/Lower/OpenMP/omp-host-ir-flag.f90

flang/test/Lower/OpenMP/omp-is-device.f90

[Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's implicitly called from TargetOp's and declare target marked FuncOp's as implicitly declare target
ClosedPublic