This is an archive of the discontinued LLVM Phabricator instance.

Paths

flang/lib/Lower/OpenMP.cpp (12/12)
flang/test/Lower/OpenMP/target.f90 (2/2)

Differential D152824

[MLIR][OpenMP]Add Flang lowering support for device_ptr and device_addr clauses
Closed, Public

Authored by TIFitis on Jun 13 2023, 9:10 AM.

Details

Reviewers

kiranchandramohan
kiranktp
raghavendhra
jdoerfert
nicolasvasilache
ftynse

Commits

rGd21580c30657: [MLIR][OpenMP]Add Flang lowering support for device_ptr and device_addr clauses

Summary

Add lowering support for the use_device_ptr and use_device_addr clauses for the Target Data directive.

Depends on D152822

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

TIFitis created this revision · Jun 13 2023, 9:10 AM

Herald added projects: Restricted Project, Restricted Project · Jun 13 2023, 9:10 AM

Herald added subscribers: sunshaoce, bzcheeseman, mehdi_amini and 3 others.

TIFitis requested review of this revision · Jun 13 2023, 9:10 AM

Herald added a reviewer: jdoerfert · Jun 13 2023, 9:10 AM

Herald added subscribers: jplehr, sstefan1, stephenneuendorffer, jdoerfert.

Harbormaster completed remote builds in B238510: Diff 530932 · Jun 13 2023, 9:39 AM

Weren't you planning to make the device_ptr a block argument operand?

In D152824#4418735, @kiranchandramohan wrote:

Weren't you planning to make the device_ptr a block argument operand?

I made that change when lowering to LLVM IR in mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
Here's a snippet of code showing how I did it. If there's a better way of doing this, I'd be happy to know.

for (auto &devPtrOp : useDevPtrOperands) {
  llvm::Value *mapOpValue = moduleTranslation.lookupValue(devPtrOp);
  const auto &arg =
      region.addArgument(devPtrOp.getType(), devPtrOp.getLoc());
  moduleTranslation.mapValue(arg,
                             info.DevicePtrInfoMap[mapOpValue].second);
  replaceAllUsesInRegionWith(devPtrOp, arg, region);
}
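The translation-time approach in the snippet above (add a region argument, then redirect every in-region use of the host value to it) can be illustrated with a toy, self-contained sketch. ToyOp, ToyRegion and addArgAndReplaceUses are hypothetical stand-ins that use strings for values; they are not MLIR classes, where the corresponding calls are region.addArgument and replaceAllUsesInRegionWith.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR (hypothetical, not MLIR): values are identified by name strings.
struct ToyOp {
  std::string name;
  std::vector<std::string> operands;
};

struct ToyRegion {
  std::vector<std::string> args; // entry block arguments
  std::vector<ToyOp> ops;        // operations in the region body
};

// Append a fresh entry block argument and rewrite every use of `hostValue`
// inside the region so that it refers to the new argument instead.
// Returns the name of the new argument.
std::string addArgAndReplaceUses(ToyRegion &region,
                                 const std::string &hostValue) {
  std::string newArg = "%arg" + std::to_string(region.args.size());
  region.args.push_back(newArg);
  for (ToyOp &op : region.ops)
    for (std::string &operand : op.operands)
      if (operand == hostValue)
        operand = newArg;
  return newArg;
}
```

The toy only rewrites uses inside the region; any use of the host value outside the region is left untouched.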

In D152824#4420853, @TIFitis wrote:
I made that change when lowering to LLVM IR in mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp

The modelling for the operation (target data) should be changed to represent both use_device_ptr and use_device_addr as block arguments.
Maybe something like omp.target_data use_device_addr(%daddr -> %a), where %daddr is an entry block argument and %daddr (not %a) will be used inside the region.
Note that this might require a custom printer and parser.
This will be a more accurate modelling, since the standard explicitly says that the address will be a device address (and not the host address), and hence it is incorrect to use the host address in the body of the region.

The device_addr example from your test will look like the following.

func.func @_QPomp_target_device_addr() {
  %0 = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "a", uniq_name = "_QFomp_target_device_addrEa"}
  %1 = fir.zero_bits !fir.ptr<i32>
  %2 = fir.embox %1 : (!fir.ptr<i32>) -> !fir.box<!fir.ptr<i32>>
  fir.store %2 to %0 : !fir.ref<!fir.box<!fir.ptr<i32>>>
  omp.target_data   map((tofrom -> %0 : !fir.ref<!fir.box<!fir.ptr<i32>>>))   use_device_addr(%daddr -> %0 : !fir.ref<!fir.box<!fir.ptr<i32>>>) {
    %c10_i32 = arith.constant 10 : i32
    %3 = fir.load %daddr : !fir.ref<!fir.box<!fir.ptr<i32>>>
    %4 = fir.box_addr %3 : (!fir.box<!fir.ptr<i32>>) -> !fir.ptr<i32>
    fir.store %c10_i32 to %4 : !fir.ptr<i32>
    omp.terminator
  }
  return
}

While lowering to LLVM IR, you can then do this in fewer steps.

for (auto &devPtrOp : useDevPtrOperands) {
  llvm::Value *mapOpValue = moduleTranslation.lookupValue(devPtrOp);
  // getTheArgumentAssociatedWith is a placeholder for looking up the entry
  // block argument that corresponds to this use_device_ptr operand.
  const auto &arg = getTheArgumentAssociatedWith(devPtrOp);
  moduleTranslation.mapValue(arg,
                             info.DevicePtrInfoMap[mapOpValue].second);
}

For handling entry block arguments, you can refer to either the wsloop op or the atomic update op.
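With the block-argument modelling, the placeholder getTheArgumentAssociatedWith could plausibly reduce to a positional lookup, assuming the i-th use_device_ptr operand is paired with the i-th entry block argument. The toy sketch below illustrates only that assumption; ToyTargetDataOp and this version of the lookup function are hypothetical, and strings stand in for MLIR values.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in (hypothetical, not an MLIR op): values are named by strings.
struct ToyTargetDataOp {
  std::vector<std::string> useDevicePtrOperands; // host values in the clause
  std::vector<std::string> entryBlockArgs;       // one per clause operand
};

// Hypothetical getTheArgumentAssociatedWith(): the i-th use_device_ptr
// operand corresponds to the i-th entry block argument.
std::optional<std::string>
getTheArgumentAssociatedWith(const ToyTargetDataOp &op,
                             const std::string &operand) {
  // The assumed modelling has exactly one block argument per clause operand.
  assert(op.useDevicePtrOperands.size() == op.entryBlockArgs.size());
  for (std::size_t i = 0; i < op.useDevicePtrOperands.size(); ++i)
    if (op.useDevicePtrOperands[i] == operand)
      return op.entryBlockArgs[i];
  return std::nullopt; // not a use_device_ptr operand of this op
}
```

Because the association is positional, no extra operand list is needed to remember which argument belongs to which host value.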

In D152824#4421288, @kiranchandramohan wrote:
The modelling for the operation (target data) should be changed to represent both use_device_ptr and use_device_addr as block arguments.

Hi,
I see what you mean. At a glance, it looks to me like it would make the lowering to both MLIR and later to LLVM IR a little bit more complex. But I don't see any obvious benefit from doing it this way.
Wouldn't we want all the processing for the omp ops to be done in one stage?

Also, if we do decide to do it, would I have to use Variadic_of_Variadic for storing the map between the device_addr_value and orig_value, or do you have anything else in mind?

In D152824#4424477, @TIFitis wrote:
Hi,
I see what you mean. At a glance, it looks to me like it would make the lowering to both MLIR and later to LLVM IR a little bit more complex.

It will not increase the lowering complexity. For the MLIR to LLVM IR lowering, it will simplify it.

But I don't see any obvious benefit from doing it this way.

It is incorrect to use the host pointer in the region, and the modelling would not be in line with the standard.

Wouldn't we want all the processing for the omp ops to be done in one stage?

This step only models the operation accurately. All processing (generating runtime calls, getting the GPU pointer, etc.) will be done by the OpenMP IRBuilder.

Also, if we do decide to do it, would I have to use Variadic_of_Variadic for storing the map between the device_addr_value and orig_value, or do you have anything else in mind?

Variadic_of_Variadic might work, but the verifier should check that there are exactly two entries per inner Variadic. You can also implement it as two operand lists/arrays.
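The two encodings suggested above differ mainly in what the verifier has to check. Below is a toy sketch, assuming each entry pairs a device value with its host value; strings stand in for MLIR values, and verifyGroupedPairs, verifyParallelLists and pairsFromParallelLists are hypothetical names, not OpenMP dialect APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Value = std::string; // toy stand-in for an SSA value

// Encoding 1 (toy model of a variadic of variadics): every inner group
// must hold exactly two entries, {device value, host value}.
bool verifyGroupedPairs(const std::vector<std::vector<Value>> &groups) {
  for (const auto &group : groups)
    if (group.size() != 2)
      return false;
  return true;
}

// Encoding 2: two flat operand lists paired by position, so the only
// structural check is that their lengths match.
bool verifyParallelLists(const std::vector<Value> &deviceVals,
                         const std::vector<Value> &hostVals) {
  return deviceVals.size() == hostVals.size();
}

// Recover the (device, host) pairs from the two flat lists.
std::vector<std::pair<Value, Value>>
pairsFromParallelLists(const std::vector<Value> &deviceVals,
                       const std::vector<Value> &hostVals) {
  assert(verifyParallelLists(deviceVals, hostVals));
  std::vector<std::pair<Value, Value>> pairs;
  for (std::size_t i = 0; i < deviceVals.size(); ++i)
    pairs.emplace_back(deviceVals[i], hostVals[i]);
  return pairs;
}
```

With two flat lists the pairing is implicit in operand order, so the verifier only compares lengths; with a variadic of variadics each inner group has to be checked individually.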

Thinking of this again, we might not need a change in the operands for use_device_ptr or use_device_addr since we are only adding block arguments.

In D152824#4427174, @kiranchandramohan wrote:

Thinking of this again, we might not need a change in the operands for use_device_ptr or use_device_addr since we are only adding block arguments.

Are there any examples of this currently? I saw wsloop and atomic update, but I think they aren't exactly what we're doing here.

When lowering, we still need to associate the device addresses with their original host address counterparts to correctly set the mapping, type, etc. I was thinking of achieving this by having two separate operands each for device_ptr and device_addr, like Variadic<OpenMP_PointerLikeType>:$use_device_ptr_device and Variadic<OpenMP_PointerLikeType>:$use_device_ptr_host for the device_ptr clause.

In D152824#4427658, @TIFitis wrote:

Are there any examples of this currently? I saw wsloop and atomic update, but I think they aren't exactly what we're doing here.

I will try to find something for reference today.

When lowering, we still need to associate the device addresses with their original host address counterparts to correctly set the mapping, type, etc. I was thinking of achieving this by having two separate operands each for device_ptr and device_addr, like Variadic<OpenMP_PointerLikeType>:$use_device_ptr_device and Variadic<OpenMP_PointerLikeType>:$use_device_ptr_host for the device_ptr clause.

Block arguments are not available outside the operation and thus cannot be operands.

My current implementation generates the following FIR.

Fortran:

subroutine omp_target_data
    integer :: a
    a = 10
    !$omp target data map(tofrom: a) use_device_ptr(a)
        a = 20
    !$omp end target data
    a = 30
end subroutine omp_target_data

FIR:

func.func @_QPomp_target_data() {
  %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFomp_target_dataEa"}
  %c10_i32 = arith.constant 10 : i32
  fir.store %c10_i32 to %0 : !fir.ref<i32>
  %1 = fir.alloca i32 {bindc_name = "a", pinned, uniq_name = "_QFomp_target_dataEa"}
  omp.target_data   map((tofrom -> %0 : !fir.ref<i32>))   use_device_ptr((%1 -> %0 : !fir.ref<i32>)) {
    %c20_i32 = arith.constant 20 : i32
    fir.store %c20_i32 to %1 : !fir.ref<i32>
    omp.terminator
  }
  %c30_i32 = arith.constant 30 : i32
  fir.store %c30_i32 to %0 : !fir.ref<i32>
  return
}

Is the above code more or less what you are expecting?

Also, my initial plan was to have the OMPIRBuilder handle codegen entirely, for both Clang and MLIR.

As such, I have already moved the device ptr and addr related codegen from Clang to the OMPIRBuilder.
The IRBuilder creates private versions of the operands for us in LLVM IR. However, since MLIR would also be creating private versions for us now, we would have to add code to the IRBuilder to treat Clang and MLIR separately, which I'd like to avoid as much as possible.

In D152824#4432675, @TIFitis wrote:

Is the above code more or less what you are expecting?

No. What I was thinking was something like the following.

func.func @_QPomp_target_data() {
  %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFomp_target_dataEa"}
  %c10_i32 = arith.constant 10 : i32
  fir.store %c10_i32 to %0 : !fir.ref<i32>
  omp.target_data   map((tofrom -> %0 : !fir.ref<i32>))   use_device_ptr((%1 -> %0 : !fir.ref<i32>)) {
  ^bb0(%1: !fir.ref<i32>):
    %c20_i32 = arith.constant 20 : i32
    fir.store %c20_i32 to %1 : !fir.ref<i32>
    omp.terminator
  }
  %c30_i32 = arith.constant 30 : i32
  fir.store %c30_i32 to %0 : !fir.ref<i32>
  return
}

However, since MLIR would also be creating private versions for us now, we would have to add code to the IRBuilder to treat Clang and MLIR separately, which I'd like to avoid as much as possible.

MLIR will not be creating a private version. It will only provide a block argument that you can use to lower easily to the private version created by the IRBuilder. (I believe you previously reported that it was easy to use block arguments to lower to LLVM IR.)

In D152824#4432759, @kiranchandramohan wrote:
MLIR will not be creating a private version. It will only provide a block argument that you can use to lower easily to the private version created by the IRBuilder. (I believe you previously reported that it was easy to use block arguments to lower to LLVM IR.)

Yes, having a block argument works. I am just finding it tricky to add it here in the frontend lowering stage. I will see if I am able to generate something like the code you shared above.

In D152824#4432954, @TIFitis wrote:
Yes, having a block argument works. I am just finding it tricky to add it here in the frontend lowering stage. I will see if I am able to generate something like the code you shared above.

It should be similar to the worksharing loop, except that the block arguments correspond to the device_ptr clause operands rather than to the loop index variable.

[WIP] Add block arguments.

Herald added a reviewer: nicolasvasilache · Jun 19 2023, 10:33 AM

Herald added a reviewer: ftynse.

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, Moerafaat, zero9178 and 20 others.

Harbormaster completed remote builds in B239846: Diff 532706 · Jun 19 2023, 10:34 AM

I've updated the patch. I am not sure if this is the correct way of adding the block arguments.

It is generating the following code currently. I don't know why it's adding the new %0 variable and using it inside the region.

func.func @_QPomp_target_data() {
  %0 = fir.alloca i32 {adapt.valuebyref}
  %1 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFomp_target_dataEa"}
  %c10_i32 = arith.constant 10 : i32
  fir.store %c10_i32 to %1 : !fir.ref<i32>
  omp.target_data   map((tofrom -> %1 : !fir.ref<i32>))   use_device_ptr((%1 -> %1 : !fir.ref<i32>)) {
  ^bb0(%arg0: i32):
    fir.store %arg0 to %0 : !fir.ref<i32>
    %c20_i32 = arith.constant 20 : i32
    fir.store %c20_i32 to %0 : !fir.ref<i32>
    omp.terminator
  }
  %c30_i32 = arith.constant 30 : i32
  fir.store %c30_i32 to %1 : !fir.ref<i32>
  return
}

flang/lib/Lower/OpenMP.cpp
931	The arg has type i32 here instead of fir.ref<i32>, and as a result I get an error when adding it as an operand to the dev_ptr clause.

In D152824#4433328, @TIFitis wrote:

I've updated the patch. I am not sure if this is the correct way of adding the block arguments.

I was suggesting that you do something similar to the worksharing loop's body creation; the same code will not work as is. Only a subset of what is implemented for the worksharing loop's arguments is required here.

Basically, you have to create a block in the body of the operation with the appropriate block argument types and locations. The block arguments should have the same types and locations as the device_ptr operands.

// Fill these with the types and locations of the device_ptr operands.
// (`devicePtrOperands` is an assumed name for the clause's operand list.)
llvm::SmallVector<mlir::Type> tiv;
llvm::SmallVector<mlir::Location> locs;
for (mlir::Value operand : devicePtrOperands) {
  tiv.push_back(operand.getType());
  locs.push_back(operand.getLoc());
}

// Create a block in the body of the operation with as many block arguments
// as there are elements in the device_ptr clause.
firOpBuilder.createBlock(&op.getRegion(), {}, tiv, locs);

// Now get these block arguments and bind them to the device_ptr symbols,
// so that the block arguments (not the host values) are used for these
// symbols in the body of the operation.
// Note: here `args` are the device_ptr symbols.
unsigned argIndex = 0;
for (const Fortran::semantics::Symbol *arg : args) {
  mlir::Value val = op.getRegion().front().getArgument(argIndex);
  converter.bindSymbol(*arg, val);
  ++argIndex;
}

Note: The above is pseudocode only intended to convey the idea.

Added block arguments.

@kiranchandramohan Thanks a lot for helping me with the pseudocode.

I have updated the patch and it seems to be working as intended except when using pointers.

The following is the code currently being generated:

Fortran:

subroutine omp_target_data
    integer :: a, b, c
    !$omp target data map(tofrom: a, b, c) use_device_ptr(a) use_device_addr(b)
        a = 20
        b = 2
        c = 10
    !$omp end target data
    a = 30
    b = 3
    c = 10
end subroutine omp_target_data

FIR:

func.func @_QPomp_target_data() {
  %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFomp_target_dataEa"}
  %1 = fir.alloca i32 {bindc_name = "b", uniq_name = "_QFomp_target_dataEb"}
  %2 = fir.alloca i32 {bindc_name = "c", uniq_name = "_QFomp_target_dataEc"}
  omp.target_data   map((tofrom -> %0 : !fir.ref<i32>), (tofrom -> %1 : !fir.ref<i32>), (tofrom -> %2 : !fir.ref<i32>))   use_device_ptr(%0 : !fir.ref<i32>) use_device_addr(%1 : !fir.ref<i32>) {
  ^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
    %c20_i32 = arith.constant 20 : i32
    fir.store %c20_i32 to %arg0 : !fir.ref<i32>
    %c2_i32 = arith.constant 2 : i32
    fir.store %c2_i32 to %arg1 : !fir.ref<i32>
    %c10_i32_0 = arith.constant 10 : i32
    fir.store %c10_i32_0 to %2 : !fir.ref<i32>
    omp.terminator
  }
  %c30_i32 = arith.constant 30 : i32
  fir.store %c30_i32 to %0 : !fir.ref<i32>
  %c3_i32 = arith.constant 3 : i32
  fir.store %c3_i32 to %1 : !fir.ref<i32>
  %c10_i32 = arith.constant 10 : i32
  fir.store %c10_i32 to %2 : !fir.ref<i32>
  return
}

However, if I change the declaration to `integer, pointer :: a, b, c`, I get the following assertion failure:

llvm-project/flang/lib/Lower/Bridge.cpp:3446: auto (anonymous namespace)::FirConverter::genAssignment(const Fortran::evaluate::Assignment &)::(anonymous class)::operator()(const Fortran::evaluate::Assignment::Intrinsic &) const: Assertion 'isFuncResultDesignator(assign.lhs) && "type mismatch"' failed.

This seems to originate from the store inside the region and the binding of its symbol. I wasn't able to come up with a quick fix for it. If you have any suggestions, that would be great!

Harbormaster completed remote builds in B240008: Diff 532919 · Jun 20 2023, 8:04 AM

kiranchandramohan added inline comments · Jun 21 2023, 6:28 AM

flang/lib/Lower/OpenMP.cpp
937	I think the issue here is that we are binding the symbol to an MLIR value. The original symbol is bound to an ExtendedValue (fir::MutableBoxValue). When checks are done to see whether the value is a box of type MutableBoxValue, they fail, and hence the load and box_addr instructions are not inserted. I think the solution is to create an ExtendedValue of the matching type:

if sym is bound to MutableBoxValue box
  converter.bindSymbol(*sym, fir::MutableBoxValue(val, box.lenparams, {}));
else if sym is bound to CharBoxValue
  converter.bindSymbol(*sym, fir::CharBoxValue(...));
else
  converter.bindSymbol(*sym, val);

Note: This code is just to convey the idea. Can you try something like this and see whether it works for the different types and boxes? If there are issues, we might have to chat with the lowering team.
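The failure mode described above can be illustrated with a toy model: if a symbol that was bound to a mutable box is rebound to a bare value, later code that checks for a mutable box no longer recognises it. The sketch below uses std::variant as a hypothetical, much simplified stand-in for fir::ExtendedValue (PlainValue, MutableBox and CharBox are illustrative structs, not FIR classes) and rebuilds a binding of the same kind around the new address.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Toy stand-ins (hypothetical, much simpler than fir::ExtendedValue).
struct PlainValue { std::string addr; };
struct MutableBox { std::string addr; std::vector<std::string> lenParams; };
struct CharBox { std::string addr; std::string len; };
using ToyExtendedValue = std::variant<PlainValue, MutableBox, CharBox>;

// Rebind to a new address (e.g. a block argument) while keeping the kind of
// the original binding, so code keyed on the kind (such as emitting a load
// and box_addr for a pointer) still recognises it.
ToyExtendedValue rebindPreservingKind(const ToyExtendedValue &original,
                                      const std::string &newAddr) {
  return std::visit(
      [&](const auto &v) -> ToyExtendedValue {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, MutableBox>)
          return MutableBox{newAddr, v.lenParams};
        else if constexpr (std::is_same_v<T, CharBox>)
          return CharBox{newAddr, v.len};
        else
          return PlainValue{newAddr};
      },
      original);
}
```

Always returning PlainValue here would reproduce the bug: the kind information carried by the original binding would be lost.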

Fixed support for pointer type.

TIFitis marked an inline comment as done · Jun 21 2023, 9:01 AM

TIFitis added inline comments.

flang/lib/Lower/OpenMP.cpp
937	Thanks for the pointer, it worked. I have only added support for the PointerType here, as UseDevice clause operands must always be pointers as per the standard. I checked with integer and character pointers, and both are working fine.

TIFitis marked an inline comment as done · Jun 21 2023, 9:06 AM

Harbormaster completed remote builds in B240266: Diff 533287 · Jun 21 2023, 9:44 AM

kiranchandramohan added inline comments · Jun 21 2023, 10:11 AM

flang/lib/Lower/OpenMP.cpp
443	Is `fir::factory::getNonDeferredLenParams` not available here?
927–941	Can you move this into a `createBodyOfTargetOp` function?
flang/test/Lower/OpenMP/target.f90
172	Are these tests valid for `device_ptr`? Should they be of type `c_ptr` as per the standard?

Addressed reviewer comments

TIFitis marked 3 inline comments as done · Jun 21 2023, 1:40 PM

TIFitis added inline comments.

flang/lib/Lower/OpenMP.cpp
443	Sorry about that, I've fixed it.
753–758	Are these error messages alright?
flang/test/Lower/OpenMP/target.f90
172	I've updated the test. Let me know if it's okay now.

Harbormaster completed remote builds in B240338: Diff 533379 · Jun 21 2023, 1:50 PM

Please test with the use_device_ptr or use_device_addr tests in the gfortran testsuite.

This is a reasonable start for modelling device_ptr and device_addr. More changes will probably be required to model the dataflow and to integrate with the data-mapping clause. Using a block argument will hopefully make lowering to LLVM IR easier.

LG. Please wait for one more acceptance.

flang/lib/Lower/OpenMP.cpp
733	Nit: remove auto here and elsewhere where the type is not in the RHS.
748	Since all boxes are not MutableBoxes, this is probably not correct.
753–758	If you are pressed for time, I would recommend passing through only the types that are required and marking everything else as a TODO. Please consult the standard, gfortran testsuite, and your team for the types that are allowed for device_ptr, device_addr.

kiranchandramohan accepted this revision · Jun 22 2023, 2:42 AM

This revision is now accepted and ready to land · Jun 22 2023, 2:42 AM

Addressed reviewer comments.

flang/lib/Lower/OpenMP.cpp
748	Thanks, I've updated this.
753–758	I have added the same checks for the use_device operands that we had added earlier for the map clause operands. Hope that's satisfactory for now. AFAIK use_device clauses are more restrictive as they only allow pointers. I'll update this in the future as we add support for more types.

Harbormaster completed remote builds in B240474: Diff 533559 · Jun 22 2023, 5:45 AM

LGTM

Closed by commit rGd21580c30657: [MLIR][OpenMP]Add Flang lowering support for device_ptr and device_addr clauses (authored by TIFitis) · Jun 22 2023, 7:52 AM

This revision was automatically updated to reflect the committed changes.

TIFitis added a commit: rGd21580c30657: [MLIR][OpenMP]Add Flang lowering support for device_ptr and device_addr clauses.

Revision Contents

Path	Size
flang/lib/Lower/OpenMP.cpp	117 lines
flang/test/Lower/OpenMP/target.f90	36 lines

Diff 533607

flang/lib/Lower/OpenMP.cpp

[434 unchanged lines not shown; context: `return base.match(`]
},		},
[&](const auto &) -> fir::ExtendedValue {		[&](const auto &) -> fir::ExtendedValue {
return fir::substBase(base, val);		return fir::substBase(base, val);
});		});
}		}

static void threadPrivatizeVars(Fortran::lower::AbstractConverter &converter,		static void threadPrivatizeVars(Fortran::lower::AbstractConverter &converter,
Fortran::lower::pft::Evaluation &eval) {		Fortran::lower::pft::Evaluation &eval) {
auto &firOpBuilder = converter.getFirOpBuilder();		auto &firOpBuilder = converter.getFirOpBuilder();
		kiranchandramohan: Is `fir::factory::getNonDeferredLenParams` not available here?
		TIFitis: Sorry about that, I've fixed it.
mlir::Location currentLocation = converter.getCurrentLocation();		mlir::Location currentLocation = converter.getCurrentLocation();
auto insPt = firOpBuilder.saveInsertionPoint();		auto insPt = firOpBuilder.saveInsertionPoint();
firOpBuilder.setInsertionPointToStart(firOpBuilder.getAllocaBlock());		firOpBuilder.setInsertionPointToStart(firOpBuilder.getAllocaBlock());

// Get the original ThreadprivateOp corresponding to the symbol and use the		// Get the original ThreadprivateOp corresponding to the symbol and use the
// symbol value from that operation to create one ThreadprivateOp copy		// symbol value from that operation to create one ThreadprivateOp copy
// operation inside the parallel region.		// operation inside the parallel region.
auto genThreadprivateOp = [&](Fortran::lower::SymbolRef sym) -> mlir::Value {		auto genThreadprivateOp = [&](Fortran::lower::SymbolRef sym) -> mlir::Value {
[266 unchanged lines not shown; context: `createBodyOfOp(Op &op, Fortran::lower::AbstractConverter &converter,`]

if constexpr (std::is_same_v<Op, omp::ParallelOp>) {		if constexpr (std::is_same_v<Op, omp::ParallelOp>) {
threadPrivatizeVars(converter, eval);		threadPrivatizeVars(converter, eval);
if (clauses)		if (clauses)
genCopyinClause(converter, *clauses);		genCopyinClause(converter, *clauses);
}		}
}		}

		static void createBodyOfTargetOp(
		Fortran::lower::AbstractConverter &converter, mlir::omp::DataOp &dataOp,
		const llvm::SmallVector<mlir::Type> &useDeviceTypes,
		const llvm::SmallVector<mlir::Location> &useDeviceLocs,
		const SmallVector<const Fortran::semantics::Symbol *> &useDeviceSymbols,
		const mlir::Location &currentLocation) {
		fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
		mlir::Region &region = dataOp.getRegion();
		kiranchandramohan: Nit: remove auto here and elsewhere where the type is not in the RHS.

		firOpBuilder.createBlock(&region, {}, useDeviceTypes, useDeviceLocs);
		firOpBuilder.create<mlir::omp::TerminatorOp>(currentLocation);
		firOpBuilder.setInsertionPointToStart(&region.front());

		unsigned argIndex = 0;
		for (auto *sym : useDeviceSymbols) {
		const mlir::BlockArgument &arg = region.front().getArgument(argIndex);
		mlir::Value val = fir::getBase(arg);
		fir::ExtendedValue extVal = converter.getSymbolExtendedValue(*sym);
		if (auto refType = val.getType().dyn_cast<fir::ReferenceType>()) {
		if (fir::isa_builtin_cptr_type(refType.getElementType())) {
		converter.bindSymbol(*sym, val);
		} else {
		extVal.match(
		kiranchandramohan: Since not all boxes are MutableBoxes, this is probably not correct.
		TIFitis: Thanks, I've updated this.
		[&](const fir::MutableBoxValue &mbv) {
		converter.bindSymbol(
		*sym,
		fir::MutableBoxValue(
		val, fir::factory::getNonDeferredLenParams(extVal), {}));
		},
		[&](const auto &) {
		TODO(converter.getCurrentLocation(),
		"use_device clause operand unsupported type");
		});
		TIFitis: Are these error messages alright?
		kiranchandramohan: If you are pressed for time, I would recommend passing through only the types that are required and marking everything else as a TODO. Please consult the standard, the gfortran testsuite, and your team for the types that are allowed for device_ptr and device_addr.
		TIFitis: I have added the same checks for the use_device operands that we had added earlier for the map clause operands. Hope that's satisfactory for now. AFAIK use_device clauses are more restrictive as they only allow pointers. I'll update this in the future as we add support for more types.
		}
		} else {
		TODO(converter.getCurrentLocation(),
		"use_device clause operand unsupported type");
		}
		argIndex++;
		}
		}

static void createTargetOp(Fortran::lower::AbstractConverter &converter,		static void createTargetOp(Fortran::lower::AbstractConverter &converter,
const Fortran::parser::OmpClauseList &opClauseList,		const Fortran::parser::OmpClauseList &opClauseList,
const llvm::omp::Directive &directive,		const llvm::omp::Directive &directive,
Fortran::lower::pft::Evaluation *eval = nullptr) {		Fortran::lower::pft::Evaluation *eval = nullptr) {
Fortran::lower::StatementContext stmtCtx;		Fortran::lower::StatementContext stmtCtx;
fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();		fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();

mlir::Value ifClauseOperand, deviceOperand, threadLmtOperand;		mlir::Value ifClauseOperand, deviceOperand, threadLmtOperand;
mlir::UnitAttr nowaitAttr;		mlir::UnitAttr nowaitAttr;
llvm::SmallVector<mlir::Value> useDevicePtrOperand, useDeviceAddrOperand,		llvm::SmallVector<mlir::Value> mapOperands, devicePtrOperands,
mapOperands;		deviceAddrOperands;
llvm::SmallVector<mlir::IntegerAttr> mapTypes;		llvm::SmallVector<mlir::IntegerAttr> mapTypes;
		llvm::SmallVector<mlir::Type> useDeviceTypes;
		llvm::SmallVector<mlir::Location> useDeviceLocs;
		SmallVector<const Fortran::semantics::Symbol *> useDeviceSymbols;

auto addMapClause = [&firOpBuilder, &converter, &mapOperands,		/// Check for unsupported map operand types.
&mapTypes](const auto &mapClause,		auto checkType = [](auto currentLocation, mlir::Type type) {
		if (auto refType = type.dyn_cast<fir::ReferenceType>())
		type = refType.getElementType();
		if (auto boxType = type.dyn_cast_or_null<fir::BoxType>())
		if (!boxType.getElementType().isa<fir::PointerType>())
		TODO(currentLocation, "OMPD_target_data MapOperand BoxType");
		};

		auto addMapClause = [&](const auto &mapClause,
mlir::Location &currentLocation) {		mlir::Location &currentLocation) {
auto mapType = std::get<Fortran::parser::OmpMapType::Type>(		auto mapType = std::get<Fortran::parser::OmpMapType::Type>(
std::get<std::optional<Fortran::parser::OmpMapType>>(mapClause->v.t)		std::get<std::optional<Fortran::parser::OmpMapType>>(mapClause->v.t)
->t);		->t);
llvm::omp::OpenMPOffloadMappingFlags mapTypeBits =		llvm::omp::OpenMPOffloadMappingFlags mapTypeBits =
llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_NONE;		llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_NONE;
switch (mapType) {		switch (mapType) {
case Fortran::parser::OmpMapType::Type::To:		case Fortran::parser::OmpMapType::Type::To:
mapTypeBits |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO;		mapTypeBits |= llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO;
[38 unchanged lines not shown; context: `for (const Fortran::parser::OmpObject &ompObject :`]
ompObject))		ompObject))
TODO(currentLocation,		TODO(currentLocation,
"OMPD_target_data for Array Expressions or Structure Components");		"OMPD_target_data for Array Expressions or Structure Components");
}		}
genObjectList(std::get<Fortran::parser::OmpObjectList>(mapClause->v.t),		genObjectList(std::get<Fortran::parser::OmpObjectList>(mapClause->v.t),
converter, mapOperand);		converter, mapOperand);

for (mlir::Value mapOp : mapOperand) {		for (mlir::Value mapOp : mapOperand) {
/// Check for unsupported map operand types.		checkType(mapOp.getLoc(), mapOp.getType());
mlir::Type checkType = mapOp.getType();
if (auto refType = checkType.dyn_cast<fir::ReferenceType>())
checkType = refType.getElementType();
if (checkType.isa<fir::BoxType>())
TODO(currentLocation, "OMPD_target_data MapOperand BoxType");

mapOperands.push_back(mapOp);		mapOperands.push_back(mapOp);
mapTypes.push_back(mapTypeAttr);		mapTypes.push_back(mapTypeAttr);
}		}
};		};

		auto addUseDeviceClause = [&](const auto &useDeviceClause, auto &operands) {
		genObjectList(useDeviceClause, converter, operands);
		for (auto &operand : operands) {
		checkType(operand.getLoc(), operand.getType());
		useDeviceTypes.push_back(operand.getType());
		useDeviceLocs.push_back(operand.getLoc());
		}
		for (const Fortran::parser::OmpObject &ompObject : useDeviceClause.v) {
		Fortran::semantics::Symbol *sym = getOmpObjectSymbol(ompObject);
		useDeviceSymbols.push_back(sym);
		}
		};

for (const Fortran::parser::OmpClause &clause : opClauseList.v) {		for (const Fortran::parser::OmpClause &clause : opClauseList.v) {
mlir::Location currentLocation = converter.genLocation(clause.source);		mlir::Location currentLocation = converter.genLocation(clause.source);
if (const auto &ifClause =		if (const auto &ifClause =
std::get_if<Fortran::parser::OmpClause::If>(&clause.u)) {		std::get_if<Fortran::parser::OmpClause::If>(&clause.u)) {
ifClauseOperand = getIfClauseOperand(converter, stmtCtx, ifClause);		ifClauseOperand = getIfClauseOperand(converter, stmtCtx, ifClause);
} else if (const auto &deviceClause =		} else if (const auto &deviceClause =
std::get_if<Fortran::parser::OmpClause::Device>(&clause.u)) {		std::get_if<Fortran::parser::OmpClause::Device>(&clause.u)) {
if (auto deviceModifier = std::get<		if (auto deviceModifier = std::get<
std::optional<Fortran::parser::OmpDeviceClause::DeviceModifier>>(		std::optional<Fortran::parser::OmpDeviceClause::DeviceModifier>>(
deviceClause->v.t)) {		deviceClause->v.t)) {
if (deviceModifier ==		if (deviceModifier ==
Fortran::parser::OmpDeviceClause::DeviceModifier::Ancestor) {		Fortran::parser::OmpDeviceClause::DeviceModifier::Ancestor) {
TODO(currentLocation, "OMPD_target Device Modifier Ancestor");		TODO(currentLocation, "OMPD_target Device Modifier Ancestor");
}		}
}		}
if (const auto *deviceExpr = Fortran::semantics::GetExpr(		if (const auto *deviceExpr = Fortran::semantics::GetExpr(
std::get<Fortran::parser::ScalarIntExpr>(deviceClause->v.t))) {		std::get<Fortran::parser::ScalarIntExpr>(deviceClause->v.t))) {
deviceOperand =		deviceOperand =
fir::getBase(converter.genExprValue(*deviceExpr, stmtCtx));		fir::getBase(converter.genExprValue(*deviceExpr, stmtCtx));
}		}
} else if (std::get_if<Fortran::parser::OmpClause::UseDevicePtr>(
&clause.u)) {
TODO(currentLocation, "OMPD_target Use Device Ptr");
} else if (std::get_if<Fortran::parser::OmpClause::UseDeviceAddr>(
&clause.u)) {
TODO(currentLocation, "OMPD_target Use Device Addr");
} else if (const auto &threadLmtClause =		} else if (const auto &threadLmtClause =
std::get_if<Fortran::parser::OmpClause::ThreadLimit>(		std::get_if<Fortran::parser::OmpClause::ThreadLimit>(
&clause.u)) {		&clause.u)) {
threadLmtOperand = fir::getBase(converter.genExprValue(		threadLmtOperand = fir::getBase(converter.genExprValue(
*Fortran::semantics::GetExpr(threadLmtClause->v), stmtCtx));		*Fortran::semantics::GetExpr(threadLmtClause->v), stmtCtx));
} else if (std::get_if<Fortran::parser::OmpClause::Nowait>(&clause.u)) {		} else if (std::get_if<Fortran::parser::OmpClause::Nowait>(&clause.u)) {
nowaitAttr = firOpBuilder.getUnitAttr();		nowaitAttr = firOpBuilder.getUnitAttr();
		} else if (const auto &devPtrClause =
		std::get_if<Fortran::parser::OmpClause::UseDevicePtr>(
		&clause.u)) {
		addUseDeviceClause(devPtrClause->v, devicePtrOperands);
		} else if (const auto &devAddrClause =
		std::get_if<Fortran::parser::OmpClause::UseDeviceAddr>(
		&clause.u)) {
		addUseDeviceClause(devAddrClause->v, deviceAddrOperands);
} else if (const auto &mapClause =		} else if (const auto &mapClause =
std::get_if<Fortran::parser::OmpClause::Map>(&clause.u)) {		std::get_if<Fortran::parser::OmpClause::Map>(&clause.u)) {
addMapClause(mapClause, currentLocation);		addMapClause(mapClause, currentLocation);
} else {		} else {
TODO(currentLocation, "OMPD_target unhandled clause");		TODO(currentLocation, "OMPD_target unhandled clause");
}		}
}		}

llvm::SmallVector<mlir::Attribute> mapTypesAttr(mapTypes.begin(),		llvm::SmallVector<mlir::Attribute> mapTypesAttr(mapTypes.begin(),
mapTypes.end());		mapTypes.end());
mlir::ArrayAttr mapTypesArrayAttr =		mlir::ArrayAttr mapTypesArrayAttr =
ArrayAttr::get(firOpBuilder.getContext(), mapTypesAttr);		ArrayAttr::get(firOpBuilder.getContext(), mapTypesAttr);
mlir::Location currentLocation = converter.getCurrentLocation();		mlir::Location currentLocation = converter.getCurrentLocation();

if (directive == llvm::omp::Directive::OMPD_target) {		if (directive == llvm::omp::Directive::OMPD_target) {
auto targetOp = firOpBuilder.create<omp::TargetOp>(		auto targetOp = firOpBuilder.create<omp::TargetOp>(
currentLocation, ifClauseOperand, deviceOperand, threadLmtOperand,		currentLocation, ifClauseOperand, deviceOperand, threadLmtOperand,
nowaitAttr, mapOperands, mapTypesArrayAttr);		nowaitAttr, mapOperands, mapTypesArrayAttr);
createBodyOfOp(targetOp, converter, currentLocation, *eval, &opClauseList);		createBodyOfOp(targetOp, converter, currentLocation, *eval, &opClauseList);
} else if (directive == llvm::omp::Directive::OMPD_target_data) {		} else if (directive == llvm::omp::Directive::OMPD_target_data) {
auto dataOp = firOpBuilder.create<omp::DataOp>(		auto dataOp = firOpBuilder.create<omp::DataOp>(
currentLocation, ifClauseOperand, deviceOperand, useDevicePtrOperand,		currentLocation, ifClauseOperand, deviceOperand, devicePtrOperands,
useDeviceAddrOperand, mapOperands, mapTypesArrayAttr);		deviceAddrOperands, mapOperands, mapTypesArrayAttr);
createBodyOfOp(dataOp, converter, currentLocation, *eval, &opClauseList);		createBodyOfTargetOp(converter, dataOp, useDeviceTypes, useDeviceLocs,
		useDeviceSymbols, currentLocation);
} else if (directive == llvm::omp::Directive::OMPD_target_enter_data) {		} else if (directive == llvm::omp::Directive::OMPD_target_enter_data) {
firOpBuilder.create<omp::EnterDataOp>(currentLocation, ifClauseOperand,		firOpBuilder.create<omp::EnterDataOp>(currentLocation, ifClauseOperand,
deviceOperand, nowaitAttr,		deviceOperand, nowaitAttr,
mapOperands, mapTypesArrayAttr);		mapOperands, mapTypesArrayAttr);
		TIFitis: The arg has type i32 here instead of fir.ref<i32>, and as a result I'm getting an error when adding it as an operand to the dev_ptr clause.
} else if (directive == llvm::omp::Directive::OMPD_target_exit_data) {		} else if (directive == llvm::omp::Directive::OMPD_target_exit_data) {
firOpBuilder.create<omp::ExitDataOp>(currentLocation, ifClauseOperand,		firOpBuilder.create<omp::ExitDataOp>(currentLocation, ifClauseOperand,
deviceOperand, nowaitAttr, mapOperands,		deviceOperand, nowaitAttr, mapOperands,
mapTypesArrayAttr);		mapTypesArrayAttr);
} else {		} else {
TODO(currentLocation, "OMPD_target directive unknown");		TODO(currentLocation, "OMPD_target directive unknown");
		kiranchandramohan: I think the issue here is that we are binding the symbol to an MLIR value. The original symbol is bound to an ExtendedValue (fir::MutableBoxValue). When some checks are done to see whether the value is a box of type MutableBoxValue, they fail, and hence the load and box_addr instructions are not inserted. I think the solution is to create an ExtendedValue of the matching type:
		  if sym is bound to MutableBoxValue box
		    converter.bindSymbol(sym, fir::MutableBoxValue(val, box.lenparams, {}));
		  else if sym is bound to CharBoxValue
		    converter.bindSymbol(sym, fir::CharBoxValue(
		  else
		    converter.bindSymbol(sym, val);
		Note: This code is just to convey the idea. Can you try something like this and see whether it works for the different types and boxes? If there are issues, we might have to chat with the lowering team.
		TIFitis: Thanks for the pointer, it worked. I have only added support for the PointerType here, as use_device clause operands must always be pointers per the standard. I checked with integer and character pointers and both are working fine.
}		}
}		}

static void genOMP(Fortran::lower::AbstractConverter &converter,		static void genOMP(Fortran::lower::AbstractConverter &converter,
		kiranchandramohan: Can you move this into a `createBodyOfTargetOp` function?
Fortran::lower::pft::Evaluation &eval,		Fortran::lower::pft::Evaluation &eval,
const Fortran::parser::OpenMPSimpleStandaloneConstruct		const Fortran::parser::OpenMPSimpleStandaloneConstruct
&simpleStandaloneConstruct) {		&simpleStandaloneConstruct) {
const auto &directive =		const auto &directive =
std::get<Fortran::parser::OmpSimpleStandaloneDirective>(		std::get<Fortran::parser::OmpSimpleStandaloneDirective>(
simpleStandaloneConstruct.t);		simpleStandaloneConstruct.t);
fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();		fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
const Fortran::parser::OmpClauseList &opClauseList =		const Fortran::parser::OmpClauseList &opClauseList =
[265 unchanged lines not shown; context: `if (const auto &ifClause =`]
// for semantic checks.		// for semantic checks.
continue;		continue;
}		}
} else if (std::get_if<Fortran::parser::OmpClause::Threads>(&clause.u)) {		} else if (std::get_if<Fortran::parser::OmpClause::Threads>(&clause.u)) {
// Nothing needs to be done for threads clause.		// Nothing needs to be done for threads clause.
continue;		continue;
} else if (std::get_if<Fortran::parser::OmpClause::Map>(&clause.u)) {		} else if (std::get_if<Fortran::parser::OmpClause::Map>(&clause.u)) {
// Map clause is exclusive to Target Data directives. It is handled		// Map clause is exclusive to Target Data directives. It is handled
// as part of the DataOp creation.		// as part of the TargetOp creation.
		continue;
		} else if (std::get_if<Fortran::parser::OmpClause::UseDevicePtr>(
		&clause.u)) {
		// UseDevicePtr clause is exclusive to Target Data directives. It is
		// handled as part of the TargetOp creation.
		continue;
		} else if (std::get_if<Fortran::parser::OmpClause::UseDeviceAddr>(
		&clause.u)) {
		// UseDeviceAddr clause is exclusive to Target Data directives. It is
		// handled as part of the TargetOp creation.
continue;		continue;
} else if (std::get_if<Fortran::parser::OmpClause::ThreadLimit>(		} else if (std::get_if<Fortran::parser::OmpClause::ThreadLimit>(
&clause.u)) {		&clause.u)) {
// Handled as part of TargetOp creation.		// Handled as part of TargetOp creation.
continue;		continue;
} else if (const auto &finalClause =		} else if (const auto &finalClause =
std::get_if<Fortran::parser::OmpClause::Final>(&clause.u)) {		std::get_if<Fortran::parser::OmpClause::Final>(&clause.u)) {
mlir::Value finalVal = fir::getBase(converter.genExprValue(		mlir::Value finalVal = fir::getBase(converter.genExprValue(
[1,808 unchanged lines not shown]

flang/test/Lower/OpenMP/target.f90

[156 unchanged lines not shown; context: `subroutine omp_target_thread_limit`]
!CHECK: %[[VAL_1:.*]] = arith.constant 64 : i32		!CHECK: %[[VAL_1:.*]] = arith.constant 64 : i32
!CHECK: omp.target thread_limit(%[[VAL_1]] : i32) map((tofrom -> %[[VAL_0]] : !fir.ref<i32>)) {		!CHECK: omp.target thread_limit(%[[VAL_1]] : i32) map((tofrom -> %[[VAL_0]] : !fir.ref<i32>)) {
!$omp target map(tofrom: a) thread_limit(64)		!$omp target map(tofrom: a) thread_limit(64)
a = 10		a = 10
!CHECK: omp.terminator		!CHECK: omp.terminator
!$omp end target		!$omp end target
!CHECK: }		!CHECK: }
end subroutine omp_target_thread_limit		end subroutine omp_target_thread_limit

		!===============================================================================
		! Target `use_device_ptr` clause
		!===============================================================================

		!CHECK-LABEL: func.func @_QPomp_target_device_ptr() {
		subroutine omp_target_device_ptr
		use iso_c_binding, only : c_ptr, c_loc
		kiranchandramohan: Are these tests valid for `device_ptr`? Should they be of type `c_ptr` as per the standard?
		TIFitis: I've updated the test. Let me know if it's okay now.
		type(c_ptr) :: a
		integer, target :: b
		!CHECK: omp.target_data map((tofrom -> %[[VAL_0:.*]] : !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>)) use_device_ptr(%[[VAL_0]] : !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>)
		!$omp target data map(tofrom: a) use_device_ptr(a)
		!CHECK: ^bb0(%[[VAL_1:.*]]: !fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>):
		!CHECK: {{.*}} = fir.coordinate_of %[[VAL_1:.*]], {{.*}} : (!fir.ref<!fir.type<_QM__fortran_builtinsT__builtin_c_ptr{__address:i64}>>, !fir.field) -> !fir.ref<i64>
		a = c_loc(b)
		!CHECK: omp.terminator
		!$omp end target data
		!CHECK: }
		end subroutine omp_target_device_ptr

		!===============================================================================
		! Target `use_device_addr` clause
		!===============================================================================

		!CHECK-LABEL: func.func @_QPomp_target_device_addr() {
		subroutine omp_target_device_addr
		integer, pointer :: a
		!CHECK: omp.target_data map((tofrom -> %[[VAL_0:.*]] : !fir.ref<!fir.box<!fir.ptr<i32>>>)) use_device_addr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.ptr<i32>>>)
		!$omp target data map(tofrom: a) use_device_addr(a)
		!CHECK: ^bb0(%[[VAL_1:.*]]: !fir.ref<!fir.box<!fir.ptr<i32>>>):
		!CHECK: {{.*}} = fir.load %[[VAL_1]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
		a = 10
		!CHECK: omp.terminator
		!$omp end target data
		!CHECK: }
		end subroutine omp_target_device_addr
