This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
flang/
-
lib/Lower/
-
Lower/
-
OpenMP.cpp
-
test/Lower/OpenMP/
-
Lower/
-
OpenMP/
5/7
target_data.f90

Differential D144013

[MLIR][OpenMP] Add Lowering support for OpenMP Target Data with region
ClosedPublic

Authored by TIFitis on Feb 14 2023, 7:26 AM.

Download Raw Diff

Details

Reviewers

kiranchandramohan
raghavendhra
clementval
jdoerfert

Commits

rG7d9be4368a36: [MLIR][OpenMP] Add Lowering support for OpenMP Target Data with region

Summary

This patch adds Fortran Lowering support for the OpenMP Target Data Op with associated region.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

TIFitis created this revision.Feb 14 2023, 7:26 AM

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 14 2023, 7:26 AM

Herald added subscribers: bzcheeseman, mehdi_amini, rriddle and 2 others. · View Herald Transcript

TIFitis requested review of this revision.Feb 14 2023, 7:26 AM

Herald added a reviewer: jdoerfert. · View Herald TranscriptFeb 14 2023, 7:26 AM

Herald added subscribers: sstefan1, stephenneuendorffer, jdoerfert. · View Herald Transcript

Harbormaster completed remote builds in B213660: Diff 497323.Feb 14 2023, 8:13 AM

We probably need to update the Conversion in MLIR to OpenMP to handle the region as well. If we add a loop inside the target data construct, I believe currently it will fail to convert to LLVM.
https://github.com/llvm/llvm-project/blob/15b90805bcb8a567525a0a3605904f3ea59490dd/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp#L102

flang/test/Lower/OpenMP/target_data.f90
115	Can we add a CHECK for the contents as well?

In D144013#4128941, @kiranchandramohan wrote:

We probably need to update the Conversion in MLIR to OpenMP to handle the region as well. If we add a loop inside the target data construct, I believe currently it will fail to convert to LLVM.
https://github.com/llvm/llvm-project/blob/15b90805bcb8a567525a0a3605904f3ea59490dd/mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp#L102

This can come in as a separate patch.

Updated test.

Previous patch was incorrect.

This patch fixes the tests.

Harbormaster completed remote builds in B214137: Diff 497987.Feb 16 2023, 6:09 AM

TIFitis marked an inline comment as done.Feb 17 2023, 6:14 AM

LG.

flang/test/Lower/OpenMP/target_data.f90
114	I have not looked at how the lowering to LLVM IR works. I have a question here. When the array is mapped to the address-space of the target, I assume there will be a separate allocation there and it should be the new allocation that should be used in the body of this region. The `fir.coordinate` below continues to use the current `fir.alloca` in the host. How will this work. Generally, we handle this by making it an argument of the region and using it instead. The argument will be materialised as an argument in the target. Hence the use in the body of the construct need not be changed.

TIFitis added inline comments.Feb 17 2023, 8:46 AM

flang/test/Lower/OpenMP/target_data.f90
114	I haven't explicitly handled this in the IRBuilder. I am just now familiarising myself with the issue. There is a patch up for review from Jan Sjodin in our team for alloca address space handling in FIR->LLVMIR here D144203. My guess for now is that we would need that patch to handle this scenario. Do you think this is something that needs addressing in this patch? I plan on putting another patch for review for the FIR->LLVMIR conversion handling of region for Target Data.

kiranchandramohan requested changes to this revision.Feb 22 2023, 7:17 AM

kiranchandramohan added inline comments.

flang/test/Lower/OpenMP/target_data.f90
114	I would like to understand how this works fully before proceeding in order to ensure that we are doing this correctly. My concern is primarily because this seems similar to privatisation clauses and we handle privatisation a bit differently due to some issues. Doesn't the IRBuilder patch (https://reviews.llvm.org/D142914) lower `target data map`?

This revision now requires changes to proceed.Feb 22 2023, 7:17 AM

TIFitis marked 2 inline comments as done.Feb 22 2023, 10:57 AM

TIFitis added inline comments.

flang/test/Lower/OpenMP/target_data.f90
114	Yes, lowering of Target Data with region is handled. But as you mentioned in your earlier comments, I haven't made the argument local to the region. I was following clang generated code and it didn't seem to do anything special. I am currently looking into what needs to be done for handling the map arguments inside of the region.

Hi Kiran,

Here's what the final llvm IR looks like using clang vs flang-new.

From what I can see clang is still using the same alloca inside the region body, and furthermore if you change the map type to to, from, tofrom and alloc, the only difference in IR is the @.offload_maptypes global variable.

As I said previously, I am getting to grips with offloading as well. From reading the specification it seems the use_device_addr and use_device_ptr clauses are used to deal with the interaction between host and device data maps.
Please let me know if I am missing something here.

Fortran code:

subroutine openmp_target_data
    integer :: i
    !$omp target data map(tofrom: a)
        i = 99
    !$omp end target data
end subroutine openmp_target_data

flang generated llvm ir:

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

%struct.ident_t = type { i32, i32, i32, i32, ptr }

@0 = private unnamed_addr constant [26 x i8] c";test.mlir;unknown;4;10;;\00", align 1
@1 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
@2 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @1 }, align 8
@.offload_maptypes = private unnamed_addr constant [1 x i64] [i64 3]
@.offload_mapnames = private constant [1 x ptr] [ptr @0]

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPopenmp_target_data() {
  %1 = alloca [1 x ptr], align 8
  %2 = alloca [1 x ptr], align 8
  %3 = alloca [1 x i64], align 8
  %4 = alloca float, i64 1, align 4
  %5 = alloca i32, i64 1, align 4
  br label %entry

entry:                                            ; preds = %0
  %6 = getelementptr inbounds [1 x ptr], ptr %1, i32 0, i32 0
  store ptr %4, ptr %6, align 8
  %7 = getelementptr inbounds [1 x ptr], ptr %2, i32 0, i32 0
  store ptr %4, ptr %7, align 8
  %8 = getelementptr inbounds [1 x i64], ptr %3, i32 0, i32 0
  store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %8, align 8
  %9 = getelementptr inbounds [1 x ptr], ptr %1, i32 0, i32 0
  %10 = getelementptr inbounds [1 x ptr], ptr %2, i32 0, i32 0
  %11 = getelementptr inbounds [1 x i64], ptr %3, i32 0, i32 0
  call void @__tgt_target_data_begin_mapper(ptr @2, i64 -1, i32 1, ptr %9, ptr %10, ptr %11, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
  br label %omp.data.region

omp.data.region:                                  ; preds = %entry
  store i32 99, ptr %5, align 4
  br label %omp.region.cont

omp.region.cont:                                  ; preds = %omp.data.region
  %12 = getelementptr inbounds [1 x ptr], ptr %1, i32 0, i32 0
  %13 = getelementptr inbounds [1 x ptr], ptr %2, i32 0, i32 0
  %14 = getelementptr inbounds [1 x i64], ptr %3, i32 0, i32 0
  call void @__tgt_target_data_end_mapper(ptr @2, i64 -1, i32 1, ptr %12, ptr %13, ptr %14, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
  ret void
}

; Function Attrs: nounwind
declare void @__tgt_target_data_begin_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0

; Function Attrs: nounwind
declare void @__tgt_target_data_end_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0

attributes #0 = { nounwind }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

C code:

#include <omp.h>
int main() {
  int i;
#pragma omp target data map(alloc : i)
  { i = 99; }
  return 0;
}

clang generated llvm ir ( clang -fopenmp -fopenmp-extensions -fopenmp-offload-mandatory -fopenmp-targets=amdgcn-amd-amdhsa -S -emit-llvm test.c -o test.ll )

; __CLANG_OFFLOAD_BUNDLE____START__ openmp-amdgcn-amd-amdhsa-gfx90a
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7"
target triple = "amdgcn-amd-amdhsa"

@__omp_rtl_debug_kind = weak_odr hidden addrspace(1) constant i32 0
@__omp_rtl_assume_teams_oversubscription = weak_odr hidden addrspace(1) constant i32 0
@__omp_rtl_assume_threads_oversubscription = weak_odr hidden addrspace(1) constant i32 0
@__omp_rtl_assume_no_thread_state = weak_odr hidden addrspace(1) constant i32 0
@__omp_rtl_assume_no_nested_parallelism = weak_odr hidden addrspace(1) constant i32 0

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5}
!llvm.ident = !{!6}

!0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 7, !"openmp", i32 50}
!3 = !{i32 7, !"openmp-device", i32 50}
!4 = !{i32 8, !"PIC Level", i32 2}
!5 = !{i32 7, !"frame-pointer", i32 2}
!6 = !{!"AOMP_STANDALONE_16.0-4 clang version 17.0.0 (ssh://dpalermo@gerrit-git.amd.com:29418/lightning/ec/llvm-project a5701609098bc48735f5feafe84084a93c00639f)"}

; __CLANG_OFFLOAD_BUNDLE____END__ openmp-amdgcn-amd-amdhsa-gfx90a

; __CLANG_OFFLOAD_BUNDLE____START__ host-x86_64-unknown-linux-gnu
; ModuleID = '/tmp/test-117796.bc'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.ident_t = type { i32, i32, i32, i32, ptr }

@.offload_sizes = private unnamed_addr constant [1 x i64] [i64 4]
@.offload_maptypes = private unnamed_addr constant [1 x i64] [i64 3]
@0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
@1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @0 }, align 8

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
entry:
  %retval = alloca i32, align 4
  %i = alloca i32, align 4
  %.offload_baseptrs = alloca [1 x ptr], align 8
  %.offload_ptrs = alloca [1 x ptr], align 8
  %.offload_mappers = alloca [1 x ptr], align 8
  store i32 0, ptr %retval, align 4
  %0 = getelementptr inbounds [1 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
  store ptr %i, ptr %0, align 8
  %1 = getelementptr inbounds [1 x ptr], ptr %.offload_ptrs, i32 0, i32 0
  store ptr %i, ptr %1, align 8
  %2 = getelementptr inbounds [1 x ptr], ptr %.offload_mappers, i64 0, i64 0
  store ptr null, ptr %2, align 8
  %3 = getelementptr inbounds [1 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
  %4 = getelementptr inbounds [1 x ptr], ptr %.offload_ptrs, i32 0, i32 0
  call void @__tgt_target_data_begin_mapper(ptr @1, i64 -1, i32 1, ptr %3, ptr %4, ptr @.offload_sizes, ptr @.offload_maptypes, ptr null, ptr null)
  store i32 99, ptr %i, align 4
  %5 = getelementptr inbounds [1 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
  %6 = getelementptr inbounds [1 x ptr], ptr %.offload_ptrs, i32 0, i32 0
  call void @__tgt_target_data_end_mapper(ptr @1, i64 -1, i32 1, ptr %5, ptr %6, ptr @.offload_sizes, ptr @.offload_maptypes, ptr null, ptr null)
  ret i32 0
}

; Function Attrs: nounwind
declare void @__tgt_target_data_begin_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #1

; Function Attrs: nounwind
declare void @__tgt_target_data_end_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #1

attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { nounwind }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5}
!llvm.ident = !{!6}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"openmp", i32 50}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"PIE Level", i32 2}
!4 = !{i32 7, !"uwtable", i32 2}
!5 = !{i32 7, !"frame-pointer", i32 2}
!6 = !{!"AOMP_STANDALONE_16.0-4 clang version 17.0.0 (ssh://dpalermo@gerrit-git.amd.com:29418/lightning/ec/llvm-project a5701609098bc48735f5feafe84084a93c00639f)"}

; __CLANG_OFFLOAD_BUNDLE____END__ host-x86_64-unknown-linux-gnu

Thanks,
Akash

In my previous comment I made a mistake. The C code map_type should be tofrom. The IR is still correct and generated with tofrom as the map_type.

In D144013#4150284, @TIFitis wrote:

In my previous comment I made a mistake. The C code map_type should be tofrom. The IR is still correct and generated with tofrom as the map_type.

Sorry to be making these requests. But the following example is mapping a and not i. Would it be OK for you to give the IR for this?

subroutine openmp_target_data
    integer :: i
    !$omp target data map(tofrom: a)
        i = 99
    !$omp end target data
end subroutine openmp_target_data

Sorry to be making these requests. But the following example is mapping a and not i. Would it be OK for you to give the IR for this?
subroutine openmp_target_data
    integer :: i
    !$omp target data map(tofrom: a)
        i = 99
    !$omp end target data
end subroutine openmp_target_data

Sorry about that. Here's the correct IR.

Fortran Code:

subroutine openmp_target_data
    integer :: i
    !$omp target data map(tofrom: i)
        i = 99
    !$omp end target data
end subroutine openmp_target_data

FIR ( flang-new -fc1 -emit-fir -fopenmp test.f90 -o test.fir ):

module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f80, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  func.func @_QPopenmp_target_data() {
    %0 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFopenmp_target_dataEi"}
    omp.target_data   map((tofrom -> %0 : !fir.ref<i32>)) {
      %c99_i32 = arith.constant 99 : i32
      fir.store %c99_i32 to %0 : !fir.ref<i32>
      omp.terminator
    }
    return
  }
}

LLVMIR ( fir-opt --split-input-file --cfg-conversion --fir-to-llvm-ir="target=aarch64-unknown-linux-gnu" test.fir -o test.mlir ):

module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f80, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "aarch64-unknown-linux-gnu"} {
  llvm.func @_QPopenmp_target_data() {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x i32 {bindc_name = "i", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFopenmp_target_dataEi"} : (i64) -> !llvm.ptr<i32>
    omp.target_data   map((tofrom -> %1 : !llvm.ptr<i32>)) {
      %2 = llvm.mlir.constant(99 : i32) : i32
      llvm.store %2, %1 : !llvm.ptr<i32>
      omp.terminator
    }
    llvm.return
  }
}

llvm IR ( mlir-translate -allow-unregistered-dialect --mlir-to-llvmir test.mlir -o test.ll ):

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

%struct.ident_t = type { i32, i32, i32, i32, ptr }

@0 = private unnamed_addr constant [26 x i8] c";test.mlir;unknown;4;10;;\00", align 1
@1 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
@2 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @1 }, align 8
@.offload_maptypes = private unnamed_addr constant [1 x i64] [i64 3]
@.offload_mapnames = private constant [1 x ptr] [ptr @0]

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPopenmp_target_data() {
  %1 = alloca [1 x ptr], align 8
  %2 = alloca [1 x ptr], align 8
  %3 = alloca [1 x i64], align 8
  %4 = alloca i32, i64 1, align 4
  br label %entry

entry:                                            ; preds = %0
  %5 = getelementptr inbounds [1 x ptr], ptr %1, i32 0, i32 0
  store ptr %4, ptr %5, align 8
  %6 = getelementptr inbounds [1 x ptr], ptr %2, i32 0, i32 0
  store ptr %4, ptr %6, align 8
  %7 = getelementptr inbounds [1 x i64], ptr %3, i32 0, i32 0
  store i64 ptrtoint (ptr getelementptr (ptr, ptr null, i32 1) to i64), ptr %7, align 8
  %8 = getelementptr inbounds [1 x ptr], ptr %1, i32 0, i32 0
  %9 = getelementptr inbounds [1 x ptr], ptr %2, i32 0, i32 0
  %10 = getelementptr inbounds [1 x i64], ptr %3, i32 0, i32 0
  call void @__tgt_target_data_begin_mapper(ptr @2, i64 -1, i32 1, ptr %8, ptr %9, ptr %10, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
  br label %omp.data.region

omp.data.region:                                  ; preds = %entry
  store i32 99, ptr %4, align 4
  br label %omp.region.cont

omp.region.cont:                                  ; preds = %omp.data.region
  %11 = getelementptr inbounds [1 x ptr], ptr %1, i32 0, i32 0
  %12 = getelementptr inbounds [1 x ptr], ptr %2, i32 0, i32 0
  %13 = getelementptr inbounds [1 x i64], ptr %3, i32 0, i32 0
  call void @__tgt_target_data_end_mapper(ptr @2, i64 -1, i32 1, ptr %11, ptr %12, ptr %13, ptr @.offload_maptypes, ptr @.offload_mapnames, ptr null)
  ret void
}

; Function Attrs: nounwind
declare void @__tgt_target_data_begin_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0

; Function Attrs: nounwind
declare void @__tgt_target_data_end_mapper(ptr, i64, i32, ptr, ptr, ptr, ptr, ptr, ptr) #0

attributes #0 = { nounwind }

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

Thanks,
Akash

Thanks, that looks good. The runtime is in charge of mapping and copying the data. I guess it will transparently map the address from the host pointer to the device address space since it stores the mapping.

That target data construct has device, if, map, use_device_addr, use_device_ptr clauses. Are all these supported? If not could you add TODO failures for the ones that are not supported. For the supported ones, could you add tests?

In D144013#4154851, @kiranchandramohan wrote:

Thanks, that looks good. The runtime is in charge of mapping and copying the data. I guess it will transparently map the address from the host pointer to the device address space since it stores the mapping.

That target data construct has device, if, map, use_device_addr, use_device_ptr clauses. Are all these supported? If not could you add TODO failures for the ones that are not supported. For the supported ones, could you add tests?

Yes, these things were already addressed in this patch : https://reviews.llvm.org/D142357. This only adds the region lowering support for Target Data.

Thanks. LGTM.

This revision is now accepted and ready to land.Feb 27 2023, 8:10 AM

It will also be helpful if you can add TODOs for the types that are not handled.

vzakhari added a subscriber: vzakhari.Feb 27 2023, 4:18 PM

vzakhari added inline comments.

flang/test/Lower/OpenMP/target_data.f90
120	Is there a document that I can read to understand how the OpenMP MLIR dialect is going to work with generic MLIR optimizations? Sorry for hijacking this review, but I am curious about the following case: %0 = fir.alloca !fir.array<1024xi32> %1 = fir.coordinate_of %0, %cstN // use of %1 omp.target_data map((tofrom -> %0)) { %2 = fir.coordinate_of %0, %cstN // use of %2 } How is it guaranteed that the above will not be optimized by some MLIR pass into the following: %0 = fir.alloca !fir.array<1024xi32> %1 = fir.coordinate_of %0, %cstN // use of %1 omp.target_data map((tofrom -> %0)) { // %2 = fir.coordinate_of %0, %cstN // eliminated or hoisted // use of %1 // there is no any association between the references inside the region and the references in the map clauses } In other words, is there something in OMP dialect operations's definitions that prevents other MLIR passes to introduce new live-ins into the OMP regions? This may not be a problem for `target data map`, but it may be a problem for `target data use_device_ptr` and `target map`.

kiranchandramohan added inline comments.Feb 28 2023, 3:08 AM

flang/test/Lower/OpenMP/target_data.f90
120	Thanks @vzakhari for bringing up this issue. There is nothing at the moment in the OpenMP dialect that might prevent this. The `reduction declare` OpenMP operation is marked as `isolated from above` to prevent operations from being hoisted out or in. Would you know, -> How freely are operations moved into or out of a region? -> Is the side-effects interface sufficient to control this? I would assume the side-effecting operations are not moved into or out of a region by default. If we mark an operation (with a region) as having side effects, would it be sufficient to disallow sinking/hoisting of other operations to/from it? If you prefer, we can discuss this in MLIR discourse to get opinions from others as well.

In D144013#4156618, @kiranchandramohan wrote:

It will also be helpful if you can add TODOs for the types that are not handled.

Hi, can you please tell me which types you are referring to here?

In D144013#4158333, @TIFitis wrote:

In D144013#4156618, @kiranchandramohan wrote:

It will also be helpful if you can add TODOs for the types that are not handled.

Hi, can you please tell me which types you are referring to here?

All types that can be present in a map clause. Should be most of the following, please consult the restrictions for the clause.
char, char array
integer, real, complex and arrays of these with both fixed and variable length
pointer, allocatable, assumed shape arrays
derived type
derived type components, array-sections
polymorphic

In D144013#4158654, @kiranchandramohan wrote:

In D144013#4158333, @TIFitis wrote:

In D144013#4156618, @kiranchandramohan wrote:

It will also be helpful if you can add TODOs for the types that are not handled.

Hi, can you please tell me which types you are referring to here?

All types that can be present in a map clause. Should be most of the following, please consult the restrictions for the clause.
char, char array
integer, real, complex and arrays of these with both fixed and variable length
pointer, allocatable, assumed shape arrays
derived type
derived type components, array-sections
polymorphic

Yes all these types should work for map operands. Is this patch good to go?

In D144013#4164369, @TIFitis wrote:

In D144013#4158654, @kiranchandramohan wrote:

In D144013#4158333, @TIFitis wrote:

In D144013#4156618, @kiranchandramohan wrote:

It will also be helpful if you can add TODOs for the types that are not handled.

Hi, can you please tell me which types you are referring to here?

All types that can be present in a map clause. Should be most of the following, please consult the restrictions for the clause.
char, char array
integer, real, complex and arrays of these with both fixed and variable length
pointer, allocatable, assumed shape arrays
derived type
derived type components, array-sections
polymorphic

Yes all these types should work for map operands. Is this patch good to go?

This patch can go ahead.
I would recommend the following follow-up work:

If all the above types are working then please add tests for them. I assume you are talking about the end-to-end flow once https://reviews.llvm.org/D142914 lands.
Remove the use_device_addr and use_device_ptr Operands from the MLIR operation if the presence of those oeprands would make the semantics of the operation different.
Change the type of map_operands to OpenMP_PointerLikeType.

Closed by commit rG7d9be4368a36: [MLIR][OpenMP] Add Lowering support for OpenMP Target Data with region (authored by TIFitis). · Explain WhyMar 3 2023, 5:05 AM

This revision was automatically updated to reflect the committed changes.

TIFitis added a commit: rG7d9be4368a36: [MLIR][OpenMP] Add Lowering support for OpenMP Target Data with region.

Revision Contents

Path

Size

flang/

lib/

Lower/

OpenMP.cpp

12 lines

test/

Lower/

OpenMP/

target_data.f90

31 lines

Diff 502104

flang/lib/Lower/OpenMP.cpp

Show First 20 Lines • Show All 617 Lines • ▼ Show 20 Lines	if constexpr (std::is_same_v<Op, omp::ParallelOp>) {
if (clauses)		if (clauses)
genCopyinClause(converter, *clauses);		genCopyinClause(converter, *clauses);
}		}
}		}

static void		static void
createTargetDataOp(Fortran::lower::AbstractConverter &converter,		createTargetDataOp(Fortran::lower::AbstractConverter &converter,
const Fortran::parser::OmpClauseList &opClauseList,		const Fortran::parser::OmpClauseList &opClauseList,
const llvm::omp::Directive &directive) {		const llvm::omp::Directive &directive,
		Fortran::lower::pft::Evaluation *eval = nullptr) {
Fortran::lower::StatementContext stmtCtx;		Fortran::lower::StatementContext stmtCtx;
fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();		fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();

mlir::Value ifClauseOperand, deviceOperand;		mlir::Value ifClauseOperand, deviceOperand;
mlir::UnitAttr nowaitAttr;		mlir::UnitAttr nowaitAttr;
llvm::SmallVector<mlir::Value> useDevicePtrOperand, useDeviceAddrOperand,		llvm::SmallVector<mlir::Value> useDevicePtrOperand, useDeviceAddrOperand,
mapOperands;		mapOperands;
llvm::SmallVector<mlir::IntegerAttr> mapTypes;		llvm::SmallVector<mlir::IntegerAttr> mapTypes;
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	createTargetDataOp(Fortran::lower::AbstractConverter &converter,

llvm::SmallVector<mlir::Attribute> mapTypesAttr(mapTypes.begin(),		llvm::SmallVector<mlir::Attribute> mapTypesAttr(mapTypes.begin(),
mapTypes.end());		mapTypes.end());
mlir::ArrayAttr mapTypesArrayAttr =		mlir::ArrayAttr mapTypesArrayAttr =
ArrayAttr::get(firOpBuilder.getContext(), mapTypesAttr);		ArrayAttr::get(firOpBuilder.getContext(), mapTypesAttr);
mlir::Location currentLocation = converter.getCurrentLocation();		mlir::Location currentLocation = converter.getCurrentLocation();

if (directive == llvm::omp::Directive::OMPD_target_data) {		if (directive == llvm::omp::Directive::OMPD_target_data) {
firOpBuilder.create<omp::DataOp>(		auto dataOp = firOpBuilder.create<omp::DataOp>(
currentLocation, ifClauseOperand, deviceOperand, useDevicePtrOperand,		currentLocation, ifClauseOperand, deviceOperand, useDevicePtrOperand,
useDeviceAddrOperand, mapOperands, mapTypesArrayAttr);		useDeviceAddrOperand, mapOperands, mapTypesArrayAttr);
		createBodyOfOp(dataOp, converter, currentLocation, *eval, &opClauseList);
} else if (directive == llvm::omp::Directive::OMPD_target_enter_data) {		} else if (directive == llvm::omp::Directive::OMPD_target_enter_data) {
firOpBuilder.create<omp::EnterDataOp>(currentLocation, ifClauseOperand,		firOpBuilder.create<omp::EnterDataOp>(currentLocation, ifClauseOperand,
deviceOperand, nowaitAttr,		deviceOperand, nowaitAttr,
mapOperands, mapTypesArrayAttr);		mapOperands, mapTypesArrayAttr);
} else if (directive == llvm::omp::Directive::OMPD_target_exit_data) {		} else if (directive == llvm::omp::Directive::OMPD_target_exit_data) {
firOpBuilder.create<omp::ExitDataOp>(currentLocation, ifClauseOperand,		firOpBuilder.create<omp::ExitDataOp>(currentLocation, ifClauseOperand,
deviceOperand, nowaitAttr, mapOperands,		deviceOperand, nowaitAttr, mapOperands,
mapTypesArrayAttr);		mapTypesArrayAttr);
▲ Show 20 Lines • Show All 235 Lines • ▼ Show 20 Lines	if (const auto &ifClause =
// Default clause with shared or none do not require any handling since		// Default clause with shared or none do not require any handling since
// Shared is the default behavior in the IR and None is only required		// Shared is the default behavior in the IR and None is only required
// for semantic checks.		// for semantic checks.
continue;		continue;
}		}
} else if (std::get_if<Fortran::parser::OmpClause::Threads>(&clause.u)) {		} else if (std::get_if<Fortran::parser::OmpClause::Threads>(&clause.u)) {
// Nothing needs to be done for threads clause.		// Nothing needs to be done for threads clause.
continue;		continue;
		} else if (std::get_if<Fortran::parser::OmpClause::Map>(&clause.u)) {
		// Map clause is exclusive to Target Data directives. It is handled
		// as part of the DataOp creation.
		continue;
} else if (const auto &finalClause =		} else if (const auto &finalClause =
std::get_if<Fortran::parser::OmpClause::Final>(&clause.u)) {		std::get_if<Fortran::parser::OmpClause::Final>(&clause.u)) {
mlir::Value finalVal = fir::getBase(converter.genExprValue(		mlir::Value finalVal = fir::getBase(converter.genExprValue(
*Fortran::semantics::GetExpr(finalClause->v), stmtCtx));		*Fortran::semantics::GetExpr(finalClause->v), stmtCtx));
finalClauseOperand = firOpBuilder.createConvert(		finalClauseOperand = firOpBuilder.createConvert(
currentLocation, firOpBuilder.getI1Type(), finalVal);		currentLocation, firOpBuilder.getI1Type(), finalVal);
} else if (std::get_if<Fortran::parser::OmpClause::Untied>(&clause.u)) {		} else if (std::get_if<Fortran::parser::OmpClause::Untied>(&clause.u)) {
untiedAttr = firOpBuilder.getUnitAttr();		untiedAttr = firOpBuilder.getUnitAttr();
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	if (blockDirective.v == llvm::omp::OMPD_parallel) {
createBodyOfOp(taskOp, converter, currentLocation, eval, &opClauseList);		createBodyOfOp(taskOp, converter, currentLocation, eval, &opClauseList);
} else if (blockDirective.v == llvm::omp::OMPD_taskgroup) {		} else if (blockDirective.v == llvm::omp::OMPD_taskgroup) {
// TODO: Add task_reduction support		// TODO: Add task_reduction support
auto taskGroupOp = firOpBuilder.create<mlir::omp::TaskGroupOp>(		auto taskGroupOp = firOpBuilder.create<mlir::omp::TaskGroupOp>(
currentLocation, /task_reduction_vars=/ValueRange(),		currentLocation, /task_reduction_vars=/ValueRange(),
/task_reductions=/nullptr, allocateOperands, allocatorOperands);		/task_reductions=/nullptr, allocateOperands, allocatorOperands);
createBodyOfOp(taskGroupOp, converter, currentLocation, eval,		createBodyOfOp(taskGroupOp, converter, currentLocation, eval,
&opClauseList);		&opClauseList);
		} else if (blockDirective.v == llvm::omp::OMPD_target_data) {
		createTargetDataOp(converter, opClauseList, blockDirective.v, &eval);
} else {		} else {
TODO(converter.getCurrentLocation(), "Unhandled block directive");		TODO(converter.getCurrentLocation(), "Unhandled block directive");
}		}
}		}

/// This function returns the identity value of the operator \p reductionOpName.		/// This function returns the identity value of the operator \p reductionOpName.
/// For example:		/// For example:
/// 0 + x = x,		/// 0 + x = x,
▲ Show 20 Lines • Show All 1,118 Lines • Show Last 20 Lines

flang/test/Lower/OpenMP/target_data.f90

	!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - \| FileCheck %s			!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - \| FileCheck %s

	!===============================================================================			!===============================================================================
	! Target_Enter Simple			! Target_Enter Simple
	!===============================================================================			!===============================================================================


	!CHECK-LABEL: func.func @_QPomp_target_enter_simple() {			!CHECK-LABEL: func.func @_QPomp_target_enter_simple() {
	subroutine omp_target_enter_simple			subroutine omp_target_enter_simple
	integer :: a(1024)			integer :: a(1024)
	!CHECK: omp.target_enter_data map((to -> {{.*}} : !fir.ref<!fir.array<1024xi32>>))			!CHECK: omp.target_enter_data map((to -> {{.*}} : !fir.ref<!fir.array<1024xi32>>))
	!$omp target enter data map(to: a)			!$omp target enter data map(to: a)
	end subroutine omp_target_enter_simple			end subroutine omp_target_enter_simple

	!===============================================================================			!===============================================================================
	Show All 27 Lines

	!CHECK-LABEL: func.func @_QPomp_target_enter_if() {			!CHECK-LABEL: func.func @_QPomp_target_enter_if() {
	subroutine omp_target_enter_if			subroutine omp_target_enter_if
	integer :: a(1024)			integer :: a(1024)
	integer :: i			integer :: i
	i = 5			i = 5
	!CHECK: %[[VAL_3:.]] = fir.load %[[VAL_1:.]] : !fir.ref<i32>			!CHECK: %[[VAL_3:.]] = fir.load %[[VAL_1:.]] : !fir.ref<i32>
	!CHECK: %[[VAL_4:.*]] = arith.constant 10 : i32			!CHECK: %[[VAL_4:.*]] = arith.constant 10 : i32
	!CHECK: %[[VAL_5:.]] = arith.cmpi slt, %[[VAL_3:.]], %[[VAL_4:.*]] : i32			!CHECK: %[[VAL_5:.*]] = arith.cmpi slt, %[[VAL_3]], %[[VAL_4]] : i32
	!CHECK: omp.target_enter_data if(%[[VAL_5:.]] : i1) map((to -> {{.}} : !fir.ref<!fir.array<1024xi32>>))			!CHECK: omp.target_enter_data if(%[[VAL_5]] : i1) map((to -> {{.*}} : !fir.ref<!fir.array<1024xi32>>))
	!$omp target enter data if(i<10) map(to: a)			!$omp target enter data if(i<10) map(to: a)
	end subroutine omp_target_enter_if			end subroutine omp_target_enter_if

	!===============================================================================			!===============================================================================
	! `device` clause			! `device` clause
	!===============================================================================			!===============================================================================

	!CHECK-LABEL: func.func @_QPomp_target_enter_device() {			!CHECK-LABEL: func.func @_QPomp_target_enter_device() {
	subroutine omp_target_enter_device			subroutine omp_target_enter_device
	integer :: a(1024)			integer :: a(1024)
	!CHECK: %[[VAL_1:.*]] = arith.constant 2 : i32			!CHECK: %[[VAL_1:.*]] = arith.constant 2 : i32
	!CHECK: omp.target_enter_data device(%[[VAL_1:.]] : i32) map((to -> {{.}} : !fir.ref<!fir.array<1024xi32>>))			!CHECK: omp.target_enter_data device(%[[VAL_1]] : i32) map((to -> {{.*}} : !fir.ref<!fir.array<1024xi32>>))
	!$omp target enter data map(to: a) device(2)			!$omp target enter data map(to: a) device(2)
	end subroutine omp_target_enter_device			end subroutine omp_target_enter_device

	!===============================================================================			!===============================================================================
	! Target_Exit Simple			! Target_Exit Simple
	!===============================================================================			!===============================================================================

	!CHECK-LABEL: func.func @_QPomp_target_exit_simple() {			!CHECK-LABEL: func.func @_QPomp_target_exit_simple() {
	Show All 22 Lines
	! `device` clause			! `device` clause
	!===============================================================================			!===============================================================================

	!CHECK-LABEL: func.func @_QPomp_target_exit_device() {			!CHECK-LABEL: func.func @_QPomp_target_exit_device() {
	subroutine omp_target_exit_device			subroutine omp_target_exit_device
	integer :: a(1024)			integer :: a(1024)
	integer :: d			integer :: d
	!CHECK: %[[VAL_2:.]] = fir.load %[[VAL_1:.]] : !fir.ref<i32>			!CHECK: %[[VAL_2:.]] = fir.load %[[VAL_1:.]] : !fir.ref<i32>
	!CHECK: omp.target_exit_data device(%[[VAL_2:.]] : i32) map((from -> {{.}} : !fir.ref<!fir.array<1024xi32>>))			!CHECK: omp.target_exit_data device(%[[VAL_2]] : i32) map((from -> {{.*}} : !fir.ref<!fir.array<1024xi32>>))
	!$omp target exit data map(from: a) device(d)			!$omp target exit data map(from: a) device(d)
	end subroutine omp_target_exit_device			end subroutine omp_target_exit_device

				!===============================================================================
				! Target_Data with region
				!===============================================================================

				!CHECK-LABEL: func.func @_QPomp_target_data() {
				subroutine omp_target_data
				!CHECK: %[[VAL_0:.*]] = fir.alloca !fir.array<1024xi32> {bindc_name = "a", uniq_name = "_QFomp_target_dataEa"}
				integer :: a(1024)
				!CHECK: omp.target_data map((tofrom -> %[[VAL_0]] : !fir.ref<!fir.array<1024xi32>>)) {
				kiranchandramohanUnsubmitted Done Reply Inline Actions I have not looked at how the lowering to LLVM IR works. I have a question here. When the array is mapped to the address-space of the target, I assume there will be a separate allocation there and it should be the new allocation that should be used in the body of this region. The `fir.coordinate` below continues to use the current `fir.alloca` in the host. How will this work. Generally, we handle this by making it an argument of the region and using it instead. The argument will be materialised as an argument in the target. Hence the use in the body of the construct need not be changed. kiranchandramohan: I have not looked at how the lowering to LLVM IR works. I have a question here. When the array…
				TIFitisAuthorUnsubmitted Done Reply Inline Actions I haven't explicitly handled this in the IRBuilder. I am just now familiarising myself with the issue. There is a patch up for review from Jan Sjodin in our team for alloca address space handling in FIR->LLVMIR here D144203. My guess for now is that we would need that patch to handle this scenario. Do you think this is something that needs addressing in this patch? I plan on putting another patch for review for the FIR->LLVMIR conversion handling of region for Target Data. TIFitis: I haven't explicitly handled this in the IRBuilder. I am just now familiarising myself with the…
				kiranchandramohanUnsubmitted Done Reply Inline Actions I would like to understand how this works fully before proceeding in order to ensure that we are doing this correctly. My concern is primarily because this seems similar to privatisation clauses and we handle privatisation a bit differently due to some issues. Doesn't the IRBuilder patch (https://reviews.llvm.org/D142914) lower `target data map`? kiranchandramohan: I would like to understand how this works fully before proceeding in order to ensure that we…
				TIFitisAuthorUnsubmitted Done Reply Inline Actions Yes, lowering of Target Data with region is handled. But as you mentioned in your earlier comments, I haven't made the argument local to the region. I was following clang generated code and it didn't seem to do anything special. I am currently looking into what needs to be done for handling the map arguments inside of the region. TIFitis: Yes, lowering of Target Data with region is handled. But as you mentioned in your earlier…
				!$omp target data map(tofrom: a)
				kiranchandramohanUnsubmitted Done Reply Inline Actions Can we add a CHECK for the contents as well? kiranchandramohan: Can we add a CHECK for the contents as well?
				!CHECK: %[[VAL_1:.*]] = arith.constant 10 : i32
				!CHECK: %[[VAL_2:.*]] = arith.constant 1 : i64
				!CHECK: %[[VAL_3:.*]] = arith.constant 1 : i64
				!CHECK: %[[VAL_4:.*]] = arith.subi %[[VAL_2]], %[[VAL_3]] : i64
				!CHECK: %[[VAL_5:.*]] = fir.coordinate_of %[[VAL_0]], %[[VAL_4]] : (!fir.ref<!fir.array<1024xi32>>, i64) -> !fir.ref<i32>
				vzakhariUnsubmitted Not Done Reply Inline Actions Is there a document that I can read to understand how the OpenMP MLIR dialect is going to work with generic MLIR optimizations? Sorry for hijacking this review, but I am curious about the following case: %0 = fir.alloca !fir.array<1024xi32> %1 = fir.coordinate_of %0, %cstN // use of %1 omp.target_data map((tofrom -> %0)) { %2 = fir.coordinate_of %0, %cstN // use of %2 } How is it guaranteed that the above will not be optimized by some MLIR pass into the following: %0 = fir.alloca !fir.array<1024xi32> %1 = fir.coordinate_of %0, %cstN // use of %1 omp.target_data map((tofrom -> %0)) { // %2 = fir.coordinate_of %0, %cstN // eliminated or hoisted // use of %1 // there is no any association between the references inside the region and the references in the map clauses } In other words, is there something in OMP dialect operations's definitions that prevents other MLIR passes to introduce new live-ins into the OMP regions? This may not be a problem for `target data map`, but it may be a problem for `target data use_device_ptr` and `target map`. vzakhari: Is there a document that I can read to understand how the OpenMP MLIR dialect is going to work…
				kiranchandramohanUnsubmitted Not Done Reply Inline Actions Thanks @vzakhari for bringing up this issue. There is nothing at the moment in the OpenMP dialect that might prevent this. The `reduction declare` OpenMP operation is marked as `isolated from above` to prevent operations from being hoisted out or in. Would you know, -> How freely are operations moved into or out of a region? -> Is the side-effects interface sufficient to control this? I would assume the side-effecting operations are not moved into or out of a region by default. If we mark an operation (with a region) as having side effects, would it be sufficient to disallow sinking/hoisting of other operations to/from it? If you prefer, we can discuss this in MLIR discourse to get opinions from others as well. kiranchandramohan: Thanks @vzakhari for bringing up this issue. There is nothing at the moment in the OpenMP…
				!CHECK: fir.store %[[VAL_1]] to %[[VAL_5]] : !fir.ref<i32>
				a(1) = 10
				!CHECK: omp.terminator
				!$omp end target data
				!CHECK: }
				end subroutine omp_target_data