This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][OpenMP] Added MLIR translation support for use_device clauses
ClosedPublic

Authored by TIFitis on Mar 22 2023, 11:14 AM.

Details

Summary

Added MLIR support for translating use_device_ptr and use_device_addr clauses for LLVMIR lowering.

  • use_device_ptr: The mapped variables marked with use_device_ptr are accessed through a copy of the base pointer mappers. The mapper is copied onto a new temporary pointer variable.
  • use_device_addr: The mapped variables marked with use_device_addr are accessed directly through the base pointer mappers.
  • If mapping information is not provided explicitly then default map_type of alloc/release is assumed and the map_size is set to 0.

Depends on D152554

Diff Detail

Event Timeline

TIFitis created this revision.Mar 22 2023, 11:14 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
TIFitis requested review of this revision.Mar 22 2023, 11:14 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
TIFitis added inline comments.Mar 22 2023, 11:29 AM
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
4176–4177

use_dev_ptr operands must always be pointer types so I think it is safe to always just create a ptr for them.

However if we want to support typed pointers then we would have to create something like:
Builder.CreateAlloca(UseDeviceOp.first->getType()->getPointerElementType())

This is not possible as getPointerElementType is deprecated and I couldn't find any alternative for this.

4329–4339

I have commented out some code in OpenMPToLLVMIRTranslation.cpp::1489. I think that is a better way of generating code for the use_dev_ptr, but the mapValue doesn't allow remapping.

This I am stuck with letting it use the old value inside the region and replacing it later here.

llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
5082–5084

This unit test fails because of opaque pointer mismatch, The unit test runs with opaque pointers disabled however the my code only runs with it enabled.

I could not find a reasonable way to support typed pointers, I have explained this in another comment.

TIFitis updated this revision to Diff 507452.Mar 22 2023, 11:40 AM

Split use_dev_ptr test into two.

TIFitis marked an inline comment as not done.Apr 14 2023, 9:01 AM
TIFitis planned changes to this revision.May 18 2023, 3:00 AM

Need to change this patch to account for all the clang migration changes and make use of those functions.

TIFitis updated this revision to Diff 535036.Jun 27 2023, 9:58 AM

Made changes to use updated OMPIRBuilder

TIFitis retitled this revision from [MLIR][OpenMP] Added OMPIRBuilder support for use_device_ptr clause to [MLIR][OpenMP] Added MLIR translation support for use_device clauses.Jun 27 2023, 10:00 AM
TIFitis edited the summary of this revision. (Show Details)

Ping for review :)

Could you describe the changes in the Summary? Particularly providing a short description of the code that is being emitted for use_device_ptr and use_device_addr?

TIFitis edited the summary of this revision. (Show Details)Jul 12 2023, 4:23 AM

Here is an example of what clang generates:

C code:

void foo(int *a, int *b, int *c){
#pragma omp target data map(from:b, c) use_device_ptr(a) use_device_addr(b){
    *a = 10;
    *b = 20;
    *c = 30;
  }
}

LLVM IR:

@.offload_sizes = private unnamed_addr constant [3 x i64] [i64 8, i64 8, i64 0]
@.offload_maptypes = private unnamed_addr constant [3 x i64] [i64 66, i64 2, i64 64]

define dso_local void @foo(ptr noundef %a, ptr noundef %b, ptr noundef %c) #0 {
entry:
  %a.addr = alloca ptr, align 8
  %b.addr = alloca ptr, align 8
  %c.addr = alloca ptr, align 8
  %.offload_baseptrs = alloca [3 x ptr], align 8
  %.offload_ptrs = alloca [3 x ptr], align 8
  %.offload_mappers = alloca [3 x ptr], align 8
  %0 = alloca ptr, align 8
  store ptr %a, ptr %a.addr, align 8
  store ptr %b, ptr %b.addr, align 8
  store ptr %c, ptr %c.addr, align 8
  %1 = load ptr, ptr %a.addr, align 8
  %2 = getelementptr inbounds [3 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
  store ptr %b.addr, ptr %2, align 8
  %3 = getelementptr inbounds [3 x ptr], ptr %.offload_ptrs, i32 0, i32 0
  store ptr %b.addr, ptr %3, align 8
  %4 = getelementptr inbounds [3 x ptr], ptr %.offload_mappers, i64 0, i64 0
  store ptr null, ptr %4, align 8
  %5 = getelementptr inbounds [3 x ptr], ptr %.offload_baseptrs, i32 0, i32 1
  store ptr %c.addr, ptr %5, align 8
  %6 = getelementptr inbounds [3 x ptr], ptr %.offload_ptrs, i32 0, i32 1
  store ptr %c.addr, ptr %6, align 8
  %7 = getelementptr inbounds [3 x ptr], ptr %.offload_mappers, i64 0, i64 1
  store ptr null, ptr %7, align 8
  %8 = getelementptr inbounds [3 x ptr], ptr %.offload_baseptrs, i32 0, i32 2
  store ptr %1, ptr %8, align 8
  %9 = getelementptr inbounds [3 x ptr], ptr %.offload_ptrs, i32 0, i32 2
  store ptr %1, ptr %9, align 8
  %10 = getelementptr inbounds [3 x ptr], ptr %.offload_mappers, i64 0, i64 2
  store ptr null, ptr %10, align 8
  %11 = getelementptr inbounds [3 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
  %12 = getelementptr inbounds [3 x ptr], ptr %.offload_ptrs, i32 0, i32 0
  call void @__tgt_target_data_begin_mapper(ptr @1, i64 -1, i32 3, ptr %11, ptr %12, ptr @.offload_sizes, ptr @.offload_maptypes, ptr null, ptr null)
  %13 = load ptr, ptr %8, align 8
  store ptr %13, ptr %0, align 8
  %14 = load ptr, ptr %2, align 8
  %15 = load ptr, ptr %0, align 8
  store i32 10, ptr %15, align 4
  %16 = load ptr, ptr %14, align 8
  store i32 20, ptr %16, align 4
  %17 = load ptr, ptr %c.addr, align 8
  store i32 30, ptr %17, align 4
  %18 = getelementptr inbounds [3 x ptr], ptr %.offload_baseptrs, i32 0, i32 0
  %19 = getelementptr inbounds [3 x ptr], ptr %.offload_ptrs, i32 0, i32 0
  call void @__tgt_target_data_end_mapper(ptr @1, i64 -1, i32 3, ptr %18, ptr %19, ptr @.offload_sizes, ptr @.offload_maptypes, ptr null, ptr null)
  ret void
}

Ping for review

I am starting to go through the patch.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1460–1476

This block seems to be similar to devPtrOperands. Could you share code here?

mlir/test/Target/LLVMIR/omptarget-llvm.mlir
170–406

Please add a testcase with both use_device_ptr and use_device_addr.

TIFitis updated this revision to Diff 541962.Jul 19 2023, 4:54 AM

Rebased. Addressed reviewer comments. Fixed bug in generating map info for device_addr - device info type was incorrectly being set to pointer.

TIFitis marked 2 inline comments as done.Jul 19 2023, 4:58 AM
TIFitis added inline comments.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1460–1476

I've moved them into a functor.

There is a build failure currently.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1409

failure()?

1428

failure()?

TIFitis updated this revision to Diff 542886.Jul 21 2023, 6:18 AM
TIFitis marked an inline comment as done.

Renamed failure() to fail()

TIFitis marked 2 inline comments as done.Jul 21 2023, 6:20 AM
TIFitis added inline comments.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1409

I was calling the failed() functor defined above. I've renamed it to fail to avoid confusion.

TIFitis updated this revision to Diff 543450.Jul 24 2023, 3:31 AM
TIFitis marked an inline comment as done.

Attempt at fixing build error

While looking into this patch, I noticed that the current implementation in Semantics and MLIR requires the MAP clause to always be present. But the standard only says the following,

At least one map, use_device_addr or use_device_ptr clause must appear on the directive.

I will fix this. But this made having a look at the code generated only with use_device_addr, use_device_ptr and without map difficult.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1387–1401

Nit:This only needs to take in as reference combinedInfo.

1554

Nit: expand the type here. Put an assert before this line for the op being a DataOp.

1558

Should this be an assert instead of an if? If not, Please add a comment on when it can and cannot be empty.

1560–1573

Might be worth creating a lambda and callling it separately with useDevPtrOperands, useDevAddrOperands.

1586

Should this be an assert instead of an if? If not, Please add a comment on when it can and cannot be empty.

mlir/test/Target/LLVMIR/omptarget-llvm.mlir
215

Is the default usage of dev_addr also with the pointer type?
If not using a single reference example might be easier to follow.

llvm.func @_QPopenmp_target_use_dev_addr() {
  %0 = llvm.mlir.constant(1 : i64) : i64
  %a = llvm.alloca %0 x !llvm.ptr<i32> : (i64) -> !llvm.ptr<i32>
  omp.target_data  map((from -> %a : !llvm.ptr<i32>)) use_device_addr(%a : !llvm.ptr<i32>)  {
  ^bb0(%arg0: !llvm.ptr<i32>): 
    %1 = llvm.mlir.constant(10 : i32) : i32
    llvm.store %1, %arg0 : !llvm.ptr<i32>
    omp.terminator
  }
  llvm.return
}
TIFitis updated this revision to Diff 543515.Jul 24 2023, 6:27 AM
TIFitis marked 5 inline comments as done.

Addressed reviewer comments. Added missing LoadInst for use_device_addr inside region block.

While looking into this patch, I noticed that the current implementation in Semantics and MLIR requires the MAP clause to always be present. But the standard only says the following,

At least one map, use_device_addr or use_device_ptr clause must appear on the directive.

I will fix this. But this made having a look at the code generated only with use_device_addr, use_device_ptr and without map difficult.

The test openmp_target_use_dev_addr_nomap does not provide any map clause for %a, so for %a marked with use_device_addr it has the same effect as no map clause being present. Key change when use_device_addr or use_device_ptr is used without providing any map information is that the map_size should be set to 0, and only the OMP_MAP_RETURN_PARAM flag should be set for the map_type.

TIFitis marked an inline comment as done.Jul 24 2023, 9:47 AM

BTW I am not sure how to resolve the windows build error that I am getting repeatedly. It's giving an unknown identifier error for the fail functor being called at OpenMPToLLVMIRTranslation.cpp:1429. Any insight would be much appreciated.

mlir/test/Target/LLVMIR/omptarget-llvm.mlir
215

I've added a openmp_target_use_dev_addr_no_ptr test

BTW I am not sure how to resolve the windows build error that I am getting repeatedly. It's giving an unknown identifier error for the fail functor being called at OpenMPToLLVMIRTranslation.cpp:1429. Any insight would be much appreciated.

Might be a bug in the compiler in windows. You are trying to call a lambda function inside a lambda function here. Try moving the second lambda into a static function or try capturing the second lambda explicitly in the first lambda.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1386

Can you move this to just before the loop at 1406?

TIFitis updated this revision to Diff 543931.Jul 25 2023, 5:13 AM
TIFitis marked an inline comment as done.

Addressed reviewer comments. Try to fix build error.

BTW I am not sure how to resolve the windows build error that I am getting repeatedly. It's giving an unknown identifier error for the fail functor being called at OpenMPToLLVMIRTranslation.cpp:1429. Any insight would be much appreciated.

Might be a bug in the compiler in windows. You are trying to call a lambda function inside a lambda function here. Try moving the second lambda into a static function or try capturing the second lambda explicitly in the first lambda.

Thanks for the suggestion, explicitly capturing the lambda worked. The builds are all clean now.

TIFitis marked an inline comment as done.Aug 2 2023, 2:52 AM

Ping for review :)

use_device_ptr: The mapped variables marked with use_device_ptr are accessed through new pointer allocas inside the region.
use_device_addr: The mapped variables marked with use_device_addr are accessed through the base pointer mappers.

This explanation in the summary talks of mapped variables. Mapping is always not necessary. Could you also expand the explanation a little more? And also talk a bit about the difference between use_device_ptr and use_device_addr.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1559

If you do not need any codegen in this case for other switch options as well, then you can consider moving the if outside the switch.

1573–1574

Why is the load required here and not for devPtr?

TIFitis marked 2 inline comments as done.Aug 3 2023, 3:31 AM

use_device_ptr: The mapped variables marked with use_device_ptr are accessed through new pointer allocas inside the region.
use_device_addr: The mapped variables marked with use_device_addr are accessed through the base pointer mappers.

This explanation in the summary talks of mapped variables. Mapping is always not necessary. Could you also expand the explanation a little more? And also talk a bit about the difference between use_device_ptr and use_device_addr.

The regular behaviour is for mapping to be present. In the event that mapping information is absent we simply don't have the map_type and size information available. As such according to the specification the size gets set to 0 and the map_type bits are omitted, i.e, default alloc/release is used as appropriate.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1559

No, we need the if condition here in combination with the switch case to generate code for device privatisation.

1573–1574

In case of dev_ptr the load inst is added by OMPIRBuilder itself along with a store Inst to store it into the new temp ptr var added.

For dev_addr we don't add the LoadInst in the OMPIRBuilder only because of compatibility issues with Clang.

TIFitis edited the summary of this revision. (Show Details)Aug 3 2023, 3:36 AM
kiranchandramohan accepted this revision.Aug 3 2023, 4:08 AM

Looks OK. It will be good to have one more review.

This revision is now accepted and ready to land.Aug 3 2023, 4:08 AM
This revision was automatically updated to reflect the committed changes.
TIFitis marked 2 inline comments as done.